fix(serve): emit all stream_chunk deltas to fix concurrent tool-call streaming by lvhan028 · Pull Request #4622 · InternLM/lmdeploy

lvhan028 · 2026-05-26T08:58:32Z

Test script: `test_concurrent_tools.py`

Resolve api_server streaming conflict by combining multi-delta stream_chunk emission from fix-parser with OpenAI-aligned usage chunk handling from fix-parser-main. Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

This PR updates streaming response parsing so a single engine chunk can emit multiple ordered deltas, addressing concurrent tool-call streaming where reasoning/content/tool-call segments could previously remain buffered.

Changes:

- Changes `ResponseParser.stream_chunk` to return a list of parsed deltas and updates the OpenAI and Anthropic streaming consumers.
- Updates GPT-OSS Harmony and default parser behavior for empty/no-op chunks under the new contract.
- Adjusts parser and endpoint tests to use the new list-return API and adds coverage for multi-delta ordering.
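
A minimal sketch of the list-return contract described above, not lmdeploy's actual implementation. `Delta`, `ToyParser`, and the `<think>` tag handling are hypothetical stand-ins; only the method name `stream_chunk` and the idea of returning every parsed delta in order come from this PR. Treating an empty list as "nothing to emit" is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Delta:
    """Hypothetical stand-in for one parsed streaming delta."""
    reasoning: Optional[str] = None
    content: Optional[str] = None


class ToyParser:
    """Toy parser: text inside <think>...</think> is reasoning, the rest is content.

    For brevity it does not handle a tag that is split across two chunks.
    """

    def __init__(self) -> None:
        self.in_think = False

    def stream_chunk(self, text: str) -> List[Delta]:
        # Return every segment found in this chunk, in order, instead of
        # returning only the first one and leaving the rest buffered.
        deltas: List[Delta] = []
        while text:
            tag = "</think>" if self.in_think else "<think>"
            head, sep, text = text.partition(tag)
            if head:
                deltas.append(Delta(reasoning=head) if self.in_think else Delta(content=head))
            if sep:
                # The tag was found, so the parser switches mode.
                self.in_think = not self.in_think
        # Assumption of this sketch: an empty list means "nothing to emit".
        return deltas


# The consumer iterates over all deltas produced by a single engine chunk.
parser = ToyParser()
for delta in parser.stream_chunk("<think>plan</think>answer"):
    print(delta)
```

With a single-delta return value, the `answer` segment in this chunk would have to wait for a later call; returning a list lets the consumer forward both segments immediately and in order.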

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `lmdeploy/serve/parsers/response_parser.py` | Changes the base parser streaming contract and emits all parsed deltas from one chunk. |
| `lmdeploy/serve/parsers/_openai_harmony.py` | Wraps GPT-OSS Harmony streaming output in the new list-return format. |
| `lmdeploy/serve/openai/api_server.py` | Iterates over parsed deltas and attaches finish/logprob/token metadata to the last delta per engine chunk. |
| `lmdeploy/serve/anthropic/streaming.py` | Iterates over parsed deltas for Anthropic SSE streaming. |
| `lmdeploy/version.py` | Bumps version to `0.14.0a0`. |
| `tests/test_lmdeploy/serve/parsers/helpers.py` | Adds a helper for tests that still assert first-delta behavior. |
| `tests/test_lmdeploy/serve/parsers/test_qwen3_parser.py` | Updates Qwen3 parser tests and adds direct multi-delta assertions. |
| `tests/test_lmdeploy/serve/parsers/test_qwen3_5_parser.py` | Updates Qwen3.5 parser tests for the list-return contract. |
| `tests/test_lmdeploy/serve/parsers/test_llama3_parser.py` | Updates Llama3 parser tests for the list-return contract. |
| `tests/test_lmdeploy/serve/parsers/test_interns1_parser.py` | Updates InternS1 parser tests for the list-return contract. |
| `tests/test_lmdeploy/serve/parsers/test_gpt_oss_parser.py` | Updates GPT-OSS parser tests for the list-return contract. |
| `tests/test_lmdeploy/serve/parsers/test_glm47_parser.py` | Updates GLM4.7 parser tests for the list-return contract. |
| `tests/test_lmdeploy/serve/parsers/test_deepseek_v3_parser.py` | Updates DeepSeek V3 parser tests for the list-return contract. |
| `tests/test_lmdeploy/serve/anthropic/test_endpoints.py` | Updates fake Anthropic parsers to return lists of deltas. |
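
The `api_server.py` entry describes attaching finish, logprob, and token metadata to the last delta of each engine chunk. A rough sketch of that pattern follows. `StreamEvent`, `build_events`, and the field names are hypothetical and chosen only for illustration; they are not taken from lmdeploy's code. Emitting one empty event when no delta was parsed is also an assumption of this sketch, not something stated in the PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamEvent:
    """Hypothetical outgoing stream event built from one parsed delta."""
    text: str
    finish_reason: Optional[str] = None
    token_ids: Optional[List[int]] = None


def build_events(delta_texts: List[str],
                 finish_reason: Optional[str],
                 token_ids: List[int]) -> List[StreamEvent]:
    """Turn the deltas parsed from one engine chunk into ordered events.

    Per-chunk metadata is attached only to the final event, so a client
    that accumulates token ids sees each id exactly once.
    """
    events = [StreamEvent(text=t) for t in delta_texts]
    if not events and (finish_reason is not None or token_ids):
        # Assumption: still emit one empty event so the metadata is not dropped.
        events.append(StreamEvent(text=""))
    if events:
        events[-1].finish_reason = finish_reason
        events[-1].token_ids = token_ids
    return events


# Two deltas from one engine chunk: only the second carries the metadata.
for event in build_events(["reasoning text", "final text"], "stop", [11, 12, 13]):
    print(event)
```

Putting the metadata on the last event rather than the first keeps `finish_reason` from appearing before content that belongs to the same engine step.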

lvhan028 · 2026-05-26T09:06:22Z

cc @zhulinJulia24

Copilot

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 1 comment.

RunningLeon

LGTM

lvhan028 and others added 7 commits May 21, 2026 12:13

bump version v0.14.0a0

8c7668d

solution 1: drain _queued_deltas

5c75244

fix(serve): emit all stream_chunk deltas to fix concurrent tool-call streaming

2226ecf

fix(serve): emit output_ids once on last stream delta per engine step

04282ae

fix(serve): delta_token_ids failed to emit on partial protocol tags

a2d58ff

fix

beda60a

Merge branch 'fix-parser' into fix-parser-main

f589c84

Copilot AI review requested due to automatic review settings May 26, 2026 08:58

Copilot started reviewing on behalf of lvhan028 May 26, 2026 08:58

revert version

9276676

lvhan028 requested review from RunningLeon and irexyc May 26, 2026 09:00

lvhan028 added the Bug:P0 label May 26, 2026

Copilot AI reviewed May 26, 2026

Comment thread lmdeploy/serve/anthropic/streaming.py Outdated

Comment thread lmdeploy/serve/anthropic/streaming.py Outdated

fix according to reviewer comment

22a271e

lvhan028 requested a review from Copilot May 26, 2026 10:05

Copilot started reviewing on behalf of lvhan028 May 26, 2026 10:05

Copilot AI reviewed May 26, 2026

Comment thread lmdeploy/serve/parsers/response_parser.py Outdated

fix(serve): tighten stream_chunk return type to non-optional DeltaMessage

485ef6c

RunningLeon approved these changes May 27, 2026

lvhan028 merged commit b95cd7d into InternLM:main May 27, 2026
5 of 6 checks passed

lvhan028 merged 10 commits into InternLM:main from lvhan028:fix-parser-main

3 participants
