Fix some minor issues and provide tests for Pipeline #4365
lvhan028 merged 4 commits into InternLM:main from
Conversation
91d7db2 to fb67ef8
Pull request overview
This PR improves robustness and API consistency in LMDeploy’s inference pipeline/engine, and adds an end-to-end test suite intended to cover key Pipeline behaviors and edge cases (notably max_new_tokens=0).
Changes:
- Update `AsyncEngine.generate()` to early-exit when `max_new_tokens` resolves to `0`, and add a default `EngineOutput` fallback when the engine yields no outputs.
- Forward `**kwargs` from `Pipeline.chat()` into `stream_infer()` for better parameter propagation.
- Add a new `tests/test_lmdeploy/test_pipeline.py` integration test suite for Pipeline infer/stream/chat/session/ppl behaviors, including `max_new_tokens=0`.
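The kwargs-forwarding change follows a common delegation pattern. A minimal standalone sketch (with hypothetical simplified signatures, not LMDeploy's actual classes) of what forwarding `**kwargs` from `chat()` into `stream_infer()` looks like:

```python
class Pipeline:
    def __init__(self, engine):
        self.engine = engine

    def stream_infer(self, prompts, gen_config=None, **kwargs):
        # The engine sees every keyword argument the caller supplied.
        return self.engine.infer(prompts, gen_config=gen_config, **kwargs)

    def chat(self, prompt, gen_config=None, **kwargs):
        # Before a fix like this, kwargs could be silently dropped here;
        # forwarding them keeps chat() consistent with stream_infer().
        return self.stream_infer([prompt], gen_config=gen_config, **kwargs)


class DummyEngine:
    def infer(self, prompts, gen_config=None, **kwargs):
        return kwargs  # echo kwargs so propagation can be verified


pipe = Pipeline(Dummy_engine := DummyEngine())
print(pipe.chat("hi", do_preprocess=False))  # kwargs reach the engine
```

The dummy engine simply echoes the keyword arguments, which makes it easy to assert that nothing was dropped along the call chain.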
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| tests/test_lmdeploy/test_pipeline.py | Adds Pipeline integration tests across backends, streaming/chat flows, and max_new_tokens=0 edge case. |
| lmdeploy/serve/core/async_engine.py | Refactors gen-config determination/early-exit and initializes default EngineOutput to handle empty engine generators. |
| lmdeploy/pipeline.py | Fixes Pipeline.chat() kwarg propagation to stream_infer(). |
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
553f0aa to 3d325fa
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
lmdeploy/serve/core/async_engine.py (Outdated)
```python
gen_config = self._determine_gen_config(session, input_ids, gen_config=gen_config)

if gen_config.max_new_tokens == 0:
    logger.error(f'run out of tokens. session={session_id}.')
```
Logging "run out of tokens" at error level is misleading when the user explicitly sets `max_new_tokens=0` (a valid request). Consider lowering the level (info/debug) and/or changing the message to reflect an intentional zero-token generation.
```diff
- logger.error(f'run out of tokens. session={session_id}.')
+ logger.info(f'no tokens requested (max_new_tokens=0). session={session_id}.')
```
3d325fa to 86e3be7
```python
req_stats = RequestStats(prompt_tokens=input_len)  # per-request stats

# We use this as default outputs in case the async_stream_infer of the Engine yields empty generator.
outputs = EngineOutput(ResponseType.INTERNAL_ENGINE_ERROR, [])
```
Could you let us know which case yields an empty generator?
lmdeploy/lmdeploy/turbomind/turbomind.py, lines 784 to 786 in 59e8944:
If we cancel a request before the generation of the first token, I think we will trigger this case.
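The hazard being guarded against can be reproduced with a plain async generator. This standalone sketch (not LMDeploy code) shows why `outputs` needs a default value when the stream yields nothing, e.g. when a request is cancelled before the first token:

```python
import asyncio

async def empty_stream():
    # Simulates a streaming-inference generator that is cancelled
    # before producing its first item: it yields nothing at all.
    return
    yield  # unreachable, but makes this function an async generator

async def consume(default):
    # Without this default, `outputs` would be unbound after the loop
    # when the generator produces zero items.
    outputs = default
    async for outputs in empty_stream():
        pass
    return outputs

result = asyncio.run(consume(default="fallback-output"))
print(result)  # → fallback-output
```

With zero iterations, the loop variable is never assigned, so the pre-initialized default is what the caller receives instead of an `UnboundLocalError`.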
86e3be7 to b86d027
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
fc5f4a7 to 6b3181f
PR Description
Motivation
This PR addresses several minor issues in the LMDeploy inference pipeline and engine to improve robustness, fix edge cases in token generation, and establish comprehensive test coverage for the pipeline functionality.
Modification
1. Fix kwargs forwarding in pipeline chat method

The `chat` method in the `Pipeline` class was not properly forwarding keyword arguments to `stream_infer`, which could cause unexpected behavior when users passed custom generation parameters. This fix ensures all kwargs are correctly propagated through the call chain.

2. Fix off-by-one error when `max_new_tokens=0`
Previously, when `max_new_tokens` was set to 0, the engine would still generate 1 extra token due to incorrect boundary-checking logic. This PR refactors the token-limit validation in `AsyncEngine` to handle the zero-token case correctly, ensuring immediate termination without producing any output tokens when explicitly requested.

3. Provide default engine outputs for empty generators
Added defensive initialization of `EngineOutput` with `INTERNAL_ENGINE_ERROR` status as a fallback when `async_stream_infer` yields an empty generator. This prevents potential unbound-variable errors and provides clearer error signaling in edge cases where the engine fails to produce outputs.

4. Add comprehensive pipeline test suite
Introduced a new test file `tests/test_lmdeploy/test_pipeline.py` with extensive coverage of:

- inference, streaming, chat, and session behaviors
- perplexity computation (`get_ppl`)
- edge cases such as `max_new_tokens=0` and varying generation configurations

The test suite uses pytest parametrization to ensure consistent behavior across both backend implementations.
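As an illustration of the `max_new_tokens=0` edge case, a test in such a suite might look roughly like the sketch below. All names here (`FakePipeline`, `FakeResponse`, `GenConfig`) are hypothetical stand-ins, not LMDeploy's real classes; the actual suite builds a Pipeline against real backends:

```python
class FakeResponse:
    def __init__(self, text, generate_token_len):
        self.text = text
        self.generate_token_len = generate_token_len

class GenConfig:
    def __init__(self, max_new_tokens):
        self.max_new_tokens = max_new_tokens

class FakePipeline:
    def __call__(self, prompts, gen_config=None):
        max_new = gen_config.max_new_tokens if gen_config else 16
        if max_new == 0:
            # Early exit: zero tokens requested, zero tokens produced.
            return [FakeResponse("", 0) for _ in prompts]
        return [FakeResponse("hello", max_new) for _ in prompts]

def test_max_new_tokens_zero():
    pipe = FakePipeline()
    outputs = pipe(["hi"], gen_config=GenConfig(max_new_tokens=0))
    # The fixed engine must terminate immediately without generating.
    assert outputs[0].generate_token_len == 0
    assert outputs[0].text == ""

test_max_new_tokens_zero()
```

The key assertion is that a zero-token request yields a well-formed response with an empty text and a token count of exactly 0, rather than one stray token.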