
Fix some minor issues and provide tests for Pipeline#4365

Merged
lvhan028 merged 4 commits into InternLM:main from windreamer:pipeline_test on Feb 28, 2026

Conversation

@windreamer (Collaborator) commented Feb 24, 2026



PR Description

Motivation

This PR addresses several minor issues in the LMDeploy inference pipeline and engine to improve robustness, fix edge cases in token generation, and establish comprehensive test coverage for the pipeline functionality.

Modification

1. Fix kwargs forwarding in pipeline chat method
The chat method in the Pipeline class was not forwarding keyword arguments to stream_infer, which could cause unexpected behavior when users passed custom generation parameters. This fix ensures all kwargs are correctly propagated through the call chain.
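The shape of the fix can be sketched as follows. This is a minimal stand-in, not the actual LMDeploy `Pipeline` implementation; the stub `stream_infer` simply echoes what it receives so the kwargs propagation is visible.

```python
# Minimal sketch of the kwargs-forwarding fix (hypothetical stand-in classes,
# not the real lmdeploy.Pipeline). Before the fix, chat() silently dropped
# **kwargs instead of passing them on to stream_infer().

class Pipeline:

    def stream_infer(self, prompts, gen_config=None, **kwargs):
        # Stand-in for the real streaming entry point: echo the inputs back
        # so we can verify that kwargs arrived intact.
        yield {'prompts': prompts, 'gen_config': gen_config, 'kwargs': kwargs}

    def chat(self, prompt, gen_config=None, **kwargs):
        # The fix: forward **kwargs instead of discarding them.
        return list(self.stream_infer([prompt], gen_config=gen_config, **kwargs))


pipe = Pipeline()
out = pipe.chat('hello', do_preprocess=False)
print(out[0]['kwargs'])  # {'do_preprocess': False}
```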

2. Fix off-by-one error when max_new_tokens=0
Previously, when max_new_tokens was set to 0, the engine would still generate 1 extra token due to incorrect boundary checking logic. This PR refactors the token limit validation in AsyncEngine to handle the zero-token case correctly, ensuring immediate termination without producing any output tokens when explicitly requested.
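The off-by-one pattern can be illustrated with a toy decoding loop (function names here are hypothetical stand-ins, not the AsyncEngine API): checking the budget before sampling, rather than after, is what makes `max_new_tokens=0` terminate immediately.

```python
# Sketch of the boundary fix. A buggy variant tested the limit only after
# appending a token, so max_new_tokens=0 still emitted one stray token.

def sample_token() -> int:
    return 42  # placeholder for one real decoding step


def generate(max_new_tokens: int) -> list:
    tokens = []
    # Check the budget *before* sampling: with max_new_tokens == 0 the loop
    # body never runs and the output is empty.
    while len(tokens) < max_new_tokens:
        tokens.append(sample_token())
    return tokens


print(generate(0))       # []
print(len(generate(3)))  # 3
```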

3. Provide default engine outputs for empty generators
Added defensive initialization of EngineOutput with INTERNAL_ENGINE_ERROR status as a fallback when async_stream_infer yields an empty generator. This prevents potential unbound variable errors and provides clearer error signaling in edge cases where the engine fails to produce outputs.
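The unbound-variable hazard and the defensive default can be shown in a self-contained sketch; the `ResponseType` and `EngineOutput` types below are illustrative stand-ins, not the actual lmdeploy definitions.

```python
# Sketch of the defensive default: if the engine's async generator yields
# nothing, the loop variable would otherwise be unbound after the loop.
import asyncio
from dataclasses import dataclass, field
from enum import Enum


class ResponseType(Enum):
    SUCCESS = 0
    INTERNAL_ENGINE_ERROR = 1


@dataclass
class EngineOutput:
    status: ResponseType
    token_ids: list = field(default_factory=list)


async def consume(stream):
    # Default assigned up front, so an empty generator still leaves a
    # well-formed (error-status) output to return.
    outputs = EngineOutput(ResponseType.INTERNAL_ENGINE_ERROR, [])
    async for outputs in stream:
        pass
    return outputs


async def empty_stream():
    if False:
        yield  # async generator that yields nothing


result = asyncio.run(consume(empty_stream()))
print(result.status)  # ResponseType.INTERNAL_ENGINE_ERROR
```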

4. Add comprehensive pipeline test suite
Introduced a new test file tests/test_lmdeploy/test_pipeline.py with extensive coverage of:

  • Single and batch inference across both PyTorch and Turbomind backends
  • OpenAI-style message format handling
  • Streaming and non-streaming generation modes
  • Multi-turn conversation with session management
  • Perplexity calculation (get_ppl)
  • Edge cases including max_new_tokens=0 and varying generation configurations

The test suite uses pytest parametrization to ensure consistent behavior across both backend implementations.
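The backend parametrization pattern looks roughly like this; the test body and helper below are illustrative stand-ins, not copied from tests/test_lmdeploy/test_pipeline.py, which builds real pipelines instead.

```python
# Illustrative shape of backend-parametrized pipeline tests.
import pytest


@pytest.mark.parametrize('backend', ['pytorch', 'turbomind'])
def test_max_new_tokens_zero(backend):
    # Stand-in check: a zero-token request should produce no output text on
    # either backend. fake_generate is a placeholder for a real pipeline call.
    def fake_generate(max_new_tokens):
        return '' if max_new_tokens == 0 else 'text'

    assert fake_generate(0) == ''
```

Under pytest, the decorator expands this into one test case per backend name, so both implementations are exercised by the same assertions.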

@windreamer force-pushed the pipeline_test branch 5 times, most recently from 91d7db2 to fb67ef8 on February 27, 2026 08:18
@windreamer changed the title from "Add tests for Pipeline" to "Fix some minor issue and provide tests for Pipeline" on Feb 27, 2026
@windreamer marked this pull request as ready for review on February 27, 2026 09:03
Copilot AI review requested due to automatic review settings on February 27, 2026 09:03
@windreamer changed the title from "Fix some minor issue and provide tests for Pipeline" to "Fix some minor issues and provide tests for Pipeline" on Feb 27, 2026
Copilot AI left a comment

Pull request overview

This PR improves robustness and API consistency in LMDeploy’s inference pipeline/engine, and adds an end-to-end test suite intended to cover key Pipeline behaviors and edge cases (notably max_new_tokens=0).

Changes:

  • Update AsyncEngine.generate() to early-exit when max_new_tokens resolves to 0, and add a default EngineOutput fallback when the engine yields no outputs.
  • Forward **kwargs from Pipeline.chat() into stream_infer() for better parameter propagation.
  • Add a new tests/test_lmdeploy/test_pipeline.py integration test suite for Pipeline infer/stream/chat/session/ppl behaviors, including max_new_tokens=0.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| tests/test_lmdeploy/test_pipeline.py | Adds Pipeline integration tests across backends, streaming/chat flows, and the max_new_tokens=0 edge case. |
| lmdeploy/serve/core/async_engine.py | Refactors gen-config determination/early-exit and initializes a default EngineOutput to handle empty engine generators. |
| lmdeploy/pipeline.py | Fixes Pipeline.chat() kwarg propagation to stream_infer(). |


Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.



Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.



```python
gen_config = self._determine_gen_config(session, input_ids, gen_config=gen_config)

if gen_config.max_new_tokens == 0:
    logger.error(f'run out of tokens. session={session_id}.')
```
Copilot AI commented on Feb 27, 2026:

Logging "run out of tokens" at error level is misleading when the user explicitly sets max_new_tokens=0 (a valid request). Consider lowering the level (info/debug) and/or changing the message to reflect an intentional zero-token generation.

Suggested change:

```diff
-logger.error(f'run out of tokens. session={session_id}.')
+logger.info(f'no tokens requested (max_new_tokens=0). session={session_id}.')
```

```python
req_stats = RequestStats(prompt_tokens=input_len)  # per-request stats

# Default output, in case async_stream_infer of the Engine yields an empty generator.
outputs = EngineOutput(ResponseType.INTERNAL_ENGINE_ERROR, [])
```
A collaborator commented:
Could you let us know which case yields an empty generator?

windreamer (Author) replied:

```python
except (GeneratorExit, asyncio.CancelledError) as e:
    logger.info(f'[async_stream_infer] {type(e).__name__}')
    self.model_inst.cancel()
```

If we cancel a request before the generation of the first token, I think we will trigger this case.
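That scenario can be reproduced with a plain-asyncio sketch, independent of LMDeploy: cancelling the consumer before the producer emits its first item leaves the consumer with nothing, which is exactly the situation the default EngineOutput covers.

```python
# Sketch of "cancel before the first token": the consumer is cancelled while
# the producer is still waiting, so the received list stays empty.
import asyncio


async def token_stream(first_token_delay: float):
    await asyncio.sleep(first_token_delay)  # simulates a long prefill
    yield 1


async def consume_with_cancel():
    received = []

    async def consumer():
        async for tok in token_stream(first_token_delay=10.0):
            received.append(tok)

    task = asyncio.create_task(consumer())
    await asyncio.sleep(0.01)  # cancel well before the first token arrives
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return received


print(asyncio.run(consume_with_cancel()))  # []
```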

@lvhan028 merged commit e5cd040 into InternLM:main on Feb 28, 2026
5 checks passed