[None][fix] Hide trailing EOS from generated text#14336
Conversation
📝 WalkthroughWalkthroughThis PR modifies EOS token handling in TensorRT-LLM's generation result processing. It documents that ChangesEOS Token Handling in Generation Results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
b4b6ed7 to
51b9c06
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #49351 [ run ] triggered by Bot. Commit: |
Superjomn
left a comment
There was a problem hiding this comment.
Please add some unittests for it, or it tends to be broken in the following iterations.
|
PR_Github #49351 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #49567 [ run ] triggered by Bot. Commit: |
|
PR_Github #49567 [ run ] completed with state
|
Summary
When a request stopped on the natural EOS (
FinishReason.END_ID), TRT-LLM previously relied solely onskip_special_tokens=Trueto keep the EOS string out ofoutput.text. Whenever a caller setskip_special_tokens=False(e.g. some tool / structured-output parsers that need raw special tokens), or whenever the tokenizer did not flag the EOS id as a special token, the EOS string leaked into the OpenAI response text. vLLM does not exhibit this because it always slices the trailing stop token from the detokenized text. Observed on Nemotron-Super-V3 (EOS =<|im_end|>, id 11).Before vs After
Repro (programmatic; the example file is untouched):
Before fix —
<|im_end|>leaks intooutput.text:After fix —
output.textis clean,output.token_idsis unchanged:Invariants preserved:
output.token_ids[-1] == sampling_params.end_idwhenever the request ended on natural EOS — useful for distinguishing "ended on EOS" (stop_reason is None+ tail id matchesend_id) from "ended on a configured stop_token_id" (stop_reasonset). Matches vLLM.len(output.token_ids) == len(output.logprobs)for callers that requested logprobs.Default-path control (the unchanged case where the EOS happens to be a tokenizer-marked special token AND
skip_special_tokens=True):— i.e. no regression on the previously-working default path; the fix only changes behavior on the path that was leaking.
Files changed
tensorrt_llm/executor/result.pyonly (+19 / −3 lines):GenerationResultBase._handle_sequenceEND_ID branch: comment only (no behavior change here).DetokenizedGenerationResultBase._handle_response: before eachtokenizer.decode/decode_incrementallycall, exclude the trailingend_idfrom the slice when:finish_reason == 'stop'stop_reason is Nonenot include_stop_str_in_outputtoken_idsortoken_ids_diff) ends withend_idoutput.token_idsis left intact.The pre-existing
STOP_WORDSbranch already strips its matched ids fromoutput.token_idsupstream, so the new slice is a no-op for that path (itsstop_reasonis non-None, which gates the check off).Test plan
skip_special_tokens=True(default) andskip_special_tokens=False. Before fix: EOS leaks into text underFalse. After fix: no leak under either setting;output.token_idsends withend_idin both cases.textandtoken_ids.test_generate_with_stop_words(tests/unittest/llmapi/test_llm.py:835) covers theend_id=stop_idpath withsimilar(..., threshold=0.8). The fix only tightens the match (text no longer contains the trailing end_id glyph).Summary by CodeRabbit
Bug Fixes
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.