
[None][fix] Avoid pre-window SWA over-allocation on disagg gen server #13845

Merged
Shixiaowei02 merged 1 commit into NVIDIA:feat/deepseek_v4 from Shixiaowei02:feat/deepseek_v4_mem on May 7, 2026


@Shixiaowei02 (Collaborator) commented May 7, 2026

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@Shixiaowei02 Shixiaowei02 requested a review from chuangz0 May 7, 2026 08:57
@Shixiaowei02 Shixiaowei02 marked this pull request as ready for review May 7, 2026 09:33
@Shixiaowei02 Shixiaowei02 requested review from a team as code owners May 7, 2026 09:33
@Shixiaowei02 Shixiaowei02 requested review from joyang-nv and removed request for a team May 7, 2026 09:33
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
@Shixiaowei02 Shixiaowei02 force-pushed the feat/deepseek_v4_mem branch from 9a3f789 to a6a74b3 Compare May 7, 2026 09:33
@Shixiaowei02 (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #47183 [ run ] triggered by Bot. Commit: a6a74b3 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #47183 [ run ] completed with state SUCCESS. Commit: a6a74b3
/LLM/main/L0_MergeRequest_PR pipeline #37139 completed with status: 'SUCCESS'

CI Report

Link to invocation

@Shixiaowei02 Shixiaowei02 merged commit 67b0b17 into NVIDIA:feat/deepseek_v4 May 7, 2026
6 of 7 checks passed
@Shixiaowei02 Shixiaowei02 deleted the feat/deepseek_v4_mem branch May 7, 2026 11:06
lfr-0531 pushed a commit that referenced this pull request May 7, 2026
…#13845)

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
lfr-0531 pushed a commit that referenced this pull request May 14, 2026
…#13845)

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Shixiaowei02 added a commit to Shixiaowei02/TensorRT-LLM that referenced this pull request May 27, 2026
For a disagg gen init request entering the V2 scheduler,
context_current_position is still 0 — it is only advanced to
prompt_len in _prepare_disagg_gen_transmission_complete after KV
transfer completes. Passing history_length=context_current_position
degenerates to history_length=0, the SWA stale range collapses to an
empty interval, and pre-window blocks are still allocated before KV
transfer — defeating the over-allocation fix from NVIDIA#13845/NVIDIA#14377.

Use req.prompt_len, which is known statically at scheduler time and
correctly conveys that the entire prompt is history.

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
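The failure mode described in the commit message above can be illustrated with a small sketch. This is not the TensorRT-LLM scheduler code: `HypotheticalRequest`, `swa_stale_block_range`, and `blocks_to_allocate` are hypothetical names, and the rule that a block is stale only when every token in it precedes `history_length - window_size` is a simplifying assumption chosen for illustration. It shows how `history_length=0` collapses the stale range to an empty interval, while `history_length=prompt_len` lets the pre-window blocks be skipped.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalRequest:
    """Stand-in for a disagg gen request (hypothetical, not the real TRT-LLM class)."""
    prompt_len: int
    # Assumed to stay 0 until KV transfer completes, per the commit message.
    context_current_position: int = 0


def swa_stale_block_range(history_length: int, window_size: int,
                          tokens_per_block: int) -> range:
    """Hypothetical helper: indices of blocks lying entirely before the sliding window."""
    # Earliest token position assumed to remain visible to the next query token.
    first_visible = max(0, history_length - window_size)
    # A block is stale only if all of its tokens precede first_visible.
    num_stale = first_visible // tokens_per_block
    return range(0, num_stale)


def blocks_to_allocate(prompt_len: int, history_length: int, window_size: int,
                       tokens_per_block: int) -> int:
    """Hypothetical helper: blocks for the prompt minus the stale pre-window blocks."""
    total_blocks = -(-prompt_len // tokens_per_block)  # ceiling division
    stale = swa_stale_block_range(history_length, window_size, tokens_per_block)
    return total_blocks - len(stale)


if __name__ == "__main__":
    req = HypotheticalRequest(prompt_len=10_000)
    window, block = 4096, 64
    # Before the fix: history_length = context_current_position, which is still 0.
    print(blocks_to_allocate(req.prompt_len, req.context_current_position, window, block))  # 157
    # After the fix: history_length = prompt_len, known at scheduler time.
    print(blocks_to_allocate(req.prompt_len, req.prompt_len, window, block))  # 65
```

With these illustrative numbers (a 10,000-token prompt, a 4,096-token window, and 64-token blocks), passing 0 allocates all 157 prompt blocks before KV transfer, whereas passing `prompt_len` skips the 92 blocks that fall entirely before the window.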


3 participants