[TRTLLM-12026][feat] Support MTP with block reuse enabled for hybrid models #12896

Merged
VALLIS-NERIA merged 27 commits into NVIDIA:main from VALLIS-NERIA:user/xiweny/mamba_cache_reuse on May 10, 2026

@VALLIS-NERIA
Collaborator

VALLIS-NERIA commented Apr 9, 2026

Summary

  • Extend KVCacheManager C++ layer with mutable cache block ID accessors and logging for linear cache memory budget calculation
  • Refactor MambaCacheManager to support block-level cache operations (add/remove/reuse) instead of flat tensor management
  • Update scheduler to handle linear attention cache reuse alongside KV cache reuse
  • Wire through enable_cache_reuse flag for mamba cache in executor and resource manager
  • Add integration test for mamba2 hybrid model cache reuse
  • Add unit tests for KV cache manager with linear attention metadata
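The block-level refactor of MambaCacheManager matters because a recurrent (Mamba/SSM) state is not a per-token tensor: the state after token t summarizes the entire prefix, so a prefix can only be reused from a snapshot taken at a block boundary and matched on the full prefix. The following is a minimal sketch of that idea under stated assumptions: `PrefixStateCache`, `BLOCK_SIZE`, and the scalar `step` recurrence are hypothetical stand-ins, not TensorRT-LLM's MambaCacheManager API or the real Mamba2 state update.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical number of tokens per cache block


def step(state: float, token: int) -> float:
    """Toy linear recurrence standing in for a Mamba/SSM state update (assumption)."""
    return 0.5 * state + token


class PrefixStateCache:
    """Hypothetical block-level cache of recurrent-state snapshots.

    Snapshots are stored only at block boundaries and keyed by the whole
    token prefix up to that boundary, because a recurrent state depends on
    every earlier token (unlike per-token KV cache entries).
    """

    def __init__(self) -> None:
        self._snapshots: Dict[Tuple[int, ...], float] = {}

    def lookup(self, tokens: List[int]) -> Tuple[int, float]:
        """Return (reused token count, state) for the longest cached block-aligned prefix."""
        reused, state = 0, 0.0
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            snap = self._snapshots.get(tuple(tokens[:end]))
            if snap is None:
                break  # later blocks cannot match if an earlier one misses
            reused, state = end, snap
        return reused, state

    def run(self, tokens: List[int]) -> Tuple[float, int]:
        """Process a prompt, resuming from cached state; returns (final state, reused tokens)."""
        reused, state = self.lookup(tokens)
        for i in range(reused, len(tokens)):
            state = step(state, tokens[i])
            if (i + 1) % BLOCK_SIZE == 0:
                # Snapshot only when a block has been completely filled.
                self._snapshots[tuple(tokens[: i + 1])] = state
        return state, reused
```

A production cache would more likely key blocks by chained per-block hashes instead of full token tuples and would evict snapshots under a memory budget; both are omitted here for brevity.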

Test plan

  • Integration test for mamba2 hybrid model with cache reuse
  • Existing KV cache reuse tests still pass

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added speculative decoding support for Mamba models with improved cache management.
  • Improvements

    • Enhanced KV cache estimation for linear-attention layers.
    • Better error reporting and validation messaging for cache block operations.
    • Optimized cache block handling for recurrent state scenarios.
  • Tests

    • Added MTP speculative decoding test configurations and block reuse validation.
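Speculative decoding with MTP complicates this for recurrent layers: draft tokens advance the state, and a rejected draft cannot be undone by truncating the sequence length the way a KV cache can. One common approach, assumed here rather than taken from this PR, is to keep the intermediate state after each draft token and select the one reached by the last accepted token. `verify_drafts` and the scalar `step` recurrence below are hypothetical illustrations.

```python
from typing import List, Tuple


def step(state: float, token: int) -> float:
    """Toy recurrence standing in for a Mamba/SSM state update (assumption)."""
    return 0.5 * state + token


def verify_drafts(state: float, draft: List[int], target: List[int]) -> Tuple[float, int]:
    """Hypothetical MTP verification for a recurrent layer.

    Advances the state over every draft token while keeping each
    intermediate state, then returns the state after the last accepted
    token and the number of accepted drafts. Rejected tokens are discarded
    by selecting an earlier saved state instead of recomputing from scratch.
    """
    intermediates = [state]
    for tok in draft:
        intermediates.append(step(intermediates[-1], tok))
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break  # the first mismatch rejects this draft and all later ones
        accepted += 1
    return intermediates[accepted], accepted
```

Under this scheme, only states reached by accepted tokens would be eligible for snapshotting into reusable blocks; states produced by rejected drafts are simply dropped.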

@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42776 [ run ] triggered by Bot. Commit: 95923ce Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42776 [ run ] completed with state FAILURE. Commit: 95923ce
/LLM/main/L0_MergeRequest_PR pipeline #33453 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42778 [ run ] triggered by Bot. Commit: 95923ce Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42778 [ run ] completed with state FAILURE. Commit: 95923ce
/LLM/main/L0_MergeRequest_PR pipeline #33455 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42779 [ run ] triggered by Bot. Commit: 95923ce Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42779 [ run ] completed with state FAILURE. Commit: 95923ce
/LLM/main/L0_MergeRequest_PR pipeline #33456 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42782 [ run ] triggered by Bot. Commit: 95923ce Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42782 [ run ] completed with state FAILURE. Commit: 95923ce
/LLM/main/L0_MergeRequest_PR pipeline #33459 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

VALLIS-NERIA changed the title from "[None][feat] Enable mamba/linear attention cache reuse in scheduler" to "[None][feat] MTP + cache reuse for hybrid linear models" on Apr 13, 2026
VALLIS-NERIA force-pushed the user/xiweny/mamba_cache_reuse branch from 95923ce to 29a1d27 on April 14, 2026 12:46
VALLIS-NERIA added the Release Blocker label on Apr 20, 2026
VALLIS-NERIA force-pushed the user/xiweny/mamba_cache_reuse branch from f3f5eba to 51ffff3 on April 20, 2026 05:47
VALLIS-NERIA changed the title from "[None][feat] MTP + cache reuse for hybrid linear models" to "[TRTLLM-12026][feat] Support MTP with block reuse enabled for hybrid models" on Apr 20, 2026
@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44336 [ run ] triggered by Bot. Commit: 51ffff3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44336 [ run ] completed with state FAILURE. Commit: 51ffff3
/LLM/main/L0_MergeRequest_PR pipeline #34754 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44352 [ run ] triggered by Bot. Commit: 51ffff3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44352 [ run ] completed with state FAILURE. Commit: 51ffff3
/LLM/main/L0_MergeRequest_PR pipeline #34770 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run

VALLIS-NERIA force-pushed the user/xiweny/mamba_cache_reuse branch from 51ffff3 to ff89391 on April 21, 2026 03:53
@VALLIS-NERIA
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44611 [ run ] triggered by Bot. Commit: 3b49991 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44611 [ run ] completed with state SUCCESS. Commit: 3b49991
/LLM/main/L0_MergeRequest_PR pipeline #34995 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

VALLIS-NERIA removed the Release Blocker label on Apr 24, 2026
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47273 [ run ] triggered by Bot. Commit: 274fd8a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47273 [ run ] completed with state SUCCESS. Commit: 274fd8a
/LLM/main/L0_MergeRequest_PR pipeline #37220 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47387 [ run ] triggered by Bot. Commit: 274fd8a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47387 [ run ] completed with state SUCCESS. Commit: 274fd8a
/LLM/main/L0_MergeRequest_PR pipeline #37318 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47454 [ run ] triggered by Bot. Commit: 274fd8a Link to invocation

VALLIS-NERIA enabled auto-merge (squash) on May 9, 2026 00:36
@tensorrt-cicd
Collaborator

PR_Github #47454 [ run ] completed with state SUCCESS. Commit: 274fd8a
/LLM/main/L0_MergeRequest_PR pipeline #37376 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47496 [ run ] triggered by Bot. Commit: 274fd8a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47496 [ run ] completed with state FAILURE. Commit: 274fd8a
/LLM/main/L0_MergeRequest_PR pipeline #37416 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47527 [ run ] triggered by Bot. Commit: 7d49440 Link to invocation

Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47568 [ run ] triggered by Bot. Commit: 752f6a8 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47568 [ run ] completed with state SUCCESS. Commit: 752f6a8
/LLM/main/L0_MergeRequest_PR pipeline #37480 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@VALLIS-NERIA
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47620 [ run ] triggered by Bot. Commit: 752f6a8 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47620 [ run ] completed with state SUCCESS. Commit: 752f6a8
/LLM/main/L0_MergeRequest_PR pipeline #37525 completed with status: 'SUCCESS'

CI Report

Link to invocation

VALLIS-NERIA merged commit 091ad7b into NVIDIA:main on May 10, 2026
6 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026
…models (NVIDIA#12896)

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>

Labels

None yet

Projects

None yet
