[https://nvbugs/6035425][fix] Fix host memory usage regression with spec dec#13130

Merged
mikeiovine merged 9 commits into NVIDIA:main from mikeiovine:fix-host-budget
Apr 30, 2026

@mikeiovine
Collaborator

@mikeiovine mikeiovine commented Apr 16, 2026

Description

A regression doubled KV cache GPU memory usage when speculative decoding (spec dec) was enabled. #12188 fixed this on the GPU side. This PR applies the same fix to the host-side memory budget, which is used when KV cache offloading is enabled.
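
To make the regression concrete, here is a minimal arithmetic sketch. The 8 GiB host budget, the one-quarter draft share, and the helper `total_host_bytes` are all hypothetical and not part of TensorRT-LLM. It also assumes that, before the fix, the target and draft KV cache managers were each configured with the full host budget.

```python
GIB = 1 << 30


def total_host_bytes(target_bytes: int, draft_bytes: int) -> int:
    """Total host memory reserved across both KV cache managers."""
    return target_bytes + draft_bytes


host_budget = 8 * GIB  # hypothetical user-configured host cache size

# Assumed pre-fix behavior: each manager receives the full budget.
duplicated = total_host_bytes(host_budget, host_budget)

# Post-fix behavior: the budget is split, so the parts sum to the original.
# The one-quarter share is illustrative; the PR derives it from per-token KV sizes.
draft_share = host_budget // 4
split = total_host_bytes(host_budget - draft_share, draft_share)

print(duplicated // GIB, split // GIB)  # prints: 16 8
```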

Test Coverage

Added new test.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

  • Tests

    • Added comprehensive test suite validating KV cache budget allocation between target and draft execution managers for both GPU and host memory configurations.
  • Improvements

    • Refactored KV cache budget splitting logic to proportionally allocate both GPU and host memory resources based on per-token requirements.

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
@mikeiovine mikeiovine requested a review from yizhang-nv April 16, 2026 16:48
@mikeiovine mikeiovine requested a review from a team as a code owner April 16, 2026 16:48
@mikeiovine mikeiovine requested a review from joyang-nv April 16, 2026 16:48
@mikeiovine
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai Bot commented Apr 16, 2026

Walkthrough

Refactored _split_kv_cache_budget_for_draft in PyExecutor to split both GPU and host KV cache budgets proportionally between target and draft managers based on per-token KV size ratios. Added comprehensive test coverage validating GPU and host budget splitting across multiple scenarios, and registered the new test in the l0_a10 test suite.
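
The proportional split described above can be sketched roughly as follows. This is not the code from `_util.py`: `BudgetConfig` and `split_budget_for_draft` are hypothetical stand-ins, and the formula `draft_ratio = draft / (target + draft)` is an assumption about how the ratio is derived from per-token KV sizes. Only the field names `max_gpu_total_bytes` and `host_cache_size` come from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetConfig:
    """Hypothetical stand-in for a KV cache config; holds only the two budgets."""
    max_gpu_total_bytes: int = 0
    host_cache_size: Optional[int] = None


def split_budget_for_draft(target: BudgetConfig,
                           target_bytes_per_token: int,
                           draft_bytes_per_token: int) -> BudgetConfig:
    """Mutate `target` in place and return a draft config with its share."""
    total_per_token = target_bytes_per_token + draft_bytes_per_token
    if total_per_token <= 0:
        raise ValueError("per-token KV sizes must sum to a positive value")
    # Assumed ratio: the draft's share of the combined per-token KV footprint.
    draft_ratio = draft_bytes_per_token / total_per_token

    draft = BudgetConfig()

    gpu_total = target.max_gpu_total_bytes
    if gpu_total > 0:
        draft.max_gpu_total_bytes = int(gpu_total * draft_ratio)
        target.max_gpu_total_bytes = gpu_total - draft.max_gpu_total_bytes

    host_total = target.host_cache_size
    if host_total:  # skip when offloading is disabled (None or 0)
        draft.host_cache_size = int(host_total * draft_ratio)
        target.host_cache_size = host_total - draft.host_cache_size

    return draft


target_cfg = BudgetConfig(max_gpu_total_bytes=1000, host_cache_size=400)
draft_cfg = split_budget_for_draft(target_cfg, 300, 100)
print(target_cfg, draft_cfg)
```

Computing the target share by subtraction, rather than by a second multiplication, keeps the two shares summing exactly to the original budget despite integer truncation.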

Changes

| Cohort / File(s) | Summary |
|---|---|
| KV Cache Budget Splitting Logic: `tensorrt_llm/_torch/pyexecutor/_util.py` | Refactored `_split_kv_cache_budget_for_draft` to compute a `draft_ratio` from per-token KV sizes and apply it proportionally to both `max_gpu_total_bytes` and `host_cache_size` budgets. Added host-side budget splitting with in-place mutation of target config and assignment to draft config, plus corresponding log message. |
| Test Suite Configuration: `tests/integration/test_lists/test-db/l0_a10.yml` | Added new test module `test_kv_cache_budget_split.py` to the `l0_a10` GPU test set for the PyTorch executor stage. |
| KV Cache Budget Split Tests: `tests/unittest/_torch/executor/test_kv_cache_budget_split.py` | New test module validating `KvCacheCreator._split_kv_cache_budget_for_draft()` with parameterized test cases covering GPU and host budget proportional splits, edge cases (zero budgets, draft equals target), regression tests (no host duplication), and consistency checks ensuring budgets sum back to originals. |
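
The kinds of checks listed for the test module (sum-back consistency, zero budgets, draft equals target) can be illustrated with a small table-driven sketch. It does not reproduce the PR's test file: `split_bytes`, `CASES`, and `check_split_invariants` are hypothetical, and the ratio formula is the same assumption as in the walkthrough, namely the draft's share of the combined per-token KV size.

```python
from typing import Tuple


def split_bytes(total: int, target_per_token: int,
                draft_per_token: int) -> Tuple[int, int]:
    """Return (target_bytes, draft_bytes) for one memory budget."""
    draft_ratio = draft_per_token / (target_per_token + draft_per_token)
    draft = int(total * draft_ratio)
    return total - draft, draft


CASES = [
    # (total_bytes, target_bytes_per_token, draft_bytes_per_token)
    (1 << 30, 4096, 1024),  # draft is smaller than target
    (1 << 30, 2048, 2048),  # draft equals target
    (0, 4096, 1024),        # zero budget
    (1000003, 7, 3),        # budget that does not divide evenly
]


def check_split_invariants() -> int:
    """Assert the invariants for every case; return the number of cases checked."""
    for total, target_size, draft_size in CASES:
        target, draft = split_bytes(total, target_size, draft_size)
        assert target + draft == total      # no duplication, nothing lost
        assert target >= 0 and draft >= 0   # shares are never negative
        if target_size == draft_size:
            assert abs(target - draft) <= 1  # equal sizes give an even split
    return len(CASES)


print(check_split_invariants())  # prints: 4
```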

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description explains the issue (regression doubled KV cache memory with spec dec), the solution (applying same fix to host-side budget), and test coverage. All required template sections are addressed. |
| Title check | ✅ Passed | The PR title '[fix] Fix host memory usage regression with spec dec' clearly describes the main change: addressing a host memory regression in speculative decoding, which aligns with the actual changes across the implementation and test files. |


@mikeiovine mikeiovine changed the title [https://nvbugs/6035425][fix] Fix host memory usage regression with spec dec [None][fix] Fix host memory usage regression with spec dec Apr 16, 2026
@mikeiovine mikeiovine changed the title [None][fix] Fix host memory usage regression with spec dec [https://nvbugs/6035425][fix] Fix host memory usage regression with spec dec Apr 16, 2026
@tensorrt-cicd
Collaborator

PR_Github #43814 [ run ] triggered by Bot. Commit: ce1c9c2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43814 [ run ] completed with state FAILURE. Commit: ce1c9c2
/LLM/main/L0_MergeRequest_PR pipeline #34290 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@mikeiovine
Collaborator Author

/bot run

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44488 [ run ] triggered by Bot. Commit: aadee37 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44488 [ run ] completed with state SUCCESS. Commit: aadee37
/LLM/main/L0_MergeRequest_PR pipeline #34888 completed with status: 'FAILURE'

CI Report

Link to invocation

@mikeiovine
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44513 [ run ] triggered by Bot. Commit: aadee37 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44513 [ run ] completed with state SUCCESS. Commit: aadee37
/LLM/main/L0_MergeRequest_PR pipeline #34912 completed with status: 'FAILURE'

CI Report

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44743 [ run ] triggered by Bot. Commit: c545b1b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44743 [ run ] completed with state SUCCESS. Commit: c545b1b
/LLM/main/L0_MergeRequest_PR pipeline #35104 completed with status: 'FAILURE'

CI Report

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44981 [ run ] triggered by Bot. Commit: d1a90d4 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44981 [ run ] completed with state SUCCESS. Commit: d1a90d4
/LLM/main/L0_MergeRequest_PR pipeline #35304 completed with status: 'FAILURE'

CI Report

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45200 [ run ] triggered by Bot. Commit: 1258bff Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45230 [ run ] triggered by Bot. Commit: 1258bff Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45230 [ run ] completed with state SUCCESS. Commit: 1258bff
/LLM/main/L0_MergeRequest_PR pipeline #35491 completed with status: 'FAILURE'

CI Report

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45742 [ run ] triggered by Bot. Commit: 916ade7 Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #45786 [ run ] triggered by Bot. Commit: 1acd5f3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45786 [ run ] completed with state FAILURE. Commit: 1acd5f3
/LLM/main/L0_MergeRequest_PR pipeline #35976 completed with status: 'FAILURE'

CI Report

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46157 [ run ] triggered by Bot. Commit: f2713dd Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46200 [ run ] triggered by Bot. Commit: 61a8c76 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46200 [ run ] completed with state SUCCESS. Commit: 61a8c76
/LLM/main/L0_MergeRequest_PR pipeline #36313 completed with status: 'FAILURE'

CI Report

CI Agent Failure Analysis

Link to invocation

@mikeiovine
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46399 [ run ] triggered by Bot. Commit: 61a8c76 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46399 [ run ] completed with state SUCCESS. Commit: 61a8c76
/LLM/main/L0_MergeRequest_PR pipeline #36478 completed with status: 'SUCCESS'

CI Report

Link to invocation

@mikeiovine mikeiovine merged commit b857ee8 into NVIDIA:main Apr 30, 2026
6 checks passed
@mikeiovine mikeiovine deleted the fix-host-budget branch April 30, 2026 17:40
3 participants