[https://nvbugs/6035425][fix] Fix host memory usage regression with spec dec by mikeiovine · Pull Request #13130 · NVIDIA/TensorRT-LLM

mikeiovine · 2026-04-16T16:48:12Z

Description

A regression doubled KV cache GPU memory usage when speculative decoding (spec dec) was enabled. That was fixed on the GPU side by #12188. This PR applies the same fix to the host-side memory budget, which is used when KV cache offloading is enabled.

Test Coverage

Added new test.

Summary by CodeRabbit

Tests
- Added comprehensive test suite validating KV cache budget allocation between target and draft execution managers for both GPU and host memory configurations.

Improvements
- Refactored KV cache budget splitting logic to proportionally allocate both GPU and host memory resources based on per-token requirements.

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

mikeiovine · 2026-04-16T16:48:23Z

/bot run

coderabbitai · 2026-04-16T16:52:35Z

📝 Walkthrough

Refactored _split_kv_cache_budget_for_draft in PyExecutor to split both GPU and host KV cache budgets proportionally between target and draft managers based on per-token KV size ratios. Added comprehensive test coverage validating GPU and host budget splitting across multiple scenarios, and registered the new test in the l0_a10 test suite.
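The proportional split described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code from `_util.py`: `KvBudget`, `split_budget_for_draft`, and the per-token size parameters are hypothetical names, and taking the draft share as `draft / (target + draft)` is an assumption about how `draft_ratio` is derived. Only the field names `max_gpu_total_bytes` and `host_cache_size` come from the CodeRabbit summary.

```python
from dataclasses import dataclass


@dataclass
class KvBudget:
    """Hypothetical stand-in for the KV cache config (budget fields only)."""
    max_gpu_total_bytes: int
    host_cache_size: int


def split_budget_for_draft(target: KvBudget, target_bytes_per_token: int,
                           draft_bytes_per_token: int) -> KvBudget:
    """Shrink `target` in place and return the draft manager's share.

    Assumption: the draft share of each budget is
    draft_bytes_per_token / (target_bytes_per_token + draft_bytes_per_token).
    """
    total_per_token = target_bytes_per_token + draft_bytes_per_token
    if total_per_token <= 0:
        raise ValueError("per-token KV sizes must sum to a positive value")

    def draft_share(total_bytes: int) -> int:
        # Integer math keeps the two shares summing exactly to the original.
        return total_bytes * draft_bytes_per_token // total_per_token

    draft = KvBudget(
        max_gpu_total_bytes=draft_share(target.max_gpu_total_bytes),
        host_cache_size=draft_share(target.host_cache_size),
    )
    # The target keeps the remainder, so neither budget is counted twice.
    target.max_gpu_total_bytes -= draft.max_gpu_total_bytes
    target.host_cache_size -= draft.host_cache_size
    return draft


# Made-up numbers: 8 GiB GPU budget, 4 GiB host budget, draft KV is 1/4 of the total.
target = KvBudget(max_gpu_total_bytes=8 << 30, host_cache_size=4 << 30)
draft = split_budget_for_draft(target, target_bytes_per_token=300,
                               draft_bytes_per_token=100)
print(target, draft)
```

With these illustrative inputs the draft manager receives a quarter of each budget and the target keeps the rest, so the combined host allocation stays at the configured 4 GiB rather than doubling.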

Changes

| Cohort / File(s) | Summary |
|---|---|
| KV Cache Budget Splitting Logic `tensorrt_llm/_torch/pyexecutor/_util.py` | Refactored `_split_kv_cache_budget_for_draft` to compute a `draft_ratio` from per-token KV sizes and apply it proportionally to both `max_gpu_total_bytes` and `host_cache_size` budgets. Added host-side budget splitting with in-place mutation of target config and assignment to draft config, plus corresponding log message. |
| Test Suite Configuration `tests/integration/test_lists/test-db/l0_a10.yml` | Added new test module `test_kv_cache_budget_split.py` to the `l0_a10` GPU test set for the PyTorch executor stage. |
| KV Cache Budget Split Tests `tests/unittest/_torch/executor/test_kv_cache_budget_split.py` | New test module validating `KvCacheCreator._split_kv_cache_budget_for_draft()` with parameterized test cases covering GPU and host budget proportional splits, edge cases (zero budgets, draft equals target), regression tests (no host duplication), and consistency checks ensuring budgets sum back to originals. |
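The kinds of checks listed for the new test module (proportional split, zero budgets, draft equals target, no duplication, shares summing back to the original) might look roughly like the sketch below. It does not reproduce `test_kv_cache_budget_split.py`: `split_bytes`, `check_case`, and all case values are hypothetical, and the split formula is an assumption. The sketch uses plain assertions so it runs without a test framework.

```python
def split_bytes(total: int, target_per_token: int,
                draft_per_token: int) -> tuple[int, int]:
    """Hypothetical helper: return (target_bytes, draft_bytes) for one budget."""
    draft = total * draft_per_token // (target_per_token + draft_per_token)
    return total - draft, draft


# (total_bytes, target_bytes_per_token, draft_bytes_per_token); values are made up.
CASES = [
    (8 << 30, 320, 64),   # draft model much smaller than target
    (4 << 30, 128, 128),  # draft equals target
    (0, 320, 64),         # zero budget
    (1000003, 7, 3),      # total not divisible by the per-token sum
]


def check_case(total: int, target_per_token: int,
               draft_per_token: int) -> tuple[int, int]:
    target, draft = split_bytes(total, target_per_token, draft_per_token)
    # Consistency: the two shares sum back to the original budget.
    assert target + draft == total
    if total > 0:
        # Regression guard: neither manager receives the full budget.
        assert 0 < draft < total and 0 < target < total
    else:
        assert (target, draft) == (0, 0)
    # Proportionality: the draft share is within one byte of its exact fraction.
    exact = total * draft_per_token / (target_per_token + draft_per_token)
    assert abs(draft - exact) < 1
    return target, draft


results = [check_case(*case) for case in CASES]
print(results)
```

The same checks would apply unchanged to the GPU budget and the host budget, since under the assumed formula both are split with one ratio.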

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description explains the issue (regression doubled KV cache memory with spec dec), the solution (applying same fix to host-side budget), and test coverage. All required template sections are addressed. |
| Title check | ✅ Passed | The PR title '[fix] Fix host memory usage regression with spec dec' clearly describes the main change: addressing a host memory regression in speculative decoding, which aligns with the actual changes across the implementation and test files. |

tensorrt-cicd · 2026-04-16T17:08:29Z

PR_Github #43814 [ run ] triggered by Bot. Commit: ce1c9c2 Link to invocation

tensorrt-cicd · 2026-04-17T01:22:55Z

PR_Github #43814 [ run ] completed with state FAILURE. Commit: ce1c9c2
/LLM/main/L0_MergeRequest_PR pipeline #34290 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

mikeiovine · 2026-04-17T15:35:15Z

/bot run

mikeiovine · 2026-04-20T17:41:31Z

/bot run

tensorrt-cicd · 2026-04-20T17:47:27Z

PR_Github #44488 [ run ] triggered by Bot. Commit: aadee37 Link to invocation

tensorrt-cicd · 2026-04-20T20:37:02Z

PR_Github #44488 [ run ] completed with state SUCCESS. Commit: aadee37
/LLM/main/L0_MergeRequest_PR pipeline #34888 completed with status: 'FAILURE'

mikeiovine · 2026-04-20T20:47:44Z

/bot run

tensorrt-cicd · 2026-04-20T20:53:18Z

PR_Github #44513 [ run ] triggered by Bot. Commit: aadee37 Link to invocation

tensorrt-cicd · 2026-04-20T21:45:15Z

PR_Github #44513 [ run ] completed with state SUCCESS. Commit: aadee37
/LLM/main/L0_MergeRequest_PR pipeline #34912 completed with status: 'FAILURE'

mikeiovine · 2026-04-21T14:50:58Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-21T14:58:25Z

PR_Github #44743 [ run ] triggered by Bot. Commit: c545b1b Link to invocation

tensorrt-cicd · 2026-04-22T09:52:51Z

PR_Github #44743 [ run ] completed with state SUCCESS. Commit: c545b1b
/LLM/main/L0_MergeRequest_PR pipeline #35104 completed with status: 'FAILURE'

mikeiovine · 2026-04-22T15:47:58Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-22T15:54:02Z

PR_Github #44981 [ run ] triggered by Bot. Commit: d1a90d4 Link to invocation

tensorrt-cicd · 2026-04-23T11:13:00Z

PR_Github #44981 [ run ] completed with state SUCCESS. Commit: d1a90d4
/LLM/main/L0_MergeRequest_PR pipeline #35304 completed with status: 'FAILURE'

mikeiovine · 2026-04-23T14:50:59Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-23T14:58:31Z

PR_Github #45200 [ run ] triggered by Bot. Commit: 1258bff Link to invocation

mikeiovine · 2026-04-23T17:10:07Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-23T17:17:22Z

PR_Github #45230 [ run ] triggered by Bot. Commit: 1258bff Link to invocation

tensorrt-cicd · 2026-04-24T09:14:49Z

PR_Github #45230 [ run ] completed with state SUCCESS. Commit: 1258bff
/LLM/main/L0_MergeRequest_PR pipeline #35491 completed with status: 'FAILURE'

mikeiovine · 2026-04-27T14:37:19Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-27T14:43:33Z

PR_Github #45742 [ run ] triggered by Bot. Commit: 916ade7 Link to invocation

mikeiovine · 2026-04-27T20:46:50Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-27T20:52:54Z

PR_Github #45786 [ run ] triggered by Bot. Commit: 1acd5f3 Link to invocation

tensorrt-cicd · 2026-04-28T20:41:58Z

PR_Github #45786 [ run ] completed with state FAILURE. Commit: 1acd5f3
/LLM/main/L0_MergeRequest_PR pipeline #35976 completed with status: 'FAILURE'

mikeiovine · 2026-04-29T15:13:35Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-29T15:21:31Z

PR_Github #46157 [ run ] triggered by Bot. Commit: f2713dd Link to invocation

mikeiovine · 2026-04-29T17:48:20Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-29T17:54:35Z

PR_Github #46200 [ run ] triggered by Bot. Commit: 61a8c76 Link to invocation

tensorrt-cicd · 2026-04-30T05:38:14Z

PR_Github #46200 [ run ] completed with state SUCCESS. Commit: 61a8c76
/LLM/main/L0_MergeRequest_PR pipeline #36313 completed with status: 'FAILURE'

mikeiovine · 2026-04-30T15:26:53Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-30T15:34:43Z

PR_Github #46399 [ run ] triggered by Bot. Commit: 61a8c76 Link to invocation

tensorrt-cicd · 2026-04-30T16:34:24Z

PR_Github #46399 [ run ] completed with state SUCCESS. Commit: 61a8c76
/LLM/main/L0_MergeRequest_PR pipeline #36478 completed with status: 'SUCCESS'

Fix host budget doubling

ce1c9c2

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

mikeiovine requested a review from yizhang-nv April 16, 2026 16:48

mikeiovine requested a review from a team as a code owner April 16, 2026 16:48

mikeiovine requested a review from joyang-nv April 16, 2026 16:48

github-actions Bot assigned mikeiovine Apr 16, 2026

mikeiovine changed the title ~~[https://nvbugs/6035425][fix] Fix host memory usage regression with spec dec~~ [None][fix] Fix host memory usage regression with spec dec Apr 16, 2026

mikeiovine changed the title ~~[None][fix] Fix host memory usage regression with spec dec~~ [https://nvbugs/6035425][fix] Fix host memory usage regression with spec dec Apr 16, 2026

Merge branch 'main' into fix-host-budget

aadee37

Merge branch 'main' into fix-host-budget

c545b1b

Merge branch 'main' into fix-host-budget

d1a90d4

Merge branch 'main' into fix-host-budget

1258bff

Merge branch 'main' into fix-host-budget

916ade7

Merge branch 'main' into fix-host-budget

1acd5f3

Merge branch 'main' into fix-host-budget

f2713dd

Merge branch 'main' into fix-host-budget

61a8c76

Tabrizian approved these changes Apr 30, 2026

mikeiovine merged commit b857ee8 into NVIDIA:main Apr 30, 2026
6 checks passed

mikeiovine deleted the fix-host-budget branch April 30, 2026 17:40
