[https://nvbugs/6025177][fix] Fix KV cache issue by thorjohnsen · Pull Request #12673 · NVIDIA/TensorRT-LLM

thorjohnsen · 2026-04-02T00:25:29Z

@coderabbitai summary

Description

Fixes an issue that caused KV cache to become corrupted.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

coderabbitai · 2026-04-02T00:29:41Z

📝 Walkthrough

The changes introduce helper functions to compute token counts for KV-cache reuse by factoring in the request's materialized context position. Three block management operations are updated to use these computed counts instead of a fixed formula when chunking tokens into blocks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| KV Cache Token Counting Logic `cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp` | Added `getMaterializedUniqueTokenCountForReuse()` and `getUsableUniqueTokenCountForReuse()` helper functions. Updated three call sites in `BlockManager::storeContextBlocks`, `WindowBlockManager::storeBlocksForReuse`, and `WindowBlockManager::releaseBlocks` to use computed usable token counts for `chopVectorIntoBlocks()` operations. |
| KV Cache Manager Tests `cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` | Updated existing test cases to explicitly set `contextCurrentPosition` to prompt length before sequence removal and block operations. Added two new test cases (`KVCacheManagerStoreContextBlocksUsesMaterializedContextExtent`, `KVCacheManagerReleaseBlocksUsesMaterializedContextExtent`) to validate materialized context extent behavior during context block storage and release. |
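
To make the walkthrough concrete, here is a minimal C++ sketch of the idea, not the code from this PR. It assumes that the "materialized" extent is the request's context position capped at the number of known tokens, and that only full blocks are stored for reuse. `RequestSketch`, `materializedTokenCount`, and `chopIntoFullBlocks` are hypothetical names; the real helpers (`getMaterializedUniqueTokenCountForReuse()`, `getUsableUniqueTokenCountForReuse()`) and `chopVectorIntoBlocks()` may have different signatures and semantics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the request state; not the real LlmRequest class.
struct RequestSketch
{
    std::vector<int> uniqueTokens;      // every token known for the sequence
    std::size_t contextCurrentPosition; // context tokens prefilled so far
};

// Assumed semantics: only tokens up to the context position have KV entries
// that were actually written, so the count is capped at that position.
inline std::size_t materializedTokenCount(RequestSketch const& req)
{
    return std::min(req.uniqueTokens.size(), req.contextCurrentPosition);
}

// Split the first usableCount tokens into full blocks of tokensPerBlock tokens.
// A trailing partial block is dropped (an assumption made for this sketch).
inline std::vector<std::vector<int>> chopIntoFullBlocks(
    std::vector<int> const& tokens, std::size_t usableCount, std::size_t tokensPerBlock)
{
    assert(tokensPerBlock > 0);
    std::size_t const limit = std::min(usableCount, tokens.size());
    std::vector<std::vector<int>> blocks;
    for (std::size_t begin = 0; begin + tokensPerBlock <= limit; begin += tokensPerBlock)
    {
        auto const first = tokens.begin() + static_cast<std::ptrdiff_t>(begin);
        auto const last = first + static_cast<std::ptrdiff_t>(tokensPerBlock);
        blocks.emplace_back(first, last);
    }
    return blocks;
}
```

With nine known tokens, a context position of 5, and 4 tokens per block, this sketch would store a single block holding the first four tokens, rather than the two blocks that chopping the full token list would give.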

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description lacks essential details explaining the KV cache corruption issue, root cause, and solution approach. | Add a detailed description explaining: (1) what KV cache corruption issue occurred, (2) why it happened, (3) how the solution fixes it, and (4) clarify the 'Test Coverage' section with specific test names. |
| Title check | ❓ Inconclusive | The title 'Fix KV cache issue' is vague and overly broad. While it references KV cache (which is in the changeset), it uses the generic term 'issue' without conveying the specific problem being fixed (cross contamination). The PR objectives reveal the actual issue is 'KV cache cross contamination,' making the title under-specified. | Revise the title to be more specific about the actual problem, such as 'Fix KV cache cross contamination' or reference the specific root cause being addressed, rather than using generic language like 'issue'. |

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)
2-2: ⚠️ Potential issue | 🟠 Major

Update the NVIDIA copyright year for this modified file.

This file was modified in this PR, but the header still ends at 2025.

As per coding guidelines, "Add NVIDIA copyright header to ALL new files; update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` at line 2, This
file's NVIDIA copyright header in kvCacheManagerTest.cpp still reads
"2023-2025"; update the header to include the current year by changing the end
year to 2026 (e.g., "2023-2026") wherever that SPDX copyright line appears so
the modified file's header is up to date.

🧹 Nitpick comments (1)

cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)

2830-2831: Rename new test constants to guideline-compliant constant style.

`kMaterializedContextLength` and `kReusableContextLength` should follow the uppercase snake case style after the `k` prefix.

Proposed naming update

-    auto constexpr kMaterializedContextLength = 5;
-    auto constexpr kReusableContextLength = 4;
+    auto constexpr kMATERIALIZED_CONTEXT_LENGTH = 5;
+    auto constexpr kREUSABLE_CONTEXT_LENGTH = 4;
...
-    llmRequest0->setContextCurrentPosition(kMaterializedContextLength);
+    llmRequest0->setContextCurrentPosition(kMATERIALIZED_CONTEXT_LENGTH);
...
-    EXPECT_EQ(llmRequest1->getContextCurrentPosition(), kReusableContextLength);
+    EXPECT_EQ(llmRequest1->getContextCurrentPosition(), kREUSABLE_CONTEXT_LENGTH);

As per coding guidelines, "C++ constants should use uppercase snakecase with prefix 'k': int const kDIGIT_NUM = 10;".

Also applies to: 2875-2876

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` around lines 2830
- 2831, Rename the C++ test constants to uppercase snakecase after the 'k'
prefix: change kMaterializedContextLength -> kMATERIALIZED_CONTEXT_LENGTH and
kReusableContextLength -> kREUSABLE_CONTEXT_LENGTH, and update all
references/usages accordingly (there are additional similar constants in the
same file that need the same rename). Ensure declarations and any test code that
uses these symbols (e.g., in kvCacheManagerTest.cpp) are updated to the new
identifiers to keep the build passing.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 394f9969-5941-4e60-aca5-5d659a9e5e17

📥 Commits

Reviewing files that changed from the base of the PR and between afa11de and fc45204.

📒 Files selected for processing (2)

cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

thorjohnsen · 2026-04-02T00:58:11Z

/bot run --disabled-fail-fast

thorjohnsen · 2026-04-02T01:39:50Z

/bot run --disabled-fail-fast

tensorrt-cicd · 2026-04-02T01:46:35Z

PR_Github #41292 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: --disabled-fail-fast

Link to invocation

thorjohnsen · 2026-04-02T02:00:55Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-02T02:06:56Z

PR_Github #41305 [ run ] triggered by Bot. Commit: fa41473 Link to invocation

tensorrt-cicd · 2026-04-02T07:10:55Z

PR_Github #41305 [ run ] completed with state SUCCESS. Commit: fa41473
/LLM/main/L0_MergeRequest_PR pipeline #32260 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

juney-nvidia · 2026-04-02T09:27:36Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-02T09:42:23Z

PR_Github #41408 [ run ] triggered by Bot. Commit: fa41473 Link to invocation

tensorrt-cicd · 2026-04-02T12:57:51Z

PR_Github #41408 [ run ] completed with state SUCCESS. Commit: fa41473
/LLM/main/L0_MergeRequest_PR pipeline #32345 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

SimengLiu-nv

The CPP changes look good to me.
On top of the main branch, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/pyexecutor/resource_manager.py#L827 would prevent storing a context request that has not fully completed. That line should be removed to enable the changes in this PR.

schetlur-nv · 2026-04-03T16:52:37Z

> The CPP changes look good to me. On top of the main branch, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/pyexecutor/resource_manager.py#L827 would prevent storing a context request that has not fully completed. That line should be removed to enable the changes in this PR.

@SimengLiu-nv I suggest we take that as a separate thing; let's keep this fix as narrowly tailored as possible. It will need to be cherry-picked to other branches, so a narrow scope of changes will simplify that process.

tensorrt-cicd · 2026-04-06T01:11:22Z

PR_Github #41880 [ run ] completed with state SUCCESS. Commit: 102b548
/LLM/main/L0_MergeRequest_PR pipeline #32745 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

thorjohnsen · 2026-04-06T02:02:16Z

/bot run

tensorrt-cicd · 2026-04-06T02:08:00Z

PR_Github #41889 [ run ] triggered by Bot. Commit: 102b548 Link to invocation

These tests depend on countReusableBlocks, getEstimatedReusableTokens, and setEstimatedReusableTokens which are not available in release/1.2.1. The original tests are available in NVIDIA#12673. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…release/1.2.1 Cherry-pick of NVIDIA#12673 onto release/1.2.1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tensorrt-cicd · 2026-04-06T04:11:14Z

PR_Github #41889 [ run ] completed with state SUCCESS. Commit: 102b548
/LLM/main/L0_MergeRequest_PR pipeline #32754 completed with status: 'SUCCESS'

CI Report

Link to invocation

…release/1.2.1 Cherry-pick of NVIDIA#12673 onto release/1.2.1. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…#12673 Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Fixes KV cache corruption caused by storing blocks with over-counted unique token extent during chunked prefill. Introduces getUsableUniqueTokenCountForReuse() and getMaterializedUniqueTokenCountForReuse() helpers to correctly cap the number of tokens stored for reuse. Also moves simulatePrefillCompletion to a KvCacheManagerTestUtil test utility class to avoid polluting the production interface. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
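
The last sentence of that commit message describes a common way to keep test-only hooks out of a production class. Below is a minimal sketch of one such pattern. It assumes a friend-class approach, which the PR may or may not use, and it assumes that simulating prefill completion means advancing the context position to the prompt length, as the updated tests are described as doing. `RequestState` and `RequestStateTestUtil` are hypothetical names, not the real classes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical production class: exposes read-only accessors only.
class RequestState
{
public:
    explicit RequestState(std::size_t promptLength)
        : mPromptLength(promptLength)
    {
    }

    std::size_t promptLength() const
    {
        return mPromptLength;
    }

    std::size_t contextCurrentPosition() const
    {
        return mContextCurrentPosition;
    }

private:
    // Grants the test helper access without adding a public setter.
    friend class RequestStateTestUtil;

    std::size_t mPromptLength;
    std::size_t mContextCurrentPosition{0};
};

// Hypothetical test-only helper, intended to live in test code.
class RequestStateTestUtil
{
public:
    // Pretend the whole prompt has been prefilled.
    static void simulatePrefillCompletion(RequestState& state)
    {
        state.mContextCurrentPosition = state.mPromptLength;
    }
};
```

A test would call `RequestStateTestUtil::simulatePrefillCompletion(state)` before exercising the code path under test, while production callers see no new public method on `RequestState`.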

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

thorjohnsen added 2 commits April 1, 2026 23:14

Apply patch

1e5b9f3

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

Update kvCacheManagerTest

fc45204

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

thorjohnsen requested a review from a team as a code owner April 2, 2026 00:25

github-actions bot assigned thorjohnsen Apr 2, 2026

coderabbitai bot reviewed Apr 2, 2026

thorjohnsen added 2 commits April 1, 2026 17:45

Add comments to new anonymous functions

083874d

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

precommit run

11720af

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

Merge branch 'main' into user/tjohnsen/fix_cross_contamination

fa41473

laikhtewari changed the title ~~[https://nvbugs/6025177][fix] Fix KV cache cross contamination~~ [https://nvbugs/6025177][fix] Fix KV cache issue Apr 2, 2026

juney-nvidia approved these changes Apr 2, 2026

juney-nvidia enabled auto-merge (squash) April 2, 2026 09:30

schetlur-nv requested review from SimengLiu-nv and eopXD April 2, 2026 16:21

SimengLiu-nv requested changes Apr 2, 2026

thorjohnsen added 3 commits April 3, 2026 16:16

Add method to simulate effect of prefill completion (only use for testing)

666187b

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

Update unit tests that need to simulate prefill completion before call to removeSequence or storeContextBlocks

c5a08d6

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

Update python tests

e040caf

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

thorjohnsen added 2 commits April 3, 2026 16:58

Missed some cases because they call free_resources which internally calls remove_sequence

051991f

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

precommit run

cc8fbe7

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

thorjohnsen mentioned this pull request Apr 6, 2026

[https://nvbugs/6025177][fix] Cherry-pick KV cache fix from #12673 to release/1.2.1 #12768

Closed

3 tasks

thorjohnsen mentioned this pull request Apr 6, 2026

[https://nvbugs/6025177][fix] Cherry-pick KV cache fix to release/1.2.1 (from PR #12673) #12769

Closed

3 tasks

thorjohnsen mentioned this pull request Apr 6, 2026

[https://nvbugs/6025177][fix] Cherry-pick KV cache corruption fix to release/1.2.1 #12770

Open

1 task

juney-nvidia merged commit b3a7381 into NVIDIA:main Apr 6, 2026
5 checks passed

thorjohnsen mentioned this pull request Apr 6, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (cherry-pick to release/1.3.0rc5.post2) #12780

Closed

1 task

thorjohnsen mentioned this pull request Apr 6, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (cherry-pick to release/1.3.0rc5.post2) #12786

Closed

1 task

yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (NVIDIA#12673)

8348b0c

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (NVIDIA#12673)

dbb5414

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

thorjohnsen mentioned this pull request Apr 7, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (cherry-pick to rele… #12818

Closed

1 task

yuanjingx87 pushed a commit that referenced this pull request Apr 8, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (#12673)

086f1f8

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

Tabrizian pushed a commit to Tabrizian/TensorRT-LLM that referenced this pull request Apr 8, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (NVIDIA#12673)

4451249

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

Tabrizian pushed a commit to Tabrizian/TensorRT-LLM that referenced this pull request Apr 9, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (NVIDIA#12673)

9fcd5f7

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>

Tabrizian pushed a commit to Tabrizian/TensorRT-LLM that referenced this pull request Apr 9, 2026

[https://nvbugs/6025177][fix] Fix KV cache issue (NVIDIA#12673)

ede92dc

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
