
[https://nvbugs/6025177][fix] Fix KV cache issue #12673

Merged
juney-nvidia merged 22 commits into NVIDIA:main from thorjohnsen:user/tjohnsen/fix_cross_contamination
Apr 6, 2026

Conversation

@thorjohnsen
Collaborator

@thorjohnsen thorjohnsen commented Apr 2, 2026

@coderabbitai summary

Description

Fixes KV cache corruption (cross contamination) caused by storing blocks for reuse with an over-counted unique token extent during chunked prefill.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
@thorjohnsen thorjohnsen requested a review from a team as a code owner April 2, 2026 00:25
@coderabbitai
Contributor

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough


The changes introduce helper functions to compute token counts for KV-cache reuse by factoring in the request's materialized context position. Three block management operations are updated to use these computed counts instead of a fixed formula when chunking tokens into blocks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| KV Cache Token Counting Logic: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp | Added getMaterializedUniqueTokenCountForReuse() and getUsableUniqueTokenCountForReuse() helper functions. Updated three call sites in BlockManager::storeContextBlocks, WindowBlockManager::storeBlocksForReuse, and WindowBlockManager::releaseBlocks to use computed usable token counts for chopVectorIntoBlocks() operations. |
| KV Cache Manager Tests: cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp | Updated existing test cases to explicitly set contextCurrentPosition to prompt length before sequence removal and block operations. Added two new test cases (KVCacheManagerStoreContextBlocksUsesMaterializedContextExtent, KVCacheManagerReleaseBlocksUsesMaterializedContextExtent) to validate materialized context extent behavior during context block storage and release. |
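
To make the idea in the walkthrough concrete, here is a minimal, self-contained sketch of capping the token count at the materialized context position before chopping tokens into blocks. It is not the code from this PR: `usableTokenCount`, `chopIntoFullBlocks`, the `Token` alias, and the rule that only full blocks are kept are all hypothetical stand-ins chosen for illustration, and the real helpers in kvCacheManager.cpp may compute the count differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical token type used only for this illustration.
using Token = int;

// Hypothetical helper: only tokens whose KV data has actually been computed
// (those below the materialized context position) are eligible for reuse.
std::size_t usableTokenCount(std::size_t promptLength, std::size_t materializedPosition)
{
    return std::min(promptLength, materializedPosition);
}

// Hypothetical chopper: split the first `usableCount` tokens into blocks of
// `tokensPerBlock` tokens and drop any trailing partial block.
std::vector<std::vector<Token>> chopIntoFullBlocks(
    std::vector<Token> const& tokens, std::size_t usableCount, std::size_t tokensPerBlock)
{
    std::vector<std::vector<Token>> blocks;
    if (tokensPerBlock == 0)
    {
        return blocks;
    }
    usableCount = std::min(usableCount, tokens.size());
    for (std::size_t begin = 0; begin + tokensPerBlock <= usableCount; begin += tokensPerBlock)
    {
        auto const first = tokens.begin() + static_cast<std::ptrdiff_t>(begin);
        auto const last = first + static_cast<std::ptrdiff_t>(tokensPerBlock);
        blocks.emplace_back(first, last);
    }
    return blocks;
}
```

Under these assumptions, an 8-token prompt with 4 tokens per block and only 5 tokens materialized yields a single block, whereas a fully materialized prompt yields two. Storing the second block before its tokens are computed is the kind of over-counting the fix is meant to prevent.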

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description lacks essential details explaining the KV cache corruption issue, root cause, and solution approach. | Add a detailed description explaining: (1) what KV cache corruption issue occurred, (2) why it happened, (3) how the solution fixes it, and (4) clarify the 'Test Coverage' section with specific test names. |
| Title check | ❓ Inconclusive | The title 'Fix KV cache issue' is vague and overly broad. While it references KV cache (which is in the changeset), it uses the generic term 'issue' without conveying the specific problem being fixed (cross contamination). The PR objectives reveal the actual issue is 'KV cache cross contamination,' making the title under-specified. | Revise the title to be more specific about the actual problem, such as 'Fix KV cache cross contamination' or reference the specific root cause being addressed, rather than using generic language like 'issue'. |


Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)

2-2: ⚠️ Potential issue | 🟠 Major

Update the NVIDIA copyright year for this modified file.

This file was modified in this PR, but the header still ends at 2025.

As per coding guidelines, "Add NVIDIA copyright header to ALL new files; update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` at line 2, This
file's NVIDIA copyright header in kvCacheManagerTest.cpp still reads
"2023-2025"; update the header to include the current year by changing the end
year to 2026 (e.g., "2023-2026") wherever that SPDX copyright line appears so
the modified file's header is up to date.
🧹 Nitpick comments (1)
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)

2830-2831: Rename new test constants to guideline-compliant constant style.

kMaterializedContextLength and kReusableContextLength should follow uppercase snakecase after the k prefix.

Proposed naming update
-    auto constexpr kMaterializedContextLength = 5;
-    auto constexpr kReusableContextLength = 4;
+    auto constexpr kMATERIALIZED_CONTEXT_LENGTH = 5;
+    auto constexpr kREUSABLE_CONTEXT_LENGTH = 4;
...
-    llmRequest0->setContextCurrentPosition(kMaterializedContextLength);
+    llmRequest0->setContextCurrentPosition(kMATERIALIZED_CONTEXT_LENGTH);
...
-    EXPECT_EQ(llmRequest1->getContextCurrentPosition(), kReusableContextLength);
+    EXPECT_EQ(llmRequest1->getContextCurrentPosition(), kREUSABLE_CONTEXT_LENGTH);

As per coding guidelines, "C++ constants should use uppercase snakecase with prefix 'k': int const kDIGIT_NUM = 10;".

Also applies to: 2875-2876

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` around lines 2830
- 2831, Rename the C++ test constants to uppercase snakecase after the 'k'
prefix: change kMaterializedContextLength -> kMATERIALIZED_CONTEXT_LENGTH and
kReusableContextLength -> kREUSABLE_CONTEXT_LENGTH, and update all
references/usages accordingly (there are additional similar constants in the
same file that need the same rename). Ensure declarations and any test code that
uses these symbols (e.g., in kvCacheManagerTest.cpp) are updated to the new
identifiers to keep the build passing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 394f9969-5941-4e60-aca5-5d659a9e5e17

📥 Commits

Reviewing files that changed from the base of the PR and between afa11de and fc45204.

📒 Files selected for processing (2)
  • cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
  • cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
@thorjohnsen
Collaborator Author

/bot run --disabled-fail-fast

@thorjohnsen
Collaborator Author

/bot run --disabled-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41292 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: --disabled-fail-fast

Link to invocation

@thorjohnsen
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41305 [ run ] triggered by Bot. Commit: fa41473 Link to invocation

@laikhtewari laikhtewari changed the title from "[https://nvbugs/6025177][fix] Fix KV cache cross contamination" to "[https://nvbugs/6025177][fix] Fix KV cache issue" Apr 2, 2026
@tensorrt-cicd
Collaborator

PR_Github #41305 [ run ] completed with state SUCCESS. Commit: fa41473
/LLM/main/L0_MergeRequest_PR pipeline #32260 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@juney-nvidia
Collaborator

/bot run --disable-fail-fast

@juney-nvidia juney-nvidia enabled auto-merge (squash) April 2, 2026 09:30
@tensorrt-cicd
Collaborator

PR_Github #41408 [ run ] triggered by Bot. Commit: fa41473 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41408 [ run ] completed with state SUCCESS. Commit: fa41473
/LLM/main/L0_MergeRequest_PR pipeline #32345 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Collaborator

@SimengLiu-nv SimengLiu-nv left a comment


The C++ changes look good to me.
On top of the main branch, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/pyexecutor/resource_manager.py#L827 would prevent storing a context request that has not fully completed. That line should be removed to enable the changes in this PR.

…ting)

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
…l to removeSequence or storeContextBlocks

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
@schetlur-nv
Collaborator

The C++ changes look good to me. On top of the main branch, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/pyexecutor/resource_manager.py#L827 would prevent storing a context request that has not fully completed. That line should be removed to enable the changes in this PR.

@SimengLiu-nv I suggest we handle that separately; let's keep this fix as narrowly tailored as possible. It will need to be cherry-picked to other branches, so a narrow scope of changes will simplify that process.

…alls remove_sequence

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #41880 [ run ] completed with state SUCCESS. Commit: 102b548
/LLM/main/L0_MergeRequest_PR pipeline #32745 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@thorjohnsen
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41889 [ run ] triggered by Bot. Commit: 102b548 Link to invocation

thorjohnsen added a commit to thorjohnsen/TensorRT-LLM that referenced this pull request Apr 6, 2026
These tests depend on countReusableBlocks, getEstimatedReusableTokens,
and setEstimatedReusableTokens which are not available in release/1.2.1.
The original tests are available in NVIDIA#12673.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
thorjohnsen added a commit to thorjohnsen/TensorRT-LLM that referenced this pull request Apr 6, 2026
…release/1.2.1

Cherry-pick of NVIDIA#12673 onto release/1.2.1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tensorrt-cicd
Collaborator

PR_Github #41889 [ run ] completed with state SUCCESS. Commit: 102b548
/LLM/main/L0_MergeRequest_PR pipeline #32754 completed with status: 'SUCCESS'

CI Report

Link to invocation

@juney-nvidia juney-nvidia merged commit b3a7381 into NVIDIA:main Apr 6, 2026
5 checks passed
thorjohnsen added a commit to thorjohnsen/TensorRT-LLM that referenced this pull request Apr 6, 2026
…release/1.2.1

Cherry-pick of NVIDIA#12673 onto release/1.2.1.

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
thorjohnsen added a commit to thorjohnsen/TensorRT-LLM that referenced this pull request Apr 6, 2026
…#12673

Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2.

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
thorjohnsen added a commit to thorjohnsen/TensorRT-LLM that referenced this pull request Apr 6, 2026
…ase/1.3.0rc5.post2)

Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2.

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
thorjohnsen added a commit to thorjohnsen/TensorRT-LLM that referenced this pull request Apr 7, 2026
…ase/1.3.0rc5.post2)

Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2.

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
thorjohnsen added a commit to thorjohnsen/TensorRT-LLM that referenced this pull request Apr 7, 2026
…ase/1.3.0rc5.post2)

Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2.

Fixes KV cache corruption caused by storing blocks with over-counted unique
token extent during chunked prefill. Introduces getUsableUniqueTokenCountForReuse()
and getMaterializedUniqueTokenCountForReuse() helpers to correctly cap the
number of tokens stored for reuse.

Also moves simulatePrefillCompletion to a KvCacheManagerTestUtil test utility
class to avoid polluting the production interface.

Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
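
For readers unfamiliar with the pattern, the following is a minimal sketch of one common way to keep a test-only operation such as simulatePrefillCompletion out of a production interface: a separate test utility class is declared as a friend. The `Request` and `RequestTestUtil` classes and their members are hypothetical and are not taken from TensorRT-LLM; the actual KvCacheManagerTestUtil may be structured differently.

```cpp
#include <cstddef>

class RequestTestUtil; // hypothetical test-only helper, defined below

// Hypothetical production class: exposes read-only accessors and no
// test-specific mutators.
class Request
{
public:
    explicit Request(std::size_t promptLength)
        : mPromptLength(promptLength)
    {
    }

    std::size_t promptLength() const
    {
        return mPromptLength;
    }

    std::size_t contextCurrentPosition() const
    {
        return mContextCurrentPosition;
    }

private:
    // Grants the test utility access to private state without adding a
    // public method to the production interface.
    friend class RequestTestUtil;

    std::size_t mPromptLength;
    std::size_t mContextCurrentPosition{0};
};

// Hypothetical test utility: would live alongside the unit tests only.
class RequestTestUtil
{
public:
    // Marks the whole prompt as processed, as if prefill had completed.
    static void simulatePrefillCompletion(Request& request)
    {
        request.mContextCurrentPosition = request.mPromptLength;
    }
};
```

A test would call `RequestTestUtil::simulatePrefillCompletion(request)` before exercising block storage or release, while production callers see only the accessors.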
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
yuanjingx87 pushed a commit that referenced this pull request Apr 8, 2026
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Tabrizian pushed a commit to Tabrizian/TensorRT-LLM that referenced this pull request Apr 8, 2026
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Tabrizian pushed a commit to Tabrizian/TensorRT-LLM that referenced this pull request Apr 9, 2026
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
5 participants