[https://nvbugs/6025177][fix] Fix KV cache issue#12673
[https://nvbugs/6025177][fix] Fix KV cache issue#12673juney-nvidia merged 22 commits intoNVIDIA:mainfrom
Conversation
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
📝 WalkthroughWalkthroughThe changes introduce helper functions to compute token counts for KV-cache reuse by factoring in the request's materialized context position. Three block management operations are updated to use these computed counts instead of a fixed formula when chunking tokens into blocks. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)
2-2:⚠️ Potential issue | 🟠 MajorUpdate the NVIDIA copyright year for this modified file.
This file was modified in this PR, but the header still ends at 2025.
As per coding guidelines, "Add NVIDIA copyright header to ALL new files; update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` at line 2, This file's NVIDIA copyright header in kvCacheManagerTest.cpp still reads "2023-2025"; update the header to include the current year by changing the end year to 2026 (e.g., "2023-2026") wherever that SPDX copyright line appears so the modified file's header is up to date.
🧹 Nitpick comments (1)
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)
2830-2831: Rename new test constants to guideline-compliant constant style.
kMaterializedContextLengthandkReusableContextLengthshould follow uppercase snakecase after thekprefix.Proposed naming update
- auto constexpr kMaterializedContextLength = 5; - auto constexpr kReusableContextLength = 4; + auto constexpr kMATERIALIZED_CONTEXT_LENGTH = 5; + auto constexpr kREUSABLE_CONTEXT_LENGTH = 4; ... - llmRequest0->setContextCurrentPosition(kMaterializedContextLength); + llmRequest0->setContextCurrentPosition(kMATERIALIZED_CONTEXT_LENGTH); ... - EXPECT_EQ(llmRequest1->getContextCurrentPosition(), kReusableContextLength); + EXPECT_EQ(llmRequest1->getContextCurrentPosition(), kREUSABLE_CONTEXT_LENGTH);As per coding guidelines, "C++ constants should use uppercase snakecase with prefix 'k':
int const kDIGIT_NUM = 10;".Also applies to: 2875-2876
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` around lines 2830 - 2831, Rename the C++ test constants to uppercase snakecase after the 'k' prefix: change kMaterializedContextLength -> kMATERIALIZED_CONTEXT_LENGTH and kReusableContextLength -> kREUSABLE_CONTEXT_LENGTH, and update all references/usages accordingly (there are additional similar constants in the same file that need the same rename). Ensure declarations and any test code that uses these symbols (e.g., in kvCacheManagerTest.cpp) are updated to the new identifiers to keep the build passing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Line 2: This file's NVIDIA copyright header in kvCacheManagerTest.cpp still
reads "2023-2025"; update the header to include the current year by changing the
end year to 2026 (e.g., "2023-2026") wherever that SPDX copyright line appears
so the modified file's header is up to date.
---
Nitpick comments:
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 2830-2831: Rename the C++ test constants to uppercase snakecase
after the 'k' prefix: change kMaterializedContextLength ->
kMATERIALIZED_CONTEXT_LENGTH and kReusableContextLength ->
kREUSABLE_CONTEXT_LENGTH, and update all references/usages accordingly (there
are additional similar constants in the same file that need the same rename).
Ensure declarations and any test code that uses these symbols (e.g., in
kvCacheManagerTest.cpp) are updated to the new identifiers to keep the build
passing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 394f9969-5941-4e60-aca5-5d659a9e5e17
📒 Files selected for processing (2)
cpp/tensorrt_llm/batch_manager/kvCacheManager.cppcpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
|
/bot run --disabled-fail-fast |
|
/bot run --disabled-fail-fast |
|
PR_Github #41292 Bot args parsing error: usage: /bot [-h] |
|
/bot run --disable-fail-fast |
|
PR_Github #41305 [ run ] triggered by Bot. Commit: |
|
PR_Github #41305 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #41408 [ run ] triggered by Bot. Commit: |
|
PR_Github #41408 [ run ] completed with state
|
SimengLiu-nv
left a comment
There was a problem hiding this comment.
The CPP changes looks good to me.
On top of main branch, https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/pyexecutor/resource_manager.py#L827 would prevent storing not fully completed context request. That line should be removed to enable the changes in this PR.
…ting) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
…l to removeSequence or storeContextBlocks Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
@SimengLiu-nv suggest we take that as a separate thing; let's keep this fix as narrowly tailored as possible. It will need to be cherry picked to other branches, so narrow scope of changes will simplify that process. |
…alls remove_sequence Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
|
PR_Github #41880 [ run ] completed with state
|
|
/bot run |
|
PR_Github #41889 [ run ] triggered by Bot. Commit: |
These tests depend on countReusableBlocks, getEstimatedReusableTokens, and setEstimatedReusableTokens which are not available in release/1.2.1. The original tests are available in NVIDIA#12673. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…release/1.2.1 Cherry-pick of NVIDIA#12673 onto release/1.2.1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
PR_Github #41889 [ run ] completed with state |
…release/1.2.1 Cherry-pick of NVIDIA#12673 onto release/1.2.1. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…#12673 Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Fixes KV cache corruption caused by storing blocks with over-counted unique token extent during chunked prefill. Introduces getUsableUniqueTokenCountForReuse() and getMaterializedUniqueTokenCountForReuse() helpers to correctly cap the number of tokens stored for reuse. Also moves simulatePrefillCompletion to a KvCacheManagerTestUtil test utility class to avoid polluting the production interface. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Fixes KV cache corruption caused by storing blocks with over-counted unique token extent during chunked prefill. Introduces getUsableUniqueTokenCountForReuse() and getMaterializedUniqueTokenCountForReuse() helpers to correctly cap the number of tokens stored for reuse. Also moves simulatePrefillCompletion to a KvCacheManagerTestUtil test utility class to avoid polluting the production interface. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Fixes KV cache corruption caused by storing blocks with over-counted unique token extent during chunked prefill. Introduces getUsableUniqueTokenCountForReuse() and getMaterializedUniqueTokenCountForReuse() helpers to correctly cap the number of tokens stored for reuse. Also moves simulatePrefillCompletion to a KvCacheManagerTestUtil test utility class to avoid polluting the production interface. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Fixes KV cache corruption caused by storing blocks with over-counted unique token extent during chunked prefill. Introduces getUsableUniqueTokenCountForReuse() and getMaterializedUniqueTokenCountForReuse() helpers to correctly cap the number of tokens stored for reuse. Also moves simulatePrefillCompletion to a KvCacheManagerTestUtil test utility class to avoid polluting the production interface. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
…ase/1.3.0rc5.post2) Cherry-pick of NVIDIA#12673 onto release/1.3.0rc5.post2. Fixes KV cache corruption caused by storing blocks with over-counted unique token extent during chunked prefill. Introduces getUsableUniqueTokenCountForReuse() and getMaterializedUniqueTokenCountForReuse() helpers to correctly cap the number of tokens stored for reuse. Also moves simulatePrefillCompletion to a KvCacheManagerTestUtil test utility class to avoid polluting the production interface. Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
@coderabbitai summary
Description
Fixes an issue that caused KV cache to become corrupted.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.