[TRTLLM-11160][feat] Clean up SWA work-arounds with the new radix sea…#13346
[TRTLLM-11160][feat] Clean up SWA work-arounds with the new radix sea…#13346SimengLiu-nv merged 3 commits intoNVIDIA:mainfrom
Conversation
|
/bot run --disable-fail-fast |
📝 WalkthroughWalkthroughThese changes introduce a unified placeholder block mechanism for KV cache management with SWA support, refactor block storage APIs from block IDs to block pointers, remove per-sequence validity tracking, and add thread-safe mutex exposure in the radix lookup tree while making integrity verification methods const-correct. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (1 warning, 2 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp (1)
1-3:⚠️ Potential issue | 🟡 MinorUpdate the modified-file copyright year to 2026.
This source file changed in this PR, but the banner still says
2025. As per coding guidelines, "Add NVIDIA copyright header on ALL new files and update year on modified files."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp` around lines 1 - 3, Update the SPDX copyright banner in evictionPolicy.cpp: change the year from 2025 to 2026 in the top-of-file comment block (the SPDX-FileCopyrightText line and any related header lines) so the modified file reflects 2026 per the project copyright/header policy.cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h (1)
1-2:⚠️ Potential issue | 🟡 MinorUpdate the modified-file copyright year to 2026.
This header changed in this PR, but the banner still ends at
2025. As per coding guidelines, "Add NVIDIA copyright header on ALL new files and update year on modified files."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h` around lines 1 - 2, The file's copyright banner still ends at 2025; update the modified-file header year to 2026 in the top comment of evictionPolicy.h so it reads 2022-2026 (or the appropriate start–2026 range) to comply with the project's copyright header policy; make the change in the top-of-file comment block that contains the NVIDIA CORPORATION copyright line.
🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp (1)
981-981: Minor: Use east-const style forconstexprdeclaration.Per the coding guidelines,
const/constexprshould be placed to the right of the type (east-const style).♻️ Suggested fix
- constexpr int beamIdx = 0; // no need to consider more than one beam for input tokens + int constexpr beamIdx = 0; // no need to consider more than one beam for input tokens🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp` at line 981, Change the east-west const placement for the constexpr declaration: replace the current declaration "constexpr int beamIdx = 0;" with the east-const style "int constexpr beamIdx = 0;". Update the occurrence in kvCacheManager.cpp where beamIdx is defined so the variable uses the type then constexpr modifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`:
- Around line 308-315: The sentinel kPlaceholderBlockId currently set to
std::numeric_limits<IdType>::min() can cause UB when negated (e.g., -INT_MIN)
and relies on fragile runtime guards (checks in code paths that use
KVCacheBlock::kPlaceholderBlockId, mAllocatedBlocksPerSeq, createPlaceholder,
and BidirectionalVector::operator[]); change the sentinel to a safe negatable
value (for example reserve -1 for a special meaning and use -2 or another small
negative value for kPlaceholderBlockId) or introduce an explicit
unsigned/reserved-flag type for placeholder blocks, and update all uses
(createPlaceholder overloads, any normalization/negation code, and checks that
compare block->getBlockId()) accordingly; alternatively, if you must keep
INT_MIN, add a clear comment above kPlaceholderBlockId and defensive assertions
in BidirectionalVector::operator[] and normalization code to prevent any
negation/normalization of this sentinel.
In `@cpp/include/tensorrt_llm/batch_manager/radixBlockTree.h`:
- Around line 74-88: The move ctor/operator in UnifiedBlockTree incorrectly
refers to the unqualified base type Trie; update both
UnifiedBlockTree(UnifiedBlockTree&&) and operator=(UnifiedBlockTree&&) to
qualify the base type (either via a using Base = Fully::Qualified::Trie; alias
at class scope or by calling the fully qualified base constructor and assignment
like Fully::Qualified::Trie(std::move(other)) and
Fully::Qualified::Trie::operator=(std::move(other))). Ensure the same qualified
base is used in the member initializer list of
UnifiedBlockTree(UnifiedBlockTree&&) and in the body of
UnifiedBlockTree::operator= to fix the compile error.
In `@cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp`:
- Around line 175-183: The early return in eviction policy (in
releaseBlocks()/eviction handling) unconditionally skips placeholder blocks,
which prevents pooled placeholders from being reinserted into
mFreeQueues[kPlaceholderLevel]; change the condition to only fast-path-skip the
special no-arg SWA sentinel (KVCacheBlock::kPlaceholderBlockId) instead of all
placeholders. Concretely, replace the block->isPlaceholder() check with a check
that verifies both isPlaceholder() and block->id() ==
KVCacheBlock::kPlaceholderBlockId (or equivalent accessor), so pooled
placeholders initialized by initializePlaceholders() and claimed via
claimBlock() will fall through to the normal placeholder-level queue reinsertion
logic. Ensure the symbols referenced (KVCacheBlock::kPlaceholderBlockId,
initializePlaceholders, claimBlock, mFreeQueues[kPlaceholderLevel]) are used to
locate and adjust the logic.
In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`:
- Around line 1961-1964: The unlock/relock pattern around calls to getFreeBlock
and onboardBlock is unsafe and unnecessary given mLookupTree->getMutex() returns
a std::recursive_mutex; remove the temporary lock.unlock()/lock() sequences and
hold the std::unique_lock<std::recursive_mutex> (lock) across the
getFreeBlock(...) and onboardBlock(...) calls so the recursive mutex handles
nested acquisition safely and avoids TOCTOU races that could invalidate
matchingBlock or searchRoot; update the blocks where this pattern appears (the
calls to getFreeBlock and onboardBlock that reference matchingBlock and
searchRoot) to keep the lock held for the entire operation.
---
Outside diff comments:
In `@cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h`:
- Around line 1-2: The file's copyright banner still ends at 2025; update the
modified-file header year to 2026 in the top comment of evictionPolicy.h so it
reads 2022-2026 (or the appropriate start–2026 range) to comply with the
project's copyright header policy; make the change in the top-of-file comment
block that contains the NVIDIA CORPORATION copyright line.
In `@cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp`:
- Around line 1-3: Update the SPDX copyright banner in evictionPolicy.cpp:
change the year from 2025 to 2026 in the top-of-file comment block (the
SPDX-FileCopyrightText line and any related header lines) so the modified file
reflects 2026 per the project copyright/header policy.
---
Nitpick comments:
In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`:
- Line 981: Change the east-west const placement for the constexpr declaration:
replace the current declaration "constexpr int beamIdx = 0;" with the east-const
style "int constexpr beamIdx = 0;". Update the occurrence in kvCacheManager.cpp
where beamIdx is defined so the variable uses the type then constexpr modifier.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e8abd71a-a6b3-4cd8-a1dd-da740305c1d0
📒 Files selected for processing (8)
cpp/include/tensorrt_llm/batch_manager/evictionPolicy.hcpp/include/tensorrt_llm/batch_manager/kvCacheManager.hcpp/include/tensorrt_llm/batch_manager/radixBlockTree.hcpp/tensorrt_llm/batch_manager/evictionPolicy.cppcpp/tensorrt_llm/batch_manager/kvCacheManager.cppcpp/tensorrt_llm/testing/kvCacheManagerTestUtil.hcpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cppcpp/tests/unit_tests/batch_manager/radixBlockTreeTest.cpp
|
PR_Github #45010 [ run ] triggered by Bot. Commit: |
|
PR_Github #45010 [ run ] completed with state |
|
/bot run --disable-fail-fast |
|
PR_Github #45247 [ run ] triggered by Bot. Commit: |
|
PR_Github #45247 [ run ] completed with state
|
faa576e to
d1f5780
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #45444 [ run ] triggered by Bot. Commit: |
|
PR_Github #45444 [ run ] completed with state
|
eopXD
left a comment
There was a problem hiding this comment.
Looks good. The existing "reuse enabled" test case should cover testing of the changes.
I think we should use a pooled-approach for the placeholder blocks. Saves us the time of creating PlaceholderBlock objects on the fly.
Thank you for the review. The pooled-approach for the placeholder blocks will be non trivial and should be added in a new PR. |
|
/bot run --disable-fail-fast |
bb8ea16 to
fb21bf7
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #45957 [ run ] triggered by Bot. Commit: |
|
PR_Github #45957 [ run ] completed with state |
fb21bf7 to
3676d69
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #46445 [ run ] triggered by Bot. Commit: |
|
PR_Github #46445 [ run ] completed with state
|
…rch tree Replaces the legacy SWA reuse bookkeeping (mCachedBlocksRootMutex, mBlockToSequence, mIsValidStoreForReuseSequence, mManagedSequences, holdSequence/releaseSequence, freeChildren, mRepurposed) with: - KVCacheBlock::kPlaceholderBlockId + no-arg createPlaceholder() sentinel installed in mAllocatedBlocksPerSeq on detachFrontBlock. - storeBlocks rewritten around UnifiedBlockTree::insertNodes() with mutually-exclusive branches for SWA on-demand placeholders and linear-attention queue placeholders (NVIDIA#13029). - Unified UnifiedBlockTree::getMutex() (std::recursive_mutex) replacing mCachedBlocksRootMutex across the batch manager. - LRUEvictionPolicy::releaseBlock silently skips placeholder blocks. Includes VSWA* and TruePriorityEviction* coverage and a new VSWAEvictedPlaceholderAnchorAllowsTrailingReuse regression test. Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
The concrete root error behind the MoE/Qwen3 failures is: ImportError: cannot import name 'RoutingMethodType' from 'flashinfer.fused_moe.core'That appears in: The AutoDeploy failure is only reported as:
Rebase to main branch and run CI again. |
3676d69 to
20fc5c7
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #46526 [ run ] triggered by Bot. Commit: |
|
PR_Github #46526 [ run ] completed with state
|
|
/bot skip --comment "PR #13346 is NOT the root cause of any of the observed failures. The PR only modifies C++ KV cache / SWA block management internals and its associated unit tests. The HFValidationError failures are a pre-existing or separately-introduced regression in the llmapi Python layer that happens to be surfacing in this CI run." |
|
PR_Github #46565 [ skip ] triggered by Bot. Commit: |
|
PR_Github #46565 [ skip ] completed with state |
Signed-off-by: Simeng Liu <simengl@nvidia.com>
…rch tree
Replaces the legacy SWA reuse bookkeeping (mCachedBlocksRootMutex, mBlockToSequence, mIsValidStoreForReuseSequence, mManagedSequences, holdSequence/releaseSequence, freeChildren, mRepurposed) with:
Includes VSWA* and TruePriorityEviction* coverage and a new VSWAEvictedPlaceholderAnchorAllowsTrailingReuse regression test.
Summary by CodeRabbit
Refactor
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.