Skip to content

[TRTLLM-11160][feat] Clean up SWA work-arounds with the new radix sea…#13346

Merged
SimengLiu-nv merged 3 commits intoNVIDIA:mainfrom
SimengLiu-nv:step3-rebased
May 2, 2026
Merged

[TRTLLM-11160][feat] Clean up SWA work-arounds with the new radix sea…#13346
SimengLiu-nv merged 3 commits intoNVIDIA:mainfrom
SimengLiu-nv:step3-rebased

Conversation

@SimengLiu-nv
Copy link
Copy Markdown
Collaborator

@SimengLiu-nv SimengLiu-nv commented Apr 22, 2026

…rch tree

Replaces the legacy SWA reuse bookkeeping (mCachedBlocksRootMutex, mBlockToSequence, mIsValidStoreForReuseSequence, mManagedSequences, holdSequence/releaseSequence, freeChildren, mRepurposed) with:

  • KVCacheBlock::kPlaceholderBlockId + no-arg createPlaceholder() sentinel installed in mAllocatedBlocksPerSeq on detachFrontBlock.
  • storeBlocks rewritten around UnifiedBlockTree::insertNodes() with mutually-exclusive branches for SWA on-demand placeholders and linear-attention queue placeholders ([None][feat] Batch addSequence with two-phase claim and unified VSWA/non-reuse support #13029).
  • Unified UnifiedBlockTree::getMutex() (std::recursive_mutex) replacing mCachedBlocksRootMutex across the batch manager.
  • LRUEvictionPolicy::releaseBlock silently skips placeholder blocks.

Includes VSWA* and TruePriorityEviction* coverage and a new VSWAEvictedPlaceholderAnchorAllowsTrailingReuse regression test.

Summary by CodeRabbit

  • Refactor

    • Improved KV cache block management and synchronization for better performance and stability.
    • Enhanced placeholder block handling in sliding window attention cache operations.
    • Streamlined sequence storage tracking and block reuse logic.
  • Tests

    • Added validation tests for placeholder block creation.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@SimengLiu-nv SimengLiu-nv requested a review from a team as a code owner April 22, 2026 19:05
@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Walkthrough

These changes introduce a unified placeholder block mechanism for KV cache management with SWA support, refactor block storage APIs from block IDs to block pointers, remove per-sequence validity tracking, and add thread-safe mutex exposure in the radix lookup tree while making integrity verification methods const-correct.

Changes

Cohort / File(s) Summary
Eviction Policy Const-Correctness
cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h, cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp
Updated verifyQueueIntegrity() from non-const to const pure virtual method and its override. Added placeholder block early-return in releaseBlock() to skip processing blocks with no physical GPU memory.
KV Cache Manager Headers
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
Added KVCacheBlock::kPlaceholderBlockId sentinel constant. Introduced no-argument createPlaceholder() overload. Added WindowBlockManager::storeContextBlocks() method. Changed storeBlocks() signature to take std::vector<BlockPtr> const& instead of std::vector<IdType> const&. Made verifyQueueIntegrity() const. Removed per-sequence validity tracking methods and sequence management helpers (initializeSequenceStorageValidity, holdSequence, releaseSequence, isSequenceValidForStoreForReuse).
KV Cache Manager Implementation
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
Implemented no-argument createPlaceholder() overload with kPlaceholderBlockId. Refactored storeBlocks() to accept block pointers and handle placeholder insertion logic deterministically. Introduced WindowBlockManager::storeContextBlocks() to build reuse trie keys from unique tokens. Modified getFreeBlock() to detach only selected block instead of subtree. Switched synchronization from mCachedBlocksRootMutex to mLookupTree->getMutex() with temporary unlock pattern to prevent deadlocks. Updated detachFrontBlock() to replace evicted blocks with placeholders. Removed sequence "held" tracking calls.
Radix Block Tree
cpp/include/tensorrt_llm/batch_manager/radixBlockTree.h
Added private std::recursive_mutex mMutex member. Implemented explicit noexcept move constructor and move-assignment operator. Added [[nodiscard]] getMutex() noexcept public method to expose internal mutex for caller synchronization of all trie operations.
Tests and Utilities
cpp/tests/unit_tests/batch_manager/radixBlockTreeTest.cpp, cpp/tensorrt_llm/testing/kvCacheManagerTestUtil.h
Added unit test verifying no-argument createPlaceholder() creates placeholder with kPlaceholderBlockId. Updated documentation comment formatting in test utility (no logic changes).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.76% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title is incomplete and truncated, ending with '...' instead of providing the full description of the feature being implemented. Complete the PR title to fully describe the feature, e.g., '[TRTLLM-11160][feat] Clean up SWA work-arounds with the new radix search tree'.
Description check ❓ Inconclusive The PR description provides a clear technical summary of the changes and includes details about the implementation approach, but the required template sections (Description, Test Coverage, PR Checklist) are incomplete or not properly filled out. Complete the required template sections: add a concise description section explaining what and why, list specific test cases that validate the changes (mentioning test file names), and confirm the PR checklist items have been reviewed.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp (1)

1-3: ⚠️ Potential issue | 🟡 Minor

Update the modified-file copyright year to 2026.

This source file changed in this PR, but the banner still says 2025. As per coding guidelines, "Add NVIDIA copyright header on ALL new files and update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp` around lines 1 - 3, Update
the SPDX copyright banner in evictionPolicy.cpp: change the year from 2025 to
2026 in the top-of-file comment block (the SPDX-FileCopyrightText line and any
related header lines) so the modified file reflects 2026 per the project
copyright/header policy.
cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Update the modified-file copyright year to 2026.

This header changed in this PR, but the banner still ends at 2025. As per coding guidelines, "Add NVIDIA copyright header on ALL new files and update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h` around lines 1 - 2,
The file's copyright banner still ends at 2025; update the modified-file header
year to 2026 in the top comment of evictionPolicy.h so it reads 2022-2026 (or
the appropriate start–2026 range) to comply with the project's copyright header
policy; make the change in the top-of-file comment block that contains the
NVIDIA CORPORATION copyright line.
🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp (1)

981-981: Minor: Use east-const style for constexpr declaration.

Per the coding guidelines, const/constexpr should be placed to the right of the type (east-const style).

♻️ Suggested fix
-    constexpr int beamIdx = 0; // no need to consider more than one beam for input tokens
+    int constexpr beamIdx = 0; // no need to consider more than one beam for input tokens
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp` at line 981, Change the
east-west const placement for the constexpr declaration: replace the current
declaration "constexpr int beamIdx = 0;" with the east-const style "int
constexpr beamIdx = 0;". Update the occurrence in kvCacheManager.cpp where
beamIdx is defined so the variable uses the type then constexpr modifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`:
- Around line 308-315: The sentinel kPlaceholderBlockId currently set to
std::numeric_limits<IdType>::min() can cause UB when negated (e.g., -INT_MIN)
and relies on fragile runtime guards (checks in code paths that use
KVCacheBlock::kPlaceholderBlockId, mAllocatedBlocksPerSeq, createPlaceholder,
and BidirectionalVector::operator[]); change the sentinel to a safe negatable
value (for example reserve -1 for a special meaning and use -2 or another small
negative value for kPlaceholderBlockId) or introduce an explicit
unsigned/reserved-flag type for placeholder blocks, and update all uses
(createPlaceholder overloads, any normalization/negation code, and checks that
compare block->getBlockId()) accordingly; alternatively, if you must keep
INT_MIN, add a clear comment above kPlaceholderBlockId and defensive assertions
in BidirectionalVector::operator[] and normalization code to prevent any
negation/normalization of this sentinel.

In `@cpp/include/tensorrt_llm/batch_manager/radixBlockTree.h`:
- Around line 74-88: The move ctor/operator in UnifiedBlockTree incorrectly
refers to the unqualified base type Trie; update both
UnifiedBlockTree(UnifiedBlockTree&&) and operator=(UnifiedBlockTree&&) to
qualify the base type (either via a using Base = Fully::Qualified::Trie; alias
at class scope or by calling the fully qualified base constructor and assignment
like Fully::Qualified::Trie(std::move(other)) and
Fully::Qualified::Trie::operator=(std::move(other))). Ensure the same qualified
base is used in the member initializer list of
UnifiedBlockTree(UnifiedBlockTree&&) and in the body of
UnifiedBlockTree::operator= to fix the compile error.

In `@cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp`:
- Around line 175-183: The early return in eviction policy (in
releaseBlocks()/eviction handling) unconditionally skips placeholder blocks,
which prevents pooled placeholders from being reinserted into
mFreeQueues[kPlaceholderLevel]; change the condition to only fast-path-skip the
special no-arg SWA sentinel (KVCacheBlock::kPlaceholderBlockId) instead of all
placeholders. Concretely, replace the block->isPlaceholder() check with a check
that verifies both isPlaceholder() and block->id() ==
KVCacheBlock::kPlaceholderBlockId (or equivalent accessor), so pooled
placeholders initialized by initializePlaceholders() and claimed via
claimBlock() will fall through to the normal placeholder-level queue reinsertion
logic. Ensure the symbols referenced (KVCacheBlock::kPlaceholderBlockId,
initializePlaceholders, claimBlock, mFreeQueues[kPlaceholderLevel]) are used to
locate and adjust the logic.

In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`:
- Around line 1961-1964: The unlock/relock pattern around calls to getFreeBlock
and onboardBlock is unsafe and unnecessary given mLookupTree->getMutex() returns
a std::recursive_mutex; remove the temporary lock.unlock()/lock() sequences and
hold the std::unique_lock<std::recursive_mutex> (lock) across the
getFreeBlock(...) and onboardBlock(...) calls so the recursive mutex handles
nested acquisition safely and avoids TOCTOU races that could invalidate
matchingBlock or searchRoot; update the blocks where this pattern appears (the
calls to getFreeBlock and onboardBlock that reference matchingBlock and
searchRoot) to keep the lock held for the entire operation.

---

Outside diff comments:
In `@cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h`:
- Around line 1-2: The file's copyright banner still ends at 2025; update the
modified-file header year to 2026 in the top comment of evictionPolicy.h so it
reads 2022-2026 (or the appropriate start–2026 range) to comply with the
project's copyright header policy; make the change in the top-of-file comment
block that contains the NVIDIA CORPORATION copyright line.

In `@cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp`:
- Around line 1-3: Update the SPDX copyright banner in evictionPolicy.cpp:
change the year from 2025 to 2026 in the top-of-file comment block (the
SPDX-FileCopyrightText line and any related header lines) so the modified file
reflects 2026 per the project copyright/header policy.

---

Nitpick comments:
In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`:
- Line 981: Change the east-west const placement for the constexpr declaration:
replace the current declaration "constexpr int beamIdx = 0;" with the east-const
style "int constexpr beamIdx = 0;". Update the occurrence in kvCacheManager.cpp
where beamIdx is defined so the variable uses the type then constexpr modifier.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e8abd71a-a6b3-4cd8-a1dd-da740305c1d0

📥 Commits

Reviewing files that changed from the base of the PR and between 7a8bd87 and faa576e.

📒 Files selected for processing (8)
  • cpp/include/tensorrt_llm/batch_manager/evictionPolicy.h
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/include/tensorrt_llm/batch_manager/radixBlockTree.h
  • cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp
  • cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
  • cpp/tensorrt_llm/testing/kvCacheManagerTestUtil.h
  • cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
  • cpp/tests/unit_tests/batch_manager/radixBlockTreeTest.cpp

Comment thread cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
Comment thread cpp/include/tensorrt_llm/batch_manager/radixBlockTree.h
Comment thread cpp/tensorrt_llm/batch_manager/evictionPolicy.cpp Outdated
Comment thread cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp Outdated
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45010 [ run ] triggered by Bot. Commit: faa576e Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45010 [ run ] completed with state ABORTED. Commit: faa576e

Link to invocation

@peihu-nv
Copy link
Copy Markdown
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45247 [ run ] triggered by Bot. Commit: faa576e Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45247 [ run ] completed with state SUCCESS. Commit: faa576e
/LLM/main/L0_MergeRequest_PR pipeline #35507 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45444 [ run ] triggered by Bot. Commit: d1f5780 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45444 [ run ] completed with state FAILURE. Commit: d1f5780
/LLM/main/L0_MergeRequest_PR pipeline #35676 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Copy link
Copy Markdown
Collaborator

@eopXD eopXD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The existing "reuse enabled" test case should cover testing of the changes.

I think we should use a pooled-approach for the placeholder blocks. Saves us the time of creating PlaceholderBlock objects on the fly.

@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

Looks good. The existing "reuse enabled" test case should cover testing of the changes.

I think we should use a pooled-approach for the placeholder blocks. Saves us the time of creating PlaceholderBlock objects on the fly.

Thank you for the review. The pooled-approach for the placeholder blocks will be non trivial and should be added in a new PR.

@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45957 [ run ] triggered by Bot. Commit: fb21bf7 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45957 [ run ] completed with state ABORTED. Commit: fb21bf7

Link to invocation

@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46445 [ run ] triggered by Bot. Commit: 3676d69 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46445 [ run ] completed with state SUCCESS. Commit: 3676d69
/LLM/main/L0_MergeRequest_PR pipeline #36516 completed with status: 'ABORTED'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…rch tree

Replaces the legacy SWA reuse bookkeeping (mCachedBlocksRootMutex,
mBlockToSequence, mIsValidStoreForReuseSequence, mManagedSequences,
holdSequence/releaseSequence, freeChildren, mRepurposed) with:
- KVCacheBlock::kPlaceholderBlockId + no-arg createPlaceholder() sentinel
  installed in mAllocatedBlocksPerSeq on detachFrontBlock.
- storeBlocks rewritten around UnifiedBlockTree::insertNodes() with
  mutually-exclusive branches for SWA on-demand placeholders and
  linear-attention queue placeholders (NVIDIA#13029).
- Unified UnifiedBlockTree::getMutex() (std::recursive_mutex) replacing
  mCachedBlocksRootMutex across the batch manager.
- LRUEvictionPolicy::releaseBlock silently skips placeholder blocks.

Includes VSWA* and TruePriorityEviction* coverage and a new
VSWAEvictedPlaceholderAnchorAllowsTrailingReuse regression test.

Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

The concrete root error behind the MoE/Qwen3 failures is:

ImportError: cannot import name 'RoutingMethodType' from 'flashinfer.fused_moe.core'

That appears in:

unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend -k "TRTLLM"
unittest/_torch/modules/moe/test_moe_module.py::test_configurable_moe_single_gpu -k "TRTLLM"
unittest/tools/test_layer_wise_benchmarks.py::test_qwen3_next_gen_tep[1]

The AutoDeploy failure is only reported as:

test_fp8_convert_amax_hook[224.0-0.5] - Test terminated unexpectedly

Rebase to main branch and run CI again.

@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46526 [ run ] triggered by Bot. Commit: 20fc5c7 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46526 [ run ] completed with state FAILURE. Commit: 20fc5c7
/LLM/main/L0_MergeRequest_PR pipeline #36585 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@SimengLiu-nv
Copy link
Copy Markdown
Collaborator Author

/bot skip --comment "PR #13346 is NOT the root cause of any of the observed failures. The PR only modifies C++ KV cache / SWA block management internals and its associated unit tests. The HFValidationError failures are a pre-existing or separately-introduced regression in the llmapi Python layer that happens to be surfacing in this CI run."

@SimengLiu-nv SimengLiu-nv enabled auto-merge (squash) May 2, 2026 18:13
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46565 [ skip ] triggered by Bot. Commit: 20fc5c7 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46565 [ skip ] completed with state SUCCESS. Commit: 20fc5c7
Skipping testing for commit 20fc5c7

Link to invocation

@SimengLiu-nv SimengLiu-nv merged commit b9cbe46 into NVIDIA:main May 2, 2026
7 checks passed
SimengLiu-nv added a commit to SimengLiu-nv/TensorRT-LLM that referenced this pull request May 6, 2026
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants