[TRTLLM-9772][feat] Support cache reuse for SSM in KVCacheManagerV2#12644
lowsfer merged 3 commits into NVIDIA:main
Conversation
/bot run --disable-fail-fast
PR_Github #41088 [ run ] triggered by Bot. Commit:
📝 Walkthrough
The PR introduces SSM (State Space Model) reuse with interval-based snapshots to the KV cache manager. Changes include new configuration types and fields for SSM support, a deferred GPU copy mechanism during the first resume, reworked prefix reuse logic that accounts for SSM lifecycle stages, and snapshot-driven commit behavior. Validation ensures that compatibility constraints are met.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi`:
- Around line 109-115: The stub for KVCacheManagerConfig has an incorrect type
for the layers field (currently declared as list[AttentionLayerConfig]); update
the annotation to use the union type LayerConfig so SsmLayerConfig instances are
accepted at type-check time—i.e., change the KVCacheManagerConfig.layers
annotation from list[AttentionLayerConfig] to list[LayerConfig] (LayerConfig is
already defined as AttentionLayerConfig | SsmLayerConfig).
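The annotation change suggested above can be sketched with stand-in dataclasses. This is only an illustration, not the real stub: the fields `num_heads` and `state_size` are hypothetical, and only the class names and the `layers` field come from the review comment.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class AttentionLayerConfig:
    """Stand-in for the real class; `num_heads` is a hypothetical field."""
    num_heads: int = 8


@dataclass
class SsmLayerConfig:
    """Stand-in for the real class; `state_size` is a hypothetical field."""
    state_size: int = 16


# The review comment states LayerConfig is AttentionLayerConfig | SsmLayerConfig.
LayerConfig = Union[AttentionLayerConfig, SsmLayerConfig]


@dataclass
class KVCacheManagerConfig:
    # Annotating with the union lets a type checker accept SSM layers too.
    layers: list[LayerConfig] = field(default_factory=list)


mixed = KVCacheManagerConfig(layers=[AttentionLayerConfig(), SsmLayerConfig()])
```

With `list[AttentionLayerConfig]` instead, a type checker would flag the second list element even though the runtime accepts it.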
In `@tensorrt_llm/runtime/kv_cache_manager_v2/_config.py`:
- Around line 209-213: The constructor/validation currently enforces
ssm_reuse_interval divisibility against tokens_per_block for all configs; change
the logic so the checks that ssm_reuse_interval is positive and a multiple of
tokens_per_block only run when has_ssm_layer is True (refer to the
ssm_reuse_interval, tokens_per_block, and has_ssm_layer symbols and the
validation block in the class/constructor that currently raises on
non-divisors), and add an attention-only regression test that constructs a
config with has_ssm_layer=False and tokens_per_block=96 to ensure the default
ssm_reuse_interval=512 does not raise.
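A minimal sketch of the gated validation described above, assuming a dataclass-style config. `CacheConfigSketch` is a hypothetical stand-in, and treating `has_ssm_layer` as a plain field (rather than something derived from the layer list) is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for the real config class."""
    tokens_per_block: int
    has_ssm_layer: bool
    ssm_reuse_interval: int = 512  # default value taken from the review comment

    def __post_init__(self) -> None:
        # Attention-only configs never use the interval, so skip the checks.
        if not self.has_ssm_layer:
            return
        if self.ssm_reuse_interval <= 0:
            raise ValueError("ssm_reuse_interval must be positive")
        if self.ssm_reuse_interval % self.tokens_per_block != 0:
            raise ValueError(
                "ssm_reuse_interval must be a multiple of tokens_per_block")


# Attention-only regression case: 512 is not a multiple of 96, but no SSM
# layer is present, so construction succeeds.
attention_only = CacheConfigSketch(tokens_per_block=96, has_ssm_layer=False)
```

The same arguments with `has_ssm_layer=True` would raise `ValueError`, which is the behavior the check is meant to keep for SSM configs.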
In `@tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py`:
- Around line 816-817: UncommittedPage is being constructed with
BlockOrdinal(0), which forces block-0 priority for every SSM snapshot; replace
BlockOrdinal(0) with the snapshot block's actual ordinal pulled from the
snapshot/tree_block (e.g., use tree_block.ordinal or tree_block.block_ordinal as
appropriate) so that UncommittedPage(self, <snapshot_ordinal>, ssm_lc_id, lvl,
new_slot, beam_idx) computes correct priority; update the constructor call
before calling convert_to_committed(tree_block, ready_event) to pass that real
ordinal.
- Around line 633-646: The new deferred allocation path in _kv_cache.py can
raise OutOfPagesError before the existing recovery path runs, changing
resume()'s boolean-only failure contract; wrap the storage.new_gpu_slots(...)
call and the subsequent loop that assigns deferred_slots (the block inside the
if self._never_resumed branch that constructs num_slots and calls
storage.new_gpu_slots and iterates tmp_slots) in a try/except that catches
OutOfPagesError and returns False from resume() (preserving other exception
propagation), so that when SsmLifeCycle or has_partial allocation fails under
memory pressure resume() still returns False rather than raising.
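The second finding, preserving the boolean-only failure contract of `resume()`, can be sketched as follows. Everything here is a simplified stand-in: `FakeStorage`, its `new_gpu_slots(num_slots)` signature, and `resume_sketch` are hypothetical and only mirror the names used in the review comment.

```python
class OutOfPagesError(Exception):
    """Stand-in for the real exception type."""


class FakeStorage:
    """Hypothetical storage with a fixed number of free GPU slots."""

    def __init__(self, free_slots: int) -> None:
        self.free_slots = free_slots

    def new_gpu_slots(self, num_slots: int) -> list[int]:
        if num_slots > self.free_slots:
            raise OutOfPagesError(f"need {num_slots}, have {self.free_slots}")
        self.free_slots -= num_slots
        return list(range(num_slots))


def resume_sketch(storage: FakeStorage, num_slots: int) -> bool:
    """Return False on allocation failure instead of raising."""
    try:
        tmp_slots = storage.new_gpu_slots(num_slots)
    except OutOfPagesError:
        # Only this exception is translated; any other error still propagates.
        return False
    deferred_slots = list(tmp_slots)  # placeholder for the real assignment loop
    return len(deferred_slots) == num_slots
```

Catching only `OutOfPagesError` keeps unrelated bugs visible while callers that check the return value continue to work under memory pressure.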
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d82a63b7-3093-45e4-8c4e-1699e7621999
📒 Files selected for processing (7)
- tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi
- tensorrt_llm/runtime/kv_cache_manager_v2/_block_radix_tree.py
- tensorrt_llm/runtime/kv_cache_manager_v2/_config.py
- tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py
- tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache_manager.py
- tensorrt_llm/runtime/kv_cache_manager_v2/_page.py
- tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
/bot run --disable-fail-fast
PR_Github #41133 [ run ] triggered by Bot. Commit:
PR_Github #41133 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41383 [ run ] triggered by Bot. Commit:
PR_Github #41383 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41658 [ run ] triggered by Bot. Commit:
PR_Github #41658 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41797 [ run ] triggered by Bot. Commit:
PR_Github #41797 [ run ] completed with state
…VIDIA#12644) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
@coderabbitai summary
Description
SSM layers were already supported, but cache reuse was disabled whenever SSM layers were present. This PR enables cache reuse for SSM layers as well by periodically snapshotting SSM states.
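One way to picture periodic snapshotting is the sketch below. It is not the PR's implementation. It assumes that snapshots are taken at every multiple of `ssm_reuse_interval` tokens and that a request can only reuse a prefix ending at a snapshot. Both function names are hypothetical.

```python
def snapshot_positions(num_tokens: int, ssm_reuse_interval: int) -> list[int]:
    """Token counts at which an SSM state snapshot would exist (assumed policy)."""
    return list(range(ssm_reuse_interval, num_tokens + 1, ssm_reuse_interval))


def reusable_prefix_len(matched_tokens: int, ssm_reuse_interval: int) -> int:
    """Round a matched prefix down to the nearest snapshot boundary."""
    return (matched_tokens // ssm_reuse_interval) * ssm_reuse_interval


positions = snapshot_positions(num_tokens=1300, ssm_reuse_interval=512)
reused = reusable_prefix_len(matched_tokens=1300, ssm_reuse_interval=512)
```

Under these assumptions, a 1300-token match with an interval of 512 has snapshots at 512 and 1024 tokens, so 1024 tokens are reused and the remaining 276 are recomputed. A smaller interval reuses more at the cost of storing more snapshots.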
Test Coverage
Added test cases in test_kv_cache_manager_v2.py to cover the new feature.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.