[None][chore] Refactor salting support for KVCacheManagerV2#14140
Conversation
📝 WalkthroughWalkthroughThis PR refactors KV cache prefix-reuse selection from a hashed ChangesReuseScope Abstraction and Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py (1)
497-535: QA list updates are unnecessary for this PR.This change is runtime wiring + unit-test scope, so no additions to
tests/integration/test_lists/qa/are needed.As per coding guidelines: "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py` around lines 497 - 535, Single-line QA note missing: add an explicit statement that QA list updates are unnecessary because this PR only touches unit tests and runtime wiring; update the PR description (or the changelog/PR body) to include a brief sentence such as "QA list updates are unnecessary for this PR — changes are limited to unit tests (tests/unittest/...) and runtime wiring" and reference the affected test (test_reuse_scope_isolates_reuse) and symbols (ReuseScope, commit_for, num_reused) so reviewers understand scope.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py`:
- Around line 497-535: Single-line QA note missing: add an explicit statement
that QA list updates are unnecessary because this PR only touches unit tests and
runtime wiring; update the PR description (or the changelog/PR body) to include
a brief sentence such as "QA list updates are unnecessary for this PR — changes
are limited to unit tests (tests/unittest/...) and runtime wiring" and reference
the affected test (test_reuse_scope_isolates_reuse) and symbols (ReuseScope,
commit_for, num_reused) so reviewers understand scope.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 42b470bb-5c1e-4800-9279-8a2352bf015a
📒 Files selected for processing (10)
tensorrt_llm/_torch/pyexecutor/resource_manager.pytensorrt_llm/runtime/kv_cache_manager_v2/__init__.pytensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyitensorrt_llm/runtime/kv_cache_manager_v2/_block_radix_tree.pytensorrt_llm/runtime/kv_cache_manager_v2/_common.pytensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.pytensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache_manager.pytests/integration/test_lists/waives.txttests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.pytests/unittest/kv_cache_manager_v2_tests/test_kv_cache_salting.py
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
eopXD
left a comment
There was a problem hiding this comment.
Looks good. Thank you for the refactoring.
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
|
/bot run --disable-fail-fast |
|
PR_Github #48585 [ run ] triggered by Bot. Commit: |
|
PR_Github #48585 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #48593 [ run ] triggered by Bot. Commit: |
|
PR_Github #48593 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #48743 [ run ] triggered by Bot. Commit: |
|
PR_Github #48743 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #48756 [ run ] triggered by Bot. Commit: |
|
PR_Github #48756 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #48840 [ run ] triggered by Bot. Commit: |
|
PR_Github #48840 [ run ] completed with state |
…4140) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
After NVIDIA#14140 (cherry-picked in NVIDIA#14353) RootBlock no longer exposes ``lora_task_id`` / ``cache_salt_id`` directly — both fields are folded into a ``ReuseScope`` NamedTuple attached as ``root.reuse_scope``. ``KVCacheEventManager._root_attrs_from_root_block`` was still reading the legacy attribute names via ``getattr(..., None)``, so every emitted event silently received ``(None, None)`` and the V1-compat hash collapsed to the same value for any LoRA/salt request. Downstream Dynamo routing depends on those hashes, so this regression would materially degrade prefix cache hit rate for any request carrying a LoRA task id or a cache salt. Fix ``_root_attrs_from_root_block`` to prefer ``root.reuse_scope`` and fall back to the legacy attributes (keeps any in-flight pre-refactor RootBlock instances working). Also: - Update the now-stale ``test_v2_root_key_distinguishes_lora_from_cache_salt_id`` to use the new single-arg ``RootBlock.make_key(ReuseScope(...))`` API. - Extend ``_FakeRootBlock`` with an optional ``reuse_scope`` kwarg so tests can mimic both the pre- and post-refactor RootBlock shape. - Add a regression test ``test_v2_kv_cache_event_manager_v1_hash_reads_root_reuse_scope`` that asserts (a) a ReuseScope-shaped root and a legacy root with the same lora/salt produce identical V1 event hashes, (b) different scopes still produce different hashes (no silent collapse), and (c) an empty ReuseScope matches a legacy root with no salt/lora. Addresses reviewer feedback on NVIDIA#14351 (peihu-nv). Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com>
After NVIDIA#14140 (cherry-picked in NVIDIA#14353) RootBlock no longer exposes ``lora_task_id`` / ``cache_salt_id`` directly — both fields are folded into a ``ReuseScope`` NamedTuple attached as ``root.reuse_scope``. ``KVCacheEventManager._root_attrs_from_root_block`` was still reading the legacy attribute names via ``getattr(..., None)``, so every emitted event silently received ``(None, None)`` and the V1-compat hash collapsed to the same value for any LoRA/salt request. Downstream Dynamo routing depends on those hashes, so this regression would materially degrade prefix cache hit rate for any request carrying a LoRA task id or a cache salt. Fix ``_root_attrs_from_root_block`` to prefer ``root.reuse_scope`` and fall back to the legacy attributes (keeps any in-flight pre-refactor RootBlock instances working). Also: - Update the now-stale ``test_v2_root_key_distinguishes_lora_from_cache_salt_id`` to use the new single-arg ``RootBlock.make_key(ReuseScope(...))`` API. - Extend ``_FakeRootBlock`` with an optional ``reuse_scope`` kwarg so tests can mimic both the pre- and post-refactor RootBlock shape. - Add a regression test ``test_v2_kv_cache_event_manager_v1_hash_reads_root_reuse_scope`` that asserts (a) a ReuseScope-shaped root and a legacy root with the same lora/salt produce identical V1 event hashes, (b) different scopes still produce different hashes (no silent collapse), and (c) an empty ReuseScope matches a legacy root with no salt/lora. Addresses reviewer feedback on NVIDIA#14351 (peihu-nv). Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com>
After NVIDIA#14140 (cherry-picked in NVIDIA#14353) RootBlock no longer exposes ``lora_task_id`` / ``cache_salt_id`` directly — both fields are folded into a ``ReuseScope`` NamedTuple attached as ``root.reuse_scope``. ``KVCacheEventManager._root_attrs_from_root_block`` was still reading the legacy attribute names via ``getattr(..., None)``, so every emitted event silently received ``(None, None)`` and the V1-compat hash collapsed to the same value for any LoRA/salt request. Downstream Dynamo routing depends on those hashes, so this regression would materially degrade prefix cache hit rate for any request carrying a LoRA task id or a cache salt. Fix ``_root_attrs_from_root_block`` to prefer ``root.reuse_scope`` and fall back to the legacy attributes (keeps any in-flight pre-refactor RootBlock instances working). Also: - Update the now-stale ``test_v2_root_key_distinguishes_lora_from_cache_salt_id`` to use the new single-arg ``RootBlock.make_key(ReuseScope(...))`` API. - Extend ``_FakeRootBlock`` with an optional ``reuse_scope`` kwarg so tests can mimic both the pre- and post-refactor RootBlock shape. - Add a regression test ``test_v2_kv_cache_event_manager_v1_hash_reads_root_reuse_scope`` that asserts (a) a ReuseScope-shaped root and a legacy root with the same lora/salt produce identical V1 event hashes, (b) different scopes still produce different hashes (no silent collapse), and (c) an empty ReuseScope matches a legacy root with no salt/lora. Addresses reviewer feedback on NVIDIA#14351 (peihu-nv). Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com>
…4140) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
After NVIDIA#14140 (cherry-picked in NVIDIA#14353) RootBlock no longer exposes ``lora_task_id`` / ``cache_salt_id`` directly — both fields are folded into a ``ReuseScope`` NamedTuple attached as ``root.reuse_scope``. ``KVCacheEventManager._root_attrs_from_root_block`` was still reading the legacy attribute names via ``getattr(..., None)``, so every emitted event silently received ``(None, None)`` and the V1-compat hash collapsed to the same value for any LoRA/salt request. Downstream Dynamo routing depends on those hashes, so this regression would materially degrade prefix cache hit rate for any request carrying a LoRA task id or a cache salt. Fix ``_root_attrs_from_root_block`` to prefer ``root.reuse_scope`` and fall back to the legacy attributes (keeps any in-flight pre-refactor RootBlock instances working). Also: - Update the now-stale ``test_v2_root_key_distinguishes_lora_from_cache_salt_id`` to use the new single-arg ``RootBlock.make_key(ReuseScope(...))`` API. - Extend ``_FakeRootBlock`` with an optional ``reuse_scope`` kwarg so tests can mimic both the pre- and post-refactor RootBlock shape. - Add a regression test ``test_v2_kv_cache_event_manager_v1_hash_reads_root_reuse_scope`` that asserts (a) a ReuseScope-shaped root and a legacy root with the same lora/salt produce identical V1 event hashes, (b) different scopes still produce different hashes (no silent collapse), and (c) an empty ReuseScope matches a legacy root with no salt/lora. Addresses reviewer feedback on NVIDIA#14351 (peihu-nv). Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com>
After NVIDIA#14140 (cherry-picked in NVIDIA#14353) RootBlock no longer exposes ``lora_task_id`` / ``cache_salt_id`` directly — both fields are folded into a ``ReuseScope`` NamedTuple attached as ``root.reuse_scope``. ``KVCacheEventManager._root_attrs_from_root_block`` was still reading the legacy attribute names via ``getattr(..., None)``, so every emitted event silently received ``(None, None)`` and the V1-compat hash collapsed to the same value for any LoRA/salt request. Downstream Dynamo routing depends on those hashes, so this regression would materially degrade prefix cache hit rate for any request carrying a LoRA task id or a cache salt. Fix ``_root_attrs_from_root_block`` to prefer ``root.reuse_scope`` and fall back to the legacy attributes (keeps any in-flight pre-refactor RootBlock instances working). Also: - Update the now-stale ``test_v2_root_key_distinguishes_lora_from_cache_salt_id`` to use the new single-arg ``RootBlock.make_key(ReuseScope(...))`` API. - Extend ``_FakeRootBlock`` with an optional ``reuse_scope`` kwarg so tests can mimic both the pre- and post-refactor RootBlock shape. - Add a regression test ``test_v2_kv_cache_event_manager_v1_hash_reads_root_reuse_scope`` that asserts (a) a ReuseScope-shaped root and a legacy root with the same lora/salt produce identical V1 event hashes, (b) different scopes still produce different hashes (no silent collapse), and (c) an empty ReuseScope matches a legacy root with no salt/lora. Addresses reviewer feedback on NVIDIA#14351 (peihu-nv). Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com>
…4140) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
After NVIDIA#14140 (cherry-picked in NVIDIA#14353) RootBlock no longer exposes ``lora_task_id`` / ``cache_salt_id`` directly — both fields are folded into a ``ReuseScope`` NamedTuple attached as ``root.reuse_scope``. ``KVCacheEventManager._root_attrs_from_root_block`` was still reading the legacy attribute names via ``getattr(..., None)``, so every emitted event silently received ``(None, None)`` and the V1-compat hash collapsed to the same value for any LoRA/salt request. Downstream Dynamo routing depends on those hashes, so this regression would materially degrade prefix cache hit rate for any request carrying a LoRA task id or a cache salt. Fix ``_root_attrs_from_root_block`` to prefer ``root.reuse_scope`` and fall back to the legacy attributes (keeps any in-flight pre-refactor RootBlock instances working). Also: - Update the now-stale ``test_v2_root_key_distinguishes_lora_from_cache_salt_id`` to use the new single-arg ``RootBlock.make_key(ReuseScope(...))`` API. - Extend ``_FakeRootBlock`` with an optional ``reuse_scope`` kwarg so tests can mimic both the pre- and post-refactor RootBlock shape. - Add a regression test ``test_v2_kv_cache_event_manager_v1_hash_reads_root_reuse_scope`` that asserts (a) a ReuseScope-shaped root and a legacy root with the same lora/salt produce identical V1 event hashes, (b) different scopes still produce different hashes (no silent collapse), and (c) an empty ReuseScope matches a legacy root with no salt/lora. Addresses reviewer feedback on NVIDIA#14351 (peihu-nv). Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com>
Summary
Tests
Note: targeted CUDA E2E test was attempted locally but blocked by CUDA initialization failure before test body.
Summary by CodeRabbit
New Features
ReuseScope, a unified parameter component for managing KV cache reuse configuration including LoRA and cache isolation settings.API Changes
KVCacheManager.create_kv_cache()now accepts areuse_scopeparameter instead of separatelora_task_idandcache_salt_idparameters, providing a more consolidated interface for cache reuse control.