[None][chore] Refactor salting support for KVCacheManagerV2 by lowsfer · Pull Request #14140 · NVIDIA/TensorRT-LLM

lowsfer · 2026-05-14T10:12:27Z

Summary

Refactor KV cache reuse namespace into ReuseScope.
Route LoRA task ID and cache salt through ReuseScope for KVCacheManagerV2.
Keep salting coverage focused on ReuseScope serialization, radix-tree scoping, and scoped reuse behavior.
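The scoped-reuse behavior these tests target can be illustrated with a toy model. This is a minimal sketch, not the KVCacheManagerV2 implementation. It assumes `ReuseScope` is a frozen dataclass with optional integer `lora_id` and `salt` fields, which follows the walkthrough below; the integer types are an assumption. `ToyPrefixCache`, `commit`, and `num_reused` are hypothetical names chosen for illustration. The point is that a committed prefix is reusable only by requests that carry an equal scope.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class ReuseScope:
    # Assumed shape, following the walkthrough: two optional fields.
    lora_id: Optional[int] = None
    salt: Optional[int] = None


class ToyPrefixCache:
    """Hypothetical block-granular prefix store partitioned by ReuseScope."""

    def __init__(self, tokens_per_block: int) -> None:
        self.tokens_per_block = tokens_per_block
        self._blocks: Set[Tuple[ReuseScope, Tuple[int, ...]]] = set()

    def _keys(self, scope: ReuseScope, tokens: Sequence[int]):
        # One key per full block: (scope, every token up to that block's end).
        # Partial trailing blocks are never shareable.
        full = len(tokens) - len(tokens) % self.tokens_per_block
        return [(scope, tuple(tokens[:end]))
                for end in range(self.tokens_per_block, full + 1,
                                 self.tokens_per_block)]

    def commit(self, scope: ReuseScope, tokens: Sequence[int]) -> None:
        self._blocks.update(self._keys(scope, tokens))

    def num_reused(self, scope: ReuseScope, tokens: Sequence[int]) -> int:
        # Walk blocks in order; stop at the first block not cached in this scope.
        reused = 0
        for key in self._keys(scope, tokens):
            if key not in self._blocks:
                break
            reused += self.tokens_per_block
        return reused


cache = ToyPrefixCache(tokens_per_block=4)
prompt = list(range(10))
cache.commit(ReuseScope(salt=1), prompt)
cache.num_reused(ReuseScope(salt=1), prompt)  # 8: two full blocks reused
cache.num_reused(ReuseScope(salt=2), prompt)  # 0: different salt, isolated
```

Because the scope is part of every key, isolation follows from ordinary key equality. No separate per-scope bookkeeping is needed.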

Tests

git diff --check
python -m py_compile on touched Python files
PYTHONPATH=/home/yaoy/tekit/tensorrt_llm/runtime/ python tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_salting.py -v
pre-commit hooks on commit

Note: a targeted CUDA E2E test was attempted locally but was blocked by a CUDA initialization failure before the test body ran.

Summary by CodeRabbit

New Features
- Introduced ReuseScope, a unified parameter component for managing KV cache reuse configuration including LoRA and cache isolation settings.
API Changes
- KVCacheManager.create_kv_cache() now accepts a reuse_scope parameter instead of separate lora_task_id and cache_salt_id parameters, providing a more consolidated interface for cache reuse control.
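The API change amounts to a call-site migration. The following is a minimal sketch under these assumptions: `ReuseScope` is a frozen dataclass with optional integer `lora_id` and `salt` fields, as the walkthrough describes (the integer types are assumed). `create_kv_cache` is a hypothetical stand-in that models only the argument handling, with `None` falling back to an empty `ReuseScope()`. `reuse_scope_from_request` is a hypothetical helper that mirrors how the torch executor is described as building the scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReuseScope:
    """Stand-in for the ReuseScope described in this PR (field types assumed)."""
    lora_id: Optional[int] = None
    salt: Optional[int] = None


def create_kv_cache(reuse_scope: Optional[ReuseScope] = None) -> ReuseScope:
    """Hypothetical stand-in for KVCacheManager.create_kv_cache().

    Only the argument handling is modeled: a missing scope falls back to the
    empty ReuseScope(), i.e. the default (unscoped) reuse namespace.
    """
    if reuse_scope is None:
        reuse_scope = ReuseScope()
    return reuse_scope  # a real manager would allocate and return a KV cache


def reuse_scope_from_request(lora_task_id: Optional[int],
                             cache_salt_id: Optional[int]) -> ReuseScope:
    """Hypothetical adapter: fold the two legacy per-request values into one scope."""
    return ReuseScope(lora_id=lora_task_id, salt=cache_salt_id)


# Before: manager.create_kv_cache(lora_task_id=7, cache_salt_id=42)
# After:
scope = create_kv_cache(reuse_scope=reuse_scope_from_request(7, 42))
```

Bundling both values into one hashable object means that adding a future isolation dimension would change only `ReuseScope`, not every signature along the call path.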

coderabbitai · 2026-05-14T10:16:26Z

📝 Walkthrough

Walkthrough

This PR refactors KV cache prefix-reuse selection from a hashed tree_task_id derived from (lora_task_id, cache_salt_id) to a new ReuseScope abstraction. The change updates the radix-tree keying, core cache lifecycle, manager API, and test suite while maintaining the same reuse-isolation semantics.

Changes

ReuseScope Abstraction and Integration

Layer / File(s)	Summary
ReuseScope data contract and public exports `tensorrt_llm/runtime/kv_cache_manager_v2/_common.py`, `tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi`, `tensorrt_llm/runtime/kv_cache_manager_v2/__init__.py`	Defines `ReuseScope` as a frozen dataclass with optional `lora_id` and `salt` fields and a `to_bytes()` serialization method; adds type stubs and exports it as part of the public API.
Radix-tree refactoring for ReuseScope `tensorrt_llm/runtime/kv_cache_manager_v2/_block_radix_tree.py`	Updates radix-tree root keying and blockchain-key derivation to seed from `ReuseScope.to_bytes()` instead of `tree_task_id`; removes `TreeTaskId` type alias; updates all tree method signatures (`add_or_get_existing`, `match`, `sequence_to_blockchain_keys`) and `RootBlock` to accept and use `ReuseScope`.
Core KV cache lifecycle update `tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py`	Removes `_make_tree_task_id()` helper and internal state tracking for `lora_task_id`/`cache_salt_id`/`_tree_task_id`; refactors `_KVCache` to store and pass `_reuse_scope` directly in commit and reuse-matching operations.
Manager public API refactoring `tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache_manager.py`	Updates `KVCacheManager.create_kv_cache()` signature to accept `reuse_scope: ReuseScope | None` (defaulting to empty `ReuseScope()`) and passes it to `_KVCache` constructor instead of separate `lora_task_id` and `cache_salt_id` parameters.
Torch executor integration `tensorrt_llm/_torch/pyexecutor/resource_manager.py`	Updates `KVCacheManagerV2._create_kv_cache` to construct `ReuseScope(lora_id=..., salt=...)` and pass it to the underlying `impl.create_kv_cache` call.
Test suite updates and validation `tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py`, `tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_salting.py`, `tests/integration/test_lists/waives.txt`	Updates test callers to construct `ReuseScope(lora_id=...)` instead of passing `lora_task_id` directly; adds `test_reuse_scope_isolates_reuse` to verify reuse isolation across scopes; replaces comprehensive salt test suite with focused unit tests for `ReuseScope` primitives and radix-tree scoping; removes waiver for a test class that was deleted.
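Seeding the radix tree from `ReuseScope.to_bytes()` can be sketched as follows. This is a toy sketch, not the code in `_block_radix_tree.py`. The byte encoding used by `to_bytes()` (a presence flag plus a little-endian int64 per field) and the SHA-256 chaining are both assumptions. `sequence_to_blockchain_keys` here borrows the name of the tree method but is a simplified stand-in. It shows two properties: keys for the same tokens diverge across scopes from the very first block, and within one scope a shared prefix yields shared keys.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ReuseScope:
    lora_id: Optional[int] = None
    salt: Optional[int] = None

    def to_bytes(self) -> bytes:
        # Assumed encoding: a presence flag plus a signed 64-bit value per field,
        # so "unset" and 0 serialize differently. The real format may differ.
        out = b""
        for value in (self.lora_id, self.salt):
            if value is None:
                out += b"\x00"
            else:
                out += b"\x01" + struct.pack("<q", value)
        return out


def sequence_to_blockchain_keys(scope: ReuseScope, tokens: Sequence[int],
                                tokens_per_block: int) -> List[bytes]:
    """Toy chain of per-block keys, seeded by the scope's bytes (root key)."""
    keys: List[bytes] = []
    prev = hashlib.sha256(scope.to_bytes()).digest()  # per-scope root key
    num_full = len(tokens) // tokens_per_block  # partial blocks get no key
    for i in range(num_full):
        block = tokens[i * tokens_per_block:(i + 1) * tokens_per_block]
        h = hashlib.sha256(prev)  # chain on the parent key
        h.update(struct.pack(f"<{len(block)}q", *block))
        prev = h.digest()
        keys.append(prev)
    return keys
```

Because every block key depends on the root key, two scopes can never share a subtree. This holds even for identical token sequences, which is the isolation guarantee the previous hashed `tree_task_id` also provided.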

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#13793: Prior PR introducing KV cache v2 salting/reuse isolation via cache_salt_id and tree_task_id, which this PR refactors into the ReuseScope abstraction.
NVIDIA/TensorRT-LLM#14094: Adds the same test waiver (TestKVCacheSaltCounters::test_aggregate_counters_match_expected) that this PR removes as part of the test suite rewrite.

Suggested reviewers

joyang-nv
yizhang-nv
QiJune
Shixiaowei02
yihwang-nv

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 7.89% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main refactoring work: introducing ReuseScope and refactoring salting support for KVCacheManagerV2.
Description check	✅ Passed	The PR description covers the summary of changes, test coverage, and testing methodology, but lacks detail on the motivation/rationale for the refactoring.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.



coderabbitai

🧹 Nitpick comments (1)

tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py (1)
497-535: QA list updates are unnecessary for this PR.

This change is runtime wiring + unit-test scope, so no additions to tests/integration/test_lists/qa/ are needed.

As per coding guidelines: "If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py` around
lines 497 - 535, Single-line QA note missing: add an explicit statement that QA
list updates are unnecessary because this PR only touches unit tests and runtime
wiring; update the PR description (or the changelog/PR body) to include a brief
sentence such as "QA list updates are unnecessary for this PR — changes are
limited to unit tests (tests/unittest/...) and runtime wiring" and reference the
affected test (test_reuse_scope_isolates_reuse) and symbols (ReuseScope,
commit_for, num_reused) so reviewers understand scope.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 42b470bb-5c1e-4800-9279-8a2352bf015a

📥 Commits

Reviewing files that changed from the base of the PR and between f3fe930 and dc66a3d.

📒 Files selected for processing (10)

tensorrt_llm/_torch/pyexecutor/resource_manager.py
tensorrt_llm/runtime/kv_cache_manager_v2/__init__.py
tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi
tensorrt_llm/runtime/kv_cache_manager_v2/_block_radix_tree.py
tensorrt_llm/runtime/kv_cache_manager_v2/_common.py
tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py
tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache_manager.py
tests/integration/test_lists/waives.txt
tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_manager_v2.py
tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_salting.py

💤 Files with no reviewable changes (1)

tests/integration/test_lists/waives.txt

eopXD

Looks good. Thank you for the refactoring.

Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

lowsfer · 2026-05-15T10:25:11Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T10:31:46Z

PR_Github #48585 [ run ] triggered by Bot. Commit: f1c3470 Link to invocation

tensorrt-cicd · 2026-05-15T12:57:51Z

PR_Github #48585 [ run ] completed with state SUCCESS. Commit: f1c3470
/LLM/main/L0_MergeRequest_PR pipeline #38370 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lowsfer · 2026-05-15T13:41:43Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T13:46:44Z

PR_Github #48593 [ run ] triggered by Bot. Commit: f1c3470 Link to invocation

tensorrt-cicd · 2026-05-15T21:21:06Z

PR_Github #48593 [ run ] completed with state FAILURE. Commit: f1c3470
/LLM/main/L0_MergeRequest_PR pipeline #38376 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lowsfer · 2026-05-17T06:12:16Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-17T06:17:30Z

PR_Github #48743 [ run ] triggered by Bot. Commit: f1c3470 Link to invocation

tensorrt-cicd · 2026-05-17T07:28:30Z

PR_Github #48743 [ run ] completed with state SUCCESS. Commit: f1c3470
/LLM/main/L0_MergeRequest_PR pipeline #38510 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lowsfer · 2026-05-17T13:47:12Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-17T13:53:10Z

PR_Github #48756 [ run ] triggered by Bot. Commit: f1c3470 Link to invocation

tensorrt-cicd · 2026-05-17T14:03:27Z

PR_Github #48756 [ run ] completed with state FAILURE. Commit: f1c3470
/LLM/main/L0_MergeRequest_PR pipeline #38523 completed with status: 'ABORTED'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

lowsfer · 2026-05-18T05:30:06Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-18T05:36:10Z

PR_Github #48840 [ run ] triggered by Bot. Commit: f1c3470 Link to invocation

tensorrt-cicd · 2026-05-18T05:58:58Z

PR_Github #48840 [ run ] completed with state SUCCESS. Commit: f1c3470
/LLM/main/L0_MergeRequest_PR pipeline #38597 completed with status: 'SUCCESS'

CI Report

Link to invocation

…4140) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

After NVIDIA#14140 (cherry-picked in NVIDIA#14353), RootBlock no longer exposes ``lora_task_id`` / ``cache_salt_id`` directly; both fields are folded into a ``ReuseScope`` NamedTuple attached as ``root.reuse_scope``. ``KVCacheEventManager._root_attrs_from_root_block`` was still reading the legacy attribute names via ``getattr(..., None)``, so every emitted event silently received ``(None, None)`` and the V1-compat hash collapsed to the same value for any LoRA/salt request. Downstream Dynamo routing depends on those hashes, so this regression would materially degrade prefix cache hit rate for any request carrying a LoRA task id or a cache salt.

Fix ``_root_attrs_from_root_block`` to prefer ``root.reuse_scope`` and fall back to the legacy attributes (keeps any in-flight pre-refactor RootBlock instances working).

Also:
- Update the now-stale ``test_v2_root_key_distinguishes_lora_from_cache_salt_id`` to use the new single-arg ``RootBlock.make_key(ReuseScope(...))`` API.
- Extend ``_FakeRootBlock`` with an optional ``reuse_scope`` kwarg so tests can mimic both the pre- and post-refactor RootBlock shape.
- Add a regression test ``test_v2_kv_cache_event_manager_v1_hash_reads_root_reuse_scope`` that asserts (a) a ReuseScope-shaped root and a legacy root with the same lora/salt produce identical V1 event hashes, (b) different scopes still produce different hashes (no silent collapse), and (c) an empty ReuseScope matches a legacy root with no salt/lora.

Addresses reviewer feedback on NVIDIA#14351 (peihu-nv).

Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com>
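The prefer-new, fall-back-to-legacy lookup described in this commit can be sketched as follows. `root_attrs_from_root_block`, `FakeRootBlock`, and `v1_hash` are hypothetical stand-ins, not the actual `KVCacheEventManager` code or its V1 hash. The `lora_id` / `salt` field names follow the walkthrough's description of `ReuseScope`. The scope is modeled as a NamedTuple, as this commit message states, but any object exposing those two attributes would work.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class ReuseScope(NamedTuple):
    # Field names assumed from the walkthrough; NamedTuple per this commit text.
    lora_id: Optional[int] = None
    salt: Optional[int] = None


def root_attrs_from_root_block(root) -> Tuple[Optional[int], Optional[int]]:
    """Return (lora_task_id, cache_salt_id) for event hashing.

    Prefer the post-refactor ``root.reuse_scope``; fall back to the legacy
    attributes so pre-refactor root objects still work.
    """
    scope = getattr(root, "reuse_scope", None)
    if scope is not None:  # explicit None check: scope objects may be falsy-agnostic
        return scope.lora_id, scope.salt
    return (getattr(root, "lora_task_id", None),
            getattr(root, "cache_salt_id", None))


class FakeRootBlock:
    """Test double that takes either the new or the legacy RootBlock shape."""

    def __init__(self, lora_task_id=None, cache_salt_id=None, reuse_scope=None):
        if reuse_scope is not None:
            self.reuse_scope = reuse_scope
        else:
            self.lora_task_id = lora_task_id
            self.cache_salt_id = cache_salt_id


def v1_hash(root, tokens: Sequence[int]) -> int:
    # Hypothetical stand-in for the V1-compatible event hash.
    return hash((root_attrs_from_root_block(root), tuple(tokens)))
```

Reading only the legacy names from a root shaped like `FakeRootBlock(reuse_scope=...)` would yield `(None, None)` for every scope. That is the silent collapse the regression test guards against.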

…4140) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> (cherry picked from commit 61a64e6) Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com>


Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> Signed-off-by: lancelly <108499334+lancelly@users.noreply.github.com> Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>

lowsfer requested a review from a team as a code owner May 14, 2026 10:12

github-actions Bot assigned lowsfer May 14, 2026

lowsfer requested a review from eopXD May 14, 2026 10:13

coderabbitai Bot reviewed May 14, 2026


lowsfer force-pushed the refactor-salting branch from 0399734 to 69cd970 May 14, 2026 12:21

eopXD approved these changes May 15, 2026


eopXD reviewed May 15, 2026


Comment thread tensorrt_llm/runtime/kv_cache_manager_v2/_common.py Outdated

[None][chore] Refactor salting support for KVCacheManagerV2

f1c3470

Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

lowsfer force-pushed the refactor-salting branch from 69cd970 to f1c3470 May 15, 2026 10:20

lowsfer enabled auto-merge (squash) May 15, 2026 10:35

lowsfer merged commit 61a64e6 into NVIDIA:main May 18, 2026
8 checks passed

lowsfer deleted the refactor-salting branch May 18, 2026 15:10

tburt-nv mentioned this pull request May 18, 2026

[#13076][fix] Destroy torch distributed process groups on PyExecutor shutdown #12993

Merged

1 task

KleinBlueC pushed a commit to KleinBlueC/TensorRT-LLM that referenced this pull request May 19, 2026

[None][chore] Refactor salting support for KVCacheManagerV2 (NVIDIA#1…

19125fe

…4140) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

tburt-nv mentioned this pull request May 19, 2026

[None][infra] Waive 1 failed cases for main in pre-merge 38804 #14317

Closed

coderabbitai Bot mentioned this pull request May 20, 2026

[None][feat] add KV cache reuse probe #14333

Merged

This was referenced May 20, 2026

[None][chore] Replace cache_salt_id impl with main's TreeTaskId (cp #13793) #14351

Merged

[None][chore] ReuseScope refactor (cp #14140) #14353

Merged

lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request May 21, 2026

[None][chore] Refactor salting support for KVCacheManagerV2 (NVIDIA#1…

729522b

…4140) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request May 21, 2026

[None][chore] Refactor salting support for KVCacheManagerV2 (NVIDIA#1…

346770f

…4140) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
