[TRTLLM-12111][feat] Add V2 KV cache event support by yizhang-nv · Pull Request #13589 · NVIDIA/TensorRT-LLM

yizhang-nv · 2026-04-29T03:01:49Z

Description

Adds KVCacheManagerV2 event emission with V1-compatible public event shapes and explicit hash_algo / layer_group_id metadata. The V2 path now emits created, stored, removed, and cache-level updated events, serializes the event metadata, and supports attention-DP event gather on rank 0.
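As a rough illustration of serializing the event metadata, the sketch below attaches `hash_algo` and `layer_group_id` to a stored event and dumps it to JSON. `SketchStoredEvent`, the sketch's `to_json_str` helper, and every field name other than `hash_algo`, `layer_group_id`, `mm_keys`, and `event_id` are assumptions made for this example; the real payload layout is defined by `KVCacheEventSerializer` and may differ.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchStoredEvent:
    """Hypothetical stand-in for a V2 'stored' event (not the real class)."""

    event_id: int
    block_hashes: List[str]  # assumed field name, for illustration only
    hash_algo: str = "v2_sha256"
    layer_group_id: int = 0
    # Kept empty here, mirroring the limitation stated in this PR.
    mm_keys: List[Any] = field(default_factory=list)


def to_json_str(event: SketchStoredEvent) -> str:
    """Serialize the sketch event, carrying the metadata fields explicitly."""
    payload: Dict[str, Any] = {
        "event_id": event.event_id,
        "type": "stored",
        "hash_algo": event.hash_algo,
        "layer_group_id": event.layer_group_id,
        "block_hashes": event.block_hashes,
        "mm_keys": event.mm_keys,
    }
    return json.dumps(payload, sort_keys=True)
```

A consumer could then read `json.loads(...)["hash_algo"]` before deciding how to interpret the block hashes.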

Updates the KV-cache-aware router and disaggregated tests so consumers compute block hashes with the server-advertised algorithm: legacy v1_block_key for V1 and v2_sha256 for V2. V1 event setup now warns if configured with a non-V1 hash algorithm and continues with the existing V1 behavior.
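The consumer-side idea can be sketched as a dispatch on the advertised algorithm name. Only the names `v1_block_key` and `v2_sha256` come from this PR; the two hash functions below, the token byte packing, and `hash_blocks` are hypothetical placeholders and do not reproduce the actual V1 block key or V2 SHA-256 input layout.

```python
import hashlib
import struct
from typing import Callable, Dict, List, Optional, Sequence

HASH_ALGO_V1 = "v1_block_key"
HASH_ALGO_V2 = "v2_sha256"

BlockHasher = Callable[[Optional[bytes], Sequence[int]], bytes]


def _pack_tokens(tokens: Sequence[int]) -> bytes:
    # Assumed encoding: little-endian unsigned 32-bit token ids.
    return struct.pack(f"<{len(tokens)}I", *tokens)


def sketch_v1_hash(parent: Optional[bytes], tokens: Sequence[int]) -> bytes:
    # Placeholder only: 64-bit FNV-1a, NOT the real V1 block key.
    h = 0xCBF29CE484222325
    for byte in (parent or b"") + _pack_tokens(tokens):
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h.to_bytes(8, "little")


def sketch_v2_hash(parent: Optional[bytes], tokens: Sequence[int]) -> bytes:
    # SHA-256 over parent hash + packed tokens; the input layout is assumed.
    digest = hashlib.sha256()
    digest.update(parent or b"")
    digest.update(_pack_tokens(tokens))
    return digest.digest()


HASHERS: Dict[str, BlockHasher] = {
    HASH_ALGO_V1: sketch_v1_hash,
    HASH_ALGO_V2: sketch_v2_hash,
}


def hash_blocks(tokens: Sequence[int], block_size: int, hash_algo: str) -> List[bytes]:
    """Chain-hash each full block using the algorithm the server advertised."""
    try:
        hasher = HASHERS[hash_algo]
    except KeyError:
        raise ValueError(f"unsupported hash_algo: {hash_algo!r}") from None
    hashes: List[bytes] = []
    parent: Optional[bytes] = None
    # Only full blocks are hashed; a trailing partial block is skipped.
    for start in range(0, len(tokens) - block_size + 1, block_size):
        parent = hasher(parent, tokens[start:start + block_size])
        hashes.append(parent)
    return hashes
```

Because each block hash folds in its parent, two requests sharing a token prefix produce the same leading hashes under the same algorithm, which is what lets a router match prefixes per server.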

This PR does not add real cache_salt support. Salt-aware event keys and connector behavior remain out of scope here. KV cache connector docs and the V2 implementation notes are intentionally not included in the PR; the notes live under ignored tmp/ for local reference.

Current limitations: V2 stored events currently reach parity with V1 only for text KV blocks. They keep mm_keys empty and do not add a lora_id JSON field; V1 JSON serialization also does not currently expose lora_id. Multimodal mm_keys propagation and LoRA routing/event parity are out of scope for this PR.

Also keeps KVCacheManagerV2 selected when cache events or cache transceiver are enabled; unsupported connector manager / beam-width combinations still fall back as before.

Test Coverage

git commit -s pre-commit hooks passed: isort, yapf, ruff, ruff-format, codespell, merge-conflict checks, and related repo hooks.
python3 -m compileall on touched Python files passed.
git diff --check passed.
Pytest was not runnable in this local shell: default Python is missing pytest.

Checklist

I have read the TensorRT-LLM contribution guidelines.
I have added or updated tests for this change.
Documentation is out of scope for this PR; local notes are kept under ignored tmp/.

coderabbitai · 2026-04-29T03:10:33Z

Caution

Review failed

Failed to post review comments

Walkthrough

This change introduces KV cache event tracking and multi-algorithm hash support to KVCacheManagerV2. Documentation describes integration with scheduling and connector callbacks, event structures, and operational constraints. Python implementation adds optional event manager support throughout the resource and cache hierarchy, exposes event APIs, introduces v2 SHA256 hashing, and enables router-based prefix caching across hash algorithms.

Changes

Cohort / File(s)	Summary
Documentation `docs/source/features/kv-cache-connector.md`, `docs/source/features/kv-cache-event-manager-v2-design.md`	Expanded KV cache connector integration details including scheduler metadata broadcast, lifecycle callbacks, compatibility constraints, and clarification of event non-triggering. Added comprehensive V2 event manager design specifying JSON payload structures, V1-shaped event contracts, layer-group emission, and internal buffering semantics.
Event Manager Core `tensorrt_llm/runtime/kv_cache_manager_v2/_event_manager.py`	New `KVCacheEventManager` class that buffers and publishes KV cache events with configurable capacity. Implements event enqueueing (created, stored, removed, updated), stored-block deduplication, lifecycle registry, attention-DP gathering, and monotonic `event_id` assignment.
KV Cache Manager Integration `tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache_manager.py`, `tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi`	Updated `KVCacheManager` constructor and interface to accept optional `event_manager` dependency, expose it via new property, and pass it to `BlockRadixTree` and `StorageManager`. Replaced assertion-based feasibility checks with safe early returns in `clamp_max_seq_len_for_mem`.
Block/Radix Tree Event Integration `tensorrt_llm/runtime/kv_cache_manager_v2/_block_radix_tree.py`, `tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py`	Added event emission integration points for block removal, lifecycle management, and page commitment. `BlockRadixTree` now stores optional event manager, tracks removal operations, and emits lifecycle-specific events when available.
Storage Layer Event Integration `tensorrt_llm/runtime/kv_cache_manager_v2/_storage_manager.py`	Extended `StorageManager` to accept optional `event_manager`, emit updated events during cache-level migrations for committed pages, and deduplicate emissions per block key.
Hash Algorithm Support `tensorrt_llm/runtime/kv_cache_hash.py`, `tensorrt_llm/serve/router.py`	New module defining hash algorithm constants (`KV_CACHE_HASH_ALGO_V1`, `KV_CACHE_HASH_ALGO_V2`). Router expanded to support multiple hash algorithms with per-algorithm block-hash storage, v2 SHA256 hasher, algorithm extraction from events, and hash-algo-aware server matching.
Event Serialization & Fallback Gating `tensorrt_llm/_utils.py`, `tensorrt_llm/_torch/pyexecutor/_util.py`	Updated `KVCacheEventSerializer.to_json_str` to serialize `hash_algo` and `layer_group_id` fields. Narrowed `KVCacheManagerV2` fallback warnings to exclude `event_buffer_max_size > 0` as an unsupported condition.
Resource Manager & Warmup `tensorrt_llm/_torch/pyexecutor/resource_manager.py`, `tensorrt_llm/_torch/pyexecutor/model_engine.py`	`KVCacheManagerV2` now conditionally instantiates `KVCacheEventManager` with window sizing and exposes `flush_iteration_events()` and `get_latest_events()` methods. CUDA-graph warmup refactored to safely free resources when `token_num` becomes non-positive.
Router Tests `tests/integration/defs/disaggregated/test_workers.py`, `tests/unittest/disaggregated/test_router.py`	Test infrastructure updated to extract and propagate `hash_algo` from events, compute block hashes per algorithm, and validate algorithm-specific token matching. Added coverage for V2 hash alignment and end-to-end router selection.
KV Cache Event Manager Tests `tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_event_manager.py`	Comprehensive test suite validating event ordering, hash algorithms, windowing, deduplication, coalescing, attention-DP rank logic, lifecycle handling, and CUDA-backed integration with radix-tree structures.
KV Cache Creator & LLM Tests `tests/unittest/_torch/executor/test_kv_cache_creator_v2_selection.py`, `tests/unittest/llmapi/test_llm_kv_cache_events.py`	New unit test for V2 manager selection logic with event buffers. Updated LLM tests to explicitly disable/enable V2 manager and validate v2-specific event hash algorithms and block-hash serialization.
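The buffering behavior summarized in the table above (bounded capacity, stored-block deduplication, monotonic `event_id`) can be sketched roughly as follows. `SketchEventBuffer` and its methods are hypothetical and are not the `KVCacheEventManager` API; in particular, dropping the oldest event when the buffer is full is an assumption of this sketch.

```python
from collections import deque
from typing import Any, Deque, Dict, List, Sequence, Set


class SketchEventBuffer:
    """Toy bounded event buffer; not the real KVCacheEventManager."""

    def __init__(self, max_size: int) -> None:
        # Assumption: when full, the oldest pending event is dropped.
        self._pending: Deque[Dict[str, Any]] = deque(maxlen=max_size)
        self._next_event_id = 0
        self._stored_hashes: Set[str] = set()

    def _enqueue(self, event_type: str, **fields: Any) -> None:
        event = {"event_id": self._next_event_id, "type": event_type, **fields}
        self._next_event_id += 1  # ids only ever increase
        self._pending.append(event)

    def add_stored(self, block_hashes: Sequence[str]) -> None:
        # Deduplicate: report each block hash as stored at most once.
        new = [h for h in dict.fromkeys(block_hashes) if h not in self._stored_hashes]
        if not new:
            return
        self._stored_hashes.update(new)
        self._enqueue("stored", block_hashes=new)

    def add_removed(self, block_hashes: Sequence[str]) -> None:
        # Only report removal for blocks that were previously reported stored.
        known = [h for h in dict.fromkeys(block_hashes) if h in self._stored_hashes]
        if not known:
            return
        self._stored_hashes.difference_update(known)
        self._enqueue("removed", block_hashes=known)

    def drain(self) -> List[Dict[str, Any]]:
        """Return pending events in order and clear the buffer."""
        events = list(self._pending)
        self._pending.clear()
        return events
```

Storing `["a", "b"]` and then `["b", "c"]` yields two stored events, with the second carrying only `"c"`.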

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.71% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the main feature: adding V2 KV cache event support to the codebase, which aligns with the dominant theme across all changed files.
Description check	✅ Passed	The PR description covers what (V2 event emission), why (V1-compatible shapes with explicit hash_algo), and test coverage (pre-commit checks, compilation, ruff), addressing the template's Description and Test Coverage sections.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

Filtered out disaggregation serving/router test changes for user/fanrongl/dsv4_model. Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

yizhang-nv requested review from a team as code owners April 29, 2026 03:01

yizhang-nv requested review from arysef, joyang-nv, nv-guomingz and zhenhuaw-me April 29, 2026 03:01

github-actions Bot assigned yizhang-nv Apr 29, 2026

yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch from aa7c141 to 26152ba Compare April 29, 2026 03:06

yizhang-nv marked this pull request as draft April 29, 2026 03:06

yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch from 26152ba to 6eadfc1 Compare April 29, 2026 05:34

yizhang-nv changed the base branch from main to feat/deepseek_v4 April 29, 2026 05:34

yizhang-nv added the deepseek-v4 label Apr 29, 2026

yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch from ab441c5 to 3f46922 Compare April 29, 2026 09:31

yizhang-nv marked this pull request as ready for review April 29, 2026 09:54

yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch 5 times, most recently from 1591d8b to 832f49a Compare April 29, 2026 10:45

peihu-nv mentioned this pull request Apr 29, 2026

[TRTLLM-12317][feat] Match V1 KV event hashes for V2 cache events #13624

Merged

1 task

[None][feat] Add V2 KV cache event support

d1d016a

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch from 832f49a to d1d016a Compare April 30, 2026 02:07

yizhang-nv changed the title ~~[None][feat] Add V2 KV cache event support~~ [TRTLLM-12111][feat] Add V2 KV cache event support Apr 30, 2026

yizhang-nv merged commit d2616cb into NVIDIA:feat/deepseek_v4 Apr 30, 2026
6 checks passed

lfr-0531 pushed a commit that referenced this pull request May 7, 2026

[TRTLLM-12111][feat] Add V2 KV cache event support (#13589)

59937c6

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

lfr-0531 pushed a commit that referenced this pull request May 14, 2026

[TRTLLM-12111][feat] Add V2 KV cache event support (#13589)

d956bc8

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

yizhang-nv merged 1 commit into NVIDIA:feat/deepseek_v4 from yizhang-nv:yizhan/v2-kv-cache-events-main
