[TRTLLM-12111][feat] Add V2 KV cache event support #13589

Merged
yizhang-nv merged 1 commit into NVIDIA:feat/deepseek_v4 from yizhang-nv:yizhan/v2-kv-cache-events-main
Apr 30, 2026

@yizhang-nv yizhang-nv commented Apr 29, 2026

Description

Adds KVCacheManagerV2 event emission with V1-compatible public event shapes and explicit hash_algo / layer_group_id metadata. The V2 path now emits created, stored, removed, and cache-level updated events, serializes the event metadata, and supports attention-DP event gather on rank 0.

Updates the KV-cache-aware router and disaggregated tests so consumers compute block hashes with the server-advertised algorithm: legacy v1_block_key for V1 and v2_sha256 for V2. V1 event setup now warns if it is configured with a non-V1 hash algorithm and continues with the existing V1 behavior.
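
The per-algorithm hashing described above could look roughly like the following on the consumer side. This is only an illustrative sketch: the two algorithm names come from this description, but `block_hashes_for_server`, both hasher functions, and the byte layouts they hash are hypothetical stand-ins, not the actual router code or the real v1_block_key / v2_sha256 definitions.

```python
import hashlib
import struct
import zlib
from typing import Callable, Dict, List, Sequence

# Algorithm names as given in the PR description.
HASH_ALGO_V1 = "v1_block_key"
HASH_ALGO_V2 = "v2_sha256"


def _full_blocks(tokens: Sequence[int], block_size: int) -> List[Sequence[int]]:
    # Only complete blocks are hashed; a trailing partial block is ignored.
    end = len(tokens) - len(tokens) % block_size
    return [tokens[i:i + block_size] for i in range(0, end, block_size)]


def _toy_v1_hashes(tokens: Sequence[int], block_size: int) -> List[str]:
    # Stand-in for the legacy V1 block key (NOT the real algorithm):
    # a CRC32 chained over the parent value and the block's tokens.
    hashes, parent = [], 0
    for block in _full_blocks(tokens, block_size):
        parent = zlib.crc32(struct.pack(f"<{len(block)}I", *block), parent)
        hashes.append(f"{parent:08x}")
    return hashes


def _toy_v2_hashes(tokens: Sequence[int], block_size: int) -> List[str]:
    # Stand-in for v2_sha256 (the byte layout here is an assumption):
    # SHA-256 over the parent digest followed by the block's tokens.
    hashes, parent = [], b""
    for block in _full_blocks(tokens, block_size):
        payload = parent + struct.pack(f"<{len(block)}I", *block)
        parent = hashlib.sha256(payload).digest()
        hashes.append(parent.hex())
    return hashes


_HASHERS: Dict[str, Callable[[Sequence[int], int], List[str]]] = {
    HASH_ALGO_V1: _toy_v1_hashes,
    HASH_ALGO_V2: _toy_v2_hashes,
}


def block_hashes_for_server(tokens: Sequence[int], block_size: int,
                            hash_algo: str) -> List[str]:
    """Hash a prompt with the algorithm a given server advertises."""
    try:
        hasher = _HASHERS[hash_algo]
    except KeyError:
        raise ValueError(f"unsupported hash_algo: {hash_algo!r}") from None
    return hasher(tokens, block_size)
```

The point of the dispatch is that hashes are only comparable within one algorithm, so a consumer has to keep block hashes from V1 and V2 servers apart and hash each incoming prompt once per algorithm in use.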

This PR does not add real cache_salt support. Salt-aware event keys and connector behavior remain out of scope here. KV cache connector docs and the V2 implementation notes are intentionally not included in the PR; the notes live under ignored tmp/ for local reference.

Current limitations: V2 stored events cover only text KV block event parity. They keep mm_keys empty and do not add a lora_id JSON field; V1 JSON serialization also does not currently expose lora_id. Multimodal mm_keys propagation and LoRA routing/event parity are out of scope for this PR.

Also keeps KVCacheManagerV2 selected when cache events or cache transceiver are enabled; unsupported connector manager / beam-width combinations still fall back as before.

Test Coverage

  • git commit -s pre-commit hooks passed: isort, yapf, ruff, ruff-format, codespell, merge-conflict checks, and related repo hooks.
  • python3 -m compileall on touched Python files passed.
  • git diff --check passed.
  • Pytest was not runnable in this local shell: default Python is missing pytest.

Checklist

  • I have read the TensorRT-LLM contribution guidelines.
  • I have added or updated tests for this change.
  • Documentation is out of scope for this PR; local notes are kept under ignored tmp/.

@yizhang-nv yizhang-nv requested review from a team as code owners April 29, 2026 03:01
@yizhang-nv yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch from aa7c141 to 26152ba Compare April 29, 2026 03:06
@yizhang-nv yizhang-nv marked this pull request as draft April 29, 2026 03:06

coderabbitai Bot commented Apr 29, 2026

Caution

Review failed

Failed to post review comments

Walkthrough

This change introduces KV cache event tracking and multi-algorithm hash support to KVCacheManagerV2. Documentation describes integration with scheduling and connector callbacks, event structures, and operational constraints. Python implementation adds optional event manager support throughout the resource and cache hierarchy, exposes event APIs, introduces v2 SHA256 hashing, and enables router-based prefix caching across hash algorithms.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: docs/source/features/kv-cache-connector.md, docs/source/features/kv-cache-event-manager-v2-design.md | Expanded KV cache connector integration details including scheduler metadata broadcast, lifecycle callbacks, compatibility constraints, and clarification of event non-triggering. Added comprehensive V2 event manager design specifying JSON payload structures, V1-shaped event contracts, layer-group emission, and internal buffering semantics. |
| Event Manager Core: tensorrt_llm/runtime/kv_cache_manager_v2/_event_manager.py | New KVCacheEventManager class that buffers and publishes KV cache events with configurable capacity. Implements event enqueueing (created, stored, removed, updated), stored-block deduplication, lifecycle registry, attention-DP gathering, and monotonic event_id assignment. |
| KV Cache Manager Integration: tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache_manager.py, tensorrt_llm/runtime/kv_cache_manager_v2/__init__.pyi | Updated KVCacheManager constructor and interface to accept optional event_manager dependency, expose it via new property, and pass it to BlockRadixTree and StorageManager. Replaced assertion-based feasibility checks with safe early returns in clamp_max_seq_len_for_mem. |
| Block/Radix Tree Event Integration: tensorrt_llm/runtime/kv_cache_manager_v2/_block_radix_tree.py, tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py | Added event emission integration points for block removal, lifecycle management, and page commitment. BlockRadixTree now stores optional event manager, tracks removal operations, and emits lifecycle-specific events when available. |
| Storage Layer Event Integration: tensorrt_llm/runtime/kv_cache_manager_v2/_storage_manager.py | Extended StorageManager to accept optional event_manager, emit updated events during cache-level migrations for committed pages, and deduplicate emissions per block key. |
| Hash Algorithm Support: tensorrt_llm/runtime/kv_cache_hash.py, tensorrt_llm/serve/router.py | New module defining hash algorithm constants (KV_CACHE_HASH_ALGO_V1, KV_CACHE_HASH_ALGO_V2). Router expanded to support multiple hash algorithms with per-algorithm block-hash storage, v2 SHA256 hasher, algorithm extraction from events, and hash-algo-aware server matching. |
| Event Serialization & Fallback Gating: tensorrt_llm/_utils.py, tensorrt_llm/_torch/pyexecutor/_util.py | Updated KVCacheEventSerializer.to_json_str to serialize hash_algo and layer_group_id fields. Narrowed KVCacheManagerV2 fallback warnings to exclude event_buffer_max_size > 0 as an unsupported condition. |
| Resource Manager & Warmup: tensorrt_llm/_torch/pyexecutor/resource_manager.py, tensorrt_llm/_torch/pyexecutor/model_engine.py | KVCacheManagerV2 now conditionally instantiates KVCacheEventManager with window sizing and exposes flush_iteration_events() and get_latest_events() methods. CUDA-graph warmup refactored to safely free resources when token_num becomes non-positive. |
| Router Tests: tests/integration/defs/disaggregated/test_workers.py, tests/unittest/disaggregated/test_router.py | Test infrastructure updated to extract and propagate hash_algo from events, compute block hashes per algorithm, and validate algorithm-specific token matching. Added coverage for V2 hash alignment and end-to-end router selection. |
| KV Cache Event Manager Tests: tests/unittest/kv_cache_manager_v2_tests/test_kv_cache_event_manager.py | Comprehensive test suite validating event ordering, hash algorithms, windowing, deduplication, coalescing, attention-DP rank logic, lifecycle handling, and CUDA-backed integration with radix-tree structures. |
| KV Cache Creator & LLM Tests: tests/unittest/_torch/executor/test_kv_cache_creator_v2_selection.py, tests/unittest/llmapi/test_llm_kv_cache_events.py | New unit test for V2 manager selection logic with event buffers. Updated LLM tests to explicitly disable/enable V2 manager and validate v2-specific event hash algorithms and block-hash serialization. |
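
As a rough illustration of the buffering behavior summarized in the table (bounded capacity, monotonic event_id assignment, stored-block deduplication, and JSON serialization of hash_algo and layer_group_id), here is a minimal self-contained sketch. `ToyEvent`, `ToyEventBuffer`, and all of their methods and field names other than hash_algo and layer_group_id are hypothetical; this is not the KVCacheEventManager or KVCacheEventSerializer implementation, and the real event payloads may be shaped differently.

```python
import json
from collections import deque
from dataclasses import asdict, dataclass
from typing import Deque, Iterable, List, Set, Tuple


@dataclass
class ToyEvent:
    event_id: int            # monotonically increasing across all events
    type: str                # "stored" or "removed" in this sketch
    block_hashes: List[str]
    hash_algo: str           # e.g. "v2_sha256"
    layer_group_id: int


class ToyEventBuffer:
    def __init__(self, max_size: int, hash_algo: str = "v2_sha256") -> None:
        self._pending: List[ToyEvent] = []
        # A bounded deque drops the oldest events once capacity is reached.
        self._published: Deque[ToyEvent] = deque(maxlen=max_size)
        self._stored: Set[Tuple[int, str]] = set()
        self._hash_algo = hash_algo
        self._next_id = 0

    def _enqueue(self, kind: str, hashes: List[str], layer_group_id: int) -> None:
        self._pending.append(
            ToyEvent(self._next_id, kind, hashes, self._hash_algo, layer_group_id))
        self._next_id += 1

    def stored(self, hashes: Iterable[str], layer_group_id: int = 0) -> None:
        # Deduplicate: report only blocks not already announced as stored.
        fresh = [h for h in hashes if (layer_group_id, h) not in self._stored]
        if fresh:
            self._stored.update((layer_group_id, h) for h in fresh)
            self._enqueue("stored", fresh, layer_group_id)

    def removed(self, hashes: Iterable[str], layer_group_id: int = 0) -> None:
        # Only blocks that were previously reported as stored can be removed.
        known = [h for h in hashes if (layer_group_id, h) in self._stored]
        if known:
            self._stored.difference_update((layer_group_id, h) for h in known)
            self._enqueue("removed", known, layer_group_id)

    def flush(self) -> None:
        # Publish everything gathered during the current iteration.
        self._published.extend(self._pending)
        self._pending.clear()

    def drain(self) -> List[ToyEvent]:
        # Hand published events to the consumer and empty the buffer.
        events = list(self._published)
        self._published.clear()
        return events


def to_json_str(events: Iterable[ToyEvent]) -> str:
    return json.dumps([asdict(e) for e in events])
```

For example, calling `stored(["a", "b"])` followed by `stored(["b", "c"])` yields two events in this sketch, and the second one carries only `"c"` because `"b"` was already reported.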

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.71% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the main feature: adding V2 KV cache event support to the codebase, which aligns with the dominant theme across all changed files. |
| Description check | ✅ Passed | The PR description covers what (V2 event emission), why (V1-compatible shapes with explicit hash_algo), and test coverage (pre-commit checks, compilation, ruff), addressing the template's Description and Test Coverage sections. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@yizhang-nv yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch from 26152ba to 6eadfc1 Compare April 29, 2026 05:34
@yizhang-nv yizhang-nv changed the base branch from main to feat/deepseek_v4 April 29, 2026 05:34
@yizhang-nv yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch from ab441c5 to 3f46922 Compare April 29, 2026 09:31
@yizhang-nv yizhang-nv marked this pull request as ready for review April 29, 2026 09:54
@yizhang-nv yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch 5 times, most recently from 1591d8b to 832f49a Compare April 29, 2026 10:45
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
@yizhang-nv yizhang-nv force-pushed the yizhan/v2-kv-cache-events-main branch from 832f49a to d1d016a Compare April 30, 2026 02:07
@yizhang-nv yizhang-nv changed the title [None][feat] Add V2 KV cache event support [TRTLLM-12111][feat] Add V2 KV cache event support Apr 30, 2026
@yizhang-nv yizhang-nv merged commit d2616cb into NVIDIA:feat/deepseek_v4 Apr 30, 2026
6 checks passed
lfr-0531 pushed a commit that referenced this pull request May 7, 2026
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
lfr-0531 pushed a commit that referenced this pull request May 14, 2026
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
lfr-0531 pushed a commit to lfr-0531/TensorRT-LLM that referenced this pull request May 29, 2026
Filtered out disaggregation serving/router test changes for user/fanrongl/dsv4_model.

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>