[TRTLLM-12111][feat] Add V2 KV cache event support #13589
Merged
yizhang-nv merged 1 commit on Apr 30, 2026
Walkthrough
This change introduces KV cache event tracking and multi-algorithm hash support to KVCacheManagerV2. Documentation describes integration with scheduling and connector callbacks, event structures, and operational constraints. The Python implementation adds optional event manager support throughout the resource and cache hierarchy, exposes event APIs, introduces v2 SHA256 hashing, and enables router-based prefix caching across hash algorithms.
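To make the idea of SHA-256 block hashing for prefix caching concrete, here is a minimal sketch of chained per-block hashing. It is not the `v2_sha256` algorithm from this PR: the function name `chained_block_hashes`, the `tokens_per_block` parameter, and the byte layout are all assumptions chosen for illustration.

```python
import hashlib
import struct
from typing import List, Sequence


def chained_block_hashes(token_ids: Sequence[int],
                         tokens_per_block: int) -> List[str]:
    """Hash each full block of tokens, mixing in the previous block's digest.

    Hypothetical helper: illustrates prefix-style hashing in general, not the
    hashing scheme implemented by KVCacheManagerV2.
    """
    if tokens_per_block <= 0:
        raise ValueError("tokens_per_block must be positive")

    hashes: List[str] = []
    parent = b""  # digest of the previous block; empty for the first block
    full_len = len(token_ids) - len(token_ids) % tokens_per_block

    for start in range(0, full_len, tokens_per_block):
        block = list(token_ids[start:start + tokens_per_block])
        h = hashlib.sha256()
        h.update(parent)
        # Assumed encoding: each token id as a little-endian unsigned 32-bit int.
        h.update(struct.pack(f"<{len(block)}I", *block))
        parent = h.digest()
        hashes.append(parent.hex())

    return hashes
```

Because each digest depends on the previous one, two requests produce the same hash for a block only when every earlier block also matches, which is the property a prefix-aware router relies on. A trailing partial block is skipped in this sketch.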
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
lfr-0531 pushed a commit that referenced this pull request on May 7, 2026
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
lfr-0531 pushed a commit that referenced this pull request on May 14, 2026
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
lfr-0531 pushed a commit to lfr-0531/TensorRT-LLM that referenced this pull request on May 29, 2026
Filtered out disaggregation serving/router test changes for user/fanrongl/dsv4_model.
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Description
Adds KVCacheManagerV2 event emission with V1-compatible public event shapes and explicit `hash_algo`/`layer_group_id` metadata. The V2 path now emits created, stored, removed, and cache-level updated events, serializes the event metadata, and supports attention-DP event gather on rank 0.

Updates the KV-cache-aware router and disaggregated tests so consumers compute block hashes with the server-advertised algorithm: legacy `v1_block_key` for V1 and `v2_sha256` for V2. V1 event setup now warns if configured with a non-v1 hash algorithm and continues with the existing V1 behavior.

This PR does not add real `cache_salt` support. Salt-aware event keys and connector behavior remain out of scope here. KV cache connector docs and the V2 implementation notes are intentionally not included in the PR; the notes live under ignored `tmp/` for local reference.

Current limitations: V2 stored events currently cover text KV block event parity. They keep `mm_keys` empty and do not add a `lora_id` JSON field; V1 JSON serialization also does not currently expose `lora_id`. Multimodal `mm_keys` propagation and LoRA routing/event parity are out of scope for this PR.

Also keeps KVCacheManagerV2 selected when cache events or cache transceiver are enabled; unsupported connector manager / beam-width combinations still fall back as before.
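The behavior described above can be sketched from a consumer's point of view. This is a hedged illustration only: the `StoredEvent` class, its `event_id` and `block_hashes` fields, the `"type"` key, and the helpers `serialize`, `hasher_for`, and `resolve_v1_hash_algo` are hypothetical names, and both hashers are stand-ins rather than the real `v1_block_key` or `v2_sha256` implementations. Only the algorithm names and the `hash_algo`, `layer_group_id`, and `mm_keys` fields come from the description.

```python
import hashlib
import json
import warnings
import zlib
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List, Sequence

V1_ALGO = "v1_block_key"
V2_ALGO = "v2_sha256"


@dataclass
class StoredEvent:
    """Hypothetical stored-event record carrying explicit hash metadata."""
    event_id: int
    block_hashes: List[str]
    hash_algo: str
    layer_group_id: int
    # Kept empty for V2 stored events, per the stated limitations.
    mm_keys: List[str] = field(default_factory=list)


def serialize(event: StoredEvent) -> str:
    """Serialize the event with its metadata; no lora_id field is emitted."""
    return json.dumps({"type": "stored", **asdict(event)}, sort_keys=True)


def _pack(tokens: Sequence[int]) -> bytes:
    # Assumed encoding: 4-byte little-endian token ids.
    return b"".join(t.to_bytes(4, "little") for t in tokens)


# Stand-in hashers only; NOT the real v1_block_key / v2_sha256 algorithms.
HASHERS: Dict[str, Callable[[Sequence[int]], str]] = {
    V1_ALGO: lambda tokens: format(zlib.crc32(_pack(tokens)), "08x"),
    V2_ALGO: lambda tokens: hashlib.sha256(_pack(tokens)).hexdigest(),
}


def hasher_for(advertised_algo: str) -> Callable[[Sequence[int]], str]:
    """Pick the hash function matching the algorithm the server advertises."""
    if advertised_algo not in HASHERS:
        raise ValueError(f"unsupported hash_algo: {advertised_algo!r}")
    return HASHERS[advertised_algo]


def resolve_v1_hash_algo(configured: str) -> str:
    """Warn on a non-v1 setting, then continue with the V1 behavior."""
    if configured != V1_ALGO:
        warnings.warn(
            f"V1 events ignore hash_algo={configured!r}; using {V1_ALGO!r}",
            stacklevel=2,
        )
    return V1_ALGO
```

A router built this way would call `hasher_for(event["hash_algo"])` before comparing its own block hashes against those in an event, so that hashes produced by different algorithms are never compared with each other.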
Test Coverage
- `git commit -s` pre-commit hooks passed: isort, yapf, ruff, ruff-format, codespell, merge-conflict checks, and related repo hooks.
- `python3 -m compileall` on touched Python files passed.
- `git diff --check` passed.
- `pytest`.
Checklist
tmp/.