[None][fix] Improve KV Event Batching#11883
Conversation
📝 WalkthroughWalkthroughIntroduces per-window event batching for KV cache removed events in KVCacheEventManager through a new public Changes
Sequence DiagramsequenceDiagram
participant App as Application Code
participant Mgr as KVCacheEventManager
participant EventQ as Event Queue/Consumer
rect rgba(200, 150, 150, 0.5)
Note over App,EventQ: Per-Window Removed Event Batching Flow (New)
App->>Mgr: enqueueRemovedEvent(block1, window=1)
activate Mgr
Mgr->>Mgr: Check mLatestRemovedEvents[1]
Mgr->>Mgr: Create/append batch in mLatestRemovedEvents[1]
deactivate Mgr
App->>Mgr: enqueueRemovedEvent(block2, window=1)
activate Mgr
Mgr->>Mgr: Append to existing mLatestRemovedEvents[1]
deactivate Mgr
App->>Mgr: enqueueStoredEvent(data, window=1)
activate Mgr
Mgr->>Mgr: Pre-flush: flushRemovedEvents(1)
Mgr->>Mgr: Emit consolidated KVCacheRemovedData
Mgr->>Mgr: Reset mLatestRemovedEvents[1]
Mgr->>Mgr: Enqueue Stored event
deactivate Mgr
App->>Mgr: flush()
activate Mgr
Mgr->>Mgr: Iterate all remaining mLatestRemovedEvents
Mgr->>Mgr: flushRemovedEvents for each window
Mgr->>EventQ: Transfer all events to consumer
deactivate Mgr
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp (1)
127-138: Eliminate stale nullopt entries and use move semantics in flushRemovedEvents.
flushRemovedEventscurrently sets entries tostd::nulloptinstead of erasing them, causingflush()to iterate over accumulated dead entries on every invocation. Additionally, the optional payload is unnecessarily copied before enqueueing. The proposed changes erase flushed entries and move the payload, avoiding both costs. The while-loop change inflush()is also necessary to safely iterate with erasure, preventing iterator invalidation.♻️ Proposed patch
void KVCacheEventManager::flushRemovedEvents(SizeType32 windowSize) { - if (mLatestRemovedEvents.find(windowSize) != mLatestRemovedEvents.end()) + auto it = mLatestRemovedEvents.find(windowSize); + if (it == mLatestRemovedEvents.end()) { - auto latestRemovedEvent = mLatestRemovedEvents[windowSize]; - if (latestRemovedEvent != std::nullopt) - { - enqueueEvent({mEventId++, *latestRemovedEvent, windowSize, mAttentionDpRank}); - } + return; } - mLatestRemovedEvents[windowSize] = std::nullopt; + if (it->second != std::nullopt) + { + enqueueEvent({mEventId++, std::move(*(it->second)), windowSize, mAttentionDpRank}); + } + mLatestRemovedEvents.erase(it); } void KVCacheEventManager::flush() { - for (auto const& [windowSize, latestRemovedEvent] : mLatestRemovedEvents) - { - flushRemovedEvents(windowSize); - } + while (!mLatestRemovedEvents.empty()) + { + flushRemovedEvents(mLatestRemovedEvents.begin()->first); + } auto eventQueue = std::exchange(mEventQueue, {});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp` around lines 127 - 138, flushRemovedEvents is leaving std::nullopt entries in mLatestRemovedEvents and copying the optional payload; change KVCacheEventManager::flushRemovedEvents to erase the map entry (mLatestRemovedEvents.erase(windowSize)) after handling it and pass the payload to enqueueEvent using move semantics (std::move on the optional/value) to avoid copies while still incrementing mEventId; also update KVCacheEventManager::flush() to iterate with a while-loop using iterators so you can safely erase entries during traversal without iterator invalidation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/include/tensorrt_llm/batch_manager/kvCacheEventManager.h`:
- Line 104: The header declares mLatestRemovedEvents as std::unordered_map but
doesn't include <unordered_map>; add a direct include for <unordered_map> at the
top of kvCacheEventManager.h (near other standard includes) so the symbol
std::unordered_map used for mLatestRemovedEvents is defined; no other API
changes needed—just add the include alongside existing <optional> and related
headers referenced by executor::KVCacheRemovedData.
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 6426-6435: The single-line if/else-if bodies in the event scan
loop (checks using event.windowSize against wSWA/wFull and the
std::holds_alternative<tle::KVCacheStoredData> branch) violate the C++ style
rule requiring brace-delimited blocks; update the conditional statements that
set removedSWAPos, removedFullPos, and storedFullPos so each if and the else if
use explicit { } around their statements (including the single assignment
statements) to conform with the repository coding guidelines.
- Around line 6278-6318: The test is flaky because with blocksInPrimaryPool = 8
seq1 often finds unused free blocks so no Removed event occurs; make eviction
deterministic by either lowering blocksInPrimaryPool to a small number (e.g., 2)
or explicitly pre-allocate/consume free blocks before adding seq1 (create dummy
sequences or call addSequence/storeContextBlocks to exhaust free blocks), so
that kvCacheManager.addSequence(1, ...) must evict seq0 and produce a Removed
event before kvCacheManager.storeContextBlocks(*llmRequest1); modify the setup
where blocksInPrimaryPool is defined or insert pre-allocation steps right before
adding seq1 to force eviction.
---
Nitpick comments:
In `@cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp`:
- Around line 127-138: flushRemovedEvents is leaving std::nullopt entries in
mLatestRemovedEvents and copying the optional payload; change
KVCacheEventManager::flushRemovedEvents to erase the map entry
(mLatestRemovedEvents.erase(windowSize)) after handling it and pass the payload
to enqueueEvent using move semantics (std::move on the optional/value) to avoid
copies while still incrementing mEventId; also update
KVCacheEventManager::flush() to iterate with a while-loop using iterators so you
can safely erase entries during traversal without iterator invalidation.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ea73c803-ba47-4dcd-b385-7ce48fef264e
📒 Files selected for processing (3)
cpp/include/tensorrt_llm/batch_manager/kvCacheEventManager.hcpp/tensorrt_llm/batch_manager/kvCacheEventManager.cppcpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
|
/bot run --disable-fail-fast |
|
/bot kill |
|
PR_Github #37744 [ kill ] triggered by Bot. Commit: |
|
PR_Github #37744 [ kill ] completed with state |
|
/bot run --disable-fail-fast |
|
PR_Github #37753 [ run ] triggered by Bot. Commit: |
|
PR_Github #37753 [ run ] completed with state
|
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
44540d6 to
e7b1930
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #37933 [ run ] triggered by Bot. Commit: |
|
PR_Github #37933 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #38110 [ run ] triggered by Bot. Commit: |
|
PR_Github #38110 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #38309 [ run ] triggered by Bot. Commit: |
|
PR_Github #38309 [ run ] completed with state |
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Improves the KV cache event handling logic to increase batching of remove block events. This MR more effectively batches remove events for models with multiple window sizes, as well as across block updated events.
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.