
[None][fix] Improve KV Event Batching #11883

Merged
jthomson04 merged 3 commits into NVIDIA:main from jthomson04:jthomson04/improve-kv-remove-event-batching
Mar 11, 2026

Conversation

@jthomson04
Collaborator

@jthomson04 jthomson04 commented Mar 4, 2026

Summary by CodeRabbit

  • New Features

    • Enhanced KV cache event handling with improved removal event batching and consolidation.
  • Bug Fixes

    • Fixed event ordering to ensure removal events are correctly processed before storage events.
    • Improved per-window isolation to prevent cross-window event interference.
  • Tests

    • Added unit tests validating removal event batching and windowed isolation behavior.

Improves the KV cache event handling logic to increase batching of remove-block events. This PR batches remove events more effectively for models with multiple window sizes, as well as across block-update events.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@jthomson04 jthomson04 requested a review from a team as a code owner March 4, 2026 02:41
@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

Introduces per-window event batching for KV cache removed events in KVCacheEventManager through a new public flushRemovedEvents() method and internal mLatestRemovedEvents map, consolidating multiple removals within each window into single batched events.

Changes

  • Header Definition (cpp/include/tensorrt_llm/batch_manager/kvCacheEventManager.h): Added the public method flushRemovedEvents(SizeType32 windowSize) and the private member mLatestRemovedEvents, a map tracking the latest removed events per window.

  • Implementation Logic (cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp): Replaced ad-hoc in-place batching with per-window batching in enqueueRemovedEvent(), added a pre-flush step in enqueueStoredEvent(), implemented the flushRemovedEvents() helper, and extended flush() to drain all pending per-window batches.

  • Unit Tests (cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp): Added three tests validating removal-event consolidation within windows, correct event ordering (Removed before Stored), and per-window isolation guarantees.
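To make the mechanism concrete, here is a minimal, self-contained C++ sketch of the per-window batching the changes above describe. Only the method names (enqueueRemovedEvent, enqueueStoredEvent, flushRemovedEvents, flush) and the mLatestRemovedEvents member mirror the PR; the Event struct, EventManagerSketch class, and payload types are illustrative stand-ins, not the real TRT-LLM structs.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative stand-in for the real event payload types.
struct Event
{
    bool isRemoved;                 // true = Removed event, false = Stored event
    int32_t windowSize;             // attention window this event belongs to
    std::vector<uint64_t> blockIds; // removed block hashes, or stored block ids
};

class EventManagerSketch
{
public:
    // Accumulate removals per window instead of emitting one event per call.
    void enqueueRemovedEvent(uint64_t blockId, int32_t windowSize)
    {
        auto& pending = mLatestRemovedEvents[windowSize];
        if (!pending)
        {
            pending.emplace();
        }
        pending->push_back(blockId);
    }

    // A Stored event must observe all prior removals for the same window,
    // so flush that window's pending batch first.
    void enqueueStoredEvent(std::vector<uint64_t> blockIds, int32_t windowSize)
    {
        flushRemovedEvents(windowSize);
        mEventQueue.push_back({false, windowSize, std::move(blockIds)});
    }

    // Emit the consolidated Removed event (if any) for one window,
    // erasing the entry and moving the payload rather than copying it.
    void flushRemovedEvents(int32_t windowSize)
    {
        auto it = mLatestRemovedEvents.find(windowSize);
        if (it == mLatestRemovedEvents.end())
        {
            return;
        }
        if (it->second)
        {
            mEventQueue.push_back({true, windowSize, std::move(*it->second)});
        }
        mLatestRemovedEvents.erase(it);
    }

    // Drain every window's remaining batch, then hand all events to the consumer.
    std::deque<Event> flush()
    {
        while (!mLatestRemovedEvents.empty())
        {
            flushRemovedEvents(mLatestRemovedEvents.begin()->first);
        }
        return std::exchange(mEventQueue, {});
    }

private:
    std::unordered_map<int32_t, std::optional<std::vector<uint64_t>>> mLatestRemovedEvents;
    std::deque<Event> mEventQueue;
};
```

With this shape, two removals in the same window collapse into one Removed event, a Stored event forces out the pending batch for its own window first (preserving Removed-before-Stored ordering without touching other windows), and flush() drains whatever batches remain.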

Sequence Diagram

sequenceDiagram
    participant App as Application Code
    participant Mgr as KVCacheEventManager
    participant EventQ as Event Queue/Consumer

    rect rgba(200, 150, 150, 0.5)
    Note over App,EventQ: Per-Window Removed Event Batching Flow (New)
    
    App->>Mgr: enqueueRemovedEvent(block1, window=1)
    activate Mgr
    Mgr->>Mgr: Check mLatestRemovedEvents[1]
    Mgr->>Mgr: Create/append batch in mLatestRemovedEvents[1]
    deactivate Mgr
    
    App->>Mgr: enqueueRemovedEvent(block2, window=1)
    activate Mgr
    Mgr->>Mgr: Append to existing mLatestRemovedEvents[1]
    deactivate Mgr
    
    App->>Mgr: enqueueStoredEvent(data, window=1)
    activate Mgr
    Mgr->>Mgr: Pre-flush: flushRemovedEvents(1)
    Mgr->>Mgr: Emit consolidated KVCacheRemovedData
    Mgr->>Mgr: Reset mLatestRemovedEvents[1]
    Mgr->>Mgr: Enqueue Stored event
    deactivate Mgr
    
    App->>Mgr: flush()
    activate Mgr
    Mgr->>Mgr: Iterate all remaining mLatestRemovedEvents
    Mgr->>Mgr: flushRemovedEvents for each window
    Mgr->>EventQ: Transfer all events to consumer
    deactivate Mgr
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Description check: ⚠️ Warning. The PR description is largely incomplete, with missing critical sections: no specific Description of the issue/solution, no Test Coverage details, and most PR Checklist items unchecked. Resolution: complete the Description section explaining the issue and solution, detail Test Coverage with relevant test names, and check or verify all applicable PR Checklist items before merging.
✅ Passed checks (1 passed)
  • Title check: ✅ Passed. The title '[None][fix] Improve KV Event Batching' directly and clearly describes the main change of improving the KV cache event batching logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp (1)

127-138: Eliminate stale nullopt entries and use move semantics in flushRemovedEvents.

flushRemovedEvents currently sets entries to std::nullopt instead of erasing them, causing flush() to iterate over accumulated dead entries on every invocation. Additionally, the optional payload is unnecessarily copied before enqueueing. The proposed changes erase flushed entries and move the payload, avoiding both costs. The while-loop change in flush() is also necessary to safely iterate with erasure, preventing iterator invalidation.

♻️ Proposed patch
 void KVCacheEventManager::flushRemovedEvents(SizeType32 windowSize)
 {
-    if (mLatestRemovedEvents.find(windowSize) != mLatestRemovedEvents.end())
+    auto it = mLatestRemovedEvents.find(windowSize);
+    if (it == mLatestRemovedEvents.end())
     {
-        auto latestRemovedEvent = mLatestRemovedEvents[windowSize];
-        if (latestRemovedEvent != std::nullopt)
-        {
-            enqueueEvent({mEventId++, *latestRemovedEvent, windowSize, mAttentionDpRank});
-        }
+        return;
     }
-    mLatestRemovedEvents[windowSize] = std::nullopt;
+    if (it->second != std::nullopt)
+    {
+        enqueueEvent({mEventId++, std::move(*(it->second)), windowSize, mAttentionDpRank});
+    }
+    mLatestRemovedEvents.erase(it);
 }
 
 void KVCacheEventManager::flush()
 {
-    for (auto const& [windowSize, latestRemovedEvent] : mLatestRemovedEvents)
-    {
-        flushRemovedEvents(windowSize);
-    }
+    while (!mLatestRemovedEvents.empty())
+    {
+        flushRemovedEvents(mLatestRemovedEvents.begin()->first);
+    }
     auto eventQueue = std::exchange(mEventQueue, {});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp` around lines 127 -
138, flushRemovedEvents is leaving std::nullopt entries in mLatestRemovedEvents
and copying the optional payload; change KVCacheEventManager::flushRemovedEvents
to erase the map entry (mLatestRemovedEvents.erase(windowSize)) after handling
it and pass the payload to enqueueEvent using move semantics (std::move on the
optional/value) to avoid copies while still incrementing mEventId; also update
KVCacheEventManager::flush() to iterate with a while-loop using iterators so you
can safely erase entries during traversal without iterator invalidation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/tensorrt_llm/batch_manager/kvCacheEventManager.h`:
- Line 104: The header declares mLatestRemovedEvents as std::unordered_map but
doesn't include <unordered_map>; add a direct include for <unordered_map> at the
top of kvCacheEventManager.h (near other standard includes) so the symbol
std::unordered_map used for mLatestRemovedEvents is defined; no other API
changes needed—just add the include alongside existing <optional> and related
headers referenced by executor::KVCacheRemovedData.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 6426-6435: The single-line if/else-if bodies in the event scan
loop (checks using event.windowSize against wSWA/wFull and the
std::holds_alternative<tle::KVCacheStoredData> branch) violate the C++ style
rule requiring brace-delimited blocks; update the conditional statements that
set removedSWAPos, removedFullPos, and storedFullPos so each if and the else if
use explicit { } around their statements (including the single assignment
statements) to conform with the repository coding guidelines.
- Around line 6278-6318: The test is flaky because with blocksInPrimaryPool = 8
seq1 often finds unused free blocks so no Removed event occurs; make eviction
deterministic by either lowering blocksInPrimaryPool to a small number (e.g., 2)
or explicitly pre-allocate/consume free blocks before adding seq1 (create dummy
sequences or call addSequence/storeContextBlocks to exhaust free blocks), so
that kvCacheManager.addSequence(1, ...) must evict seq0 and produce a Removed
event before kvCacheManager.storeContextBlocks(*llmRequest1); modify the setup
where blocksInPrimaryPool is defined or insert pre-allocation steps right before
adding seq1 to force eviction.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea73c803-ba47-4dcd-b385-7ce48fef264e

📥 Commits

Reviewing files that changed from the base of the PR and between 0a5a5e7 and 44540d6.

📒 Files selected for processing (3)
  • cpp/include/tensorrt_llm/batch_manager/kvCacheEventManager.h
  • cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp
  • cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp

@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@jthomson04
Collaborator Author

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #37744 [ kill ] triggered by Bot. Commit: 44540d6

@tensorrt-cicd
Collaborator

PR_Github #37744 [ kill ] completed with state SUCCESS. Commit: 44540d6
Successfully killed previous jobs for commit 44540d6

@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37753 [ run ] triggered by Bot. Commit: 44540d6

@tensorrt-cicd
Collaborator

PR_Github #37753 [ run ] completed with state SUCCESS. Commit: 44540d6
/LLM/main/L0_MergeRequest_PR pipeline #29223 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@jthomson04 jthomson04 force-pushed the jthomson04/improve-kv-remove-event-batching branch from 44540d6 to e7b1930 Compare March 6, 2026 01:09
@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37933 [ run ] triggered by Bot. Commit: e7b1930

@tensorrt-cicd
Collaborator

PR_Github #37933 [ run ] completed with state SUCCESS. Commit: e7b1930
/LLM/main/L0_MergeRequest_PR pipeline #29377 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #38110 [ run ] triggered by Bot. Commit: e7b1930

@tensorrt-cicd
Collaborator

PR_Github #38110 [ run ] completed with state SUCCESS. Commit: e7b1930
/LLM/main/L0_MergeRequest_PR pipeline #29521 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #38309 [ run ] triggered by Bot. Commit: e7b1930

@tensorrt-cicd
Collaborator

PR_Github #38309 [ run ] completed with state SUCCESS. Commit: e7b1930
/LLM/main/L0_MergeRequest_PR pipeline #29684 completed with status: 'SUCCESS'


@jthomson04 jthomson04 requested a review from pcastonguay March 10, 2026 16:44
Collaborator

@thorjohnsen thorjohnsen left a comment


LGTM

@jthomson04 jthomson04 merged commit 2afe11d into NVIDIA:main Mar 11, 2026
5 checks passed
bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Mar 12, 2026
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>