
[None][fix] Improve KV Event Batching #11883

Merged
jthomson04 merged 3 commits into NVIDIA:main from jthomson04:jthomson04/improve-kv-remove-event-batching
Mar 11, 2026

Conversation

@jthomson04
Collaborator

@jthomson04 jthomson04 commented Mar 4, 2026

Summary by CodeRabbit

  • New Features

    • Enhanced KV cache event handling with improved removal event batching and consolidation.
  • Bug Fixes

    • Fixed event ordering to ensure removal events are correctly processed before storage events.
    • Improved per-window isolation to prevent cross-window event interference.
  • Tests

    • Added unit tests validating removal event batching and windowed isolation behavior.

Improves the KV cache event handling logic to increase batching of remove-block events. This PR batches remove events more effectively for models with multiple window sizes, as well as across block-update events.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@jthomson04 jthomson04 requested a review from a team as a code owner March 4, 2026 02:41
@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Walkthrough

Introduces per-window event batching for KV cache removed events in KVCacheEventManager through a new public flushRemovedEvents() method and internal mLatestRemovedEvents map, consolidating multiple removals within each window into single batched events.

Changes

  • Header Definition (cpp/include/tensorrt_llm/batch_manager/kvCacheEventManager.h): Added the public method flushRemovedEvents(SizeType32 windowSize) and the private member mLatestRemovedEvents, a map tracking the latest removed events per window.

  • Implementation Logic (cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp): Replaced ad-hoc in-place batching with per-window batching in enqueueRemovedEvent(), added a pre-flush step in enqueueStoredEvent(), implemented the flushRemovedEvents() helper, and extended flush() to drain all pending per-window batches.

  • Unit Tests (cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp): Added three tests validating removal-event consolidation within windows, correct event ordering (Removed before Stored), and per-window isolation guarantees.
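To make the mechanism concrete, here is a minimal, self-contained C++ sketch of the per-window batching the changes above describe. Only the method names (enqueueRemovedEvent, enqueueStoredEvent, flushRemovedEvents, flush) and the mLatestRemovedEvents member mirror the PR; the Event struct, EventManagerSketch class, and payload types are illustrative stand-ins, not the real TRT-LLM structs.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative stand-in for the real event payload types.
struct Event
{
    bool isRemoved;                 // true = Removed event, false = Stored event
    int32_t windowSize;             // attention window this event belongs to
    std::vector<uint64_t> blockIds; // removed block hashes, or stored block ids
};

class EventManagerSketch
{
public:
    // Accumulate removals per window instead of emitting one event per call.
    void enqueueRemovedEvent(uint64_t blockId, int32_t windowSize)
    {
        auto& pending = mLatestRemovedEvents[windowSize];
        if (!pending)
        {
            pending.emplace();
        }
        pending->push_back(blockId);
    }

    // A Stored event must observe all prior removals for the same window,
    // so flush that window's pending batch first.
    void enqueueStoredEvent(std::vector<uint64_t> blockIds, int32_t windowSize)
    {
        flushRemovedEvents(windowSize);
        mEventQueue.push_back({false, windowSize, std::move(blockIds)});
    }

    // Emit the consolidated Removed event (if any) for one window,
    // erasing the entry and moving the payload rather than copying it.
    void flushRemovedEvents(int32_t windowSize)
    {
        auto it = mLatestRemovedEvents.find(windowSize);
        if (it == mLatestRemovedEvents.end())
        {
            return;
        }
        if (it->second)
        {
            mEventQueue.push_back({true, windowSize, std::move(*it->second)});
        }
        mLatestRemovedEvents.erase(it);
    }

    // Drain every window's remaining batch, then hand all events to the consumer.
    std::deque<Event> flush()
    {
        while (!mLatestRemovedEvents.empty())
        {
            flushRemovedEvents(mLatestRemovedEvents.begin()->first);
        }
        return std::exchange(mEventQueue, {});
    }

private:
    std::unordered_map<int32_t, std::optional<std::vector<uint64_t>>> mLatestRemovedEvents;
    std::deque<Event> mEventQueue;
};
```

With this shape, two removals in the same window collapse into one Removed event, a Stored event forces out the pending batch for its own window first (preserving Removed-before-Stored ordering without touching other windows), and flush() drains whatever batches remain.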

Sequence Diagram

sequenceDiagram
    participant App as Application Code
    participant Mgr as KVCacheEventManager
    participant EventQ as Event Queue/Consumer

    rect rgba(200, 150, 150, 0.5)
    Note over App,EventQ: Per-Window Removed Event Batching Flow (New)
    
    App->>Mgr: enqueueRemovedEvent(block1, window=1)
    activate Mgr
    Mgr->>Mgr: Check mLatestRemovedEvents[1]
    Mgr->>Mgr: Create/append batch in mLatestRemovedEvents[1]
    deactivate Mgr
    
    App->>Mgr: enqueueRemovedEvent(block2, window=1)
    activate Mgr
    Mgr->>Mgr: Append to existing mLatestRemovedEvents[1]
    deactivate Mgr
    
    App->>Mgr: enqueueStoredEvent(data, window=1)
    activate Mgr
    Mgr->>Mgr: Pre-flush: flushRemovedEvents(1)
    Mgr->>Mgr: Emit consolidated KVCacheRemovedData
    Mgr->>Mgr: Reset mLatestRemovedEvents[1]
    Mgr->>Mgr: Enqueue Stored event
    deactivate Mgr
    
    App->>Mgr: flush()
    activate Mgr
    Mgr->>Mgr: Iterate all remaining mLatestRemovedEvents
    Mgr->>Mgr: flushRemovedEvents for each window
    Mgr->>EventQ: Transfer all events to consumer
    deactivate Mgr
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 25.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Description check: ⚠️ Warning. The PR description is largely incomplete, with missing critical sections: no specific Description of the issue/solution, no Test Coverage details, and most PR Checklist items unchecked. Resolution: complete the Description section explaining the issue and solution, detail Test Coverage with relevant test names, and check or verify all applicable PR Checklist items before merging.
✅ Passed checks (1 passed)
  • Title check: ✅ Passed. The title '[None][fix] Improve KV Event Batching' directly and clearly describes the main change of improving the KV cache event batching logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp (1)

127-138: Eliminate stale nullopt entries and use move semantics in flushRemovedEvents.

flushRemovedEvents currently sets entries to std::nullopt instead of erasing them, causing flush() to iterate over accumulated dead entries on every invocation. Additionally, the optional payload is unnecessarily copied before enqueueing. The proposed changes erase flushed entries and move the payload, avoiding both costs. The while-loop change in flush() is also necessary to safely iterate with erasure, preventing iterator invalidation.

♻️ Proposed patch
 void KVCacheEventManager::flushRemovedEvents(SizeType32 windowSize)
 {
-    if (mLatestRemovedEvents.find(windowSize) != mLatestRemovedEvents.end())
+    auto it = mLatestRemovedEvents.find(windowSize);
+    if (it == mLatestRemovedEvents.end())
     {
-        auto latestRemovedEvent = mLatestRemovedEvents[windowSize];
-        if (latestRemovedEvent != std::nullopt)
-        {
-            enqueueEvent({mEventId++, *latestRemovedEvent, windowSize, mAttentionDpRank});
-        }
+        return;
     }
-    mLatestRemovedEvents[windowSize] = std::nullopt;
+    if (it->second != std::nullopt)
+    {
+        enqueueEvent({mEventId++, std::move(*(it->second)), windowSize, mAttentionDpRank});
+    }
+    mLatestRemovedEvents.erase(it);
 }
 
 void KVCacheEventManager::flush()
 {
-    for (auto const& [windowSize, latestRemovedEvent] : mLatestRemovedEvents)
-    {
-        flushRemovedEvents(windowSize);
-    }
+    while (!mLatestRemovedEvents.empty())
+    {
+        flushRemovedEvents(mLatestRemovedEvents.begin()->first);
+    }
     auto eventQueue = std::exchange(mEventQueue, {});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp` around lines 127 -
138, flushRemovedEvents is leaving std::nullopt entries in mLatestRemovedEvents
and copying the optional payload; change KVCacheEventManager::flushRemovedEvents
to erase the map entry (mLatestRemovedEvents.erase(windowSize)) after handling
it and pass the payload to enqueueEvent using move semantics (std::move on the
optional/value) to avoid copies while still incrementing mEventId; also update
KVCacheEventManager::flush() to iterate with a while-loop using iterators so you
can safely erase entries during traversal without iterator invalidation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/tensorrt_llm/batch_manager/kvCacheEventManager.h`:
- Line 104: The header declares mLatestRemovedEvents as std::unordered_map but
doesn't include <unordered_map>; add a direct include for <unordered_map> at the
top of kvCacheEventManager.h (near other standard includes) so the symbol
std::unordered_map used for mLatestRemovedEvents is defined; no other API
changes needed—just add the include alongside existing <optional> and related
headers referenced by executor::KVCacheRemovedData.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Around line 6426-6435: The single-line if/else-if bodies in the event scan
loop (checks using event.windowSize against wSWA/wFull and the
std::holds_alternative<tle::KVCacheStoredData> branch) violate the C++ style
rule requiring brace-delimited blocks; update the conditional statements that
set removedSWAPos, removedFullPos, and storedFullPos so each if and the else if
use explicit { } around their statements (including the single assignment
statements) to conform with the repository coding guidelines.
- Around line 6278-6318: The test is flaky because with blocksInPrimaryPool = 8
seq1 often finds unused free blocks so no Removed event occurs; make eviction
deterministic by either lowering blocksInPrimaryPool to a small number (e.g., 2)
or explicitly pre-allocate/consume free blocks before adding seq1 (create dummy
sequences or call addSequence/storeContextBlocks to exhaust free blocks), so
that kvCacheManager.addSequence(1, ...) must evict seq0 and produce a Removed
event before kvCacheManager.storeContextBlocks(*llmRequest1); modify the setup
where blocksInPrimaryPool is defined or insert pre-allocation steps right before
adding seq1 to force eviction.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea73c803-ba47-4dcd-b385-7ce48fef264e

📥 Commits

Reviewing files that changed from the base of the PR and between 0a5a5e7 and 44540d6.

📒 Files selected for processing (3)
  • cpp/include/tensorrt_llm/batch_manager/kvCacheEventManager.h
  • cpp/tensorrt_llm/batch_manager/kvCacheEventManager.cpp
  • cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp

@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@jthomson04
Collaborator Author

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #37744 [ kill ] triggered by Bot. Commit: 44540d6

@tensorrt-cicd
Collaborator

PR_Github #37744 [ kill ] completed with state SUCCESS. Commit: 44540d6
Successfully killed previous jobs for commit 44540d6

@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37753 [ run ] triggered by Bot. Commit: 44540d6

@tensorrt-cicd
Collaborator

PR_Github #37753 [ run ] completed with state SUCCESS. Commit: 44540d6
/LLM/main/L0_MergeRequest_PR pipeline #29223 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
@jthomson04 jthomson04 force-pushed the jthomson04/improve-kv-remove-event-batching branch from 44540d6 to e7b1930 Compare March 6, 2026 01:09
@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37933 [ run ] triggered by Bot. Commit: e7b1930

@tensorrt-cicd
Collaborator

PR_Github #37933 [ run ] completed with state SUCCESS. Commit: e7b1930
/LLM/main/L0_MergeRequest_PR pipeline #29377 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #38110 [ run ] triggered by Bot. Commit: e7b1930

@tensorrt-cicd
Collaborator

PR_Github #38110 [ run ] completed with state SUCCESS. Commit: e7b1930
/LLM/main/L0_MergeRequest_PR pipeline #29521 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again


@jthomson04
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #38309 [ run ] triggered by Bot. Commit: e7b1930

@tensorrt-cicd
Collaborator

PR_Github #38309 [ run ] completed with state SUCCESS. Commit: e7b1930
/LLM/main/L0_MergeRequest_PR pipeline #29684 completed with status: 'SUCCESS'


@jthomson04 jthomson04 requested a review from pcastonguay March 10, 2026 16:44
Collaborator

@thorjohnsen thorjohnsen left a comment


LGTM

@jthomson04 jthomson04 merged commit 2afe11d into NVIDIA:main Mar 11, 2026
5 checks passed
bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Mar 12, 2026
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>