[#11879][fix] Clamp usedNumBlocks to non-negative in KV cache stats #11922
…tats

In disaggregated serving (prefill mode), getNumFreeBlocks() can momentarily exceed getMaxNumBlocks(), causing getNumAllocatedBlocks() to return negative values. This propagates through KV cache stats as negative usedNumBlocks, which is invalid for metrics consumers.

Clamp the result of getNumAllocatedBlocks() to a minimum of zero using std::max to prevent negative block counts from being reported.

Fixes NVIDIA#11879

Signed-off-by: Wojciech Wais <wojciech.wais@gmail.com>
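The clamp in this commit amounts to one line of arithmetic. Below is a minimal sketch, assuming a hypothetical BlockCounts struct (not the real TensorRT-LLM class) that holds the pool size and the free-block count; only the std::max-at-zero pattern is taken from the commit.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the block manager's counters. This is not the
// real TensorRT-LLM class, only an illustration of the arithmetic.
struct BlockCounts
{
    std::int64_t maxNumBlocks;  // total blocks in the pool
    std::int64_t numFreeBlocks; // free blocks as reported by the free-block counter

    // Before the fix: goes negative whenever numFreeBlocks > maxNumBlocks.
    std::int64_t unclampedAllocatedBlocks() const
    {
        return maxNumBlocks - numFreeBlocks;
    }

    // After the first commit: the result is clamped at zero with std::max,
    // so stats consumers never see a negative usedNumBlocks.
    std::int64_t getNumAllocatedBlocks() const
    {
        return std::max<std::int64_t>(maxNumBlocks - numFreeBlocks, 0);
    }
};
```

With maxNumBlocks = 100 and numFreeBlocks = 103, the unclamped value is -3 and the clamped value is 0. The clamp hides the symptom but leaves the inflated free-block counter itself unchanged.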
📝 Walkthrough: A defensive programming fix to the KV cache manager that prevents negative block count calculations by clamping the result to a minimum value of 0 using std::max.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
/bot run --disable-fail-fast
PR_Github #37825 [ run ] triggered by Bot. Commit:
eopXD left a comment:
Sorry, revoking the approval. The fix should address the root cause, namely why getNumFreeBlocks() is greater than getMaxNumBlocks(), which is what produces the negative number. This is not a good fix for the problem.
… root cause

Add a warning log in WindowBlockManager::getNumFreeBlocks() when the primary free block count exceeds the total block count. This diagnostic helps identify the root cause of negative usedNumBlocks values in disaggregated/prefill mode.

Root cause analysis: after swapMemoryPoolBlockOffset() in the offload/onboard paths, getCacheLevel() returns the block's new physical location rather than its original queue membership. This desynchronizes the mNumFreeBlocksPerLevel counters in the eviction policy: blocks claimed from one level are released to the other level, inflating the primary free counter beyond the actual primary pool size. The existing verifyQueueIntegrity() check detects this exact condition.

The clamp in getNumAllocatedBlocks() (prior commit) prevents negative values from propagating to stats consumers while the accounting inconsistency is investigated further.

Refs NVIDIA#11879

Signed-off-by: Wojciech Wais <wojciech.wais@gmail.com>
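The counter desynchronization in this root cause analysis can be reproduced in a deliberately simplified model. Everything here is hypothetical: the EvictionCounters and Block types, the onboard() helper, and the two-level layout. The model tracks only one block of the swapped pair and is meant to show the failure mode, not to mirror the eviction policy's actual code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical two-level model of the eviction policy's per-level free
// counters (level 0 = primary, level 1 = secondary). Names are illustrative
// and do not match the real TensorRT-LLM classes.
enum CacheLevel : int
{
    kPrimary = 0,
    kSecondary = 1
};

struct Block
{
    CacheLevel level; // physical location, as a getCacheLevel()-style query would report
};

struct EvictionCounters
{
    std::array<std::int64_t, 2> numFreeBlocksPerLevel{};

    // Claiming a free block decrements the counter of the block's current level.
    void claim(Block const& block) { --numFreeBlocksPerLevel[block.level]; }

    // Releasing increments the counter of the block's current level. If the
    // block changed level between claim and release, this is the wrong counter.
    void release(Block const& block) { ++numFreeBlocksPerLevel[block.level]; }
};

// Models a pool-offset swap that onboards a block (secondary -> primary):
// the physical level changes, but nothing records which free queue the
// block was originally claimed from.
inline void onboard(Block& block) { block.level = kPrimary; }
```

Starting from 4 free blocks per level, claiming a secondary block, onboarding it, and releasing it leaves the counters at 5 primary and 3 secondary. The primary free count then exceeds a 4-block primary pool, which is the free > max condition behind the negative usedNumBlocks.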
PR_Github #37825 [ run ] completed with state
…nt exceeding max

Replace TLLM_LOG_WARNING with a TLLM_CHECK_WITH_INFO assertion when numFreeBlocks exceeds getMaxNumBlocks(), to catch block accounting inconsistencies in CI. Remove the now-unnecessary std::max clamp in getNumAllocatedBlocks(), since the assertion enforces the invariant.

Signed-off-by: Wojtek Marczenko <wojciech.marczenko@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Wojciech Wais <wojciech.wais@gmail.com>
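The move from clamping to asserting can be sketched as follows. BlockManagerSketch is a hypothetical class, and throwing std::logic_error stands in for TLLM_CHECK_WITH_INFO, which is assumed here to throw on failure; the message text is illustrative only.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>

// Hypothetical stand-in for the block manager. The throw below plays the
// role of TLLM_CHECK_WITH_INFO (assumed to throw on failure); this is an
// illustration, not the TensorRT-LLM implementation.
class BlockManagerSketch
{
public:
    BlockManagerSketch(std::int64_t maxNumBlocks, std::int64_t numFreeBlocks)
        : mMaxNumBlocks(maxNumBlocks)
        , mNumFreeBlocks(numFreeBlocks)
    {
    }

    std::int64_t getMaxNumBlocks() const { return mMaxNumBlocks; }

    // Fails loudly instead of letting an inflated free count escape.
    std::int64_t getNumFreeBlocks() const
    {
        if (mNumFreeBlocks > mMaxNumBlocks)
        {
            std::ostringstream msg;
            msg << "numFreeBlocks (" << mNumFreeBlocks << ") exceeds getMaxNumBlocks() (" << mMaxNumBlocks
                << ")";
            throw std::logic_error(msg.str());
        }
        return mNumFreeBlocks;
    }

    // No std::max clamp: the check above guarantees free <= max, so the
    // difference is never negative.
    std::int64_t getNumAllocatedBlocks() const { return getMaxNumBlocks() - getNumFreeBlocks(); }

private:
    std::int64_t mMaxNumBlocks;
    std::int64_t mNumFreeBlocks;
};
```

Under these assumptions, a manager with 100 blocks and 40 free reports 60 allocated, while one with 103 free throws instead of reporting -3 or silently clamping to 0. Failing loudly makes the accounting bug visible in CI rather than masking it in the stats.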
Force-pushed from 9f136d4 to d2bf822.
@wojciech-wais,
…e-stats

Signed-off-by: Wojciech Wais <12673503+wojciech-wais@users.noreply.github.com>
@karljang
/bot run
PR_Github #40459 [ run ] triggered by Bot. Commit:
PR_Github #40459 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #40487 [ run ] triggered by Bot. Commit:
PR_Github #40487 [ run ] completed with state
I've reviewed those test failures, and IMO they are not strictly related to my change.
/bot run --disable-fail-fast
PR_Github #41731 [ run ] triggered by Bot. Commit:
PR_Github #41731 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #41801 [ run ] triggered by Bot. Commit:
PR_Github #41801 [ run ] completed with state
@eopXD
@eopXD, could you review this? Thanks.
…tats (NVIDIA#11922)

Signed-off-by: Wojciech Wais <wojciech.wais@gmail.com>
Signed-off-by: Wojtek Marczenko <wojciech.marczenko@gmail.com>
Signed-off-by: Wojciech Wais <12673503+wojciech-wais@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
Signed-off-by: Doloxetine <youliyuan92@gmail.com>
In disaggregated serving (prefill mode), getNumFreeBlocks() can momentarily exceed getMaxNumBlocks(), causing getNumAllocatedBlocks() to return negative values. This propagates through KV cache stats as negative usedNumBlocks, which is invalid for metrics consumers.
Clamp the result of getNumAllocatedBlocks() to a minimum of zero using std::max to prevent negative block counts from being reported.
Fixes #11879
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.