[TRTLLM-12112][feat] Support v2 KV cache stats by yizhang-nv · Pull Request #13953 · NVIDIA/TensorRT-LLM

yizhang-nv · 2026-05-10T05:27:11Z

Description

Port V2 KV cache stats support onto the DeepSeek V4 feature branch.

This PR adds delta tracking for V2 KV cache stats and wires it into the existing PyExecutor stats path, so that LLM.get_stats() and server polling can surface kvCacheStats / kvCacheIterationStats for KV cache manager V2. It also adds V1/V2 alignment coverage for basic block counts, SWA windows, and stacked multi-window reuse scenarios.
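
For readers unfamiliar with the idea, delta tracking can be sketched roughly as follows: the cache manager keeps cumulative counters, and each stats poll reports the difference since the previous poll. This is only an illustration, not the code in this PR; `KvCacheCounters`, `DeltaTracker`, `take_delta`, and all field names are hypothetical and chosen for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class KvCacheCounters:
    """Hypothetical cumulative counters; field names are illustrative only."""
    alloc_total_blocks: int = 0
    alloc_new_blocks: int = 0
    reused_blocks: int = 0
    missed_blocks: int = 0


class DeltaTracker:
    """Turns monotonically increasing counters into per-iteration deltas."""

    def __init__(self) -> None:
        # Snapshot of the counters as of the previous poll (all zero at start).
        self._last = KvCacheCounters()

    def take_delta(self, current: KvCacheCounters) -> KvCacheCounters:
        # Subtract the previous snapshot field by field, then remember the
        # current values so the next poll only sees new activity.
        delta = KvCacheCounters(**{
            f.name: getattr(current, f.name) - getattr(self._last, f.name)
            for f in fields(KvCacheCounters)
        })
        self._last = current
        return delta


tracker = DeltaTracker()
first = tracker.take_delta(KvCacheCounters(10, 8, 2, 8))
second = tracker.take_delta(KvCacheCounters(16, 10, 6, 10))
print(first)
print(second)
```

In this sketch the first poll returns the full cumulative values and the second returns only what changed in between, which is the shape of data an iteration-level stats field would need.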

Test Coverage

python3 -m py_compile tests/integration/defs/kv_cache/test_kv_cache_iteration_stats_alignment.py
git diff --check -- tests/integration/defs/kv_cache/test_kv_cache_iteration_stats_alignment.py
/home/yizhan/.local/bin/pre-commit run --files tests/integration/defs/kv_cache/test_kv_cache_iteration_stats_alignment.py
B200: LLM_MODELS_ROOT=/scratch.trt_llm_data/llm-models pytest -q -s tests/integration/defs/kv_cache/test_kv_cache_iteration_stats_alignment.py
- 3 passed, 3 warnings in 153.30s
- log: /home/scratch.yizhan_sw_1/logs/2026-05-08/6u1g-0015/LLM_MODELS_ROOT__scratch_trt_llm_data_llm-models___01-35-50.stdout.log

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

lfr-0531 · 2026-05-18T13:30:45Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-18T13:38:22Z

PR_Github #48918 [ run ] triggered by Bot. Commit: 8cc9c6c

tensorrt-cicd · 2026-05-18T17:42:47Z

PR_Github #48918 [ run ] completed with state SUCCESS. Commit: 8cc9c6c
/LLM/main/L0_MergeRequest_PR pipeline #38667 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

yizhang-nv · 2026-05-19T07:11:35Z

/bot run

tensorrt-cicd · 2026-05-19T07:18:03Z

PR_Github #49133 [ run ] triggered by Bot. Commit: 107553d

yizhang-nv · 2026-05-19T08:11:03Z

/bot run

tensorrt-cicd · 2026-05-19T08:19:08Z

PR_Github #49147 [ run ] triggered by Bot. Commit: bd74724

tensorrt-cicd · 2026-05-19T08:19:12Z

PR_Github #49133 [ run ] completed with state ABORTED. Commit: 107553d

tensorrt-cicd · 2026-05-19T11:26:54Z

PR_Github #49147 [ run ] completed with state SUCCESS. Commit: bd74724
/LLM/main/L0_MergeRequest_PR pipeline #38832 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

nvpohanh · 2026-05-19T11:56:06Z

@lowsfer could you review this? thanks

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

yizhang-nv · 2026-05-20T02:10:38Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-20T02:16:54Z

PR_Github #49308 [ run ] triggered by Bot. Commit: 9ec657f

tensorrt-cicd · 2026-05-20T08:25:28Z

PR_Github #49308 [ run ] completed with state SUCCESS. Commit: 9ec657f
/LLM/main/L0_MergeRequest_PR pipeline #38970 completed with status: 'SUCCESS'

CI Report

Filtered out disaggregation and AutoDeploy-only changes for user/fanrongl/dsv4_model. Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

github-actions Bot assigned yizhang-nv May 10, 2026

yizhang-nv force-pushed the feat/kv-cache-v2-stats-port branch 7 times, most recently from c022723 to c02889d Compare May 11, 2026 16:37

yizhang-nv marked this pull request as ready for review May 12, 2026 05:36

yizhang-nv requested review from a team as code owners May 12, 2026 05:36

yizhang-nv requested review from dongxuy04, hchings and liji-nv and removed request for a team May 12, 2026 05:36

yizhang-nv force-pushed the feat/kv-cache-v2-stats-port branch from c02889d to 4e40e05 Compare May 13, 2026 02:25

yizhang-nv requested a review from lowsfer May 13, 2026 02:33

yizhang-nv force-pushed the feat/kv-cache-v2-stats-port branch 8 times, most recently from c705ef1 to 5273874 Compare May 14, 2026 06:15

lfr-0531 force-pushed the feat/deepseek_v4 branch from 0a93d10 to 118e7a5 Compare May 14, 2026 07:44

lowsfer reviewed May 18, 2026

Comment thread tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache.py Outdated

yizhang-nv changed the title ~~[None][fix] Support v2 KV cache stats~~ [TRTLLM-12112][fix] Support v2 KV cache stats May 19, 2026

yizhang-nv changed the title ~~[TRTLLM-12112][fix] Support v2 KV cache stats~~ [TRTLLM-12112][feaet] Support v2 KV cache stats May 19, 2026

yizhang-nv changed the title ~~[TRTLLM-12112][feaet] Support v2 KV cache stats~~ [TRTLLM-12112][feat] Support v2 KV cache stats May 19, 2026

yizhang-nv added 10 commits May 19, 2026 19:04

[None][fix] Support KV cache manager V2 stats

cd0de28

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

[None][fix] Report V2 KV cache transfer metrics

15c8dac

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

test: clarify V2 partial leaf reuse stats

a749d74

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

test: add V2 transfer metrics to H100 L0

b62e555

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

test: cover V2 context stats rollback

685e8fe

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

[None][fix] Refactor KV cache V2 pending stats

d7758a3

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

[None][fix] Fold reuse stats into setup

f0dc5e5

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

fix: materialize ctx gpu time before response

ed66a07

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

[None][fix] Refactor KV cache V2 stats views

385b1cb

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

[None][fix] Log KV cache lifecycle IDs

9ec657f

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

yizhang-nv force-pushed the feat/kv-cache-v2-stats-port branch from bd74724 to 9ec657f Compare May 20, 2026 02:10

lfr-0531 merged commit 14c51a4 into NVIDIA:feat/deepseek_v4 May 20, 2026
6 checks passed

6 participants