[TRTLLM-12112][feat] Support v2 KV cache stats #13953
/bot run --disable-fail-fast
PR_Github #48918 [ run ] triggered by Bot. Commit:
PR_Github #48918 [ run ] completed with state
/bot run
PR_Github #49133 [ run ] triggered by Bot. Commit:
/bot run
PR_Github #49147 [ run ] triggered by Bot. Commit:
PR_Github #49133 [ run ] completed with state
PR_Github #49147 [ run ] completed with state
@lowsfer could you review this? thanks
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
/bot run --disable-fail-fast
PR_Github #49308 [ run ] triggered by Bot. Commit:
PR_Github #49308 [ run ] completed with state
Filtered out disaggregation and AutoDeploy-only changes for user/fanrongl/dsv4_model.
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
@coderabbitai summary
Description
Port V2 KV cache stats support onto the DeepSeek V4 feature branch.
This PR adds V2 KV cache stats delta tracking and wires it into the existing PyExecutor stats path so LLM.get_stats() and server polling can surface kvCacheStats / kvCacheIterationStats for KV cache manager V2. It also adds V1/V2 alignment coverage for basic block counts, SWA windows, and stacked multi-window reuse scenarios.
Test Coverage
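As a rough illustration of the delta tracking and V1/V2 alignment idea described above, here is a minimal, self-contained sketch. It does not use the TensorRT-LLM API and is not the code in this PR: the class names, function name, and counter names below are hypothetical, chosen only to show how cumulative counters can be turned into per-iteration deltas and how two stats dictionaries can be compared field by field.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CumulativeCounters:
    # Hypothetical cumulative counters; the names are illustrative only.
    alloc_total_blocks: int = 0
    alloc_new_blocks: int = 0
    reused_blocks: int = 0
    missed_blocks: int = 0


class DeltaTracker:
    """Turns cumulative counters into per-iteration deltas."""

    def __init__(self) -> None:
        # Snapshot taken at the previous call; starts at all zeros.
        self._last = CumulativeCounters()

    def take_delta(self, current: CumulativeCounters) -> dict:
        # Subtract the previous snapshot from the current one, field by field.
        delta = {
            f.name: getattr(current, f.name) - getattr(self._last, f.name)
            for f in fields(current)
        }
        # The dataclass is frozen, so keeping a reference is safe.
        self._last = current
        return delta


def misaligned_fields(v1: dict, v2: dict) -> list:
    """Return the sorted names of fields whose values differ."""
    return sorted(k for k in v1.keys() | v2.keys() if v1.get(k) != v2.get(k))


tracker = DeltaTracker()
first = tracker.take_delta(CumulativeCounters(10, 6, 4, 6))
second = tracker.take_delta(CumulativeCounters(15, 8, 7, 8))
print(second)
```

In this sketch, an alignment check would compute deltas from two independent sources for the same workload and require `misaligned_fields` to return an empty list.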
python3 -m py_compile tests/integration/defs/kv_cache/test_kv_cache_iteration_stats_alignment.py
git diff --check -- tests/integration/defs/kv_cache/test_kv_cache_iteration_stats_alignment.py
/home/yizhan/.local/bin/pre-commit run --files tests/integration/defs/kv_cache/test_kv_cache_iteration_stats_alignment.py
LLM_MODELS_ROOT=/scratch.trt_llm_data/llm-models pytest -q -s tests/integration/defs/kv_cache/test_kv_cache_iteration_stats_alignment.py
3 passed, 3 warnings in 153.30s
/home/scratch.yizhan_sw_1/logs/2026-05-08/6u1g-0015/LLM_MODELS_ROOT__scratch_trt_llm_data_llm-models___01-35-50.stdout.log
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.