
[None][feat] add DSV4 KV cache pool ratio config #14623

Merged
lfr-0531 merged 2 commits into NVIDIA:feat/deepseek_v4 from jiaganc:dsv4-pool-ratio-knob
May 28, 2026

Conversation

@jiaganc
Collaborator

@jiaganc jiaganc commented May 27, 2026

@coderabbitai summary

Description

Add DeepSeek-V4 KV cache configuration knobs for KV cache manager v2:

  • kv_cache_config.pool_ratio: user-provided initial pool ratios. When set for DSV4, this is passed to KV cache manager v2 as initial_pool_ratio and bypasses typical-step and constraint-based initial sizing.
  • kv_cache_config.avg_seq_len: average sequence length used by DSV4 typical-step construction. When unset, DSV4 continues to use max_seq_len.
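
The interaction between the two knobs can be sketched roughly as follows. This is only an illustration of the behavior described above, not the code in this PR: `KvCacheKnobs`, `resolve_typical_seq_len`, and `choose_initial_sizing` are hypothetical names, and the returned dictionary is an invented stand-in for whatever KV cache manager v2 actually receives. The sketch also assumes that `avg_seq_len` must be positive and must not exceed `max_seq_len`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class KvCacheKnobs:
    """Hypothetical stand-in for the two new kv_cache_config fields."""
    pool_ratio: Optional[List[float]] = None
    avg_seq_len: Optional[int] = None


def resolve_typical_seq_len(knobs: KvCacheKnobs, max_seq_len: int) -> int:
    """Pick the sequence length used for typical-step construction."""
    if knobs.avg_seq_len is None:
        # Unset: keep the previous behavior and size for max_seq_len.
        return max_seq_len
    if knobs.avg_seq_len <= 0:
        # Assumption: non-positive averages are rejected.
        raise ValueError("avg_seq_len must be positive")
    if knobs.avg_seq_len > max_seq_len:
        # Assumption: an average above the maximum is rejected.
        raise ValueError("avg_seq_len must not exceed max_seq_len")
    return knobs.avg_seq_len


def choose_initial_sizing(knobs: KvCacheKnobs, max_seq_len: int) -> Dict[str, Any]:
    """Describe which initial sizing path would be taken (illustrative only)."""
    if knobs.pool_ratio is not None:
        # Explicit ratios win: typical-step and constraint-based sizing are skipped.
        return {
            "mode": "explicit_pool_ratio",
            "initial_pool_ratio": list(knobs.pool_ratio),
        }
    return {
        "mode": "typical_step",
        "seq_len": resolve_typical_seq_len(knobs, max_seq_len),
    }
```

With neither field set, the sketch falls back to typical-step sizing at `max_seq_len`, which mirrors the "continues to use max_seq_len" behavior stated above.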

The change also validates pool ratios, keeps the new LLM API fields out of the legacy pybind KV cache config, and adds a guard for partially constructed KV cache managers when constructor validation fails.
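
For readers unfamiliar with the partially-constructed-object problem, here is a minimal sketch of the pattern. It is not the implementation in this PR: `ToyKvCacheManager` and `validate_pool_ratio` are hypothetical, and the validation rule (non-empty, finite, strictly positive ratios) is an assumption, since the exact checks are not spelled out in this description. In Python, `__del__` still runs on an object whose `__init__` raised, so cleanup code has to tolerate attributes that were never assigned.

```python
import math
from typing import List, Optional, Sequence


def validate_pool_ratio(pool_ratio: Optional[Sequence[float]]) -> Optional[List[float]]:
    """Assumed rule set: ratios must be non-empty, finite, and strictly positive."""
    if pool_ratio is None:
        return None
    ratios = [float(r) for r in pool_ratio]
    if not ratios:
        raise ValueError("pool_ratio must not be empty")
    for r in ratios:
        if not math.isfinite(r) or r <= 0:
            raise ValueError(f"pool_ratio entries must be finite and > 0, got {r}")
    return ratios


class ToyKvCacheManager:
    """Hypothetical manager used only to show the cleanup guard."""

    def __init__(self, pool_ratio: Optional[Sequence[float]] = None):
        # Validation runs first, so a bad ratio raises before _pools exists.
        self._ratios = validate_pool_ratio(pool_ratio)
        self._pools: Optional[List[str]] = [
            f"pool-{i}" for i in range(len(self._ratios or [1.0]))
        ]

    def shutdown(self) -> None:
        # Guard: after a failed constructor, _pools was never assigned.
        pools = getattr(self, "_pools", None)
        if pools is None:
            return
        pools.clear()
        self._pools = None

    def __del__(self):
        # Still invoked for objects whose __init__ raised.
        self.shutdown()
```

Without the `getattr` guard, a rejected `pool_ratio` would surface a second, unrelated `AttributeError` from `__del__` on top of the real validation error.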

Test Coverage

  • Added LLM API validation coverage for KvCacheConfig.pool_ratio and KvCacheConfig.avg_seq_len.
  • Added KV cache manager v2 coverage for explicit initial_pool_ratio overriding typical-step and constraints.
  • Added DSV4 cache manager coverage for pool_ratio, avg_seq_len, and avg_seq_len > max_seq_len validation.
  • Clean GB200 build on gb-nvl-081-compute02 with CUDA_ARCHITECTURES=100-real.
  • Editable install in dsv4-pool-ratio-knob-jenkins-aarch64-jiaganc; TensorRT-LLM import reports 1.3.0rc15.
  • Focused pytest run: 10 passed.
  • git diff --check HEAD passed.
  • python3 -m py_compile passed for changed Python files and tests.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@jiaganc jiaganc requested review from a team as code owners May 27, 2026 07:56
@jiaganc jiaganc requested review from JunyiXu-nv and QiJune and removed request for a team May 27, 2026 07:56
@jiaganc
Collaborator Author

jiaganc commented May 27, 2026

/bot run

Comment thread tensorrt_llm/runtime/kv_cache_manager_v2/_core/_kv_cache_manager.py Outdated
@tensorrt-cicd
Collaborator

PR_Github #50502 [ run ] triggered by Bot. Commit: 4499c60 Link to invocation

Comment thread tensorrt_llm/llmapi/llm_args.py Outdated
Collaborator

@lancelly lancelly left a comment


LGTM, thanks!

Comment thread tensorrt_llm/llmapi/llm_args.py
@jiaganc jiaganc force-pushed the dsv4-pool-ratio-knob branch from 4499c60 to 5bc4a09 Compare May 27, 2026 09:34
@tensorrt-cicd
Collaborator

PR_Github #50502 [ run ] completed with state SUCCESS. Commit: 4499c60
/LLM/main/L0_MergeRequest_PR pipeline #40010 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
@jiaganc jiaganc force-pushed the dsv4-pool-ratio-knob branch from 5bc4a09 to 20cdeae Compare May 27, 2026 09:46
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
@jiaganc
Collaborator Author

jiaganc commented May 27, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #50526 [ run ] triggered by Bot. Commit: ef8d286 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50526 [ run ] completed with state SUCCESS. Commit: ef8d286
/LLM/main/L0_MergeRequest_PR pipeline #40033 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@peihu-nv
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #50591 [ run ] triggered by Bot. Commit: ef8d286 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #50591 [ run ] completed with state SUCCESS. Commit: ef8d286
/LLM/main/L0_MergeRequest_PR pipeline #40089 completed with status: 'SUCCESS'

CI Report

Link to invocation

@lfr-0531 lfr-0531 merged commit 69e7acc into NVIDIA:feat/deepseek_v4 May 28, 2026
6 checks passed
@jiaganc jiaganc deleted the dsv4-pool-ratio-knob branch May 28, 2026 02:59

5 participants