[None][fix] Handle DeepSeek-V4 fused A scale shape by lfr-0531 · Pull Request #14149 · NVIDIA/TensorRT-LLM

lfr-0531 · 2026-05-14T16:37:54Z

Description

Fix DeepSeek-V4 NVFP4 expert checkpoint loading when the checkpoint's fused A projection scale has the FP8 block-scale shape (16, 56) but the module was initialized with a stale or mismatched (16, 16) weight_scale.

The loader now preserves the existing oversized slice-copy behavior for padded kv_a_proj_with_mqa modules, and rebuilds the weight_scale parameter plus tensor metadata only when the checkpoint scale matches the expected FP8 block-scale shape for the fused A projection.
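The decision described above can be sketched in plain Python on shapes alone. This is a hypothetical illustration, not the code in modeling_deepseekv4.py: the names ScaleLoadAction, expected_block_scale_shape, and plan_scale_load are invented here, and the 128x128 block size and the (2048, 7168) weight shape are assumptions chosen only because they reproduce the (16, 56) scale shape mentioned in the description.

```python
from enum import Enum


class ScaleLoadAction(Enum):
    """Hypothetical outcomes for loading a checkpoint weight_scale."""
    COPY = "copy"              # shapes already agree
    SLICE_COPY = "slice_copy"  # module buffer is padded; fill its leading slice
    REBUILD = "rebuild"        # replace the module parameter and its metadata
    ERROR = "error"            # unexpected shape; refuse to guess


def expected_block_scale_shape(weight_shape, block=128):
    """Block-scale shape for a 2-D weight, assuming block x block tiles."""
    rows, cols = weight_shape
    # Ceiling division without floating point.
    return (-(-rows // block), -(-cols // block))


def plan_scale_load(module_shape, ckpt_shape, expected_shape):
    """Pick an action from the module, checkpoint, and expected scale shapes."""
    if len(module_shape) != len(ckpt_shape):
        return ScaleLoadAction.ERROR
    if module_shape == ckpt_shape:
        return ScaleLoadAction.COPY
    # Padded module: every dimension is at least as large as the checkpoint's.
    if all(m >= c for m, c in zip(module_shape, ckpt_shape)):
        return ScaleLoadAction.SLICE_COPY
    # Rebuild only when the checkpoint carries the expected block-scale shape.
    if ckpt_shape == expected_shape:
        return ScaleLoadAction.REBUILD
    return ScaleLoadAction.ERROR


expected = expected_block_scale_shape((2048, 7168))  # assumed weight shape
action = plan_scale_load((16, 16), (16, 56), expected)
print(expected, action.name)
```

With the shapes from this PR, a (16, 16) module scale and a (16, 56) checkpoint scale select the rebuild path, while a padded module scale such as a hypothetical (16, 64) would still take the slice-copy path.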

Test Coverage

Built TensorRT-LLM wheel: tensorrt_llm-1.3.0rc15-cp312-cp312-linux_x86_64.whl.
Installed the built wheel, then installed the current source in editable mode for Python-layer validation.
Ran pre-commit run --files tensorrt_llm/_torch/models/modeling_deepseekv4.py tests/unittest/_torch/modeling/test_modeling_deepseekv4.py
Ran PYTHONNOUSERSITE=1 pytest tests/unittest/_torch/modeling/test_modeling_deepseekv4.py -k "fused_a_weight_scale or routed_moe_quant_config or weight_remap" -q (8 passed)
Manually validated the NVFP4 checkpoint at /home/scratch.fanrongl_coreai/models/deepseek_v4/pro-nvfp4-experts-v3.5: checkpoint fused scale is (16, 56) and the loader helper rebuilds module weight_scale from (16, 16) to (16, 56).
Full GPU model run was not started because all B300 GPUs on the shared machine were already occupied with ~144-146 GiB used per GPU.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

lfr-0531 · 2026-05-14T16:38:38Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-14T16:44:46Z

PR_Github #48405 [ run ] triggered by Bot. Commit: 42ad633 Link to invocation

tensorrt-cicd · 2026-05-14T19:53:22Z

PR_Github #48405 [ run ] completed with state SUCCESS. Commit: 42ad633
/LLM/main/L0_MergeRequest_PR pipeline #38208 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lfr-0531 · 2026-05-15T01:39:16Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T01:46:47Z

PR_Github #48478 [ run ] triggered by Bot. Commit: 42ad633 Link to invocation

tensorrt-cicd · 2026-05-15T03:14:58Z

PR_Github #48478 [ run ] completed with state SUCCESS. Commit: 42ad633
/LLM/main/L0_MergeRequest_PR pipeline #38274 completed with status: 'SUCCESS'

CI Report

Link to invocation

lfr-0531 · 2026-05-15T03:40:00Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T03:47:41Z

PR_Github #48506 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

tensorrt-cicd · 2026-05-15T08:38:45Z

PR_Github #48506 [ run ] completed with state SUCCESS. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38304 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lfr-0531 · 2026-05-15T09:16:25Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-15T09:24:00Z

PR_Github #48575 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

tensorrt-cicd · 2026-05-15T12:08:23Z

PR_Github #48575 [ run ] completed with state SUCCESS. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38362 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lfr-0531 · 2026-05-17T05:05:19Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-17T05:10:46Z

PR_Github #48733 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

tensorrt-cicd · 2026-05-17T06:45:00Z

PR_Github #48733 [ run ] completed with state ABORTED. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38502 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

lfr-0531 · 2026-05-17T16:00:02Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-17T16:06:44Z

PR_Github #48771 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

tensorrt-cicd · 2026-05-17T16:25:11Z

PR_Github #48771 [ run ] completed with state FAILURE. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38538 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

lfr-0531 · 2026-05-18T04:18:52Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-18T04:24:47Z

PR_Github #48831 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

tensorrt-cicd · 2026-05-18T05:21:38Z

PR_Github #48831 [ run ] completed with state SUCCESS. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38591 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com> (cherry picked from commit 5af1511) Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

[None][fix] handle DeepSeek-V4 fused A scale shape

42ad633

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

lfr-0531 requested a review from a team as a code owner May 14, 2026 16:37

lfr-0531 requested review from symphonylyh and removed request for a team May 14, 2026 16:37

github-actions Bot assigned lfr-0531 May 14, 2026

lfr-0531 requested review from Tracin and removed request for symphonylyh May 14, 2026 16:38

Merge remote-tracking branch 'github/feat/deepseek_v4' into user/fanr…

13bf121

…ongl/fix-dsv4-nvfp4-load Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com> # Conflicts: # tensorrt_llm/_torch/models/modeling_deepseekv4.py

lfr-0531 merged commit 5af1511 into NVIDIA:feat/deepseek_v4 May 18, 2026
6 checks passed

lfr-0531 added the deepseek-v4 label May 19, 2026

2 participants