[None][fix] Handle DeepSeek-V4 fused A scale shape #14149

Merged
lfr-0531 merged 2 commits into NVIDIA:feat/deepseek_v4 from lfr-0531:user/fanrongl/fix-dsv4-nvfp4-load
May 18, 2026

Conversation

@lfr-0531
Collaborator

@coderabbitai summary

Description

Fix DeepSeek-V4 NVFP4 expert checkpoint loading when the checkpoint's fused A projection scale has the FP8 block-scale shape (16, 56) but the module was initialized with a stale or mismatched (16, 16) weight_scale.

The loader now preserves the existing oversized slice-copy behavior for padded kv_a_proj_with_mqa modules, and rebuilds the weight_scale parameter plus tensor metadata only when the checkpoint scale matches the expected FP8 block-scale shape for the fused A projection.
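
The decision described above can be sketched in plain Python, working on shapes only. This is an illustration under stated assumptions, not the code in modeling_deepseekv4.py: the function names, the 128-element block size, the 2048 x 7168 weight dimensions, and the order of the checks are all hypothetical and chosen only so that the expected scale shape comes out as (16, 56).

```python
import math


def fp8_block_scale_shape(out_features, in_features, block=128):
    """Shape of a per-block scale tensor for a 2D weight.

    Assumes square blocks of `block` elements per side (an assumption,
    not confirmed by the PR).
    """
    return (math.ceil(out_features / block), math.ceil(in_features / block))


def choose_scale_load_action(module_shape, ckpt_shape, expected_shape):
    """Pick how to load a checkpoint scale into a module's weight_scale.

    Hypothetical helper. Returns one of:
      "copy"       - shapes already agree, copy directly
      "slice_copy" - module parameter is padded (oversized), copy into a slice
      "rebuild"    - checkpoint has the expected block-scale shape, so the
                     module parameter is stale and must be re-created
    """
    if ckpt_shape == module_shape:
        return "copy"
    # Padded module: every module dimension is at least as large as the
    # checkpoint's, so the checkpoint fits into a leading slice.
    if len(module_shape) == len(ckpt_shape) and all(
        m >= c for m, c in zip(module_shape, ckpt_shape)
    ):
        return "slice_copy"
    # Rebuild only when the checkpoint matches the expected shape exactly.
    if ckpt_shape == expected_shape:
        return "rebuild"
    raise ValueError(
        f"unexpected scale shape {ckpt_shape} for module shape {module_shape}"
    )


# Hypothetical weight dimensions that yield the (16, 56) scale from the PR.
expected = fp8_block_scale_shape(2048, 7168)
action = choose_scale_load_action((16, 16), (16, 56), expected)
print(expected, action)
```

Keeping the rebuild branch gated on an exact match with the expected shape means that any other mismatch still raises instead of silently replacing the parameter.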

Test Coverage

  • Built TensorRT-LLM wheel: tensorrt_llm-1.3.0rc15-cp312-cp312-linux_x86_64.whl.
  • Installed the built wheel, then installed the current source in editable mode for Python-layer validation.
  • pre-commit run --files tensorrt_llm/_torch/models/modeling_deepseekv4.py tests/unittest/_torch/modeling/test_modeling_deepseekv4.py
  • PYTHONNOUSERSITE=1 pytest tests/unittest/_torch/modeling/test_modeling_deepseekv4.py -k "fused_a_weight_scale or routed_moe_quant_config or weight_remap" -q (8 passed)
  • Manually validated the NVFP4 checkpoint at /home/scratch.fanrongl_coreai/models/deepseek_v4/pro-nvfp4-experts-v3.5: checkpoint fused scale is (16, 56) and the loader helper rebuilds module weight_scale from (16, 16) to (16, 56).
  • Full GPU model run was not started because all B300 GPUs on the shared machine were already occupied with ~144-146 GiB used per GPU.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
@lfr-0531 lfr-0531 requested a review from a team as a code owner May 14, 2026 16:37
@lfr-0531 lfr-0531 requested review from symphonylyh and removed request for a team May 14, 2026 16:37
@lfr-0531 lfr-0531 requested review from Tracin and removed request for symphonylyh May 14, 2026 16:38
@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48405 [ run ] triggered by Bot. Commit: 42ad633 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48405 [ run ] completed with state SUCCESS. Commit: 42ad633
/LLM/main/L0_MergeRequest_PR pipeline #38208 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48478 [ run ] triggered by Bot. Commit: 42ad633 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48478 [ run ] completed with state SUCCESS. Commit: 42ad633
/LLM/main/L0_MergeRequest_PR pipeline #38274 completed with status: 'SUCCESS'

CI Report

Link to invocation

…ongl/fix-dsv4-nvfp4-load

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

# Conflicts:
#	tensorrt_llm/_torch/models/modeling_deepseekv4.py
@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48506 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48506 [ run ] completed with state SUCCESS. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38304 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48575 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48575 [ run ] completed with state SUCCESS. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38362 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48733 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48733 [ run ] completed with state ABORTED. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38502 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48771 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48771 [ run ] completed with state FAILURE. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38538 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@lfr-0531
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #48831 [ run ] triggered by Bot. Commit: 13bf121 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48831 [ run ] completed with state SUCCESS. Commit: 13bf121
/LLM/main/L0_MergeRequest_PR pipeline #38591 completed with status: 'SUCCESS'

CI Report

Link to invocation

@lfr-0531 lfr-0531 merged commit 5af1511 into NVIDIA:feat/deepseek_v4 May 18, 2026
6 checks passed
lfr-0531 added a commit to lfr-0531/TensorRT-LLM that referenced this pull request May 29, 2026
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 5af1511)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>


2 participants