[None][fix] Handle DeepSeek-V4 fused A scale shape #14149
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

/bot run --disable-fail-fast
PR_Github #48405 [ run ] triggered by Bot. Commit:
PR_Github #48405 [ run ] completed with state

/bot run --disable-fail-fast
PR_Github #48478 [ run ] triggered by Bot. Commit:
PR_Github #48478 [ run ] completed with state

…ongl/fix-dsv4-nvfp4-load
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
# Conflicts:
# tensorrt_llm/_torch/models/modeling_deepseekv4.py

/bot run --disable-fail-fast
PR_Github #48506 [ run ] triggered by Bot. Commit:
PR_Github #48506 [ run ] completed with state

/bot run --disable-fail-fast
PR_Github #48575 [ run ] triggered by Bot. Commit:
PR_Github #48575 [ run ] completed with state

/bot run --disable-fail-fast
PR_Github #48733 [ run ] triggered by Bot. Commit:
PR_Github #48733 [ run ] completed with state

/bot run --disable-fail-fast
PR_Github #48771 [ run ] triggered by Bot. Commit:
PR_Github #48771 [ run ] completed with state

/bot run --disable-fail-fast
PR_Github #48831 [ run ] triggered by Bot. Commit:
PR_Github #48831 [ run ] completed with state

Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <lfr-0531@users.noreply.github.com>
(cherry picked from commit 5af1511)
Signed-off-by: Fanrong Li <lfr-0531@users.noreply.github.com>

@coderabbitai summary
Description
Fix DeepSeek-V4 NVFP4 expert checkpoint loading when the checkpoint fused A projection scale has the FP8 block-scale shape `(16, 56)`, while the module can be initialized with a stale/mismatched `(16, 16)` `weight_scale`.

The loader now preserves the existing oversized slice-copy behavior for padded `kv_a_proj_with_mqa` modules, and rebuilds the `weight_scale` parameter plus tensor metadata only when the checkpoint scale matches the expected FP8 block-scale shape for the fused A projection.

Test Coverage
tensorrt_llm-1.3.0rc15-cp312-cp312-linux_x86_64.whl.
`pre-commit run --files tensorrt_llm/_torch/models/modeling_deepseekv4.py tests/unittest/_torch/modeling/test_modeling_deepseekv4.py`
`PYTHONNOUSERSITE=1 pytest tests/unittest/_torch/modeling/test_modeling_deepseekv4.py -k "fused_a_weight_scale or routed_moe_quant_config or weight_remap" -q` (8 passed)
`/home/scratch.fanrongl_coreai/models/deepseek_v4/pro-nvfp4-experts-v3.5`: checkpoint fused scale is `(16, 56)` and the loader helper rebuilds module `weight_scale` from `(16, 16)` to `(16, 56)`.

PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.
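
For readers who want a concrete picture of the shape handling summarized in the Description, here is a minimal sketch in plain Python that works on shape tuples only. It is not the code from `modeling_deepseekv4.py`. The function names (`expected_block_scale_shape`, `plan_scale_load`), the 128x128 block size, the order of the checks, and the 2048 x 7168 weight dimensions are all assumptions, chosen only so the arithmetic reproduces the `(16, 56)` and `(16, 16)` shapes quoted above.

```python
def expected_block_scale_shape(out_features, in_features, block=128):
    """Shape of a per-block scale tensor, assuming block x block tiles (hypothetical helper)."""
    rows = (out_features + block - 1) // block  # ceiling division
    cols = (in_features + block - 1) // block
    return (rows, cols)


def plan_scale_load(module_shape, ckpt_shape, expected_shape):
    """Decide how a checkpoint scale would be loaded into a module (hypothetical helper).

    Returns one of "copy", "slice_copy", or "rebuild". The ordering of the
    checks is an assumption made for this sketch.
    """
    if ckpt_shape == module_shape:
        # Shapes already agree: plain copy.
        return "copy"
    same_rank = len(module_shape) == len(ckpt_shape)
    if same_rank and all(m >= c for m, c in zip(module_shape, ckpt_shape)):
        # Module tensor is padded (oversized): copy into its leading slice.
        return "slice_copy"
    if ckpt_shape == expected_shape:
        # Module holds a stale shape, checkpoint has the expected
        # block-scale shape: rebuild the parameter with the checkpoint shape.
        return "rebuild"
    raise ValueError(
        f"unexpected scale shape {ckpt_shape} for module shape {module_shape}"
    )


# Illustrative dimensions only (not taken from the model config).
expected = expected_block_scale_shape(2048, 7168)
stale_case = plan_scale_load((16, 16), (16, 56), expected)
padded_case = plan_scale_load((32, 56), (16, 56), expected)
```

Under these assumptions, the stale `(16, 16)` module paired with a `(16, 56)` checkpoint scale takes the rebuild path, while a module whose scale is larger than the checkpoint's in every dimension keeps the slice-copy path.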