
[https://nvbugs/5973536][fix] Add NVFP4+FP8KV+MTP accuracy specs for DeepSeek-V3.2-Exp#12269

Merged
yizhang-nv merged 1 commit into NVIDIA:main from yizhang-nv:fix/nvfp4-fp8kv-mtp-accuracy-spec
Mar 18, 2026

Conversation

@yizhang-nv
Member

@yizhang-nv yizhang-nv commented Mar 17, 2026

Summary by CodeRabbit

Release Notes

  • Tests
    • Added new variant configurations for DeepSeek-V3.2-Exp model with updated parameter settings
    • Updated accuracy reference baselines for GSM8K (95.6) and MMLU (87.2) benchmarks with new model variants

Description

Add missing accuracy reference specs for the combination of NVFP4 quantization, FP8 KV cache, and MTP speculative decoding for the DeepSeek-V3.2-Exp model in both mmlu.yaml and gsm8k.yaml.

Without these entries, the test case TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked] fails with:

ValueError: Not registered specs: {'dtype': 'auto', 'quant_algo': <QuantAlgo.NVFP4: 'NVFP4'>, 'kv_cache_quant_algo': <QuantAlgo.FP8: 'FP8'>, 'spec_dec_algo': 'MTP', 'extra_acc_spec': None}

The accuracy thresholds are set to match the existing NVFP4+MTP specs (without FP8 KV), consistent with other models (e.g., DeepSeek-V3-Lite) where FP8 KV cache does not significantly affect accuracy.
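The failure mode described above can be sketched as a lookup over registered spec combinations. This is a minimal illustration only, not the TensorRT-LLM harness: the field names come from the error message, while `SPEC_DEFAULTS`, `normalize`, `find_reference`, and the in-memory entry layout are hypothetical stand-ins for whatever the reference YAML files and their loader actually use.

```python
from typing import Any, Dict, List

# Assumed defaults for fields an entry leaves out (taken from the error message).
SPEC_DEFAULTS: Dict[str, Any] = {
    "dtype": "auto",
    "quant_algo": None,
    "kv_cache_quant_algo": None,
    "spec_dec_algo": None,
    "extra_acc_spec": None,
}

# Hypothetical stand-in for the DeepSeek-V3.2-Exp entries in gsm8k.yaml.
GSM8K_REFERENCES: List[Dict[str, Any]] = [
    # Existing NVFP4+MTP entry (no FP8 KV cache).
    {"quant_algo": "NVFP4", "spec_dec_algo": "MTP", "accuracy": 95.6},
    # Entry of the kind this PR adds: same accuracy, plus FP8 KV cache.
    {"quant_algo": "NVFP4", "kv_cache_quant_algo": "FP8",
     "spec_dec_algo": "MTP", "accuracy": 95.6},
]


def normalize(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the spec fields, filling in defaults for missing ones."""
    return {key: spec.get(key, default) for key, default in SPEC_DEFAULTS.items()}


def find_reference(entries: List[Dict[str, Any]], **spec: Any) -> float:
    """Return the accuracy of the entry whose spec matches exactly."""
    wanted = normalize(spec)
    for entry in entries:
        if normalize(entry) == wanted:
            return entry["accuracy"]
    # No exact match: the combination was never registered.
    raise ValueError(f"Not registered specs: {wanted}")
```

With only the first entry registered, a request for `quant_algo="NVFP4", kv_cache_quant_algo="FP8", spec_dec_algo="MTP"` raises `ValueError`, because the match is exact on every field. Once the second entry is present, the same request returns the same threshold as the NVFP4+MTP entry.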

Test Coverage

  • TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked]

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@yizhang-nv yizhang-nv requested a review from a team as a code owner March 17, 2026 03:11
@yizhang-nv
Member Author

/bot run --only-multi-gpu-test --disable-fail-fast

@coderabbitai
Contributor

coderabbitai bot commented Mar 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f0b9dc16-2cf7-4643-95f3-514f308f82f6

📥 Commits

Reviewing files that changed from the base of the PR and between 20fc52c and 3c9d8a3.

📒 Files selected for processing (2)
  • tests/integration/defs/accuracy/references/gsm8k.yaml
  • tests/integration/defs/accuracy/references/mmlu.yaml

📝 Walkthrough

Adds new variant entries for the DeepSeek-V3.2-Exp model in accuracy reference files, configuring FP8 kv_cache_quant_algo and MTP spec_dec_algo while preserving existing accuracy scores. No modifications to other models or entries.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Accuracy Reference Variants: tests/integration/defs/accuracy/references/gsm8k.yaml, tests/integration/defs/accuracy/references/mmlu.yaml | Adds new variant entries for DeepSeek-V3.2-Exp with FP8 kv_cache_quant_algo and MTP spec_dec_algo, maintaining existing accuracy metrics (95.6 for gsm8k, 87.2 for mmlu). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the primary change: adding NVFP4+FP8KV+MTP accuracy specs for DeepSeek-V3.2-Exp. It follows the template format and succinctly describes the main modification. |
| Description check | ✅ Passed | The description provides a clear explanation of what was changed and why, identifies the failing test case, explains the rationale for accuracy thresholds, and includes test coverage and a completed checklist. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@tensorrt-cicd
Collaborator

PR_Github #39170 [ run ] triggered by Bot. Commit: 3c9d8a3 Link to invocation

@yizhang-nv yizhang-nv force-pushed the fix/nvfp4-fp8kv-mtp-accuracy-spec branch from 3c9d8a3 to 47ea2dd Compare March 17, 2026 03:18
@yizhang-nv
Member Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39172 [ run ] triggered by Bot. Commit: 47ea2dd Link to invocation

@yizhang-nv yizhang-nv force-pushed the fix/nvfp4-fp8kv-mtp-accuracy-spec branch from 47ea2dd to 9bf2492 Compare March 17, 2026 06:38
@yizhang-nv
Member Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39207 [ run ] triggered by Bot. Commit: 9bf2492 Link to invocation

…DeepSeek-V3.2-Exp

Add missing accuracy reference specs for the combination of NVFP4
quantization, FP8 KV cache, and MTP speculative decoding for
DeepSeek-V3.2-Exp model in both mmlu.yaml and gsm8k.yaml.

Without these entries, the test case
TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked]
fails with ValueError: Not registered specs.

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
@yizhang-nv yizhang-nv force-pushed the fix/nvfp4-fp8kv-mtp-accuracy-spec branch from 9bf2492 to 89ed318 Compare March 17, 2026 10:44
@yizhang-nv
Member Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39243 [ run ] triggered by Bot. Commit: 89ed318 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39243 [ run ] completed with state SUCCESS. Commit: 89ed318
/LLM/main/L0_MergeRequest_PR pipeline #30495 (Partly Tested) completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@yizhang-nv yizhang-nv enabled auto-merge (squash) March 18, 2026 02:43
@yizhang-nv
Copy link
Member Author

/bot skip --comment "Only Multi GPU is enough"

@tensorrt-cicd
Collaborator

PR_Github #39371 [ skip ] triggered by Bot. Commit: 89ed318 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39371 [ skip ] completed with state SUCCESS. Commit: 89ed318
Skipping testing for commit 89ed318

Link to invocation

@yizhang-nv yizhang-nv merged commit a588688 into NVIDIA:main Mar 18, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
…DeepSeek-V3.2-Exp (NVIDIA#12269)

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

4 participants