[https://nvbugs/5973536][fix] Add NVFP4+FP8KV+MTP accuracy specs for DeepSeek-V3.2-Exp by yizhang-nv · Pull Request #12269 · NVIDIA/TensorRT-LLM

yizhang-nv · 2026-03-17T03:11:08Z

Summary by CodeRabbit

Release Notes

Tests
- Added new variant configurations for DeepSeek-V3.2-Exp model with updated parameter settings
- Updated accuracy reference baselines for GSM8K (95.6) and MMLU (87.2) benchmarks with new model variants

Description

Add missing accuracy reference specs for the combination of NVFP4 quantization, FP8 KV cache, and MTP speculative decoding for the DeepSeek-V3.2-Exp model in both mmlu.yaml and gsm8k.yaml.

Without these entries, the test case TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked] fails with:

ValueError: Not registered specs: {'dtype': 'auto', 'quant_algo': <QuantAlgo.NVFP4: 'NVFP4'>, 'kv_cache_quant_algo': <QuantAlgo.FP8: 'FP8'>, 'spec_dec_algo': 'MTP', 'extra_acc_spec': None}

The accuracy thresholds are set to match the existing NVFP4+MTP specs (without FP8 KV), consistent with other models (e.g., DeepSeek-V3-Lite) where FP8 KV cache does not significantly affect accuracy.
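To make the failure and the fix concrete, here is a minimal sketch of a spec lookup. It is not the TensorRT-LLM harness: `normalize`, `lookup_accuracy`, and the in-memory `gsm8k_entries` list are hypothetical stand-ins, and the assumption that omitted fields default to `None` (with `dtype` defaulting to `'auto'`) is inferred only from the fields printed in the error message above. The 95.6 value is the GSM8K reference quoted in this PR.

```python
# Hypothetical stand-in for one model's entry list in gsm8k.yaml.
# Field names are taken from the ValueError text; the defaults are assumed.
SPEC_KEYS = ("dtype", "quant_algo", "kv_cache_quant_algo",
             "spec_dec_algo", "extra_acc_spec")
DEFAULTS = {"dtype": "auto", "quant_algo": None, "kv_cache_quant_algo": None,
            "spec_dec_algo": None, "extra_acc_spec": None}


def normalize(entry):
    """Fill omitted spec fields with defaults and return a hashable key."""
    full = {**DEFAULTS, **{k: v for k, v in entry.items() if k in SPEC_KEYS}}
    return tuple(full[k] for k in SPEC_KEYS)


def lookup_accuracy(entries, **spec):
    """Return the reference accuracy for an exact spec match, else raise."""
    table = {normalize(e): e["accuracy"] for e in entries}
    key = normalize(spec)
    if key not in table:
        raise ValueError(f"Not registered specs: {dict(zip(SPEC_KEYS, key))}")
    return table[key]


# Before the PR: only the NVFP4+MTP variant (no FP8 KV cache) is registered.
gsm8k_entries = [
    {"quant_algo": "NVFP4", "spec_dec_algo": "MTP", "accuracy": 95.6},
]
wanted = {"quant_algo": "NVFP4", "kv_cache_quant_algo": "FP8",
          "spec_dec_algo": "MTP"}

try:
    lookup_accuracy(gsm8k_entries, **wanted)
except ValueError as err:
    print(err)  # the spec with FP8 KV cache has no matching entry

# After the PR: the same threshold is registered for the FP8 KV variant.
gsm8k_entries.append({**wanted, "accuracy": 95.6})
print(lookup_accuracy(gsm8k_entries, **wanted))
```

The point of the sketch is that matching is exact on every spec field, so a variant that differs only in `kv_cache_quant_algo` needs its own entry even when it reuses an existing threshold.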

Test Coverage

TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked]

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

yizhang-nv · 2026-03-17T03:12:00Z

/bot run --only-multi-gpu-test --disable-fail-fast

coderabbitai · 2026-03-17T03:13:34Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f0b9dc16-2cf7-4643-95f3-514f308f82f6

📥 Commits

Reviewing files that changed from the base of the PR and between 20fc52c and 3c9d8a3.

📒 Files selected for processing (2)

tests/integration/defs/accuracy/references/gsm8k.yaml
tests/integration/defs/accuracy/references/mmlu.yaml

📝 Walkthrough

Adds new variant entries for the DeepSeek-V3.2-Exp model in accuracy reference files, configuring FP8 kv_cache_quant_algo and MTP spec_dec_algo while preserving existing accuracy scores. No modifications to other models or entries.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Accuracy Reference Variants: `tests/integration/defs/accuracy/references/gsm8k.yaml`, `tests/integration/defs/accuracy/references/mmlu.yaml` | Adds new variant entries for DeepSeek-V3.2-Exp with FP8 kv_cache_quant_algo and MTP spec_dec_algo, maintaining existing accuracy metrics (95.6 for gsm8k, 87.2 for mmlu). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the primary change: adding NVFP4+FP8KV+MTP accuracy specs for DeepSeek-V3.2-Exp. It follows the template format and succinctly describes the main modification. |
| Description check | ✅ Passed | The description provides a clear explanation of what was changed and why, identifies the failing test case, explains the rationale for accuracy thresholds, and includes test coverage and a completed checklist. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

tensorrt-cicd · 2026-03-17T03:17:37Z

PR_Github #39170 [ run ] triggered by Bot. Commit: 3c9d8a3 Link to invocation

yizhang-nv · 2026-03-17T03:18:28Z

/bot run --only-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2026-03-17T03:25:24Z

PR_Github #39172 [ run ] triggered by Bot. Commit: 47ea2dd Link to invocation

yizhang-nv · 2026-03-17T06:50:56Z

/bot run --only-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2026-03-17T06:57:27Z

PR_Github #39207 [ run ] triggered by Bot. Commit: 9bf2492 Link to invocation

…DeepSeek-V3.2-Exp

Add missing accuracy reference specs for the combination of NVFP4 quantization, FP8 KV cache, and MTP speculative decoding for DeepSeek-V3.2-Exp model in both mmlu.yaml and gsm8k.yaml. Without these entries, the test case TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked] fails with ValueError: Not registered specs.

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

yizhang-nv · 2026-03-17T10:44:57Z

/bot run --only-multi-gpu-test --disable-fail-fast

tensorrt-cicd · 2026-03-17T10:51:27Z

PR_Github #39243 [ run ] triggered by Bot. Commit: 89ed318 Link to invocation

tensorrt-cicd · 2026-03-17T14:39:45Z

PR_Github #39243 [ run ] completed with state SUCCESS. Commit: 89ed318
/LLM/main/L0_MergeRequest_PR pipeline #30495 (Partly Tested) completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

yizhang-nv · 2026-03-18T02:44:03Z

/bot skip --comment "Only Multi GPU is enough"

tensorrt-cicd · 2026-03-18T02:52:34Z

PR_Github #39371 [ skip ] triggered by Bot. Commit: 89ed318 Link to invocation

tensorrt-cicd · 2026-03-18T03:00:16Z

PR_Github #39371 [ skip ] completed with state SUCCESS. Commit: 89ed318
Skipping testing for commit 89ed318

Link to invocation

…DeepSeek-V3.2-Exp (NVIDIA#12269) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

yizhang-nv requested a review from a team as a code owner March 17, 2026 03:11

github-actions bot assigned yizhang-nv Mar 17, 2026

yizhang-nv force-pushed the fix/nvfp4-fp8kv-mtp-accuracy-spec branch from 3c9d8a3 to 47ea2dd Compare March 17, 2026 03:18

yizhang-nv force-pushed the fix/nvfp4-fp8kv-mtp-accuracy-spec branch from 47ea2dd to 9bf2492 Compare March 17, 2026 06:38

ZhanruiSunCh approved these changes Mar 17, 2026

yizhang-nv force-pushed the fix/nvfp4-fp8kv-mtp-accuracy-spec branch from 9bf2492 to 89ed318 Compare March 17, 2026 10:44

crazydemo approved these changes Mar 18, 2026

yizhang-nv enabled auto-merge (squash) March 18, 2026 02:43

yizhang-nv merged commit a588688 into NVIDIA:main Mar 18, 2026
5 checks passed

limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026

[https://nvbugs/5973536][fix] Add NVFP4+FP8KV+MTP accuracy specs for …

7580c13

…DeepSeek-V3.2-Exp (NVIDIA#12269) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

4 participants
