[https://nvbugs/5973536][fix] Add NVFP4+FP8KV+MTP accuracy specs for DeepSeek-V3.2-Exp#12269
Conversation
/bot run --only-multi-gpu-test --disable-fail-fast
Walkthrough: Adds new variant entries for the DeepSeek-V3.2-Exp model in accuracy reference files, configuring FP8 kv_cache_quant_algo and MTP spec_dec_algo while preserving existing accuracy scores. No modifications to other models or entries.
PR_Github #39170 [ run ] triggered by Bot.
Force-pushed 3c9d8a3 to 47ea2dd.
/bot run --only-multi-gpu-test --disable-fail-fast
PR_Github #39172 [ run ] triggered by Bot.
Force-pushed 47ea2dd to 9bf2492.
/bot run --only-multi-gpu-test --disable-fail-fast
PR_Github #39207 [ run ] triggered by Bot.
…DeepSeek-V3.2-Exp Add missing accuracy reference specs for the combination of NVFP4 quantization, FP8 KV cache, and MTP speculative decoding for the DeepSeek-V3.2-Exp model in both mmlu.yaml and gsm8k.yaml. Without these entries, the test case TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked] fails with `ValueError: Not registered specs`. Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
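The failure mode in the commit message above — a quantization/spec-decoding combination with no registered reference entry raising `ValueError: Not registered specs` — can be sketched as follows. This is a hypothetical illustration, not the TRT-LLM harness code: the registry layout, the `lookup_spec` function, and the accuracy value are all assumptions; only the attribute names and the error text come from this PR.

```python
# Hypothetical sketch of a reference-spec lookup. REFERENCE_SPECS stands in
# for the parsed contents of a file like gsm8k.yaml; the accuracy values
# are placeholders, not the numbers committed in this PR.
REFERENCE_SPECS = {
    "DeepSeek-V3.2-Exp": [
        # Existing NVFP4 + MTP entry (no FP8 KV cache).
        {"quant_algo": "NVFP4", "spec_dec_algo": "MTP", "accuracy": 90.0},
        # The kind of entry this PR adds: NVFP4 + FP8 KV cache + MTP.
        {"quant_algo": "NVFP4", "kv_cache_quant_algo": "FP8",
         "spec_dec_algo": "MTP", "accuracy": 90.0},
    ],
}

def lookup_spec(model: str, **attrs) -> dict:
    """Return the reference spec whose attributes exactly match `attrs`.

    Raises ValueError when no registered entry matches, which is the
    failure the test hit before the new entries were added.
    """
    for spec in REFERENCE_SPECS.get(model, []):
        candidate = {k: v for k, v in spec.items() if k != "accuracy"}
        if candidate == attrs:
            return spec
    raise ValueError("Not registered specs")
```

Before this PR, the mtp3_fp8kv_chunked test would take the `raise` path because no entry carried all three attributes at once.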
Force-pushed 9bf2492 to 89ed318.
/bot run --only-multi-gpu-test --disable-fail-fast
PR_Github #39243 [ run ] triggered by Bot.
PR_Github #39243 [ run ] completed.
/bot skip --comment "Only Multi GPU is enough"
PR_Github #39371 [ skip ] triggered by Bot.
PR_Github #39371 [ skip ] completed.
…DeepSeek-V3.2-Exp (NVIDIA#12269) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Description
Add missing accuracy reference specs for the combination of NVFP4 quantization, FP8 KV cache, and MTP speculative decoding for the DeepSeek-V3.2-Exp model in both mmlu.yaml and gsm8k.yaml.
Without these entries, the test case TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked] fails with `ValueError: Not registered specs`.
The accuracy thresholds are set to match the existing NVFP4+MTP specs (without FP8 KV), consistent with other models (e.g., DeepSeek-V3-Lite) where FP8 KV cache does not significantly affect accuracy.
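As a rough sketch, a reference entry of this shape is what the description implies. The attribute keys (quant_algo, kv_cache_quant_algo, spec_dec_algo) follow the walkthrough; the model key and the accuracy number below are placeholders, not the values committed in this PR.

```yaml
# Hypothetical fragment of an accuracy reference file such as gsm8k.yaml.
DeepSeek-V3.2-Exp:
  # Existing NVFP4 + MTP entry (no FP8 KV cache).
  - quant_algo: NVFP4
    spec_dec_algo: MTP
    accuracy: 90.0
  # New variant added by this PR: NVFP4 + FP8 KV cache + MTP,
  # reusing the same threshold as the entry above.
  - quant_algo: NVFP4
    kv_cache_quant_algo: FP8
    spec_dec_algo: MTP
    accuracy: 90.0
```

Reusing the existing threshold (rather than measuring a new one) is justified in the description by the observation that FP8 KV cache has negligible accuracy impact on comparable models.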
Test Coverage
TestDeepSeekV32::test_nvfp4_multi_gpus_piecewise_cuda_graph[mtp3_fp8kv_chunked]
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.