
[#11526][chore] AutoDeploy accuracy tests: use nemotron-3 official checkpoints#12243

Merged
galagam merged 1 commit into NVIDIA:main from nv-auto-deploy:gagam/acc-tests-checkpoints-update
Mar 19, 2026

Conversation

@galagam
Collaborator

@galagam galagam commented Mar 16, 2026

Description

  • update NanoV3 and SuperV3 checkpoints in the AD accuracy tests
  • adjust MMLU and GSM8K reference values accordingly
  • enable non-identical MoE input scales for the Super FP8 checkpoint
  • move the Super and Nano configs to the canonical location examples/auto_deploy/model_registry/configs; keep symlinks from the original location to avoid breaking dashboards/scripts
  • fix a bug in the SuperV3 test with attention DP: the entire sharding config was being overwritten
  • unwaive accuracy tests that failed due to checkpoint corruption
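The attention-DP sharding bug can be illustrated with a minimal sketch. This is a hypothetical reproduction, assuming a dict-shaped transform config with a detect_sharding sub-dict; the function and key names are assumptions for illustration, and the real fix lives in tests/integration/defs/accuracy/test_llm_api_autodeploy.py.

```python
# Hypothetical sketch of the attention-DP sharding bug and fix.
# Key/function names are illustrative, not the actual TRT-LLM API.

def enable_attention_dp_buggy(transforms: dict) -> dict:
    """BUG: assigning a fresh dict overwrites the entire sharding
    sub-config, silently dropping any pre-existing options."""
    transforms["detect_sharding"] = {"enable_attention_dp": True}
    return transforms


def enable_attention_dp_fixed(transforms: dict) -> dict:
    """FIX: setdefault only creates the sub-dict when it is missing,
    so existing sharding options survive and just the flag is set."""
    transforms.setdefault("detect_sharding", {})["enable_attention_dp"] = True
    return transforms


base = {"detect_sharding": {"strategy": "tp", "tp_size": 4}}

buggy = enable_attention_dp_buggy({k: dict(v) for k, v in base.items()})
fixed = enable_attention_dp_fixed({k: dict(v) for k, v in base.items()})

print(buggy["detect_sharding"])  # {'enable_attention_dp': True} -- tp settings lost
print(fixed["detect_sharding"])  # {'strategy': 'tp', 'tp_size': 4, 'enable_attention_dp': True}
```

The setdefault pattern is a common idiom for augmenting nested dict configs without clobbering sibling keys.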

Note:
NanoV3 final BF16 and FP8 accuracy is significantly lower with the official checkpoint (NVFP4 is improved).
Verified that the same accuracy is reported with the PyTorch backend, which means this is a checkpoint tradeoff, not an implementation bug.

Test Coverage

N/A - Test refactoring PR.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@galagam galagam requested review from a team as code owners March 16, 2026 12:21
@galagam galagam requested a review from Fridah-nv March 16, 2026 12:21
@galagam
Collaborator Author

galagam commented Mar 16, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@coderabbitai
Contributor

coderabbitai bot commented Mar 16, 2026

📝 Walkthrough

Walkthrough

Introduces YAML configuration files for NVIDIA Nemotron 3 Nano and Super V3 models with TRT-LLM runtime settings and sharding strategies, adds accuracy reference benchmarks for these models, updates test code to use model mappings via HuggingFace identifiers, and removes associated test skip entries.

Changes

  • Model Registry Configurations (examples/auto_deploy/model_registry/configs/nano_v3.yaml, examples/auto_deploy/model_registry/configs/super_v3.yaml): New YAML config files defining the TRT-LLM runtime, compilation backend, batching parameters, KV cache settings, and advanced sharding/MoE transform strategies with per-layer projection mappings (q_proj, k_proj, v_proj, o_proj, etc.) and multi-stream MoE/logits fusion options.
  • Accuracy Reference Data (tests/integration/defs/accuracy/references/gsm8k.yaml, tests/integration/defs/accuracy/references/mmlu.yaml): Added accuracy benchmark entries for NVIDIA-Nemotron-3-Nano-30B and NVIDIA-Nemotron-3-Super-120B variants with multiple quantization configurations (FP8, NVFP4) and their corresponding accuracy scores.
  • Test Infrastructure Updates (tests/integration/defs/accuracy/test_llm_api_autodeploy.py, tests/test_common/llm_data.py): Updated TestNemotronNanoV3 and TestNemotronSuperV3 to use hf_id_to_local_model_dir() wrappers for model path resolution; modified the enable_attention_dp application logic to use setdefault for detect_sharding; added HuggingFace ID to local model directory mappings for the Nano variants (BF16, FP8, NVFP4).
  • Test Configuration (tests/integration/test_lists/waives.txt): Removed two SKIP entries related to Nemotron accuracy tests with bf16 configurations.
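A minimal sketch of what the HF-ID-to-local-directory resolution might look like. The model IDs, the models_root default, and the fallback behavior below are illustrative assumptions, not the actual contents of tests/test_common/llm_data.py.

```python
# Hypothetical sketch of a HuggingFace-ID -> local checkpoint mapping.
# IDs, paths, and fallback policy are assumed for illustration only.
_HF_ID_TO_LOCAL_DIR = {
    "nvidia/NVIDIA-Nemotron-3-Nano-30B-BF16": "NVIDIA-Nemotron-3-Nano-30B-BF16",
    "nvidia/NVIDIA-Nemotron-3-Nano-30B-FP8": "NVIDIA-Nemotron-3-Nano-30B-FP8",
    "nvidia/NVIDIA-Nemotron-3-Nano-30B-NVFP4": "NVIDIA-Nemotron-3-Nano-30B-NVFP4",
}


def hf_id_to_local_model_dir(hf_id: str, models_root: str = "/scratch/llm-models") -> str:
    """Resolve a HuggingFace model ID to a pre-staged local checkpoint dir.

    Falls back to returning the HF ID unchanged when no local mirror is
    registered, so callers can still download from the Hub.
    """
    local_dir = _HF_ID_TO_LOCAL_DIR.get(hf_id)
    if local_dir is None:
        return hf_id
    return f"{models_root}/{local_dir}"
```

Centralizing the mapping in one place lets tests reference models by their public HF IDs while CI machines transparently substitute local copies.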

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • suyoggupta
  • tcherckez-nvidia
  • marinayanov
  • jieli-matrix
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient (required threshold: 80.00%). Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title clearly and specifically identifies the main change: updating AutoDeploy accuracy tests to use official nemotron-3 checkpoints, which aligns with the substantial file modifications across test configurations and reference values.
  • Description check ✅ Passed: The PR description clearly explains the changes, rationale, and includes specific implementation details about checkpoint updates, reference value adjustments, and bug fixes.


Comment @coderabbitai help to get the list of available commands and usage tips.


@tensorrt-cicd
Collaborator

PR_Github #39085 [ run ] triggered by Bot. Commit: 925a2e8 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39085 [ run ] completed with state SUCCESS. Commit: 925a2e8
/LLM/main/L0_MergeRequest_PR pipeline #30348 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 16, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #39102 [ run ] triggered by Bot. Commit: 2fe81a3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39102 [ run ] completed with state SUCCESS. Commit: 2fe81a3
/LLM/main/L0_MergeRequest_PR pipeline #30363 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 16, 2026

/bot run


@tensorrt-cicd
Collaborator

PR_Github #39192 [ run ] triggered by Bot. Commit: 2fe81a3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39192 [ run ] completed with state FAILURE. Commit: 2fe81a3
/LLM/main/L0_MergeRequest_PR pipeline #30444 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam galagam force-pushed the gagam/acc-tests-checkpoints-update branch 2 times, most recently from 7c32a66 to 6f270f6 Compare March 17, 2026 15:01
@galagam
Collaborator Author

galagam commented Mar 17, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #39272 [ run ] triggered by Bot. Commit: 6f270f6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39272 [ run ] completed with state SUCCESS. Commit: 6f270f6
/LLM/main/L0_MergeRequest_PR pipeline #30519 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 17, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39298 [ run ] triggered by Bot. Commit: 6f270f6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39298 [ run ] completed with state SUCCESS. Commit: 6f270f6
/LLM/main/L0_MergeRequest_PR pipeline #30547 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…ial checkpoints

- update NanoV3 and SuperV3 checkpoints in AD accuracy test
- adjust MMLU and GSM8K reference values accordingly
- enable non-identical MoE input scales for the Super FP8 checkpoint
- move super and nano configs under canonical location examples/auto_deploy/model_registry/configs, keep symlinks from original location to avoid breaking dashboards/scripts
- fix bug in SuperV3 test with attention DP: the entire sharding config was being overwritten
- unwaive two accuracy tests that are passing after the above changes

Note: NanoV3 final bf16 and fp8 accuracy is significantly lower with the official checkpoint (nvfp4 is improved).
      Verified the same accuracy is reported with Pytorch backend.
      This is a checkpoint tradeoff, not implementation-specific.

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

undo unwaive http://nvbugs/5919796

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

align to existing reference records

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

align nano reference

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam galagam force-pushed the gagam/acc-tests-checkpoints-update branch from 97d8cab to 1691a1d Compare March 18, 2026 07:02
@galagam
Collaborator Author

galagam commented Mar 18, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #39421 [ run ] triggered by Bot. Commit: 1691a1d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39421 [ run ] completed with state SUCCESS. Commit: 1691a1d
/LLM/main/L0_MergeRequest_PR pipeline #30650 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 18, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39443 [ run ] triggered by Bot. Commit: 1691a1d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39443 [ run ] completed with state SUCCESS. Commit: 1691a1d
/LLM/main/L0_MergeRequest_PR pipeline #30670 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 18, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39476 [ run ] triggered by Bot. Commit: 1691a1d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39476 [ run ] completed with state SUCCESS. Commit: 1691a1d
/LLM/main/L0_MergeRequest_PR pipeline #30701 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@galagam galagam merged commit 32be345 into NVIDIA:main Mar 19, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
…ial checkpoints (NVIDIA#12243)

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>