
[#11526][chore] AutoDeploy accuracy tests: use nemotron-3 official checkpoints#12243

Merged
galagam merged 1 commit into NVIDIA:main from nv-auto-deploy:gagam/acc-tests-checkpoints-update
Mar 19, 2026

Conversation

@galagam
Collaborator

@galagam galagam commented Mar 16, 2026

Description

  • update NanoV3 and SuperV3 checkpoints in the AD accuracy tests
  • adjust MMLU and GSM8K reference values accordingly
  • enable non-identical MoE input scales for the Super FP8 checkpoint
  • move the Super and Nano configs to the canonical location examples/auto_deploy/model_registry/configs; keep symlinks from the original location to avoid breaking dashboards/scripts
  • fix a bug in the SuperV3 test with attention DP: the entire sharding config was being overwritten
  • unwaive accuracy tests that failed due to checkpoint corruption
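The attention-DP sharding bug can be illustrated with a minimal sketch. This is a hypothetical reproduction, assuming a dict-shaped transform config with a detect_sharding sub-dict; the function and key names are assumptions for illustration, and the real fix lives in tests/integration/defs/accuracy/test_llm_api_autodeploy.py.

```python
# Hypothetical sketch of the attention-DP sharding bug and fix.
# Key/function names are illustrative, not the actual TRT-LLM API.

def enable_attention_dp_buggy(transforms: dict) -> dict:
    """BUG: assigning a fresh dict overwrites the entire sharding
    sub-config, silently dropping any pre-existing options."""
    transforms["detect_sharding"] = {"enable_attention_dp": True}
    return transforms


def enable_attention_dp_fixed(transforms: dict) -> dict:
    """FIX: setdefault only creates the sub-dict when it is missing,
    so existing sharding options survive and just the flag is set."""
    transforms.setdefault("detect_sharding", {})["enable_attention_dp"] = True
    return transforms


base = {"detect_sharding": {"strategy": "tp", "tp_size": 4}}

buggy = enable_attention_dp_buggy({k: dict(v) for k, v in base.items()})
fixed = enable_attention_dp_fixed({k: dict(v) for k, v in base.items()})

print(buggy["detect_sharding"])  # {'enable_attention_dp': True} -- tp settings lost
print(fixed["detect_sharding"])  # {'strategy': 'tp', 'tp_size': 4, 'enable_attention_dp': True}
```

The setdefault pattern is a common idiom for augmenting nested dict configs without clobbering sibling keys.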

Note:
NanoV3 final BF16 and FP8 accuracy is significantly lower with the official checkpoint (NVFP4 is improved).
Verified that the same accuracy is reported with the PyTorch backend, which means this is a checkpoint tradeoff, not an implementation bug.

Test Coverage

N/A - Test refactoring PR.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@galagam galagam requested review from a team as code owners March 16, 2026 12:21
@galagam galagam requested a review from Fridah-nv March 16, 2026 12:21
@galagam
Collaborator Author

galagam commented Mar 16, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@coderabbitai
Contributor

coderabbitai bot commented Mar 16, 2026

📝 Walkthrough

Walkthrough

Introduces YAML configuration files for NVIDIA Nemotron 3 Nano and Super V3 models with TRT-LLM runtime settings and sharding strategies, adds accuracy reference benchmarks for these models, updates test code to use model mappings via HuggingFace identifiers, and removes associated test skip entries.

Changes

  • Model Registry Configurations (examples/auto_deploy/model_registry/configs/nano_v3.yaml, examples/auto_deploy/model_registry/configs/super_v3.yaml): New YAML config files defining the TRT-LLM runtime, compilation backend, batching parameters, KV cache settings, and advanced sharding/MoE transform strategies with per-layer projection mappings (q_proj, k_proj, v_proj, o_proj, etc.) and multi-stream MoE/logits fusion options.
  • Accuracy Reference Data (tests/integration/defs/accuracy/references/gsm8k.yaml, tests/integration/defs/accuracy/references/mmlu.yaml): Added accuracy benchmark entries for NVIDIA-Nemotron-3-Nano-30B and NVIDIA-Nemotron-3-Super-120B variants with multiple quantization configurations (FP8, NVFP4) and their corresponding accuracy scores.
  • Test Infrastructure Updates (tests/integration/defs/accuracy/test_llm_api_autodeploy.py, tests/test_common/llm_data.py): Updated TestNemotronNanoV3 and TestNemotronSuperV3 to use hf_id_to_local_model_dir() wrappers for model path resolution; modified the enable_attention_dp application logic to use setdefault for detect_sharding; added HuggingFace ID to local model directory mappings for the Nano variants (BF16, FP8, NVFP4).
  • Test Configuration (tests/integration/test_lists/waives.txt): Removed two SKIP entries related to Nemotron accuracy tests with bf16 configurations.
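A minimal sketch of what the HF-ID-to-local-directory resolution might look like. The model IDs, the models_root default, and the fallback behavior below are illustrative assumptions, not the actual contents of tests/test_common/llm_data.py.

```python
# Hypothetical sketch of a HuggingFace-ID -> local checkpoint mapping.
# IDs, paths, and fallback policy are assumed for illustration only.
_HF_ID_TO_LOCAL_DIR = {
    "nvidia/NVIDIA-Nemotron-3-Nano-30B-BF16": "NVIDIA-Nemotron-3-Nano-30B-BF16",
    "nvidia/NVIDIA-Nemotron-3-Nano-30B-FP8": "NVIDIA-Nemotron-3-Nano-30B-FP8",
    "nvidia/NVIDIA-Nemotron-3-Nano-30B-NVFP4": "NVIDIA-Nemotron-3-Nano-30B-NVFP4",
}


def hf_id_to_local_model_dir(hf_id: str, models_root: str = "/scratch/llm-models") -> str:
    """Resolve a HuggingFace model ID to a pre-staged local checkpoint dir.

    Falls back to returning the HF ID unchanged when no local mirror is
    registered, so callers can still download from the Hub.
    """
    local_dir = _HF_ID_TO_LOCAL_DIR.get(hf_id)
    if local_dir is None:
        return hf_id
    return f"{models_root}/{local_dir}"
```

Centralizing the mapping in one place lets tests reference models by their public HF IDs while CI machines transparently substitute local copies.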

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • suyoggupta
  • tcherckez-nvidia
  • marinayanov
  • jieli-matrix
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient (required threshold: 80.00%). Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title clearly and specifically identifies the main change: updating AutoDeploy accuracy tests to use official nemotron-3 checkpoints, which aligns with the substantial file modifications across test configurations and reference values.
  • Description check ✅ Passed: The PR description clearly explains the changes, rationale, and includes specific implementation details about checkpoint updates, reference value adjustments, and bug fixes.


Comment @coderabbitai help to get the list of available commands and usage tips.


@tensorrt-cicd
Collaborator

PR_Github #39085 [ run ] triggered by Bot. Commit: 925a2e8 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39085 [ run ] completed with state SUCCESS. Commit: 925a2e8
/LLM/main/L0_MergeRequest_PR pipeline #30348 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 16, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #39102 [ run ] triggered by Bot. Commit: 2fe81a3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39102 [ run ] completed with state SUCCESS. Commit: 2fe81a3
/LLM/main/L0_MergeRequest_PR pipeline #30363 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 16, 2026

/bot run


@tensorrt-cicd
Collaborator

PR_Github #39192 [ run ] triggered by Bot. Commit: 2fe81a3 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39192 [ run ] completed with state FAILURE. Commit: 2fe81a3
/LLM/main/L0_MergeRequest_PR pipeline #30444 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam galagam force-pushed the gagam/acc-tests-checkpoints-update branch 2 times, most recently from 7c32a66 to 6f270f6 Compare March 17, 2026 15:01
@galagam
Collaborator Author

galagam commented Mar 17, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #39272 [ run ] triggered by Bot. Commit: 6f270f6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39272 [ run ] completed with state SUCCESS. Commit: 6f270f6
/LLM/main/L0_MergeRequest_PR pipeline #30519 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 17, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39298 [ run ] triggered by Bot. Commit: 6f270f6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39298 [ run ] completed with state SUCCESS. Commit: 6f270f6
/LLM/main/L0_MergeRequest_PR pipeline #30547 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…ial checkpoints

- update NanoV3 and SuperV3 checkpoints in AD accuracy test
- adjust MMLU and GSM8K reference values accordingly
- enable non-identical MoE input scales for the Super FP8 checkpoint
- move super and nano configs under canonical location examples/auto_deploy/model_registry/configs, keep symlinks from original location to avoid breaking dashboards/scripts
- fix bug in SuperV3 test with attention DP: the entire sharding config was being overwritten
- unwaive two accuracy tests that are passing after the above changes

Note: NanoV3 final bf16 and fp8 accuracy is significantly lower with the official checkpoint (nvfp4 is improved).
      Verified the same accuracy is reported with Pytorch backend.
      This is a checkpoint tradeoff, not implementation-specific.

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

undo unwaive http://nvbugs/5919796

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

align to existing reference records

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

align nano reference

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam galagam force-pushed the gagam/acc-tests-checkpoints-update branch from 97d8cab to 1691a1d Compare March 18, 2026 07:02
@galagam
Collaborator Author

galagam commented Mar 18, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #39421 [ run ] triggered by Bot. Commit: 1691a1d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39421 [ run ] completed with state SUCCESS. Commit: 1691a1d
/LLM/main/L0_MergeRequest_PR pipeline #30650 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 18, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39443 [ run ] triggered by Bot. Commit: 1691a1d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39443 [ run ] completed with state SUCCESS. Commit: 1691a1d
/LLM/main/L0_MergeRequest_PR pipeline #30670 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Mar 18, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39476 [ run ] triggered by Bot. Commit: 1691a1d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39476 [ run ] completed with state SUCCESS. Commit: 1691a1d
/LLM/main/L0_MergeRequest_PR pipeline #30701 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@galagam galagam merged commit 32be345 into NVIDIA:main Mar 19, 2026
5 checks passed
limin2021 pushed a commit to limin2021/TensorRT-LLM that referenced this pull request Mar 19, 2026
…ial checkpoints (NVIDIA#12243)

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>