[#11526][chore] AutoDeploy accuracy tests: use nemotron-3 official checkpoints #12243
Conversation
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
📝 Walkthrough
Introduces YAML configuration files for NVIDIA Nemotron 3 Nano and Super V3 models with TRT-LLM runtime settings and sharding strategies, adds accuracy reference benchmarks for these models, updates test code to use model mappings via HuggingFace identifiers, and removes associated test skip entries.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
PR_Github #39085 [ run ] triggered by Bot. Commit:
PR_Github #39085 [ run ] completed with state
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #39102 [ run ] triggered by Bot. Commit:
PR_Github #39102 [ run ] completed with state
/bot run
/bot run
PR_Github #39192 [ run ] triggered by Bot. Commit:
PR_Github #39192 [ run ] completed with state
7c32a66 to 6f270f6
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #39272 [ run ] triggered by Bot. Commit:
PR_Github #39272 [ run ] completed with state
/bot run
PR_Github #39298 [ run ] triggered by Bot. Commit:
PR_Github #39298 [ run ] completed with state
…ial checkpoints
- update nanov3 and superv3 checkpoints in AD accuracy test
- adjust mmlu and gsm8k reference values accordingly
- enable non-identical MoE input scales for super fp8 checkpoint
- move super and nano configs under canonical location examples/auto_deploy/model_registry/configs, keep symlinks from original location to avoid breaking dashboards/scripts
- bugfix in superV3 test with attention dp: the entire sharding config was being overwritten
- unwaive two accuracy tests that pass after the above changes
Note: NanoV3 final bf16 and fp8 accuracy is significantly lower with the official checkpoint (nvfp4 is improved).
Verified the same accuracy is reported with the PyTorch backend.
This is a checkpoint tradeoff, not implementation-specific.
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
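
The attention-dp bugfix listed in the commit above describes a whole sharding config being overwritten. The PR page does not show the test code, so the following is only a minimal sketch of that general failure mode and one common remedy (a recursive merge). The keys `sharding`, `strategy`, `dims` and `enable_attention_dp`, and the helper `deep_merge`, are hypothetical names chosen for illustration and are not taken from AutoDeploy.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config and per-test override (illustrative keys only).
base_config = {"sharding": {"strategy": "example", "dims": ["tp", "ep"]}}
attention_dp_override = {"sharding": {"enable_attention_dp": True}}

# Shallow update: the nested "sharding" dict is replaced wholesale.
overwritten = {**base_config, **attention_dp_override}

# Recursive merge: existing sharding settings survive the override.
merged = deep_merge(base_config, attention_dp_override)

print(overwritten["sharding"])
print(merged["sharding"])
```

With a shallow update, the nested dictionary from the override replaces the one in the base config, so settings such as `strategy` and `dims` are lost; the recursive merge keeps them and adds the new flag.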
undo unwaive http://nvbugs/5919796
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
align to existing reference records
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
align nano reference
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
97d8cab to 1691a1d
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #39421 [ run ] triggered by Bot. Commit:
PR_Github #39421 [ run ] completed with state
/bot run
PR_Github #39443 [ run ] triggered by Bot. Commit:
PR_Github #39443 [ run ] completed with state
/bot run
PR_Github #39476 [ run ] triggered by Bot. Commit:
PR_Github #39476 [ run ] completed with state
…ial checkpoints (NVIDIA#12243) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Description
- move super and nano configs under canonical location `examples/auto_deploy/model_registry/configs`, keep symlinks from original location to avoid breaking dashboards/scripts

Note:
NanoV3 final bf16 and fp8 accuracy is significantly lower with the official checkpoint (nvfp4 is improved).
Verified the same accuracy is reported with the PyTorch backend, which means this is a checkpoint tradeoff, not an implementation bug.
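
The config move described above keeps symlinks at the original location so existing dashboards and scripts keep working. As a rough sketch of that pattern (not the commands used in this PR), the snippet below moves a file and leaves a relative symlink behind, using a temporary directory. The helper `move_with_symlink`, the directory `old_location` and the file name `example_model.yaml` are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def move_with_symlink(old_path: Path, new_dir: Path) -> Path:
    """Move `old_path` into `new_dir` and leave a relative symlink behind."""
    new_dir.mkdir(parents=True, exist_ok=True)
    new_path = new_dir / old_path.name
    old_path.rename(new_path)
    # A relative target keeps the link valid when the tree is checked out elsewhere.
    target = os.path.relpath(new_path, start=old_path.parent)
    old_path.symlink_to(target)
    return new_path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old_file = root / "old_location" / "example_model.yaml"  # hypothetical path
    old_file.parent.mkdir(parents=True)
    old_file.write_text("placeholder: true\n")

    new_file = move_with_symlink(old_file, root / "model_registry" / "configs")

    # Readers of the old path transparently get the relocated file.
    link_is_symlink = old_file.is_symlink()
    content_via_old_path = old_file.read_text()
    content_via_new_path = new_file.read_text()
```

Creating symlinks may require extra privileges on Windows; the sketch assumes a POSIX-like filesystem.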
Test Coverage
N/A - Test refactoring PR.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.