Nemotron Ultra & Super launcher examples #1609
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
📝 Walkthrough
This PR implements an end-to-end Megatron-LM PTQ pipeline for quantizing, exporting, and validating Nemotron-3 models. A new export wrapper script bridges quantized checkpoints to Hugging Face format; quantization orchestration becomes conditional and persists outputs under standardized
Changes
Nemotron-3 PTQ Pipeline
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Warning: Review ran into problems
🔥 Problems
Git: Failed to clone repository. Please run the
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tools/launcher/common/query.py (1)
210-211: 💤 Low value
LGTM!
The fix correctly prevents `num_shards` from becoming zero when the dataset is small, which would cause `dataset.shard()` to fail at line 223.
Optional: Consider documenting the sharding heuristics.
The magic numbers (100 samples per shard target, 16 shard cap) reflect non-obvious design decisions that would benefit from a brief comment.
📝 Suggested documentation
 if args.num_shards * 100 > len(dataset):
+    # Shrink num_shards to maintain ~100 samples/shard, capped at [1, 16]
     args.num_shards = max(1, min(16, len(dataset) // 100))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/common/query.py` around lines 210 - 211, Add a brief inline comment above the num_shards adjustment explaining the sharding heuristic: we target ~100 samples per shard and cap shards at 16 to avoid too many tiny shards, and ensure num_shards stays at least 1 to prevent dataset.shard() failures; reference the adjustment logic that checks "if args.num_shards * 100 > len(dataset): args.num_shards = max(1, min(16, len(dataset) // 100))" and mention the rationale for the constants 100 and 16 so future readers know why those magic numbers were chosen.
tools/launcher/common/megatron_lm/quantize/quantize.sh (1)
38-40: ⚡ Quick win
Inline-export `EXPORT_DIR` diverges from the standalone export wrapper.
Here `EXPORT_DIR` is `/scratchspace/export/...` and `_QUANT_CFG_TAG` keeps any `.yaml`/`.yml` suffix. The wrapper `export/export.sh` instead uses `/cicd/export/...` and strips the extension (Lines 38-43 there). So a chained run (quantize inline export → a later vLLM task expecting the wrapper's path) would point at different directories whenever `RUN_EXPORT=true` and/or `QUANT_CFG` is a recipe file. This pipeline avoids it via `RUN_EXPORT=false`, but the mismatch is a latent bug.
♻️ Align base path and tag derivation with export.sh
 # If QUANT_CFG is a recipe, use the basename
 _QUANT_CFG_TAG="$(basename "${QUANT_CFG}")"
-export EXPORT_DIR="/scratchspace/export/${MLM_MODEL_CFG}_${_QUANT_CFG_TAG}"
+_QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yaml}"
+_QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yml}"
+export EXPORT_DIR="/cicd/export/${MLM_MODEL_CFG}_${_QUANT_CFG_TAG}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/common/megatron_lm/quantize/quantize.sh` around lines 38 - 40, The inline export sets EXPORT_DIR to /scratchspace/export/... and leaves _QUANT_CFG_TAG with the recipe extension, causing a mismatch with export/export.sh which uses /cicd/export/... and strips the .yaml/.yml extension; update the inline logic to mirror export/export.sh by (1) using the same base path (/cicd/export) for EXPORT_DIR and (2) strip QUANT_CFG file extensions when computing _QUANT_CFG_TAG (remove .yaml/.yml), ensuring EXPORT_DIR derives from MLM_MODEL_CFG and the cleaned _QUANT_CFG_TAG so downstream tasks see the same path as export/export.sh (refer to symbols EXPORT_DIR, _QUANT_CFG_TAG, and QUANT_CFG).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/launcher/common/megatron_lm/export/export.sh`:
- Around line 30-32: Update the header doc comments to match the actual default
paths and documented variables used in the script: change the /scratchspace/...
defaults to the /cicd/... paths and add documentation for HF_MODEL_CKPT (the HF
checkpoint default) so the comment block reflects the real defaults for
MLM_MODEL_CKPT, EXPORT_DIR, and HF_MODEL_CKPT; reference the variable names
MLM_MODEL_CKPT, EXPORT_DIR, and HF_MODEL_CKPT in the updated comment so
operators are not misled.
---
Nitpick comments:
In `@tools/launcher/common/megatron_lm/quantize/quantize.sh`:
- Around line 38-40: The inline export sets EXPORT_DIR to
/scratchspace/export/... and leaves _QUANT_CFG_TAG with the recipe extension,
causing a mismatch with export/export.sh which uses /cicd/export/... and strips
the .yaml/.yml extension; update the inline logic to mirror export/export.sh by
(1) using the same base path (/cicd/export) for EXPORT_DIR and (2) strip
QUANT_CFG file extensions when computing _QUANT_CFG_TAG (remove .yaml/.yml),
ensuring EXPORT_DIR derives from MLM_MODEL_CFG and the cleaned _QUANT_CFG_TAG so
downstream tasks see the same path as export/export.sh (refer to symbols
EXPORT_DIR, _QUANT_CFG_TAG, and QUANT_CFG).
In `@tools/launcher/common/query.py`:
- Around line 210-211: Add a brief inline comment above the num_shards
adjustment explaining the sharding heuristic: we target ~100 samples per shard
and cap shards at 16 to avoid too many tiny shards, and ensure num_shards stays
at least 1 to prevent dataset.shard() failures; reference the adjustment logic
that checks "if args.num_shards * 100 > len(dataset): args.num_shards = max(1,
min(16, len(dataset) // 100))" and mention the rationale for the constants 100
and 16 so future readers know why those magic numbers were chosen.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c3f2c55d-967c-4cf9-8e1f-f20b08b83161
📒 Files selected for processing (8)
- tools/launcher/common/megatron_lm/export/export.sh
- tools/launcher/common/megatron_lm/quantize/quantize.sh
- tools/launcher/common/query.py
- tools/launcher/common/vllm/gpqa_smoke.jsonl
- tools/launcher/common/vllm/query.sh
- tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml
- tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml
- tools/launcher/modules/Megatron-LM
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1609 +/- ##
==========================================
- Coverage 77.41% 77.00% -0.41%
==========================================
Files 480 482 +2
Lines 52499 53590 +1091
==========================================
+ Hits 40642 41268 +626
- Misses 11857 12322 +465
/claude review
ChenhanYu left a comment
Reviewed the full diff plus query.sh/quantize.sh context. Overall a clean, well-scoped example — factoring export out of quantize.sh into a reusable export.sh with RUN_MMLU/RUN_EXPORT toggles is the right call for a 120B model that needs different parallelism per stage. Inter-task path handoff (/cicd/megatron-lm/...), the task_2 ---separator arg wiring, and the EXPORT_DIR tag that task_2 reads all check out. The query.py max(1, ...) fix is correct and necessary — with the 8-row smoke set and --num-shards 1, the old min(16, 8//100) collapsed to 0 shards.
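To make the shard collapse concrete, here is a minimal Python sketch of the adjustment quoted in the review. The function name `adjust_num_shards` and the `clamp_to_one` switch are hypothetical and exist only to contrast the old and new behavior; the actual `query.py` mutates `args.num_shards` in place.

```python
def adjust_num_shards(requested: int, dataset_len: int, clamp_to_one: bool = True) -> int:
    """Shrink the shard count toward ~100 samples per shard, capped at 16.

    clamp_to_one=False reproduces the pre-fix expression min(16, len // 100);
    clamp_to_one=True reproduces the fixed expression max(1, min(16, len // 100)).
    """
    if requested * 100 > dataset_len:
        shrunk = min(16, dataset_len // 100)
        return max(1, shrunk) if clamp_to_one else shrunk
    return requested


# The 8-row smoke set with --num-shards 1, as described in the review.
print(adjust_num_shards(1, 8, clamp_to_one=False))  # pre-fix: 0 shards
print(adjust_num_shards(1, 8))                      # post-fix: 1 shard
```

With larger datasets the clamp is a no-op, so the fix should only change behavior when fewer than 100 samples are available.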
A few things to address before merge:
1. (also flagged by CodeRabbit) export.sh header defaults are wrong. Lines 36-37 document /scratchspace/... defaults, but the code uses /cicd/... (lines 41 and 49). Update the comment to match.
2. EXPORT_DIR diverges between the two scripts. export.sh writes to /cicd/export/... while quantize.sh still writes to /scratchspace/export/..., and the QUANT_CFG->tag logic differs (export.sh strips .yaml/.yml, quantize.sh does not). In this pipeline RUN_EXPORT=false so quantize's export path is dead, but a future caller running quantize.sh with inline export would get a different dir and tag than the standalone path. Worth unifying the tag logic and the /cicd vs /scratchspace choice so the two paths can't drift.
3. MMLU --fraction bumped 0.01 -> 0.05 in the shared quantize.sh. This is a 5x longer MMLU eval for every example that still runs MMLU inline, not just Nemotron, and it isn't mentioned in the PR description. Please confirm it's intentional for all callers; if it's only meant for this model it shouldn't be in the shared default.
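The unified derivation requested in item 2 can be modeled in a few lines of Python. This is only a sketch of the shell logic quoted in the CodeRabbit suggestion (basename, strip `.yaml`/`.yml`, prefix with `/cicd/export`), not the scripts themselves; the helper name `derive_export_dir` is hypothetical.

```python
import os


def derive_export_dir(mlm_model_cfg: str, quant_cfg: str, base: str = "/cicd/export") -> str:
    """Model of the shared EXPORT_DIR rule: <base>/<MLM_MODEL_CFG>_<tag>.

    The tag is the basename of QUANT_CFG with a trailing .yaml or .yml removed,
    mirroring the ${VAR%.yaml} / ${VAR%.yml} expansions in the suggested diff.
    """
    tag = os.path.basename(quant_cfg)
    for suffix in (".yaml", ".yml"):
        if tag.endswith(suffix):
            tag = tag[: -len(suffix)]
    return f"{base}/{mlm_model_cfg}_{tag}"


model = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
recipe = "models/Nemotron-3-Super-120B-A12B/super-nvfp4"

# With and without the extension, both callers should land on the same directory.
print(derive_export_dir(model, recipe))
print(derive_export_dir(model, recipe + ".yaml"))
```

If both scripts follow one rule like this, a downstream task can compute the path from `MLM_MODEL_CFG` and `QUANT_CFG` alone, regardless of which script performed the export.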
Minor / process:
- The smoke test only validates that the model serves and emits text; responses are dumped to `/cicd/vllm/...jsonl` and never graded, so the `gpqa_smoke.jsonl` answer keys aren't checked by anything. Fine for a smoke test; just won't catch accuracy regressions.
- PR template checkboxes are unfilled (notably tests + Changelog) and the Testing section is empty. A line on how this was validated would help, since CONTRIBUTING marks tests mandatory for new examples.
- A one-liner on what the Megatron-LM submodule bump (`86bf476` -> `c69697d`) pulls in would help reviewers.
Items 1-3 are the blockers; the rest is polish. Nice work.
Claude Review Summary
This PR adds a Nemotron-3-Super-120B PTQ launcher example (quantize → export → vLLM smoke), a standalone export wrapper, an MMLU/export gating mechanism in quantize.sh, a small dataset-shard fix in query.py, and a vLLM image bump. It's all under tools/launcher/, so no algorithm/state/export-format changes — risk is contained to the launcher.
Findings
- CRITICAL: 0
- IMPORTANT: 1
- SUGGESTION: 3
Highlights
- [IMPORTANT Compatibility] `quantize.sh` (inline-export branch) and the new `export.sh` derive different `EXPORT_DIR` paths for the same `(MLM_MODEL_CFG, QUANT_CFG)`: `/scratchspace/export/...<unstripped tag>` vs `/cicd/export/...<.yaml/.yml stripped>`. Masked here by `RUN_EXPORT=false`, but a latent bug for any future pipeline that combines `RUN_EXPORT=true` with a recipe-path `QUANT_CFG`.
- [SUGGESTION] `task_2 --save /cicd/vllm/...super-nvfp4.jsonl`: `query.py` treats `--save` as a directory, so this creates a directory whose name ends in `.jsonl` and writes shards inside it; the YAML comment ("Inspect responses at `/cicd/vllm/<model>.jsonl`") implies a file, which is misleading.
- [SUGGESTION Compatibility] `--fraction` was bumped from `0.01` → `0.05` and is hardcoded; existing Qwen3-8B / Qwen3-30B-A3B inline-MMLU runs will now take ~5× longer with no override. Consider an `MMLU_FRACTION` env var.
- [SUGGESTION] `export.sh` Optional-env header documents `/scratchspace/...` defaults but the script actually defaults to `/cicd/...` (and adds `HF_MODEL_CKPT`).
Risk
Low. The PR is launcher-only and the example is cleanly gated behind RUN_MMLU=false RUN_EXPORT=false. The one Important compatibility item is a latent path divergence rather than an active regression.
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
/claude review
Claude review passed — no blocking issues found. LGTM
Re-reviewed against commit 98765b1a. Prior round's findings have been addressed:
- `EXPORT_DIR` unified between `quantize.sh` and `export.sh` (`/cicd/export/` + `.yaml`/`.yml` strip on both sides); the latent path-divergence bug is gone.
- `export.sh` header defaults now correctly document the `/cicd/...` paths and `HF_MODEL_CKPT`.
- MMLU `--fraction` stays at `0.01` (no shared-default change).
- `--save` in task_2 now points at a directory-style path and the comment matches ("Inspect responses under `/cicd/vllm/<model>/`").
Verified end-to-end:
- task_2 `--model` path matches `export.sh` EXPORT_DIR derivation for `MLM_MODEL_CFG=nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16`, `QUANT_CFG=models/Nemotron-3-Super-120B-A12B/super-nvfp4`.
- World-size accounting: task_0 (1·1·4·1=4, ntasks=4), task_1 (1·4·1·1=4, ntasks=4), task_2 (vLLM single-process w/ TP=4, ntasks=1).
- `query.py` shard fix: with 8 prompts and `--num-shards 1`, `max(1, min(16, 0)) = 1`, lines up with `--shard-id-begin 0 --shard-id-step 1` so the single shard gets processed.
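The world-size check above is simple arithmetic and can be re-run mechanically. The sketch below assumes only what the review states: each Megatron task lists four parallelism degrees whose product must equal `ntasks`. The review does not name the four factors, so they are treated as opaque here; the `tasks` table and the `world_size` helper are illustrative, not part of the launcher.

```python
from math import prod


def world_size(parallel_degrees):
    """World size is the product of all parallelism degrees."""
    return prod(parallel_degrees)


# Degrees and ntasks copied from the review; factor names are not given there.
tasks = {
    "task_0": {"degrees": (1, 1, 4, 1), "ntasks": 4},
    "task_1": {"degrees": (1, 4, 1, 1), "ntasks": 4},
}

for name, cfg in tasks.items():
    # A mismatch would mean the launcher starts too few or too many ranks.
    assert world_size(cfg["degrees"]) == cfg["ntasks"], name
    print(name, world_size(cfg["degrees"]))
```

task_2 is left out on purpose: per the review it is a single vLLM process that handles TP=4 internally, so `ntasks=1` is expected there.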
Launcher-only PR; no algorithm/state/export-format changes. Risk is low.
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml`:
- Line 32: The QUANT_CFG entry in megatron_lm_ptq.yaml points to
models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib which is actually a
shared, model-agnostic recipe — clarify this by either adding an explicit inline
comment next to the QUANT_CFG key stating that super-nvfp4-max-calib.yaml is
reused across models (model-agnostic/shared), or create/move the recipe to a
neutral shared path/name (e.g.,
modelopt_recipes/models/shared/super-nvfp4-max-calib.yaml) and update QUANT_CFG
to that path so the reference is not misleading; update the QUANT_CFG value
and/or add the comment in the
NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml file and ensure the
recipe filename super-nvfp4-max-calib.yaml is the one referenced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 7f85e840-5ccb-4e92-ab4e-76b43e8829b1
📒 Files selected for processing (3)
- tools/launcher/common/vllm/gpqa_sample.jsonl
- tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml
- tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml
💤 Files with no reviewable changes (1)
- tools/launcher/common/vllm/gpqa_sample.jsonl
🚧 Files skipped from review as they are similar to previous changes (1)
- tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml
    - --calib-size 32
    environment:
    - MLM_MODEL_CFG: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
    - QUANT_CFG: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Search for quantization configs and verify if 120B config is model-specific.
# Find all super-nvfp4-max-calib configs
echo "=== Finding all super-nvfp4-max-calib configs ==="
fd -t f 'super-nvfp4-max-calib' models/
# Check if there's a 550B-specific config
echo -e "\n=== Searching for 550B or Ultra quantization configs ==="
fd -t f -e yaml -e json -e py . models/ | rg -i '550b|ultra'
# Examine the 120B config for model-specific parameters
echo -e "\n=== Examining 120B config for model-specific parameters ==="
fd -t f 'super-nvfp4-max-calib' models/Nemotron-3-Super-120B-A12B/ --exec cat {}
Repository: NVIDIA/Model-Optimizer
Length of output: 565
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Repo top-level =="
ls -la
echo -e "\n== Find any 'models' directory in repo =="
fd -td models .
echo -e "\n== Locate the specific YAML under review =="
fd -t f 'megatron_lm_ptq\.yaml' tools/launcher/examples -e '*.lock' -e '*.md'
echo -e "\n== Show the relevant lines around QUANT_CFG =="
TARGET="tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml"
if [ -f "$TARGET" ]; then
nl -ba "$TARGET" | sed -n '20,60p'
else
echo "Missing: $TARGET"
fi
echo -e "\n== Check whether QUANT_CFG path exists as a file in this repo =="
QUANT_CFG="models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
if [ -f "$QUANT_CFG" ]; then
echo "Exists: $QUANT_CFG"
nl -ba "$QUANT_CFG" | head -n 200
else
echo "Not found in repo: $QUANT_CFG"
echo "Look for similarly named config files:"
fd -t f 'super-nvfp4-max-calib' . | head -n 200
fi
Repository: NVIDIA/Model-Optimizer
Length of output: 2536
🏁 Script executed:
#!/bin/bash
set -euo pipefail
TARGET="tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml"
echo "== Does target YAML exist? =="
if [ -f "$TARGET" ]; then
echo "FOUND: $TARGET"
else
echo "MISSING: $TARGET"
fi
echo -e "\n== tools/launcher/examples YAMLs named megatron_lm_ptq.yaml (top 50) =="
fd -t f 'megatron_lm_ptq\.yaml' tools/launcher/examples | head -n 50 || true
echo -e "\n== Locate the Nemotron Ultra 550B config file path in repo (top 50) =="
fd -t f 'NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16' . | head -n 50 || true
echo -e "\n== Print lines 20-60 of target YAML (if present) =="
if [ -f "$TARGET" ]; then
awk 'NR>=20 && NR<=60 {printf "%d\t%s\n", NR, $0}' "$TARGET"
fi
echo -e "\n== Search for QUANT_CFG target string 'super-nvfp4-max-calib' (top 50 matches) =="
rg -n "super-nvfp4-max-calib" . | head -n 50 || true
echo -e "\n== List files that contain the basename 'super-nvfp4-max-calib' under repo =="
fd -t f 'super-nvfp4-max-calib' . | head -n 50 || true
echo -e "\n== Does path 'models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib' exist as a file anywhere? =="
if [ -f "models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib" ]; then
echo "FOUND as file: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
else
echo "NOT found as file: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
fi
echo -e "\n== Search for 'Nemotron-3-Super-120B-A12B' occurrences (top 50) =="
rg -n "Nemotron-3-Super-120B-A12B" . | head -n 50 || true
Repository: NVIDIA/Model-Optimizer
Length of output: 8208
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true
echo -e "\n== Show relevant parts of quantize.sh =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
rg -n "QUANT_CFG" "$QUANT_SCRIPT" || true
awk 'NR>=1 && NR<=200 {print NR "\t" $0}' "$QUANT_SCRIPT" | sed -n '1,160p'
else
echo "Missing $QUANT_SCRIPT"
fi
echo -e "\n== Show relevant parts of export.sh (QUANT_CFG usage) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
rg -n "QUANT_CFG" "$EXPORT_SCRIPT" || true
awk 'NR>=1 && NR<=200 {print NR "\t" $0}' "$EXPORT_SCRIPT" | sed -n '1,180p'
else
echo "Missing $EXPORT_SCRIPT"
fi
echo -e "\n== Inspect recipe file header for super-nvfp4-max-calib.yaml =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
awk 'NR<=120 {print NR "\t" $0}' "$RECIPE" | sed -n '1,120p'
else
echo "Missing $RECIPE"
fi
Repository: NVIDIA/Model-Optimizer
Length of output: 14798
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true
echo -e "\n== Show relevant parts of quantize.sh =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
echo -e "\n-- Lines containing QUANT_CFG in quantize.sh --"
rg -n "QUANT_CFG" "$QUANT_SCRIPT" || true
echo -e "\n-- Snippet around first QUANT_CFG hit --"
FIRST=$(rg -n "QUANT_CFG" "$QUANT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
if [ -n "${FIRST:-}" ]; then
START=$((FIRST-20)); END=$((FIRST+80))
awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$QUANT_SCRIPT"
else
awk 'NR<=220 {print NR "\t" $0}' "$QUANT_SCRIPT"
fi
else
echo "Missing $QUANT_SCRIPT"
fi
echo -e "\n== Show relevant parts of export.sh (QUANT_CFG usage) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
echo -e "\n-- Lines containing QUANT_CFG in export.sh --"
rg -n "QUANT_CFG" "$EXPORT_SCRIPT" || true
FIRST=$(rg -n "QUANT_CFG" "$EXPORT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
if [ -n "${FIRST:-}" ]; then
START=$((FIRST-20)); END=$((FIRST+80))
awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$EXPORT_SCRIPT"
else
awk 'NR<=220 {print NR "\t" $0}' "$EXPORT_SCRIPT"
fi
else
echo "Missing $EXPORT_SCRIPT"
fi
echo -e "\n== Inspect recipe file header for super-nvfp4-max-calib.yaml =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
awk 'NR<=160 {print NR "\t" $0}' "$RECIPE"
else
echo "Missing $RECIPE"
fi
Repository: NVIDIA/Model-Optimizer
Length of output: 14190
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true
echo -e "\n== Inspect quantize.sh how QUANT_CFG is used =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
FIRST=$(rg -n "QUANT_CFG" "$QUANT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
if [ -n "${FIRST:-}" ]; then
START=$((FIRST-25)); END=$((FIRST+120))
awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$QUANT_SCRIPT"
else
echo "No QUANT_CFG string in $QUANT_SCRIPT"
fi
else
echo "Missing $QUANT_SCRIPT"
fi
echo -e "\n== Inspect export.sh how QUANT_CFG is used (if at all) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
FIRST=$(rg -n "QUANT_CFG" "$EXPORT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
if [ -n "${FIRST:-}" ]; then
START=$((FIRST-25)); END=$((FIRST+120))
awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$EXPORT_SCRIPT"
else
echo "No QUANT_CFG string in $EXPORT_SCRIPT"
fi
else
echo "Missing $EXPORT_SCRIPT"
fi
echo -e "\n== Inspect first ~120 lines of the 120B recipe referenced =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
awk 'NR<=120 {print NR "\t" $0}' "$RECIPE"
else
echo "Missing $RECIPE"
fi
Repository: NVIDIA/Model-Optimizer
Length of output: 13258
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search for super-nvfp4-max-calib recipes in repo =="
fd -t f 'super-nvfp4-max-calib' . | sed -e 's|^\./||' | head -n 200
echo -e "\n== Search for Nemotron-3-Ultra-550B / Ultra in modelopt_recipes =="
rg -n "Ultra-550B|Ultra-5|550B|Nemotron-3-Ultra" modelopt_recipes | head -n 200 || true
echo -e "\n== Inspect QUANT_CFG resolution in tools/launcher/common/megatron_lm/quantize/task.py =="
TASK="tools/launcher/common/megatron_lm/quantize/task.py"
if [ -f "$TASK" ]; then
awk 'NR<=230 {print NR "\t" $0}' "$TASK"
else
echo "Missing $TASK"
fi
echo -e "\n== Search for any helper resolving QUANT_CFG into a recipe path =="
rg -n "quant_cfg|quantCfg|QUANT_CFG.*yaml|NVFP4_DEFAULT_CFG" tools/launcher/common/megatron_lm/quantize tools/launcher/common -S || true
echo -e "\n== Does the exact path without extension exist? =="
if [ -f "modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib" ]; then
echo "Found extension-less 120B config (unexpected)"
else
echo "Extension-less 120B config not found (expected .yaml)"
fi
ls -la modelopt_recipes/models/Nemotron-3-Super-120B-A12B | rg -n "super-nvfp4-max-calib" || true
Repository: NVIDIA/Model-Optimizer
Length of output: 8499
Clarify QUANT_CFG recipe reuse for the 550B pipeline
tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml sets QUANT_CFG to models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib. In this repo, the only matching super-nvfp4-max-calib.yaml recipe exists under modelopt_recipes/models/Nemotron-3-Super-120B-A12B/ (no 550B/Ultra counterpart), so this is likely intentional reuse—but the path is misleading. Add an explicit comment documenting that the recipe is model-agnostic/shared, or move/create a shared (non-120B-named) recipe target to avoid future confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml`
at line 32, The QUANT_CFG entry in megatron_lm_ptq.yaml points to
models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib which is actually a
shared, model-agnostic recipe — clarify this by either adding an explicit inline
comment next to the QUANT_CFG key stating that super-nvfp4-max-calib.yaml is
reused across models (model-agnostic/shared), or create/move the recipe to a
neutral shared path/name (e.g.,
modelopt_recipes/models/shared/super-nvfp4-max-calib.yaml) and update QUANT_CFG
to that path so the reference is not misleading; update the QUANT_CFG value
and/or add the comment in the
NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml file and ensure the
recipe filename super-nvfp4-max-calib.yaml is the one referenced.
General question on the design of all scripts in this directory. Why do we need yet another export/quantize.sh on top of M-LM's export/quantize.sh?
I think this calls the scripts in Megatron-LM which are under modules? that's a good question though why can't we just call the scripts in modules/Megatron-LM instead of wrapping them again? @ChenhanYu do you know
we can address this in a future PR, thanks!
kevalmorabia97 left a comment
Approving to unblock
What does this PR do?
Type of change: New example
New launcher example for Nemotron Super with PTQ + export + vLLM smoke test on a small GPQA-style dataset.
Usage
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores