Nemotron Ultra & Super launcher examples by jenchen13 · Pull Request #1609 · NVIDIA/Model-Optimizer

jenchen13 · 2026-06-02T20:58:58Z

What does this PR do?

Type of change: New example

New launcher example for Nemotron Super: PTQ + export + vLLM smoke test on a small GPQA-style dataset

Usage

# Usage:
#   source .env-slurm
#   cd tools/launcher
#   uv run launch.py --yaml examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml --yes

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

New Features
- Added checkpoint export capability for quantized models to Hugging Face format.
- Introduced complete quantization pipelines with conditional MMLU evaluation and model export stages.
Bug Fixes
- Fixed num_shards calculation to prevent invalid minimum values.
Documentation
- Updated vLLM version requirements for optimal NVFP4 model performance.
- Enhanced quantization pipeline documentation with improved output paths and conditional execution details.
Chores
- Updated Megatron-LM module to latest version.
- Added sample dataset for model evaluation testing.

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

coderabbitai · 2026-06-02T20:59:11Z

📝 Walkthrough

This PR implements an end-to-end Megatron-LM PTQ pipeline for quantizing, exporting, and validating Nemotron-3 models. A new export wrapper script bridges quantized checkpoints to Hugging Face format; quantization orchestration becomes conditional and persists outputs under standardized /cicd/ paths; and two complete Slurm job YAMLs wire quantization, export, and vLLM smoke validation with explicit parallelism settings and per-task resource allocation.

Changes

Nemotron-3 PTQ Pipeline

Layer / File(s)	Summary
Export-to-HF wrapper script `tools/launcher/common/megatron_lm/export/export.sh`	New bash script that sources utilities, registers error handling, sets defaults for MLM checkpoint/export/HF paths, disables internal installation, forwards CLI args, invokes Megatron-LM export with explicit parallelism parameters, and outputs exported artifacts.
Quantization orchestration and conditional steps `tools/launcher/common/megatron_lm/quantize/quantize.sh`	Updates quantization script to document end-to-end PTQ flow, persist outputs under `/cicd/` paths, derive export directory name from `QUANT_CFG` basename, remove unused `CONVERT_EXE` variable, and wrap MMLU and export stages in conditional blocks with GPU-count and `EXPORT_PP` computation moved inside the export conditional.
Nemotron-3-Super-120B PTQ pipeline job `tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml`	New three-stage Slurm job YAML: task_0 quantizes with fast calibration (TP=1, PP=1, EP=4), task_1 exports with updated parallelism (TP=1, PP=4, EP=1), and task_2 runs vLLM smoke testing against exported checkpoint using GPQA samples. Each task includes container image, resource allocation, and time configuration.
Nemotron-3-Ultra-550B PTQ pipeline job `tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml`	New three-stage Slurm job YAML with equivalent structure: quantization (task_0), export (task_1), and vLLM generation test (task_2) with per-task container images and resource configurations.
vLLM smoke test data and version requirement `tools/launcher/common/vllm/gpqa_sample.jsonl`, `tools/launcher/common/vllm/query.sh`	Adds new GPQA-style JSONL dataset with 8 sample prompts requesting multiple-choice answers and justifications, and updates vLLM inline requirement note from v0.15.0+ to v0.21.0+ for NVFP4 support on Blackwell GPUs.
Configuration updates and utility fixes `tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml`, `tools/launcher/common/query.py`, `tools/launcher/modules/Megatron-LM`	Adjusts Nemotron-3 Bridge import example from 8-GPU to 4-GPU node configuration, fixes dataset sharding clamp in query utility to ensure minimum of 1 shard, and bumps Megatron-LM submodule to newer upstream commit.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

ChenhanYu
kevalmorabia97
mxinO

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Nemotron Ultra & Super launcher examples' directly corresponds to the main changes: adding new launcher example YAML files for both Nemotron-3-Super-120B and Nemotron-3-Ultra-550B models, plus supporting infrastructure.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No security anti-patterns found: torch.load(weights_only=False), numpy.load(allow_pickle=True), eval/exec, # nosec, or problematic licenses absent from code changes.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 1

🧹 Nitpick comments (2)

tools/launcher/common/query.py (1)
210-211: 💤 Low value

LGTM!

The fix correctly prevents num_shards from becoming zero when the dataset is small, which would cause dataset.shard() to fail at line 223.

Optional: Consider documenting the sharding heuristics.

The magic numbers (100 samples per shard target, 16 shard cap) reflect non-obvious design decisions that would benefit from a brief comment.
📝 Suggested documentation
 if args.num_shards * 100 > len(dataset):
+    # Shrink num_shards to maintain ~100 samples/shard, capped at [1, 16]
     args.num_shards = max(1, min(16, len(dataset) // 100))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/common/query.py` around lines 210 - 211, Add a brief inline
comment above the num_shards adjustment explaining the sharding heuristic: we
target ~100 samples per shard and cap shards at 16 to avoid too many tiny
shards, and ensure num_shards stays at least 1 to prevent dataset.shard()
failures; reference the adjustment logic that checks "if args.num_shards * 100 >
len(dataset): args.num_shards = max(1, min(16, len(dataset) // 100))" and
mention the rationale for the constants 100 and 16 so future readers know why
those magic numbers were chosen.
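
A minimal sketch of the clamp discussed in this comment, assuming only what the quoted diff shows (a target of about 100 samples per shard and a cap of 16). `clamp_num_shards` and its `fixed` flag are hypothetical names for illustration, not functions in query.py; the `fixed=False` branch reproduces the pre-fix expression for comparison.

```python
def clamp_num_shards(requested: int, dataset_len: int, *, fixed: bool = True) -> int:
    """Hypothetical mirror of the num_shards adjustment quoted in the diff.

    When the requested shard count would leave fewer than ~100 samples per
    shard, shrink it to dataset_len // 100, capped at 16. The fixed variant
    additionally enforces a floor of 1.
    """
    if requested * 100 > dataset_len:
        shrunk = min(16, dataset_len // 100)
        return max(1, shrunk) if fixed else shrunk
    return requested


if __name__ == "__main__":
    # 8-row smoke set with --num-shards 1: pre-fix vs post-fix behaviour.
    print("pre-fix :", clamp_num_shards(1, 8, fixed=False))
    print("post-fix:", clamp_num_shards(1, 8))
```

With an 8-row dataset the pre-fix expression yields zero shards, which is the case the `max(1, ...)` floor guards against.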
tools/launcher/common/megatron_lm/quantize/quantize.sh (1)
38-40: ⚡ Quick win

Inline-export EXPORT_DIR diverges from the standalone export wrapper.

Here EXPORT_DIR is /scratchspace/export/... and _QUANT_CFG_TAG keeps any .yaml/.yml suffix. The wrapper export/export.sh instead uses /cicd/export/... and strips the extension (Lines 38-43 there). So a chained run (quantize inline export → a later vLLM task expecting the wrapper's path) would point at different directories whenever RUN_EXPORT=true and/or QUANT_CFG is a recipe file. This pipeline avoids it via RUN_EXPORT=false, but the mismatch is a latent bug.
♻️ Align base path and tag derivation with export.sh
 # If QUANT_CFG is a recipe, use the basename
 _QUANT_CFG_TAG="$(basename "${QUANT_CFG}")"
-export EXPORT_DIR="/scratchspace/export/${MLM_MODEL_CFG}_${_QUANT_CFG_TAG}"
+_QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yaml}"
+_QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yml}"
+export EXPORT_DIR="/cicd/export/${MLM_MODEL_CFG}_${_QUANT_CFG_TAG}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/common/megatron_lm/quantize/quantize.sh` around lines 38 - 40,
The inline export sets EXPORT_DIR to /scratchspace/export/... and leaves
_QUANT_CFG_TAG with the recipe extension, causing a mismatch with
export/export.sh which uses /cicd/export/... and strips the .yaml/.yml
extension; update the inline logic to mirror export/export.sh by (1) using the
same base path (/cicd/export) for EXPORT_DIR and (2) strip QUANT_CFG file
extensions when computing _QUANT_CFG_TAG (remove .yaml/.yml), ensuring
EXPORT_DIR derives from MLM_MODEL_CFG and the cleaned _QUANT_CFG_TAG so
downstream tasks see the same path as export/export.sh (refer to symbols
EXPORT_DIR, _QUANT_CFG_TAG, and QUANT_CFG).
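
To make the divergence concrete, here is a small Python sketch of the two derivations as this comment describes them. It is an illustration only: the real logic lives in the shell scripts, and `export_dir`, `base`, and `strip_ext` are hypothetical names introduced here. The recipe path in the example is likewise a made-up input with a `.yaml` suffix.

```python
import posixpath


def export_dir(mlm_model_cfg: str, quant_cfg: str, *,
               base: str = "/cicd/export", strip_ext: bool = True) -> str:
    """Hypothetical mirror of the EXPORT_DIR derivation described in the review."""
    tag = posixpath.basename(quant_cfg)
    if strip_ext:
        # Same order as the suggested shell change: drop .yaml, then .yml.
        for suffix in (".yaml", ".yml"):
            if tag.endswith(suffix):
                tag = tag[: -len(suffix)]
    return f"{base}/{mlm_model_cfg}_{tag}"


if __name__ == "__main__":
    model = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
    recipe = "models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml"  # illustrative input
    # Inline-export behaviour as described for quantize.sh before the fix.
    print(export_dir(model, recipe, base="/scratchspace/export", strip_ext=False))
    # Wrapper behaviour as described for export/export.sh.
    print(export_dir(model, recipe))
```

The two printed paths differ in both base directory and tag, which is why a downstream task reading one path would miss artifacts written to the other.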

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/common/megatron_lm/export/export.sh`:
- Around line 30-32: Update the header doc comments to match the actual default
paths and documented variables used in the script: change the /scratchspace/...
defaults to the /cicd/... paths and add documentation for HF_MODEL_CKPT (the HF
checkpoint default) so the comment block reflects the real defaults for
MLM_MODEL_CKPT, EXPORT_DIR, and HF_MODEL_CKPT; reference the variable names
MLM_MODEL_CKPT, EXPORT_DIR, and HF_MODEL_CKPT in the updated comment so
operators are not misled.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c3f2c55d-967c-4cf9-8e1f-f20b08b83161

📥 Commits

Reviewing files that changed from the base of the PR and between f21977a and 96ce9b9.

📒 Files selected for processing (8)

tools/launcher/common/megatron_lm/export/export.sh
tools/launcher/common/megatron_lm/quantize/quantize.sh
tools/launcher/common/query.py
tools/launcher/common/vllm/gpqa_smoke.jsonl
tools/launcher/common/vllm/query.sh
tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml
tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml
tools/launcher/modules/Megatron-LM

codecov · 2026-06-02T21:13:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.00%. Comparing base (8f96832) to head (97a4b3b).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1609      +/-   ##
==========================================
- Coverage   77.41%   77.00%   -0.41%     
==========================================
  Files         480      482       +2     
  Lines       52499    53590    +1091     
==========================================
+ Hits        40642    41268     +626     
- Misses      11857    12322     +465
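
The percentages in the table can be cross-checked from the hit and line counts. The sketch below assumes the report truncates to two decimals rather than rounding to nearest; that is an assumption made here because it reproduces the printed figures, not something the report states. `pct_truncated` is a hypothetical helper.

```python
import math


def pct_truncated(hits: int, lines: int) -> float:
    """Coverage percentage truncated (not rounded) to two decimals (assumed mode)."""
    return math.floor(hits / lines * 10_000) / 100


base = pct_truncated(40642, 52499)   # main
head = pct_truncated(41268, 53590)   # PR #1609
delta = round(head - base, 2)

if __name__ == "__main__":
    print(f"base={base:.2f}% head={head:.2f}% delta={delta:+.2f}%")
    # Hits and misses should add up to the line counts in each column.
    print(40642 + 11857 == 52499, 41268 + 12322 == 53590)
```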

Flag	Coverage Δ
regression	`15.23% <ø> (+0.10%)`	⬆️
unit	`53.93% <ø> (+0.17%)`	⬆️

ChenhanYu · 2026-06-02T22:57:35Z

/claude review

ChenhanYu

Reviewed the full diff plus query.sh/quantize.sh context. Overall a clean, well-scoped example — factoring export out of quantize.sh into a reusable export.sh with RUN_MMLU/RUN_EXPORT toggles is the right call for a 120B model that needs different parallelism per stage. Inter-task path handoff (/cicd/megatron-lm/...), the task_2 ---separator arg wiring, and the EXPORT_DIR tag that task_2 reads all check out. The query.py max(1, ...) fix is correct and necessary — with the 8-row smoke set and --num-shards 1, the old min(16, 8//100) collapsed to 0 shards.

A few things to address before merge:

1. (also flagged by CodeRabbit) export.sh header defaults are wrong. Lines 36-37 document /scratchspace/... defaults, but the code uses /cicd/... (lines 41 and 49). Update the comment to match.

2. EXPORT_DIR diverges between the two scripts. export.sh writes to /cicd/export/... while quantize.sh still writes to /scratchspace/export/..., and the QUANT_CFG->tag logic differs (export.sh strips .yaml/.yml, quantize.sh does not). In this pipeline RUN_EXPORT=false so quantize's export path is dead, but a future caller running quantize.sh with inline export would get a different dir and tag than the standalone path. Worth unifying the tag logic and the /cicd vs /scratchspace choice so the two paths can't drift.

3. MMLU --fraction bumped 0.01 -> 0.05 in the shared quantize.sh. This is a 5x longer MMLU eval for every example that still runs MMLU inline, not just Nemotron, and it isn't mentioned in the PR description. Please confirm it's intentional for all callers; if it's only meant for this model it shouldn't be in the shared default.

Minor / process:

The smoke test only validates that the model serves and emits text — responses are dumped to /cicd/vllm/...jsonl and never graded, so the gpqa_smoke.jsonl answer keys aren't checked by anything. Fine for a smoke test; just won't catch accuracy regressions.
PR template checkboxes are unfilled (notably tests + Changelog) and the Testing section is empty — a line on how this was validated would help, since CONTRIBUTING marks tests mandatory for new examples.
A one-liner on what the Megatron-LM submodule bump (86bf476 -> c69697d) pulls in would help reviewers.

Items 1-3 are the blockers; the rest is polish. Nice work.

claude

Claude Review Summary

This PR adds a Nemotron-3-Super-120B PTQ launcher example (quantize → export → vLLM smoke), a standalone export wrapper, an MMLU/export gating mechanism in quantize.sh, a small dataset-shard fix in query.py, and a vLLM image bump. It's all under tools/launcher/, so no algorithm/state/export-format changes — risk is contained to the launcher.

Findings

CRITICAL: 0
IMPORTANT: 1
SUGGESTION: 3

Highlights

[IMPORTANT Compatibility] quantize.sh (inline-export branch) and the new export.sh derive different EXPORT_DIR paths for the same (MLM_MODEL_CFG, QUANT_CFG): /scratchspace/export/...<unstripped tag> vs /cicd/export/...<.yaml/.yml stripped>. Masked here by RUN_EXPORT=false, but a latent bug for any future pipeline that combines RUN_EXPORT=true with a recipe-path QUANT_CFG.
[SUGGESTION] task_2 --save /cicd/vllm/...super-nvfp4.jsonl — query.py treats --save as a directory, so this creates a directory whose name ends in .jsonl and writes shards inside it; the YAML comment ("Inspect responses at /cicd/vllm/<model>.jsonl") implies a file, which is misleading.
[SUGGESTION Compatibility] --fraction was bumped from 0.01 → 0.05 and is hardcoded; existing Qwen3-8B / Qwen3-30B-A3B inline-MMLU runs will now take ~5× longer with no override. Consider an MMLU_FRACTION env var.
[SUGGESTION] export.sh Optional-env header documents /scratchspace/... defaults but the script actually defaults to /cicd/... (and adds HF_MODEL_CKPT).

Risk

Low. The PR is launcher-only and the example is cleanly gated behind RUN_MMLU=false RUN_EXPORT=false. The one Important compatibility item is a latent path divergence rather than an active regression.

jenchen13 · 2026-06-03T17:41:27Z

/claude review

claude

Claude review passed — no blocking issues found. LGTM

Re-reviewed against commit 98765b1a. Prior round's findings have been addressed:

EXPORT_DIR unified between quantize.sh and export.sh (/cicd/export/ + .yaml/.yml strip on both sides) — the latent path-divergence bug is gone.
export.sh header defaults now correctly document the /cicd/... paths and HF_MODEL_CKPT.
MMLU --fraction stays at 0.01 (no shared-default change).
--save in task_2 now points at a directory-style path and the comment matches ("Inspect responses under /cicd/vllm/<model>/").

Verified end-to-end:

task_2 --model path matches export.sh EXPORT_DIR derivation for MLM_MODEL_CFG=nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16, QUANT_CFG=models/Nemotron-3-Super-120B-A12B/super-nvfp4.
World-size accounting: task_0 (1·1·4·1=4, ntasks=4), task_1 (1·4·1·1=4, ntasks=4), task_2 (vLLM single-process w/ TP=4, ntasks=1).
query.py shard fix: with 8 prompts and --num-shards 1, max(1, min(16, 0)) = 1, lines up with --shard-id-begin 0 --shard-id-step 1 so the single shard gets processed.
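
The world-size accounting in the list above can be written as a short check. This is a sketch under stated assumptions: the review gives four factors per stage without naming them (the walkthrough suggests the first three are TP, PP, and EP), so they are treated here simply as an opaque tuple whose product should equal the Slurm `ntasks`. `STAGES`, `world_size`, and `check` are hypothetical names.

```python
from math import prod

# Parallelism factors and task counts exactly as quoted in the review.
STAGES = {
    "task_0": {"factors": (1, 1, 4, 1), "ntasks": 4},  # quantize
    "task_1": {"factors": (1, 4, 1, 1), "ntasks": 4},  # export
}


def world_size(factors: tuple) -> int:
    """Product of the parallelism degrees for one stage."""
    return prod(factors)


def check(stages: dict) -> dict:
    """Map each stage name to whether its world size matches ntasks."""
    return {name: world_size(s["factors"]) == s["ntasks"] for name, s in stages.items()}


if __name__ == "__main__":
    print(check(STAGES))
```

task_2 is left out on purpose: per the review it runs vLLM as a single process with TP=4, so its `ntasks=1` is not a product of these factors.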

Launcher-only PR; no algorithm/state/export-format changes. Risk is low.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml`:
- Line 32: The QUANT_CFG entry in megatron_lm_ptq.yaml points to
models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib which is actually a
shared, model-agnostic recipe — clarify this by either adding an explicit inline
comment next to the QUANT_CFG key stating that super-nvfp4-max-calib.yaml is
reused across models (model-agnostic/shared), or create/move the recipe to a
neutral shared path/name (e.g.,
modelopt_recipes/models/shared/super-nvfp4-max-calib.yaml) and update QUANT_CFG
to that path so the reference is not misleading; update the QUANT_CFG value
and/or add the comment in the
NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml file and ensure the
recipe filename super-nvfp4-max-calib.yaml is the one referenced.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7f85e840-5ccb-4e92-ab4e-76b43e8829b1

📥 Commits

Reviewing files that changed from the base of the PR and between 98765b1 and 97a4b3b.

📒 Files selected for processing (3)

tools/launcher/common/vllm/gpqa_sample.jsonl
tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml
tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml

💤 Files with no reviewable changes (1)

tools/launcher/common/vllm/gpqa_sample.jsonl

🚧 Files skipped from review as they are similar to previous changes (1)

tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml

coderabbitai · 2026-06-04T14:13:32Z

+      - --calib-size 32
+    environment:
+      - MLM_MODEL_CFG: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
+      - QUANT_CFG: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for quantization configs and verify if 120B config is model-specific.

# Find all super-nvfp4-max-calib configs
echo "=== Finding all super-nvfp4-max-calib configs ==="
fd -t f 'super-nvfp4-max-calib' models/

# Check if there's a 550B-specific config
echo -e "\n=== Searching for 550B or Ultra quantization configs ==="
fd -t f -e yaml -e json -e py . models/ | rg -i '550b|ultra'

# Examine the 120B config for model-specific parameters
echo -e "\n=== Examining 120B config for model-specific parameters ==="
fd -t f 'super-nvfp4-max-calib' models/Nemotron-3-Super-120B-A12B/ --exec cat {}

Repository: NVIDIA/Model-Optimizer

Length of output: 565

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Repo top-level =="
ls -la

echo -e "\n== Find any 'models' directory in repo =="
fd -td models .

echo -e "\n== Locate the specific YAML under review =="
fd -t f 'megatron_lm_ptq\.yaml' tools/launcher/examples -e '*.lock' -e '*.md'

echo -e "\n== Show the relevant lines around QUANT_CFG =="
TARGET="tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml"
if [ -f "$TARGET" ]; then
  nl -ba "$TARGET" | sed -n '20,60p'
else
  echo "Missing: $TARGET"
fi

echo -e "\n== Check whether QUANT_CFG path exists as a file in this repo =="
QUANT_CFG="models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
if [ -f "$QUANT_CFG" ]; then
  echo "Exists: $QUANT_CFG"
  nl -ba "$QUANT_CFG" | head -n 200
else
  echo "Not found in repo: $QUANT_CFG"
  echo "Look for similarly named config files:"
  fd -t f 'super-nvfp4-max-calib' . | head -n 200
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 2536

🏁 Script executed:

#!/bin/bash
set -euo pipefail

TARGET="tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml"

echo "== Does target YAML exist? =="
if [ -f "$TARGET" ]; then
  echo "FOUND: $TARGET"
else
  echo "MISSING: $TARGET"
fi

echo -e "\n== tools/launcher/examples YAMLs named megatron_lm_ptq.yaml (top 50) =="
fd -t f 'megatron_lm_ptq\.yaml' tools/launcher/examples | head -n 50 || true

echo -e "\n== Locate the Nemotron Ultra 550B config file path in repo (top 50) =="
fd -t f 'NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16' . | head -n 50 || true

echo -e "\n== Print lines 20-60 of target YAML (if present) =="
if [ -f "$TARGET" ]; then
  awk 'NR>=20 && NR<=60 {printf "%d\t%s\n", NR, $0}' "$TARGET"
fi

echo -e "\n== Search for QUANT_CFG target string 'super-nvfp4-max-calib' (top 50 matches) =="
rg -n "super-nvfp4-max-calib" . | head -n 50 || true

echo -e "\n== List files that contain the basename 'super-nvfp4-max-calib' under repo =="
fd -t f 'super-nvfp4-max-calib' . | head -n 50 || true

echo -e "\n== Does path 'models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib' exist as a file anywhere? =="
if [ -f "models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib" ]; then
  echo "FOUND as file: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
else
  echo "NOT found as file: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
fi

echo -e "\n== Search for 'Nemotron-3-Super-120B-A12B' occurrences (top 50) =="
rg -n "Nemotron-3-Super-120B-A12B" . | head -n 50 || true

Repository: NVIDIA/Model-Optimizer

Length of output: 8208

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true

echo -e "\n== Show relevant parts of quantize.sh =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
  rg -n "QUANT_CFG" "$QUANT_SCRIPT" || true
  awk 'NR>=1 && NR<=200 {print NR "\t" $0}' "$QUANT_SCRIPT" | sed -n '1,160p'
else
  echo "Missing $QUANT_SCRIPT"
fi

echo -e "\n== Show relevant parts of export.sh (QUANT_CFG usage) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
  rg -n "QUANT_CFG" "$EXPORT_SCRIPT" || true
  awk 'NR>=1 && NR<=200 {print NR "\t" $0}' "$EXPORT_SCRIPT" | sed -n '1,180p'
else
  echo "Missing $EXPORT_SCRIPT"
fi

echo -e "\n== Inspect recipe file header for super-nvfp4-max-calib.yaml =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
  awk 'NR<=120 {print NR "\t" $0}' "$RECIPE" | sed -n '1,120p'
else
  echo "Missing $RECIPE"
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 14798

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true

echo -e "\n== Show relevant parts of quantize.sh =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
  echo -e "\n-- Lines containing QUANT_CFG in quantize.sh --"
  rg -n "QUANT_CFG" "$QUANT_SCRIPT" || true
  echo -e "\n-- Snippet around first QUANT_CFG hit --"
  FIRST=$(rg -n "QUANT_CFG" "$QUANT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
  if [ -n "${FIRST:-}" ]; then
    START=$((FIRST-20)); END=$((FIRST+80))
    awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$QUANT_SCRIPT"
  else
    awk 'NR<=220 {print NR "\t" $0}' "$QUANT_SCRIPT"
  fi
else
  echo "Missing $QUANT_SCRIPT"
fi

echo -e "\n== Show relevant parts of export.sh (QUANT_CFG usage) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
  echo -e "\n-- Lines containing QUANT_CFG in export.sh --"
  rg -n "QUANT_CFG" "$EXPORT_SCRIPT" || true
  FIRST=$(rg -n "QUANT_CFG" "$EXPORT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
  if [ -n "${FIRST:-}" ]; then
    START=$((FIRST-20)); END=$((FIRST+80))
    awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$EXPORT_SCRIPT"
  else
    awk 'NR<=220 {print NR "\t" $0}' "$EXPORT_SCRIPT"
  fi
else
  echo "Missing $EXPORT_SCRIPT"
fi

echo -e "\n== Inspect recipe file header for super-nvfp4-max-calib.yaml =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
  awk 'NR<=160 {print NR "\t" $0}' "$RECIPE"
else
  echo "Missing $RECIPE"
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 14190

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true

echo -e "\n== Inspect quantize.sh how QUANT_CFG is used =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
  FIRST=$(rg -n "QUANT_CFG" "$QUANT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
  if [ -n "${FIRST:-}" ]; then
    START=$((FIRST-25)); END=$((FIRST+120))
    awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$QUANT_SCRIPT"
  else
    echo "No QUANT_CFG string in $QUANT_SCRIPT"
  fi
else
  echo "Missing $QUANT_SCRIPT"
fi

echo -e "\n== Inspect export.sh how QUANT_CFG is used (if at all) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
  FIRST=$(rg -n "QUANT_CFG" "$EXPORT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
  if [ -n "${FIRST:-}" ]; then
    START=$((FIRST-25)); END=$((FIRST+120))
    awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$EXPORT_SCRIPT"
  else
    echo "No QUANT_CFG string in $EXPORT_SCRIPT"
  fi
else
  echo "Missing $EXPORT_SCRIPT"
fi

echo -e "\n== Inspect first ~120 lines of the 120B recipe referenced =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
  awk 'NR<=120 {print NR "\t" $0}' "$RECIPE"
else
  echo "Missing $RECIPE"
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 13258

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for super-nvfp4-max-calib recipes in repo =="
fd -t f 'super-nvfp4-max-calib' . | sed -e 's|^\./||' | head -n 200

echo -e "\n== Search for Nemotron-3-Ultra-550B / Ultra in modelopt_recipes =="
rg -n "Ultra-550B|Ultra-5|550B|Nemotron-3-Ultra" modelopt_recipes | head -n 200 || true

echo -e "\n== Inspect QUANT_CFG resolution in tools/launcher/common/megatron_lm/quantize/task.py =="
TASK="tools/launcher/common/megatron_lm/quantize/task.py"
if [ -f "$TASK" ]; then
  awk 'NR<=230 {print NR "\t" $0}' "$TASK"
else
  echo "Missing $TASK"
fi

echo -e "\n== Search for any helper resolving QUANT_CFG into a recipe path =="
rg -n "quant_cfg|quantCfg|QUANT_CFG.*yaml|NVFP4_DEFAULT_CFG" tools/launcher/common/megatron_lm/quantize tools/launcher/common -S || true

echo -e "\n== Does the exact path without extension exist? =="
if [ -f "modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib" ]; then
  echo "Found extension-less 120B config (unexpected)"
else
  echo "Extension-less 120B config not found (expected .yaml)"
fi
ls -la modelopt_recipes/models/Nemotron-3-Super-120B-A12B | rg -n "super-nvfp4-max-calib" || true

Repository: NVIDIA/Model-Optimizer

Length of output: 8499

Clarify QUANT_CFG recipe reuse for the 550B pipeline

tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml sets QUANT_CFG to models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib. In this repo, the only matching super-nvfp4-max-calib.yaml recipe exists under modelopt_recipes/models/Nemotron-3-Super-120B-A12B/ (no 550B/Ultra counterpart), so this is likely intentional reuse—but the path is misleading. Add an explicit comment documenting that the recipe is model-agnostic/shared, or move/create a shared (non-120B-named) recipe target to avoid future confusion.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml` at line 32, The QUANT_CFG entry in megatron_lm_ptq.yaml points to models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib which is actually a shared, model-agnostic recipe — clarify this by either adding an explicit inline comment next to the QUANT_CFG key stating that super-nvfp4-max-calib.yaml is reused across models (model-agnostic/shared), or create/move the recipe to a neutral shared path/name (e.g., modelopt_recipes/models/shared/super-nvfp4-max-calib.yaml) and update QUANT_CFG to that path so the reference is not misleading; update the QUANT_CFG value and/or add the comment in the NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml file and ensure the recipe filename super-nvfp4-max-calib.yaml is the one referenced.

kevalmorabia97 · 2026-06-04T14:29:27Z

General question on the design of all scripts in this directory. Why do we need yet another export/quantize.sh on top of M-LM's export/quantize.sh?

I think this calls the scripts in Megatron-LM, which are under modules? That's a good question, though: why can't we just call the scripts in modules/Megatron-LM directly instead of wrapping them again? @ChenhanYu, do you know?

We can address this in a future PR, thanks!

kevalmorabia97

Approving to unblock

github-actions · 2026-06-04T17:47:25Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-04 17:47 UTC

nemotron launcher example

96ce9b9

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 requested review from ChenhanYu and kevalmorabia97 June 2, 2026 20:59

coderabbitai Bot reviewed Jun 2, 2026

Comment thread tools/launcher/common/megatron_lm/export/export.sh Outdated

ChenhanYu reviewed Jun 2, 2026
