
Nemotron Ultra & Super launcher examples #1609

Merged
jenchen13 merged 3 commits into main from jennifchen/nemotron_examples
Jun 4, 2026


@jenchen13
Contributor

@jenchen13 jenchen13 commented Jun 2, 2026

What does this PR do?

Type of change: New example

New launcher example for Nemotron Super with PTQ + export + vLLM smoke test on a small GPQA-style dataset

Usage

# Usage:
#   source .env-slurm
#   cd tools/launcher
#   uv run launch.py --yaml examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml --yes

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A
  • Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added checkpoint export capability for quantized models to Hugging Face format.
    • Introduced complete quantization pipelines with conditional MMLU evaluation and model export stages.
  • Bug Fixes

    • Fixed the num_shards calculation so it can no longer drop to 0 (it is now clamped to a minimum of 1).
  • Documentation

    • Updated the vLLM version requirement to v0.21.0+ for NVFP4 support on Blackwell GPUs.
    • Enhanced quantization pipeline documentation with improved output paths and conditional execution details.
  • Chores

    • Updated Megatron-LM module to latest version.
    • Added sample dataset for model evaluation testing.

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
@coderabbitai
Contributor

coderabbitai Bot commented Jun 2, 2026


📝 Walkthrough


This PR implements an end-to-end Megatron-LM PTQ pipeline for quantizing, exporting, and validating Nemotron-3 models. A new export wrapper script bridges quantized checkpoints to Hugging Face format; quantization orchestration becomes conditional and persists outputs under standardized /cicd/ paths; and two complete Slurm job YAMLs wire quantization, export, and vLLM smoke validation with explicit parallelism settings and per-task resource allocation.

Changes

Nemotron-3 PTQ Pipeline

| Layer | File(s) | Summary |
| --- | --- | --- |
| Export-to-HF wrapper script | tools/launcher/common/megatron_lm/export/export.sh | New bash script that sources utilities, registers error handling, sets defaults for MLM checkpoint/export/HF paths, disables internal installation, forwards CLI args, invokes Megatron-LM export with explicit parallelism parameters, and outputs exported artifacts. |
| Quantization orchestration and conditional steps | tools/launcher/common/megatron_lm/quantize/quantize.sh | Updates quantization script to document end-to-end PTQ flow, persist outputs under /cicd/ paths, derive export directory name from QUANT_CFG basename, remove unused CONVERT_EXE variable, and wrap MMLU and export stages in conditional blocks with GPU-count and EXPORT_PP computation moved inside the export conditional. |
| Nemotron-3-Super-120B PTQ pipeline job | tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml | New three-stage Slurm job YAML: task_0 quantizes with fast calibration (TP=1, PP=1, EP=4), task_1 exports with updated parallelism (TP=1, PP=4, EP=1), and task_2 runs vLLM smoke testing against exported checkpoint using GPQA samples. Each task includes container image, resource allocation, and time configuration. |
| Nemotron-3-Ultra-550B PTQ pipeline job | tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml | New three-stage Slurm job YAML with equivalent structure: quantization (task_0), export (task_1), and vLLM generation test (task_2) with per-task container images and resource configurations. |
| vLLM smoke test data and version requirement | tools/launcher/common/vllm/gpqa_sample.jsonl, tools/launcher/common/vllm/query.sh | Adds new GPQA-style JSONL dataset with 8 sample prompts requesting multiple-choice answers and justifications, and updates vLLM inline requirement note from v0.15.0+ to v0.21.0+ for NVFP4 support on Blackwell GPUs. |
| Configuration updates and utility fixes | tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml, tools/launcher/common/query.py, tools/launcher/modules/Megatron-LM | Adjusts Nemotron-3 Bridge import example from 8-GPU to 4-GPU node configuration, fixes dataset sharding clamp in query utility to ensure minimum of 1 shard, and bumps Megatron-LM submodule to newer upstream commit. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • ChenhanYu
  • kevalmorabia97
  • mxinO
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Nemotron Ultra & Super launcher examples' directly corresponds to the main changes: adding new launcher example YAML files for both Nemotron-3-Super-120B and Nemotron-3-Ultra-550B models, plus supporting infrastructure. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns found: torch.load(weights_only=False), numpy.load(allow_pickle=True), eval/exec, # nosec, or problematic licenses absent from code changes. |



Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



Contributor

@coderabbitai coderabbitai Bot left a comment


Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.


Actionable comments posted: 1

🧹 Nitpick comments (2)
tools/launcher/common/query.py (1)

210-211: 💤 Low value

LGTM!

The fix correctly prevents num_shards from becoming zero when the dataset is small, which would cause dataset.shard() to fail at line 223.


Optional: Consider documenting the sharding heuristics.

The magic numbers (100 samples per shard target, 16 shard cap) reflect non-obvious design decisions that would benefit from a brief comment.

📝 Suggested documentation
 if args.num_shards * 100 > len(dataset):
+    # Shrink num_shards to maintain ~100 samples/shard, capped at [1, 16]
     args.num_shards = max(1, min(16, len(dataset) // 100))
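The clamp is easiest to see in isolation. A minimal Python sketch, mirroring only the two lines quoted in the suggested diff; the function names are hypothetical and the surrounding query.py code is not reproduced:

```python
def old_adjust_num_shards(requested: int, dataset_len: int) -> int:
    """Pre-fix clamp: no lower bound, so small datasets collapse to 0 shards."""
    if requested * 100 > dataset_len:
        return min(16, dataset_len // 100)
    return requested


def adjust_num_shards(requested: int, dataset_len: int) -> int:
    """Post-fix clamp: aim for ~100 samples per shard, at most 16, never below 1."""
    if requested * 100 > dataset_len:
        return max(1, min(16, dataset_len // 100))
    return requested


if __name__ == "__main__":
    # 8-row smoke set with --num-shards 1, the case from this PR
    print(old_adjust_num_shards(1, 8))  # 0, which would make dataset.shard() fail
    print(adjust_num_shards(1, 8))      # 1
    print(adjust_num_shards(32, 550))   # 5 shards of ~110 rows
    print(adjust_num_shards(32, 2000))  # 16 (cap)
    print(adjust_num_shards(32, 5000))  # 32 (no shrink needed)
```

The 16-shard cap only applies inside the shrink branch: a request that already leaves at least ~100 samples per shard is passed through unchanged, so 32 shards over 5,000 rows stays at 32.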
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/common/query.py` around lines 210 - 211, Add a brief inline
comment above the num_shards adjustment explaining the sharding heuristic: we
target ~100 samples per shard and cap shards at 16 to avoid too many tiny
shards, and ensure num_shards stays at least 1 to prevent dataset.shard()
failures; reference the adjustment logic that checks "if args.num_shards * 100 >
len(dataset): args.num_shards = max(1, min(16, len(dataset) // 100))" and
mention the rationale for the constants 100 and 16 so future readers know why
those magic numbers were chosen.
tools/launcher/common/megatron_lm/quantize/quantize.sh (1)

38-40: ⚡ Quick win

Inline-export EXPORT_DIR diverges from the standalone export wrapper.

Here EXPORT_DIR is /scratchspace/export/... and _QUANT_CFG_TAG keeps any .yaml/.yml suffix. The wrapper export/export.sh instead uses /cicd/export/... and strips the extension (Lines 38-43 there). So a chained run (quantize inline export → a later vLLM task expecting the wrapper's path) would point at different directories whenever RUN_EXPORT=true and/or QUANT_CFG is a recipe file. This pipeline avoids it via RUN_EXPORT=false, but the mismatch is a latent bug.

♻️ Align base path and tag derivation with export.sh
 # If QUANT_CFG is a recipe, use the basename
 _QUANT_CFG_TAG="$(basename "${QUANT_CFG}")"
-export EXPORT_DIR="/scratchspace/export/${MLM_MODEL_CFG}_${_QUANT_CFG_TAG}"
+_QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yaml}"
+_QUANT_CFG_TAG="${_QUANT_CFG_TAG%.yml}"
+export EXPORT_DIR="/cicd/export/${MLM_MODEL_CFG}_${_QUANT_CFG_TAG}"
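A small Python sketch makes the divergence concrete by reproducing both EXPORT_DIR derivations from the shell lines in this suggestion. The function names are hypothetical; PurePosixPath(...).name stands in for basename, and str.removesuffix (Python 3.9+) stands in for the ${var%.yaml} / ${var%.yml} suffix removal. The .yaml input is illustrative.

```python
from pathlib import PurePosixPath


def export_dir_before(mlm_model_cfg: str, quant_cfg: str) -> str:
    """Inline-export path in quantize.sh before the fix: suffix kept, /scratchspace base."""
    tag = PurePosixPath(quant_cfg).name  # like $(basename "${QUANT_CFG}")
    return f"/scratchspace/export/{mlm_model_cfg}_{tag}"


def export_dir_aligned(mlm_model_cfg: str, quant_cfg: str) -> str:
    """Derivation after aligning with export.sh: .yaml/.yml stripped, /cicd base."""
    tag = PurePosixPath(quant_cfg).name
    tag = tag.removesuffix(".yaml")  # like ${_QUANT_CFG_TAG%.yaml}
    tag = tag.removesuffix(".yml")   # like ${_QUANT_CFG_TAG%.yml}
    return f"/cicd/export/{mlm_model_cfg}_{tag}"


if __name__ == "__main__":
    model = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
    for cfg in (
        "models/Nemotron-3-Super-120B-A12B/super-nvfp4",
        "models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml",
    ):
        print(export_dir_before(model, cfg))
        print(export_dir_aligned(model, cfg))
```

With the aligned version, a bare recipe name and the same name with a .yaml or .yml suffix resolve to one directory, which is what a downstream vLLM task reading EXPORT_DIR needs.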
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/common/megatron_lm/quantize/quantize.sh` around lines 38 - 40,
The inline export sets EXPORT_DIR to /scratchspace/export/... and leaves
_QUANT_CFG_TAG with the recipe extension, causing a mismatch with
export/export.sh which uses /cicd/export/... and strips the .yaml/.yml
extension; update the inline logic to mirror export/export.sh by (1) using the
same base path (/cicd/export) for EXPORT_DIR and (2) strip QUANT_CFG file
extensions when computing _QUANT_CFG_TAG (remove .yaml/.yml), ensuring
EXPORT_DIR derives from MLM_MODEL_CFG and the cleaned _QUANT_CFG_TAG so
downstream tasks see the same path as export/export.sh (refer to symbols
EXPORT_DIR, _QUANT_CFG_TAG, and QUANT_CFG).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/launcher/common/megatron_lm/export/export.sh`:
- Around line 30-32: Update the header doc comments to match the actual default
paths and documented variables used in the script: change the /scratchspace/...
defaults to the /cicd/... paths and add documentation for HF_MODEL_CKPT (the HF
checkpoint default) so the comment block reflects the real defaults for
MLM_MODEL_CKPT, EXPORT_DIR, and HF_MODEL_CKPT; reference the variable names
MLM_MODEL_CKPT, EXPORT_DIR, and HF_MODEL_CKPT in the updated comment so
operators are not misled.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c3f2c55d-967c-4cf9-8e1f-f20b08b83161

📥 Commits

Reviewing files that changed from the base of the PR and between f21977a and 96ce9b9.

📒 Files selected for processing (8)
  • tools/launcher/common/megatron_lm/export/export.sh
  • tools/launcher/common/megatron_lm/quantize/quantize.sh
  • tools/launcher/common/query.py
  • tools/launcher/common/vllm/gpqa_smoke.jsonl
  • tools/launcher/common/vllm/query.sh
  • tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yaml
  • tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml
  • tools/launcher/modules/Megatron-LM

Comment thread tools/launcher/common/megatron_lm/export/export.sh Outdated
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.00%. Comparing base (8f96832) to head (97a4b3b).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1609      +/-   ##
==========================================
- Coverage   77.41%   77.00%   -0.41%     
==========================================
  Files         480      482       +2     
  Lines       52499    53590    +1091     
==========================================
+ Hits        40642    41268     +626     
- Misses      11857    12322     +465     
| Flag | Coverage Δ |
| --- | --- |
| regression | 15.23% <ø> (+0.10%) ⬆️ |
| unit | 53.93% <ø> (+0.17%) ⬆️ |
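The headline figures follow from the counts in the coverage diff (coverage = hits / lines, misses = lines - hits). A minimal Python check, assuming Codecov's default of two decimals rounded down; with round-half-up the head would read 77.01% instead of 77.00%. Integer arithmetic avoids float rounding surprises.

```python
def coverage_report(hits: int, lines: int, precision: int = 2) -> tuple[str, int]:
    """Return (coverage % rounded down to `precision` decimals, misses)."""
    scale = 10 ** precision
    # Integer floor of hits / lines * 100 * scale
    scaled = hits * 100 * scale // lines
    whole, frac = divmod(scaled, scale)
    return f"{whole}.{frac:0{precision}d}", lines - hits


if __name__ == "__main__":
    base = coverage_report(40642, 52499)  # main
    head = coverage_report(41268, 53590)  # head of #1609
    print(base)  # ('77.41', 11857)
    print(head)  # ('77.00', 12322)
    print(f"{float(head[0]) - float(base[0]):+.2f}")  # -0.41
```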

Flags with carried forward coverage won't be shown.

@ChenhanYu
Collaborator

/claude review

Collaborator

@ChenhanYu ChenhanYu left a comment


Reviewed the full diff plus query.sh/quantize.sh context. Overall a clean, well-scoped example: factoring export out of quantize.sh into a reusable export.sh with RUN_MMLU/RUN_EXPORT toggles is the right call for a 120B model that needs different parallelism per stage. Inter-task path handoff (/cicd/megatron-lm/...), the task_2 "---" separator argument wiring, and the EXPORT_DIR tag that task_2 reads all check out. The query.py max(1, ...) fix is correct and necessary: with the 8-row smoke set and --num-shards 1, the old min(16, 8//100) collapsed to 0 shards.

A few things to address before merge:

1. (also flagged by CodeRabbit) export.sh header defaults are wrong. Lines 36-37 document /scratchspace/... defaults, but the code uses /cicd/... (lines 41 and 49). Update the comment to match.

2. EXPORT_DIR diverges between the two scripts. export.sh writes to /cicd/export/... while quantize.sh still writes to /scratchspace/export/..., and the QUANT_CFG->tag logic differs (export.sh strips .yaml/.yml, quantize.sh does not). In this pipeline RUN_EXPORT=false so quantize's export path is dead, but a future caller running quantize.sh with inline export would get a different dir and tag than the standalone path. Worth unifying the tag logic and the /cicd vs /scratchspace choice so the two paths can't drift.

3. MMLU --fraction bumped 0.01 -> 0.05 in the shared quantize.sh. This is a 5x longer MMLU eval for every example that still runs MMLU inline, not just Nemotron, and it isn't mentioned in the PR description. Please confirm it's intentional for all callers; if it's only meant for this model it shouldn't be in the shared default.

Minor / process:

  • The smoke test only validates that the model serves and emits text — responses are dumped to /cicd/vllm/...jsonl and never graded, so the gpqa_smoke.jsonl answer keys aren't checked by anything. Fine for a smoke test; just won't catch accuracy regressions.
  • PR template checkboxes are unfilled (notably tests + Changelog) and the Testing section is empty — a line on how this was validated would help, since CONTRIBUTING marks tests mandatory for new examples.
  • A one-liner on what the Megatron-LM submodule bump (86bf476 -> c69697d) pulls in would help reviewers.

Items 1-3 are the blockers; the rest is polish. Nice work.

Comment thread tools/launcher/common/megatron_lm/quantize/quantize.sh Outdated
Comment thread tools/launcher/common/megatron_lm/quantize/quantize.sh Outdated
Comment thread tools/launcher/common/megatron_lm/export/export.sh Outdated

@claude claude Bot left a comment


Claude Review Summary

This PR adds a Nemotron-3-Super-120B PTQ launcher example (quantize → export → vLLM smoke), a standalone export wrapper, an MMLU/export gating mechanism in quantize.sh, a small dataset-shard fix in query.py, and a vLLM image bump. It's all under tools/launcher/, so no algorithm/state/export-format changes — risk is contained to the launcher.

Findings

  • CRITICAL: 0
  • IMPORTANT: 1
  • SUGGESTION: 3

Highlights

  • [IMPORTANT Compatibility] quantize.sh (inline-export branch) and the new export.sh derive different EXPORT_DIR paths for the same (MLM_MODEL_CFG, QUANT_CFG): /scratchspace/export/...<unstripped tag> vs /cicd/export/...<.yaml/.yml stripped>. Masked here by RUN_EXPORT=false, but a latent bug for any future pipeline that combines RUN_EXPORT=true with a recipe-path QUANT_CFG.
  • [SUGGESTION] task_2 --save /cicd/vllm/...super-nvfp4.jsonl: query.py treats --save as a directory, so this creates a directory whose name ends in .jsonl and writes shards inside it; the YAML comment ("Inspect responses at /cicd/vllm/<model>.jsonl") implies a file, which is misleading.
  • [SUGGESTION Compatibility] --fraction was bumped from 0.01 to 0.05 and is hardcoded; existing Qwen3-8B / Qwen3-30B-A3B inline-MMLU runs will now take ~5× longer with no override. Consider an MMLU_FRACTION env var.
  • [SUGGESTION] export.sh Optional-env header documents /scratchspace/... defaults but the script actually defaults to /cicd/... (and adds HF_MODEL_CKPT).

Risk

Low. The PR is launcher-only and the example is cleanly gated behind RUN_MMLU=false RUN_EXPORT=false. The one Important compatibility item is a latent path divergence rather than an active regression.

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
@jenchen13
Contributor Author

/claude review


@claude claude Bot left a comment


Claude review passed — no blocking issues found. LGTM

Re-reviewed against commit 98765b1a. Prior round's findings have been addressed:

  • EXPORT_DIR unified between quantize.sh and export.sh (/cicd/export/ + .yaml/.yml strip on both sides) — the latent path-divergence bug is gone.
  • export.sh header defaults now correctly document the /cicd/... paths and HF_MODEL_CKPT.
  • MMLU --fraction stays at 0.01 (no shared-default change).
  • --save in task_2 now points at a directory-style path and the comment matches ("Inspect responses under /cicd/vllm/<model>/").

Verified end-to-end:

  • task_2 --model path matches export.sh EXPORT_DIR derivation for MLM_MODEL_CFG=nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16, QUANT_CFG=models/Nemotron-3-Super-120B-A12B/super-nvfp4.
  • World-size accounting: task_0 (1·1·4·1=4, ntasks=4), task_1 (1·4·1·1=4, ntasks=4), task_2 (vLLM single-process w/ TP=4, ntasks=1).
  • query.py shard fix: with 8 prompts and --num-shards 1, max(1, min(16, 0)) = 1, lines up with --shard-id-begin 0 --shard-id-step 1 so the single shard gets processed.
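The world-size accounting can be expressed as a small check. This Python sketch encodes an assumption about Megatron-Core's layout (the dense grid TP × PP must divide the rank count, leaving the data-parallel size, and the expert grid ETP × EP × PP must divide it as well, with ETP assumed equal to TP=1). It is not the launcher's own validation, and it factors the sizes differently from the review's 1·1·4·1 products while agreeing on task_0 and task_1. TaskLayout and check_layout are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskLayout:
    name: str
    ntasks: int   # ranks launched for the task (one per GPU)
    tp: int = 1   # tensor parallel
    pp: int = 1   # pipeline parallel
    ep: int = 1   # expert parallel
    etp: int = 1  # expert tensor parallel (assumed 1 here)


def check_layout(t: TaskLayout) -> tuple[bool, int]:
    """Return (layout is valid, dense data-parallel size) under the stated assumptions."""
    world = t.ntasks
    dense_mp = t.tp * t.pp
    if world % dense_mp != 0:
        return False, 0
    dp = world // dense_mp
    # Expert layers reuse the same ranks: ETP x EP x PP must also divide the world size.
    expert_mp = t.etp * t.ep * t.pp
    return world % expert_mp == 0, dp


if __name__ == "__main__":
    for task in (
        TaskLayout("task_0 (quantize)", ntasks=4, tp=1, pp=1, ep=4),
        TaskLayout("task_1 (export)", ntasks=4, tp=1, pp=4, ep=1),
        TaskLayout("hypothetical PP=4 + EP=4 on 4 GPUs", ntasks=4, tp=1, pp=4, ep=4),
    ):
        print(task.name, check_layout(task))  # (True, 4), (True, 1), (False, 1)
```

task_2 is outside this check: per the review it is a single vLLM process (ntasks=1) that manages its own TP=4 workers.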

Launcher-only PR; no algorithm/state/export-format changes. Risk is low.

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
@jenchen13 jenchen13 requested a review from ChenhanYu June 4, 2026 14:06
@jenchen13 jenchen13 changed the title from "Nemotron Super launcher example" to "Nemotron Ultra & Super launcher examples" Jun 4, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment



Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml`:
- Line 32: The QUANT_CFG entry in megatron_lm_ptq.yaml points to
models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib which is actually a
shared, model-agnostic recipe — clarify this by either adding an explicit inline
comment next to the QUANT_CFG key stating that super-nvfp4-max-calib.yaml is
reused across models (model-agnostic/shared), or create/move the recipe to a
neutral shared path/name (e.g.,
modelopt_recipes/models/shared/super-nvfp4-max-calib.yaml) and update QUANT_CFG
to that path so the reference is not misleading; update the QUANT_CFG value
and/or add the comment in the
NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml file and ensure the
recipe filename super-nvfp4-max-calib.yaml is the one referenced.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7f85e840-5ccb-4e92-ab4e-76b43e8829b1

📥 Commits

Reviewing files that changed from the base of the PR and between 98765b1 and 97a4b3b.

📒 Files selected for processing (3)
  • tools/launcher/common/vllm/gpqa_sample.jsonl
  • tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml
  • tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml
💤 Files with no reviewable changes (1)
  • tools/launcher/common/vllm/gpqa_sample.jsonl
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yaml

- --calib-size 32
environment:
- MLM_MODEL_CFG: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
- QUANT_CFG: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib
Contributor


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for quantization configs and verify if 120B config is model-specific.

# Find all super-nvfp4-max-calib configs
echo "=== Finding all super-nvfp4-max-calib configs ==="
fd -t f 'super-nvfp4-max-calib' models/

# Check if there's a 550B-specific config
echo -e "\n=== Searching for 550B or Ultra quantization configs ==="
fd -t f -e yaml -e json -e py . models/ | rg -i '550b|ultra'

# Examine the 120B config for model-specific parameters
echo -e "\n=== Examining 120B config for model-specific parameters ==="
fd -t f 'super-nvfp4-max-calib' models/Nemotron-3-Super-120B-A12B/ --exec cat {}

Repository: NVIDIA/Model-Optimizer

Length of output: 565


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Repo top-level =="
ls -la

echo -e "\n== Find any 'models' directory in repo =="
fd -td models .

echo -e "\n== Locate the specific YAML under review =="
fd -t f 'megatron_lm_ptq\.yaml' tools/launcher/examples -E '*.lock' -E '*.md'

echo -e "\n== Show the relevant lines around QUANT_CFG =="
TARGET="tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml"
if [ -f "$TARGET" ]; then
  nl -ba "$TARGET" | sed -n '20,60p'
else
  echo "Missing: $TARGET"
fi

echo -e "\n== Check whether QUANT_CFG path exists as a file in this repo =="
QUANT_CFG="models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
if [ -f "$QUANT_CFG" ]; then
  echo "Exists: $QUANT_CFG"
  nl -ba "$QUANT_CFG" | head -n 200
else
  echo "Not found in repo: $QUANT_CFG"
  echo "Look for similarly named config files:"
  fd -t f 'super-nvfp4-max-calib' . | head -n 200
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 2536


🏁 Script executed:

#!/bin/bash
set -euo pipefail

TARGET="tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml"

echo "== Does target YAML exist? =="
if [ -f "$TARGET" ]; then
  echo "FOUND: $TARGET"
else
  echo "MISSING: $TARGET"
fi

echo -e "\n== tools/launcher/examples YAMLs named megatron_lm_ptq.yaml (top 50) =="
fd -t f 'megatron_lm_ptq\.yaml' tools/launcher/examples | head -n 50 || true

echo -e "\n== Locate the Nemotron Ultra 550B config file path in repo (top 50) =="
fd -t f 'NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16' . | head -n 50 || true

echo -e "\n== Print lines 20-60 of target YAML (if present) =="
if [ -f "$TARGET" ]; then
  awk 'NR>=20 && NR<=60 {printf "%d\t%s\n", NR, $0}' "$TARGET"
fi

echo -e "\n== Search for QUANT_CFG target string 'super-nvfp4-max-calib' (top 50 matches) =="
rg -n "super-nvfp4-max-calib" . | head -n 50 || true

echo -e "\n== List files that contain the basename 'super-nvfp4-max-calib' under repo =="
fd -t f 'super-nvfp4-max-calib' . | head -n 50 || true

echo -e "\n== Does path 'models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib' exist as a file anywhere? =="
if [ -f "models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib" ]; then
  echo "FOUND as file: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
else
  echo "NOT found as file: models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib"
fi

echo -e "\n== Search for 'Nemotron-3-Super-120B-A12B' occurrences (top 50) =="
rg -n "Nemotron-3-Super-120B-A12B" . | head -n 50 || true

Repository: NVIDIA/Model-Optimizer

Length of output: 8208


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true

echo -e "\n== Show relevant parts of quantize.sh =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
  rg -n "QUANT_CFG" "$QUANT_SCRIPT" || true
  awk 'NR>=1 && NR<=200 {print NR "\t" $0}' "$QUANT_SCRIPT" | sed -n '1,160p'
else
  echo "Missing $QUANT_SCRIPT"
fi

echo -e "\n== Show relevant parts of export.sh (QUANT_CFG usage) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
  rg -n "QUANT_CFG" "$EXPORT_SCRIPT" || true
  awk 'NR>=1 && NR<=200 {print NR "\t" $0}' "$EXPORT_SCRIPT" | sed -n '1,180p'
else
  echo "Missing $EXPORT_SCRIPT"
fi

echo -e "\n== Inspect recipe file header for super-nvfp4-max-calib.yaml =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
  awk 'NR<=120 {print NR "\t" $0}' "$RECIPE" | sed -n '1,120p'
else
  echo "Missing $RECIPE"
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 14798


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true

echo -e "\n== Show relevant parts of quantize.sh =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
  echo -e "\n-- Lines containing QUANT_CFG in quantize.sh --"
  rg -n "QUANT_CFG" "$QUANT_SCRIPT" || true
  echo -e "\n-- Snippet around first QUANT_CFG hit --"
  FIRST=$(rg -n "QUANT_CFG" "$QUANT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
  if [ -n "${FIRST:-}" ]; then
    START=$((FIRST-20)); END=$((FIRST+80))
    awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$QUANT_SCRIPT"
  else
    awk 'NR<=220 {print NR "\t" $0}' "$QUANT_SCRIPT"
  fi
else
  echo "Missing $QUANT_SCRIPT"
fi

echo -e "\n== Show relevant parts of export.sh (QUANT_CFG usage) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
  echo -e "\n-- Lines containing QUANT_CFG in export.sh --"
  rg -n "QUANT_CFG" "$EXPORT_SCRIPT" || true
  FIRST=$(rg -n "QUANT_CFG" "$EXPORT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
  if [ -n "${FIRST:-}" ]; then
    START=$((FIRST-20)); END=$((FIRST+80))
    awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$EXPORT_SCRIPT"
  else
    awk 'NR<=220 {print NR "\t" $0}' "$EXPORT_SCRIPT"
  fi
else
  echo "Missing $EXPORT_SCRIPT"
fi

echo -e "\n== Inspect recipe file header for super-nvfp4-max-calib.yaml =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
  awk 'NR<=160 {print NR "\t" $0}' "$RECIPE"
else
  echo "Missing $RECIPE"
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 14190


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for QUANT_CFG usage in tools/launcher/common =="
rg -n "QUANT_CFG" tools/launcher/common || true

echo -e "\n== Inspect quantize.sh how QUANT_CFG is used =="
QUANT_SCRIPT="tools/launcher/common/megatron_lm/quantize/quantize.sh"
if [ -f "$QUANT_SCRIPT" ]; then
  FIRST=$(rg -n "QUANT_CFG" "$QUANT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
  if [ -n "${FIRST:-}" ]; then
    START=$((FIRST-25)); END=$((FIRST+120))
    awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$QUANT_SCRIPT"
  else
    echo "No QUANT_CFG string in $QUANT_SCRIPT"
  fi
else
  echo "Missing $QUANT_SCRIPT"
fi

echo -e "\n== Inspect export.sh how QUANT_CFG is used (if at all) =="
EXPORT_SCRIPT="tools/launcher/common/megatron_lm/export/export.sh"
if [ -f "$EXPORT_SCRIPT" ]; then
  FIRST=$(rg -n "QUANT_CFG" "$EXPORT_SCRIPT" | head -n 1 | cut -d: -f1 || true)
  if [ -n "${FIRST:-}" ]; then
    START=$((FIRST-25)); END=$((FIRST+120))
    awk -v s=$START -v e=$END 'NR>=s && NR<=e {print NR "\t" $0}' "$EXPORT_SCRIPT"
  else
    echo "No QUANT_CFG string in $EXPORT_SCRIPT"
  fi
else
  echo "Missing $EXPORT_SCRIPT"
fi

echo -e "\n== Inspect first ~120 lines of the 120B recipe referenced =="
RECIPE="modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib.yaml"
if [ -f "$RECIPE" ]; then
  awk 'NR<=120 {print NR "\t" $0}' "$RECIPE"
else
  echo "Missing $RECIPE"
fi

Repository: NVIDIA/Model-Optimizer

Length of output: 13258



🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for super-nvfp4-max-calib recipes in repo =="
fd -t f 'super-nvfp4-max-calib' . | sed -e 's|^\./||' | head -n 200

echo -e "\n== Search for Nemotron-3-Ultra-550B / Ultra in modelopt_recipes =="
rg -n "Ultra-550B|Ultra-5|550B|Nemotron-3-Ultra" modelopt_recipes | head -n 200 || true

echo -e "\n== Inspect QUANT_CFG resolution in tools/launcher/common/megatron_lm/quantize/task.py =="
TASK="tools/launcher/common/megatron_lm/quantize/task.py"
if [ -f "$TASK" ]; then
  awk 'NR<=230 {print NR "\t" $0}' "$TASK"
else
  echo "Missing $TASK"
fi

echo -e "\n== Search for any helper resolving QUANT_CFG into a recipe path =="
rg -n "quant_cfg|quantCfg|QUANT_CFG.*yaml|NVFP4_DEFAULT_CFG" tools/launcher/common/megatron_lm/quantize tools/launcher/common -S || true

echo -e "\n== Does the exact path without extension exist? =="
if [ -f "modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib" ]; then
  echo "Found extension-less 120B config (unexpected)"
else
  echo "Extension-less 120B config not found (expected .yaml)"
fi
ls -la modelopt_recipes/models/Nemotron-3-Super-120B-A12B | rg -n "super-nvfp4-max-calib" || true

Repository: NVIDIA/Model-Optimizer

Length of output: 8499


Clarify QUANT_CFG recipe reuse for the 550B pipeline

tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml sets QUANT_CFG to models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib. In this repo, the only matching super-nvfp4-max-calib.yaml recipe exists under modelopt_recipes/models/Nemotron-3-Super-120B-A12B/ (no 550B/Ultra counterpart), so this is likely intentional reuse—but the path is misleading. Add an explicit comment documenting that the recipe is model-agnostic/shared, or move/create a shared (non-120B-named) recipe target to avoid future confusion.
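A quick shell check can confirm which file a given `QUANT_CFG` value points at before deciding between the comment and the move. This is a minimal sketch. It assumes the launcher resolves `QUANT_CFG` as `modelopt_recipes/<QUANT_CFG>.yaml`, which is inferred from the value and recipe path quoted in this finding and not confirmed from the launcher code. The helper name `resolve_quant_cfg` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a QUANT_CFG value to the recipe file it refers to.
# ASSUMPTION: recipes live at <root>/modelopt_recipes/<QUANT_CFG>.yaml.
resolve_quant_cfg() {
  local root="$1" quant_cfg="$2"
  local path="${root}/modelopt_recipes/${quant_cfg}.yaml"
  if [ -f "$path" ]; then
    # Print the resolved recipe path on success.
    printf '%s\n' "$path"
  else
    # Report the lookup failure on stderr and signal it via the exit status.
    echo "No recipe for QUANT_CFG=${quant_cfg} (looked for ${path})" >&2
    return 1
  fi
}

# Example, run from the repository root:
#   resolve_quant_cfg . models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib
```

Because the helper fails loudly when no `.yaml` matches, it can also serve as a sanity check after moving the recipe to a shared path.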

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml`
at line 32: the QUANT_CFG entry in megatron_lm_ptq.yaml points to
models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib, which is actually a
shared, model-agnostic recipe. Make this explicit in one of two ways: add an
inline comment next to the QUANT_CFG key stating that super-nvfp4-max-calib.yaml
is reused across models (model-agnostic/shared), or move the recipe to a neutral
shared path/name (e.g.,
modelopt_recipes/models/shared/super-nvfp4-max-calib.yaml) and update QUANT_CFG
to that path so the reference is not misleading. Either way, make the change in
NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml and ensure that
super-nvfp4-max-calib.yaml is the recipe referenced.
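The misleading-path pattern flagged here can be caught mechanically by comparing the model named in the launcher directory with the model directory in `QUANT_CFG`. This is a hedged sketch, not an existing lint in the repo. The helper `recipe_is_borrowed` is hypothetical, and it assumes launcher example directories are named `NVIDIA-<model>-BF16` and `QUANT_CFG` values look like `models/<model>/<recipe>`, matching the two paths in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical lint helper: succeed (exit 0) and print a warning when a launcher
# example uses a recipe stored under a different model's directory.
# ASSUMPTIONS: launcher dirs are named NVIDIA-<model>-BF16 and QUANT_CFG values
# have the form models/<model>/<recipe>.
recipe_is_borrowed() {
  local launcher_dir="$1" quant_cfg="$2"
  local launcher_model recipe_model
  launcher_model=$(basename "$launcher_dir")
  launcher_model=${launcher_model#NVIDIA-}  # drop vendor prefix
  launcher_model=${launcher_model%-BF16}    # drop precision suffix
  recipe_model=${quant_cfg#models/}         # drop leading "models/"
  recipe_model=${recipe_model%%/*}          # keep only the model directory
  if [ "$launcher_model" != "$recipe_model" ]; then
    echo "WARN: ${launcher_model} uses a recipe from ${recipe_model}; document the reuse or move it to a shared path"
    return 0
  fi
  return 1
}
```

For the Ultra example, `recipe_is_borrowed tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 models/Nemotron-3-Super-120B-A12B/super-nvfp4-max-calib` prints the warning, while the Super example using the same recipe does not. The name-based comparison is deliberately crude and would need adjusting if directory naming differs.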

Collaborator

@kevalmorabia97 Jun 4, 2026

General question on the design of all scripts in this directory. Why do we need yet another export/quantize.sh on top of M-LM's export/quantize.sh?

Contributor Author

I think these call the Megatron-LM scripts under modules/, but that's a good question: why can't we just call the scripts in modules/Megatron-LM directly instead of wrapping them again? @ChenhanYu, do you know?

Contributor Author

We can address this in a future PR, thanks!

Collaborator

@kevalmorabia97 left a comment

Approving to unblock

@jenchen13 enabled auto-merge (squash) June 4, 2026 16:39
@jenchen13 merged commit 6b73e93 into main Jun 4, 2026
77 of 84 checks passed
@jenchen13 deleted the jennifchen/nemotron_examples branch June 4, 2026 17:47
Contributor

github-actions Bot commented Jun 4, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-04 17:47 UTC


Labels

None yet

Projects

None yet


3 participants