Add NVFP4 + QAD to the Nemotron-3-Nano-30B-A3B tutorial by kevalmorabia97 · Pull Request #1601 · NVIDIA/Model-Optimizer

kevalmorabia97 · 2026-06-02T13:55:33Z

What does this PR do?

Type of change: documentation

Note: This is part 3 of 4 (depends on #1589 and #1600 for the tutorial commands to actually run):

Part 1 (Add Megatron-Bridge PTQ quantize + export example scripts #1589): Megatron-Bridge quantize.py + export.py.
Part 2 (Add Quantization Aware Distillation (QAD) to Megatron-Bridge example #1600): distill.py support for Quantization Aware Distillation (QAD).
Part 3 (this PR): add the NVFP4 + QAD experiments to the Nemotron-3-Nano-30B-A3B tutorial (docs only).
Part 4: repeat the NVFP4 + QAD experiments on a non-Nemotron model.

This PR targets main directly (the changes are docs-only and don't touch Part 1/2 code), but the new tutorial commands use examples/megatron_bridge/{quantize,export,distill}.py and so require #1589 + #1600 to be merged before they run.

Updates the Nemotron-3-Nano-30B-A3B-BF16 tutorial:

Adds an NVFP4 PTQ → QAD → export section (recovering the NVFP4 accuracy drop), and migrates the existing FP8 section to the examples/megatron_bridge quantize.py / export.py scripts.
Replaces the hf_ptq.py results with results from the Megatron-Bridge quantize.py (same defaults, slightly better results on average).
Wraps all tutorial commands in collapsible <details> blocks.
Reframes the tutorial as NVFP4 + QAD (instead of FP8).
Reduces AIME num_repeats from 64 to 32 because the previous setting took too long to run.

Testing

Docs-only change; rendered Markdown / collapsible blocks verified and markdownlint + RST hooks pass.

Before your PR is "Ready for review"

Is this change backward compatible?: N/A (docs only)
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A (docs only)
Did you update Changelog?: ✅
Did you get Claude approval on this PR?: ❌ (will run /claude review)

Additional Information

Placeholder ? cells in the tutorial's Results tables (NVFP4 / NVFP4+QAD accuracy, NVFP4 vLLM throughput) will be filled in with the experiment results before this leaves draft.

copy-pr-bot · 2026-06-02T13:55:39Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


coderabbitai · 2026-06-02T13:55:42Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: acc9624f-1bf8-4b08-951a-650bfd060bb8

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


codecov · 2026-06-02T14:13:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.48%. Comparing base (6f08731) to head (1aa7968).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1601   +/-   ##
=======================================
  Coverage   77.48%   77.48%           
=======================================
  Files         489      489           
  Lines       54415    54415           
=======================================
  Hits        42165    42165           
  Misses      12250    12250

Flag	Coverage Δ
unit	`54.00% <ø> (ø)`

Flags with carried forward coverage won't be shown.


…1600)

### What does this PR do?

Type of change: new example

**Note:** This is **part 2 of 4** (builds on #1589):

- **Part 1 (#1589):** Megatron-Bridge `quantize.py` + `export.py` support and tests.
- **Part 2 (this PR):** extend `distill.py` for quantization-aware distillation (QAD): load a quantized Megatron checkpoint as the student.
- **Part 3:** #1601
- **Part 4:** repeat the NVFP4 + QAD experiments on a non-Nemotron model.

Extends `examples/megatron_bridge/distill.py` to initialize the student from a **Megatron checkpoint** (a quantized checkpoint from `quantize.py`, or a pruned one) via `--student_megatron_path`, enabling **Quantization Aware Distillation (QAD)**:

- `--student_hf_path` still builds the student architecture; `--student_megatron_path` supplies the (optionally quantized) weights.
- For a quantized checkpoint, the ModelOpt quantize mode + base weights are restored onto the **plain student before the knowledge-distillation conversion** (`restore_sharded_modelopt_state` is a no-op once a model is already converted), so the distilled checkpoint stays exportable as a quantized model with `export.py`.

**Upstream dependency / workaround:** `DistillationProvider.provide()` has no seam to transform the student before the KD conversion, so this patches `provide()` at the class level (via an `id()`-keyed registry, because the provider proxies instance-attribute assignment to its teacher once the teacher is set). A companion Megatron-Bridge PR adds a first-class `DistillationProvider.student_pre_conversion_hook`; from nemo:26.06 onwards the workaround should be removed and replaced with that hook (a removal note in `distill.py` documents exactly how).

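The class-level patch with an `id()`-keyed registry can be illustrated with a toy example. This is only a sketch of the general pattern, not the code in `distill.py`: `ToyProvider`, `build_student`, `convert_for_kd`, and `register_pre_hook` are hypothetical stand-ins, and the real `DistillationProvider` internals are assumed rather than reproduced.

```python
"""Toy sketch: per-instance hooks on a class whose instances proxy setattr.

Every name below is hypothetical; none of it is Megatron-Bridge or ModelOpt API.
"""
import functools
from types import SimpleNamespace


class ToyProvider:
    """Stand-in provider that forwards attribute assignment to its teacher."""

    def __init__(self, teacher):
        # Bypass our own __setattr__ so `teacher` is stored on the provider.
        object.__setattr__(self, "teacher", teacher)

    def __setattr__(self, name, value):
        # Assignments are proxied to the teacher, so `provider.hook = fn`
        # never lands on the provider instance itself.
        setattr(self.teacher, name, value)

    def build_student(self):
        return {"weights": "plain"}

    def convert_for_kd(self, student):
        return {"kd_student": student}

    def provide(self):
        # No seam between building and converting the student.
        return self.convert_for_kd(self.build_student())


# Registry keyed by id(provider). Entries should be dropped when a provider
# is discarded, because Python may reuse an id after garbage collection.
_PRE_HOOKS = {}


def register_pre_hook(provider, hook):
    """Associate a student-transforming hook with one provider instance."""
    _PRE_HOOKS[id(provider)] = hook


_original_provide = ToyProvider.provide


@functools.wraps(_original_provide)
def _patched_provide(self):
    hook = _PRE_HOOKS.get(id(self))
    if hook is None:
        # Providers without a registered hook keep the original behavior.
        return _original_provide(self)
    student = hook(self.build_student())
    return self.convert_for_kd(student)


# Assigning on the class goes through type.__setattr__, not the proxy above.
ToyProvider.provide = _patched_provide

teacher = SimpleNamespace()
provider = ToyProvider(teacher)
provider.pre_hook = lambda student: student  # proxied: ends up on the teacher
register_pre_hook(provider, lambda s: {**s, "weights": "quantized"})
result = provider.provide()
print(result)  # {'kd_student': {'weights': 'quantized'}}
```

The registry is what lets a class-wide patch behave per instance: the patched method is shared by every provider, while the lookup by `id(self)` selects the hook for the one provider that registered it.
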
### Usage

```bash
# 1) PTQ -> quantized Megatron checkpoint (part 1)
torchrun --nproc_per_node 2 quantize.py \
    --hf_model_name_or_path Qwen/Qwen3-8B --quant_cfg fp8 --tp_size 2 \
    --export_megatron_path /tmp/Qwen3-8B-FP8-megatron

# 2) QAD: distill the quantized student from the unquantized teacher
torchrun --nproc_per_node 8 distill.py \
    --teacher_hf_path Qwen/Qwen3-8B \
    --student_hf_path Qwen/Qwen3-8B \
    --student_megatron_path /tmp/Qwen3-8B-FP8-megatron \
    --data_paths 1.0 tokenized/data_text_document \
    --train_iters 1000 --output_dir /output/qwen3_8b_qad

# 3) export the distilled quantized checkpoint (part 1)
torchrun --nproc_per_node 1 export.py \
    --hf_model_name_or_path Qwen/Qwen3-8B \
    --megatron_path /output/qwen3_8b_qad/checkpoints \
    --export_unified_hf_path /tmp/qwen3_8b_qad_fp8_hf
```

### Testing

`tests/examples/megatron_bridge/test_qad.py` (validated on a 2-GPU NeMo `26.04` container): quantize a tiny Qwen3 at TP=2 → QAD distill from the quantized student → `export.py` to a unified HF checkpoint, asserting `hf_quant_config.json` is written (proves the quantize mode survived QAD). Includes a commented-out vLLM deployment check, validated locally (full flow passes; vLLM loads the export as `quantization=modelopt`). Existing normal/Puzzletron distillation tests still pass.

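The export assertion described above can be approximated with a small check. This is a minimal sketch, not the code in `test_qad.py`: `has_quant_config` is a hypothetical helper, and it only verifies that `hf_quant_config.json` exists in the export directory and parses as JSON, without assuming anything about its keys.

```python
import json
from pathlib import Path


def has_quant_config(export_dir):
    """Return True if export_dir holds a parseable hf_quant_config.json.

    Hypothetical helper for illustration; the file's schema is not checked.
    """
    config_path = Path(export_dir) / "hf_quant_config.json"
    if not config_path.is_file():
        return False
    try:
        json.loads(config_path.read_text())
    except json.JSONDecodeError:
        return False
    return True
```

Called on the directory passed as `--export_unified_hf_path`, a `False` result would suggest the export was written as a plain (unquantized) checkpoint.
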
### Before your PR is "*Ready for review*"

- Is this change backward compatible?: N/A (new example feature; default behavior unchanged when `--student_megatron_path` is not set)
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A (no new dependencies)
- Did you write any new necessary tests?: ✅
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅
- Did you get Claude approval on this PR?: ✅

### Additional Information

Depends on a companion Megatron-Bridge PR adding `DistillationProvider.student_pre_conversion_hook` (the upstream replacement for the class-level `provide()` workaround). The Nemotron-3 tutorial NVFP4 + QAD experiments ship in part 3.

## Summary by CodeRabbit

* **New Features**
  * Quantization Aware Distillation (QAD) workflow to recover accuracy of quantized Megatron students and distill from quantized checkpoints.
  * CLI option to initialize a distillation student from a Megatron checkpoint and a structure-only load path for bridging.
* **Documentation**
  * Expanded runnable quantize → QAD → export guidance and best-practice tips.
* **Tests**
  * End-to-end test validating quantize → QAD → export artifacts.
* **Chores / UX**
  * Clearer rank-aware messages, improved tokenizer padding handling, and more consistent export behavior (fixed export dtype).

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

- Add an NVFP4 PTQ -> QAD -> export section to the Nemotron-3-Nano-30B-A3B tutorial to recover the NVFP4 accuracy drop, and migrate the existing FP8 quantization section to the examples/megatron_bridge quantize.py / export.py scripts. Add placeholder rows for the NVFP4 / NVFP4+QAD accuracy and NVFP4 vLLM throughput numbers (to be filled in once the experiments land).
- Wrap all tutorial commands in collapsible <details> blocks.
- Reframe the tutorial as NVFP4 + QAD (instead of FP8) in the root README "Latest News", CHANGELOG, and the pruning / minitron-vs-puzzletron references.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>

kevalmorabia97 force-pushed the kmorabia/nemotron-nvfp4-qad-experiments branch from 05f0ed8 to d8817a6 on June 2, 2026 13:59

kevalmorabia97 force-pushed the kmorabia/nemotron-nvfp4-qad-experiments branch 2 times, most recently from 97692b4 to 5955df1, on June 4, 2026 19:14

kevalmorabia97 mentioned this pull request Jun 4, 2026

Add Quantization Aware Distillation (QAD) to Megatron-Bridge example #1600

Merged

kevalmorabia97 force-pushed the kmorabia/nemotron-nvfp4-qad-experiments branch 2 times, most recently from c7ed502 to d8870cf, on June 5, 2026 18:46

kevalmorabia97 force-pushed the kmorabia/nemotron-nvfp4-qad-experiments branch from d8870cf to 1aa7968 on June 6, 2026 03:40

kevalmorabia97 wants to merge 1 commit into main from kmorabia/nemotron-nvfp4-qad-experiments
