Add NVFP4 + QAD to the Nemotron-3-Nano-30B-A3B tutorial #1601
kevalmorabia97 wants to merge 1 commit into
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1601 +/- ##
=======================================
Coverage 77.48% 77.48%
=======================================
Files 489 489
Lines 54415 54415
=======================================
Hits 42165 42165
Misses 12250 12250
Flags with carried forward coverage won't be shown.
…1600)

### What does this PR do?

Type of change: new example

**Note:** This is **part 2 of 4** (builds on #1589):

- **Part 1 (#1589):** Megatron-Bridge `quantize.py` + `export.py` support and tests.
- **Part 2 (this PR):** extend `distill.py` for quantization-aware distillation (QAD): load a quantized Megatron checkpoint as the student.
- **Part 3:** #1601
- **Part 4:** repeat the NVFP4 + QAD experiments on a non-Nemotron model.

Extends `examples/megatron_bridge/distill.py` to initialize the student from a **Megatron checkpoint** (a quantized checkpoint from `quantize.py`, or a pruned one) via `--student_megatron_path`, enabling **Quantization Aware Distillation (QAD)**:

- `--student_hf_path` still builds the student architecture; `--student_megatron_path` supplies the (optionally quantized) weights.
- For a quantized checkpoint, the ModelOpt quantize mode and base weights are restored onto the **plain student before the knowledge-distillation conversion** (`restore_sharded_modelopt_state` is a no-op once a model has already been converted), so the distilled checkpoint remains exportable as a quantized model with `export.py`.

**Upstream dependency / workaround:** `DistillationProvider.provide()` has no seam for transforming the student before the KD conversion, so this PR patches `provide()` at the class level via an `id()`-keyed registry. The hook cannot simply be stored on the provider instance, because once the teacher is set the provider proxies instance-attribute assignment to its teacher. A companion Megatron-Bridge PR adds a first-class `DistillationProvider.student_pre_conversion_hook`; from nemo:26.06 onwards, the workaround should be removed and replaced with that hook (a removal note in `distill.py` documents exactly how).
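To make the class-level workaround concrete, here is a minimal, self-contained toy sketch. Every name in it (`Model`, `ToyDistillationProvider`, `restore_quant_state`, `convert_to_kd`, `register_pre_conversion_hook`, `install_patch`) is hypothetical and only mimics the behavior described above; it is not the Megatron-Bridge or ModelOpt API and not the actual code in `distill.py`. It illustrates why the hook lives in an `id()`-keyed registry rather than on the provider, and why the restore has to run before the KD conversion.

```python
from typing import Callable, Dict


class Model:
    """Hypothetical stand-in for a student/teacher model."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.quantized = False
        self.kd_converted = False


def restore_quant_state(model: Model) -> Model:
    """Hypothetical restore: a no-op once the model is KD-converted."""
    if not model.kd_converted:
        model.quantized = True
    return model


def convert_to_kd(student: Model) -> Model:
    """Hypothetical knowledge-distillation conversion of the student."""
    student.kd_converted = True
    return student


class ToyDistillationProvider:
    """Hypothetical provider: once a teacher is set, attribute writes go to it."""

    def __init__(self) -> None:
        self.teacher = None

    def __setattr__(self, name: str, value) -> None:
        teacher = self.__dict__.get("teacher")
        if name != "teacher" and teacher is not None:
            setattr(teacher, name, value)  # proxied: lands on the teacher
        else:
            object.__setattr__(self, name, value)

    def _build_student(self) -> Model:
        return Model("student")

    def provide(self) -> Model:
        # Builds and converts in one step: no seam to transform the student.
        return convert_to_kd(self._build_student())


# Hooks are keyed by id(provider) because storing them on the provider
# itself would be proxied to the teacher.
_PRE_CONVERSION_HOOKS: Dict[int, Callable[[Model], Model]] = {}
_ORIGINAL_PROVIDE = ToyDistillationProvider.provide


def register_pre_conversion_hook(provider, hook: Callable[[Model], Model]) -> None:
    _PRE_CONVERSION_HOOKS[id(provider)] = hook


def _patched_provide(self) -> Model:
    hook = _PRE_CONVERSION_HOOKS.get(id(self))
    if hook is None:
        return _ORIGINAL_PROVIDE(self)  # default behavior unchanged
    student = hook(self._build_student())  # e.g. restore quantize mode + weights
    return convert_to_kd(student)


def install_patch() -> None:
    # Class-level patch: affects every provider instance.
    ToyDistillationProvider.provide = _patched_provide
```

With the patch installed, a provider with a registered hook restores the quantize state onto the plain student and only then converts it, while providers without a hook fall through to the original `provide()`. One caveat of keying on `id()` is that ids can be reused after an object is freed, so a real implementation would want to drop registry entries once a provider is no longer needed.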
### Usage

```bash
# 1) PTQ -> quantized Megatron checkpoint (part 1)
torchrun --nproc_per_node 2 quantize.py \
    --hf_model_name_or_path Qwen/Qwen3-8B --quant_cfg fp8 --tp_size 2 \
    --export_megatron_path /tmp/Qwen3-8B-FP8-megatron

# 2) QAD: distill the quantized student from the unquantized teacher
torchrun --nproc_per_node 8 distill.py \
    --teacher_hf_path Qwen/Qwen3-8B \
    --student_hf_path Qwen/Qwen3-8B \
    --student_megatron_path /tmp/Qwen3-8B-FP8-megatron \
    --data_paths 1.0 tokenized/data_text_document \
    --train_iters 1000 --output_dir /output/qwen3_8b_qad

# 3) export the distilled quantized checkpoint (part 1)
torchrun --nproc_per_node 1 export.py \
    --hf_model_name_or_path Qwen/Qwen3-8B \
    --megatron_path /output/qwen3_8b_qad/checkpoints \
    --export_unified_hf_path /tmp/qwen3_8b_qad_fp8_hf
```

### Testing

`tests/examples/megatron_bridge/test_qad.py` (validated on a 2-GPU NeMo `26.04` container): quantize a tiny Qwen3 at TP=2 → QAD-distill from the quantized student → `export.py` to a unified HF checkpoint, asserting that `hf_quant_config.json` is written (which shows the quantize mode survived QAD). Includes a commented-out vLLM deployment check, validated locally (the full flow passes; vLLM loads the export as `quantization=modelopt`). Existing normal/Puzzletron distillation tests still pass.
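The key assertion in the test described above (that `hf_quant_config.json` exists in the export) can be sketched as a small standalone check. `check_quantized_export` is a hypothetical helper, not the contents of `test_qad.py`, and it assumes nothing about the fields inside the JSON file.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def check_quantized_export(export_dir: Union[str, Path]) -> Dict[str, Any]:
    """Hypothetical check: the export must contain hf_quant_config.json.

    Returns the parsed JSON without assuming anything about its fields.
    """
    cfg_path = Path(export_dir) / "hf_quant_config.json"
    if not cfg_path.is_file():
        raise FileNotFoundError(
            f"{cfg_path} is missing: the quantize mode did not survive QAD/export"
        )
    with cfg_path.open() as f:
        return json.load(f)
```

For example, after step 3 of the usage above, `check_quantized_export("/tmp/qwen3_8b_qad_fp8_hf")` would raise if the exported checkpoint was not written as a quantized model.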
### Before your PR is "*Ready for review*"

- Is this change backward compatible?: N/A (new example feature; default behavior unchanged when `--student_megatron_path` is not set)
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A (no new dependencies)
- Did you write any new necessary tests?: ✅
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: ✅
- Did you get Claude approval on this PR?: ✅

### Additional Information

Depends on a companion Megatron-Bridge PR adding `DistillationProvider.student_pre_conversion_hook` (the upstream replacement for the class-level `provide()` workaround). The Nemotron-3 tutorial NVFP4 + QAD experiments ship in part 3.

## Summary by CodeRabbit

* **New Features**
  * Quantization Aware Distillation (QAD) workflow to recover accuracy of quantized Megatron students and distill from quantized checkpoints.
  * CLI option to initialize a distillation student from a Megatron checkpoint and a structure-only load path for bridging.
* **Documentation**
  * Expanded runnable quantize → QAD → export guidance and best-practice tips.
* **Tests**
  * End-to-end test validating quantize → QAD → export artifacts.
* **Chores / UX**
  * Clearer rank-aware messages, improved tokenizer padding handling, and more consistent export behavior (fixed export dtype).

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
- Add an NVFP4 PTQ -> QAD -> export section to the Nemotron-3-Nano-30B-A3B tutorial to recover the NVFP4 accuracy drop, and migrate the existing FP8 quantization section to the examples/megatron_bridge quantize.py / export.py scripts. Add placeholder rows for the NVFP4 / NVFP4+QAD accuracy and NVFP4 vLLM throughput numbers (to be filled in once the experiments land).
- Wrap all tutorial commands in collapsible `<details>` blocks.
- Reframe the tutorial as NVFP4 + QAD (instead of FP8) in the root README "Latest News", CHANGELOG, and the pruning / minitron-vs-puzzletron references.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
What does this PR do?
Type of change: documentation
Note: This is part 3 of 4 (depends on #1589 and #1600 for the tutorial commands to actually run):
- Part 1 (#1589): `quantize.py` + `export.py`.
- Part 2 (#1600): `distill.py` Quantization Aware Distillation (QAD).

This PR targets `main` directly (the changes are docs-only and don't touch Part 1/2 code), but the new tutorial commands use `examples/megatron_bridge/{quantize,export,distill}.py` and so require #1589 + #1600 to be merged before they run.

Updates the Nemotron-3-Nano-30B-A3B-BF16 tutorial:

- Add an NVFP4 PTQ -> QAD -> export section, and migrate the existing FP8 quantization section to the `examples/megatron_bridge` `quantize.py` / `export.py` scripts.
- … `hf_ptq.py` results with mbridge `quantize.py` (same defaults, slightly better results on average).
- Wrap all tutorial commands in collapsible `<details>` blocks.

Testing
Docs-only change; rendered Markdown / collapsible blocks verified and markdownlint + RST hooks pass.
Before your PR is "Ready for review"
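- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- … /claude review)

Additional Information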
Placeholder `?` cells in the tutorial's Results tables (NVFP4 / NVFP4+QAD accuracy, NVFP4 vLLM throughput) will be filled in with the experiment results before this leaves draft.