Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback fix by kevalmorabia97 · Pull Request #4881 · NVIDIA/Megatron-LM

kevalmorabia97 · 2026-05-19T23:14:41Z

Summary

- `prune.py` and `quantize.py` now use ModelOpt 0.45's shared `get_megatron_calibration_forward_loop` (one sample per row + per-row trim + EOS-at-row-end). The import is guarded by a try-import, so on 0.44 both scripts fall back to the prior inline path.
- Defaults are now unified across both scripts and aligned with the M-Bridge counterparts: `--calib-dataset nemotron-post-training-dataset-v2`, `--calib-size 1024`, `--calib-max-sequence-length 4096`, `--calib-batch-size 1`. These defaults are deliberately conservative and sized for MoE robustness: top-K routing means each expert sees fewer tokens, so more samples and longer sequences are needed for stable amax / scoring statistics.
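To make the MoE reasoning behind the defaults concrete, here is a minimal standalone sketch. It is not the parser used by `prune.py` or `quantize.py`. The flag names and default values come from the summary above; `build_calib_parser`, `mean_tokens_per_expert`, and the top-8-of-128 expert configuration are hypothetical and exist only for illustration. The estimate assumes uniform routing and full-length rows, so it is an upper bound on the average.

```python
import argparse


def build_calib_parser() -> argparse.ArgumentParser:
    """Standalone parser that mirrors the unified calibration defaults."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--calib-dataset", type=str,
                        default="nemotron-post-training-dataset-v2")
    parser.add_argument("--calib-size", type=int, default=1024)
    parser.add_argument("--calib-max-sequence-length", type=int, default=4096)
    parser.add_argument("--calib-batch-size", type=int, default=1)
    return parser


def mean_tokens_per_expert(calib_size: int, seq_len: int,
                           top_k: int, num_experts: int) -> float:
    """Average tokens routed to one expert in one MoE layer.

    Assumes every row is seq_len tokens long and routing is uniform,
    so real calibration runs will see fewer tokens per expert.
    """
    return calib_size * seq_len * top_k / num_experts


args = build_calib_parser().parse_args([])

# Hypothetical MoE shape (not taken from the PR): 128 experts, top-8 routing.
new_defaults = mean_tokens_per_expert(
    args.calib_size, args.calib_max_sequence_length, top_k=8, num_experts=128)
small_config = mean_tokens_per_expert(512, 512, top_k=8, num_experts=128)

print(f"new defaults: {new_defaults:,.0f} tokens per expert")
print(f"512 x 512:    {small_config:,.0f} tokens per expert")
```

Under these assumptions the new defaults give each expert 16 times as many calibration tokens as a 512-sample, 512-token run, which is the kind of headroom the summary argues for.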

Results

Qwen3-8B (TP=1 PP=2 prune; TP=2 PP=1 quantize). MMLU 5% fraction, 0-shot, bs=4. "Original" = pre-PR inline per-example calibration; "Shared" = ModelOpt 0.45 shared loop.

Minitron prune (Qwen3-8B → 30L/3584/11776 ≈ 6B params)

| Calibration | Dataset | seq_len | calib_size | calib_bs | MMLU |
| --- | --- | --- | --- | --- | --- |
| Original (inline `pack=True` WAR) | cnn_dailymail | 512 | 512 | 1 / 16 | 0.530 |
| Shared | cnn_dailymail | 512 | 512 | 16 | 0.555 (+2.5) |
| Shared | nemotron-v2 | 2048 | 512 | 16 | 0.585 |
| Shared | nemotron-v2 | 2048 | 1024 | 16 | 0.588 |
| Shared | nemotron-v2 | 4096 | 512 | 16 | 0.587 |

NVFP4 quantize (`NVFP4_DEFAULT_CFG`)

| Calibration | Dataset | seq_len | calib_size | calib_bs | MMLU |
| --- | --- | --- | --- | --- | --- |
| Original (`get_calib_dataloader` pad+truncate) | cnn_dailymail | 512 | 512 | 1 | 0.680 |
| Shared | cnn_dailymail | 512 | 512 | 16 | 0.705 (+2.5) |
| Shared | nemotron-v2 | 2048 | 512 | 16 | 0.699 |
| Shared | nemotron-v2 | 2048 | 1024 | 16 | 0.690 |
| Shared | nemotron-v2 | 4096 | 512 | 8 | 0.710 (+3.0) |

Takeaway: the shared loop is ≥ original on every workload tested, with +2.5 to +3.0 pt MMLU wins over the original calibration path. Differences across (seq_len, calib_size, calib_batch_size) on the shared path are within the MMLU noise floor for dense Qwen3-8B; the defaults err on the conservative side for MoE pruning.
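As a rough sanity check on the numbers above, the sketch below recomputes the point deltas from the tables and estimates a binomial standard error for a single MMLU score. The helper names are hypothetical. The question count is an assumption: the PR says only "5% fraction" and does not state how many questions that is, so 700 is used as a round stand-in.

```python
import math


def delta_points(baseline: float, candidate: float) -> float:
    """Accuracy difference in percentage points, rounded to one decimal."""
    return round((candidate - baseline) * 100, 1)


def binomial_standard_error(accuracy: float, num_questions: int) -> float:
    """Standard error of an accuracy measured on num_questions independent items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_questions)


# Scores copied from the two tables above.
prune_delta = delta_points(0.530, 0.555)           # 2.5
quant_delta_cnn = delta_points(0.680, 0.705)       # 2.5
quant_delta_nemotron = delta_points(0.680, 0.710)  # 3.0

# ASSUMPTION: roughly 700 questions in the 5% MMLU subset (not stated in the PR).
ASSUMED_NUM_QUESTIONS = 700
standard_error = binomial_standard_error(0.70, ASSUMED_NUM_QUESTIONS)

print(prune_delta, quant_delta_cnn, quant_delta_nemotron)
print(f"approx. standard error: {standard_error * 100:.1f} points")
```

Under that assumption, one standard error is on the order of a couple of points, which is the same order as the spread among the shared-path rows. Single-run differences of that size are best read with this uncertainty in mind.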

🤖 Generated with Claude Code

github-actions · 2026-05-19T23:14:51Z

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

Add the oncall reviewer (optional reviewer)
Add required review teams based on your changes

See the contribution guide for more details.

copy-pr-bot · 2026-05-19T23:14:52Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

The new modelopt 0.45 shared util get_megatron_calibration_forward_loop unifies prune/quantize calibration with pack=True. Wrap both prune.py and quantize.py with try-import + _HAS_SHARED_CALIB so they continue to work on modelopt 0.44 (inline pack=True for prune, legacy local-JSONL / HF-dataset pad+truncate for quantize) and use the shared util on 0.45+.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
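The try-import guard described in this commit message could look roughly like the sketch below. It is not the code from the PR. The function name `get_megatron_calibration_forward_loop` and the flag name `_HAS_SHARED_CALIB` come from the text above; the module path `hypothetical_pkg.calib`, the helpers `load_optional` and `build_forward_loop`, and the keyword arguments are placeholders, since the PR text does not give the real import path or signature.

```python
import importlib
from typing import Any, Callable, Optional, Tuple


def load_optional(module_path: str, attr: str) -> Tuple[Optional[Callable], bool]:
    """Return (callable, True) if module_path.attr can be imported, else (None, False)."""
    try:
        module = importlib.import_module(module_path)
        return getattr(module, attr), True
    except (ImportError, AttributeError):
        # Older releases lack the module or the attribute: signal fallback.
        return None, False


def build_forward_loop(shared_factory: Optional[Callable],
                       legacy_factory: Callable, **kwargs: Any) -> Any:
    """Prefer the shared factory when present; otherwise use the legacy path."""
    if shared_factory is not None:
        return shared_factory(**kwargs)
    return legacy_factory(**kwargs)


# Placeholder module path (an assumption, not the real ModelOpt location).
_shared_factory, _HAS_SHARED_CALIB = load_optional(
    "hypothetical_pkg.calib", "get_megatron_calibration_forward_loop")


def _legacy_inline_loop(**kwargs: Any) -> str:
    """Stand-in for the pre-0.45 inline calibration path."""
    return f"legacy loop with {sorted(kwargs)}"


print(_HAS_SHARED_CALIB)
print(build_forward_loop(_shared_factory, _legacy_inline_loop, calib_size=1024))
```

Resolving the import once at module load and branching on a single flag keeps the version check in one place, so the 0.44 path can be deleted later without touching the call sites.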

kevalmorabia97 · 2026-05-20T16:40:16Z

/ok to test ab1d17c

svcnvidia-nemo-ci marked this pull request as draft May 19, 2026 23:14

copy-pr-bot Bot temporarily deployed to public May 19, 2026 23:15 Inactive

copy-pr-bot Bot temporarily deployed to test May 19, 2026 23:15 Inactive

kevalmorabia97 changed the title ~~Make ModelOpt calibration loop dual-compatible with 0.44 and 0.45~~ Use new and better shared ModelOpt calibration loop if available (0.45.0 release or main) May 19, 2026

kevalmorabia97 changed the title ~~Use new and better shared ModelOpt calibration loop if available (0.45.0 release or main)~~ Use new and shared ModelOpt calibration loop (better; supports mbs>1) if available (0.45.0 release or main) May 19, 2026

copy-pr-bot Bot temporarily deployed to public May 19, 2026 23:18 Inactive

copy-pr-bot Bot temporarily deployed to public May 19, 2026 23:26 Inactive

kevalmorabia97 force-pushed the kmorabia/modelopt-calib-loop-dual-compat branch from 676fa82 to ab1d17c Compare May 20, 2026 16:24

kevalmorabia97 changed the title ~~Use new and shared ModelOpt calibration loop (better; supports mbs>1) if available (0.45.0 release or main)~~ Use new and better shared ModelOpt calibration loop if available (0.45.0 release or main) May 20, 2026

kevalmorabia97 mentioned this pull request May 20, 2026

Create shared Megatron calibration forward loop for prune / quantize NVIDIA/Model-Optimizer#1501

Open

4 tasks

kevalmorabia97 marked this pull request as ready for review May 20, 2026 16:40

kevalmorabia97 requested review from AAnoosheh, ChenhanYu and jenchen13 and removed request for jenchen13 May 20, 2026 16:40

kevalmorabia97 changed the title ~~Use new and better shared ModelOpt calibration loop if available (0.45.0 release or main)~~ Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback May 20, 2026

copy-pr-bot Bot temporarily deployed to public May 20, 2026 16:41 Inactive

svcnvidia-nemo-ci requested a review from a team May 20, 2026 16:42

copy-pr-bot Bot temporarily deployed to test May 20, 2026 16:44 Inactive

svcnvidia-nemo-ci added the complexity: medium label May 20, 2026

copy-pr-bot Bot temporarily deployed to public May 20, 2026 16:45 Inactive

copy-pr-bot Bot temporarily deployed to public May 20, 2026 16:47 Inactive

copy-pr-bot Bot temporarily deployed to public May 20, 2026 16:48 Inactive

copy-pr-bot Bot temporarily deployed to public May 20, 2026 17:31 Inactive

kevalmorabia97 requested review from yueshen2016 and removed request for a team and AAnoosheh May 20, 2026 18:36

kevalmorabia97 changed the title ~~Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback~~ Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback fix May 21, 2026

kevalmorabia97 wants to merge 1 commit into NVIDIA:main from kevalmorabia97:kmorabia/modelopt-calib-loop-dual-compat
