
Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback fix #4881

Open
kevalmorabia97 wants to merge 1 commit into NVIDIA:main from kevalmorabia97:kmorabia/modelopt-calib-loop-dual-compat


@kevalmorabia97
Contributor

@kevalmorabia97 kevalmorabia97 commented May 19, 2026

Related: NVIDIA/Model-Optimizer#1501

Summary

  • prune.py and quantize.py now use ModelOpt 0.45's shared get_megatron_calibration_forward_loop (one sample per row + per-row trim + EOS-at-row-end), guarded by a try-import so 0.44 falls back to the prior inline path.
  • Unified defaults across both scripts and aligned with the M-Bridge counterparts: --calib-dataset nemotron-post-training-dataset-v2, --calib-size 1024, --calib-max-sequence-length 4096, --calib-batch-size 1. Conservative defaults sized for MoE robustness (top-K routing → fewer tokens per expert → more samples × longer seq needed for stable amax / scoring statistics).
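
The unified defaults listed above can be written down as a small argument parser. This is a minimal sketch, not the actual code in prune.py or quantize.py: the flag names and default values come from the summary, while `build_parser`, `max_calib_tokens`, `tokens_per_expert`, and the MoE shape used in the usage note (top-8 routing over 128 experts) are hypothetical names and numbers chosen only for illustration. The per-expert estimate assumes perfectly uniform routing, which real routers do not guarantee.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Calibration flags with the defaults quoted in the PR summary."""
    parser = argparse.ArgumentParser(description="Calibration options (sketch)")
    parser.add_argument("--calib-dataset", type=str,
                        default="nemotron-post-training-dataset-v2")
    parser.add_argument("--calib-size", type=int, default=1024)
    parser.add_argument("--calib-max-sequence-length", type=int, default=4096)
    parser.add_argument("--calib-batch-size", type=int, default=1)
    return parser


def max_calib_tokens(args: argparse.Namespace) -> int:
    """Upper bound on calibration tokens: every sample filled to max length."""
    return args.calib_size * args.calib_max_sequence_length


def tokens_per_expert(total_tokens: int, top_k: int, num_experts: int) -> float:
    """Average tokens seen by one expert, ASSUMING uniform top-K routing."""
    return total_tokens * top_k / num_experts


# Parse an empty argument list so that only the defaults are used.
args = build_parser().parse_args([])
budget = max_calib_tokens(args)
```

With the defaults, `budget` is 1024 × 4096 = 4,194,304 tokens at most. For a hypothetical MoE with top-8 routing over 128 experts, `tokens_per_expert(budget, 8, 128)` gives 262,144 tokens per expert on average, a sixteenth of what a dense MLP layer would see from the same data. That gap is the motivation the summary gives for the larger sample count and sequence length.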

Results

Qwen3-8B (TP=1 PP=2 prune; TP=2 PP=1 quantize). MMLU 5% fraction, 0-shot, bs=4. "Original" = pre-PR inline per-example calibration; "Shared" = ModelOpt 0.45 shared loop.

Minitron prune (Qwen3-8B → 30L/3584/11776 ≈ 6B params)

| Calibration | Dataset | seq_len | calib_size | calib_bs | MMLU |
|---|---|---|---|---|---|
| Original (inline pack=True WAR) | cnn_dailymail | 512 | 512 | 1 / 16 | 0.530 |
| Shared | cnn_dailymail | 512 | 512 | 16 | 0.555 (+2.5) |
| Shared | nemotron-v2 | 2048 | 512 | 16 | 0.585 |
| Shared | nemotron-v2 | 2048 | 1024 | 16 | 0.588 |
| Shared | nemotron-v2 | 4096 | 512 | 16 | 0.587 |

NVFP4 quantize (NVFP4_DEFAULT_CFG)

| Calibration | Dataset | seq_len | calib_size | calib_bs | MMLU |
|---|---|---|---|---|---|
| Original (get_calib_dataloader pad+truncate) | cnn_dailymail | 512 | 512 | 1 | 0.680 |
| Shared | cnn_dailymail | 512 | 512 | 16 | 0.705 (+2.5) |
| Shared | nemotron-v2 | 2048 | 512 | 16 | 0.699 |
| Shared | nemotron-v2 | 2048 | 1024 | 16 | 0.690 |
| Shared | nemotron-v2 | 4096 | 512 | 8 | 0.710 (+3.0) |

Takeaway: the shared loop is ≥ original on every workload tested, with +2.5 to +3.0 pt MMLU wins where the original underperformed. Differences across (seq_len, calib_size, calib_batch_size) on the shared path are within MMLU noise floor for dense Qwen3-8B; defaults err on the conservative side for MoE pruning.
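
The point deltas quoted in the takeaway can be recomputed from the two results tables. The snippet below is only an arithmetic check: the scores are copied from the tables, while the dictionary layout and the `delta_points` helper are illustrative names that do not appear in the PR.

```python
# MMLU scores copied from the results tables in this PR description.
ORIGINAL = {"prune": 0.530, "quantize": 0.680}

# (dataset, seq_len, calib_size, mmlu) for each "Shared" row, in table order.
SHARED = {
    "prune": [
        ("cnn_dailymail", 512, 512, 0.555),
        ("nemotron-v2", 2048, 512, 0.585),
        ("nemotron-v2", 2048, 1024, 0.588),
        ("nemotron-v2", 4096, 512, 0.587),
    ],
    "quantize": [
        ("cnn_dailymail", 512, 512, 0.705),
        ("nemotron-v2", 2048, 512, 0.699),
        ("nemotron-v2", 2048, 1024, 0.690),
        ("nemotron-v2", 4096, 512, 0.710),
    ],
}


def delta_points(score: float, baseline: float) -> float:
    """Difference in MMLU percentage points, rounded to one decimal."""
    return round((score - baseline) * 100, 1)


# Delta of every shared-loop row against the original row of the same workload.
deltas = {
    workload: [delta_points(row[3], ORIGINAL[workload]) for row in rows]
    for workload, rows in SHARED.items()
}

for workload, values in deltas.items():
    print(workload, values)
```

Every delta is non-negative, which matches the "≥ original" claim. The like-for-like cnn_dailymail rows give +2.5 points on both workloads, and the best quantize row gives +3.0.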

🤖 Generated with Claude Code

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft May 19, 2026 23:14
@github-actions
Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@copy-pr-bot

copy-pr-bot Bot commented May 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@kevalmorabia97 kevalmorabia97 changed the title Make ModelOpt calibration loop dual-compatible with 0.44 and 0.45 Use new and better shared ModelOpt calibration loop if available (0.45.0 release or main) May 19, 2026
@kevalmorabia97 kevalmorabia97 changed the title Use new and better shared ModelOpt calibration loop if available (0.45.0 release or main) Use new and shared ModelOpt calibration loop (better; supports mbs>1) if available (0.45.0 release or main) May 19, 2026
The new modelopt 0.45 shared util get_megatron_calibration_forward_loop
unifies prune/quantize calibration with pack=True. Wrap both prune.py and
quantize.py with try-import + _HAS_SHARED_CALIB so they continue to work
on modelopt 0.44 (inline pack=True for prune, legacy local-JSONL /
HF-dataset pad+truncate for quantize) and use the shared util on 0.45+.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/modelopt-calib-loop-dual-compat branch from 676fa82 to ab1d17c Compare May 20, 2026 16:24
@kevalmorabia97 kevalmorabia97 changed the title Use new and shared ModelOpt calibration loop (better; supports mbs>1) if available (0.45.0 release or main) Use new and better shared ModelOpt calibration loop if available (0.45.0 release or main) May 20, 2026
@kevalmorabia97 kevalmorabia97 marked this pull request as ready for review May 20, 2026 16:40
@kevalmorabia97
Contributor Author

/ok to test ab1d17c

@kevalmorabia97 kevalmorabia97 requested review from AAnoosheh, ChenhanYu and jenchen13 and removed request for jenchen13 May 20, 2026 16:40
@kevalmorabia97 kevalmorabia97 changed the title Use new and better shared ModelOpt calibration loop if available (0.45.0 release or main) Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback May 20, 2026
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team May 20, 2026 16:42
@kevalmorabia97 kevalmorabia97 requested review from yueshen2016 and removed request for a team and AAnoosheh May 20, 2026 18:36
@kevalmorabia97 kevalmorabia97 changed the title Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback Use shared ModelOpt calibration loop on 0.45+ with 0.44 fallback fix May 21, 2026
