Support auto_quantize for Megatron expert parallelism #1513
Signed-off-by: realAsma <akuriparambi@nvidia.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## jennifchen/super_nvfp4_recipe #1513 +/- ##
=================================================================
- Coverage 72.74% 69.38% -3.37%
=================================================================
Files 473 473
Lines 51574 51590 +16
=================================================================
- Hits 37520 35794 -1726
- Misses 14054 15796 +1742
Signed-off-by: Jenny Chen <jennifchen@nvidia.com>
What does this PR do?
Type of change: Bug fix
This PR enables `auto_quantize` for Megatron expert-parallel MoE flows by including the expert model parallel group when aggregating scores and costs and when synchronizing selected recipes. It also derives the search budget from the no-quant candidate costs in `candidate_stats`, so sharded expert layers use global candidate costs instead of local module weights.
Usage
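As an illustration of the aggregation described above, here is a minimal pure-Python sketch. It does not use the ModelOpt or Megatron APIs: the expert-parallel group is simulated as a list of per-rank dictionaries, and the recipe names (`none`, `fp8`, `nvfp4`), the `compression` parameter, and the helper names are all hypothetical. It also assumes the budget is a fixed fraction of the total no-quant cost, which may differ from how the PR computes it.

```python
from typing import Dict, List

# Hypothetical per-rank statistics: recipe name -> cost of the local expert shard.
RankCosts = Dict[str, float]


def all_reduce_sum(per_rank: List[RankCosts]) -> RankCosts:
    """Simulate a SUM all-reduce over the expert-parallel group."""
    total: RankCosts = {}
    for costs in per_rank:
        for recipe, cost in costs.items():
            total[recipe] = total.get(recipe, 0.0) + cost
    return total


def search_budget(costs: RankCosts, compression: float, baseline: str = "none") -> float:
    """Assumed budget rule: a fraction of the no-quant (baseline) cost."""
    return compression * costs[baseline]


# Two simulated expert-parallel ranks, each holding half of the experts of one MoE layer.
rank_costs = [
    {"none": 16.0, "fp8": 8.0, "nvfp4": 4.0},
    {"none": 16.0, "fp8": 8.0, "nvfp4": 4.0},
]

global_costs = all_reduce_sum(rank_costs)

# Budget derived from the global no-quant cost (what a sharded layer should see).
global_budget = search_budget(global_costs, compression=0.5)

# Budget derived from one rank's local shard only (underestimates the layer).
local_budget = search_budget(rank_costs[0], compression=0.5)

print(global_costs, global_budget, local_budget)
```

In this toy setup, the budget computed from a single rank's shard is half the budget computed from the aggregated costs, which is the kind of mismatch that global candidate costs avoid.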
Testing
`python -m pytest tests/gpu_megatron/torch/quantization/plugins/test_megatron.py::test_auto_quantize_moe_ep -xvs` in NGC PyTorch 26.01 (`1 passed` in 134.37s).
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
CONTRIBUTING.md: N/A
Additional Information
Base branch: `jennifchen/super_nvfp4_recipe`.