Fix sparsity-only export emitting empty hf_quant_config.json#1375
Conversation
No actionable comments were generated in the recent review. 🎉
✅ Files skipped from review due to trivial changes (4)
📝 Walkthrough
Multiple import statements were reordered across example files. Additionally, quantization export logic was changed to treat a checkpoint as quantized only when its metadata includes a quantization algorithm (a non-null `quant_algo`).

Changes:
- vLLM Examples Import Reordering
- Quantization Export Logic
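The change the walkthrough describes boils down to a small gate before writing the config file. The sketch below is a minimal illustration assuming the `hf_quant_config` layout quoted in the review prompt further down; the helper name `should_write_quant_config` is hypothetical, not the verbatim code in `unified_export_hf.py`.

```python
# Minimal sketch of the gating logic, assuming the hf_quant_config layout from
# the review prompt below; should_write_quant_config is a hypothetical name.
def should_write_quant_config(hf_quant_config: dict) -> bool:
    quantization_details = hf_quant_config.get("quantization", {})
    # Sparsity-only checkpoints carry null algorithms in their metadata.
    return (
        quantization_details.get("quant_algo") is not None
        or quantization_details.get("kv_cache_quant_algo") is not None
    )

# Sparsity-only export: skip hf_quant_config.json entirely.
assert not should_write_quant_config(
    {"quantization": {"quant_algo": None, "kv_cache_quant_algo": None}}
)
# True quantized export: write the file as before ("FP8" is just an example).
assert should_write_quant_config({"quantization": {"quant_algo": "FP8"}})
```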
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes. 🚥 Pre-merge checks: ✅ 6 of 6 passed.
🧹 Nitpick comments (1)
modelopt/torch/export/unified_export_hf.py (1)
Lines 1192-1210: ⚡ Quick win: please add a regression test for both export branches.
Add one test for sparsity-only export (assert no `hf_quant_config.json` and no `quantization_config`) and one for true quantized export (assert both are present); a test sketch follows the AI-agent prompt below. As per coding guidelines: "Testing: ... add/adjust tests if the PR touches export output correctness."
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modelopt/torch/export/unified_export_hf.py`:
- Around line 1192-1210: Add two regression tests covering both branches in
unified_export_hf.py: one for sparsity-only export and one for true quantized
export. For sparsity-only, call the export function (the code path computing
is_quantized_export using quantization_details from hf_quant_config) with an
hf_quant_config where "quantization": {"quant_algo": None,
"kv_cache_quant_algo": None} and assert that no hf_quant_config.json file is
written in export_dir and that the exported metadata/returned hf_quant_config
does not include a quantization_config. For the quantized case, provide an
hf_quant_config where "quantization": {"quant_algo": "<some_algo>"} and assert
hf_quant_config.json exists in export_dir and that the export invoked
convert_hf_quant_config_format (and the exported metadata contains the expected
quantization_config). Name tests to reflect branches (e.g.,
test_export_sparsity_only_no_quant_config and
test_export_true_quantized_writes_quant_config) and clean up temporary
export_dir after assertions.
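A hedged pytest sketch of the two tests described above. `run_export` is a hypothetical wrapper around the real export entry point (the actual API in `modelopt.torch.export` may differ), and `"FP8"` stands in for the prompt's `<some_algo>` placeholder. Using pytest's built-in `tmp_path` fixture also covers the cleanup step the prompt asks for.

```python
import json

# run_export is a hypothetical wrapper around the export path in
# unified_export_hf.py; only the assertions mirror the prompt above.

def test_export_sparsity_only_no_quant_config(tmp_path):
    hf_quant_config = {"quantization": {"quant_algo": None, "kv_cache_quant_algo": None}}
    run_export(hf_quant_config, export_dir=tmp_path)
    # Sparsity-only branch: no quantization artifacts should be written.
    assert not (tmp_path / "hf_quant_config.json").exists()

def test_export_true_quantized_writes_quant_config(tmp_path):
    hf_quant_config = {"quantization": {"quant_algo": "FP8"}}  # example algorithm
    run_export(hf_quant_config, export_dir=tmp_path)
    config_path = tmp_path / "hf_quant_config.json"
    assert config_path.exists()
    written = json.loads(config_path.read_text())
    assert written["quantization"]["quant_algo"] == "FP8"
```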
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0fbcaa6b-538a-4d83-b54f-b53ae8af9034
📒 Files selected for processing (5)
- examples/vllm_serve/fakequant_worker.py
- examples/vllm_serve/vllm_ptq_utils.py
- examples/vllm_serve/vllm_reload_utils.py
- examples/vllm_serve/vllm_serve_fakequant.py
- modelopt/torch/export/unified_export_hf.py
Codecov Report: ❌ Patch coverage is 0.00%, with the 3 added lines missing coverage (per the diff below).
Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1375      +/-   ##
==========================================
- Coverage   76.90%   76.90%   -0.01%
==========================================
  Files         471      471
  Lines       50562    50565       +3
==========================================
  Hits        38886    38886
- Misses      11676    11679       +3
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
shengliangxu left a comment:
LGTM. By the way, do you know which codebase/system still reads `hf_quant_config.json`?
Signed-off-by: Kai Xu <kaix@nvidia.com>
Force-pushed from e0901f7 to d306766.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed from 8e6d31f to 09ffe48.
#1368 #1373 #1359 #1361 #1325 #1369 #1370 #1371 #1375 #1386 #1353 #1356 #1390 (#1385)

Cherry-picked PRs: #1352, #1351, #1330, #1354, #1355, #1360, #1342, #1324, #1340, #1368, #1373, #1359, #1361, #1325, #1369, #1370, #1371, #1375, #1386, #1353, #1356, #1390

Summary by CodeRabbit
- New Features
  - Added Python 3.14 support (basic unit tests verified; production defaults on Python 3.12)
  - Added Windows CUDA 13.x installation guidance
  - Introduced LLM ONNX export utilities with quantization support
  - Extended Medusa mode support in speculative decoding pipeline
- Bug Fixes
  - Fixed FP8 quantization for vision transformer multi-head attention
  - Improved MoE expert handling in quantization calibration and inference
  - Enhanced ONNX graph utilities for FP8 weight transformation
- Documentation
  - Comprehensive Minitron pruning + distillation + quantization + vLLM tutorials with ablation studies
  - Megatron data preparation guide for tokenization workflows
  - Puzzletron distillation results and cross-reference updates

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
Signed-off-by: ynankani <ynankani@nvidia.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: vipandya <vipandya@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Jenny Chen <jennifchen@nvidia.com>
Co-authored-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
Co-authored-by: ynankani <ynankani@nvidia.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: vishalpandya1990 <vishalpandya1990@gmail.com>
Co-authored-by: dthienan-nv <dmoodie@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Hrishith Thadicherla <99313418+hthadicherla@users.noreply.github.com>
Co-authored-by: yeyu-nvidia <yeyu@nvidia.com>
Co-authored-by: kaix-nv <kaix@nvidia.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
What does this PR do?
Type of change: Bug fix
Fix sparsity-only export writing `hf_quant_config.json` with a null `quant_algo`.
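For illustration, the degenerate file the buggy path wrote for a sparsity-only checkpoint looked roughly like this; the exact field set is an assumption based on the fields named in the review comments:

```json
{
  "quantization": {
    "quant_algo": null,
    "kv_cache_quant_algo": null
  }
}
```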
Testing

Before your PR is "Ready for review"
- Make sure you read and follow the Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅ / ❌ / N/A
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`?: ✅ / ❌ / N/A
- Did you write any new necessary tests?: ✅ / ❌ / N/A
- Did you update the Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit
- Bug Fixes: Improved quantization metadata handling in model export to correctly identify quantized checkpoints based on algorithm presence.
- Style: Reorganized imports across example files for consistency.