Fix sparsity-only export emitting empty hf_quant_config.json#1375
Conversation
No actionable comments were generated in the recent review. 🎉
✅ Files skipped from review due to trivial changes (4)
📝 Walkthrough
Multiple import statements were reordered across example files. Additionally, quantization export logic was changed to treat a checkpoint as quantized only when its metadata includes a quantization algorithm (a non-null `quant_algo`).

Changes:
- vLLM Examples Import Reordering
- Quantization Export Logic
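The change the walkthrough describes boils down to a small gate before writing the config file. The sketch below is a minimal illustration assuming the `hf_quant_config` layout quoted in the review prompt further down; the helper name `should_write_quant_config` is hypothetical, not the verbatim code in `unified_export_hf.py`.

```python
# Minimal sketch of the gating logic, assuming the hf_quant_config layout from
# the review prompt below; should_write_quant_config is a hypothetical name.
def should_write_quant_config(hf_quant_config: dict) -> bool:
    quantization_details = hf_quant_config.get("quantization", {})
    # Sparsity-only checkpoints carry null algorithms in their metadata.
    return (
        quantization_details.get("quant_algo") is not None
        or quantization_details.get("kv_cache_quant_algo") is not None
    )

# Sparsity-only export: skip hf_quant_config.json entirely.
assert not should_write_quant_config(
    {"quantization": {"quant_algo": None, "kv_cache_quant_algo": None}}
)
# True quantized export: write the file as before ("FP8" is just an example).
assert should_write_quant_config({"quantization": {"quant_algo": "FP8"}})
```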
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes. 🚥 Pre-merge checks: ✅ 6 of 6 passed.
🧹 Nitpick comments (1)
modelopt/torch/export/unified_export_hf.py (1)
Lines 1192-1210: ⚡ Quick win: please add a regression test for both export branches.
Add one test for sparsity-only export (assert no `hf_quant_config.json` and no `quantization_config`) and one for true quantized export (assert both are present); a test sketch follows the AI-agent prompt below. As per coding guidelines: "Testing: ... add/adjust tests if the PR touches export output correctness."
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modelopt/torch/export/unified_export_hf.py`:
- Around line 1192-1210: Add two regression tests covering both branches in
unified_export_hf.py: one for sparsity-only export and one for true quantized
export. For sparsity-only, call the export function (the code path computing
is_quantized_export using quantization_details from hf_quant_config) with an
hf_quant_config where "quantization": {"quant_algo": None,
"kv_cache_quant_algo": None} and assert that no hf_quant_config.json file is
written in export_dir and that the exported metadata/returned hf_quant_config
does not include a quantization_config. For the quantized case, provide an
hf_quant_config where "quantization": {"quant_algo": "<some_algo>"} and assert
hf_quant_config.json exists in export_dir and that the export invoked
convert_hf_quant_config_format (and the exported metadata contains the expected
quantization_config). Name tests to reflect branches (e.g.,
test_export_sparsity_only_no_quant_config and
test_export_true_quantized_writes_quant_config) and clean up temporary
export_dir after assertions.
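A hedged pytest sketch of the two tests described above. `run_export` is a hypothetical wrapper around the real export entry point (the actual API in `modelopt.torch.export` may differ), and `"FP8"` stands in for the prompt's `<some_algo>` placeholder. Using pytest's built-in `tmp_path` fixture also covers the cleanup step the prompt asks for.

```python
import json

# run_export is a hypothetical wrapper around the export path in
# unified_export_hf.py; only the assertions mirror the prompt above.

def test_export_sparsity_only_no_quant_config(tmp_path):
    hf_quant_config = {"quantization": {"quant_algo": None, "kv_cache_quant_algo": None}}
    run_export(hf_quant_config, export_dir=tmp_path)
    # Sparsity-only branch: no quantization artifacts should be written.
    assert not (tmp_path / "hf_quant_config.json").exists()

def test_export_true_quantized_writes_quant_config(tmp_path):
    hf_quant_config = {"quantization": {"quant_algo": "FP8"}}  # example algorithm
    run_export(hf_quant_config, export_dir=tmp_path)
    config_path = tmp_path / "hf_quant_config.json"
    assert config_path.exists()
    written = json.loads(config_path.read_text())
    assert written["quantization"]["quant_algo"] == "FP8"
```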
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0fbcaa6b-538a-4d83-b54f-b53ae8af9034
📒 Files selected for processing (5)
- examples/vllm_serve/fakequant_worker.py
- examples/vllm_serve/vllm_ptq_utils.py
- examples/vllm_serve/vllm_reload_utils.py
- examples/vllm_serve/vllm_serve_fakequant.py
- modelopt/torch/export/unified_export_hf.py
Codecov Report: ❌ Patch coverage is 0.00%, with the 3 added lines missing coverage (per the diff below).
Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1375      +/-   ##
==========================================
- Coverage   76.90%   76.90%   -0.01%
==========================================
  Files         471      471
  Lines       50562    50565       +3
==========================================
  Hits        38886    38886
- Misses      11676    11679       +3
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
shengliangxu left a comment:
LGTM. By the way, do you know which codebase/system still reads `hf_quant_config.json`?
Signed-off-by: Kai Xu <kaix@nvidia.com>
Force-pushed from e0901f7 to d306766.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed from 8e6d31f to 09ffe48.
#1368 #1373 #1359 #1361 #1325 #1369 #1370 #1371 #1375 #1386 #1353 #1356 #1390 (#1385)

Cherry-picked PRs: #1352, #1351, #1330, #1354, #1355, #1360, #1342, #1324, #1340, #1368, #1373, #1359, #1361, #1325, #1369, #1370, #1371, #1375, #1386, #1353, #1356, #1390

Summary by CodeRabbit
- New Features
  - Added Python 3.14 support (basic unit tests verified; production defaults on Python 3.12)
  - Added Windows CUDA 13.x installation guidance
  - Introduced LLM ONNX export utilities with quantization support
  - Extended Medusa mode support in speculative decoding pipeline
- Bug Fixes
  - Fixed FP8 quantization for vision transformer multi-head attention
  - Improved MoE expert handling in quantization calibration and inference
  - Enhanced ONNX graph utilities for FP8 weight transformation
- Documentation
  - Comprehensive Minitron pruning + distillation + quantization + vLLM tutorials with ablation studies
  - Megatron data preparation guide for tokenization workflows
  - Puzzletron distillation results and cross-reference updates

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
Signed-off-by: ynankani <ynankani@nvidia.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: vipandya <vipandya@nvidia.com>
Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: Jenny Chen <jennifchen@nvidia.com>
Co-authored-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
Co-authored-by: ynankani <ynankani@nvidia.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: vishalpandya1990 <vishalpandya1990@gmail.com>
Co-authored-by: dthienan-nv <dmoodie@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Hrishith Thadicherla <99313418+hthadicherla@users.noreply.github.com>
Co-authored-by: yeyu-nvidia <yeyu@nvidia.com>
Co-authored-by: kaix-nv <kaix@nvidia.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
What does this PR do?
Type of change: Bug fix
Fix sparsity-only export writing `hf_quant_config.json` with a null `quant_algo`.
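For illustration, the degenerate file the buggy path wrote for a sparsity-only checkpoint looked roughly like this; the exact field set is an assumption based on the fields named in the review comments:

```json
{
  "quantization": {
    "quant_algo": null,
    "kv_cache_quant_algo": null
  }
}
```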
Testing

Before your PR is "Ready for review"
- Make sure you read and follow the Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅ / ❌ / N/A
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`?: ✅ / ❌ / N/A
- Did you write any new necessary tests?: ✅ / ❌ / N/A
- Did you update the Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit
- Bug Fixes: Improved quantization metadata handling in model export to correctly identify quantized checkpoints based on algorithm presence.
- Style: Reorganized imports across example files for consistency.