Add AutoQuantize recipe support to mtq.auto_quantize #1523
Signed-off-by: Juhi Mittal <juhim@nvidia.com>
📝 Walkthrough
This PR introduces AutoQuantize recipe support, enabling users to define auto-quantization configurations declaratively in recipe files. It adds new Pydantic schema models for recipe validation, refactors the auto_quantize function to accept explicit parameters, integrates recipe loading with fail-fast validation, and provides test coverage and example recipes.
Changes
AutoQuantize Recipe Feature
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 6 passed
Actionable comments posted: 3
📒 Files selected for processing (4)
- examples/llm_ptq/hf_ptq.py
- modelopt/recipe/config.py
- modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml
- tests/unit/recipe/test_loader.py
# All auto_quantize() knobs are resolved here before calling the helper.
# Helper is a leaf orchestrator — it does not know whether inputs came from
# CLI args or a recipe.
if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits:
Use explicit is not None for --auto_quantize_bits gating.
Line 1087 uses a truthy check, so --auto_quantize_bits 0.0 skips auto-quantize and silently takes the mono-quantization path.
Proposed fix
- if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits:
+ if isinstance(recipe, ModelOptAutoQuantizeRecipe) or args.auto_quantize_bits is not None:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/llm_ptq/hf_ptq.py` at line 1087, The conditional that chooses the
auto-quantize branch incorrectly treats falsy numeric values as "unset" — change
the check so it explicitly tests presence of the CLI value: in the expression
that currently reads "if isinstance(recipe, ModelOptAutoQuantizeRecipe) or
args.auto_quantize_bits", replace the truthy check with an explicit presence
check for args.auto_quantize_bits (e.g., use "args.auto_quantize_bits is not
None") so that values like 0 or 0.0 are honored; keep the
ModelOptAutoQuantizeRecipe isinstance check as-is.
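The pitfall in this comment can be reproduced in isolation. The following is a minimal sketch, assuming the flag is parsed with argparse as a float that defaults to None; the parser and the two helper functions are hypothetical illustrations, not the code in hf_ptq.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; assumes the real flag also defaults to None."""
    parser = argparse.ArgumentParser()
    # None means "flag not supplied"; any float, including 0.0, means "supplied".
    parser.add_argument("--auto_quantize_bits", type=float, default=None)
    return parser


def gate_truthy(args: argparse.Namespace) -> bool:
    # Truthiness gate: 0.0 is falsy, so an explicit 0.0 is treated as unset.
    return bool(args.auto_quantize_bits)


def gate_explicit(args: argparse.Namespace) -> bool:
    # Presence gate: only a missing flag (None) skips the auto-quantize branch.
    return args.auto_quantize_bits is not None


if __name__ == "__main__":
    zero = build_parser().parse_args(["--auto_quantize_bits", "0.0"])
    print(gate_truthy(zero), gate_explicit(zero))
```

With the presence gate, an out-of-range value such as 0.0 reaches the auto-quantize branch, where it can be rejected explicitly instead of silently selecting a different code path.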
qformat: str | None = ModeloptField(
    default=None,
    title="KV cache quantization format",
    description="One of the entries in KV_QUANT_CFG_CHOICES, or 'none' to disable. "
    "If omitted, the runtime --kv_cache_qformat CLI flag is used.",
)
Validate kv_cache.qformat at recipe-load time.
Line 112 accepts any string, but downstream lookup expects a fixed set of keys; invalid values will fail later as a KeyError instead of a schema error.
Proposed fix
 class AutoQuantizeKVCache(ModeloptBaseConfig):
     """KV-cache configuration for an AutoQuantize recipe (optional)."""
+    _SUPPORTED_QFORMATS = {
+        "none",
+        "fp8_cast",
+        "fp8",
+        "fp8_affine",
+        "nvfp4_cast",
+        "nvfp4",
+        "nvfp4_affine",
+        "nvfp4_rotate",
+    }
+
     qformat: str | None = ModeloptField(
         default=None,
         title="KV cache quantization format",
         description="One of the entries in KV_QUANT_CFG_CHOICES, or 'none' to disable. "
         "If omitted, the runtime --kv_cache_qformat CLI flag is used.",
     )
+
+    @field_validator("qformat")
+    @classmethod
+    def _validate_qformat(cls, v: str | None) -> str | None:
+        if v is not None and v not in cls._SUPPORTED_QFORMATS:
+            raise ValueError(f"Unsupported kv_cache.qformat: {v}")
+        return v
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modelopt/recipe/config.py` around lines 112 - 117, The qformat field
currently accepts any string but should be validated against the allowed keys;
update the ModeloptField declaration for qformat (and/or add a pydantic
validator on the recipe class handling kv_cache) to reject values not in
KV_QUANT_CFG_CHOICES or the literal 'none' (allowing None), raising a clear
schema/validation error at recipe-load time instead of allowing a later
KeyError; ensure you reference the qformat field, ModeloptField, and
KV_QUANT_CFG_CHOICES when implementing the check so invalid inputs are caught
early.
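The fail-fast idea in this comment can be sketched without the recipe machinery. The example below uses a plain dataclass as a stand-in for the Pydantic model; `KVCacheSpec` and `SUPPORTED_KV_QFORMATS` are hypothetical names, and the allowed values are copied from the proposed fix rather than from the library's actual KV_QUANT_CFG_CHOICES table.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these names mirror the set in the proposed fix, not the
# library's real KV_QUANT_CFG_CHOICES keys.
SUPPORTED_KV_QFORMATS = frozenset(
    {
        "none",
        "fp8_cast",
        "fp8",
        "fp8_affine",
        "nvfp4_cast",
        "nvfp4",
        "nvfp4_affine",
        "nvfp4_rotate",
    }
)


@dataclass(frozen=True)
class KVCacheSpec:
    """Hypothetical stand-in for the recipe's kv_cache section."""

    qformat: Optional[str] = None  # None defers to the CLI flag

    def __post_init__(self) -> None:
        # Reject unknown formats at construction time, so the error surfaces
        # when the recipe is loaded rather than at a later dictionary lookup.
        if self.qformat is not None and self.qformat not in SUPPORTED_KV_QFORMATS:
            allowed = ", ".join(sorted(SUPPORTED_KV_QFORMATS))
            raise ValueError(
                f"Unsupported kv_cache.qformat: {self.qformat!r}; expected one of: {allowed}"
            )
```

The allowed set is kept at module scope on purpose. If this check is ported to a Pydantic v2 model, an unannotated underscore-prefixed class attribute is likely to be treated as a private attribute, so the set would probably need a ClassVar annotation or a module-level constant to be usable from a classmethod validator.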
def test_load_recipe_autoquantize_candidates_match_presets():
    """Built-in AutoQuantize recipe's $imported candidates equal mtq.X_DEFAULT_CFG dicts."""
    import modelopt.torch.quantization as mtq

    recipe = load_recipe("general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast")
    candidates = recipe.auto_quantize.candidate_formats
    assert candidates[0].model_dump(exclude_unset=True) == mtq.NVFP4_DEFAULT_CFG
    assert candidates[1].model_dump(exclude_unset=True) == mtq.FP8_DEFAULT_CFG
Move the new in-test import to module scope.
Line 288 introduces a function-local import without a justification comment. In this test suite, imports should be at file top so failures surface during collection, not mid-test.
Proposed fix
 import pytest
+import modelopt.torch.quantization as mtq
 from modelopt.recipe.config import (
     ModelOptAutoQuantizeRecipe,
@@
 def test_load_recipe_autoquantize_candidates_match_presets():
     """Built-in AutoQuantize recipe's $imported candidates equal mtq.X_DEFAULT_CFG dicts."""
-    import modelopt.torch.quantization as mtq
-
     recipe = load_recipe("general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast")
As per coding guidelines: “Imports inside functions or test methods without explicit justification… Imports belong at the top of the file…”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/recipe/test_loader.py` around lines 286 - 293, The test contains a
function-local import "import modelopt.torch.quantization as mtq" inside
test_load_recipe_autoquantize_candidates_match_presets; move that import to
module scope (top of tests/unit/recipe/test_loader.py) so mtq is imported during
collection rather than inside the test, then remove the local import from the
function and leave the assertions using mtq.NVFP4_DEFAULT_CFG and
mtq.FP8_DEFAULT_CFG unchanged.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1523 +/- ##
===========================================
+ Coverage 66.36% 76.69% +10.33%
===========================================
Files 476 476
Lines 51811 51838 +27
===========================================
+ Hits 34384 39759 +5375
+ Misses 17427 12079 -5348
Add AutoQuantize YAML-based recipe support to mtq.auto_quantize
What does this PR do?
Type of change: New feature.
Extends the recipe system (PR #1423) to support mtq.auto_quantize. Users can now run autoquant via a single --recipe <name> flag instead of combining --auto_quantize_bits, --qformat, --auto_quantize_method, etc. The recipe carries the full search spec (candidate formats, budget, scoring method, KV cache scheme) as a typed YAML.
Mirrors the existing PTQ recipe pattern (PR #1423): the recipe is authoritative for the search; CLI flags supply runtime concerns (dataset, calib size, batch size).
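The precedence rules stated in this description (recipe authoritative for search fields, CLI fallback only for the KV cache format, explicit error when both sources set the bit budget) can be sketched as a small resolver. This is an illustrative sketch only: `ResolvedSearch`, `resolve_search`, and the dict-shaped recipe are hypothetical and do not reflect the actual helper in hf_ptq.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResolvedSearch:
    """Hypothetical container for the values handed to the helper."""

    effective_bits: float
    kv_cache_qformat: Optional[str]


def resolve_search(
    recipe: Optional[dict],
    cli_bits: Optional[float],
    cli_kv_qformat: Optional[str],
) -> Optional[ResolvedSearch]:
    """Return the resolved knobs, or None when auto-quantize is not requested."""
    if recipe is not None and cli_bits is not None:
        # Both sources define the search budget: fail fast instead of guessing.
        raise ValueError("--auto_quantize_bits cannot be combined with --recipe")
    if recipe is not None:
        # Recipe is authoritative for search fields; only the KV cache format
        # may fall back to the CLI value when the recipe omits it.
        kv = recipe.get("kv_cache_qformat")
        if kv is None:
            kv = cli_kv_qformat
        return ResolvedSearch(recipe["effective_bits"], kv)
    if cli_bits is not None:
        return ResolvedSearch(cli_bits, cli_kv_qformat)
    return None
```

Resolving everything at one call site keeps the downstream helper value-driven: it receives plain arguments and does not need to know whether they came from a recipe or from flags.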
Usage
Example recipe (modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml):
Key design points
- constraints shape: mtq.auto_quantize nested dict exactly; zero-transformation dispatch via .model_dump(exclude_none=True). Future-compat with PR #1497 (cost models).
- kv_cache.qformat field, not per-candidate $import (avoids duplication when KV is shared across candidates).
- [...] (effective_bits, candidates, etc.). CLI may fall back only for orthogonal post-step fields; today only kv_cache.qformat.
- --auto_quantize_bits + --recipe errors out explicitly.
- auto_quantize() helper layout: [...] quantize_main.
Testing
Unit tests (tests/unit/recipe/test_loader.py, 7 tests):
- [...] (method=gradient, num_score_steps=128, score_checkpoint=None)
- $imported candidates byte-identical to mtq.NVFP4_DEFAULT_CFG / FP8_DEFAULT_CFG (single source of truth)
- auto_quantize section, <2 candidates, effective_bits outside (0, 16]
- kv_cache field is optional
Equivalence smoke on Qwen/Qwen3-8B at --calib_size 512: hf_quant_config.json is byte-identical between the two paths.
Backward compatibility
✅ Yes. All four existing flows preserved:
- [...] (--qformat nvfp4)
- [...] (--auto_quantize_bits 4.8)
- [...] (--recipe general/ptq/...)
One new explicit error: --auto_quantize_bits + --recipe (previously would silently honor recipe). Fails fast with a clear message.
Files changed
- modelopt/recipe/config.py: Pydantic schema (AutoQuantizeConfig, etc.) + RecipeType.AUTO_QUANTIZE enum + dispatch entry
- examples/llm_ptq/hf_ptq.py: dispatch site resolves recipe/CLI knobs and passes them to auto_quantize() as keyword-only arguments; helper signature is purely value-driven
- modelopt_recipes/general/auto_quantize/nvfp4_fp8_at_4p8bits-kv_fp8_cast.yaml: example recipe
- tests/unit/recipe/test_loader.py: 7 unit tests
Checklist
- [...] (git commit -s -S)
- [...] (/claude review)
Summary by CodeRabbit
New Features
Tests