[OMNIML-4775] Move built-in PTQ quantization configs to YAML#1423
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughMigrate quantization presets and numerics from inline Python literals to schema-backed YAML files, enhance config loader list/$import handling, add many numerics/unit/preset YAMLs, update examples to load presets safely, and add dynamic test discovery plus import-related tests. ChangesQuantization Config Schema-Backed Migration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1423 +/- ##
==========================================
+ Coverage 76.87% 76.88% +0.01%
==========================================
Files 474 474
Lines 51560 51578 +18
==========================================
+ Hits 39635 39656 +21
+ Misses 11925 11922 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
df8c002 to
aeaae95
Compare
89a3e34 to
e24eb16
Compare
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
NVFP4 and Mamba-MoE presets re-inlined the same weight+input quantizer pairs that units/w4a4_nvfp4_nvfp4 and units/w8a8_fp8_fp8 already encode. Switch the 5 NVFP4 presets (default + awq lite/clip/full + svdquant) and the 4 mamba_moe presets to $import those units. Add three new units following the component_numerics convention: - attention_qkv_fp8 for the FP8 q/k/v bmm + softmax block shared by model/ and diffusers/ nvfp4_fp8_mha presets - block_sparse_moe_nvfp4 for the *block_sparse_moe* W+A pair shared by nvfp4_mlp_only, nvfp4_experts_only, nvfp4_omlp_only - experts_nvfp4 for the *.experts.* W+A pair shared by nvfp4_mlp_only and nvfp4_experts_only Net -118/+121 lines across 18 files; no behavior change (the model nvfp4_fp8_mha glob *q/*k/*v_bmm is replaced by the bracket form *[qkv]_bmm, which matches the same names). Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
e24eb16 to
56aa639
Compare
main added W4A16_NVFP4_CFG (weight-only NVFP4 for all linears) after the YAML-conversion branch was forked, so it landed in choices but had no YAML preset and no load_config call after rebase. Add the YAML preset and the load_config call to match the rest of the constants. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
modelopt_recipes/configs/ptq/units/kv_nvfp4_rotate.yaml (1)
30-33:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
v_bmm_quantizeris missingrotate: truein a KV-rotate preset.Line 30 configures V with
$import: nvfp4only, so this preset rotates K but not V despite the file’s KV-rotate intent.Proposed fix
- quantizer_name: '*v_bmm_quantizer' cfg: $import: nvfp4 + rotate: true🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt_recipes/configs/ptq/units/kv_nvfp4_rotate.yaml` around lines 30 - 33, The V quantizer entry using quantizer_name '*v_bmm_quantizer' in the KV-rotate preset is missing rotate: true; update the cfg for that quantizer (the block with $import: nvfp4) to include rotate: true so V is rotated just like K (i.e., modify the cfg under '*v_bmm_quantizer' to add rotate: true alongside the existing $import).modelopt/torch/opt/config_loader.py (1)
513-529:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSupport
list[T]snippet splicing forT | list[T]fields.This still rejects bare
$importof alist[T]snippet when the containing field is annotated asT | list[T](for exampleQuantizerCfgEntry.cfg).imported.schema_typeis compared against the whole union, so onlyTsnippets append;list[T]snippets never splice even when the runtime value is already in list mode.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/opt/config_loader.py` around lines 513 - 529, The current splice logic rejects an imported list snippet when the target field type is a union like T | list[T] because imported.schema_type is compared only to the full element_schema; update the check around _schema_equal(imported.schema_type, list_schema) and the element handling in config_loader.py to also consider when element_schema is a Union that contains a list variant: unwrap element_schema via _unwrap_schema_type (and handle get_origin == typing.Union) and if any union member equals a list type that matches imported.schema_type then treat the import as a list and return list(imported.data); use existing helpers _schema_equal and _unwrap_schema_type and the symbols imported.schema_type, element_schema, element_schema_unwrapped, and list_schema to locate and modify the branch so both plain T and list[T] union members splice correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/diffusers/quantization/quantize.py`:
- Around line 118-125: The code currently sets apply_int8_percentile_calibrator
= True for all INT8 configs regardless of algorithm, which enables the
percentile calibrator for SmoothQuant; update the logic so
apply_int8_percentile_calibrator is set to True only when self.config.format ==
QuantFormat.INT8 AND self.config.algo != QuantAlgo.SMOOTHQUANT AND
self.config.collect_method != CollectMethod.DEFAULT. Modify the block around
apply_int8_percentile_calibrator (referencing apply_int8_percentile_calibrator,
self.config.format, QuantFormat.INT8, self.config.algo, QuantAlgo.SMOOTHQUANT,
and self.config.collect_method) to include that extra algo != SMOOTHQUANT guard
when assigning the flag.
In `@modelopt_recipes/configs/ptq/presets/model/nvfp4_experts_only.yaml`:
- Around line 27-37: The file defines explicit quantizer patterns
'*mlp.experts*weight_quantizer' and '*mlp.experts*input_quantizer' which
duplicate patterns already provided by the imported unit experts_nvfp4 (which
defines '*.experts.*weight_quantizer' and '*.experts.*input_quantizer'); remove
the two explicit quantizer entries (the blocks configuring quantizer_name:
'*mlp.experts*weight_quantizer' and quantizer_name:
'*mlp.experts*input_quantizer') so the configuration relies solely on the
experts_nvfp4 import to avoid duplicate/overlapping quantizer definitions.
In `@modelopt_recipes/configs/ptq/presets/model/nvfp4_mlp_only.yaml`:
- Around line 27-37: quant_cfg currently contains overlapping patterns: the
explicit pattern '*mlp*weight_quantizer' and the imported unit experts_nvfp4
(which defines '*.experts.*weight_quantizer') both match expert-layer
quantizers, causing duplicate entries; fix by making the mlp-specific patterns
more precise (e.g., change '*mlp*weight_quantizer' and '*mlp*input_quantizer' to
only target non-expert mlp paths such as '*mlp.weight_quantizer' or
'*mlp.experts*' as appropriate) or remove/disable the conflicting entries so
that experts_nvfp4 exclusively configures '*.experts.*weight_quantizer' and
'*.experts.*input_quantizer' while quant_cfg retains only the non-expert mlp
patterns.
In `@modelopt_recipes/configs/ptq/presets/model/w4a8_mxfp4_fp8.yaml`:
- Line 25: The YAML key "algorithm" is left null in this W4A8 preset; update the
"algorithm" field to an explicit algorithm name matching sibling W4A8 presets
(e.g., the same string used in other W4A8 preset files) so the preset behavior
is unambiguous—edit the "algorithm" entry in this preset to the explicit
algorithm identifier used by its peers.
In `@modelopt_recipes/configs/ptq/presets/README.md`:
- Around line 4-5: Change the README wording that refers to `*_CFG` as “dicts”
to state they are schema-backed QuantizeConfig objects (with mapping-style
access); update the sentence mentioning `*_CFG` and
`modelopt.torch.quantization.config` (e.g., `FP8_DEFAULT_CFG`) to say these are
QuantizeConfig instances rather than plain dicts and note that they support
mapping-style access for backwards-compatible lookup.
In `@modelopt/torch/quantization/config.py`:
- Around line 1320-1379: KV preset YAMLs intentionally omit the "algorithm"
field but loading them through load_config(..., schema_type=QuantizeConfig)
populates a default "max" value and can clobber the base config when merged;
update the FP8_KV_CFG, FP8_AFFINE_KV_CFG, NVFP4_AFFINE_KV_CFG, NVFP4_KV_CFG,
NVFP4_KV_ROTATE_CFG (and any other *_KV_CFG) to be loaded without applying
QuantizeConfig defaults (e.g., call load_config with a plain
dict/schema_type=dict or with an option that disables default-filling) so the
loaded config preserves a missing "algorithm" key for proper merge behavior.
In `@tests/unit/recipe/test_loader.py`:
- Around line 1461-1464: The failing tests call an undefined helper _cfg_to_dict
when asserting the quant_cfg structure; add a small helper function in
tests/unit/recipe/test_loader.py (or inline the conversion) that serializes the
cfg object into a plain dict (mirror what the original assertions expect) and
use it for the two assertions that reference _cfg_to_dict (the ones asserting
_cfg_to_dict(data["quant_cfg"][0]["cfg"]) and the similar assertion at lines
~1486-1489); implement the helper to accept the cfg object and return a
dict/list representation of fields like "num_bits" and "block_sizes" so the
assertions can validate the union-typed list import path without a NameError.
---
Outside diff comments:
In `@modelopt_recipes/configs/ptq/units/kv_nvfp4_rotate.yaml`:
- Around line 30-33: The V quantizer entry using quantizer_name
'*v_bmm_quantizer' in the KV-rotate preset is missing rotate: true; update the
cfg for that quantizer (the block with $import: nvfp4) to include rotate: true
so V is rotated just like K (i.e., modify the cfg under '*v_bmm_quantizer' to
add rotate: true alongside the existing $import).
In `@modelopt/torch/opt/config_loader.py`:
- Around line 513-529: The current splice logic rejects an imported list snippet
when the target field type is a union like T | list[T] because
imported.schema_type is compared only to the full element_schema; update the
check around _schema_equal(imported.schema_type, list_schema) and the element
handling in config_loader.py to also consider when element_schema is a Union
that contains a list variant: unwrap element_schema via _unwrap_schema_type (and
handle get_origin == typing.Union) and if any union member equals a list type
that matches imported.schema_type then treat the import as a list and return
list(imported.data); use existing helpers _schema_equal and _unwrap_schema_type
and the symbols imported.schema_type, element_schema, element_schema_unwrapped,
and list_schema to locate and modify the branch so both plain T and list[T]
union members splice correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 48fef37b-6fbf-4823-9076-7b400163a3bf
📒 Files selected for processing (90)
examples/diffusers/quantization/config.pyexamples/diffusers/quantization/quantize.pyexamples/llm_autodeploy/run_auto_quantize.pymodelopt/torch/opt/config_loader.pymodelopt/torch/quantization/config.pymodelopt_recipes/configs/numerics/fp8.yamlmodelopt_recipes/configs/numerics/int4_per_block.yamlmodelopt_recipes/configs/numerics/int8.yamlmodelopt_recipes/configs/numerics/int8_per_channel.yamlmodelopt_recipes/configs/numerics/mxfp4.yamlmodelopt_recipes/configs/numerics/mxfp6.yamlmodelopt_recipes/configs/numerics/mxfp8.yamlmodelopt_recipes/configs/numerics/mxint8.yamlmodelopt_recipes/configs/numerics/nvfp4.yamlmodelopt_recipes/configs/numerics/nvfp4_bs32.yamlmodelopt_recipes/configs/numerics/nvfp4_static.yamlmodelopt_recipes/configs/ptq/presets/README.mdmodelopt_recipes/configs/ptq/presets/diffusers/fp8.yamlmodelopt_recipes/configs/ptq/presets/diffusers/int8.yamlmodelopt_recipes/configs/ptq/presets/diffusers/nvfp4.yamlmodelopt_recipes/configs/ptq/presets/diffusers/nvfp4_fp8_mha.yamlmodelopt_recipes/configs/ptq/presets/kv/fp8.yamlmodelopt_recipes/configs/ptq/presets/kv/fp8_affine.yamlmodelopt_recipes/configs/ptq/presets/kv/nvfp4.yamlmodelopt_recipes/configs/ptq/presets/kv/nvfp4_affine.yamlmodelopt_recipes/configs/ptq/presets/kv/nvfp4_rotate.yamlmodelopt_recipes/configs/ptq/presets/model/fp8.yamlmodelopt_recipes/configs/ptq/presets/model/fp8_2d_blockwise_weight_only.yamlmodelopt_recipes/configs/ptq/presets/model/fp8_per_channel_per_token.yamlmodelopt_recipes/configs/ptq/presets/model/int4_awq.yamlmodelopt_recipes/configs/ptq/presets/model/int4_blockwise_weight_only.yamlmodelopt_recipes/configs/ptq/presets/model/int8.yamlmodelopt_recipes/configs/ptq/presets/model/int8_smoothquant.yamlmodelopt_recipes/configs/ptq/presets/model/int8_weight_only.yamlmodelopt_recipes/configs/ptq/presets/model/mamba_moe_fp8_aggressive.yamlmodelopt_recipes/configs/ptq/presets/model/mamba_moe_fp8_conservative.yamlmodelopt_recipes/configs/ptq/presets/model/mamba_moe_nvfp4_aggressive.yamlmodelopt_recipes/configs/ptq/presets/model/mamba_moe_nvfp4_conservative.yamlmodelopt_recipes/configs/ptq/presets/model/mxfp4.yamlmodelopt_recipes/configs/ptq/presets/model/mxfp4_mlp_weight_only.yamlmodelopt_recipes/configs/ptq/presets/model/mxfp6.yamlmodelopt_recipes/configs/ptq/presets/model/mxfp8.yamlmodelopt_recipes/configs/ptq/presets/model/mxint8.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_awq_clip.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_awq_full.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_awq_lite.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_experts_only.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_fp8_mha.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_mlp_only.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_mlp_weight_only.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_omlp_only.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_svdquant.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_w4a4_weight_local_hessian.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_w4a4_weight_mse_fp8_sweep.yamlmodelopt_recipes/configs/ptq/presets/model/w4a16_nvfp4.yamlmodelopt_recipes/configs/ptq/presets/model/w4a8_awq_beta.yamlmodelopt_recipes/configs/ptq/presets/model/w4a8_mxfp4_fp8.yamlmodelopt_recipes/configs/ptq/presets/model/w4a8_nvfp4_fp8.yamlmodelopt_recipes/configs/ptq/units/README.mdmodelopt_recipes/configs/ptq/units/attention_qkv_fp8.yamlmodelopt_recipes/configs/ptq/units/base_disable_all.yamlmodelopt_recipes/configs/ptq/units/block_sparse_moe_nvfp4.yamlmodelopt_recipes/configs/ptq/units/default_disabled_quantizers.yamlmodelopt_recipes/configs/ptq/units/experts_nvfp4.yamlmodelopt_recipes/configs/ptq/units/kv_fp8.yamlmodelopt_recipes/configs/ptq/units/kv_fp8_affine.yamlmodelopt_recipes/configs/ptq/units/kv_fp8_cast.yamlmodelopt_recipes/configs/ptq/units/kv_nvfp4.yamlmodelopt_recipes/configs/ptq/units/kv_nvfp4_affine.yamlmodelopt_recipes/configs/ptq/units/kv_nvfp4_cast.yamlmodelopt_recipes/configs/ptq/units/kv_nvfp4_rotate.yamlmodelopt_recipes/configs/ptq/units/mamba_moe_disabled_quantizers.yamlmodelopt_recipes/configs/ptq/units/w4a4_nvfp4_nvfp4.yamlmodelopt_recipes/configs/ptq/units/w8a8_fp8_fp8.yamlmodelopt_recipes/general/ptq/fp8_default-kv_fp8.yamlmodelopt_recipes/general/ptq/fp8_default-kv_fp8_cast.yamlmodelopt_recipes/general/ptq/nvfp4_default-kv_fp8.yamlmodelopt_recipes/general/ptq/nvfp4_default-kv_fp8_cast.yamlmodelopt_recipes/general/ptq/nvfp4_default-kv_none-gptq.yamlmodelopt_recipes/general/ptq/nvfp4_default-kv_nvfp4_cast.yamlmodelopt_recipes/general/ptq/nvfp4_experts_only-kv_fp8.yamlmodelopt_recipes/general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yamlmodelopt_recipes/general/ptq/nvfp4_mlp_only-kv_fp8.yamlmodelopt_recipes/general/ptq/nvfp4_mlp_only_mse-kv_fp8_cast.yamlmodelopt_recipes/general/ptq/nvfp4_omlp_only-kv_fp8.yamlmodelopt_recipes/general/speculative_decoding/dflash.yamlmodelopt_recipes/general/speculative_decoding/eagle3.yamlmodelopt_recipes/models/Step3.5-Flash/nvfp4-mlp-only.yamltests/unit/recipe/test_loader.py
The "update descriptions" commit (d834078) accidentally deleted the `metadata.recipe_type` and `metadata.description` keys from general/speculative_decoding/eagle3.yaml and dflash.yaml while shortening their header comments. The recipe loader requires `metadata.recipe_type`, so `test_load_recipe_eagle_builtin` and `test_load_recipe_dflash_builtin` failed with "Recipe file ... must contain a 'metadata.recipe_type' field." Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Restore backward compatibility for callers that read the legacy hard-coded *_CFG values as plain dicts (e.g. hf_ptq.py's _set_kv_cache_constant_amax does `entry["cfg"]` and expects a dict, not a QuantizerAttributeConfig pydantic instance). - Add `_load_quantize_config_dict(path)` helper that wraps `load_config(..., schema_type=QuantizeConfig).model_dump(exclude_unset=True)` and switch all 38 *_CFG module-level constants to use it; their annotations are now `dict[str, Any]` instead of `QuantizeConfig`. - Simplify `_load_quantizer_cfg_dict_list` from a 17-line two-branch function plus a `_quantizer_cfg_entry_to_dict` helper down to a 4-line list/single normalization. The three call sites (`base_disable_all`, `default_disabled_quantizers`, `mamba_moe_disabled_quantizers`) all load schema-typed YAMLs so the Mapping/`isinstance(QuantizerCfgEntry)` fallbacks were defensive-only. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)
1216-1220: ⚡ Quick winRestore explicit type handling in the quantizer-snippet loader.
This helper now assumes every
load_config()result is aQuantizerCfgEntry. If one of these snippets ever comes back as a plain mapping, module import will fail with'dict' object has no attribute model_dump'instead of a clear schema error. Please mirror the defensive shape check used in_load_quantizer_attribute_dicthere too.Suggested hardening
def _load_quantizer_cfg_dict_list(config_path: str) -> list[dict[str, Any]]: """Load a QuantizerCfgEntry or QuantizerCfgListConfig snippet as public dict entries.""" config = load_config(config_path) entries = config if isinstance(config, list) else [config] - return [e.model_dump(exclude_unset=True) for e in entries] + result: list[dict[str, Any]] = [] + for entry in entries: + if isinstance(entry, QuantizerCfgEntry): + result.append(entry.model_dump(exclude_unset=True)) + elif isinstance(entry, Mapping): + result.append(dict(entry)) + else: + raise TypeError( + f"{config_path} must declare QuantizerCfgEntry or a list of QuantizerCfgEntry." + ) + return result🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modelopt/torch/quantization/config.py` around lines 1216 - 1220, In _load_quantizer_cfg_dict_list, restore defensive type handling like _load_quantizer_attribute_dict: after calling load_config(config_path) and normalizing entries to a list, iterate entries and for each item return item directly if it's a plain dict, call item.model_dump(exclude_unset=True) if it is a model-like object (e.g., QuantizerCfgEntry / QuantizerCfgListConfig), and otherwise raise a clear TypeError indicating the snippet at config_path had an unexpected shape; reference the helper name (_load_quantizer_cfg_dict_list), the load_config result, and the expected QuantizerCfgEntry type in the error message so imports fail with a schema error instead of an attribute error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1216-1220: In _load_quantizer_cfg_dict_list, restore defensive
type handling like _load_quantizer_attribute_dict: after calling
load_config(config_path) and normalizing entries to a list, iterate entries and
for each item return item directly if it's a plain dict, call
item.model_dump(exclude_unset=True) if it is a model-like object (e.g.,
QuantizerCfgEntry / QuantizerCfgListConfig), and otherwise raise a clear
TypeError indicating the snippet at config_path had an unexpected shape;
reference the helper name (_load_quantizer_cfg_dict_list), the load_config
result, and the expected QuantizerCfgEntry type in the error message so imports
fail with a schema error instead of an attribute error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 86f18af0-3b4d-42b0-8e9c-91fbae29c5af
📒 Files selected for processing (1)
modelopt/torch/quantization/config.py
The prior commit retyped the *_CFG module-level constants from QuantizeConfig to dict[str, Any], but SUPPORT_QUANT_FORMAT in examples/llm_autodeploy/run_auto_quantize.py was still annotated as dict[str, QuantizeConfig], which mypy flagged. Switch to dict[str, dict[str, Any]] and drop the now-unused QuantizeConfig import. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Large refactor (90 files / 2776 LOC) that extracts hard-coded *_CFG dicts in modelopt/torch/quantization/config.py into YAML presets composed via the existing $import system. Design check passes — this is a continuation of the already-merged $import infrastructure (PR #1253), not a new abstraction, so the design-review protocol is satisfied. The refactor is mostly mechanical and the YAML files visibly correspond to the previous Python dicts.
Three things a human owner should look at:
-
Behavioral change in
examples/diffusers/quantization/quantize.py— the newapply_int8_percentile_calibratorflag explicitly excludesQuantAlgo.SMOOTHQUANT, but the original code appliedreset_set_int8_configwhenevercollect_method != DEFAULT, including with smoothquant (and thereset_set_int8_configdocstring even referencesINT8_SMOOTHQUANT_CFG). This drops the smoothquant + non-default collect-method codepath for Conv2d input quantizers. It may be an intentional fix for a latent bug (smoothquant + percentile-Conv2d combo + global mutation), but it is not called out in the PR description. Worth confirming intent. -
pytestwas not run locally (per the PR's own Testing section). For a refactor of this scope touchingmodelopt.torch.quantization.config(an extensively used public surface), CI / a local run of the quantization unit tests should be confirmed green before merge — the existing*_CFGconstants are imported widely and a single mismatched YAML field would only surface there. -
Subtle field additions from numerics snippets: e.g. the new
attention_qkv_fp8unit importsnumerics/fp8whose YAML setsaxis: null, so*[qkv]_bmm_quantizer.cfgnow containsaxis: nullwhere the old hardcodedNVFP4_FP8_MHA_CONFIGhad{"num_bits": (4, 3)}only. Same pattern infp8_per_channel_per_token, etc. These should be no-ops semantically (axis defaults to None), but are an observable surface change for any consumer that does"axis" in cfgstyle probes — worth a quick confirmation.
Other notes: the _list_element_schema extension to handle Union types in config_loader.py and the new _resolve_list_import dict-into-list-element path are needed for w4a8_awq_beta.yaml's list-of-dicts cfg, and the new tests (test_import_dict_snippet_imports_in_union_typed_list_field, test_import_dict_snippet_in_union_typed_list_field_with_inline_item) cover them. The auto-discovery of built-in snippets in test_loader.py is a nice scalability improvement over the hard-coded list.
No licensing changes beyond the standard NVIDIA header on new files.
|
@cjluo-nv thanks for the review. Addressing the three points:
(Note: the |
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Re-review of large YAML refactor (90 files, +2055/-721) extracting hardcoded *_CFG dicts in modelopt/torch/quantization/config.py into YAML presets composed via the existing $import system.
Design review: Passes — this is a continuation of the already-merged $import infrastructure (PR #1253), not a new abstraction.
Previous review comments — classification & status:
- Critical comments addressed:
_cfg_to_dicthelper added in tests;apply_int8_percentile_calibratorcorrectly gates onalgo != SMOOTHQUANT; explicit*mlp.experts*duplicates removed fromnvfp4_experts_only.yaml; README wording corrected. - Critical comments explicitly justified-not-adopted (rationale checked and reasonable):
nvfp4_mlp_only.yamloverlapping*mlp*and*.experts.*patterns: mirrors original Python_nvfp4_selective_quant_cfg(["*mlp*", "*block_sparse_moe*", "*.experts.*"]); idempotent. ✓w4a8_mxfp4_fp8.yamlalgorithm: null: matches main behavior for MX-family configs. ✓kv_nvfp4_rotate.yamlV missingrotate: true: verified against originalNVFP4_KV_ROTATE_CFG— V intentionally omitsrotateon main; YAML matches. ✓- KV
*_KV_CFGdefault-filledalgorithm: "max": confirmed all consumers extract["quant_cfg"]only, so the default doesn't reach merge paths. ✓
- Outside-diff config_loader Union[T, list[T]] splice case: not addressed but only relevant for hypothetical future schemas (current YAMLs work).
Why nudge instead of approve:
- Behavioral change in
examples/diffusers/quantization/quantize.py: the newapply_int8_percentile_calibratorflag drops the SmoothQuant + non-default collect-method codepath. PR body frames this as a CLI-help-text alignment, but the originalreset_set_int8_configdocstring explicitly referencesINT8_SMOOTHQUANT_CFG. Worth a human owner confirming this is intentional. pytestwas not run locally per the PR's own Testing section. For a refactor of this scope touching the widely-imported*_CFGconstants, full CI green should be confirmed before merge.- Subtle observable surface changes: numerics snippets like
numerics/fp8carryaxis: null, so dumped configs now contain"axis": Nonekeys where the originals had only{"num_bits": (4, 3)}. No-op semantically but breaks"axis" in cfgprobes if any downstream consumer does that. - PR size (90 files / 2776 LOC) — though cohesive and not reasonably splittable for this kind of mechanical migration.
No licensing concerns: only standard NVIDIA headers on new files.
Walking back the apply_int8_percentile_calibrator SmoothQuant exclusion adopted in 296934a. The CLI-help-vs-code mismatch (`--percentile` / `--collect-method` advertise "works for INT8, not including smoothquant") is a pre-existing latent issue and should be addressed in a dedicated PR rather than bundled into this otherwise-mechanical YAML refactor. Behavior now matches main exactly: the percentile calibrator applies whenever `collect_method != DEFAULT`, regardless of algo. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Re-review focused on whether further behavior changes have crept in beyond what the prior round flagged. Design protocol: passes — this remains a mechanical migration onto the already-merged $import system (PR #1253), not a new abstraction.
Status of prior critical comments:
- ✅
apply_int8_percentile_calibratorSmoothQuant gate — author intentionally walked it back to keep refactor mechanical (matches main). - ✅
nvfp4_experts_only.yamlduplicate*mlp.experts*patterns removed. - ✅
_cfg_to_dicttest helper added. - ✅ README "dicts" → "QuantizeConfig constants" wording updated.
⚠️ Justified-not-adopted (rationale verified):nvfp4_mlp_only.yamloverlap (mirrors_nvfp4_selective_quant_cfg, idempotent);w4a8_mxfp4_fp8.yamlalgorithm: null(matches MXFP-family convention on main);kv_nvfp4_rotate.yamlV missingrotate: true(matches mainNVFP4_KV_ROTATE_CFG);*_KV_CFGdefault-filledalgorithm: "max"is unreachable because every consumer extracts["quant_cfg"]only.- 🟡 Open from prior round (minor, not addressed):
config_loader.pyUnion[T, list[T]]splice path only matters for hypothetical future schemas — current YAMLs all work.
New behavior changes worth a human eye:
examples/diffusers/quantization/quantize.pynow doesbase_cfg = copy.deepcopy(base_cfg)plus amodel_dumpshim before applyingreset_set_int8_config. On main,reset_set_int8_configwas silently mutating the module-levelmtq.INT8_SMOOTHQUANT_CFG/ localINT8_DEFAULT_CONFIGglobals across calls — the new code finally honors the existing# Build a fresh config dict so we never mutate the global constantscomment. This is a latent-bug fix, not a regression, but it is a real behavioral delta that isn't called out in the PR body.choicesset inmodelopt/torch/quantization/config.pynow includesMXFP6_DEFAULT_CFGandNVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG. Both constants existed on main but were missing fromchoices; this widens the set of valid--qformat-style strings accepted by CLIs that gate onmtq.config.choices. Likely intentional cleanup but worth confirming._load_quantizer_cfg_dict_listwas simplified: it no longer has the dict/non-dict defensive branch that_load_quantizer_attribute_dictkeeps. Any future snippet whose root validates as a plain dict (rather thanQuantizerCfgEntry/list[QuantizerCfgEntry]) will fail withAttributeError: 'dict' object has no attribute 'model_dump'instead of a clearTypeError. Today's snippets are all schema-tagged so it doesn't bite, but the asymmetry with the sibling helper is worth a sanity-check call from the owner.
Other concerns from prior round still standing:
- PR is 90 files / +2052/-721 — cohesive, but
pytestwas not run locally per the PR body's own Testing section. For a refactor ofmtq.*_CFG(extensively imported), full CI green should be confirmed pre-merge. - The
numerics/fp8.yamletc. continue to emitaxis: nullinto dumped configs, matching main's explicit"axis": None. No-op semantically but observable for any consumer that does"axis" in cfgprobes.
Licensing: only standard NVIDIA headers on new files — no concern.
Recommend an owner with architectural context confirm CI green and sign off on the (latent-bug-fix) deepcopy semantics in the diffusers example and the choices set additions.
|
@cjluo-nv thanks for the re-review. #1 deepcopy in #2 #3 |
The `if hasattr(base_cfg, "model_dump"): base_cfg = base_cfg.model_dump( exclude_unset=True)` block was added when the `*_CFG` constants were QuantizeConfig pydantic instances. Commit 51c8e5f switched them back to plain dicts, so hasattr is always False and the line is dead. Keep the explicit `base_cfg = copy.deepcopy(base_cfg)` — it honors the existing "Build a fresh config dict so we never mutate the global constants" comment, which on main was aspirational only. Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Re-review of mechanical YAML refactor (90 files, +2050/-721) extracting hardcoded *_CFG dicts into YAML presets composed via the already-merged $import system. Design protocol passes — continuation of PR #1253 infrastructure, not a new abstraction.
Status of prior critical comments:
- ✅
_cfg_to_dicthelper added intests/unit/recipe/test_loader.py:91-94 - ✅
nvfp4_experts_only.yaml— explicit*mlp.experts*duplicates removed - ✅ README wording corrected ("dicts" → "QuantizeConfig constants")
- ✅ Diffusers
quantize.py—copy.deepcopy(base_cfg)honors the existing comment; latent-bug fix called out in PR body - ✅
apply_int8_percentile_calibratorSmoothQuant gate — author intentionally reverted to keep refactor mechanical (matches main) - ✅ Justified-not-adopted (rationale verified):
nvfp4_mlp_only.yamloverlap (idempotent, mirrors_nvfp4_selective_quant_cfg);w4a8_mxfp4_fp8.yamlalgorithm: null(matches MXFP family);kv_nvfp4_rotate.yamlV missingrotate: true(matches main);*_KV_CFGdefault-filledalgorithm: "max"is unreachable (every consumer extracts["quant_cfg"]only) - 🟡 Open minor:
config_loader.pyUnion[T, list[T]]splice path not handled — only matters for hypothetical future schemas; current YAMLs work
Why nudge over approve:
_load_quantizer_cfg_dict_listwas simplified to drop the defensiveisinstance(item, Mapping)branch that the sibling_load_quantizer_attribute_dictkeeps. Fine today (all snippets are schema-tagged) but a future non-schema snippet would fail withAttributeError: 'dict' has no attribute 'model_dump'rather than a clearTypeError. Minor asymmetry worth a sanity-check.choicesset additions (MXFP6_DEFAULT_CFG,NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG) widen the set of valid--qformatstrings — likely intentional cleanup but a real public-API surface change not called out in the changelog/PR body summary.- Per the PR body's own Testing section,
pytestwas not run locally. For a refactor of the widely-importedmtq.*_CFGconstants, full CI green should be confirmed pre-merge. - Standard NVIDIA headers only — no licensing concerns.
Recommend a human owner with architectural context confirm CI is green and sign off on the latent-bug-fix deepcopy semantics in the diffusers example and the choices set additions.
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Re-review of mechanical YAML refactor (90 files, +2050/-721) extracting hardcoded *_CFG dicts into YAML presets via the already-merged $import system. Design protocol passes — continuation of PR #1253 infrastructure, not a new abstraction.
Status of prior critical comments:
- ✅
_cfg_to_dicthelper added intests/unit/recipe/test_loader.py:91-94. - ✅
nvfp4_experts_only.yaml— explicit*mlp.experts*duplicates removed. - ✅ README wording corrected ("dicts" → "QuantizeConfig constants").
- ✅ Diffusers
quantize.pycopy.deepcopy(base_cfg)honors the existing comment; called out in PR body as latent-bug fix. - ✅
apply_int8_percentile_calibratorSmoothQuant gate — author intentionally reverted to keep refactor mechanical (matches main). - ✅ Justified-not-adopted (rationale verified):
nvfp4_mlp_only.yamloverlap (idempotent, mirrors_nvfp4_selective_quant_cfg);w4a8_mxfp4_fp8.yamlalgorithm: null(matches MXFP family);kv_nvfp4_rotate.yamlV missingrotate: true(matches main);*_KV_CFGdefault-filledalgorithm: "max"is unreachable (every consumer extracts["quant_cfg"]only). - 🟡 Open minor:
config_loader.pyUnion[T, list[T]]splice path not handled — only matters for hypothetical future schemas; current YAMLs work.
Why nudge over approve:
_load_quantizer_cfg_dict_listwas simplified to drop the defensiveisinstance(item, Mapping)branch that the sibling_load_quantizer_attribute_dictkeeps — fine today (all snippets are schema-tagged) but the asymmetry with the sibling helper is worth a sanity-check.choicesset additions (MXFP6_DEFAULT_CFG,NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG) widen a public API surface — called out in PR body but worth human confirmation.- Diffusers
deepcopyis a real behavioral delta (latent-bug fix); called out in PR body but warrants human sign-off given it affects a shipped example. - PR size (90 files / 2771 LOC) — cohesive but large; per updated PR body, full unit test suite was run locally (2329 passed, 12 skipped); CI green should be confirmed pre-merge.
Standard NVIDIA headers only — no licensing concerns. Recommend a human owner with architectural context confirm CI green and sign off on the latent-bug-fix deepcopy semantics in the diffusers example, the choices set additions, and the _load_quantizer_cfg_dict_list defensive-branch removal.
cjluo-nv
left a comment
There was a problem hiding this comment.
LGTM if no behavior change
What does this PR do?
Type of change: refactor
This PR moves the built-in PTQ quantization config definitions out of hard-coded Python dictionaries and into schema-backed YAML config files, and factors shared blocks into reusable composable snippets.
modelopt_recipes/configs/numerics/.modelopt_recipes/configs/ptq/presets/model/.modelopt_recipes/configs/ptq/presets/kv/.modelopt_recipes/configs/ptq/presets/diffusers/and re-pointsexamples/diffusers/quantization/config.pyconstants at them viaload_config.kv_fp8_affine,kv_nvfp4,kv_nvfp4_affine,kv_nvfp4_rotate,kv_*_castvariants) undermodelopt_recipes/configs/ptq/units/.component_numerics[_type]convention:attention_qkv_fp8— FP8 E4M3 on attention q/k/v bmm and softmax quantizers; shared bymodel/anddiffusers/nvfp4_fp8_mhapresets.block_sparse_moe_nvfp4— NVFP4 W4A4 on*block_sparse_moe*weight/input quantizers; shared bynvfp4_mlp_only,nvfp4_experts_only,nvfp4_omlp_only.experts_nvfp4— NVFP4 W4A4 on*.experts.*weight/input quantizers; shared bynvfp4_mlp_onlyandnvfp4_experts_only.$importthe existingw4a4_nvfp4_nvfp4/w8a8_fp8_fp8units instead of re-inlining the same weight+input quantizer pairs.W4A16_NVFP4_CFGto YAML (presets/model/w4a16_nvfp4.yaml) composed from the existingunits/w4_nvfp4snippet.modelopt.torch.quantization.configbuilt-in config constants to loadQuantizeConfigobjects from YAML withload_config(..., schema_type=QuantizeConfig).model_dump(exclude_unset=True)via a new_load_quantize_config_dicthelper; the constants remain plaindict[str, Any]for backwards compatibility with consumers that do mapping-style mutation (e.g.entry["cfg"]assignment)._load_quantizer_cfg_dict_list) down to a 4-line list/single normalization now that the three call sites all load schema-typed YAMLs.Latent-bug fixes surfaced by the refactor
Two small correctness fixes are included alongside the mechanical refactor; flagging them explicitly:
examples/diffusers/quantization/quantize.py— adds an explicitbase_cfg = copy.deepcopy(base_cfg)before applying runtime overrides. The existing# Build a fresh config dict so we never mutate the global constantscomment had been aspirational only; in practicereset_set_int8_configaccumulatedPercentileCalibratorentries intomtq.INT8_SMOOTHQUANT_CFG/INT8_DEFAULT_CONFIGacross repeated calls, andset_quant_config_attraddedtrt_high_precision_dtypekeys into globally-shared cfg dicts. The deepcopy makes the code match the comment.choicesset inmodelopt/torch/quantization/config.py— addsMXFP6_DEFAULT_CFGandNVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFGto the documented public set of validmtq.*_CFGnames. Both constants exist on main but were missing fromchoices, so CLIs that gate onmtq.config.choices(e.g.,hf_ptq.py --qformat) couldn't reach them even though the configs themselves were fully supported.Usage
Existing Python imports continue to work:
The built-in constants are plain
dict[str, Any](sparse — only explicitly-set fields are present), but their definitions now come from YAML snippets and presets composed through the existing$importsystem.Reusable YAML snippets can be composed through
$import, for example:Testing
Local checks run:
nox -s "unit-3.10(torch_211, tf_latest)"— 2329 passed, 12 skipped.nox -s pre_commit_all— all hooks pass (ruff check / ruff format / mypy / YAML format / license / bandit / markdownlint).$importresolution sanity check across all changed config files.Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/AAdditional Information
This PR was previously stacked on #1405, which has since merged to
main. The branch has been rebased ontomainand no longer depends on any other open PR.Summary by CodeRabbit
New Features
Refactor
Tests
Documentation
Chores