[OMNIML-4775] Move built-in PTQ quantization configs to YAML #1423

Merged
shengliangxu merged 21 commits into main from shengliangx/all-yaml-configs on May 20, 2026

Conversation

@shengliangxu
Collaborator

@shengliangxu shengliangxu commented May 9, 2026

What does this PR do?

Type of change: refactor

This PR moves the built-in PTQ quantization config definitions out of hard-coded Python dictionaries and into schema-backed YAML config files, and factors shared blocks into reusable composable snippets.

  • Adds reusable numeric config snippets under modelopt_recipes/configs/numerics/.
  • Adds YAML presets for the built-in model PTQ configs under modelopt_recipes/configs/ptq/presets/model/.
  • Adds YAML presets for KV-cache quantization configs under modelopt_recipes/configs/ptq/presets/kv/.
  • Adds YAML presets for the Diffusers-specific PTQ configs under modelopt_recipes/configs/ptq/presets/diffusers/ and re-points examples/diffusers/quantization/config.py constants at them via load_config.
  • Adds reusable KV quantization units (kv_fp8_affine, kv_nvfp4, kv_nvfp4_affine, kv_nvfp4_rotate, kv_*_cast variants) under modelopt_recipes/configs/ptq/units/.
  • Adds reusable model-side units following the component_numerics[_type] convention:
    • attention_qkv_fp8 — FP8 E4M3 on attention q/k/v bmm and softmax quantizers; shared by model/ and diffusers/ nvfp4_fp8_mha presets.
    • block_sparse_moe_nvfp4 — NVFP4 W4A4 on *block_sparse_moe* weight/input quantizers; shared by nvfp4_mlp_only, nvfp4_experts_only, nvfp4_omlp_only.
    • experts_nvfp4 — NVFP4 W4A4 on *.experts.* weight/input quantizers; shared by nvfp4_mlp_only and nvfp4_experts_only.
  • Switches the existing 5 NVFP4 presets (default + awq lite/clip/full + svdquant) and 4 mamba_moe presets to $import the existing w4a4_nvfp4_nvfp4 / w8a8_fp8_fp8 units instead of re-inlining the same weight+input quantizer pairs.
  • Moves the recently-added W4A16_NVFP4_CFG to YAML (presets/model/w4a16_nvfp4.yaml) composed from the existing units/w4_nvfp4 snippet.
  • Updates modelopt.torch.quantization.config built-in config constants to load QuantizeConfig objects from YAML with load_config(..., schema_type=QuantizeConfig).model_dump(exclude_unset=True) via a new _load_quantize_config_dict helper; the constants remain plain dict[str, Any] for backwards compatibility with consumers that do mapping-style mutation (e.g. entry["cfg"] assignment).
  • Simplifies the cfg-list loader (_load_quantizer_cfg_dict_list) down to a 4-line list/single normalization now that the three call sites all load schema-typed YAMLs.
  • Adds/updates recipe loader coverage for built-in schema-backed config snippets.

Latent-bug fixes surfaced by the refactor

Two small correctness fixes are included alongside the mechanical refactor; flagging them explicitly:

  • examples/diffusers/quantization/quantize.py — adds an explicit base_cfg = copy.deepcopy(base_cfg) before applying runtime overrides. The existing # Build a fresh config dict so we never mutate the global constants comment had been aspirational only; in practice reset_set_int8_config accumulated PercentileCalibrator entries into mtq.INT8_SMOOTHQUANT_CFG/INT8_DEFAULT_CONFIG across repeated calls, and set_quant_config_attr added trt_high_precision_dtype keys into globally-shared cfg dicts. The deepcopy makes the code match the comment.
  • choices set in modelopt/torch/quantization/config.py — adds MXFP6_DEFAULT_CFG and NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG to the documented public set of valid mtq.*_CFG names. Both constants exist on main but were missing from choices, so CLIs that gate on mtq.config.choices (e.g., hf_ptq.py --qformat) couldn't reach them even though the configs themselves were fully supported.
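
The first fix above comes down to shallow versus deep copying of a shared module-level dict. The following is a minimal sketch of that failure mode, not the code in quantize.py: `SHARED_INT8_CFG`, `add_percentile_override`, and both builder functions are hypothetical names, and the entry contents are placeholders.

```python
import copy

# Hypothetical stand-in for a module-level preset such as mtq.INT8_DEFAULT_CFG.
SHARED_INT8_CFG = {
    "quant_cfg": [
        {"quantizer_name": "*weight_quantizer", "cfg": {"num_bits": 8}},
    ],
    "algorithm": "max",
}


def add_percentile_override(cfg: dict) -> dict:
    """Append a runtime-only entry to cfg in place and return it."""
    cfg["quant_cfg"].append(
        {"quantizer_name": "*input_quantizer", "cfg": {"calibrator": "percentile"}}
    )
    return cfg


def build_config_buggy() -> dict:
    # dict() is a shallow copy: the nested "quant_cfg" list is still shared,
    # so every call leaves one more entry in the global preset.
    return add_percentile_override(dict(SHARED_INT8_CFG))


def build_config_fixed() -> dict:
    # deepcopy gives each call its own nested list and dicts.
    return add_percentile_override(copy.deepcopy(SHARED_INT8_CFG))
```

Calling `build_config_fixed()` repeatedly leaves `SHARED_INT8_CFG` with its single original entry, while each call to `build_config_buggy()` grows the shared list by one.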

Usage

Existing Python imports continue to work:

import modelopt.torch.quantization as mtq

cfg = mtq.FP8_DEFAULT_CFG
model = mtq.quantize(model, cfg, forward_loop)

The built-in constants are plain dict[str, Any] (sparse — only explicitly-set fields are present), but their definitions now come from YAML snippets and presets composed through the existing $import system.

Reusable YAML snippets can be composed through $import, for example:

# modelopt-schema: modelopt.torch.quantization.config.QuantizeConfig
imports:
  base_disable_all: configs/ptq/units/base_disable_all
  w4a4_nvfp4_nvfp4: configs/ptq/units/w4a4_nvfp4_nvfp4
  default_disabled_quantizers: configs/ptq/units/default_disabled_quantizers

algorithm: max
quant_cfg:
  - $import: base_disable_all
  - $import: w4a4_nvfp4_nvfp4
  - $import: default_disabled_quantizers
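
To make the composition above concrete, here is a toy resolver written against plain Python dicts. It is only a sketch under stated assumptions: the real loader lives in modelopt/torch/opt/config_loader.py and is schema-aware, whereas `resolve_imports`, `SNIPPETS`, and the snippet contents below are invented for illustration. It assumes that a list-valued snippet is spliced into `quant_cfg` and a single-entry snippet is appended.

```python
import copy
from typing import Any

# Hypothetical in-memory stand-ins for unit files. The entry contents are
# placeholders, not the values stored in the real YAML units.
SNIPPETS: dict[str, Any] = {
    "configs/ptq/units/base_disable_all": [
        {"quantizer_name": "*", "enable": False},
    ],
    "configs/ptq/units/w4a4_nvfp4_nvfp4": [
        {"quantizer_name": "*weight_quantizer", "cfg": {"format": "placeholder"}},
        {"quantizer_name": "*input_quantizer", "cfg": {"format": "placeholder"}},
    ],
}

PRESET: dict[str, Any] = {
    "imports": {
        "base_disable_all": "configs/ptq/units/base_disable_all",
        "w4a4_nvfp4_nvfp4": "configs/ptq/units/w4a4_nvfp4_nvfp4",
    },
    "algorithm": "max",
    "quant_cfg": [
        {"$import": "base_disable_all"},
        {"$import": "w4a4_nvfp4_nvfp4"},
    ],
}


def resolve_imports(doc: dict[str, Any], snippets: dict[str, Any]) -> dict[str, Any]:
    """Expand {"$import": alias} items in quant_cfg using the imports table."""
    aliases = doc.get("imports", {})
    resolved = {k: v for k, v in doc.items() if k not in ("imports", "quant_cfg")}
    entries: list[Any] = []
    for item in doc.get("quant_cfg", []):
        if isinstance(item, dict) and set(item) == {"$import"}:
            payload = copy.deepcopy(snippets[aliases[item["$import"]]])
            # Assumption: list snippets splice in place, single entries append.
            if isinstance(payload, list):
                entries.extend(payload)
            else:
                entries.append(payload)
        else:
            entries.append(item)
    resolved["quant_cfg"] = entries
    return resolved
```

`resolve_imports(PRESET, SNIPPETS)` yields a dict with `algorithm` unchanged and a flat `quant_cfg` list whose entries appear in import order, which is why the disable-all unit is listed first in the preset.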

Testing

Local checks run:

  • nox -s "unit-3.10(torch_211, tf_latest)" — 2329 passed, 12 skipped.
  • nox -s pre_commit_all — all hooks pass (ruff check / ruff format / mypy / YAML format / license / bandit / markdownlint).
  • YAML parse + $import resolution sanity check across all changed config files.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ Existing built-in Python config constants keep the same public names and dict semantics.
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅ Adds/updates recipe loader coverage for schema-backed built-in snippets.
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: ❌

Additional Information

This PR was previously stacked on #1405, which has since merged to main. The branch has been rebased onto main and no longer depends on any other open PR.

Summary by CodeRabbit

  • New Features

    • Many new quantization numeric configs and PTQ presets added (INT4/INT8/MXFP4/MXFP6/MXFP8/MXINT8/NVFP4), plus Diffusers, KV-cache (affine/cast/rotate) and MLP/MoE-targeted presets.
  • Refactor

    • Presets and shared snippets migrated to schema-backed YAML sources and centralized loading; INT8 percentile calibration avoids mutating shared base configs.
  • Tests

    • Tests now discover packaged config snippets at runtime and validate import/append behaviors.
  • Documentation

    • Presets README and numerous header descriptions updated.
  • Chores

    • Minor typing and script improvements.

@copy-pr-bot

copy-pr-bot Bot commented May 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Migrate quantization presets and numerics from inline Python literals to schema-backed YAML files, enhance config loader list/$import handling, add many numerics/unit/preset YAMLs, update examples to load presets safely, and add dynamic test discovery plus import-related tests.

Changes

Quantization Config Schema-Backed Migration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Examples: load diffusers presets as QuantizeConfig dicts | `examples/diffusers/quantization/config.py`, `examples/diffusers/quantization/quantize.py`, `examples/llm_autodeploy/run_auto_quantize.py` | Diffusers presets and example usage now load QuantizeConfig YAML presets via `load_config(..., QuantizeConfig).model_dump(exclude_unset=True)`; quantize runtime avoids mutating global base configs by deep-copying and converting pydantic-like objects before applying INT8 percentile overrides; autodeploy typing refined. |
| Config loader: Union unwrapping & dict-import append | `modelopt/torch/opt/config_loader.py` | Unwrap Union/Optional element schemas and allow appending imported dict payloads to lists when element schema unwraps to dict. |
| Numerics YAML: INT8/INT4/MX/NVFP4 definitions | `modelopt_recipes/configs/numerics/*.yaml` | Add new numerics configs (int8, int8_per_channel, int4_per_block, mxfp4/mxfp6/mxfp8/mxint8, nvfp4 variants) as reusable building blocks. |
| PTQ units library | `modelopt_recipes/configs/ptq/units/*.yaml` | Add base/default disabling snippets, KV-cache variants (affine/cast/rotate), expert/block-sparse MoE, attention_qkv_fp8, and mamba_moe disabled quantizers for $import composition. |
| Diffusers presets & example exports | `modelopt_recipes/configs/ptq/presets/diffusers/*`, `examples/diffusers/quantization/config.py` | Add Diffusers FP8/INT8/NVFP4/NVFP4+FP8-MHA presets and export them from examples via `load_config(..., QuantizeConfig).model_dump(exclude_unset=True)`; update quantize runtime to deep-copy and safely convert Pydantic-like configs before mutation. |
| KV-cache partial presets | `modelopt_recipes/configs/ptq/presets/kv/*` | Add FP8 and NVFP4 KV-cache partial presets (affine/cast/rotate) to be merged into primary presets. |
| Core model presets: INT8/FP8/INT4 | `modelopt_recipes/configs/ptq/presets/model/{int8,fp8,int4}*.yaml` | Add INT8 (per-channel/per-tensor, smoothquant, weight-only), FP8 (per-channel-per-token, 2D blockwise weight-only), and INT4 (AWQ, blockwise) presets. |
| MX family presets | `modelopt_recipes/configs/ptq/presets/model/{mxfp4,mxfp6,mxfp8,mxint8}*.yaml` | Add MXFP4/MXFP6/MXFP8/MXINT8 presets with weight-only MLP/MoE variants. |
| NVFP4 core & special presets | `modelopt_recipes/configs/ptq/presets/model/nvfp4*.yaml` | Add NVFP4 base, AWQ/SVDQuant/local-hessian/MSE sweep, MLP/expert/omlp weight-only variants, Mamba-MoE and FP8 attention specializations. |
| Composed PTQ recipes & model recipe updates | `modelopt_recipes/general/ptq/*.yaml`, `modelopt_recipes/models/Step3.5-Flash/nvfp4-mlp-only.yaml` | Update recipe descriptions and switch Step3.5-Flash to import numerics/presets instead of inline quantizer cfgs. |
| Core preset migration to YAML | `modelopt/torch/quantization/config.py` | Replace many inline preset constants with QuantizeConfig instances loaded from preset YAML; add loader helpers to convert schema-backed fragments to legacy dict/list shapes; update preset choices registry. |
| Examples & quantize runtime changes | `examples/diffusers/quantization/quantize.py`, `examples/llm_autodeploy/run_auto_quantize.py` | Deep-copy and safely convert Pydantic-like preset objects before mutation in quantize runtime; annotate SUPPORT_QUANT_FORMAT mapping and minor naming fix. |
| Tests: dynamic snippet discovery & import tests | `tests/unit/recipe/test_loader.py` | Discover builtin YAML snippets at runtime using `importlib.resources.files`; add tests verifying dict-snippet $import behavior into union-typed list fields. |
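
The runtime discovery mentioned for tests/unit/recipe/test_loader.py can be sketched with `importlib.resources.files`. This is an illustration rather than the test's actual code: `discover_yaml_snippets` and `make_demo_package` are hypothetical helpers, and the throwaway `demo_recipes` package stands in for the packaged recipes so the example runs without ModelOpt installed.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def discover_yaml_snippets(package: str, subdir: str) -> list[str]:
    """Return sorted, package-relative paths of all .yaml files under subdir."""
    root = importlib.resources.files(package).joinpath(subdir)
    found: list[str] = []

    def walk(node, prefix: str) -> None:
        # Traversable objects expose iterdir(), is_dir() and name.
        for child in node.iterdir():
            rel = f"{prefix}/{child.name}"
            if child.is_dir():
                walk(child, rel)
            elif child.name.endswith(".yaml"):
                found.append(rel)

    walk(root, subdir)
    return sorted(found)


def make_demo_package(base: Path, name: str = "demo_recipes") -> str:
    """Create a throwaway importable package holding a few YAML files."""
    pkg = base / name
    (pkg / "configs" / "numerics").mkdir(parents=True)
    (pkg / "configs" / "ptq" / "units").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "configs" / "numerics" / "fp8.yaml").write_text("{}\n")
    (pkg / "configs" / "ptq" / "units" / "kv_fp8.yaml").write_text("{}\n")
    (pkg / "configs" / "ptq" / "units" / "README.md").write_text("notes\n")
    sys.path.insert(0, str(base))
    importlib.invalidate_caches()
    return name


if __name__ == "__main__":
    package_name = make_demo_package(Path(tempfile.mkdtemp()))
    for path in discover_yaml_snippets(package_name, "configs"):
        print(path)
```

A list produced this way could feed `pytest.mark.parametrize`, so a newly added snippet is covered without editing a hard-coded file list.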

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/Model-Optimizer#1405: This PR's config/quantization refactors (e.g., modelopt/torch/opt/config_loader.py list-element $import handling and typed QuantizeConfig/QuantizerCfgEntry-driven preset construction used by examples/diffusers/quantization/*) overlap directly with #1405's schema-backed config loading and quantizer entry schematization work.
  • NVIDIA/Model-Optimizer#1253: Both PRs touch the YAML $import resolution machinery in modelopt/torch/opt/config_loader.py. This PR refines _list_element_schema/_resolve_imports list handling (including dict element appends), building directly on the composable $import loader behavior introduced in #1253.

Suggested reviewers

  • meenchen
  • kevalmorabia97
  • realAsma
  • cjluo-nv
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security violations found in Python code changes. No unsafe deserialization, eval/exec, hardcoded credentials, or prohibited patterns. Uses yaml.safe_load. No new non-permissive dependencies. |
| Title check | ✅ Passed | The title '[OMNIML-4775] Move built-in PTQ quantization configs to YAML' clearly and specifically describes the main change: migration of hardcoded Python configuration dictionaries to YAML-based configs. It is concise and directly related to the primary refactoring objective. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@github-actions
Contributor

github-actions Bot commented May 9, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-20 15:21 UTC

@codecov

codecov Bot commented May 9, 2026

Codecov Report

❌ Patch coverage is 94.44444% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.88%. Comparing base (a5bc6f8) to head (23dad34).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| modelopt/torch/quantization/config.py | 94.64% | 3 Missing ⚠️ |
| modelopt/torch/opt/config_loader.py | 93.75% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1423      +/-   ##
==========================================
+ Coverage   76.87%   76.88%   +0.01%     
==========================================
  Files         474      474              
  Lines       51560    51578      +18     
==========================================
+ Hits        39635    39656      +21     
+ Misses      11925    11922       -3     
| Flag | Coverage Δ |
| --- | --- |
| examples | 41.80% <91.66%> (+0.96%) ⬆️ |
| gpu | 59.77% <91.66%> (-0.58%) ⬇️ |
| regression | 15.27% <91.66%> (+0.13%) ⬆️ |
| unit | 52.64% <94.44%> (+<0.01%) ⬆️ |

@shengliangxu shengliangxu force-pushed the shengliangx/all-yaml-configs branch from df8c002 to aeaae95 Compare May 15, 2026 17:41
@shengliangxu shengliangxu changed the base branch from main to shengliangx/schematize-cfg May 16, 2026 00:19
Base automatically changed from shengliangx/schematize-cfg to main May 18, 2026 15:40
@shengliangxu shengliangxu force-pushed the shengliangx/all-yaml-configs branch from 89a3e34 to e24eb16 Compare May 18, 2026 18:01
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
NVFP4 and Mamba-MoE presets re-inlined the same weight+input quantizer
pairs that units/w4a4_nvfp4_nvfp4 and units/w8a8_fp8_fp8 already encode.
Switch the 5 NVFP4 presets (default + awq lite/clip/full + svdquant) and
the 4 mamba_moe presets to $import those units.

Add three new units following the component_numerics convention:
- attention_qkv_fp8 for the FP8 q/k/v bmm + softmax block shared by
  model/ and diffusers/ nvfp4_fp8_mha presets
- block_sparse_moe_nvfp4 for the *block_sparse_moe* W+A pair shared by
  nvfp4_mlp_only, nvfp4_experts_only, nvfp4_omlp_only
- experts_nvfp4 for the *.experts.* W+A pair shared by nvfp4_mlp_only
  and nvfp4_experts_only

Net -118/+121 lines across 18 files; no behavior change (the model
nvfp4_fp8_mha glob *q/*k/*v_bmm is replaced by the bracket form
*[qkv]_bmm, which matches the same names).

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
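
The claim in the commit message above, that the bracket form matches the same names as the three separate globs, can be checked with Python's `fnmatch` module. This sketch assumes quantizer patterns are matched with fnmatch-style globbing; the full pattern strings and the module names in `NAMES` are hypothetical and chosen only to exercise the patterns.

```python
from fnmatch import fnmatchcase

# Hypothetical quantizer names; only the suffixes matter for this check.
NAMES = [
    "layers.0.attn.q_bmm_quantizer",
    "layers.0.attn.k_bmm_quantizer",
    "layers.0.attn.v_bmm_quantizer",
    "layers.0.attn.softmax_quantizer",
    "layers.0.mlp.down_proj.weight_quantizer",
]

# Assumed spelling of the three original globs and the bracket replacement.
OLD_PATTERNS = ["*q_bmm_quantizer", "*k_bmm_quantizer", "*v_bmm_quantizer"]
NEW_PATTERN = "*[qkv]_bmm_quantizer"


def matched(names: list[str], patterns: list[str]) -> set[str]:
    """Return the names matched by at least one glob pattern."""
    return {n for n in names if any(fnmatchcase(n, p) for p in patterns)}


if __name__ == "__main__":
    old = matched(NAMES, OLD_PATTERNS)
    new = matched(NAMES, [NEW_PATTERN])
    print(sorted(old) == sorted(new))
```

A bracket expression `[qkv]` matches exactly one of the listed characters, so the single pattern covers the union of the three originals and nothing else.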
@shengliangxu shengliangxu force-pushed the shengliangx/all-yaml-configs branch from e24eb16 to 56aa639 Compare May 18, 2026 18:03
@shengliangxu shengliangxu marked this pull request as ready for review May 18, 2026 18:04
@shengliangxu shengliangxu requested review from a team as code owners May 18, 2026 18:04
main added W4A16_NVFP4_CFG (weight-only NVFP4 for all linears) after the
YAML-conversion branch was forked, so it landed in choices but had no
YAML preset and no load_config call after rebase. Add the YAML preset
and the load_config call to match the rest of the constants.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
modelopt_recipes/configs/ptq/units/kv_nvfp4_rotate.yaml (1)

30-33: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

v_bmm_quantizer is missing rotate: true in a KV-rotate preset.

Line 30 configures V with $import: nvfp4 only, so this preset rotates K but not V despite the file’s KV-rotate intent.

Proposed fix
   - quantizer_name: '*v_bmm_quantizer'
     cfg:
       $import: nvfp4
+      rotate: true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt_recipes/configs/ptq/units/kv_nvfp4_rotate.yaml` around lines 30 -
33, The V quantizer entry using quantizer_name '*v_bmm_quantizer' in the
KV-rotate preset is missing rotate: true; update the cfg for that quantizer (the
block with $import: nvfp4) to include rotate: true so V is rotated just like K
(i.e., modify the cfg under '*v_bmm_quantizer' to add rotate: true alongside the
existing $import).
modelopt/torch/opt/config_loader.py (1)

513-529: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Support list[T] snippet splicing for T | list[T] fields.

This still rejects bare $import of a list[T] snippet when the containing field is annotated as T | list[T] (for example QuantizerCfgEntry.cfg). imported.schema_type is compared against the whole union, so only T snippets append; list[T] snippets never splice even when the runtime value is already in list mode.
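
The distinction the comment draws can be sketched with the standard `typing` introspection helpers. This is not the code in config_loader.py: `union_members` and `import_mode` are hypothetical helpers, and the "append" / "splice" / "reject" labels are invented here to name the three outcomes for a field typed `T | list[T]`.

```python
import types
from typing import Any, Union, get_args, get_origin

# typing.Union covers Union[X, Y]; types.UnionType covers X | Y on Python 3.10+.
_UNION_ORIGINS = {Union}
if hasattr(types, "UnionType"):
    _UNION_ORIGINS.add(types.UnionType)


def union_members(schema: Any) -> tuple:
    """Return the members of a union annotation, or (schema,) for anything else."""
    if get_origin(schema) in _UNION_ORIGINS:
        return get_args(schema)
    return (schema,)


def import_mode(field_schema: Any, snippet_schema: Any) -> str:
    """Classify how a snippet fits a field that may be typed T | list[T].

    "append": the snippet is a single T and becomes one list element.
    "splice": the snippet is a list[T] and its items are merged into the list.
    "reject": the snippet matches no member of the field's union.
    """
    for member in union_members(field_schema):
        if member == snippet_schema:
            return "splice" if get_origin(member) is list else "append"
    return "reject"


if __name__ == "__main__":
    field = Union[dict, list[dict]]
    print(import_mode(field, dict))
    print(import_mode(field, list[dict]))
    print(import_mode(field, list[int]))
```

Comparing the snippet schema against each union member, rather than against the union as a whole, is what lets both the `T` and `list[T]` cases be recognised.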

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/opt/config_loader.py` around lines 513 - 529, The current
splice logic rejects an imported list snippet when the target field type is a
union like T | list[T] because imported.schema_type is compared only to the full
element_schema; update the check around _schema_equal(imported.schema_type,
list_schema) and the element handling in config_loader.py to also consider when
element_schema is a Union that contains a list variant: unwrap element_schema
via _unwrap_schema_type (and handle get_origin == typing.Union) and if any union
member equals a list type that matches imported.schema_type then treat the
import as a list and return list(imported.data); use existing helpers
_schema_equal and _unwrap_schema_type and the symbols imported.schema_type,
element_schema, element_schema_unwrapped, and list_schema to locate and modify
the branch so both plain T and list[T] union members splice correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/diffusers/quantization/quantize.py`:
- Around line 118-125: The code currently sets apply_int8_percentile_calibrator
= True for all INT8 configs regardless of algorithm, which enables the
percentile calibrator for SmoothQuant; update the logic so
apply_int8_percentile_calibrator is set to True only when self.config.format ==
QuantFormat.INT8 AND self.config.algo != QuantAlgo.SMOOTHQUANT AND
self.config.collect_method != CollectMethod.DEFAULT. Modify the block around
apply_int8_percentile_calibrator (referencing apply_int8_percentile_calibrator,
self.config.format, QuantFormat.INT8, self.config.algo, QuantAlgo.SMOOTHQUANT,
and self.config.collect_method) to include that extra algo != SMOOTHQUANT guard
when assigning the flag.

In `@modelopt_recipes/configs/ptq/presets/model/nvfp4_experts_only.yaml`:
- Around line 27-37: The file defines explicit quantizer patterns
'*mlp.experts*weight_quantizer' and '*mlp.experts*input_quantizer' which
duplicate patterns already provided by the imported unit experts_nvfp4 (which
defines '*.experts.*weight_quantizer' and '*.experts.*input_quantizer'); remove
the two explicit quantizer entries (the blocks configuring quantizer_name:
'*mlp.experts*weight_quantizer' and quantizer_name:
'*mlp.experts*input_quantizer') so the configuration relies solely on the
experts_nvfp4 import to avoid duplicate/overlapping quantizer definitions.

In `@modelopt_recipes/configs/ptq/presets/model/nvfp4_mlp_only.yaml`:
- Around line 27-37: quant_cfg currently contains overlapping patterns: the
explicit pattern '*mlp*weight_quantizer' and the imported unit experts_nvfp4
(which defines '*.experts.*weight_quantizer') both match expert-layer
quantizers, causing duplicate entries; fix by making the mlp-specific patterns
more precise (e.g., change '*mlp*weight_quantizer' and '*mlp*input_quantizer' to
only target non-expert mlp paths such as '*mlp.weight_quantizer' or
'*mlp.experts*' as appropriate) or remove/disable the conflicting entries so
that experts_nvfp4 exclusively configures '*.experts.*weight_quantizer' and
'*.experts.*input_quantizer' while quant_cfg retains only the non-expert mlp
patterns.

In `@modelopt_recipes/configs/ptq/presets/model/w4a8_mxfp4_fp8.yaml`:
- Line 25: The YAML key "algorithm" is left null in this W4A8 preset; update the
"algorithm" field to an explicit algorithm name matching sibling W4A8 presets
(e.g., the same string used in other W4A8 preset files) so the preset behavior
is unambiguous—edit the "algorithm" entry in this preset to the explicit
algorithm identifier used by its peers.

In `@modelopt_recipes/configs/ptq/presets/README.md`:
- Around line 4-5: Change the README wording that refers to `*_CFG` as “dicts”
to state they are schema-backed QuantizeConfig objects (with mapping-style
access); update the sentence mentioning `*_CFG` and
`modelopt.torch.quantization.config` (e.g., `FP8_DEFAULT_CFG`) to say these are
QuantizeConfig instances rather than plain dicts and note that they support
mapping-style access for backwards-compatible lookup.

In `@modelopt/torch/quantization/config.py`:
- Around line 1320-1379: KV preset YAMLs intentionally omit the "algorithm"
field but loading them through load_config(..., schema_type=QuantizeConfig)
populates a default "max" value and can clobber the base config when merged;
update the FP8_KV_CFG, FP8_AFFINE_KV_CFG, NVFP4_AFFINE_KV_CFG, NVFP4_KV_CFG,
NVFP4_KV_ROTATE_CFG (and any other *_KV_CFG) to be loaded without applying
QuantizeConfig defaults (e.g., call load_config with a plain
dict/schema_type=dict or with an option that disables default-filling) so the
loaded config preserves a missing "algorithm" key for proper merge behavior.

In `@tests/unit/recipe/test_loader.py`:
- Around line 1461-1464: The failing tests call an undefined helper _cfg_to_dict
when asserting the quant_cfg structure; add a small helper function in
tests/unit/recipe/test_loader.py (or inline the conversion) that serializes the
cfg object into a plain dict (mirror what the original assertions expect) and
use it for the two assertions that reference _cfg_to_dict (the ones asserting
_cfg_to_dict(data["quant_cfg"][0]["cfg"]) and the similar assertion at lines
~1486-1489); implement the helper to accept the cfg object and return a
dict/list representation of fields like "num_bits" and "block_sizes" so the
assertions can validate the union-typed list import path without a NameError.

---

Outside diff comments:
In `@modelopt_recipes/configs/ptq/units/kv_nvfp4_rotate.yaml`:
- Around line 30-33: The V quantizer entry using quantizer_name
'*v_bmm_quantizer' in the KV-rotate preset is missing rotate: true; update the
cfg for that quantizer (the block with $import: nvfp4) to include rotate: true
so V is rotated just like K (i.e., modify the cfg under '*v_bmm_quantizer' to
add rotate: true alongside the existing $import).

In `@modelopt/torch/opt/config_loader.py`:
- Around line 513-529: The current splice logic rejects an imported list snippet
when the target field type is a union like T | list[T] because
imported.schema_type is compared only to the full element_schema; update the
check around _schema_equal(imported.schema_type, list_schema) and the element
handling in config_loader.py to also consider when element_schema is a Union
that contains a list variant: unwrap element_schema via _unwrap_schema_type (and
handle get_origin == typing.Union) and if any union member equals a list type
that matches imported.schema_type then treat the import as a list and return
list(imported.data); use existing helpers _schema_equal and _unwrap_schema_type
and the symbols imported.schema_type, element_schema, element_schema_unwrapped,
and list_schema to locate and modify the branch so both plain T and list[T]
union members splice correctly.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48fef37b-6fbf-4823-9076-7b400163a3bf

📥 Commits

Reviewing files that changed from the base of the PR and between f5650bd and 56aa639.

📒 Files selected for processing (90)
  • examples/diffusers/quantization/config.py
  • examples/diffusers/quantization/quantize.py
  • examples/llm_autodeploy/run_auto_quantize.py
  • modelopt/torch/opt/config_loader.py
  • modelopt/torch/quantization/config.py
  • modelopt_recipes/configs/numerics/fp8.yaml
  • modelopt_recipes/configs/numerics/int4_per_block.yaml
  • modelopt_recipes/configs/numerics/int8.yaml
  • modelopt_recipes/configs/numerics/int8_per_channel.yaml
  • modelopt_recipes/configs/numerics/mxfp4.yaml
  • modelopt_recipes/configs/numerics/mxfp6.yaml
  • modelopt_recipes/configs/numerics/mxfp8.yaml
  • modelopt_recipes/configs/numerics/mxint8.yaml
  • modelopt_recipes/configs/numerics/nvfp4.yaml
  • modelopt_recipes/configs/numerics/nvfp4_bs32.yaml
  • modelopt_recipes/configs/numerics/nvfp4_static.yaml
  • modelopt_recipes/configs/ptq/presets/README.md
  • modelopt_recipes/configs/ptq/presets/diffusers/fp8.yaml
  • modelopt_recipes/configs/ptq/presets/diffusers/int8.yaml
  • modelopt_recipes/configs/ptq/presets/diffusers/nvfp4.yaml
  • modelopt_recipes/configs/ptq/presets/diffusers/nvfp4_fp8_mha.yaml
  • modelopt_recipes/configs/ptq/presets/kv/fp8.yaml
  • modelopt_recipes/configs/ptq/presets/kv/fp8_affine.yaml
  • modelopt_recipes/configs/ptq/presets/kv/nvfp4.yaml
  • modelopt_recipes/configs/ptq/presets/kv/nvfp4_affine.yaml
  • modelopt_recipes/configs/ptq/presets/kv/nvfp4_rotate.yaml
  • modelopt_recipes/configs/ptq/presets/model/fp8.yaml
  • modelopt_recipes/configs/ptq/presets/model/fp8_2d_blockwise_weight_only.yaml
  • modelopt_recipes/configs/ptq/presets/model/fp8_per_channel_per_token.yaml
  • modelopt_recipes/configs/ptq/presets/model/int4_awq.yaml
  • modelopt_recipes/configs/ptq/presets/model/int4_blockwise_weight_only.yaml
  • modelopt_recipes/configs/ptq/presets/model/int8.yaml
  • modelopt_recipes/configs/ptq/presets/model/int8_smoothquant.yaml
  • modelopt_recipes/configs/ptq/presets/model/int8_weight_only.yaml
  • modelopt_recipes/configs/ptq/presets/model/mamba_moe_fp8_aggressive.yaml
  • modelopt_recipes/configs/ptq/presets/model/mamba_moe_fp8_conservative.yaml
  • modelopt_recipes/configs/ptq/presets/model/mamba_moe_nvfp4_aggressive.yaml
  • modelopt_recipes/configs/ptq/presets/model/mamba_moe_nvfp4_conservative.yaml
  • modelopt_recipes/configs/ptq/presets/model/mxfp4.yaml
  • modelopt_recipes/configs/ptq/presets/model/mxfp4_mlp_weight_only.yaml
  • modelopt_recipes/configs/ptq/presets/model/mxfp6.yaml
  • modelopt_recipes/configs/ptq/presets/model/mxfp8.yaml
  • modelopt_recipes/configs/ptq/presets/model/mxint8.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_awq_clip.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_awq_full.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_awq_lite.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_experts_only.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_fp8_mha.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_mlp_only.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_mlp_weight_only.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_omlp_only.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_svdquant.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_w4a4_weight_local_hessian.yaml
  • modelopt_recipes/configs/ptq/presets/model/nvfp4_w4a4_weight_mse_fp8_sweep.yaml
  • modelopt_recipes/configs/ptq/presets/model/w4a16_nvfp4.yaml
  • modelopt_recipes/configs/ptq/presets/model/w4a8_awq_beta.yaml
  • modelopt_recipes/configs/ptq/presets/model/w4a8_mxfp4_fp8.yaml
  • modelopt_recipes/configs/ptq/presets/model/w4a8_nvfp4_fp8.yaml
  • modelopt_recipes/configs/ptq/units/README.md
  • modelopt_recipes/configs/ptq/units/attention_qkv_fp8.yaml
  • modelopt_recipes/configs/ptq/units/base_disable_all.yaml
  • modelopt_recipes/configs/ptq/units/block_sparse_moe_nvfp4.yaml
  • modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml
  • modelopt_recipes/configs/ptq/units/experts_nvfp4.yaml
  • modelopt_recipes/configs/ptq/units/kv_fp8.yaml
  • modelopt_recipes/configs/ptq/units/kv_fp8_affine.yaml
  • modelopt_recipes/configs/ptq/units/kv_fp8_cast.yaml
  • modelopt_recipes/configs/ptq/units/kv_nvfp4.yaml
  • modelopt_recipes/configs/ptq/units/kv_nvfp4_affine.yaml
  • modelopt_recipes/configs/ptq/units/kv_nvfp4_cast.yaml
  • modelopt_recipes/configs/ptq/units/kv_nvfp4_rotate.yaml
  • modelopt_recipes/configs/ptq/units/mamba_moe_disabled_quantizers.yaml
  • modelopt_recipes/configs/ptq/units/w4a4_nvfp4_nvfp4.yaml
  • modelopt_recipes/configs/ptq/units/w8a8_fp8_fp8.yaml
  • modelopt_recipes/general/ptq/fp8_default-kv_fp8.yaml
  • modelopt_recipes/general/ptq/fp8_default-kv_fp8_cast.yaml
  • modelopt_recipes/general/ptq/nvfp4_default-kv_fp8.yaml
  • modelopt_recipes/general/ptq/nvfp4_default-kv_fp8_cast.yaml
  • modelopt_recipes/general/ptq/nvfp4_default-kv_none-gptq.yaml
  • modelopt_recipes/general/ptq/nvfp4_default-kv_nvfp4_cast.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only-kv_fp8.yaml
  • modelopt_recipes/general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-kv_fp8.yaml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-kv_fp8_cast.yaml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-kv_fp8.yaml
  • modelopt_recipes/general/speculative_decoding/dflash.yaml
  • modelopt_recipes/general/speculative_decoding/eagle3.yaml
  • modelopt_recipes/models/Step3.5-Flash/nvfp4-mlp-only.yaml
  • tests/unit/recipe/test_loader.py

Comment thread examples/diffusers/quantization/quantize.py
Comment thread modelopt_recipes/configs/ptq/presets/model/nvfp4_experts_only.yaml
Comment thread modelopt_recipes/configs/ptq/presets/model/nvfp4_mlp_only.yaml
Comment thread modelopt_recipes/configs/ptq/presets/model/w4a8_mxfp4_fp8.yaml
Comment thread modelopt_recipes/configs/ptq/presets/README.md Outdated
Comment thread modelopt/torch/quantization/config.py
Comment thread tests/unit/recipe/test_loader.py
@shengliangxu shengliangxu requested a review from meenchen May 18, 2026 18:23
shengliangxu and others added 3 commits May 18, 2026 13:59
The "update descriptions" commit (d834078) accidentally deleted the
`metadata.recipe_type` and `metadata.description` keys from
general/speculative_decoding/eagle3.yaml and dflash.yaml while
shortening their header comments. The recipe loader requires
`metadata.recipe_type`, so `test_load_recipe_eagle_builtin` and
`test_load_recipe_dflash_builtin` failed with
"Recipe file ... must contain a 'metadata.recipe_type' field."

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
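
For illustration, the failure mode described in this commit message can be sketched as a small validation step. This is a minimal sketch under stated assumptions, not the actual recipe loader: `require_recipe_type` is a hypothetical helper, the `"example"` value is made up, and the function operates on an already-parsed mapping rather than a YAML file.

```python
from collections.abc import Mapping
from typing import Any


def require_recipe_type(recipe: Mapping[str, Any], path: str) -> str:
    """Return metadata.recipe_type or raise a clear error (hypothetical helper)."""
    metadata = recipe.get("metadata")
    if not isinstance(metadata, Mapping) or "recipe_type" not in metadata:
        raise ValueError(f"Recipe file {path} must contain a 'metadata.recipe_type' field.")
    return metadata["recipe_type"]


# A recipe whose metadata block was deleted, and one where it is intact.
broken = {"body": {}}
intact = {"metadata": {"recipe_type": "example", "description": "demo"}, "body": {}}
```

Calling `require_recipe_type(broken, "dflash.yaml")` raises the `ValueError`, while the intact mapping returns its `recipe_type` string.
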
Restore backward compatibility for callers that read the legacy
hard-coded *_CFG values as plain dicts (e.g. hf_ptq.py's
_set_kv_cache_constant_amax does `entry["cfg"]` and expects a dict, not
a QuantizerAttributeConfig pydantic instance).

- Add `_load_quantize_config_dict(path)` helper that wraps
  `load_config(..., schema_type=QuantizeConfig).model_dump(exclude_unset=True)`
  and switch all 38 *_CFG module-level constants to use it; their
  annotations are now `dict[str, Any]` instead of `QuantizeConfig`.
- Simplify `_load_quantizer_cfg_dict_list` from a 17-line two-branch
  function plus a `_quantizer_cfg_entry_to_dict` helper down to a
  4-line list/single normalization. The three call sites
  (`base_disable_all`, `default_disabled_quantizers`,
  `mamba_moe_disabled_quantizers`) all load schema-typed YAMLs so the
  Mapping/`isinstance(QuantizerCfgEntry)` fallbacks were defensive-only.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)

1216-1220: ⚡ Quick win

Restore explicit type handling in the quantizer-snippet loader.

This helper now assumes every load_config() result is a QuantizerCfgEntry. If one of these snippets ever comes back as a plain mapping, module import will fail with "'dict' object has no attribute 'model_dump'" instead of a clear schema error. Please mirror the defensive shape check used in _load_quantizer_attribute_dict here too.

Suggested hardening
 def _load_quantizer_cfg_dict_list(config_path: str) -> list[dict[str, Any]]:
     """Load a QuantizerCfgEntry or QuantizerCfgListConfig snippet as public dict entries."""
     config = load_config(config_path)
     entries = config if isinstance(config, list) else [config]
-    return [e.model_dump(exclude_unset=True) for e in entries]
+    result: list[dict[str, Any]] = []
+    for entry in entries:
+        if isinstance(entry, QuantizerCfgEntry):
+            result.append(entry.model_dump(exclude_unset=True))
+        elif isinstance(entry, Mapping):
+            result.append(dict(entry))
+        else:
+            raise TypeError(
+                f"{config_path} must declare QuantizerCfgEntry or a list of QuantizerCfgEntry."
+            )
+    return result
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/config.py` around lines 1216 - 1220, In
_load_quantizer_cfg_dict_list, restore defensive type handling like
_load_quantizer_attribute_dict: after calling load_config(config_path) and
normalizing entries to a list, iterate entries and for each item return item
directly if it's a plain dict, call item.model_dump(exclude_unset=True) if it is
a model-like object (e.g., QuantizerCfgEntry / QuantizerCfgListConfig), and
otherwise raise a clear TypeError indicating the snippet at config_path had an
unexpected shape; reference the helper name (_load_quantizer_cfg_dict_list), the
load_config result, and the expected QuantizerCfgEntry type in the error message
so imports fail with a schema error instead of an attribute error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 86f18af0-3b4d-42b0-8e9c-91fbae29c5af

📥 Commits

Reviewing files that changed from the base of the PR and between 7463504 and 1e0071b.

📒 Files selected for processing (1)
  • modelopt/torch/quantization/config.py

@shengliangxu shengliangxu changed the title Move built-in PTQ quantization configs to YAML [OMNIML-4775] Move built-in PTQ quantization configs to YAML May 18, 2026
The prior commit retyped the *_CFG module-level constants from
QuantizeConfig to dict[str, Any], but SUPPORT_QUANT_FORMAT in
examples/llm_autodeploy/run_auto_quantize.py was still annotated as
dict[str, QuantizeConfig], which mypy flagged. Switch to
dict[str, dict[str, Any]] and drop the now-unused QuantizeConfig import.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu requested review from h-guo18 and sychen52 May 18, 2026 23:18
Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review — DM the bot to share feedback.

Large refactor (90 files / 2776 LOC) that extracts hard-coded *_CFG dicts in modelopt/torch/quantization/config.py into YAML presets composed via the existing $import system. Design check passes — this is a continuation of the already-merged $import infrastructure (PR #1253), not a new abstraction, so the design-review protocol is satisfied. The refactor is mostly mechanical and the YAML files visibly correspond to the previous Python dicts.

Three things a human owner should look at:

  1. Behavioral change in examples/diffusers/quantization/quantize.py — the new apply_int8_percentile_calibrator flag explicitly excludes QuantAlgo.SMOOTHQUANT, but the original code applied reset_set_int8_config whenever collect_method != DEFAULT, including with smoothquant (and the reset_set_int8_config docstring even references INT8_SMOOTHQUANT_CFG). This drops the smoothquant + non-default collect-method codepath for Conv2d input quantizers. It may be an intentional fix for a latent bug (smoothquant + percentile-Conv2d combo + global mutation), but it is not called out in the PR description. Worth confirming intent.

  2. pytest was not run locally (per the PR's own Testing section). For a refactor of this scope touching modelopt.torch.quantization.config (an extensively used public surface), CI / a local run of the quantization unit tests should be confirmed green before merge — the existing *_CFG constants are imported widely and a single mismatched YAML field would only surface there.

  3. Subtle field additions from numerics snippets: e.g. the new attention_qkv_fp8 unit imports numerics/fp8 whose YAML sets axis: null, so *[qkv]_bmm_quantizer.cfg now contains axis: null where the old hardcoded NVFP4_FP8_MHA_CONFIG had {"num_bits": (4, 3)} only. Same pattern in fp8_per_channel_per_token, etc. These should be no-ops semantically (axis defaults to None), but are an observable surface change for any consumer that does "axis" in cfg style probes — worth a quick confirmation.

Other notes: the _list_element_schema extension to handle Union types in config_loader.py and the new _resolve_list_import dict-into-list-element path are needed for w4a8_awq_beta.yaml's list-of-dicts cfg, and the new tests (test_import_dict_snippet_imports_in_union_typed_list_field, test_import_dict_snippet_in_union_typed_list_field_with_inline_item) cover them. The auto-discovery of built-in snippets in test_loader.py is a nice scalability improvement over the hard-coded list.
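
As a rough illustration of what handling Union types means for a list-element lookup, the sketch below extracts the element type from `list[T]` even when the list is only one member of a Union. It is an assumption-laden stand-in: `list_element_type` is a hypothetical name, and the real `_list_element_schema` in config_loader.py may work differently.

```python
import types
from typing import Any, Union, get_args, get_origin

# PEP 604 unions (X | Y) report types.UnionType on Python 3.10+.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def list_element_type(annotation: Any) -> Any:
    """Return T for list[T], also when list[T] is one member of a Union.

    Hypothetical helper; returns None when no list type is found.
    """
    origin = get_origin(annotation)
    if origin is list:
        args = get_args(annotation)
        return args[0] if args else Any
    if origin in _UNION_ORIGINS:
        # Search the union members and return the first list element type.
        for member in get_args(annotation):
            element = list_element_type(member)
            if element is not None:
                return element
    return None
```

With this shape, a field annotated as `Union[dict, list[dict]]` resolves to `dict` as its element type, which is what lets a dict snippet be imported as a single list element.
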

No licensing changes beyond the standard NVIDIA header on new files.

@shengliangxu
Collaborator Author

@cjluo-nv thanks for the review. Addressing the three points:

  1. SmoothQuant gate in examples/diffusers/quantization/quantize.py — intentional. The --percentile / --collect-method CLI flags advertise "works for INT8, not including smoothquant" (quantize.py:471,478), but the prior code applied the calibrator to SmoothQuant too. The refactor surfaced the inconsistency; the adopted CodeRabbit suggestion now aligns runtime behavior with the documented CLI contract. Called out explicitly in the PR description under a new "Note on behavioral change" section.

  2. pytest not run locally — outdated; the PR description has been updated. nox -s "unit-3.10(torch_211, tf_latest)" was run on this branch: 2325 passed, 12 skipped, 4 failed. Two failures (test_load_recipe_eagle_builtin, test_load_recipe_dflash_builtin) were caused by metadata blocks accidentally deleted from two speculative-decoding recipe YAMLs and are fixed in commit 7463504724. The remaining two (test_eagle_model_convert_save_and_restore[eagle_config0], test_forward_returns_loss) fail with the same aarch64 torch.compile C++ codegen error on main HEAD — independent of this PR. nox -s pre_commit_all also passes (ruff / mypy / YAML format / license / bandit).

  3. axis: null appearing in NVFP4_FP8_MHA bmm cfgs: semantic no-op, no code change. The bmm/softmax entries now compose through numerics/fp8.yaml, which sets axis: null (matching the convention used by FP8_DEFAULT_CFG's *input_quantizer on main, where {"num_bits": (4, 3), "axis": None} was already inline). axis=None is the QuantizerAttributeConfig default, meaning per-tensor quantization, so behavior is unchanged. Main was already inconsistent across cfgs on whether the key was explicit; the YAML composition normalizes that.
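
To make the "semantic no-op but observable" distinction in point 3 concrete, here is a minimal sketch using plain dicts. The two dicts are hypothetical stand-ins for the old inline attributes and the new composed attributes; `effective_axis` is an illustrative helper, not a ModelOpt function.

```python
# Hypothetical attribute dicts: the old inline form and the new composed form.
old_cfg = {"num_bits": (4, 3)}
new_cfg = {"num_bits": (4, 3), "axis": None}


def effective_axis(cfg: dict):
    """Read axis the way a default-aware consumer would (None means per-tensor)."""
    return cfg.get("axis")


# Default-aware reads agree, so quantization behavior is unchanged.
same_behavior = effective_axis(old_cfg) == effective_axis(new_cfg)

# Key-presence probes disagree, which is the observable surface change.
probe_differs = ("axis" in old_cfg) != ("axis" in new_cfg)
```

Consumers that read the value with a default are unaffected; only code that branches on key presence would see a difference.
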

(Note: the *_CFG constants are now dict[str, Any] rather than QuantizeConfig pydantic instances — switched in commit 51c8e5f7e9 to preserve backwards-compat with consumers that do mapping-style mutation like entry["cfg"] assignment in examples/llm_ptq/hf_ptq.py.)

Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review — DM the bot to share feedback.

Re-review of large YAML refactor (90 files, +2055/-721) extracting hardcoded *_CFG dicts in modelopt/torch/quantization/config.py into YAML presets composed via the existing $import system.

Design review: Passes — this is a continuation of the already-merged $import infrastructure (PR #1253), not a new abstraction.

Previous review comments — classification & status:

  • Critical comments addressed: _cfg_to_dict helper added in tests; apply_int8_percentile_calibrator correctly gates on algo != SMOOTHQUANT; explicit *mlp.experts* duplicates removed from nvfp4_experts_only.yaml; README wording corrected.
  • Critical comments explicitly justified-not-adopted (rationale checked and reasonable):
    • nvfp4_mlp_only.yaml overlapping *mlp* and *.experts.* patterns: mirrors original Python _nvfp4_selective_quant_cfg(["*mlp*", "*block_sparse_moe*", "*.experts.*"]); idempotent. ✓
    • w4a8_mxfp4_fp8.yaml algorithm: null: matches main behavior for MX-family configs. ✓
    • kv_nvfp4_rotate.yaml V missing rotate: true: verified against original NVFP4_KV_ROTATE_CFG — V intentionally omits rotate on main; YAML matches. ✓
    • KV *_KV_CFG default-filled algorithm: "max": confirmed all consumers extract ["quant_cfg"] only, so the default doesn't reach merge paths. ✓
  • Outside-diff config_loader Union[T, list[T]] splice case: not addressed but only relevant for hypothetical future schemas (current YAMLs work).
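
The idempotence argument for the overlapping `*mlp*` and `*.experts.*` patterns can be sketched with the standard library. This assumes glob-style matching via `fnmatch` and a "later match overwrites earlier" rule; the quantizer name, the `nvfp4` placeholder dict, and `resolve` are hypothetical and only illustrate why applying the same config twice changes nothing.

```python
from fnmatch import fnmatch

# Patterns taken from the discussion; the cfg value is a placeholder.
patterns = ["*mlp*", "*block_sparse_moe*", "*.experts.*"]
nvfp4 = {"format": "nvfp4"}


def resolve(name: str, rules: list) -> dict:
    """Apply every matching (pattern, cfg) rule in order; later matches overwrite."""
    resolved: dict = {}
    for pattern, cfg in rules:
        if fnmatch(name, pattern):
            resolved.update(cfg)
    return resolved


rules = [(pattern, nvfp4) for pattern in patterns]
name = "model.layers.0.mlp.experts.3.weight_quantizer"  # hypothetical quantizer name

# Two patterns match this name, but both carry the same cfg.
matches = [pattern for pattern in patterns if fnmatch(name, pattern)]
with_overlap = resolve(name, rules)
without_overlap = resolve(name, rules[:1])
```

Because every overlapping rule carries identical attributes, `with_overlap` equals `without_overlap`; the overlap would only matter if the rules assigned different configs.
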

Why nudge instead of approve:

  1. Behavioral change in examples/diffusers/quantization/quantize.py: the new apply_int8_percentile_calibrator flag drops the SmoothQuant + non-default collect-method codepath. PR body frames this as a CLI-help-text alignment, but the original reset_set_int8_config docstring explicitly references INT8_SMOOTHQUANT_CFG. Worth a human owner confirming this is intentional.
  2. pytest was not run locally per the PR's own Testing section. For a refactor of this scope touching the widely-imported *_CFG constants, full CI green should be confirmed before merge.
  3. Subtle observable surface changes: numerics snippets like numerics/fp8 carry axis: null, so dumped configs now contain "axis": None keys where the originals had only {"num_bits": (4, 3)}. No-op semantically but breaks "axis" in cfg probes if any downstream consumer does that.
  4. PR size (90 files / 2776 LOC) — though cohesive and not reasonably splittable for this kind of mechanical migration.

No licensing concerns: only standard NVIDIA headers on new files.

Comment thread modelopt_recipes/configs/ptq/presets/model/nvfp4_mlp_only.yaml
Comment thread modelopt_recipes/general/ptq/fp8_default-kv_fp8.yaml
Comment thread examples/diffusers/quantization/config.py
Walking back the apply_int8_percentile_calibrator SmoothQuant exclusion
adopted in 296934a. The CLI-help-vs-code mismatch (`--percentile` /
`--collect-method` advertise "works for INT8, not including
smoothquant") is a pre-existing latent issue and should be addressed in
a dedicated PR rather than bundled into this otherwise-mechanical YAML
refactor.

Behavior now matches main exactly: the percentile calibrator applies
whenever `collect_method != DEFAULT`, regardless of algo.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
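
The difference between the two gates discussed in this commit message can be sketched as two predicates. This is illustrative only: the enum classes below are simplified stand-ins, `MIN_MAX` and `MAX` are hypothetical members, and neither function is the code in quantize.py.

```python
from enum import Enum


class CollectMethod(Enum):
    DEFAULT = "default"
    MIN_MAX = "min-max"  # hypothetical non-default value


class QuantAlgo(Enum):
    MAX = "max"  # hypothetical non-SmoothQuant value
    SMOOTHQUANT = "smoothquant"


def applies_on_main(collect_method: CollectMethod, algo: QuantAlgo) -> bool:
    """Behavior restored by this commit: the algo is ignored."""
    return collect_method != CollectMethod.DEFAULT


def applies_with_exclusion(collect_method: CollectMethod, algo: QuantAlgo) -> bool:
    """Behavior that was walked back: SmoothQuant is excluded."""
    return collect_method != CollectMethod.DEFAULT and algo != QuantAlgo.SMOOTHQUANT


# Enumerate the inputs on which the two gates disagree.
differing = [
    (method, algo)
    for method in CollectMethod
    for algo in QuantAlgo
    if applies_on_main(method, algo) != applies_with_exclusion(method, algo)
]
```

The two predicates disagree only for SmoothQuant combined with a non-default collect method, which is exactly the codepath the reviewers flagged.
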
Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review — DM the bot to share feedback.

Re-review focused on whether further behavior changes have crept in beyond what the prior round flagged. Design protocol: passes — this remains a mechanical migration onto the already-merged $import system (PR #1253), not a new abstraction.

Status of prior critical comments:

  • apply_int8_percentile_calibrator SmoothQuant gate — author intentionally walked it back to keep refactor mechanical (matches main).
  • nvfp4_experts_only.yaml duplicate *mlp.experts* patterns removed.
  • _cfg_to_dict test helper added.
  • ✅ README "dicts" → "QuantizeConfig constants" wording updated.
  • ⚠️ Justified-not-adopted (rationale verified): nvfp4_mlp_only.yaml overlap (mirrors _nvfp4_selective_quant_cfg, idempotent); w4a8_mxfp4_fp8.yaml algorithm: null (matches MXFP-family convention on main); kv_nvfp4_rotate.yaml V missing rotate: true (matches main NVFP4_KV_ROTATE_CFG); *_KV_CFG default-filled algorithm: "max" is unreachable because every consumer extracts ["quant_cfg"] only.
  • 🟡 Open from prior round (minor, not addressed): config_loader.py Union[T, list[T]] splice path only matters for hypothetical future schemas — current YAMLs all work.

New behavior changes worth a human eye:

  1. examples/diffusers/quantization/quantize.py now does base_cfg = copy.deepcopy(base_cfg) plus a model_dump shim before applying reset_set_int8_config. On main, reset_set_int8_config was silently mutating the module-level mtq.INT8_SMOOTHQUANT_CFG / local INT8_DEFAULT_CONFIG globals across calls — the new code finally honors the existing # Build a fresh config dict so we never mutate the global constants comment. This is a latent-bug fix, not a regression, but it is a real behavioral delta that isn't called out in the PR body.
  2. choices set in modelopt/torch/quantization/config.py now includes MXFP6_DEFAULT_CFG and NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG. Both constants existed on main but were missing from choices; this widens the set of valid --qformat-style strings accepted by CLIs that gate on mtq.config.choices. Likely intentional cleanup but worth confirming.
  3. _load_quantizer_cfg_dict_list was simplified: it no longer has the dict/non-dict defensive branch that _load_quantizer_attribute_dict keeps. Any future snippet whose root validates as a plain dict (rather than QuantizerCfgEntry / list[QuantizerCfgEntry]) will fail with AttributeError: 'dict' object has no attribute 'model_dump' instead of a clear TypeError. Today's snippets are all schema-tagged so it doesn't bite, but the asymmetry with the sibling helper is worth a sanity-check call from the owner.
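
The asymmetry in point 3 comes down to which exception a plain dict triggers. The sketch below reproduces both outcomes with a stand-in class; `FakeEntry`, `dump_simple`, and `dump_defensive` are hypothetical names and do not mirror the real helpers line for line.

```python
from collections.abc import Mapping


class FakeEntry:
    """Hypothetical stand-in for a schema-typed entry exposing model_dump()."""

    def __init__(self, **fields):
        self._fields = fields

    def model_dump(self, exclude_unset: bool = True) -> dict:
        return dict(self._fields)


def dump_simple(entries: list) -> list:
    """Simplified form: assumes every entry has model_dump()."""
    return [entry.model_dump(exclude_unset=True) for entry in entries]


def dump_defensive(entries: list, source: str) -> list:
    """Defensive form: accepts mappings and rejects anything else clearly."""
    result = []
    for entry in entries:
        if isinstance(entry, FakeEntry):
            result.append(entry.model_dump(exclude_unset=True))
        elif isinstance(entry, Mapping):
            result.append(dict(entry))
        else:
            raise TypeError(f"{source}: unexpected snippet shape {type(entry).__name__}")
    return result
```

Passing a plain dict to `dump_simple` raises `AttributeError`, while `dump_defensive` either converts the entry or raises a `TypeError` that names the source file.
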

Other concerns from prior round still standing:

  • PR is 90 files / +2052/-721 — cohesive, but pytest was not run locally per the PR body's own Testing section. For a refactor of mtq.*_CFG (extensively imported), full CI green should be confirmed pre-merge.
  • The numerics/fp8.yaml etc. continue to emit axis: null into dumped configs, matching main's explicit "axis": None. No-op semantically but observable for any consumer that does "axis" in cfg probes.

Licensing: only standard NVIDIA headers on new files — no concern.

Recommend an owner with architectural context confirm CI green and sign off on the (latent-bug-fix) deepcopy semantics in the diffusers example and the choices set additions.

@shengliangxu
Collaborator Author

@cjluo-nv thanks for the re-review.

#1 deepcopy in examples/diffusers/quantization/quantize.py — kept. The existing # Build a fresh config dict so we never mutate the global constants comment was aspirational only on main; reset_set_int8_config accumulated PercentileCalibrator entries into mtq.INT8_SMOOTHQUANT_CFG/INT8_DEFAULT_CONFIG across repeated calls, and set_quant_config_attr added trt_high_precision_dtype keys into globally-shared cfg dicts. The deepcopy makes the code match the comment. Removed the (now dead) if hasattr(base_cfg, "model_dump"): base_cfg = base_cfg.model_dump(...) shim from the same block — the *_CFG constants have been plain dicts since 51c8e5f7e9. Called out in the PR description under a new "Latent-bug fixes surfaced by the refactor" section.

#2 choices additions (MXFP6_DEFAULT_CFG, NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG) — kept. Both constants exist on main but were missing from choices, so CLIs that gate on mtq.config.choices (e.g., hf_ptq.py --qformat) couldn't reach them even though the configs themselves were fully supported. Also called out in the PR description.

#3 _load_quantizer_cfg_dict_list / _load_quantizer_attribute_dict asymmetry — intentional. The cfg-list helper was simplified during this PR after confirming all three call sites (base_disable_all, default_disabled_quantizers, mamba_moe_disabled_quantizers) load schema-tagged YAMLs that resolve to QuantizerCfgEntry / list[QuantizerCfgEntry] — the prior isinstance(_, Mapping) fallback branches were dead. The sibling _load_quantizer_attribute_dict keeps its defensive branch for now since its risk surface is the same and the cost of leaving it is one extra branch; not worth bundling another cleanup into this PR.
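
The accumulation problem described in #1 can be reproduced with plain dicts. This is a minimal sketch: `INT8_CFG` is a hypothetical module-level constant, its entry layout is made up for the example, and `add_percentile_calibrator` only stands in for the in-place mutation attributed to reset_set_int8_config.

```python
import copy

# Hypothetical module-level constant standing in for a built-in *_CFG dict.
INT8_CFG = {"quant_cfg": [{"pattern": "*input_quantizer", "num_bits": 8}]}


def add_percentile_calibrator(cfg: dict) -> dict:
    """Append a calibrator entry in place (illustrative stand-in)."""
    cfg["quant_cfg"].append({"pattern": "*conv*input_quantizer", "calibrator": "percentile"})
    return cfg


def build_config_shared() -> dict:
    """Mutates the global: entries pile up across repeated calls."""
    return add_percentile_calibrator(INT8_CFG)


def build_config_fresh() -> dict:
    """Works on a deep copy: the global constant stays untouched."""
    return add_percentile_calibrator(copy.deepcopy(INT8_CFG))
```

A shallow copy would not be enough here, because the nested `quant_cfg` list would still be shared with the global; that is why the sketch uses `copy.deepcopy`.
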

The `if hasattr(base_cfg, "model_dump"): base_cfg = base_cfg.model_dump(
exclude_unset=True)` block was added when the `*_CFG` constants were
QuantizeConfig pydantic instances. Commit 51c8e5f switched them
back to plain dicts, so hasattr is always False and the line is dead.

Keep the explicit `base_cfg = copy.deepcopy(base_cfg)` — it honors the
existing "Build a fresh config dict so we never mutate the global
constants" comment, which on main was aspirational only.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review — DM the bot to share feedback.

Re-review of mechanical YAML refactor (90 files, +2050/-721) extracting hardcoded *_CFG dicts into YAML presets composed via the already-merged $import system. Design protocol passes — continuation of PR #1253 infrastructure, not a new abstraction.

Status of prior critical comments:

  • _cfg_to_dict helper added in tests/unit/recipe/test_loader.py:91-94
  • nvfp4_experts_only.yaml — explicit *mlp.experts* duplicates removed
  • ✅ README wording corrected ("dicts" → "QuantizeConfig constants")
  • ✅ Diffusers quantize.py copy.deepcopy(base_cfg) honors the existing comment; latent-bug fix called out in PR body
  • apply_int8_percentile_calibrator SmoothQuant gate — author intentionally reverted to keep refactor mechanical (matches main)
  • ✅ Justified-not-adopted (rationale verified): nvfp4_mlp_only.yaml overlap (idempotent, mirrors _nvfp4_selective_quant_cfg); w4a8_mxfp4_fp8.yaml algorithm: null (matches MXFP family); kv_nvfp4_rotate.yaml V missing rotate: true (matches main); *_KV_CFG default-filled algorithm: "max" is unreachable (every consumer extracts ["quant_cfg"] only)
  • 🟡 Open minor: config_loader.py Union[T, list[T]] splice path not handled — only matters for hypothetical future schemas; current YAMLs work

Why nudge over approve:

  1. _load_quantizer_cfg_dict_list was simplified to drop the defensive isinstance(item, Mapping) branch that the sibling _load_quantizer_attribute_dict keeps. Fine today (all snippets are schema-tagged) but a future non-schema snippet would fail with AttributeError: 'dict' object has no attribute 'model_dump' rather than a clear TypeError. Minor asymmetry worth a sanity-check.
  2. choices set additions (MXFP6_DEFAULT_CFG, NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG) widen the set of valid --qformat strings — likely intentional cleanup but a real public-API surface change not called out in the changelog/PR body summary.
  3. Per the PR body's own Testing section, pytest was not run locally. For a refactor of the widely-imported mtq.*_CFG constants, full CI green should be confirmed pre-merge.
  4. Standard NVIDIA headers only — no licensing concerns.

Recommend a human owner with architectural context confirm CI is green and sign off on the latent-bug-fix deepcopy semantics in the diffusers example and the choices set additions.

Collaborator

@cjluo-nv cjluo-nv left a comment

Bot review — DM the bot to share feedback.

Re-review of mechanical YAML refactor (90 files, +2050/-721) extracting hardcoded *_CFG dicts into YAML presets via the already-merged $import system. Design protocol passes — continuation of PR #1253 infrastructure, not a new abstraction.

Status of prior critical comments:

  • _cfg_to_dict helper added in tests/unit/recipe/test_loader.py:91-94.
  • nvfp4_experts_only.yaml — explicit *mlp.experts* duplicates removed.
  • ✅ README wording corrected ("dicts" → "QuantizeConfig constants").
  • ✅ Diffusers quantize.py copy.deepcopy(base_cfg) honors the existing comment; called out in PR body as latent-bug fix.
  • apply_int8_percentile_calibrator SmoothQuant gate — author intentionally reverted to keep refactor mechanical (matches main).
  • ✅ Justified-not-adopted (rationale verified): nvfp4_mlp_only.yaml overlap (idempotent, mirrors _nvfp4_selective_quant_cfg); w4a8_mxfp4_fp8.yaml algorithm: null (matches MXFP family); kv_nvfp4_rotate.yaml V missing rotate: true (matches main); *_KV_CFG default-filled algorithm: "max" is unreachable (every consumer extracts ["quant_cfg"] only).
  • 🟡 Open minor: config_loader.py Union[T, list[T]] splice path not handled — only matters for hypothetical future schemas; current YAMLs work.

Why nudge over approve:

  1. _load_quantizer_cfg_dict_list was simplified to drop the defensive isinstance(item, Mapping) branch that the sibling _load_quantizer_attribute_dict keeps — fine today (all snippets are schema-tagged) but the asymmetry with the sibling helper is worth a sanity-check.
  2. choices set additions (MXFP6_DEFAULT_CFG, NVFP4_W4A4_WEIGHT_LOCAL_HESSIAN_CFG) widen a public API surface — called out in PR body but worth human confirmation.
  3. Diffusers deepcopy is a real behavioral delta (latent-bug fix); called out in PR body but warrants human sign-off given it affects a shipped example.
  4. PR size (90 files / 2771 LOC) — cohesive but large; per updated PR body, full unit test suite was run locally (2329 passed, 12 skipped); CI green should be confirmed pre-merge.

Standard NVIDIA headers only — no licensing concerns. Recommend a human owner with architectural context confirm CI green and sign off on the latent-bug-fix deepcopy semantics in the diffusers example, the choices set additions, and the _load_quantizer_cfg_dict_list defensive-branch removal.

Collaborator

@cjluo-nv cjluo-nv left a comment

LGTM if no behavior change

@shengliangxu shengliangxu merged commit e4dc020 into main May 20, 2026
49 checks passed
@shengliangxu shengliangxu deleted the shengliangx/all-yaml-configs branch May 20, 2026 15:21

4 participants