feat(recipes): add nvfp4_mlp_only-novit-kv_fp8 (exclude VL vision tower) by Edwardf0t1 · Pull Request #1760 · NVIDIA/Model-Optimizer

Edwardf0t1 · 2026-06-17T01:25:46Z

What does this PR do?

Type of change: Bug fix

Adds a new built-in PTQ recipe general/ptq/nvfp4_mlp_only-novit-kv_fp8 that is identical to nvfp4_mlp_only-kv_fp8 but excludes the VL vision tower from quantization.

Root cause (NVBugs 6287461): The bare *mlp* enable globs in nvfp4_mlp_only-kv_fp8 also match VL vision-tower block MLPs (e.g. Kimi-K2.5 vision_tower.encoder.blocks.*.mlp.fc0/fc1). Quantizing the ViT FFNs to NVFP4 is both quality-harmful (degenerate image embeddings) and can break export: Kimi-K2.5's MoonViT vt_intermediate_size=4304 is not divisible by the NVFP4 packing constraint (2 × group_size = 32, since 4-bit values pack 2-per-byte). 4304 = 16 × 269 is divisible by 16 but not 32, so the compressed-tensors export raises ValueError: tensor column shape must be divisible by the given group_size 32 but got 4304. All language-model dims (2048 / 7168 / 18432) are divisible by 32 and quantize fine.
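The divisibility argument above can be checked directly. The snippet below is a minimal sketch, not ModelOpt or compressed-tensors code: it assumes a group size of 16 (implied by "2 × group_size = 32") and uses the dimensions quoted in this description. The names `GROUP_SIZE`, `PACK_MULTIPLE`, and `is_packable` are illustrative only.

```python
# Sketch of the NVFP4 packing constraint described above (illustrative names).
GROUP_SIZE = 16                  # assumed from "2 x group_size = 32"
PACK_MULTIPLE = 2 * GROUP_SIZE   # two 4-bit values are packed per byte

def is_packable(dim: int, multiple: int = PACK_MULTIPLE) -> bool:
    """Return True if a column dimension satisfies the packing constraint."""
    return dim % multiple == 0

# Dimensions quoted in the PR description.
vit_intermediate = 4304               # MoonViT vt_intermediate_size
language_dims = [2048, 7168, 18432]   # language-model dims

vit_ok = is_packable(vit_intermediate)
language_ok = [is_packable(d) for d in language_dims]

print(vit_intermediate % GROUP_SIZE)     # 0: divisible by 16
print(vit_intermediate % PACK_MULTIPLE)  # 16: not divisible by 32
print(vit_ok, language_ok)
```

Only the vision-tower dimension fails the check, which matches the export error quoted above.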

The new recipe appends *visual* / *vision_tower* disable rules (after the *mlp* enables, so the disable wins), mirroring the existing nvfp4_mlp_only_mse-kv_fp8_cast-novit recipe and NVIDIA's reference nvidia/Kimi-K2.5-NVFP4 checkpoint (which excludes the vision tower, multimodal projector, attention, and lm_head).
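Why rule order matters can be illustrated with a small glob resolver. This is a hedged sketch using Python's standard `fnmatch`, not ModelOpt's actual matching code: it assumes only what the description states, namely that a later matching rule overrides an earlier one. The `RULES` list, the `is_enabled` helper, and the language-model module name are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified rule list in recipe order: (glob, enable).
RULES = [
    ("*mlp*", True),            # bare MLP enable glob from the base recipe
    ("*visual*", False),        # appended disable rules from the novit recipe
    ("*vision_tower*", False),
]

def is_enabled(module_name, rules=RULES, default=False):
    """Resolve a module's quantizer state; the last matching rule wins."""
    enabled = default
    for pattern, enable in rules:
        if fnmatchcase(module_name, pattern):
            enabled = enable
    return enabled

vit_module = "vision_tower.encoder.blocks.0.mlp.fc1"
llm_module = "language_model.model.layers.3.mlp.down_proj"  # hypothetical name

print(is_enabled(vit_module))                  # False: the disable rule comes last
print(is_enabled(llm_module))                  # True: only *mlp* matches
print(is_enabled(vit_module, rules=RULES[:1])) # True: the original bug
```

With only the `*mlp*` rule, the vision-tower MLP is caught by the enable glob; appending the disable rules after it turns that module off while leaving language-model MLPs enabled.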

Usage

python hf_ptq.py --model /local/Kimi-K2.5 \
  --recipe general/ptq/nvfp4_mlp_only-novit-kv_fp8 \
  --batch_size 1 --calib_size 32 \
  --export_path /local/Kimi-K2.5-nvfp4_mlp_only-novit-kv_fp8 --trust_remote_code

Testing

Registered in the tests/unit/recipe/test_loader.py builtin smoke list (test_load_recipe_all_builtins).
Added a focused regression test (test_nvfp4_mlp_only_novit_recipe_disables_vision_quantizers) asserting the *visual* / *vision_tower* quantizers are disabled.
All pre-commit hooks pass, including the validate modelopt recipes hook.

Before your PR is "Ready for review"

Is this change backward compatible?: ✅ (additive — new recipe file only)
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ✅ (added to builtin recipe smoke test + vision-disable regression test)
Did you update Changelog?: N/A (new built-in recipe, no API change)
Did you get Claude approval on this PR?: ❌ (pending)

Additional Information

Fixes NVBugs 6287461 (Kimi-K2.5 nvfp4_mlp_only-kv_fp8 quant failure). Related Jira: OMNIML-5005.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a new built-in PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache support.
- The recipe keeps vision-related components excluded from quantization.
Documentation
- Updated the shipped recipes list to include the new PTQ recipe.
Tests
- Extended built-in PTQ smoke coverage to load the new recipe.
- Added a unit test to verify vision quantizers are explicitly disabled for this recipe.

coderabbitai · 2026-06-17T01:26:00Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2afff1c0-3167-4e7f-9034-8b46dcc1bb2b

📥 Commits

Reviewing files that changed from the base of the PR and between f5cd4cd and 7fd792c.

📒 Files selected for processing (3)

modelopt_recipes/general/ptq/nvfp4_mlp_only-novit-kv_fp8.yaml
modelopt_recipes/ptq.md
tests/unit/recipe/test_loader.py

✅ Files skipped from review due to trivial changes (1)

modelopt_recipes/ptq.md

🚧 Files skipped from review as they are similar to previous changes (2)

modelopt_recipes/general/ptq/nvfp4_mlp_only-novit-kv_fp8.yaml
tests/unit/recipe/test_loader.py

📝 Walkthrough

Walkthrough

Adds a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache, excludes vision tower quantizers, updates the shipped recipe catalog, registers the recipe in built-in loader coverage, and adds a unit test for the disabled vision quantizers.

Changes

nvfp4_mlp_only-novit-kv_fp8 PTQ recipe addition

Layer / File(s)	Summary
Recipe definition and coverage `modelopt_recipes/general/ptq/nvfp4_mlp_only-novit-kv_fp8.yaml`, `modelopt_recipes/ptq.md`, `tests/unit/recipe/test_loader.py`	Adds the PTQ recipe YAML, publishes it in the shipped recipes table, appends it to the built-in PTQ recipe list, and verifies that `visual` and `vision_tower` quantizers are disabled.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

NVIDIA/Model-Optimizer#1690: Also adjusts PTQ recipe quantizer wildcard disable rules to keep vision modules out of quantization.
NVIDIA/Model-Optimizer#1826: Also changes PTQ quantization config to disable vision-related quantizer patterns.

Suggested labels

cherry-pick-done

Suggested reviewers

realAsma
shengliangxu
kevalmorabia97

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the new PTQ recipe and its key behavior of excluding the VL vision tower.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR adds only docs, a YAML recipe, and tests; no banned patterns like hardcoded trust_remote_code, eval/exec, nosec, or unsafe load flags appear in changed code.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/unit/recipe/test_loader.py (1)
167-167: ⚡ Quick win

Add a focused regression assertion for the new -novit recipe behavior.

This addition only smoke-tests loadability. Since this recipe’s purpose is to keep vision quantizers disabled, add a direct assertion (similar to test_nvfp4_weight_only_recipe_disables_vllm_marlin_incompatible_projections) to lock that behavior.
Suggested test addition
+def test_nvfp4_mlp_only_kv_fp8_novit_disables_vision_quantizers():
+    recipe = load_recipe("general/ptq/nvfp4_mlp_only-kv_fp8-novit")
+    disabled_quantizers = {
+        entry["quantizer_name"]
+        for entry in recipe.quantize.model_dump()["quant_cfg"]
+        if entry.get("enable") is False
+    }
+    assert {"*visual*", "*vision_tower*"} <= disabled_quantizers
As per coding guidelines, checked-in tests should be lean but explicitly protect against regressions in expected behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/recipe/test_loader.py` at line 167, The current addition only
verifies that the nvfp4_mlp_only-kv_fp8-novit recipe can be loaded, but does not
assert that the recipe actually disables vision quantizers as intended. Create a
new test function (similar to the pattern used in
test_nvfp4_weight_only_recipe_disables_vllm_marlin_incompatible_projections)
that loads the nvfp4_mlp_only-kv_fp8-novit recipe and explicitly asserts that
vision quantizers are disabled, ensuring the regression behavior is locked in
place.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt_recipes/ptq.md`:
- Line 42: The markdown file contains a section header that references "All 18"
shipped recipes, but after adding the new row for nvfp4_mlp_only-kv_fp8-novit to
the table, the total count is now 19 recipes. Update the section header text to
reflect the correct count of 19 recipes instead of 18 to keep the documentation
accurate and avoid confusion.

---

Nitpick comments:
In `@tests/unit/recipe/test_loader.py`:
- Line 167: The current addition only verifies that the
nvfp4_mlp_only-kv_fp8-novit recipe can be loaded, but does not assert that the
recipe actually disables vision quantizers as intended. Create a new test
function (similar to the pattern used in
test_nvfp4_weight_only_recipe_disables_vllm_marlin_incompatible_projections)
that loads the nvfp4_mlp_only-kv_fp8-novit recipe and explicitly asserts that
vision quantizers are disabled, ensuring the regression behavior is locked in
place.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9cc5aaa6-6f9a-400d-a888-8607b90fa485

📥 Commits

Reviewing files that changed from the base of the PR and between 6c32c37 and 2c8b3a4.

📒 Files selected for processing (3)

modelopt_recipes/general/ptq/nvfp4_mlp_only-kv_fp8-novit.yaml
modelopt_recipes/ptq.md
tests/unit/recipe/test_loader.py

codecov · 2026-06-17T01:34:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.36%. Comparing base (5177447) to head (7fd792c).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1760   +/-   ##
=======================================
  Coverage   77.36%   77.36%           
=======================================
  Files         513      513           
  Lines       56894    56894           
=======================================
  Hits        44016    44016           
  Misses      12878    12878

Flag	Coverage Δ
unit	`54.62% <ø> (-0.02%)`	⬇️

@shengliangxu

Address PR #1760 review comments:

- Move 'novit' before 'kv_fp8' so the vision-exclusion lives with the nvfp4_mlp_only base config rather than the KV variant (per @shengliangxu).
- Update shipped-recipe count in ptq.md (18 -> 19).
- Add a focused regression test asserting the recipe disables the *visual*/*vision_tower* quantizers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

shengliangxu

LGTM

The bare `*mlp*` enable globs in nvfp4_mlp_only-kv_fp8 also match VL vision tower block MLPs (e.g. Kimi-K2.5 vision_tower.encoder.blocks.*.mlp.*). Quantizing the ViT FFNs to NVFP4 is quality-harmful and can also break export: Kimi-K2.5's MoonViT vt_intermediate_size=4304 is not divisible by the NVFP4 packing constraint (2 x group_size = 32), so compressed-tensors export raises 'tensor column shape must be divisible by the given group_size 32 but got 4304'. Add a vision-excluding sibling recipe (plain max-calib + *visual*/*vision_tower* disable rules), mirroring nvfp4_mlp_only_mse-kv_fp8_cast-novit and NVIDIA's reference nvidia/Kimi-K2.5-NVFP4 checkpoint. Register it in the loader smoke test and document it in modelopt_recipes/ptq.md. Fixes NVBugs 6287461. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

github-actions · 2026-06-26T00:51:29Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-26 00:51 UTC

#1858 #1839 #1857 #1869 (#1880)

## Cherry-picked PRs

- #1801
- #1808
- #1629
- #1627
- #1824
- #1826
- #1830
- #1760
- #1831
- #1858
- #1839
- #1857
- #1869

#1839, #1857 and #1869 were back-ported (not a clean cherry-pick): the file was renamed `llm_ptq` -> `hf_ptq` (#1759) and surrounding `get_model` code diverged on `main`, but the actual fix targets the `init_empty_weights` / `from_config` block that already exists on the release branch. Accompanying unit tests were ported (15 passed).

## Summary by CodeRabbit

* **New Features**
  * Added a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache calibration.
* **Bug Fixes**
  * Improved ONNX mixed-precision/FP16 conversion reliability with stricter type handling and better stale output-shape reconciliation.
  * Fixed quantization/export edge cases: MoE router/gate handling, FP8 calibration/reduction failures, and additional FP8/INT8 robustness during export.
  * Standardized Puzzletron validation split naming to `validation`.
* **Documentation**
  * Refreshed LM-Eval and TensorRT-Edge-LLM CLI instructions, including updated command names and examples.

---------

Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Co-authored-by: mxinO <164952785+mxinO@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>

Edwardf0t1 requested review from a team as code owners June 17, 2026 01:25

Edwardf0t1 requested a review from h-guo18 June 17, 2026 01:25

coderabbitai Bot reviewed Jun 17, 2026

Comment thread modelopt_recipes/ptq.md Outdated

Edwardf0t1 requested review from juhi10071998, shengliangxu and sychen52 June 23, 2026 04:43

shengliangxu reviewed Jun 24, 2026

Comment thread modelopt_recipes/general/ptq/nvfp4_mlp_only-novit-kv_fp8.yaml

Edwardf0t1 changed the title ~~feat(recipes): add nvfp4_mlp_only-kv_fp8-novit (exclude VL vision tower)~~ feat(recipes): add nvfp4_mlp_only-novit-kv_fp8 (exclude VL vision tower) Jun 24, 2026

coderabbitai Bot approved these changes Jun 24, 2026

Edwardf0t1 added the cherry-pick-0.45.0 label ("After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc") Jun 25, 2026

shengliangxu approved these changes Jun 25, 2026

Edwardf0t1 and others added 2 commits June 26, 2026 00:25

Edwardf0t1 force-pushed the fix/kimi-k25-novit-recipe branch from f5cd4cd to 7fd792c Compare June 26, 2026 00:33

Edwardf0t1 enabled auto-merge (squash) June 26, 2026 00:34

Edwardf0t1 merged commit 6cc5226 into main Jun 26, 2026
43 checks passed

Edwardf0t1 deleted the fix/kimi-k25-novit-recipe branch June 26, 2026 00:51

kevalmorabia97 mentioned this pull request Jul 1, 2026

[Cherry-pick] PRs #1801 #1808 #1629 #1627 #1824 #1826 #1830 #1760 #1831 #1858 #1839 #1857 #1869 #1880

Merged

kevalmorabia97 added the cherry-pick-done label ("Added by bot once PR is cherry-picked to the release branch") Jul 1, 2026

Edwardf0t1 merged 2 commits into main from fix/kimi-k25-novit-recipe