feat(recipes): add nvfp4_mlp_only-novit-kv_fp8 (exclude VL vision tower)#1760
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache, excludes vision tower quantizers, updates the shipped recipe catalog, registers the recipe in built-in loader coverage, and adds a unit test for the disabled vision quantizers. Changesnvfp4_mlp_only-kv_fp8-novit PTQ recipe addition
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unit/recipe/test_loader.py (1)
167-167: ⚡ Quick winAdd a focused regression assertion for the new
-novitrecipe behavior.This addition only smoke-tests loadability. Since this recipe’s purpose is to keep vision quantizers disabled, add a direct assertion (similar to
test_nvfp4_weight_only_recipe_disables_vllm_marlin_incompatible_projections) to lock that behavior.Suggested test addition
+def test_nvfp4_mlp_only_kv_fp8_novit_disables_vision_quantizers(): + recipe = load_recipe("general/ptq/nvfp4_mlp_only-kv_fp8-novit") + disabled_quantizers = { + entry["quantizer_name"] + for entry in recipe.quantize.model_dump()["quant_cfg"] + if entry.get("enable") is False + } + assert {"*visual*", "*vision_tower*"} <= disabled_quantizersAs per coding guidelines, checked-in tests should be lean but explicitly protect against regressions in expected behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/recipe/test_loader.py` at line 167, The current addition only verifies that the nvfp4_mlp_only-kv_fp8-novit recipe can be loaded, but does not assert that the recipe actually disables vision quantizers as intended. Create a new test function (similar to the pattern used in test_nvfp4_weight_only_recipe_disables_vllm_marlin_incompatible_projections) that loads the nvfp4_mlp_only-kv_fp8-novit recipe and explicitly asserts that vision quantizers are disabled, ensuring the regression behavior is locked in place.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt_recipes/ptq.md`:
- Line 42: The markdown file contains a section header that references "All 18"
shipped recipes, but after adding the new row for nvfp4_mlp_only-kv_fp8-novit to
the table, the total count is now 19 recipes. Update the section header text to
reflect the correct count of 19 recipes instead of 18 to keep the documentation
accurate and avoid confusion.
---
Nitpick comments:
In `@tests/unit/recipe/test_loader.py`:
- Line 167: The current addition only verifies that the
nvfp4_mlp_only-kv_fp8-novit recipe can be loaded, but does not assert that the
recipe actually disables vision quantizers as intended. Create a new test
function (similar to the pattern used in
test_nvfp4_weight_only_recipe_disables_vllm_marlin_incompatible_projections)
that loads the nvfp4_mlp_only-kv_fp8-novit recipe and explicitly asserts that
vision quantizers are disabled, ensuring the regression behavior is locked in
place.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9cc5aaa6-6f9a-400d-a888-8607b90fa485
📒 Files selected for processing (3)
modelopt_recipes/general/ptq/nvfp4_mlp_only-kv_fp8-novit.yamlmodelopt_recipes/ptq.mdtests/unit/recipe/test_loader.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1760 +/- ##
=======================================
Coverage 77.36% 77.36%
=======================================
Files 513 513
Lines 56894 56894
=======================================
Hits 44016 44016
Misses 12878 12878
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Address PR #1760 review comments: - Move 'novit' before 'kv_fp8' so the vision-exclusion lives with the nvfp4_mlp_only base config rather than the KV variant (per @shengliangxu). - Update shipped-recipe count in ptq.md (18 -> 19). - Add a focused regression test asserting the recipe disables the *visual*/*vision_tower* quantizers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
The bare `*mlp*` enable globs in nvfp4_mlp_only-kv_fp8 also match VL vision tower block MLPs (e.g. Kimi-K2.5 vision_tower.encoder.blocks.*.mlp.*). Quantizing the ViT FFNs to NVFP4 is quality-harmful and can also break export: Kimi-K2.5's MoonViT vt_intermediate_size=4304 is not divisible by the NVFP4 packing constraint (2 x group_size = 32), so compressed-tensors export raises 'tensor column shape must be divisible by the given group_size 32 but got 4304'. Add a vision-excluding sibling recipe (plain max-calib + *visual*/*vision_tower* disable rules), mirroring nvfp4_mlp_only_mse-kv_fp8_cast-novit and NVIDIA's reference nvidia/Kimi-K2.5-NVFP4 checkpoint. Register it in the loader smoke test and document it in modelopt_recipes/ptq.md. Fixes NVBugs 6287461. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Address PR #1760 review comments: - Move 'novit' before 'kv_fp8' so the vision-exclusion lives with the nvfp4_mlp_only base config rather than the KV variant (per @shengliangxu). - Update shipped-recipe count in ptq.md (18 -> 19). - Add a focused regression test asserting the recipe disables the *visual*/*vision_tower* quantizers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
f5cd4cd to
7fd792c
Compare
|
#1858 #1839 #1857 #1869 (#1880) ## Cherry-picked PRs - #1801 - #1808 - #1629 - #1627 - #1824 - #1826 - #1830 - #1760 - #1831 - #1858 - #1839 - #1857 - #1869 #1839, #1857 and #1869 were back-ported (not a clean cherry-pick): the file was renamed `llm_ptq` -> `hf_ptq` (#1759) and surrounding `get_model` code diverged on `main`, but the actual fix targets the `init_empty_weights` / `from_config` block that already exists on the release branch. Accompanying unit tests were ported (15 passed). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache calibration. * **Bug Fixes** * Improved ONNX mixed-precision/FP16 conversion reliability with stricter type handling and better stale output-shape reconciliation. * Fixed quantization/export edge cases: MoE router/gate handling, FP8 calibration/reduction failures, and additional FP8/INT8 robustness during export. * Standardized Puzzletron validation split naming to `validation`. * **Documentation** * Refreshed LM-Eval and TensorRT-Edge-LLM CLI instructions, including updated command names and examples. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Meng Xin <mxin@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com> Signed-off-by: dimapihtar <dpykhtar@nvidia.com> Signed-off-by: Chenjie Luo <chenjiel@nvidia.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com> Signed-off-by: Grzegorz Karch <gkarch@nvidia.com> Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Co-authored-by: mxinO <164952785+mxinO@users.noreply.github.com> Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com> Co-authored-by: Zhiyu <zhiyuc@nvidia.com> Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com> Co-authored-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
What does this PR do?
Type of change: Bug fix
Adds a new built-in PTQ recipe
general/ptq/nvfp4_mlp_only-novit-kv_fp8that is identical tonvfp4_mlp_only-kv_fp8but excludes the VL vision tower from quantization.Root cause (NVBugs 6287461): The bare
*mlp*enable globs innvfp4_mlp_only-kv_fp8also match VL vision-tower block MLPs (e.g. Kimi-K2.5vision_tower.encoder.blocks.*.mlp.fc0/fc1). Quantizing the ViT FFNs to NVFP4 is both quality-harmful (degenerate image embeddings) and can break export: Kimi-K2.5's MoonViTvt_intermediate_size=4304is not divisible by the NVFP4 packing constraint (2 × group_size = 32, since 4-bit values pack 2-per-byte).4304 = 16 × 269is divisible by 16 but not 32, so the compressed-tensors export raisesValueError: tensor column shape must be divisible by the given group_size 32 but got 4304. All language-model dims (2048 / 7168 / 18432) are divisible by 32 and quantize fine.The new recipe appends
*visual*/*vision_tower*disable rules (after the*mlp*enables, so the disable wins), mirroring the existingnvfp4_mlp_only_mse-kv_fp8_cast-novitrecipe and NVIDIA's referencenvidia/Kimi-K2.5-NVFP4checkpoint (which excludes the vision tower, multimodal projector, attention, and lm_head).Usage
Testing
tests/unit/recipe/test_loader.pybuiltin smoke list (test_load_recipe_all_builtins).test_nvfp4_mlp_only_novit_recipe_disables_vision_quantizers) asserting the*visual*/*vision_tower*quantizers are disabled.validate modelopt recipeshook.Before your PR is "Ready for review"
CONTRIBUTING.md: N/AAdditional Information
Fixes NVBugs 6287461 (Kimi-K2.5
nvfp4_mlp_only-kv_fp8quant failure). Related Jira: OMNIML-5005.🤖 Generated with Claude Code
Summary by CodeRabbit