[NVBug 6061382] Fix PTQ for VLMs with image calibration #1318
LianaMikael merged 1 commit into main
Conversation
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
📝 Walkthrough

Two changes generalize image-text calibration support in the PTQ pipeline and refine tokenizer truncation handling in VLM dataset processing. The first broadens image-text calibration setup to any model when a flag is set, rather than restricting it to Nemotron VL models. The second prevents truncation parameters from being applied when image-based multimodal processing is active.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1318      +/-   ##
==========================================
+ Coverage   74.40%   75.67%   +1.27%
==========================================
  Files         464      464
  Lines       50036    50293     +257
==========================================
+ Hits        37227    38058     +831
+ Misses      12809    12235     -574
```

Flags with carried forward coverage won't be shown.
Actionable comments posted: 1
🧹 Nitpick comments (1)
modelopt/torch/utils/vlm_dataset_utils.py (1)
460-461: The truncation guard is currently a tautology against local state.

Because line 456 always inserts "images" into kwargs, the line 460 condition is always false, so max_length is never applied in this path. Consider gating on actual image presence instead of key membership.

Proposed refactor

```diff
diff --git a/modelopt/torch/utils/vlm_dataset_utils.py b/modelopt/torch/utils/vlm_dataset_utils.py
@@
-    kwargs: dict[str, Any] = {
-        "text": list(prompts),
-        "images": list(images),
-        "return_tensors": "pt",
-        "padding": True,
-    }
-    if max_length is not None and "images" not in kwargs:
+    has_images = any(img is not None for img in images)
+    kwargs: dict[str, Any] = {
+        "text": list(prompts),
+        "return_tensors": "pt",
+        "padding": True,
+    }
+    if has_images:
+        kwargs["images"] = list(images)
+    if max_length is not None and not has_images:
         kwargs.update({"truncation": True, "max_length": max_length})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/utils/vlm_dataset_utils.py` around lines 460-461: the current guard uses '"images" not in kwargs', which always fails because earlier code always inserts the "images" key; change the check to detect actual image data instead of key membership. In the block that sets truncation (referencing max_length, kwargs, and the "images" key), replace the condition with a runtime presence check such as "if max_length is not None and not kwargs.get('images'):" (or an equivalent that treats empty/None image values as absent) so max_length/truncation is applied only when there truly are no images, not just when the key is missing.
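The fix proposed above can be sketched in isolation. This is a minimal, hypothetical version of the kwargs construction discussed in the comment; the names `prompts`, `images`, and `max_length` mirror the review diff, not the exact function signature in `vlm_dataset_utils.py`:

```python
from typing import Any


def build_processor_kwargs(prompts, images, max_length=None):
    """Build processor call kwargs, gating truncation on actual image presence."""
    # Treat a list of None entries as "no images", matching the review's suggestion.
    has_images = any(img is not None for img in images)
    kwargs: dict[str, Any] = {
        "text": list(prompts),
        "return_tensors": "pt",
        "padding": True,
    }
    if has_images:
        kwargs["images"] = list(images)
    # Truncation/max_length only applies on the text-only path; with images
    # present these kwargs are skipped, which is the behavior the fix restores.
    if max_length is not None and not has_images:
        kwargs.update({"truncation": True, "max_length": max_length})
    return kwargs
```

With this shape, a text-only batch gets `truncation`/`max_length`, while a batch containing any real image never does, so the guard is no longer a tautology.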
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9eedd8b2-571e-4f28-9afe-b0dc1908bbca
📒 Files selected for processing (2)

- examples/llm_ptq/hf_ptq.py
- modelopt/torch/utils/vlm_dataset_utils.py
```python
elif args.calib_with_images:
    # For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
```
Broadened image-calibration entrypoint is not matched by downstream loop selection.
After this change, --calib_with_images can enter the multimodal path for any model, but Line 643 still only uses create_vlm_calibration_loop(...) for Nemotron. That can misroute multimodal batches for non-Nemotron VLMs and break calibration.
Proposed fix
```diff
diff --git a/examples/llm_ptq/hf_ptq.py b/examples/llm_ptq/hf_ptq.py
@@
-    elif args.calib_with_images:
+    elif args.calib_with_images:
+        if not is_multimodal_model(full_model):
+            raise ValueError("--calib_with_images requires a multimodal/VLM checkpoint.")
         # For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
         processor = AutoProcessor.from_pretrained(
             args.pyt_ckpt_path,
             trust_remote_code=args.trust_remote_code,
             padding_side="left",
         )
@@
-    if args.calib_with_images and is_nemotron_vl_model:
+    if args.calib_with_images and is_multimodal_model(full_model):
         calibrate_loop = create_vlm_calibration_loop(full_model, calib_dataloader)
     else:
         calibrate_loop = create_forward_loop(
             dataloader=calib_dataloader,
             allowed_non_tensor_keys={"base_model_outputs"}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llm_ptq/hf_ptq.py` around lines 490-491: the change allows
args.calib_with_images to enter the multimodal branch for any model but the
downstream selection still always uses create_vlm_calibration_loop(...) only for
Nemotron, which can misroute batches; update the calibration loop selection so
that when args.calib_with_images is true you choose the correct loop based on
the actual model type (e.g., call create_vlm_calibration_loop(model, ...) only
for models that implement the Nemotron-style VLM interface and otherwise call
the existing create_calibration_loop(...) or a proper VLM-compatible loop),
using the same identifying symbols args.calib_with_images,
create_vlm_calibration_loop, create_calibration_loop and the model/type check to
route multimodal batches to the appropriate loop.
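The routing described in the comment above can be sketched as a small selector. The helpers `is_multimodal_model`, `create_vlm_calibration_loop`, and `create_forward_loop` are passed in as stand-ins for the symbols named in the review; their signatures here are illustrative, not the actual hf_ptq.py API:

```python
def select_calibrate_loop(model, dataloader, calib_with_images,
                          is_multimodal_model,
                          create_vlm_calibration_loop,
                          create_forward_loop):
    """Route calibration by actual model type, not by a Nemotron-only flag."""
    if calib_with_images and is_multimodal_model(model):
        # Multimodal batches (pixel values + text) need the VLM-aware loop.
        return create_vlm_calibration_loop(model, dataloader)
    # Text-only calibration falls back to the generic forward loop.
    return create_forward_loop(
        dataloader=dataloader,
        allowed_non_tensor_keys={"base_model_outputs"},
    )
```

The point of the indirection is that the branch condition depends on the model's capability check, so a non-Nemotron VLM with `--calib_with_images` still reaches the VLM loop instead of being misrouted to the text-only path.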
Could you update the PR with your test plan?
#1318 (#1350)

## Cherry-picked PRs

- #1256
- #1305
- #1322
- #1317
- #1321
- #1289
- #1311
- #1332
- #1104
- #1318

## Summary by CodeRabbit

### Release Notes

* **Documentation**
  * Updated installation guides with third-party software license disclaimers.
  * Added vLLM deployment instructions for model deployment.
  * Expanded NGC container image recommendations.
* **Deprecations**
  * Mllama/vision model image processor support deprecated; users directed to use `--calib_with_images` with supported models.
* **Bug Fixes**
  * Fixed ONNX quantization to exclude small MatMul/Gemm operations from INT8/FP8 quantization.
  * Improved FP8 export with enhanced cast-folding and attention fusion optimizations.
* **New Features**
  * Added LayerNorm quantization support for improved FP8 attention quantization.

Signed-off-by: samcheng <samcheng@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Co-authored-by: nv-samcheng <130026689+nv-samcheng@users.noreply.github.com>
Co-authored-by: kinjalpatel27 <31936134+kinjalpatel27@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Javier De Jesus <javier.dejesusj9@gmail.com>
Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
### What does this PR do?

This PR fixes PTQ with image calibration for VLMs.

### Usage

```shell
python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-VL-8B-Instruct --qformat fp8 --export_path Qwen3-VL-8B-Instruct-fp8 --trust_remote_code --kv_cache_qformat none --calib_with_images --calib_size 512
```

## Summary by CodeRabbit

* **Bug Fixes**
  * Image-text calibration now extends support to additional model architectures when image calibration is enabled.
  * Improved tokenizer truncation handling in multimodal dataset processing to prevent configuration conflicts when image inputs are present.

Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>