[NVBug 6061382] Fix PTQ for VLMs with image calibration #1318
LianaMikael merged 1 commit into main
Conversation
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
📝 Walkthrough

Two changes generalize image-text calibration support in the PTQ pipeline and refine tokenizer truncation handling in VLM dataset processing. The first broadens image-text calibration setup to any model when a flag is set, rather than restricting it to Nemotron VL models. The second prevents truncation parameters from being applied when image-based multimodal processing is active.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1318      +/-   ##
==========================================
+ Coverage   74.40%   75.67%   +1.27%
==========================================
  Files         464      464
  Lines       50036    50293     +257
==========================================
+ Hits        37227    38058     +831
+ Misses      12809    12235     -574
```

Flags with carried forward coverage won't be shown.
Actionable comments posted: 1
🧹 Nitpick comments (1)
modelopt/torch/utils/vlm_dataset_utils.py (1)
460-461: The truncation guard is currently a tautology against local state.

Because line 456 always inserts "images" into kwargs, the line 460 condition is always false, so max_length is never applied in this path. Consider gating on actual image presence instead of key membership.

Proposed refactor

```diff
diff --git a/modelopt/torch/utils/vlm_dataset_utils.py b/modelopt/torch/utils/vlm_dataset_utils.py
@@
-    kwargs: dict[str, Any] = {
-        "text": list(prompts),
-        "images": list(images),
-        "return_tensors": "pt",
-        "padding": True,
-    }
-    if max_length is not None and "images" not in kwargs:
+    has_images = any(img is not None for img in images)
+    kwargs: dict[str, Any] = {
+        "text": list(prompts),
+        "return_tensors": "pt",
+        "padding": True,
+    }
+    if has_images:
+        kwargs["images"] = list(images)
+    if max_length is not None and not has_images:
         kwargs.update({"truncation": True, "max_length": max_length})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/utils/vlm_dataset_utils.py` around lines 460-461: the current guard uses '"images" not in kwargs', which always fails because earlier code always inserts the "images" key; change the check to detect actual image data instead of key membership. In the block that sets truncation (referencing max_length, kwargs, and the "images" key), replace the condition with a runtime presence check such as "if max_length is not None and not kwargs.get('images'):" (or an equivalent that treats empty/None image values as absent) so max_length/truncation is applied only when there truly are no images, not just when the key is missing.
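The fix proposed above can be sketched in isolation. This is a minimal, hypothetical version of the kwargs construction discussed in the comment; the names `prompts`, `images`, and `max_length` mirror the review diff, not the exact function signature in `vlm_dataset_utils.py`:

```python
from typing import Any


def build_processor_kwargs(prompts, images, max_length=None):
    """Build processor call kwargs, gating truncation on actual image presence."""
    # Treat a list of None entries as "no images", matching the review's suggestion.
    has_images = any(img is not None for img in images)
    kwargs: dict[str, Any] = {
        "text": list(prompts),
        "return_tensors": "pt",
        "padding": True,
    }
    if has_images:
        kwargs["images"] = list(images)
    # Truncation/max_length only applies on the text-only path; with images
    # present these kwargs are skipped, which is the behavior the fix restores.
    if max_length is not None and not has_images:
        kwargs.update({"truncation": True, "max_length": max_length})
    return kwargs
```

With this shape, a text-only batch gets `truncation`/`max_length`, while a batch containing any real image never does, so the guard is no longer a tautology.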
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9eedd8b2-571e-4f28-9afe-b0dc1908bbca
📒 Files selected for processing (2)

- examples/llm_ptq/hf_ptq.py
- modelopt/torch/utils/vlm_dataset_utils.py
```python
elif args.calib_with_images:
    # For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
```
Broadened image-calibration entrypoint is not matched by downstream loop selection.
After this change, --calib_with_images can enter the multimodal path for any model, but Line 643 still only uses create_vlm_calibration_loop(...) for Nemotron. That can misroute multimodal batches for non-Nemotron VLMs and break calibration.
Proposed fix
```diff
diff --git a/examples/llm_ptq/hf_ptq.py b/examples/llm_ptq/hf_ptq.py
@@
-    elif args.calib_with_images:
+    elif args.calib_with_images:
+        if not is_multimodal_model(full_model):
+            raise ValueError("--calib_with_images requires a multimodal/VLM checkpoint.")
         # For VLM image calibration, we need an AutoProcessor to build multimodal inputs.
         processor = AutoProcessor.from_pretrained(
             args.pyt_ckpt_path,
             trust_remote_code=args.trust_remote_code,
             padding_side="left",
         )
@@
-    if args.calib_with_images and is_nemotron_vl_model:
+    if args.calib_with_images and is_multimodal_model(full_model):
         calibrate_loop = create_vlm_calibration_loop(full_model, calib_dataloader)
     else:
         calibrate_loop = create_forward_loop(
             dataloader=calib_dataloader,
             allowed_non_tensor_keys={"base_model_outputs"}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llm_ptq/hf_ptq.py` around lines 490-491: the change allows
args.calib_with_images to enter the multimodal branch for any model but the
downstream selection still always uses create_vlm_calibration_loop(...) only for
Nemotron, which can misroute batches; update the calibration loop selection so
that when args.calib_with_images is true you choose the correct loop based on
the actual model type (e.g., call create_vlm_calibration_loop(model, ...) only
for models that implement the Nemotron-style VLM interface and otherwise call
the existing create_calibration_loop(...) or a proper VLM-compatible loop),
using the same identifying symbols args.calib_with_images,
create_vlm_calibration_loop, create_calibration_loop and the model/type check to
route multimodal batches to the appropriate loop.
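The routing described in the comment above can be sketched as a small selector. The helpers `is_multimodal_model`, `create_vlm_calibration_loop`, and `create_forward_loop` are passed in as stand-ins for the symbols named in the review; their signatures here are illustrative, not the actual hf_ptq.py API:

```python
def select_calibrate_loop(model, dataloader, calib_with_images,
                          is_multimodal_model,
                          create_vlm_calibration_loop,
                          create_forward_loop):
    """Route calibration by actual model type, not by a Nemotron-only flag."""
    if calib_with_images and is_multimodal_model(model):
        # Multimodal batches (pixel values + text) need the VLM-aware loop.
        return create_vlm_calibration_loop(model, dataloader)
    # Text-only calibration falls back to the generic forward loop.
    return create_forward_loop(
        dataloader=dataloader,
        allowed_non_tensor_keys={"base_model_outputs"},
    )
```

The point of the indirection is that the branch condition depends on the model's capability check, so a non-Nemotron VLM with `--calib_with_images` still reaches the VLM loop instead of being misrouted to the text-only path.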
Could you update the PR with your test plan?
#1318 (#1350)

## Cherry-picked PRs

- #1256
- #1305
- #1322
- #1317
- #1321
- #1289
- #1311
- #1332
- #1104
- #1318

## Summary by CodeRabbit

### Release Notes

* **Documentation**
  * Updated installation guides with third-party software license disclaimers.
  * Added vLLM deployment instructions for model deployment.
  * Expanded NGC container image recommendations.
* **Deprecations**
  * Mllama/vision model image processor support deprecated; users directed to use `--calib_with_images` with supported models.
* **Bug Fixes**
  * Fixed ONNX quantization to exclude small MatMul/Gemm operations from INT8/FP8 quantization.
  * Improved FP8 export with enhanced cast-folding and attention fusion optimizations.
* **New Features**
  * Added LayerNorm quantization support for improved FP8 attention quantization.

Signed-off-by: samcheng <samcheng@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Co-authored-by: nv-samcheng <130026689+nv-samcheng@users.noreply.github.com>
Co-authored-by: kinjalpatel27 <31936134+kinjalpatel27@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Javier De Jesus <javier.dejesusj9@gmail.com>
Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
### What does this PR do?

This PR fixes PTQ with image calibration for VLMs.

### Usage

```shell
python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path Qwen/Qwen3-VL-8B-Instruct --qformat fp8 --export_path Qwen3-VL-8B-Instruct-fp8 --trust_remote_code --kv_cache_qformat none --calib_with_images --calib_size 512
```

## Summary by CodeRabbit

* **Bug Fixes**
  * Image-text calibration now extends support to additional model architectures when image calibration is enabled.
  * Improved tokenizer truncation handling in multimodal dataset processing to prevent configuration conflicts when image inputs are present.

Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>