
[TRTLLM-11265][feat] Implement dynamic resolution for Nemotron VL #11894

Merged
2ez4bz merged 3 commits into NVIDIA:main from 2ez4bz:dev-nano-vl-dyn-res
Mar 10, 2026

Conversation

@2ez4bz
Collaborator

@2ez4bz 2ez4bz commented Mar 4, 2026

Summary by CodeRabbit

  • New Features

    • Added dynamic per-image resolution handling for multimodal models, enabling adaptive tiling and variable image size processing.
    • Enhanced vision encoders to support flexible image resolutions with improved token budgeting and per-image patch optimization.
  • Tests

    • Added comprehensive unit tests for preprocessing logic with parametrized image sizes and resolution constraints.

[None][feat] Implement dynamic resolution

NOTE: this essentially ports over the changes from @netanel-haber's PR to vLLM.

This change implements "dynamic resolution" handling of images
for Nemotron VL models.

It also adds some logic to handle newer configuration class definitions
for Nemotron VL.
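
For intuition, dynamic-resolution preprocessing in this family of VL models typically selects a per-image tile grid under a tile (token) budget, preserving aspect ratio as closely as possible. A minimal sketch, with all names and defaults hypothetical rather than taken from this PR:

```python
# Illustrative only: choose a tile grid for an image under a budget.
# `tile_size` and `max_tiles` are assumed parameters, not the PR's API.

def choose_tile_grid(height, width, tile_size=512, max_tiles=12):
    """Pick the (rows, cols) grid that best matches the image's aspect
    ratio without exceeding the tile budget; ties go to more tiles."""
    aspect = width / height
    best, best_err = (1, 1), float("inf")
    for rows in range(1, max_tiles + 1):
        for cols in range(1, max_tiles + 1):
            if rows * cols > max_tiles:
                continue
            err = abs((cols / rows) - aspect)
            # Closer aspect ratio wins; on a tie, prefer more tiles (more detail).
            if err < best_err or (err == best_err and rows * cols > best[0] * best[1]):
                best, best_err = (rows, cols), err
    return best

rows, cols = choose_tile_grid(1080, 1920, max_tiles=12)  # → (2, 4)
# Each tile would then be resized to tile_size x tile_size before encoding,
# so the token cost scales with rows * cols.
```

The budget is what makes this "token budgeting": a wide panorama gets more horizontal tiles, a small icon gets a single tile, and the total never exceeds what the vision encoder's sequence budget allows.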

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in this PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@2ez4bz 2ez4bz changed the title [None][feat] Implement dynamic resolution [None][feat] Implement dynamic resolution for Nemotron VL Mar 4, 2026
@Wanli-Jiang Wanli-Jiang self-requested a review March 4, 2026 08:58
@2ez4bz 2ez4bz force-pushed the dev-nano-vl-dyn-res branch from cbac145 to ff21ea8 on March 5, 2026 06:52
@2ez4bz 2ez4bz marked this pull request as ready for review March 5, 2026 06:52
@2ez4bz 2ez4bz requested review from a team as code owners March 5, 2026 06:52
@2ez4bz 2ez4bz requested a review from jaedeok-nvidia March 5, 2026 06:52
@2ez4bz
Collaborator Author

2ez4bz commented Mar 5, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #37824 [ run ] triggered by Bot. Commit: ff21ea8 Link to invocation

@coderabbitai
Contributor

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Introduces dynamic, per-image adaptive tiling and resolution handling for multimodal models. The Nemotron Nano model adds dynamic resolution tiling with token budgeting for variable-sized images, while the RADIO model adds dynamic sequence handling for per-image position embeddings. Both maintain backward compatibility with fixed-resolution paths. Comprehensive preprocessing tests validate the new functionality.

Changes

  • tensorrt_llm/_torch/models/modeling_nemotron_nano.py (Nemotron Nano Dynamic Resolution): Added DynamicResolutionParams and DynamicResolutionImageTiler classes for adaptive tiling with token budgeting. Introduced _process_images_dynamic and dynamic feature extraction methods in NanoV2VLVisionEncoder. Extended image processing to support per-image sizing and token budgeting. Enhanced RMSNorm eps handling for config compatibility. Falls back to the existing fixed-tile path when dynamic tiling is disabled.
  • tensorrt_llm/_torch/models/modeling_radio.py (RADIO Dynamic Sequences): Added calc_seq_len and calc_seq_lens utility functions for dynamic sequence length computation. Extended ViTPatchGenerator, VisionTransformer, and RADIOVisionModelBase with dynamic processing branches supporting variable image resolutions via the imgs_sizes parameter. Introduced per-image position encoding and CLS token handling. Fixed the max sequence length for attention metadata stability. Updated forward_features and _extract_final paths with dynamic sequence propagation.
  • tests/integration/test_lists/test-db/l0_a10.yml (Test Infrastructure): Added a new test entry for Nemotron Nano v2 VL preprocessing validation.
  • tests/unittest/_torch/modeling/test_nemotron_nano_v2_vl_preprocessing.py (Preprocessing Tests): New test module validating DynamicResolutionImageTiler parameter bounds, token budgeting constraints, patch processing, and convergence. Tests both dynamic and fixed-tile vision encoder forward paths with mocks and parametrized scenarios.
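
The calc_seq_len / calc_seq_lens utilities mentioned above presumably derive per-image ViT token counts from image dimensions and the patch size; a minimal sketch under that assumption (signatures and the CLS-token parameter are guesses, not the file's actual code):

```python
# Hedged sketch: the per-image token count for a ViT is the patch-grid
# area, plus any prepended summary/CLS tokens.

def calc_seq_len(height, width, patch_size, num_cls_tokens=0):
    assert height % patch_size == 0 and width % patch_size == 0
    return (height // patch_size) * (width // patch_size) + num_cls_tokens

def calc_seq_lens(imgs_sizes, patch_size, num_cls_tokens=0):
    # One sequence length per (height, width) pair.
    return [calc_seq_len(h, w, patch_size, num_cls_tokens) for h, w in imgs_sizes]

calc_seq_lens([(224, 224), (224, 448)], patch_size=16)
# 224/16 = 14, so the grids are 14x14 = 196 and 14x28 = 392 tokens.
```

Per-image sequence lengths like these are what let variable-resolution images share one flattened batch while attention metadata still knows where each image's tokens begin and end.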

Sequence Diagram(s)

sequenceDiagram
    participant Client as Image Input
    participant Processor as NanoV2VLInputProcessor
    participant Tiler as DynamicResolutionImageTiler
    participant Encoder as NanoV2VLVisionEncoder
    participant Model as NemotronH_Nano_VL_V2

    Client->>Processor: Provide images + token budget
    Processor->>Tiler: Request tiling parameters
    Tiler->>Tiler: Compute per-image patches<br/>(budget constrained)
    Tiler-->>Processor: Return resizing + patch counts
    Processor->>Processor: Resize images per tiling
    Processor->>Processor: Normalize + stack patches
    Processor->>Encoder: Forward patches + imgs_sizes
    Encoder->>Encoder: extract_feature_dynamic<br/>(per-image processing)
    Encoder-->>Model: Embeddings + per-image tokens
    Model->>Model: Integrate into multimodal flow
sequenceDiagram
    participant Client as Variable-Size Images
    participant Generator as ViTPatchGenerator
    participant Transformer as VisionTransformer
    participant RADIOVision as RADIOVisionModel
    participant Output as Feature Output

    Client->>Generator: Forward x, imgs_sizes
    Generator->>Generator: extract_patches_dynamic
    Generator->>Generator: apply_pos_enc_dynamic<br/>(per-image embeddings)
    Generator->>Generator: cls_token_dynamic<br/>(per-image CLS tokens)
    Generator-->>Transformer: patches with per-image<br/>sequence structure
    Transformer->>Transformer: forward_features<br/>(with dynamic branches)
    Transformer->>Transformer: prepare_attn_metadata<br/>(fixed max_seq_len)
    Transformer-->>RADIOVision: Processed features
    RADIOVision->>RADIOVision: _extract_final<br/>(dynamic reshape)
    RADIOVision-->>Output: Per-image aligned features

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 38.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Description check: ❓ Inconclusive. The PR description explains the feature (dynamic resolution handling for Nemotron VL) and acknowledges the upstream source, but the Test Coverage section is incomplete and lacks specific test details. Resolution: complete the Test Coverage section by explicitly listing the test files and their coverage (e.g., 'test_nemotron_nano_v2_vl_preprocessing.py validates dynamic tiling parameters and vision encoder paths').
✅ Passed checks (1 passed)
  • Title check: ✅ Passed. The title clearly and specifically describes the main feature, implementing dynamic resolution for Nemotron VL models, which aligns with the primary changes across all modified files.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/unittest/_torch/modeling/test_nemotron_nano_v2_vl_preprocessing.py (1)

10-14: Switch to module-level import for modeling_nemotron_nano.

Please import the module and reference symbols via namespace instead of importing classes directly.

♻️ Suggested change
-from tensorrt_llm._torch.models.modeling_nemotron_nano import (
-    DynamicResolutionImageTiler,
-    DynamicResolutionParams,
-    NanoV2VLVisionEncoder,
-)
+from tensorrt_llm._torch.models import modeling_nemotron_nano
...
-    return DynamicResolutionImageTiler(**defaults)
+    return modeling_nemotron_nano.DynamicResolutionImageTiler(**defaults)
...
-                DynamicResolutionParams(
+                modeling_nemotron_nano.DynamicResolutionParams(
...
-    encoder = mock.MagicMock(spec=NanoV2VLVisionEncoder)
+    encoder = mock.MagicMock(spec=modeling_nemotron_nano.NanoV2VLVisionEncoder)
...
-    NanoV2VLVisionEncoder.forward(vision_encoder, [mm_param])
+    modeling_nemotron_nano.NanoV2VLVisionEncoder.forward(vision_encoder, [mm_param])
As per coding guidelines, `Python imports must use form from package.subpackage import module (never from module import Class)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/modeling/test_nemotron_nano_v2_vl_preprocessing.py`
around lines 10 - 14, The test imports classes directly from
modeling_nemotron_nano; change to a module-level import (import
tensorrt_llm._torch.models.modeling_nemotron_nano as modeling_nemotron_nano) and
update all references to DynamicResolutionImageTiler, DynamicResolutionParams,
and NanoV2VLVisionEncoder in this file to use the module namespace
(modeling_nemotron_nano.DynamicResolutionImageTiler,
modeling_nemotron_nano.DynamicResolutionParams,
modeling_nemotron_nano.NanoV2VLVisionEncoder) to comply with the package import
guideline.
tensorrt_llm/_torch/models/modeling_nemotron_nano.py (1)

40-40: Use module namespace import for modeling_radio.

Line 40 imports symbols directly; switch to module import and access members via module namespace.

♻️ Suggested change
-from .modeling_radio import RADIOVisionModel, calc_seq_lens
+from . import modeling_radio
...
-        self.vision_model = RADIOVisionModel(vision_model_config, disable_quantization=True)
+        self.vision_model = modeling_radio.RADIOVisionModel(
+            vision_model_config, disable_quantization=True
+        )
...
-        seq_lens = calc_seq_lens(imgs_sizes, patch_dim)
+        seq_lens = modeling_radio.calc_seq_lens(imgs_sizes, patch_dim)
As per coding guidelines, `When importing in Python, always maintain the namespace. Import the module, not individual classes or functions`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py` at line 40, Replace the
direct symbol import from .modeling_radio with a module-level import (e.g., use
"from . import modeling_radio" or "import
tensorrt_llm._torch.models.modeling_radio as modeling_radio") and then update
all references in this file that currently use RADIOVisionModel and
calc_seq_lens to use the module namespace (modeling_radio.RADIOVisionModel and
modeling_radio.calc_seq_lens); ensure any type hints, instantiations, or calls
are updated accordingly so the file no longer relies on direct symbol imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py`:
- Around line 470-483: The current dynamic-mode gating uses has_dynamic =
any(...) which can enable the dynamic image path for mixed image/video batches
and then access imgs_sizes on modalities that are video (causing failures) and
skip EVS handling; change the gating to compute per-modality flags (e.g.,
image_needs_dynamic = ["imgs_sizes" in multimodal_data.get(modality_type, {})
for modality_type, multimodal_data in zip(modality_types, multimodal_data_lst)])
and only take the dynamic branch for modalities that are strictly images (or
where all requests for that modality support imgs_sizes), calling
extract_feature_dynamic only for those modalities and falling back to the
existing static/EVS paths for video or modalities without imgs_sizes; update
mm_embedding assembly so you don't return early for mixed batches and preserve
EVS handling by invoking the EVS-specific code path where appropriate.
- Around line 121-122: The current computation of
closest_patch_height/closest_patch_width uses round(orig / self._patch_size +
0.5) which increments exact multiples of patch_size; replace the expression with
a proper half-up integer division so exact multiples stay unchanged (e.g.,
compute using floor((orig + self._patch_size/2) / self._patch_size) or int((orig
+ self._patch_size/2) / self._patch_size)). Update both closest_patch_height and
closest_patch_width (references: closest_patch_height, closest_patch_width,
orig_height, orig_width, self._patch_size) and add/import math.floor if you
choose the math.floor variant. Ensure behavior preserves exact multiples and
avoids over-allocating patches.
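
The rounding pitfall flagged in that comment is easy to demonstrate: Python's built-in round() uses banker's rounding (half to even), so round(orig / patch_size + 0.5) behaves inconsistently on exact multiples, while half-up integer division leaves them unchanged. A small standalone illustration (not the file's actual code):

```python
patch_size = 16

def patches_round(orig):
    # Problematic form: round() rounds halves to the nearest even value,
    # so an exact multiple like 48 gives round(3.5) == 4, over-allocating,
    # while 32 gives round(2.5) == 2 — inconsistent behavior.
    return round(orig / patch_size + 0.5)

def patches_half_up(orig):
    # Half-up integer division: exact multiples stay unchanged,
    # and anything past the halfway point rounds up.
    return (orig + patch_size // 2) // patch_size

print(patches_round(48), patches_half_up(48))  # 4 vs 3: 48 is exactly 3 patches
print(patches_round(32), patches_half_up(32))  # 2 vs 2: agrees here, by luck
```

This is why the suggested fix computes floor((orig + patch_size/2) / patch_size) instead of adding 0.5 before round().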

In `@tensorrt_llm/_torch/models/modeling_radio.py`:
- Around line 855-867: The dynamic-resolution branch that slices flattened patch
tokens (uses imgs_sizes, calc_seq_lens, patch_gen.num_skip/patch_size and builds
all_patches/all_feat) must be guarded against inputs in feature_fmt == 'NCHW'
(or whenever x/y are full NCHW tensors rather than flattened patches); add a
check (e.g., if imgs_sizes is not None and feature_fmt != 'NCHW') and only run
the calc_seq_lens/patch slicing logic when the tensor is flattened patches,
otherwise follow the existing NCHW path or reshape using explicit H/W from
imgs_sizes; ensure the code references patch_gen.num_skip, patch_gen.patch_size
and calc_seq_lens to compute num_patches and build all_feat accordingly.
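
The guard that comment asks for can be sketched in a few lines; the function name, the imgs_sizes layout, and the feature_fmt values below are illustrative assumptions, not the actual code in modeling_radio.py:

```python
# Sketch: only run flattened-patch slicing when features are token
# sequences; full NCHW feature maps already carry explicit H/W.

def split_dynamic_features(flat_tokens, imgs_sizes, patch_size, feature_fmt):
    if imgs_sizes is None or feature_fmt == "NCHW":
        # Fixed-resolution or NCHW path: no per-image slicing needed.
        return flat_tokens
    out, start = [], 0
    for h, w in imgs_sizes:
        n = (h // patch_size) * (w // patch_size)
        out.append(flat_tokens[start:start + n])
        start += n
    return out
```

The point of the guard is that slicing by computed sequence lengths is only meaningful when the input is a flat token sequence; applying it to an NCHW tensor would misinterpret channel/spatial dimensions as token counts.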

---

Nitpick comments:
In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py`:
- Line 40: Replace the direct symbol import from .modeling_radio with a
module-level import (e.g., use "from . import modeling_radio" or "import
tensorrt_llm._torch.models.modeling_radio as modeling_radio") and then update
all references in this file that currently use RADIOVisionModel and
calc_seq_lens to use the module namespace (modeling_radio.RADIOVisionModel and
modeling_radio.calc_seq_lens); ensure any type hints, instantiations, or calls
are updated accordingly so the file no longer relies on direct symbol imports.

In `@tests/unittest/_torch/modeling/test_nemotron_nano_v2_vl_preprocessing.py`:
- Around line 10-14: The test imports classes directly from
modeling_nemotron_nano; change to a module-level import (import
tensorrt_llm._torch.models.modeling_nemotron_nano as modeling_nemotron_nano) and
update all references to DynamicResolutionImageTiler, DynamicResolutionParams,
and NanoV2VLVisionEncoder in this file to use the module namespace
(modeling_nemotron_nano.DynamicResolutionImageTiler,
modeling_nemotron_nano.DynamicResolutionParams,
modeling_nemotron_nano.NanoV2VLVisionEncoder) to comply with the package import
guideline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9ea41539-110d-4e56-8e5d-8abb815daa2e

📥 Commits

Reviewing files that changed from the base of the PR and between 12f2f39 and ff21ea8.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/models/modeling_nemotron_nano.py
  • tensorrt_llm/_torch/models/modeling_radio.py
  • tests/integration/test_lists/test-db/l0_a10.yml
  • tests/unittest/_torch/modeling/test_nemotron_nano_v2_vl_preprocessing.py

@tensorrt-cicd
Collaborator

PR_Github #37824 [ run ] completed with state SUCCESS. Commit: ff21ea8
/LLM/main/L0_MergeRequest_PR pipeline #29285 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Mar 5, 2026

/bot run --disable-fail-fast

@2ez4bz 2ez4bz changed the title [None][feat] Implement dynamic resolution for Nemotron VL [TRTLLM-11264][feat] Implement dynamic resolution for Nemotron VL Mar 5, 2026
@2ez4bz 2ez4bz changed the title [TRTLLM-11264][feat] Implement dynamic resolution for Nemotron VL [TRTLLM-11265][feat] Implement dynamic resolution for Nemotron VL Mar 5, 2026
@tensorrt-cicd
Collaborator

PR_Github #37897 [ run ] triggered by Bot. Commit: ff21ea8 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #37897 [ run ] completed with state SUCCESS. Commit: ff21ea8
/LLM/main/L0_MergeRequest_PR pipeline #29343 completed with status: 'SUCCESS'

Link to invocation

@2ez4bz 2ez4bz force-pushed the dev-nano-vl-dyn-res branch from ff21ea8 to ccfa733 on March 7, 2026 06:02
@2ez4bz 2ez4bz force-pushed the dev-nano-vl-dyn-res branch from ccfa733 to 080d917 on March 7, 2026 06:14
@2ez4bz
Collaborator Author

2ez4bz commented Mar 7, 2026

/bot run --disable-fail-fast

Collaborator

@Wanli-Jiang Wanli-Jiang left a comment


LGTM now.

2ez4bz added 2 commits March 9, 2026 09:16
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
This commit implements "dynamic resolution" handling of images
for Nemotron VL models.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-nano-vl-dyn-res branch from 080d917 to b7844fb on March 9, 2026 16:20
@2ez4bz
Collaborator Author

2ez4bz commented Mar 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #38294 [ run ] triggered by Bot. Commit: b7844fb Link to invocation

@2ez4bz 2ez4bz enabled auto-merge (squash) March 9, 2026 20:40
@tensorrt-cicd
Collaborator

PR_Github #38294 [ run ] completed with state SUCCESS. Commit: b7844fb
/LLM/main/L0_MergeRequest_PR pipeline #29673 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Mar 10, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #38375 [ run ] triggered by Bot. Commit: 283de88 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38375 [ run ] completed with state SUCCESS. Commit: 283de88
/LLM/main/L0_MergeRequest_PR pipeline #29743 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Mar 10, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #38463 [ run ] triggered by Bot. Commit: 283de88 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38463 [ run ] completed with state SUCCESS. Commit: 283de88
/LLM/main/L0_MergeRequest_PR pipeline #29820 completed with status: 'SUCCESS'

Link to invocation

@2ez4bz 2ez4bz merged commit 3ce0ec8 into NVIDIA:main Mar 10, 2026
5 checks passed
bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Mar 12, 2026
…IDIA#11894)

This commit implements "dynamic resolution" handling of images
for Nemotron VL models.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

4 participants