
[TRTLLM-11268][feat] Video temporal compression to Nemotron Nano and RADIO#12649

Merged
2ez4bz merged 2 commits into NVIDIA:main from
2ez4bz:dev-nano-v3-video
Apr 10, 2026

Conversation

@2ez4bz
Collaborator

@2ez4bz 2ez4bz commented Apr 1, 2026

Summary by CodeRabbit

  • New Features

    • Added video preprocessing with aspect-ratio preservation for improved frame handling.
    • Introduced temporal compression support for efficient video frame grouping and processing.
    • Extended vision models to support video inputs with configurable temporal and spatial parameters.
  • Tests

    • Added comprehensive test coverage for video preprocessing utilities and temporal compression workflows.

Description

Implement tubelet-based temporal compression for video inputs, matching the Megatron-LM / vLLM video processing pipeline. T consecutive frames are grouped into tubelets before embedding, reducing the token count by a factor of video_temporal_patch_size.

Key additions:

  • Aspect-ratio-preserving video frame resize and normalization
  • Separate video embedder in RADIO ViT for tubelet projection
  • Tubelet-aware token counting, frame separators, and EVS paths
  • Fix align_corners=True -> False in position embedding interpolation
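
A minimal sketch of the tubelet bookkeeping described above (illustrative only — the helper names here are hypothetical, not the PR's actual functions; only `video_temporal_patch_size` / T comes from the description):

```python
from math import ceil


def group_frames_into_tubelets(num_frames: int, temporal_patch_size: int) -> tuple[int, int]:
    """Pad the frame count up to a multiple of T, then group T consecutive
    frames into one tubelet. Returns (padded_frames, num_tubelets)."""
    t = temporal_patch_size
    padded = ceil(num_frames / t) * t
    return padded, padded // t


def video_token_count(num_frames: int, tokens_per_frame: int, temporal_patch_size: int) -> int:
    """Each tubelet contributes the same number of spatial tokens as a single
    frame, so the total token count drops by roughly a factor of T."""
    _, num_tubelets = group_frames_into_tubelets(num_frames, temporal_patch_size)
    return num_tubelets * tokens_per_frame


# 16 frames, 256 spatial tokens each, T=4 -> 4 tubelets * 256 = 1024 tokens
print(video_token_count(16, 256, 4))  # 1024
# 17 frames are padded to 20 before grouping into 5 tubelets
print(group_frames_into_tubelets(17, 4))  # (20, 5)
```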

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@2ez4bz 2ez4bz force-pushed the dev-nano-v3-video branch from 52a836f to 9640b75 Compare April 4, 2026 06:14
@2ez4bz 2ez4bz marked this pull request as ready for review April 4, 2026 06:16
@2ez4bz 2ez4bz requested review from a team as code owners April 4, 2026 06:16
@2ez4bz
Collaborator Author

2ez4bz commented Apr 4, 2026

/bot run

@coderabbitai
Contributor

coderabbitai bot commented Apr 4, 2026

📝 Walkthrough


This PR adds video temporal compression support to Nemotron Nano and RADIO vision models, including aspect-ratio-preserving video preprocessing, tubelet-based temporal grouping via temporal_patch_size configuration, updated vision encoders and input processors to handle temporal frames, and comprehensive test coverage for the new video preprocessing pipeline.

Changes

Cohort / File(s) Summary
Video Preprocessing Utilities
tensorrt_llm/_torch/models/modeling_nemotron_nano.py
Added functions _compute_aspect_preserving_size, get_video_target_size_and_feature_size, and video_to_pixel_values to compute aspect-ratio-aware target dimensions, perform bicubic resizing with clamping/rescaling, and apply optional mean/std normalization.
Vision Encoder & Temporal Compression
tensorrt_llm/_torch/models/modeling_nemotron_nano.py
Modified NanoV2VLVisionEncoder.extract_feature to accept optional num_frames and forward it to vision model when temporal compression is enabled; added _extract_video_embeddings_temporal method to process videos with temporal compression and flatten embeddings.
Input Processor Configuration & Video Frame Handling
tensorrt_llm/_torch/models/modeling_nemotron_nano.py
Extended NanoV2VLInputProcessor with config fields for video_temporal_patch_size, video_maintain_aspect_ratio, video_target_num_patches, and video frame normalization tensors; updated get_num_tokens_per_video and _compute_token_numbers_per_video to group frames into tubelets and compute tokens using aspect-aware sizing; modified _process_videos_frames to use video_to_pixel_values and handle variable-sized frame batches as lists.
Prompt Construction for Video
tensorrt_llm/_torch/models/modeling_nemotron_nano.py
Added _build_tubelet_separators method for temporal-compression-aware prompt formatting; updated _get_frame_separators and _process_video_prompts to conditionally generate tubelet-based separators and prepend video prefix based on configuration; modified __call__ to handle pixel_values as either tensor or list.
EVS Application for Video
tensorrt_llm/_torch/models/modeling_nemotron_nano.py
Updated apply_evs_per_video to handle both 2D and 3D mm_embed layouts by computing spatial token counts from pixel/tile dimensions and slicing/reshaping accordingly.
RADIO Model Temporal Patch Generation
tensorrt_llm/_torch/models/modeling_radio.py
Added temporal_patch_size and separate_video_embedder configuration to ViTPatchGenerator; introduced conditional video_embedder (ViTPatchLinear) when temporal compression is enabled; added forward_video method to pad frame patches, group into tubelets, and apply embedder. Modified ViTPatchLinear to accept temporal patch size and adjust input projection dimension accordingly.
RADIO Model Integration
tensorrt_llm/_torch/models/modeling_radio.py
Extended VisionTransformer, RADIOVisionModelBase, and RADIOVisionModel to accept optional num_frames parameter; when provided and temporal compression is enabled, calls patch_generator.forward_video, packs/reshapes tubelets for attention, and computes sequence lengths accordingly. Updated positional interpolation in window_select from align_corners=True to False. Updated weight-loading logic to mark _video_embedder_loaded flag.
Test Coverage for Video Preprocessing
tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py
Added comprehensive unit tests: TestComputeAspectPreservingSize (validates patch-size divisibility and aspect-ratio preservation), TestGetVideoTargetSizeAndFeatureSize (checks feature-size computation), TestVideoToPixelValues (verifies shape, normalization, resizing), TestBuildTubeletSeparators (asserts separator formatting and numbering), and TestGetNumTokensPerVideoTemporal (validates token count reduction and tubelet grouping).
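
Based on the preprocessing summary above, an aspect-ratio-preserving target size snapped to patch-size multiples might look like the following sketch (a hedged illustration — the real `_compute_aspect_preserving_size` in `modeling_nemotron_nano.py` may differ; the `target_num_patches` budget semantics are an assumption):

```python
from math import sqrt


def compute_aspect_preserving_size(width: int, height: int,
                                   patch_size: int,
                                   target_num_patches: int) -> tuple[int, int]:
    """Scale (width, height) so the patch grid holds roughly
    target_num_patches patches, preserving aspect ratio, and snap each
    side to a multiple of patch_size."""
    target_pixels = target_num_patches * patch_size * patch_size
    scale = sqrt(target_pixels / (width * height))
    new_w = max(patch_size, round(width * scale / patch_size) * patch_size)
    new_h = max(patch_size, round(height * scale / patch_size) * patch_size)
    return new_w, new_h


# A 1280x720 frame with 16px patches and a 256-patch budget
print(compute_aspect_preserving_size(1280, 720, 16, 256))  # (336, 192)
```

Both sides stay divisible by the patch size, and 336/192 ≈ 1.75 stays close to the source aspect ratio of 16:9 while keeping 21 × 12 = 252 patches within the 256-patch budget.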

Sequence Diagram

sequenceDiagram
    participant Input as Video Input
    participant Preproc as Preprocessing<br/>(video_to_pixel_values)
    participant Encoder as Vision Encoder<br/>(extract_feature)
    participant Temporal as Temporal Processor<br/>(forward_video)
    participant Embed as Embedder<br/>(video_embedder)
    participant Attn as Attention<br/>(transformer blocks)
    participant Output as Feature Output

    Input->>Preproc: pixel_values, target_size
    Preproc->>Preproc: aspect-preserving resize<br/>normalize
    Preproc->>Encoder: processed_pixels
    
    Encoder->>Temporal: num_frames provided
    Temporal->>Temporal: pad frames to<br/>multiple of T
    Temporal->>Temporal: group into<br/>tubelets (T frames)
    Temporal->>Embed: tubelet patches
    Embed->>Embed: temporal projection<br/>3*T*patch_size²
    Embed->>Temporal: embedded tubelets
    Temporal->>Temporal: add positional encoding<br/>add CLS token
    
    Temporal->>Attn: repacked tubelets<br/>for attention
    Attn->>Attn: process through<br/>transformer blocks
    Attn->>Temporal: attended features
    Temporal->>Temporal: reshape back<br/>to frames
    Temporal->>Output: temporal embeddings
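
As the diagram's "temporal projection 3*T*patch_size²" step notes, grouping T frames into a tubelet multiplies the embedder's input width by T. A sketch of that dimension bookkeeping (illustrative names only — not the actual ViTPatchLinear signature):

```python
def tubelet_projection_in_features(patch_size: int,
                                   temporal_patch_size: int,
                                   channels: int = 3) -> int:
    """A tubelet patch stacks T frames of a (patch_size x patch_size) RGB
    patch, so the linear embedder's input width is channels * T * patch_size**2
    instead of the single-frame channels * patch_size**2."""
    return channels * temporal_patch_size * patch_size ** 2


# Single frame (T=1) vs. a 4-frame tubelet with 16px patches
print(tubelet_projection_in_features(16, 1))  # 768
print(tubelet_projection_in_features(16, 4))  # 3072
```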

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–75 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main change: implementing video temporal compression for Nemotron Nano and RADIO models, with proper JIRA ticket reference and feature type.
Description check ✅ Passed The description covers the key implementation details and objectives, though the Test Coverage section is incomplete and some checklist items are unchecked.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py (1)

714-716: Unused unpacked variables w and h.

These variables are unpacked but never used. Consider prefixing with underscore to indicate intentional discard.

♻️ Proposed fix
     def test_predicted_vs_actual_token_count(self):
-        w, h = self.FRAME_SIZE
+        _w, _h = self.FRAME_SIZE
         proc = _make_processor(max_num_patches=256, min_num_patches=4)

Or simply remove the unpacking since FRAME_SIZE is accessed directly elsewhere:

     def test_predicted_vs_actual_token_count(self):
         proc = _make_processor(max_num_patches=256, min_num_patches=4)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py` around
lines 714 - 716, In test_predicted_vs_actual_token_count, the unpacking "w, h =
self.FRAME_SIZE" creates unused variables w and h; either remove that unpacking
entirely or rename them to indicate intentional discard (e.g., "_w, _h =
self.FRAME_SIZE") so linters won’t flag unused variables; update the test around
_make_processor and FRAME_SIZE usage accordingly (no behavior change).
tensorrt_llm/_torch/models/modeling_nemotron_nano.py (2)

191-192: Consider handling partial normalization parameters.

If only one of norm_mean or norm_std is provided (but not both), the normalization is silently skipped. This could lead to unexpected behavior. Consider raising an error for this edge case.

🛡️ Proposed fix
     # Apply mean/std normalization (matches vLLM's input_conditioner).
-    if norm_mean is not None and norm_std is not None:
+    if norm_mean is not None or norm_std is not None:
+        if norm_mean is None or norm_std is None:
+            raise ValueError(
+                "Both norm_mean and norm_std must be provided for normalization, "
+                f"got norm_mean={norm_mean is not None}, norm_std={norm_std is not None}"
+            )
         video_tensor = (video_tensor - norm_mean) / norm_std
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py` around lines 191 - 192,
Currently normalization only runs when both norm_mean and norm_std are non-None
which silently skips normalization if only one is provided; update the logic
around the video_tensor normalization to validate the parameters: if exactly one
of norm_mean or norm_std is None, raise a ValueError (or appropriate exception)
with a clear message referencing norm_mean and norm_std, otherwise apply the
normalization video_tensor = (video_tensor - norm_mean) / norm_std; keep the
check and transformation located with the existing video_tensor normalization
block so callers cannot pass partial normalization parameters without being
alerted.

74-75: Unnecessary int() cast on round() result.

In Python 3, round() already returns an integer when called with one argument. The int() wrapper is redundant.

♻️ Proposed fix
-    reduction_factor = int(round(1 / downsample_ratio))
+    reduction_factor = round(1 / downsample_ratio)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py` around lines 74 - 75,
The assignment to reduction_factor unnecessarily wraps round(1 /
downsample_ratio) in int(); update the code in modeling_nemotron_nano.py to set
reduction_factor = round(1 / downsample_ratio) (and keep required_divisor =
reduction_factor) so the redundant int() cast is removed while preserving the
same behavior for reduction_factor and required_divisor.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/models/modeling_radio.py`:
- Around line 1121-1124: The condition for setting
patch_gen._video_embedder_loaded is ambiguous and can be true even when the
checkpoint lacked video_embedder weights; instead, only set
_video_embedder_loaded when the patch_gen actually has a video_embedder module
and the key wasn't unexpected. Update the logic in the load path around
radio_model.model.patch_generator (patch_gen) to first verify the presence of
the submodule (e.g., hasattr/ getattr(patch_gen, "video_embedder")) and then
check that 'model.patch_generator.video_embedder.weight' is not in
unexpected_keys before assigning patch_gen._video_embedder_loaded = True.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c6b77b2c-737c-4418-a7d8-7996a8b516a2

📥 Commits

Reviewing files that changed from the base of the PR and between 9ab5cef and 9640b75.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/models/modeling_nemotron_nano.py
  • tensorrt_llm/_torch/models/modeling_radio.py
  • tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py

@tensorrt-cicd
Collaborator

PR_Github #41809 [ run ] triggered by Bot. Commit: 9640b75 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41809 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 PM PST on 4/4.

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41985 [ run ] triggered by Bot. Commit: 9640b75 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41985 [ run ] completed with state SUCCESS. Commit: 9640b75
/LLM/main/L0_MergeRequest_PR pipeline #32837 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz 2ez4bz force-pushed the dev-nano-v3-video branch from 9640b75 to b3b2152 Compare April 7, 2026 18:56
…RADIO

Implement tubelet-based temporal compression for video inputs,
matching the Megatron-LM / vLLM video processing pipeline. T
consecutive frames are grouped into tubelets before embedding,
reducing the token count by a factor of `video_temporal_patch_size`.

Key additions:
- Aspect-ratio-preserving video frame resize and normalization
- Separate video embedder in RADIO ViT for tubelet projection
- Tubelet-aware token counting, frame separators, and EVS paths
- Fix align_corners=True -> False in position embedding interpolation

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-nano-v3-video branch 2 times, most recently from 7c38538 to 3192c7a Compare April 9, 2026 23:45
@2ez4bz
Collaborator Author

2ez4bz commented Apr 9, 2026

/bot run

@2ez4bz 2ez4bz enabled auto-merge (squash) April 9, 2026 23:46
@tensorrt-cicd
Collaborator

PR_Github #42593 [ run ] triggered by Bot. Commit: 3192c7a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42593 [ run ] completed with state SUCCESS. Commit: 3192c7a
/LLM/main/L0_MergeRequest_PR pipeline #33319 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-nano-v3-video branch from 3192c7a to 696c3d2 Compare April 10, 2026 05:29
@2ez4bz
Collaborator Author

2ez4bz commented Apr 10, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42673 [ run ] triggered by Bot. Commit: 696c3d2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42673 [ run ] completed with state SUCCESS. Commit: 696c3d2
/LLM/main/L0_MergeRequest_PR pipeline #33379 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 10, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42732 [ run ] triggered by Bot. Commit: 696c3d2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42732 [ run ] completed with state SUCCESS. Commit: 696c3d2
/LLM/main/L0_MergeRequest_PR pipeline #33415 completed with status: 'SUCCESS'

CI Report

Link to invocation

@2ez4bz 2ez4bz merged commit 07ba6d0 into NVIDIA:main Apr 10, 2026
5 checks passed