[TRTLLM-11974][feat] Handle multimodal placeholder expansion in token space for Nemotron Nano (#13069)
Force-pushed 642eb68 to a0d696a
@coderabbitai full review

✅ Actions performed: Full review triggered.

/bot run

PR_Github #44887 [ run ] triggered by Bot. Commit:
📝 Walkthrough

Added a tokenized multimodal fast path for Nemotron Nano (image/audio/video), changed LLaVA-Next's expand function to return (expanded_ids, mm_data_updates), introduced a tubelet-based retained-token helper for EVS, and wired mm_data_updates through the tokenized multimodal registry; tests updated/added.
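In code terms, the updated contract looks roughly like this (a minimal sketch; parameter names are inferred from the diagrams and tests below, and the body is a stub, not the implementation):

```python
from typing import Any, Dict, List, Optional, Tuple

# Illustrative only: the real method lives on the model's input processor.
def expand_prompt_token_ids_for_mm(
    prompt_token_ids: List[int],
    mm_token_lengths: List[int],
    hf_processor_mm_kwargs: Optional[Dict[str, Any]] = None,
) -> Tuple[List[int], Optional[Dict[str, Dict[str, Any]]]]:
    expanded_ids = list(prompt_token_ids)  # placeholder expansion happens here
    mm_data_updates = None  # e.g. {"video": {"evs_ids": ...}} when EVS prunes
    return expanded_ids, mm_data_updates
```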
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ExpandFn as expand_prompt_token_ids_for_mm
    participant ModalityDetect as Modality Detector
    participant ImageExp as Image Expander
    participant AudioExp as Audio Expander
    participant VideoExp as Video Expander
    participant EVS as EVS Stream
    Client->>ExpandFn: prompt token ids, _multi_modal_data, hf_processor_mm_kwargs
    ExpandFn->>ModalityDetect: determine active modality
    ModalityDetect-->>ExpandFn: image / audio / video
    alt image
        ExpandFn->>ImageExp: expand image placeholders -> expanded_ids
        ImageExp-->>ExpandFn: expanded_ids
    else audio
        ExpandFn->>AudioExp: expand audio placeholders -> expanded_ids
        AudioExp-->>ExpandFn: expanded_ids
    else video
        ExpandFn->>VideoExp: expand video placeholders -> expanded_ids, maybe evs_ids
        VideoExp->>EVS: build evs_ids (if EVS & pruning)
        EVS-->>VideoExp: evs_ids
        VideoExp-->>ExpandFn: expanded_ids + evs_ids
    end
    ExpandFn-->>Client: (expanded_ids, mm_data_updates)
```
```mermaid
sequenceDiagram
    participant Registry
    participant ExpandFn
    participant Extra as Extra Processed Inputs
    Registry->>Registry: build expand_kwargs (include _multi_modal_data)
    Registry->>ExpandFn: call expand_prompt_token_ids_for_mm(expand_kwargs)
    ExpandFn-->>Registry: (expanded_ids, mm_data_updates)
    alt mm_data_updates != None
        Registry->>Extra: merge mm_data_updates into multimodal_data
        Extra-->>Registry: updated multimodal_data
    end
    Registry-->>Registry: return processed result
```
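The merge step in the second diagram can be sketched as follows, assuming per-modality dicts; `merge_mm_data_updates` is a hypothetical helper name, not the registry's actual API:

```python
from typing import Any, Dict, Optional

def merge_mm_data_updates(
    multimodal_data: Dict[str, Dict[str, Any]],
    mm_data_updates: Optional[Dict[str, Dict[str, Any]]],
) -> Dict[str, Dict[str, Any]]:
    # No updates returned by the expand function: keep inputs untouched.
    if not mm_data_updates:
        return multimodal_data
    # Merge per-modality updates (e.g. {"video": {"evs_ids": ...}}) without
    # clobbering keys the processor already produced for that modality.
    for modality, updates in mm_data_updates.items():
        multimodal_data.setdefault(modality, {}).update(updates)
    return multimodal_data
```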
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (4 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/models/modeling_nemotron_nano.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor — Update the NVIDIA copyright year.
This file is meaningfully modified in this PR, but the header still says 2025. As per coding guidelines: "Add NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py` at line 1, The file header still shows "Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved." — update the year to the current year (change 2025 to 2026) in the file's top copyright header (the copyright comment string) to comply with the "update year on modified files" guideline; ensure the header format remains identical aside from the year.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py`:
- Around line 1196-1284: The fast-path in expand_prompt_token_ids_for_mm drops
audio stored under multimodal_data["video"], so detect when mm_data contains
video with an "audio" field (check mm_data.get("video") and "audio" in
mm_data["video"]) and avoid the video-only fast-path; either (A) preserve and
return the appropriate audio updates by mirroring the logic in
_extract_audio_from_video and include those updates in mm_data_updates alongside
the evs_ids returned from _expand_video_placeholders_in_token_ids, or (B
simpler) explicitly fall back to the full HF processor by returning
prompt_token_ids, None (or otherwise signaling the caller to use the full
processor) whenever video metadata contains audio so the original prompt/audio
handling remains intact. Ensure you reference expand_prompt_token_ids_for_mm,
_expand_video_placeholders_in_token_ids, and _extract_audio_from_video when
implementing the chosen fix.
- Around line 1008-1034: get_text_with_mm_placeholders currently uses module
constants (IMAGE_PLACEHOLDER, VIDEO_PLACEHOLDER, AUDIO_PLACEHOLDER) which can
diverge from the instance configuration; change it to append the configured
placeholder attributes (self.img_context_token, self.video_context_token,
self._sound_context_token) instead, repeating each configured token
num_images/num_videos/num_audios times and returning the concatenated string so
the fast path matches the configured processor tokens.
In `@tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py`:
- Around line 1452-1456: The test assigns the first return value from
proc.expand_prompt_token_ids_for_mm to an unused variable `expanded`, which
triggers RUF059; change the unpack to use `_` for the unused first element
(e.g., `_, mm_data_updates = proc.expand_prompt_token_ids_for_mm(...)`) so only
`mm_data_updates` is treated as the meaningful binding while satisfying the
linter.
📒 Files selected for processing (9)
- tensorrt_llm/_torch/models/modeling_llava_next.py
- tensorrt_llm/_torch/models/modeling_nemotron_nano.py
- tensorrt_llm/inputs/__init__.py
- tensorrt_llm/inputs/evs.py
- tensorrt_llm/inputs/multimodal.py
- tensorrt_llm/inputs/registry.py
- tests/integration/test_lists/test-db/l0_l40s.yml
- tests/unittest/_torch/modeling/test_modeling_llava_next.py
- tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py
Actionable comments posted: 2
♻️ Duplicate comments (3)
tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py (1)
1452-1456: ⚠️ Potential issue | 🟡 Minor — Unused `expanded` variable flagged by linter.
Ruff RUF059 flags `expanded` as unused. Use `_` to indicate the value is intentionally discarded.
Proposed fix:

```diff
-expanded, mm_data_updates = proc.expand_prompt_token_ids_for_mm(
+_, mm_data_updates = proc.expand_prompt_token_ids_for_mm(
     prompt,
     [total_mm],
     hf_processor_mm_kwargs={"_multi_modal_data": mm_data},
 )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py` around lines 1452-1456, The linter flags the local variable expanded as unused; when calling proc.expand_prompt_token_ids_for_mm, discard that first return value by replacing expanded with an underscore (use `_`), i.e., change the unpacking to `_, mm_data_updates = proc.expand_prompt_token_ids_for_mm(prompt, [total_mm], hf_processor_mm_kwargs={"_multi_modal_data": mm_data})` so the unused return is intentionally ignored while preserving mm_data_updates and the call to expand_prompt_token_ids_for_mm.

tensorrt_llm/_torch/models/modeling_nemotron_nano.py (2)
1008-1034: ⚠️ Potential issue | 🟠 Major — Use the processor's configured placeholder strings here.
`get_text_with_mm_placeholders()` still hard-codes the module constants. If a checkpoint overrides `img_context_token`, `video_context_token`, or `sound_context_token`, the dummy HF-processor call diverges from the actual processor config and the fast path can miscount or fail.
Suggested fix:

```diff
-parts.extend([IMAGE_PLACEHOLDER] * num_images)
-parts.extend([VIDEO_PLACEHOLDER] * num_videos)
-parts.extend([AUDIO_PLACEHOLDER] * num_audios)
+parts.extend([self.img_context_token] * num_images)
+parts.extend([self.video_context_token] * num_videos)
+parts.extend([self._sound_context_token] * num_audios)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py` around lines 1008-1034, get_text_with_mm_placeholders currently uses hard-coded module constants (IMAGE_PLACEHOLDER, VIDEO_PLACEHOLDER, AUDIO_PLACEHOLDER); change it to read the actual processor/configured placeholder tokens (e.g., use self.processor.img_context_token, self.processor.video_context_token, self.processor.sound_context_token or the model/checkpoint attributes img_context_token, video_context_token, sound_context_token) so the dummy text matches the HF processor configuration; locate the method get_text_with_mm_placeholders and replace the uses of IMAGE_PLACEHOLDER/VIDEO_PLACEHOLDER/AUDIO_PLACEHOLDER with the corresponding runtime-configured tokens and fall back to the constants if the processor attributes are missing.
1239-1284: ⚠️ Potential issue | 🟠 Major — The tokenized video fast path still misses audio carried in video metadata.
`__call__()`'s video path runs `_extract_audio_from_video()` and injects audio placeholders for videos whose metadata contains audio. This path only checks top-level `"audio"` and always dispatches raw video requests to `_expand_video_placeholders_in_token_ids()`, so `expanded_ids` stays video-only while the dummy-processor output can already contain `multimodal_data["video"]["audio"]`. That leaves token IDs and multimodal embeddings out of sync for Omni video+audio requests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py` around lines 1239-1284, The video fast-path in __call__ currently always treats a request with video as video-only and calls _expand_video_placeholders_in_token_ids, which misses audio carried inside mm_data["video"]["audio"] injected by _extract_audio_from_video; update the detection/dispatch so that if mm_data contains video.audio (or the dummy processor output includes multimodal_data["video"]["audio"]) you either (A) set has_audio alongside has_video and run the audio-expansion logic (e.g., call _expand_audio_placeholders_in_token_ids or a combined expansion), or (B) after _expand_video_placeholders_in_token_ids apply the audio placeholder expansion and merge audio evs/ids into mm_data_updates and the returned expanded_ids so token IDs and mm_data_updates stay in sync; reference __call__, mm_data, _extract_audio_from_video, _expand_video_placeholders_in_token_ids, _expand_audio_placeholders_in_token_ids, evs_ids_tensor and mm_data_updates when making the change.
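A hedged sketch of the guard this comment asks for, using the fall-back-to-full-processor variant suggested in the earlier review pass; the helper names `_video_carries_audio` and `expand_or_fall_back` are illustrative, not the file's actual code:

```python
from typing import Any, Dict, List, Optional, Tuple

def _video_carries_audio(mm_data: Dict[str, Any]) -> bool:
    # Audio extracted by _extract_audio_from_video is nested under the
    # video entry rather than at the top level of mm_data.
    video = mm_data.get("video")
    return isinstance(video, dict) and "audio" in video

def expand_or_fall_back(
    prompt_token_ids: List[int],
    mm_data: Dict[str, Any],
) -> Tuple[List[int], Optional[Dict[str, Any]]]:
    if _video_carries_audio(mm_data):
        # Returning the ids unchanged with no updates signals the caller
        # to use the full HF-processor path, keeping ids and audio in sync.
        return prompt_token_ids, None
    # Video-only request: safe to run the token-space expansion here
    # (the real code would call _expand_video_placeholders_in_token_ids).
    expanded_ids = list(prompt_token_ids)
    return expanded_ids, None
```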
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/_torch/models/modeling_nemotron_nano.py`:
- Around line 1306-1341: The loop that expands img placeholders computes
num_context incorrectly by subtracting len(self._img_start_token_ids) and
len(self._img_end_token_ids); because get_num_tokens_per_image budgets wrappers
as +2 (one start + one end), change the calculation to num_context = n - 2 (and
clamp to >= 0) so you repeat img_context_token_id the correct number of times;
keep using self._img_start_token_ids, self._img_end_token_ids,
img_context_token_id and num_mm_tokens_per_placeholder as before and validate
image_idx bounds as currently implemented.
- Around line 1557-1567: The zip over frame_seps and tokens_per_frame should be
made strict to surface length mismatches instead of silently truncating; update
the loop that iterates "for frame_sep, num_tokens in zip(frame_seps,
tokens_per_frame):" to use zip(frame_seps, tokens_per_frame, strict=True) so any
mismatch between frame_seps and tokens_per_frame raises immediately; ensure this
change preserves the existing logic that encodes frame_sep via
self.tokenizer.encode and extends expansion and evs_expansion with
_img_start_token_ids, img_context_token_id / video_context_token_id,
_img_end_token_ids as currently implemented.
📒 Files selected for processing (9)
- tensorrt_llm/_torch/models/modeling_llava_next.py
- tensorrt_llm/_torch/models/modeling_nemotron_nano.py
- tensorrt_llm/inputs/__init__.py
- tensorrt_llm/inputs/evs.py
- tensorrt_llm/inputs/multimodal.py
- tensorrt_llm/inputs/registry.py
- tests/integration/test_lists/test-db/l0_l40s.yml
- tests/unittest/_torch/modeling/test_modeling_llava_next.py
- tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py
PR_Github #44887 [ run ] completed with state
Force-pushed 83c288b to 37b9a3f
Code review: No issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code
/bot run

PR_Github #45770 [ run ] triggered by Bot. Commit:

PR_Github #45770 [ run ] completed with state

/bot run

PR_Github #45791 [ run ] triggered by Bot. Commit:

PR_Github #46285 [ run ] completed with state
Force-pushed 2593b4d to ab95b4c
/bot run

PR_Github #46450 [ run ] triggered by Bot. Commit:

PR_Github #46450 [ run ] completed with state

/bot run

PR_Github #46473 [ run ] triggered by Bot. Commit:

PR_Github #46473 [ run ] completed with state
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Force-pushed 8a3e5b7 to 41f5784
/bot run

PR_Github #46511 [ run ] triggered by Bot. Commit:

PR_Github #46511 [ run ] completed with state

/bot run

PR_Github #46541 [ run ] triggered by Bot. Commit:

PR_Github #46541 [ run ] completed with state

/bot run

PR_Github #46611 [ run ] triggered by Bot. Commit:

PR_Github #46611 [ run ] completed with state
Description
Adds the tokenized-prompt + multimodal-data fast path (the same pattern PR #11708 introduced for LLaVA-Next) to Nemotron Nano V3 OMNI, covering all three modalities — image, video, audio — including EVS-enabled video. Also fixes a pre-existing bug in `NanoV2VLInputProcessor.get_num_tokens_per_video`'s EVS branch, where the per-tubelet retained token count was inflated by routing through `get_num_tokens_per_image`'s dynamic-tiler logic, causing `find_mm_token_positions` to disagree with the vision encoder's real post-EVS output.

- `tensorrt_llm/_torch/models/modeling_nemotron_nano.py` — Adds `get_text_with_mm_placeholders`, `get_num_tokens_per_audio`, and `expand_prompt_token_ids_for_mm` (+ per-modality helpers) to `NanoV2VLInputProcessor`. Video expansion uses a two-tier strategy mirroring vLLM's `_apply_prompt_updates` (token-level subsequence match with a text-level decode/split/re-encode fallback; sketched after this list). EVS support is included via a parallel `evs_ids` stream consumed by `merge_evs_mm_embeds`. Fixes `get_num_tokens_per_video`'s EVS branch to derive `video_size` the same way `_process_videos_frames` does.
- `tensorrt_llm/inputs/evs.py` — Extracts `compute_retained_tokens_from_tubelet_budget` as the shared retention-count helper; `compute_retained_tokens_count` now delegates to it.
- `tensorrt_llm/inputs/multimodal.py` — Adds an audio branch to `find_mm_token_lengths`; fixes `find_mm_token_positions`'s no-match early return from `[]` to `([], [])` to match its declared 2-tuple signature.
- `tensorrt_llm/inputs/registry.py` — `tokenized_multimodal_process` threads `multi_modal_data` through to the input processor (needed by video expansion) and merges any returned `mm_data_updates` (e.g. `evs_ids`) into `extra_processed_inputs["multimodal_data"]`. Adds "audio" to the multimodal-hashing modality whitelist.
- `tensorrt_llm/_torch/models/modeling_llava_next.py` — `expand_prompt_token_ids_for_mm` signature updated to return `Tuple[List[int], Optional[Dict[str, Dict[str, Any]]]]`; always returns `(expanded, None)`.
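The token-level tier of that two-tier strategy can be sketched as follows (a simplified illustration under the assumption of exact subsequence matching; the real code also implements the text-level decode/split/re-encode fallback):

```python
from typing import List

def expand_placeholder_subsequence(
    prompt_token_ids: List[int],
    placeholder_ids: List[int],
    replacement_ids: List[int],
) -> List[int]:
    # Tier 1: scan for the placeholder's token-id subsequence and splice in
    # the replacement. If a tokenizer merges the placeholder with adjacent
    # text (no subsequence match), tier 2 decodes the prompt, splits on the
    # placeholder string, and re-encodes the pieces instead.
    out: List[int] = []
    i, m = 0, len(placeholder_ids)
    while i < len(prompt_token_ids):
        if m and prompt_token_ids[i:i + m] == placeholder_ids:
            out.extend(replacement_ids)
            i += m
        else:
            out.append(prompt_token_ids[i])
            i += 1
    return out

# Example: a single-id placeholder 99 expands to three context tokens.
assert expand_placeholder_subsequence([1, 99, 2], [99], [7, 7, 7]) == [1, 7, 7, 7, 2]
```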
Test Coverage

- `tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py` — 8 new test classes covering each fast-path method, multi-token BPE patterns, EVS token-level + text-level tiers, dispatch integration, and audio.

Tested with VLMEvalKitMcore for the following datasets: InfoVQA_VAL, OCRBench, VoxPopuli_ASR_test, TedLium_ASR_Test, Video-MME (EVS off), and Video-MME (EVS of 0.5), by temporarily tokenizing the text prompt to trigger the fast path. The scores match the default, non-fast path, where the text prompt and MM data are handled directly by the HF processor.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.