[None][fix] Fix Nano chunked prefill (#12782)
Conversation
📝 Walkthrough
Token counting logic for the image and video modalities in the Nemotron Nano model was updated to use fixed token counts (2 tokens per image) instead of dynamic counts derived from all multimodal special tokens. A corresponding test was added to verify that image token counts remain consistent when audio is enabled.
Estimated code review effort: 🎯 3 (Moderate), ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py`:
- Around lines 260-281: The test test_get_num_tokens_per_image_with_audio_config
parametrizes num_images but never uses it. Either remove the
`@pytest.mark.parametrize`("num_images", [1, 2, 3]) decorator, or change the body of
test_get_num_tokens_per_image_with_audio_config to honor num_images by creating
that many images (e.g., with a loop or list comprehension) and asserting
proc.get_num_tokens_per_image(image=img) == expected_tokens for each image;
update the references to _make_audio_processor and get_num_tokens_per_image
accordingly so the test actually validates multiple-image behavior when
num_images > 1.
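A minimal, self-contained sketch of the suggested fix, looping over num_images so the parametrization is actually exercised. The processor here is a stand-in stub (the real test would use the processor built by _make_audio_processor), and the fixed value of 2 tokens per image follows the walkthrough above:

```python
# Hypothetical stand-in for the processor returned by _make_audio_processor;
# the real object comes from the Nemotron Nano preprocessing module.
class StubProcessor:
    def get_num_tokens_per_image(self, image=None):
        # Fixed count per image (2), per this PR's change.
        return 2


def check_image_token_counts(proc, num_images, expected_tokens=2):
    """Honor num_images by checking every image, not just one."""
    images = [object() for _ in range(num_images)]  # placeholder images
    for img in images:
        assert proc.get_num_tokens_per_image(image=img) == expected_tokens


# Mirrors @pytest.mark.parametrize("num_images", [1, 2, 3])
for n in (1, 2, 3):
    check_image_token_counts(StubProcessor(), n)
```

In the real test, the loop body would assert against the processor under test rather than a stub, so the multiple-image path is covered whenever num_images > 1.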
ℹ️ Review info
Configuration used: .coderabbit.yaml
Review profile: CHILL | Plan: Pro
Run ID: 278a9d05-0382-4344-baa6-631e0033e6ec
📒 Files selected for processing (2)
- tensorrt_llm/_torch/models/modeling_nemotron_nano.py
- tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py
yechank-nvidia
left a comment
LGTM. Suggesting name change, but still okay now.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Force-pushed from 58b1356 to 5ee93cb
/bot run --disable-fail-fast
PR_Github #42165 [ run ] triggered by Bot. Commit:
PR_Github #42165 [ run ] completed with state
/bot run
PR_Github #42269 [ run ] triggered by Bot. Commit:
PR_Github #42269 [ run ] completed with state
/bot run
PR_Github #42289 [ run ] triggered by Bot. Commit:
PR_Github #42289 [ run ] completed with state
/bot run
PR_Github #42370 [ run ] triggered by Bot. Commit:
PR_Github #42370 [ run ] completed with state
/bot run
PR_Github #42406 [ run ] triggered by Bot. Commit:
PR_Github #42406 [ run ] completed with state
/bot run
PR_Github #42448 [ run ] triggered by Bot. Commit:
PR_Github #42448 [ run ] completed with state
Summary by CodeRabbit
Release Notes
Bug Fixes
- Corrected multimodal token counting in the Nemotron Nano model: image token counts now use a fixed value (2 tokens per image) instead of a dynamic count over all multimodal special tokens.
Tests
- Added a test verifying that image token counts remain consistent when audio is enabled.
Description
The number of multimodal tokens was calculated incorrectly for
newer models that support both audio and vision modalities,
leading to runtime errors when chunked prefill is enabled.
This commit fixes the issue and adds a test for it (verified to
fail without the fix).
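To illustrate the failure mode, here is a hypothetical sketch (not the actual model code, and the token names are made up): if the per-image token count is derived from all multimodal special tokens known to the model, enabling audio inflates the image count, while pinning it to a fixed constant keeps it stable.

```python
# Hypothetical special-token sets; the real IDs live in the model's
# tokenizer/processor configuration.
VISION_SPECIAL_TOKENS = {"<img_start>", "<img_end>"}
AUDIO_SPECIAL_TOKENS = {"<audio_start>", "<audio_end>"}


def tokens_per_image_dynamic(audio_enabled: bool) -> int:
    # Buggy behavior: counts ALL multimodal special tokens, so the
    # result changes when the audio modality is enabled.
    special = set(VISION_SPECIAL_TOKENS)
    if audio_enabled:
        special |= AUDIO_SPECIAL_TOKENS
    return len(special)


def tokens_per_image_fixed(audio_enabled: bool) -> int:
    # Fixed behavior per this PR: always 2 tokens per image,
    # regardless of which other modalities are enabled.
    return 2


assert tokens_per_image_dynamic(False) == 2
assert tokens_per_image_dynamic(True) == 4   # overcount with audio enabled
assert tokens_per_image_fixed(True) == 2     # stable with audio enabled
```

An overcount like this makes the scheduler's token bookkeeping disagree with the actual sequence length, which surfaces as a runtime error once chunked prefill splits the sequence.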
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.