[None][fix] Fix Nano chunked prefill #12782

Merged
2ez4bz merged 1 commit into NVIDIA:main from 2ez4bz:dev-nano-fix-chunked-prefill
Apr 9, 2026

@2ez4bz 2ez4bz commented Apr 6, 2026

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Refined token counting for images and videos in multimodal models to use fixed token allocations.
  • Tests

    • Added test coverage for audio-enabled model configurations.

Description

  • Why?

The number of multimodal tokens was calculated incorrectly for
newer models that support both audio and vision modalities,
leading to errors at runtime when chunked prefill is enabled.

  • What?

This commit fixes this issue, and adds a test for it (verified to
fail without the fix).
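
One plausible way such a miscount can surface under chunked prefill is sketched below. This is an assumption made for illustration, not a description of TensorRT-LLM internals: the prompt is split into fixed-size chunks, the multimodal tokens that fall into each chunk are tallied, and the total is compared with the predicted count. All function names and numbers are hypothetical.

```python
def split_into_chunks(total_tokens: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) ranges covering the prompt in chunk_size steps."""
    return [
        (start, min(start + chunk_size, total_tokens))
        for start in range(0, total_tokens, chunk_size)
    ]


def mm_tokens_in_chunk(chunk: tuple[int, int], mm_span: tuple[int, int]) -> int:
    """Count how many positions of a multimodal span fall inside one chunk."""
    start = max(chunk[0], mm_span[0])
    end = min(chunk[1], mm_span[1])
    return max(0, end - start)


def check_chunked_prefill(
    prompt_len: int,
    chunk_size: int,
    mm_start: int,
    actual_mm_len: int,
    predicted_mm_len: int,
) -> int:
    """Raise if the per-chunk multimodal token tallies do not match the prediction."""
    span = (mm_start, mm_start + actual_mm_len)
    seen = sum(
        mm_tokens_in_chunk(chunk, span)
        for chunk in split_into_chunks(prompt_len, chunk_size)
    )
    if seen != predicted_mm_len:
        raise ValueError(
            f"expected {predicted_mm_len} multimodal tokens, saw {seen}"
        )
    return seen
```

With a correct prediction the tallies add up across chunks; a prediction that is off by even a couple of tokens fails the check, which is the kind of runtime error a miscount could trigger.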

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@2ez4bz 2ez4bz requested a review from a team as a code owner April 6, 2026 22:24
@2ez4bz 2ez4bz requested a review from yechank-nvidia April 6, 2026 22:24

coderabbitai bot commented Apr 6, 2026

Walkthrough

Token counting logic for the image and video modalities in the Nemotron Nano model was updated to use a fixed number of special tokens (2 per image or frame) instead of a dynamic count derived from all multimodal special tokens. A corresponding test was added to verify that image token counts remain consistent when audio is enabled.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Model Token Counting (`tensorrt_llm/_torch/models/modeling_nemotron_nano.py`) | Updated `get_num_tokens_per_image` and `get_num_tokens_per_video` to use fixed token counts (`num_special_per_frame = 2`) instead of relying on `len(self.get_mm_special_token_ids())`, decoupling image token budgeting from audio special tokens. |
| Test Coverage (`tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py`) | Added parameterized test `test_get_num_tokens_per_image_with_audio_config` that verifies image token counts remain unchanged when audio configuration is enabled, with test coverage across multiple image counts (1, 2, 3). |
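
The change summarized above can be illustrated with a minimal sketch. This is not the actual code from `modeling_nemotron_nano.py`: the token IDs, the helper names, and the assumption that the total is "feature tokens plus special tokens" (multiplied per frame for video) are all hypothetical. Only `num_special_per_frame = 2` and the use of `len(get_mm_special_token_ids())` in the old path come from the summary.

```python
NUM_SPECIAL_PER_FRAME = 2  # fixed value named in the summary above

# Hypothetical token IDs, for illustration only.
IMAGE_SPECIAL_IDS = [101, 102]
AUDIO_SPECIAL_IDS = [201, 202]


def get_mm_special_token_ids(audio_enabled: bool) -> list[int]:
    """Stand-in: every multimodal special token the model knows about."""
    return IMAGE_SPECIAL_IDS + (AUDIO_SPECIAL_IDS if audio_enabled else [])


def tokens_per_image_before(feature_tokens: int, audio_enabled: bool) -> int:
    """Old-style count: special tokens taken from the full multimodal list."""
    return feature_tokens + len(get_mm_special_token_ids(audio_enabled))


def tokens_per_image_after(feature_tokens: int) -> int:
    """New-style count: a fixed number of special tokens per image."""
    return feature_tokens + NUM_SPECIAL_PER_FRAME


def tokens_per_video_after(feature_tokens_per_frame: int, num_frames: int) -> int:
    """New-style count for video, assuming each frame is budgeted like an image."""
    return num_frames * (feature_tokens_per_frame + NUM_SPECIAL_PER_FRAME)
```

Under these assumptions the two image counts agree while audio is disabled, and the old count grows by the number of audio special tokens once audio is enabled, even though the image itself has not changed.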

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main fix: addressing chunked prefill token counting for Nano models with audio support. |
| Description check | ✅ Passed | The description explains the problem and solution, but the Test Coverage section is incomplete and lacks specific test names. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py`:
- Around line 260-281: The test `test_get_num_tokens_per_image_with_audio_config`
currently parametrizes `num_images` but never uses it. Either remove the
`@pytest.mark.parametrize("num_images", [1, 2, 3])` decorator, or change the body of
`test_get_num_tokens_per_image_with_audio_config` to honor `num_images` by creating
that many images (e.g., with a loop or list comprehension) and asserting
`proc.get_num_tokens_per_image(image=img) == expected_tokens` for each image.
Update references to `_make_audio_processor` and `get_num_tokens_per_image`
accordingly so the test actually validates multiple-image behavior when
`num_images > 1`.
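
A minimal sketch of the second option, honoring `num_images`, might look like the following. It uses a hypothetical `FakeAudioProcessor` in place of whatever `_make_audio_processor` returns, placeholder objects in place of real images, and an assumed expected count of 258; none of these come from the PR. In the real test the check would be the body of the parametrized pytest function rather than a plain loop.

```python
class FakeAudioProcessor:
    """Hypothetical stand-in for an audio-enabled input processor."""

    def __init__(self, feature_tokens: int = 256) -> None:
        self.feature_tokens = feature_tokens

    def get_num_tokens_per_image(self, *, image: object) -> int:
        # Assumed behavior: two fixed special tokens per image,
        # independent of whether audio support is configured.
        return self.feature_tokens + 2


def check_tokens_per_image(num_images: int, expected_tokens: int = 258) -> int:
    """Build num_images placeholder images and check each one's token count."""
    proc = FakeAudioProcessor()
    images = [object() for _ in range(num_images)]
    for img in images:
        assert proc.get_num_tokens_per_image(image=img) == expected_tokens
    return len(images)


# Plain-Python equivalent of parametrizing over num_images in (1, 2, 3).
for n in (1, 2, 3):
    check_tokens_per_image(n)
```

The point of the loop is that every value of `num_images` changes what the test exercises, so the parametrization is no longer dead weight.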

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 278a9d05-0382-4344-baa6-631e0033e6ec

📥 Commits

Reviewing files that changed from the base of the PR and between 2b80f8d and 58b1356.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/models/modeling_nemotron_nano.py
  • tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py

Comment thread tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py Outdated

@yechank-nvidia yechank-nvidia left a comment

LGTM. I'm suggesting a name change, but this is fine as is.

Comment thread tensorrt_llm/_torch/models/modeling_nemotron_nano.py Outdated
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-nano-fix-chunked-prefill branch from 58b1356 to 5ee93cb Compare April 7, 2026 16:45

2ez4bz commented Apr 7, 2026

/bot run --disable-fail-fast

@tensorrt-cicd

PR_Github #42165 [ run ] triggered by Bot. Commit: 5ee93cb Link to invocation

@2ez4bz 2ez4bz enabled auto-merge (squash) April 7, 2026 18:12
@tensorrt-cicd

PR_Github #42165 [ run ] completed with state SUCCESS. Commit: 5ee93cb
/LLM/main/L0_MergeRequest_PR pipeline #32993 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

2ez4bz commented Apr 8, 2026

/bot run

@tensorrt-cicd

PR_Github #42269 [ run ] triggered by Bot. Commit: 5ee93cb Link to invocation

@tensorrt-cicd

PR_Github #42269 [ run ] completed with state FAILURE. Commit: 5ee93cb
/LLM/main/L0_MergeRequest_PR pipeline #33069 completed with status: 'FAILURE'

2ez4bz commented Apr 8, 2026

/bot run

@tensorrt-cicd

PR_Github #42289 [ run ] triggered by Bot. Commit: 5ee93cb Link to invocation

@tensorrt-cicd

PR_Github #42289 [ run ] completed with state SUCCESS. Commit: 5ee93cb
/LLM/main/L0_MergeRequest_PR pipeline #33085 completed with status: 'FAILURE'

2ez4bz commented Apr 8, 2026

/bot run

@tensorrt-cicd

PR_Github #42370 [ run ] triggered by Bot. Commit: 5ee93cb Link to invocation

@tensorrt-cicd

PR_Github #42370 [ run ] completed with state SUCCESS. Commit: 5ee93cb
/LLM/main/L0_MergeRequest_PR pipeline #33151 completed with status: 'FAILURE'

2ez4bz commented Apr 8, 2026

/bot run

@tensorrt-cicd

PR_Github #42406 [ run ] triggered by Bot. Commit: 5ee93cb Link to invocation

@tensorrt-cicd

PR_Github #42406 [ run ] completed with state FAILURE. Commit: 5ee93cb
/LLM/main/L0_MergeRequest_PR pipeline #33178 completed with status: 'FAILURE'

2ez4bz commented Apr 9, 2026

/bot run

@tensorrt-cicd

PR_Github #42448 [ run ] triggered by Bot. Commit: 5ee93cb Link to invocation

@tensorrt-cicd

PR_Github #42448 [ run ] completed with state SUCCESS. Commit: 5ee93cb
/LLM/main/L0_MergeRequest_PR pipeline #33213 completed with status: 'SUCCESS'

CI Report

Link to invocation

@2ez4bz 2ez4bz merged commit 26a6151 into NVIDIA:main Apr 9, 2026
5 checks passed
3 participants