[None][fix] Fix Nano chunked prefill (#12782)
Conversation
📝 Walkthrough
Token counting logic for the image and video modalities in the Nemotron Nano model was updated to use fixed token counts (2 tokens per image) instead of dynamic counts derived from all multimodal special tokens. A corresponding test was added to verify that image token counts remain consistent when audio is enabled.
Estimated code review effort: 🎯 3 (Moderate), ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py`:
- Around lines 260-281: The test test_get_num_tokens_per_image_with_audio_config
parametrizes num_images but never uses it. Either remove the
`@pytest.mark.parametrize`("num_images", [1, 2, 3]) decorator, or change the body of
test_get_num_tokens_per_image_with_audio_config to honor num_images by creating
that many images (e.g., with a loop or list comprehension) and asserting
proc.get_num_tokens_per_image(image=img) == expected_tokens for each image;
update the references to _make_audio_processor and get_num_tokens_per_image
accordingly so the test actually validates multiple-image behavior when
num_images > 1.
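A minimal, self-contained sketch of the suggested fix, looping over num_images so the parametrization is actually exercised. The processor here is a stand-in stub (the real test would use the processor built by _make_audio_processor), and the fixed value of 2 tokens per image follows the walkthrough above:

```python
# Hypothetical stand-in for the processor returned by _make_audio_processor;
# the real object comes from the Nemotron Nano preprocessing module.
class StubProcessor:
    def get_num_tokens_per_image(self, image=None):
        # Fixed count per image (2), per this PR's change.
        return 2


def check_image_token_counts(proc, num_images, expected_tokens=2):
    """Honor num_images by checking every image, not just one."""
    images = [object() for _ in range(num_images)]  # placeholder images
    for img in images:
        assert proc.get_num_tokens_per_image(image=img) == expected_tokens


# Mirrors @pytest.mark.parametrize("num_images", [1, 2, 3])
for n in (1, 2, 3):
    check_image_token_counts(StubProcessor(), n)
```

In the real test, the loop body would assert against the processor under test rather than a stub, so the multiple-image path is covered whenever num_images > 1.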
ℹ️ Review info
Configuration used: .coderabbit.yaml
Review profile: CHILL | Plan: Pro
Run ID: 278a9d05-0382-4344-baa6-631e0033e6ec
📒 Files selected for processing (2)
- tensorrt_llm/_torch/models/modeling_nemotron_nano.py
- tests/unittest/_torch/modeling/test_nemotron_nano_preprocessing.py
yechank-nvidia
left a comment
LGTM. Suggesting name change, but still okay now.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Force-pushed from 58b1356 to 5ee93cb
/bot run --disable-fail-fast
PR_Github #42165 [ run ] triggered by Bot. Commit:
PR_Github #42165 [ run ] completed with state
/bot run
PR_Github #42269 [ run ] triggered by Bot. Commit:
PR_Github #42269 [ run ] completed with state
/bot run
PR_Github #42289 [ run ] triggered by Bot. Commit:
PR_Github #42289 [ run ] completed with state
/bot run
PR_Github #42370 [ run ] triggered by Bot. Commit:
PR_Github #42370 [ run ] completed with state
/bot run
PR_Github #42406 [ run ] triggered by Bot. Commit:
PR_Github #42406 [ run ] completed with state
/bot run
PR_Github #42448 [ run ] triggered by Bot. Commit:
PR_Github #42448 [ run ] completed with state
Summary by CodeRabbit
Release Notes
Bug Fixes
- Corrected multimodal token counting in the Nemotron Nano model: image token counts now use a fixed value (2 tokens per image) instead of a dynamic count over all multimodal special tokens.
Tests
- Added a test verifying that image token counts remain consistent when audio is enabled.
Description
The number of multimodal tokens was calculated incorrectly for
newer models that support both audio and vision modalities,
leading to runtime errors when chunked prefill is enabled.
This commit fixes the issue and adds a test for it (verified to
fail without the fix).
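To illustrate the failure mode, here is a hypothetical sketch (not the actual model code, and the token names are made up): if the per-image token count is derived from all multimodal special tokens known to the model, enabling audio inflates the image count, while pinning it to a fixed constant keeps it stable.

```python
# Hypothetical special-token sets; the real IDs live in the model's
# tokenizer/processor configuration.
VISION_SPECIAL_TOKENS = {"<img_start>", "<img_end>"}
AUDIO_SPECIAL_TOKENS = {"<audio_start>", "<audio_end>"}


def tokens_per_image_dynamic(audio_enabled: bool) -> int:
    # Buggy behavior: counts ALL multimodal special tokens, so the
    # result changes when the audio modality is enabled.
    special = set(VISION_SPECIAL_TOKENS)
    if audio_enabled:
        special |= AUDIO_SPECIAL_TOKENS
    return len(special)


def tokens_per_image_fixed(audio_enabled: bool) -> int:
    # Fixed behavior per this PR: always 2 tokens per image,
    # regardless of which other modalities are enabled.
    return 2


assert tokens_per_image_dynamic(False) == 2
assert tokens_per_image_dynamic(True) == 4   # overcount with audio enabled
assert tokens_per_image_fixed(True) == 2     # stable with audio enabled
```

An overcount like this makes the scheduler's token bookkeeping disagree with the actual sequence length, which surfaces as a runtime error once chunked prefill splits the sequence.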
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update the tava architecture diagram if there is a significant design change in the PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.