[TRTLLM-12430][tests] Add video E2E test for nano v3 omni #13883
📝 Walkthrough
This PR introduces `AudioASREvaluator` and `VideoMME` evaluators for automatic speech recognition and video question-answering evaluation, generalizes the accuracy test framework to support multiple metrics (WER, accuracy) with a configurable `higher_is_better` direction, and integrates both evaluators into the multimodal test suite alongside the existing MMMU task.
Changes
Audio ASR and Video QA Evaluators with Metric-Aware Testing
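The metric-aware comparison described in the walkthrough can be illustrated with a minimal sketch. Only the `higher_is_better` name comes from the PR; `MetricSpec`, `passes_threshold`, and the `tolerance` parameter are hypothetical names used for illustration, and the actual logic in `accuracy_core.py` may differ (for example, by deriving thresholds statistically).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    """Hypothetical description of a metric and its reference value."""
    name: str
    reference: float
    higher_is_better: bool = True


def passes_threshold(spec: MetricSpec, observed: float,
                     tolerance: float = 0.0) -> bool:
    """Compare an observed score to the reference in the metric's direction.

    For accuracy-like metrics, the observed value must not fall below the
    reference (minus tolerance). For error-like metrics such as WER, it
    must not rise above the reference (plus tolerance).
    """
    if spec.higher_is_better:
        return observed >= spec.reference - tolerance
    return observed <= spec.reference + tolerance


# Illustrative values only, not taken from the PR's reference YAML files.
accuracy = MetricSpec("accuracy", reference=70.0)
wer = MetricSpec("wer", reference=10.0, higher_is_better=False)
```

Keeping the direction on the metric description rather than in each test lets a single comparison routine serve both accuracy and WER.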
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/defs/accuracy/video_mme.py (1)
314-326: 💤 Low value
Duplicated `_get_model_type` function.
This function is identical to `_get_model_context` in `tensorrt_llm/evaluate/audio_asr.py` (except the latter also returns `model_dir`). Consider extracting this to a shared utility in `tensorrt_llm/evaluate/interface.py` or a common module to avoid duplication.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/defs/accuracy/video_mme.py` around lines 314 - 326, The function _get_model_type is duplicated (also present as _get_model_context in tensorrt_llm/evaluate/audio_asr.py); extract the shared logic into a single utility (e.g., add get_model_type or get_model_info in tensorrt_llm/evaluate/interface.py) that accepts the LLM object, reads config.json, and returns model_type (and optionally model_dir or both to match _get_model_context). Update callers in tests/integration/defs/accuracy/video_mme.py and tensorrt_llm/evaluate/audio_asr.py to import and use the new shared function and remove the duplicate local implementations.
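A minimal sketch of the shared helper this nitpick suggests, under stated assumptions: the name `get_model_info` is one of the names floated in the comment, and the helper here takes a checkpoint directory path directly, because how the directory is obtained from the LLM object is not shown in this review. It assumes a Hugging Face style `config.json` with a `model_type` key.

```python
import json
from pathlib import Path


def get_model_info(model_dir):
    """Return (model_type, model_dir) read from <model_dir>/config.json.

    Hypothetical shared helper: a caller that only needs the model type
    (the _get_model_type case) can ignore the second element, while a
    caller that also needs the directory (the _get_model_context case)
    can use both.
    """
    model_dir = Path(model_dir)
    with open(model_dir / "config.json", encoding="utf-8") as f:
        config = json.load(f)
    return config["model_type"], model_dir
```

Returning both values from one function keeps the two call sites on a single implementation without changing what either of them needs.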
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tensorrt_llm/evaluate/audio_asr.py`:
- Line 452: The zip over sample_ids, predictions, references in the loop
(variables sample_id, prediction, reference) should use strict=True to avoid
silent truncation on mismatched lengths; update the for statement in the
function handling evaluation/iteration to call zip(sample_ids, predictions,
references, strict=True) so a ValueError is raised when lengths differ, matching
the other zip usage in this module.
- Line 305: The local assignment input = {"prompt": prompt} in
tensorrt_llm/evaluate/audio_asr.py shadows Python's built-in input; rename this
variable (for example to payload, input_payload, or prompt_input) and update all
subsequent references within the same scope to use the new name (preserving the
dictionary structure and keys so downstream code expecting {"prompt": prompt}
continues to work).
---
Nitpick comments:
In `@tests/integration/defs/accuracy/video_mme.py`:
- Around line 314-326: The function _get_model_type is duplicated (also present
as _get_model_context in tensorrt_llm/evaluate/audio_asr.py); extract the shared
logic into a single utility (e.g., add get_model_type or get_model_info in
tensorrt_llm/evaluate/interface.py) that accepts the LLM object, reads
config.json, and returns model_type (and optionally model_dir or both to match
_get_model_context). Update callers in
tests/integration/defs/accuracy/video_mme.py and
tensorrt_llm/evaluate/audio_asr.py to import and use the new shared function and
remove the duplicate local implementations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 01cf6dc8-be7e-40f4-a020-dee43b45133b
📒 Files selected for processing (7)
tensorrt_llm/evaluate/__init__.py
tensorrt_llm/evaluate/audio_asr.py
tests/integration/defs/accuracy/accuracy_core.py
tests/integration/defs/accuracy/references/videomme.yaml
tests/integration/defs/accuracy/references/voxpopuli.yaml
tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
tests/integration/defs/accuracy/video_mme.py
f1790c5 to a454f35 (Compare)
/bot run
PR_Github #47443 [ run ] triggered by Bot. Commit:
PR_Github #47443 [ run ] completed with state
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
a454f35 to 4343e3c (Compare)
/bot run
PR_Github #47783 [ run ] triggered by Bot. Commit:
PR_Github #47783 [ run ] completed with state
/bot run
PR_Github #47800 [ run ] triggered by Bot. Commit:
PR_Github #47800 [ run ] completed with state
This PR adds a new video MCQ evaluation harness, which is used with a subset of the Video-MME dataset's short videos against Nano v3 Omni. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Summary by CodeRabbit
Release Notes
New Features
Tests
Description
This PR adds a new video MCQ evaluation harness, which is used
with a subset of the Video-MME dataset's short videos against
Nano v3 Omni.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.