[TRTLLM-12429][tests] Add audio E2E test for nano v3 omni#13750
Conversation
moraxu
left a comment
There was a problem hiding this comment.
LGTM, assuming audio-specific methods from tensorrt_llm/evaluate/audio_asr.py are correct
b531204 to
66dd700
Compare
|
/bot run --disable-fail-fast |
📝 WalkthroughWalkthroughThis PR introduces audio ASR evaluation capability to TensorRT-LLM. It adds the ChangesAudio ASR Evaluator Implementation
Test Framework & Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tensorrt_llm/evaluate/audio_asr.py (2)
452-452: ⚡ Quick winAdd
strict=Truetozip()for safety.Per Python best practices (and ruff B905), using
strict=Trueensures that mismatched list lengths raise an error rather than silently truncating.Suggested fix
- for sample_id, prediction, reference in zip(sample_ids, predictions, references): + for sample_id, prediction, reference in zip(sample_ids, predictions, references, strict=True):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/evaluate/audio_asr.py` at line 452, The for-loop that iterates with zip(sample_ids, predictions, references) should use zip(..., strict=True) to make length mismatches raise an error; update the loop in audio_asr.py (the for sample_id, prediction, reference in zip(...) line) to for sample_id, prediction, reference in zip(sample_ids, predictions, references, strict=True) so any differing lengths of sample_ids, predictions, or references fail fast (ensure the project runs on Python 3.10+ or add an explicit length check if older Python support is required).
305-305: ⚡ Quick winRename
inputto avoid shadowing the Python builtin.The variable name
inputshadows Python's built-in function, which can cause confusion and potential bugs if the builtin is needed later.Suggested fix
- input = {"prompt": prompt} + request_input = {"prompt": prompt} multi_modal_data, _ = mm_data_tracker.retrieve_all_sync() if multi_modal_data: - input["multi_modal_data"] = multi_modal_data - return input + request_input["multi_modal_data"] = multi_modal_data + return request_input🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/evaluate/audio_asr.py` at line 305, The variable named input shadows Python's builtin; rename it (for example to input_payload or prompt_input) wherever it's defined and used in the audio ASR evaluation code (the assignment currently written as input = {"prompt": prompt}) and update all subsequent references in the same scope to the new name so the builtin is no longer shadowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py`:
- Around line 584-625: Add the new integration test test_voxpopuli_asr into the
llm_function_core.txt QA list so it runs as part of the same E2E accuracy suite:
open llm_function_core.txt and add an entry for the test using the same format
as other entries from test_llm_api_pytorch_multimodal.py (e.g.
tests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py::test_voxpopuli_asr);
keep the `@pytest.mark.skip_less_device_memory`(80000) decorator as-is so the test
is skipped on insufficient hardware.
---
Nitpick comments:
In `@tensorrt_llm/evaluate/audio_asr.py`:
- Line 452: The for-loop that iterates with zip(sample_ids, predictions,
references) should use zip(..., strict=True) to make length mismatches raise an
error; update the loop in audio_asr.py (the for sample_id, prediction, reference
in zip(...) line) to for sample_id, prediction, reference in zip(sample_ids,
predictions, references, strict=True) so any differing lengths of sample_ids,
predictions, or references fail fast (ensure the project runs on Python 3.10+ or
add an explicit length check if older Python support is required).
- Line 305: The variable named input shadows Python's builtin; rename it (for
example to input_payload or prompt_input) wherever it's defined and used in the
audio ASR evaluation code (the assignment currently written as input =
{"prompt": prompt}) and update all subsequent references in the same scope to
the new name so the builtin is no longer shadowed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a1c3f0a7-c55a-4ee5-b134-0f004d04a916
📒 Files selected for processing (5)
tensorrt_llm/evaluate/__init__.pytensorrt_llm/evaluate/audio_asr.pytests/integration/defs/accuracy/accuracy_core.pytests/integration/defs/accuracy/references/voxpopuli.yamltests/integration/defs/accuracy/test_llm_api_pytorch_multimodal.py
|
PR_Github #47226 [ run ] triggered by Bot. Commit: |
This PR adds a new ASR evaluation harness, which is used with the Vox Populi dataset's English test split for E2E testing against Nano v3 Omni. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
66dd700 to
4a9dac6
Compare
|
/bot run |
|
PR_Github #47242 [ run ] triggered by Bot. Commit: |
|
PR_Github #47242 [ run ] completed with state
|
|
/bot run |
|
PR_Github #47305 [ run ] triggered by Bot. Commit: |
|
PR_Github #47305 [ run ] completed with state |
This PR adds a new ASR evaluation harness, which is used with the Vox Populi dataset's English test split for E2E testing against Nano v3 Omni. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Summary by CodeRabbit
New Features
Description
This PR adds a new ASR evaluation harness, which is used
with the Vox Populi dataset's English test split for E2E testing
against Nano v3 Omni.
Test Coverage
See description.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.