[None][test] Remove duplicate test cases in llm_perf_core file #14749
…ean up waives.txt Removed outdated model paths and unnecessary entries from MODEL_PATH_DICT in test_perf.py. Updated waives.txt to reflect the removal of tests that are no longer applicable, improving clarity and maintainability. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
These 7 waivers referenced perf tests (bart_large_cnn, bert_large, flan_t5_base/large/xl/xxl, mbart_large_50_many_to_one_mmt) that no longer appear in any test-db yaml on main. Drop them to keep the cleanup consistent with the 5 sibling waivers (roberta_base, t5_*) that were already removed in this PR. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Drop the 4 perf waivers that the PR originally added — author confirmed the underlying nvbugs (5150255 / 5304388 / 6130334) are no longer necessary to waive. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Included additional models "nemotron_nano_12b_v2", "phi_4_multimodal_instruct", "phi_4_multimodal_instruct_fp4", and "phi_4_multimodal_instruct_fp8" to the TRUST_REMOTE_CODE_MODELS dictionary to enhance testing coverage. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
…dels in pytorch_model_config.py and update test_perf.py to include new spec-decoding models. Added configurations for streaming and throughput variants, ensuring better performance tuning. Adjusted test conditions in llm_perf_core.yml to reflect new model tests and conditions for GPU capabilities. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
📝 Walkthrough
The PR updates performance testing infrastructure for the TensorRT-LLM framework by splitting nemotron model configurations into streaming and throughput variants, updating the perf test harness to handle spec-decoding models correctly, and expanding the performance test matrix with new models and GPU tier constraints across multiple hardware configurations.
Changes: Performance Configuration and Testing
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/defs/perf/pytorch_model_config.py`:
- Around line 517-568: The throughput entries' pattern strings
('nemotron_3_super_120b_nvfp4-' and 'nemotron_3_super_120b_nvfp4_mtp') are too
broad and accidentally match streaming/low-latency labels, causing their
'config' (e.g., enable_attention_dp, cuda_graph_config.max_batch_size) to
override streaming variants; fix by making the patterns non-overlapping (for
example rename to a distinct suffix like
'nemotron_3_super_120b_nvfp4_throughput' and
'nemotron_3_super_120b_nvfp4_mtp_throughput' or use more specific anchors) so
the throughput entries in the 'patterns' lists no longer match streaming serve
labels and won't overwrite the streaming configs.
In `@tests/integration/test_lists/qa/llm_perf_core.yml`:
- Line 414: The list entry
"perf/test_perf.py::test_perf[deepseek_r1_0528_fp4-bench-pytorch-streaming-float4-maxbs:512-maxnt:5220-input_output_len:4000,2000-reqs:512-ep:8-tp:8-gpus:8]`#max_throughput`"
is malformed because the trailing "`#max_throughput`" is being treated as part of
the scalar; fix it by separating the comment from the scalar (e.g., add a space
before the #) or by quoting the entire test id string so the "#" is preserved
correctly as a comment marker or literal, updating the entry in
tests/integration/test_lists/qa/llm_perf_core.yml where that test id appears.
- Around line 251-254: The QA perf entries for
nemotron_3_super_120b_nvfp4-serve-pytorch-float4 (and the other missing QA cases
qwen3.5_9b, qwen3.5_27b, qwen3.5_122b_a10b,
deepseek_r1_0528_fp4-bench-pytorch-streaming-float4) that appear in
tests/integration/test_lists/qa/llm_perf_core.yml are not present in the
authoritative CI test-db files under
tests/integration/test_lists/test-db/l0_perf*.yml; add equivalent entries to
those l0_perf*.yml files so the CI DB includes the perf cases referenced (e.g.,
perf/test_perf.py::test_perf[nemotron_3_super_120b_nvfp4-serve-pytorch-float4-...],
perf/test_perf.py::test_perf[qwen3.5_9b-...],
perf/test_perf.py::test_perf[qwen3.5_27b-...],
perf/test_perf.py::test_perf[qwen3.5_122b_a10b-...], and
perf/test_perf.py::test_perf[deepseek_r1_0528_fp4-bench-pytorch-streaming-float4-...])
ensuring the exact test identifiers, markers (min_latency / max_throughput) and
parameter strings are copied so CI will discover and run the same cases.
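The malformed entry flagged at line 414 comes down to one YAML rule: in a plain (unquoted) scalar, `#` starts a comment only when it is preceded by whitespace. The sketch below is a deliberately simplified illustration of that rule, not a YAML parser; the function name `split_comment` and the shortened test id are hypothetical, and quoted scalars are ignored.

```python
# Simplified illustration of YAML comment detection for a plain scalar.
# Assumption: the input is a single unquoted scalar, with no quoting or escapes.
# The function name and the shortened test id below are hypothetical.

def split_comment(scalar):
    """Return (value, comment); comment is None when no comment is found.

    A '#' counts as a comment marker only at the start of the text or
    directly after a space or tab, which mirrors the YAML rule.
    """
    for i, ch in enumerate(scalar):
        if ch == "#" and (i == 0 or scalar[i - 1] in " \t"):
            return scalar[:i].rstrip(), scalar[i + 1:].strip()
    return scalar.strip(), None


# Hypothetical, shortened test ids; the real id is much longer.
broken = "perf/test_perf.py::test_perf[model-gpus:8]#max_throughput"
fixed = "perf/test_perf.py::test_perf[model-gpus:8] #max_throughput"

# Without the space, the annotation stays glued to the test id.
print(split_comment(broken))
# With the space, the annotation is split off as a comment.
print(split_comment(fixed))
```

With the space missing, the whole string including `#max_throughput` becomes the test id, which is why pytest cannot find a matching test.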
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: da3dc35b-eff7-4168-87e5-b47b897eff22
📒 Files selected for processing (3)
tests/integration/defs/perf/pytorch_model_config.py
tests/integration/defs/perf/test_perf.py
tests/integration/test_lists/qa/llm_perf_core.yml
Pushed /bot run
Address two CodeRabbit-reported issues:
1. pytorch_model_config.py: throughput patterns 'nemotron_3_super_120b_nvfp4-' and 'nemotron_3_super_120b_nvfp4_mtp' were prefix-substrings of the streaming patterns 'nemotron_3_super_120b_nvfp4-serve-pytorch-streaming-' and '_mtp-serve-pytorch-streaming-'. Because the loop only `break`s the inner for-loop, every label was matched by the throughput entry too and recursive_update silently overwrote the streaming config (enable_attention_dp, cuda_graph_config.max_batch_size, ...). Narrow the throughput patterns to '-bench-pytorch-' and '-serve-pytorch-float' so they no longer match '-serve-pytorch-streaming-' labels.
2. llm_perf_core.yml line 414: '...gpus:8]#max_throughput' lacks the space before '#', so YAML parses the trailing '#max_throughput' as part of the test id rather than as a comment, leaving pytest unable to find the test. Add the missing space to match every other '#max_throughput' / '#min_latency' annotation in the file.
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
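The pattern-overlap problem described in this commit can be reproduced in a few lines. This is a minimal sketch under stated assumptions, not the code in pytorch_model_config.py: the `resolve_config` helper, the entry layout, the shortened labels, and the config values are all hypothetical; only the pattern strings and the `break`-then-merge behavior come from the commit message.

```python
import copy

# Hypothetical sketch: helper names, labels, and config values are illustrative.


def recursive_update(dst, src):
    """Merge src into dst in place, descending into nested dicts."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            recursive_update(dst[key], value)
        else:
            # Copy so later merges never mutate the entry definitions.
            dst[key] = copy.deepcopy(value)
    return dst


def resolve_config(label, entries):
    """Apply every entry that has a pattern occurring in the label."""
    config = {}
    for entry in entries:
        for pattern in entry["patterns"]:
            if pattern in label:
                recursive_update(config, entry["config"])
                break  # exits the inner loop only; later entries still run
    return config


STREAMING = {
    "patterns": ["nemotron_3_super_120b_nvfp4-serve-pytorch-streaming-"],
    "config": {"enable_attention_dp": False,
               "cuda_graph_config": {"max_batch_size": 16}},
}
BROAD_THROUGHPUT = {
    "patterns": ["nemotron_3_super_120b_nvfp4-"],
    "config": {"enable_attention_dp": True,
               "cuda_graph_config": {"max_batch_size": 512}},
}
NARROW_THROUGHPUT = {
    "patterns": ["nemotron_3_super_120b_nvfp4-serve-pytorch-float"],
    "config": BROAD_THROUGHPUT["config"],
}

STREAMING_LABEL = "nemotron_3_super_120b_nvfp4-serve-pytorch-streaming-float4"
THROUGHPUT_LABEL = "nemotron_3_super_120b_nvfp4-serve-pytorch-float4"

# Broad pattern: the throughput entry also matches and overwrites streaming.
overwritten = resolve_config(STREAMING_LABEL, [STREAMING, BROAD_THROUGHPUT])
# Narrow pattern: the streaming label keeps its own settings.
preserved = resolve_config(STREAMING_LABEL, [STREAMING, NARROW_THROUGHPUT])
# The throughput label is still picked up by the narrow pattern.
throughput = resolve_config(THROUGHPUT_LABEL, [STREAMING, NARROW_THROUGHPUT])
```

In this sketch `overwritten` ends up with the throughput values, while `preserved` retains the streaming ones, which is the behavior the narrowed patterns are meant to restore.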
…nvfp4
There are no '-bench-pytorch-' labels for nemotron_3_super_120b_nvfp4 or its _mtp variant in any test list yaml, so the bench-pytorch patterns added in the previous commit were dead. Keep only the '-serve-pytorch-float' pattern, which is the one actually exercised.
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
…noise
CodeRabbit kept suggesting that new entries in tests/integration/test_lists/qa/ should be mirrored to tests/integration/test_lists/test-db/, conflating two independent test pipelines:
- qa/ -> manually-triggered QA perf/regression lists
- test-db/ -> auto-run CI test-db (per-GPU l0_*.yml tiers)
Adding/removing entries in one does not require touching the other. Add explicit path_instructions for both directories so future PRs don't get the same out-of-scope cross-sync suggestion.
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
/bot
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request.
skip
Skip testing for latest commit on pull request.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot skip --comment "only test cases modify"
PR_Github #51535 [ skip ] triggered by Bot. Commit:
PR_Github #51535 [ skip ] completed with state
Co-authored-by: tburt-nv <195370667+tburt-nv@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
/bot skip --comment "only test cases modify"
PR_Github #51761 [ skip ] triggered by Bot. Commit:
PR_Github #51761 [ skip ] completed with state
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Summary by CodeRabbit
Description
Also add nemotron_3_super_120b_nvfp4 serve test cases
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.