[None][test] Remove duplicate test cases in llm_perf_core file by yufeiwu-nv · Pull Request #14749 · NVIDIA/TensorRT-LLM

yufeiwu-nv · 2026-05-29T13:16:40Z

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

Summary by CodeRabbit

Tests
- Expanded performance test coverage with new model configurations for streaming and throughput scenarios.
- Enhanced speculative decoding support in performance benchmarking.
- Added new model variants to performance test matrix across multiple GPU tiers.
- Improved remote code trust handling for newly supported models.

Description

Also add nemotron_3_super_120b_nvfp4 serve test cases

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

…ean up waives.txt Removed outdated model paths and unnecessary entries from MODEL_PATH_DICT in test_perf.py. Updated waives.txt to reflect the removal of tests that are no longer applicable, improving clarity and maintainability. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

These 7 waivers referenced perf tests (bart_large_cnn, bert_large, flan_t5_base/large/xl/xxl, mbart_large_50_many_to_one_mmt) that no longer appear in any test-db yaml on main. Drop them to keep the cleanup consistent with the 5 sibling waivers (roberta_base, t5_*) that were already removed in this PR. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

Drop the 4 perf waivers that the PR originally added — author confirmed the underlying nvbugs (5150255 / 5304388 / 6130334) are no longer necessary to waive. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

Included additional models "nemotron_nano_12b_v2", "phi_4_multimodal_instruct", "phi_4_multimodal_instruct_fp4", and "phi_4_multimodal_instruct_fp8" to the TRUST_REMOTE_CODE_MODELS dictionary to enhance testing coverage. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

…dels in pytorch_model_config.py and update test_perf.py to include new spec-decoding models. Added configurations for streaming and throughput variants, ensuring better performance tuning. Adjusted test conditions in llm_perf_core.yml to reflect new model tests and conditions for GPU capabilities. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

coderabbitai · 2026-05-29T13:23:03Z

📝 Walkthrough

The PR updates performance testing infrastructure for the TensorRT-LLM framework by splitting nemotron model configurations into streaming and throughput variants, updating the perf test harness to handle spec-decoding models correctly, and expanding the performance test matrix with new models and GPU tier constraints across multiple hardware configurations.

Changes

Performance Configuration and Testing

| Layer / File(s) | Summary |
| --- | --- |
| Model streaming and throughput configuration split `tests/integration/defs/perf/pytorch_model_config.py` | Splits `nemotron_3_super_120b_nvfp4` and `nemotron_3_super_120b_nvfp4_mtp` into streaming/serve patterns with `enable_attention_dp=False` and smaller batch size (8), and throughput patterns with `enable_attention_dp=True` and larger batch size (256), with MTP variant supporting speculative decoding in throughput mode. |
| Serve client and model trust configuration `tests/integration/defs/perf/test_perf.py` | Adds `nemotron_nano_12b_v2`, `phi_4_multimodal_instruct` and variants to `TRUST_REMOTE_CODE_MODELS`, introduces `SPEC_DEC_MODELS` constant aggregating spec-decoding models, and updates serve-client command construction to conditionally append `--ignore-eos` only for non-spec-decoding models. |
| Performance test matrix GPU tier updates `tests/integration/test_lists/qa/llm_perf_core.yml` | Reorganizes the LLM performance test matrix with updated GPU tier headers, adjusted compute capability constraints, expanded model coverage (qwen3.5 variants, llama_v3.3 configs, deepseek_r1), increased system GPU count requirements for RTX-6000D/Server tier, and new nemotron_3_super_120b_nvfp4 serve-based test entries across multiple GPU tiers. |
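The `--ignore-eos` gating described for `test_perf.py` could look roughly like the sketch below. Only the constant name `SPEC_DEC_MODELS` and the flag itself come from the summary above; the constant's type and contents, the helper name `build_serve_client_cmd`, and the placeholder base command are assumptions made for illustration.

```python
from typing import List

# Assumed contents: the PR summary names the constant but not its members.
SPEC_DEC_MODELS = {"nemotron_3_super_120b_nvfp4_mtp"}


def build_serve_client_cmd(model_name: str, base_cmd: List[str]) -> List[str]:
    """Return a copy of base_cmd, adding --ignore-eos only for non-spec-decoding models."""
    cmd = list(base_cmd)
    if model_name not in SPEC_DEC_MODELS:
        cmd.append("--ignore-eos")
    return cmd


# Placeholder command; the real client invocation is not shown on this PR page.
base = ["<serve-client>", "--model", "<model-path>"]
plain_cmd = build_serve_client_cmd("nemotron_3_super_120b_nvfp4", base)
mtp_cmd = build_serve_client_cmd("nemotron_3_super_120b_nvfp4_mtp", base)
```

In this sketch `plain_cmd` ends with `--ignore-eos`, while `mtp_cmd` is left unchanged because the model is listed as a spec-decoding model.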

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#14570: Both PRs modify tests/integration/defs/perf/test_perf.py to expand TRUST_REMOTE_CODE_MODELS for nemotron_nano_12b_v2 and phi_4_multimodal_instruct (FP4/FP8), aligning the perf harness' trust_remote behavior for these new models.

Suggested reviewers

StanleySun639
LarryXFly
ruodil
niukuo

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title describes removing duplicate test cases in llm_perf_core, but the raw summary shows the changes actually add new test cases, update configurations, and reorganize GPU test groupings rather than simply removing duplicates. | Update the PR title to accurately reflect that the changes reorganize GPU test groupings and add new model/perf test cases (like qwen3.5 variants and nemotron_3_super_120b_nvfp4-serve) in addition to removing old ones. |
| Description check | ⚠️ Warning | The PR description is mostly empty placeholders from the template. The author checked the final PR Checklist box but provided no actual description content, test coverage explanation, or narrative justification for the changes. | Fill in the Description section explaining what duplicate test cases were removed and why, and provide specific Test Coverage details for the changes made to pytorch_model_config.py, test_perf.py, and llm_perf_core.yml. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/defs/perf/pytorch_model_config.py`:
- Around line 517-568: The throughput entries' pattern strings
('nemotron_3_super_120b_nvfp4-' and 'nemotron_3_super_120b_nvfp4_mtp') are too
broad and accidentally match streaming/low-latency labels, causing their
'config' (e.g., enable_attention_dp, cuda_graph_config.max_batch_size) to
override streaming variants; fix by making the patterns non-overlapping (for
example rename to a distinct suffix like
'nemotron_3_super_120b_nvfp4_throughput' and
'nemotron_3_super_120b_nvfp4_mtp_throughput' or use more specific anchors) so
the throughput entries in the 'patterns' lists no longer match streaming serve
labels and won't overwrite the streaming configs.

In `@tests/integration/test_lists/qa/llm_perf_core.yml`:
- Line 414: The list entry
"perf/test_perf.py::test_perf[deepseek_r1_0528_fp4-bench-pytorch-streaming-float4-maxbs:512-maxnt:5220-input_output_len:4000,2000-reqs:512-ep:8-tp:8-gpus:8]`#max_throughput`"
is malformed because the trailing "`#max_throughput`" is being treated as part of
the scalar; fix it by separating the comment from the scalar (e.g., add a space
before the #) or by quoting the entire test id string so the "#" is preserved
correctly as a comment marker or literal, updating the entry in
tests/integration/test_lists/qa/llm_perf_core.yml where that test id appears.
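In YAML, `#` begins a comment only at the start of a line or after whitespace; otherwise it stays part of an unquoted scalar. The standard-library sketch below approximates that rule for a single plain scalar. It is not a YAML parser, and the helper name and the shortened test ids are hypothetical.

```python
import re
from typing import Optional, Tuple


def split_plain_scalar(text: str) -> Tuple[str, Optional[str]]:
    """Approximate YAML's comment rule for an unquoted scalar.

    '#' starts a comment only at the beginning of the text or after whitespace.
    Returns (scalar, comment); comment is None when no comment is present.
    """
    match = re.search(r"(?:^|\s)#", text)
    if match is None:
        return text.strip(), None
    return text[:match.start()].strip(), text[match.end():].strip()


# Hypothetical, shortened test ids.
fused = split_plain_scalar("perf/test_perf.py::test_perf[model-gpus:8]#max_throughput")
spaced = split_plain_scalar("perf/test_perf.py::test_perf[model-gpus:8] #max_throughput")
```

Here `fused` keeps `#max_throughput` inside the test id, while `spaced` yields the bare test id plus a separate comment.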
- Around line 251-254: The QA perf entries for
nemotron_3_super_120b_nvfp4-serve-pytorch-float4 (and the other missing QA cases
qwen3.5_9b, qwen3.5_27b, qwen3.5_122b_a10b,
deepseek_r1_0528_fp4-bench-pytorch-streaming-float4) that appear in
tests/integration/test_lists/qa/llm_perf_core.yml are not present in the
authoritative CI test-db files under
tests/integration/test_lists/test-db/l0_perf*.yml; add equivalent entries to
those l0_perf*.yml files so the CI DB includes the perf cases referenced (e.g.,
perf/test_perf.py::test_perf[nemotron_3_super_120b_nvfp4-serve-pytorch-float4-...],
perf/test_perf.py::test_perf[qwen3.5_9b-...],
perf/test_perf.py::test_perf[qwen3.5_27b-...],
perf/test_perf.py::test_perf[qwen3.5_122b_a10b-...], and
perf/test_perf.py::test_perf[deepseek_r1_0528_fp4-bench-pytorch-streaming-float4-...])
ensuring the exact test identifiers, markers (min_latency / max_throughput) and
parameter strings are copied so CI will discover and run the same cases.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: da3dc35b-eff7-4168-87e5-b47b897eff22

📥 Commits

Reviewing files that changed from the base of the PR and between c7683f2 and e944483.

📒 Files selected for processing (3)

tests/integration/defs/perf/pytorch_model_config.py
tests/integration/defs/perf/test_perf.py
tests/integration/test_lists/qa/llm_perf_core.yml

yufeiwu-nv · 2026-06-02T06:03:20Z

Pushed af4dd69 fixing the two valid CodeRabbit issues (nemotron pattern overlap + missing space before #max_throughput). The third thread (CI test-db vs QA list) is out of scope and was resolved with explanation.

/bot run

Address two CodeRabbit-reported issues:

1. pytorch_model_config.py: throughput patterns 'nemotron_3_super_120b_nvfp4-' and 'nemotron_3_super_120b_nvfp4_mtp' were prefix-substrings of the streaming patterns 'nemotron_3_super_120b_nvfp4-serve-pytorch-streaming-' and '_mtp-serve-pytorch-streaming-'. Because the loop only `break`s the inner for-loop, every label was matched by the throughput entry too and recursive_update silently overwrote the streaming config (enable_attention_dp, cuda_graph_config.max_batch_size, ...). Narrow the throughput patterns to '-bench-pytorch-' and '-serve-pytorch-float' so they no longer match '-serve-pytorch-streaming-' labels.
2. llm_perf_core.yml line 414: '...gpus:8]#max_throughput' lacks the space before '#', so YAML parses the trailing '#max_throughput' as part of the test id rather than as a comment, leaving pytest unable to find the test. Add the missing space to match every other '#max_throughput' / '#min_latency' annotation in the file.

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
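The silent overwrite described in this commit message can be reproduced with a small sketch. The entry layout (`patterns` plus `config`), the body of `recursive_update`, and the helper `resolve_config` are assumptions; only the names `recursive_update`, `enable_attention_dp` and `cuda_graph_config.max_batch_size`, and the values 8 and 256, appear on this page.

```python
import copy
from typing import Any, Dict, List

STREAMING = {"enable_attention_dp": False, "cuda_graph_config": {"max_batch_size": 8}}
THROUGHPUT = {"enable_attention_dp": True, "cuda_graph_config": {"max_batch_size": 256}}


def recursive_update(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge override into base in place, descending into nested dicts."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            recursive_update(base[key], value)
        else:
            base[key] = value
    return base


def resolve_config(label: str, entries: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply every entry with a matching pattern; later entries win."""
    resolved: Dict[str, Any] = {}
    for entry in entries:
        for pattern in entry["patterns"]:
            if pattern in label:
                recursive_update(resolved, copy.deepcopy(entry["config"]))
                break  # leaves only the inner loop; later entries are still visited
    return resolved


STREAM_PATTERN = "nemotron_3_super_120b_nvfp4-serve-pytorch-streaming-"
BROAD = [
    {"patterns": [STREAM_PATTERN], "config": STREAMING},
    {"patterns": ["nemotron_3_super_120b_nvfp4-"], "config": THROUGHPUT},
]
NARROW = [
    {"patterns": [STREAM_PATTERN], "config": STREAMING},
    {"patterns": ["nemotron_3_super_120b_nvfp4-serve-pytorch-float"], "config": THROUGHPUT},
]

# Hypothetical, shortened streaming label.
label = "nemotron_3_super_120b_nvfp4-serve-pytorch-streaming-float4"
broad_result = resolve_config(label, BROAD)    # throughput values overwrite streaming
narrow_result = resolve_config(label, NARROW)  # streaming values survive
```

With the broad pattern the streaming label ends up with the throughput settings; narrowing the pattern leaves the streaming settings intact.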

…nvfp4 There are no '-bench-pytorch-' labels for nemotron_3_super_120b_nvfp4 or its _mtp variant in any test list yaml, so the bench-pytorch patterns added in the previous commit were dead. Keep only the '-serve-pytorch-float' pattern, which is the one actually exercised. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

…noise

CodeRabbit kept suggesting that new entries in tests/integration/test_lists/qa/ should be mirrored to tests/integration/test_lists/test-db/, conflating two independent test pipelines:

- qa/ -> manually-triggered QA perf/regression lists
- test-db/ -> auto-run CI test-db (per-GPU l0_*.yml tiers)

Adding/removing entries in one does not require touching the other. Add explicit path_instructions for both directories so future PRs don't get the same out-of-scope cross-sync suggestion.

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

yufeiwu-nv · 2026-06-02T06:33:03Z

/bot

github-actions · 2026-06-02T06:33:11Z

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.
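The wildcard behavior documented for `--stage-list` and `--extra-stage` can be illustrated with Python's `fnmatch` module. The bot's actual implementation is not shown here, so this is only a stand-in under that assumption; `select_stages` is a hypothetical helper and `B200-PerfSanity-1` is a made-up stage name.

```python
from fnmatch import fnmatchcase
from typing import List


def select_stages(stage_list_arg: str, available: List[str]) -> List[str]:
    """Split a comma-separated stage list and keep stages matching any pattern."""
    patterns = [p.strip() for p in stage_list_arg.split(",") if p.strip()]
    return [s for s in available if any(fnmatchcase(s, p) for p in patterns)]


# The first two names appear in the help text; the last one is hypothetical.
AVAILABLE = [
    "A10-PyTorch-1",
    "H100_PCIe-TensorRT-Post-Merge-1",
    "B200-PerfSanity-1",
]

selected_exact = select_stages("A10-PyTorch-1, xxx", AVAILABLE)
selected_wild = select_stages("*PerfSanity*", AVAILABLE)
```

In this sketch an exact name selects a single stage, an unknown name such as `xxx` selects nothing, and `*PerfSanity*` selects every stage whose name contains `PerfSanity`.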

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

yufeiwu-nv · 2026-06-02T06:33:39Z

/bot skip --comment "only test cases modify"

tensorrt-cicd · 2026-06-02T06:39:58Z

PR_Github #51535 [ skip ] triggered by Bot. Commit: 97de773 Link to invocation

tensorrt-cicd · 2026-06-02T06:53:31Z

PR_Github #51535 [ skip ] completed with state SUCCESS. Commit: 97de773
Skipping testing for commit 97de773

Link to invocation

Co-authored-by: tburt-nv <195370667+tburt-nv@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

yufeiwu-nv · 2026-06-03T05:29:48Z

/bot skip --comment "only test cases modify"

tensorrt-cicd · 2026-06-03T05:36:40Z

PR_Github #51761 [ skip ] triggered by Bot. Commit: dc708b7 Link to invocation

tensorrt-cicd · 2026-06-03T05:48:25Z

PR_Github #51761 [ skip ] completed with state SUCCESS. Commit: dc708b7
Skipping testing for commit dc708b7

Link to invocation

yufeiwu-nv added 8 commits May 26, 2026 07:47

Refactor model_path to ensure QA and CI have the same testing models

fea317c

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

test: remove 4 perf waivers per author confirmation

2109fc8

Drop the 4 perf waivers that the PR originally added — author confirmed the underlying nvbugs (5150255 / 5304388 / 6130334) are no longer necessary to waive. Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

Merge branch 'main' into model_dict

f3e7287

Merge branch 'main' into model_dict

8fe688b

Merge branch 'main' into model_dict

1b7b574

yufeiwu-nv requested review from a team as code owners May 29, 2026 13:16

github-actions Bot assigned yufeiwu-nv May 29, 2026

yufeiwu-nv added 2 commits May 29, 2026 13:18

Merge branch 'main' into model_dict

f3cca5e

coderabbitai Bot reviewed May 29, 2026

Comment thread tests/integration/defs/perf/pytorch_model_config.py

Comment thread tests/integration/test_lists/qa/llm_perf_core.yml

Comment thread tests/integration/test_lists/qa/llm_perf_core.yml Outdated

StanleySun639 approved these changes Jun 1, 2026

yufeiwu-nv added 2 commits June 2, 2026 13:42

Merge branch 'main' into model_dict

e74dc6b

Merge branch 'main' into model_dict

d32b70d

yufeiwu-nv added 2 commits June 2, 2026 06:05

yufeiwu-nv requested review from a team as code owners June 2, 2026 06:28

yufeiwu-nv requested review from mzweilz and niukuo June 2, 2026 06:28

yufeiwu-nv added 2 commits June 2, 2026 06:30

Merge branch 'main' into model_dict

1495761

Merge branch 'main' into model_dict

97de773

Merge branch 'main' into model_dict

4719d67

tburt-nv approved these changes Jun 2, 2026

Comment thread .coderabbit.yaml Outdated

Comment thread .coderabbit.yaml

yufeiwu-nv and others added 2 commits June 3, 2026 13:16

Merge branch 'main' into model_dict

959ce50

Update .coderabbit.yaml

dc708b7

Co-authored-by: tburt-nv <195370667+tburt-nv@users.noreply.github.com> Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

yufeiwu-nv enabled auto-merge (squash) June 3, 2026 05:30

yufeiwu-nv merged commit 8edd72e into NVIDIA:main Jun 3, 2026
8 checks passed

This was referenced Jun 4, 2026

[None][test] Decrease P1 models number and merge sanity test list into core #14952

Merged

[None][test] remove outdated model in perf test #14992

Merged
