[None][test] Decrease P1 models number and merge sanity test list into core#14952

Merged
yufeiwu-nv merged 14 commits into NVIDIA:main from yufeiwu-nv:bug
Jun 4, 2026
@yufeiwu-nv
Collaborator

@yufeiwu-nv yufeiwu-nv commented Jun 4, 2026

  • Added new performance tests for various models including qwen3 and llama_v3.1.
  • Removed the llm_perf_sanity.yml file as it is no longer needed.

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>

Summary by CodeRabbit

  • Tests
    • Updated LLM performance testing configurations across multiple GPU compute conditions
    • Refined test coverage for model variants with different optimization formats
    • Optimized test matrix to prioritize critical performance benchmarks

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026

📝 Walkthrough

This PR updates the LLM performance test configurations by revising llm_perf_core.yml test entries across multiple GPU condition sections and removing llm_perf_sanity.yml entirely. Changes consolidate llama model coverage, add new model variants (gpt_oss_20b_fp4, nemotron_nano_12b_v2), and reduce timeout-prone deepseek test configurations.
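Folding one test list into another, as the walkthrough describes, comes down to an order-preserving union of test IDs. The sketch below illustrates that idea only: the function name `merge_test_lists` and the sample entries are hypothetical, the real lists live in YAML files with GPU condition blocks, and nothing here is taken from the PR diff.

```python
from typing import Iterable, List


def merge_test_lists(core: Iterable[str], sanity: Iterable[str]) -> List[str]:
    """Return core entries followed by sanity entries not already present."""
    seen = set()
    merged = []
    for entry in list(core) + list(sanity):
        key = entry.strip()
        # Skip blank lines and anything already emitted, keeping first-seen order.
        if key and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged


# Hypothetical entries for illustration; not copied from llm_perf_core.yml.
core_tests = [
    "perf/test_perf.py::test_perf[model_a-bench-bfloat16-input_output_len:128,128]",
    "perf/test_perf.py::test_perf[model_b-bench-float8-input_output_len:512,32]",
]
sanity_tests = [
    "perf/test_perf.py::test_perf[model_b-bench-float8-input_output_len:512,32]",
    "perf/test_perf.py::test_perf[model_c-bench-float4-input_output_len:128,128]",
]

merged_tests = merge_test_lists(core_tests, sanity_tests)
for test_id in merged_tests:
    print(test_id)
```

Keeping the core entries first means the existing ordering of the core list is unchanged and only the genuinely new sanity entries are appended.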

Changes

Performance Test Configuration Updates

| Layer / File(s) | Summary |
| --- | --- |
| Common GPU baseline tests update / tests/integration/test_lists/qa/llm_perf_core.yml | Replaces the initial "All GPUs common tests" entries with revised FP8/BF16 configurations, removing prior qwen/llama variants and adding qwen3_0.6b, qwen3_4b_eagle3 streaming, and llama_v3.1_nemotron_nano_8b_fp8 entries. |
| Llama model consolidation and streamlining / tests/integration/test_lists/qa/llm_perf_core.yml | Removes llama_v3.1_8b test block from mid-range conditions, reduces llama_v3.3_70b_instruct FP8 streaming variants by removing many gpus:4 configurations and streamlining to smaller BF16/FP8 sets, and updates RTX-6000D condition entries with reduced BF16/FP4 variant set. |
| New model variants and additions / tests/integration/test_lists/qa/llm_perf_core.yml | Introduces new test entries for gpt_oss_20b_fp4 (float4 format) and adds BF16 nemotron_nano_12b_v2 variant with updated I/O length configurations. |
| Advanced GPU condition optimizations / tests/integration/test_lists/qa/llm_perf_core.yml | Updates GB200/B200/B300 condition group with streamlined FP8 streaming and FP4 entries, reduces kimi_k2_nvfp4 FP4 test coverage to fewer maxbs/config points, and removes multiple timeout-prone deepseek_v3.2_fp4 and deepseek_v3.2_fp8 input-output variants. |
| YAML formatting adjustment / tests/integration/test_lists/qa/llm_perf_core.yml | Introduces spacing adjustment within the test list. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#14749: Both PRs adjust llm_perf_core.yml to add and configure Nemotron model test coverage (e.g., nemotron_nano_12b_v2) alongside model-config handling updates for those variants.

Suggested reviewers

  • niukuo
  • StanleySun639
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description provides basic context but is incomplete. It lacks detail on the specific changes, rationale, and test coverage information required by the template. | Expand the description to include: what specific P1 models were reduced, why the sanity test list was merged into core, and detailed test coverage information for validation. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main changes: decreasing P1 models and merging sanity tests into core list. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

yufeiwu-nv and others added 3 commits June 4, 2026 09:34
…aced repo in quantization scripts

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
…IA#14925)

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
@yufeiwu-nv yufeiwu-nv removed request for a team, mzweilz, niukuo and suyoggupta June 4, 2026 09:34
xinhe-nv and others added 10 commits June 4, 2026 09:34
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
…est_run_with_different_env (NVIDIA#14939)

Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
…NVIDIA#14900)

Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
…y.yml file

- Added new performance tests for various models including qwen3 and llama_v3.1.
- Removed the llm_perf_sanity.yml file as it is no longer needed.

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
….5 8-GPU perf (NVIDIA#14613)

Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
…ter MLIR elementwise fusion (NVIDIA#14795)

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
@yufeiwu-nv
Collaborator Author

/bot skip --comment "only test list modify"

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/test_lists/qa/llm_perf_core.yml`:
- Line 28: The test matrix entry sets maxnt:2048 which is smaller than the
declared input_output_len (8000,1000) and will cause the workload to exceed the
token budget; update the perf/test_perf.py::test_perf[...] case so maxnt
(max_num_tokens) is increased to cover the larger sequence (e.g., >=8000 or
remove the explicit maxnt to rely on the default 8192) ensuring the token budget
matches input_output_len, and keep the test name/identifier intact.
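The comparison the reviewer describes can be sketched as a small check over test IDs. This is a minimal illustration, not part of the PR: the helper name `token_budget_issue` and the sample IDs are hypothetical, the `maxnt:` and `input_output_len:` field patterns are inferred from the comment above, and the 8192 default is the figure the comment quotes rather than a verified library value. Whether a smaller `maxnt` actually breaks a run may depend on runtime settings that this check does not model.

```python
import re
from typing import Optional, Tuple

# Field patterns assumed from the review comment's wording.
MAXNT_RE = re.compile(r"maxnt:(\d+)")
IO_LEN_RE = re.compile(r"input_output_len:(\d+),(\d+)")


def token_budget_issue(
    test_id: str, default_maxnt: int = 8192
) -> Optional[Tuple[int, int]]:
    """Return (maxnt, input_len) when maxnt is below the input length, else None."""
    io_match = IO_LEN_RE.search(test_id)
    if io_match is None:
        return None  # no input_output_len field to compare against
    input_len = int(io_match.group(1))
    maxnt_match = MAXNT_RE.search(test_id)
    # Fall back to the default quoted in the review comment when maxnt is absent.
    maxnt = int(maxnt_match.group(1)) if maxnt_match else default_maxnt
    return (maxnt, input_len) if maxnt < input_len else None


# Hypothetical test IDs for illustration only.
flagged = token_budget_issue(
    "perf/test_perf.py::test_perf[model_x-bench-maxnt:2048-input_output_len:8000,1000]"
)
unflagged = token_budget_issue(
    "perf/test_perf.py::test_perf[model_x-bench-input_output_len:8000,1000]"
)
print(flagged)    # (2048, 8000)
print(unflagged)  # None
```

Running a check like this over every entry in a test list would surface mismatched entries before they reach CI, which matters here because the PR was merged with testing skipped.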
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0f13f275-b6d1-4113-aeb0-8579272e1f17

📥 Commits

Reviewing files that changed from the base of the PR and between 8c39de8 and c206c0c.

📒 Files selected for processing (2)
  • tests/integration/test_lists/qa/llm_perf_core.yml
  • tests/integration/test_lists/qa/llm_perf_sanity.yml
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/qa/llm_perf_sanity.yml

Comment thread tests/integration/test_lists/qa/llm_perf_core.yml
@tensorrt-cicd
Collaborator

PR_Github #52037 [ skip ] triggered by Bot. Commit: a6d5db1 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #52037 [ skip ] completed with state SUCCESS. Commit: a6d5db1
Skipping testing for commit a6d5db1

Link to invocation

@yufeiwu-nv
Collaborator Author

/bot skip --comment "only test list modify"

@yufeiwu-nv yufeiwu-nv enabled auto-merge (squash) June 4, 2026 10:05
@tensorrt-cicd
Collaborator

PR_Github #52044 [ skip ] triggered by Bot. Commit: e95537d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #52044 [ skip ] completed with state SUCCESS. Commit: e95537d
Skipping testing for commit e95537d

Link to invocation

@yufeiwu-nv yufeiwu-nv merged commit 941c778 into NVIDIA:main Jun 4, 2026
8 checks passed

9 participants