
[#10607][fix] moved AD perf regression tests to AD jobs pre and post merge #12461

Merged
MrGeva merged 3 commits into NVIDIA:main from nv-auto-deploy:eg/perfci
Mar 30, 2026

Conversation

@MrGeva
Collaborator

@MrGeva MrGeva commented Mar 23, 2026

The AD perf regression test was in a PT BE job, so it was not triggered for AD PRs. This PR moves the test to AD's pre-merge and post-merge jobs.

Summary by CodeRabbit

  • Tests
    • Expanded performance testing coverage for B200 GPUs with new test entries for single-GPU and multi-GPU configurations.
    • Added performance sanity tests for AutoDeploy backend across different system configurations and GPU counts.
    • Reorganized test execution stages for improved testing pipeline efficiency.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough


These changes expand perf sanity test coverage for Blackwell GPUs by introducing new test entries across multiple test stages and configurations. A new single-GPU server configuration is added to support autodeploy backend testing, while test lists are updated to include corresponding post-merge test cases and GPU-specific conditions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test List Configuration: tests/integration/test_lists/test-db/l0_b200.yml, tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml | Added AutoDeploy Perf Sanity test entries to pre-merge and post-merge stages with B200/GB200 GPU filters; removed equivalent entry from multi-GPU test list to consolidate coverage. |
| Performance Configuration: tests/scripts/perf-sanity/aggregated/super_ad_blackwell.yaml | Extended GPU support from GB200 to include B200; introduced new single-GPU server configuration (super_ad_ws1_1k1k) for super_nvfp4 model with autodeploy backend and specified client workload parameters. |
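As a rough illustration of what "test entries with B200/GB200 GPU filters" in pre-merge and post-merge stages could mean in practice, here is a minimal Python sketch of condition-based test selection. It is not the project's test-db loader: the block layout, the `select_tests` helper, the stage strings, and the test IDs are all hypothetical placeholders chosen for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified test-list blocks. The GPU patterns, stage names,
# GPU-count ranges, and test IDs are placeholders, not real YAML entries.
TEST_BLOCKS = [
    {
        "condition": {"gpu": ["*b200*"], "stage": "pre_merge", "gpu_count": (1, 1)},
        "tests": ["perf_sanity::autodeploy_single_gpu"],
    },
    {
        "condition": {"gpu": ["*b200*"], "stage": "post_merge", "gpu_count": (4, 8)},
        "tests": ["perf_sanity::autodeploy_multi_gpu"],
    },
    {
        "condition": {"gpu": ["*h100*"], "stage": "pre_merge", "gpu_count": (1, 1)},
        "tests": ["unit::some_other_test"],
    },
]


def select_tests(blocks, gpu, stage, gpu_count):
    """Return tests from every block whose condition matches the runner."""
    selected = []
    gpu_name = gpu.lower()
    for block in blocks:
        cond = block["condition"]
        low, high = cond["gpu_count"]
        # A block applies when any GPU wildcard matches, the stage is equal,
        # and the GPU count falls inside the inclusive range.
        gpu_ok = any(fnmatchcase(gpu_name, pattern) for pattern in cond["gpu"])
        if gpu_ok and cond["stage"] == stage and low <= gpu_count <= high:
            selected.extend(block["tests"])
    return selected


if __name__ == "__main__":
    print(select_tests(TEST_BLOCKS, "B200", "pre_merge", 1))
    print(select_tests(TEST_BLOCKS, "GB200", "post_merge", 4))
```

Note that a wildcard such as `*b200*` matches both `b200` and `gb200`, which is one way a single filter could cover both GPU families in this sketch.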

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | PR description is minimal and lacks required sections. Only mentions what was moved but omits structured details about the change rationale, affected test cases, and verification steps. | Fill in the Description section explaining why tests were moved and the Impact section detailing affected test cases. Also complete the Test Coverage section with specific test identifiers being moved. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The PR title clearly indicates the main objective: moving AD (AutoDeploy) perf regression tests to AD jobs with pre and post merge stages, which directly matches the changeset modifications across test configuration files. |


@MrGeva MrGeva changed the title from "[None][fix] moved AD perf regression tests to AD jobs" to "[None][fix] moved AD perf regression tests to AD jobs pre and post merge" Mar 23, 2026
@MrGeva MrGeva changed the title from "[None][fix] moved AD perf regression tests to AD jobs pre and post merge" to "[#10607][fix] moved AD perf regression tests to AD jobs pre and post merge" Mar 23, 2026
@MrGeva
Collaborator Author

MrGeva commented Mar 24, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast
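For readers unfamiliar with the command above, the following Python sketch shows how the comment text breaks down into a quoted, comma-separated stage list and a boolean flag. It is only an illustration: `parse_bot_run` is a hypothetical helper written for this example and says nothing about how the CI bot actually parses comments; only the two flags that appear in this conversation are handled.

```python
import shlex


def parse_bot_run(comment):
    """Split a '/bot run' comment into its extra stages and fail-fast flag.

    Hypothetical helper for illustration; unknown tokens are ignored.
    """
    tokens = shlex.split(comment)
    if tokens[:2] != ["/bot", "run"]:
        raise ValueError("not a '/bot run' command")

    extra_stages = []
    disable_fail_fast = False
    i = 2
    while i < len(tokens):
        if tokens[i] == "--extra-stage" and i + 1 < len(tokens):
            # The quoted argument is one token holding a comma-separated list.
            extra_stages = [s.strip() for s in tokens[i + 1].split(",") if s.strip()]
            i += 2
        elif tokens[i] == "--disable-fail-fast":
            disable_fail_fast = True
            i += 1
        else:
            i += 1
    return {"extra_stages": extra_stages, "disable_fail_fast": disable_fail_fast}


if __name__ == "__main__":
    cmd = (
        '/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, '
        'DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast'
    )
    print(parse_bot_run(cmd))
```

Read this way, the comment requests two additional AutoDeploy stages (one on DGX B200, one on DGX H100, each with 4 GPUs according to the stage names) on top of the default run.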

@tensorrt-cicd
Collaborator

PR_Github #40090 [ run ] triggered by Bot. Commit: 5dca209 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40090 [ run ] completed with state SUCCESS. Commit: 5dca209
/LLM/main/L0_MergeRequest_PR pipeline #31242 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@MrGeva
Collaborator Author

MrGeva commented Mar 24, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
@MrGeva
Collaborator Author

MrGeva commented Mar 25, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@MrGeva MrGeva requested a review from galagam March 25, 2026 09:32
@MrGeva MrGeva enabled auto-merge (squash) March 25, 2026 09:35
@tensorrt-cicd
Collaborator

PR_Github #40312 [ run ] triggered by Bot. Commit: 60e3d6a Link to invocation

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #40312 [ run ] completed with state SUCCESS. Commit: 60e3d6a
/LLM/main/L0_MergeRequest_PR pipeline #31423 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@MrGeva
Collaborator Author

MrGeva commented Mar 30, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40700 [ run ] triggered by Bot. Commit: d528737 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40700 [ run ] completed with state SUCCESS. Commit: d528737
/LLM/main/L0_MergeRequest_PR pipeline #31729 completed with status: 'SUCCESS'

CI Report

Link to invocation

@MrGeva MrGeva merged commit d1a4af8 into NVIDIA:main Mar 30, 2026
5 checks passed