[TRTLLM-12365][ci] Dedup AutoDeploy unit tests on B200#13593

Merged
QiJune merged 4 commits intoNVIDIA:mainfrom
QiJune:dedup_autodeploy
May 1, 2026

Conversation

@QiJune
Collaborator

@QiJune QiJune commented Apr 29, 2026

AutoDeploy unit tests under tests/unittest/auto_deploy/{singlegpu,multigpu}/ were scheduled as full directory entries on B-tier pre-merge stages. Most of these tests are either HW-agnostic (graph transforms, model patches, RoPE/RMSNorm fusion, KV cache) or pure FP8 (which Hopper covers natively). Only the fp4_compatible() / @skip_pre_blackwell / TRTLLM-Gen tests actually need B-tier.

Replace the directory entries with explicit file lists and parametrize filters, keeping only the NVFP4 / FP4 / TRTLLM-Gen tests. Hopper (l0_h100.yml, l0_dgx_h100.yml) and A30 (l0_a30.yml) still run the full directories, so there is no coverage gap.
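As a sketch of the shape of the change in the test-db YAML (the paths and filter string below are illustrative, except test_moe_tp_shard_nvfp4, which is the real test case named in review; the actual entries live in l0_b200.yml / l0_dgx_b200.yml):

```yaml
# Before: the whole directory was scheduled on B-tier
# - unittest/auto_deploy/multigpu/transformations
# After: explicit files plus parametrize filters keeping only
# Blackwell-gated tests; removed tests are HW-agnostic or FP8-only
# and still run via l0_h100.yml / l0_dgx_h100.yml / l0_a30.yml
- unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py::test_moe_tp_shard_nvfp4
- unittest/auto_deploy/singlegpu/custom_ops/test_quantization.py -k "nvfp4"
```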

Per-PR B-tier savings (past-7d OS averages):

| Entry | Before (min) | After (min) | Saved (min) |
|---|---|---|---|
| singlegpu/smoke | 12.1 | 1.3 | 10.8 |
| singlegpu/models | 2.3 | 0 | 2.3 |
| singlegpu/transformations | 3.0 | 2.4 | 0.6 |
| singlegpu/custom_ops | 6.1 | 1.2 | 4.9 |
| multigpu/transformations | 21.9 | 1.0 | 20.9 |
| **Total** | | | **~40 min/PR** |

The multigpu/transformations line is particularly valuable given recent GB200 queue contention.

Summary by CodeRabbit

  • Tests
    • Updated automated test configurations to refine test selection for deployment pipelines, focusing on specific test scenarios including quantization and multi-GPU transformation testing.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough

Walkthrough

The changes modify two autodeploy YAML test configuration files, narrowing broad directory-level test suite selections to explicitly listed test modules and specific test cases, removing redundant coverage.

Changes

Cohort / File(s) Summary
Test Configuration Files
tests/integration/test_lists/test-db/l0_b200.yml, tests/integration/test_lists/test-db/l0_dgx_b200.yml
Narrowed test selections from broad directory-level suites to curated explicit test modules. Changes include replacing general suite references with specific test cases (custom_ops/moe, custom_ops/quantization, test_moe_tp_shard_nvfp4) and transformations library variants, with inline documentation noting removed tests are hardware-agnostic or already covered elsewhere.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the main change: deduplicating AutoDeploy unit tests on B200 hardware by removing redundant test entries. |
| Description check | ✅ Passed | The PR description comprehensively explains the issue, solution, and expected benefits with specific performance metrics, but lacks the explicit Test Coverage and PR Checklist sections from the template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate; docstring coverage check skipped. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/test_lists/test-db/l0_dgx_b200.yml`:
- Around line 325-329: The YAML currently lists only test_moe_tp_shard_nvfp4
which drops BF16 coverage; update the test-db entry to include both
test_moe_tp_shard_nvfp4 and test_moe_tp_shard_bf16 (both tests in
test_tp_sharding.py under the fp4_compatible() gate) so BF16 MoE TP sharding
remains on B200, and remove any unrelated QA-list edits in
tests/integration/test_lists/qa/ since this PR only retunes unittest scheduling.

📥 Commits

Reviewing files that changed from the base of the PR and between 0c86dda and 6c67e30.

📒 Files selected for processing (2)
  • tests/integration/test_lists/test-db/l0_b200.yml
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml

QiJune added 2 commits April 29, 2026 12:18
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Member

@lucaslie lucaslie left a comment


Thanks for tackling the B-tier dedup — the savings (~40 min/PR, especially the multigpu line) are very welcome.

I cross-checked the explicit list against every fp4_compatible() / @skip_pre_blackwell / is_sm_100f() gated test in tests/unittest/auto_deploy/, and every strictly-gated NVFP4 / FP4 / TRTLLM-Gen test is preserved by the files / parametrize filter you kept. That part of the dedup looks correct to me.

A few specific places where I think we may lose Blackwell-only signal — left as inline comments. Net summary:

  1. (blocking-ish) test_triton_paged_attention.py::TestSDPADispatch::test_large_head_dim_forces_sdpa is explicitly a regression test for a Blackwell-only Triton tl.dot shared-memory bug — Hopper coverage doesn't substitute. (See inline comment on l0_b200.yml:323.)
  2. (non-blocking) Reducing test_ad_build_small_single.py to just the FP8 variant gives up the broadest smoke-level signal we have on B200. The same file's source explicitly notes a past trtllm attention failure on B200, so smoke runs there have caught Blackwell-only kernel issues before. (See inline on l0_b200.yml:324.)
  3. (low priority / FYI) Dropping all of multigpu/transformations/library/ except test_moe_tp_shard_nvfp4 also drops test_ad_allreduce_strategies.py and test_allreduce_residual_rmsnorm_fusion.py. The kernels themselves are exercised by multigpu/custom_ops (kept), but the strategy-selection / NVLS path is not. (See inline on l0_dgx_b200.yml:328.)

Also flagging that the CodeRabbit suggestion to add test_moe_tp_shard_bf16 is incorrect — that test has no fp4_compatible() gate, it's HW-agnostic, and Hopper covers it. You're right to leave it off.

- Add TestSDPADispatch on B-tier (contains test_large_head_dim_forces_sdpa,
  a regression test for Blackwell tl.dot misaligned shared memory accesses
  that does not reproduce on Hopper).
- Broaden the smoke filter to also include Nemotron-Nano-3-30B-A3.5B-dev
  and Llama-4-Scout so the (attention x compile-backend) combinatorial
  smoke coverage on B-tier is preserved.
- Add test_allreduce_residual_rmsnorm_fusion -k "strategy_auto" on
  l0_dgx_b200 to cover AUTO -> NVLS dispatch on B-tier, which is
  structurally different from the H100 path.

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
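In test-db terms, the follow-up commit adds entries roughly of this shape (exact paths and -k strings are illustrative; the test names come from the commit message above):

```yaml
# l0_b200.yml: keep the Blackwell-only Triton tl.dot regression test
- unittest/auto_deploy/singlegpu/custom_ops/test_triton_paged_attention.py::TestSDPADispatch
# l0_dgx_b200.yml: cover the AUTO -> NVLS allreduce dispatch path,
# which is structurally different from the H100 path
- unittest/auto_deploy/multigpu/transformations/library/test_allreduce_residual_rmsnorm_fusion.py -k "strategy_auto"
```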
@suyoggupta suyoggupta requested a review from hnover-nv April 30, 2026 15:12
@suyoggupta
Collaborator

@QiJune : moving forward, is the expectation to explicitly add only the blackwell-specific tests to B200 L0? (NVFP4 / FP4 / TRTLLM-Gen)
cc: @galagam , @NVIDIA/trt-llm-torch-autodeploy-devs

Member

@lucaslie lucaslie left a comment


thank you!

@QiJune
Collaborator Author

QiJune commented May 1, 2026

/bot skip --comment "trivial changes"

@QiJune QiJune enabled auto-merge (squash) May 1, 2026 02:13
@tensorrt-cicd
Collaborator

PR_Github #46461 [ skip ] triggered by Bot. Commit: 5fd43cb

@QiJune
Collaborator Author

QiJune commented May 1, 2026

> @QiJune : moving forward, is the expectation to explicitly add only the blackwell-specific tests to B200 L0? (NVFP4 / FP4 / TRTLLM-Gen) cc: @galagam , @NVIDIA/trt-llm-torch-autodeploy-devs

Hi @suyoggupta, the short answer is yes!

A couple of guiding principles we're following:

  1. All HW-agnostic test cases should only have one copy. Because B200 currently has tighter resource constraints than H100, we are offloading these general tests from B200.
  2. If H100 and B200 use different FP8 kernel implementations for the same function, please test the kernel directly on B200, but avoid adding full E2E tests for this scenario.

For visibility, we are also actively working on deduplicating test cases (#13584, #13366), and will do more of this in the future.

@tensorrt-cicd
Collaborator

PR_Github #46461 [ skip ] completed with state SUCCESS. Commit: 5fd43cb
Skipping testing for commit 5fd43cb


@QiJune QiJune merged commit cf190ec into NVIDIA:main May 1, 2026
7 checks passed