[TRTLLM-12365][ci] Dedup AutoDeploy unit tests on B200#13593

Merged
QiJune merged 4 commits intoNVIDIA:mainfrom
QiJune:dedup_autodeploy
May 1, 2026

Conversation

@QiJune
Collaborator

@QiJune QiJune commented Apr 29, 2026

AutoDeploy unit tests under tests/unittest/auto_deploy/{singlegpu,multigpu}/ were scheduled as full directory entries on B-tier pre-merge stages. Most of these tests are either HW-agnostic (graph transforms, model patches, RoPE/RMSNorm fusion, KV cache) or pure FP8 (which Hopper covers natively). Only the fp4_compatible() / @skip_pre_blackwell / TRTLLM-Gen tests actually need B-tier.

Replace the directory entries with explicit file lists and parametrize filters, keeping only the NVFP4 / FP4 / TRTLLM-Gen tests. Hopper (l0_h100.yml, l0_dgx_h100.yml) and A30 (l0_a30.yml) still run the full directories, so there is no coverage gap.
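As a sketch of the shape of the change in the test-db YAML (the paths and filter string below are illustrative, except test_moe_tp_shard_nvfp4, which is the real test case named in review; the actual entries live in l0_b200.yml / l0_dgx_b200.yml):

```yaml
# Before: the whole directory was scheduled on B-tier
# - unittest/auto_deploy/multigpu/transformations
# After: explicit files plus parametrize filters keeping only
# Blackwell-gated tests; removed tests are HW-agnostic or FP8-only
# and still run via l0_h100.yml / l0_dgx_h100.yml / l0_a30.yml
- unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py::test_moe_tp_shard_nvfp4
- unittest/auto_deploy/singlegpu/custom_ops/test_quantization.py -k "nvfp4"
```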

Per-PR B-tier savings (past-7d OS averages):

| Entry | Before (min) | After (min) | Saved (min) |
|---|---|---|---|
| singlegpu/smoke | 12.1 | 1.3 | 10.8 |
| singlegpu/models | 2.3 | 0 | 2.3 |
| singlegpu/transformations | 3.0 | 2.4 | 0.6 |
| singlegpu/custom_ops | 6.1 | 1.2 | 4.9 |
| multigpu/transformations | 21.9 | 1.0 | 20.9 |
| **Total** | | | **~40 min/PR** |

The multigpu/transformations line is particularly valuable given recent GB200 queue contention.

Summary by CodeRabbit

  • Tests
    • Updated automated test configurations to refine test selection for deployment pipelines, focusing on specific test scenarios including quantization and multi-GPU transformation testing.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough

Walkthrough

The changes modify two autodeploy YAML test configuration files, narrowing broad directory-level test suite selections to explicitly listed test modules and specific test cases, removing redundant coverage.

Changes

Cohort / File(s) Summary
Test Configuration Files
tests/integration/test_lists/test-db/l0_b200.yml, tests/integration/test_lists/test-db/l0_dgx_b200.yml
Narrowed test selections from broad directory-level suites to curated explicit test modules. Changes include replacing general suite references with specific test cases (custom_ops/moe, custom_ops/quantization, test_moe_tp_shard_nvfp4) and transformations library variants, with inline documentation noting removed tests are hardware-agnostic or already covered elsewhere.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the main change: deduplicating AutoDeploy unit tests on B200 hardware by removing redundant test entries. |
| Description check | ✅ Passed | The PR description comprehensively explains the issue, solution, and expected benefits with specific performance metrics, but lacks the explicit Test Coverage and PR Checklist sections from the template. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate; docstring coverage check skipped. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/test_lists/test-db/l0_dgx_b200.yml`:
- Around line 325-329: The YAML currently lists only test_moe_tp_shard_nvfp4
which drops BF16 coverage; update the test-db entry to include both
test_moe_tp_shard_nvfp4 and test_moe_tp_shard_bf16 (both tests in
test_tp_sharding.py under the fp4_compatible() gate) so BF16 MoE TP sharding
remains on B200, and remove any unrelated QA-list edits in
tests/integration/test_lists/qa/ since this PR only retunes unittest scheduling.

📥 Commits

Reviewing files that changed from the base of the PR and between 0c86dda and 6c67e30.

📒 Files selected for processing (2)
  • tests/integration/test_lists/test-db/l0_b200.yml
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml

QiJune added 2 commits April 29, 2026 12:18
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Member

@lucaslie lucaslie left a comment


Thanks for tackling the B-tier dedup — the savings (~40 min/PR, especially the multigpu line) are very welcome.

I cross-checked the explicit list against every fp4_compatible() / @skip_pre_blackwell / is_sm_100f() gated test in tests/unittest/auto_deploy/, and every strictly-gated NVFP4 / FP4 / TRTLLM-Gen test is preserved by the files / parametrize filter you kept. That part of the dedup looks correct to me.

A few specific places where I think we may lose Blackwell-only signal — left as inline comments. Net summary:

  1. (blocking-ish) test_triton_paged_attention.py::TestSDPADispatch::test_large_head_dim_forces_sdpa is explicitly a regression test for a Blackwell-only Triton tl.dot shared-memory bug — Hopper coverage doesn't substitute. (See inline comment on l0_b200.yml:323.)
  2. (non-blocking) Reducing test_ad_build_small_single.py to just the FP8 variant gives up the broadest smoke-level signal we have on B200. The same file's source explicitly notes a past trtllm attention failure on B200, so smoke runs there have caught Blackwell-only kernel issues before. (See inline on l0_b200.yml:324.)
  3. (low priority / FYI) Dropping all of multigpu/transformations/library/ except test_moe_tp_shard_nvfp4 also drops test_ad_allreduce_strategies.py and test_allreduce_residual_rmsnorm_fusion.py. The kernels themselves are exercised by multigpu/custom_ops (kept), but the strategy-selection / NVLS path is not. (See inline on l0_dgx_b200.yml:328.)

Also flagging that the CodeRabbit suggestion to add test_moe_tp_shard_bf16 is incorrect — that test has no fp4_compatible() gate, it's HW-agnostic, and Hopper covers it. You're right to leave it off.

- Add TestSDPADispatch on B-tier (contains test_large_head_dim_forces_sdpa,
  a regression test for Blackwell tl.dot misaligned shared memory accesses
  that does not reproduce on Hopper).
- Broaden the smoke filter to also include Nemotron-Nano-3-30B-A3.5B-dev
  and Llama-4-Scout so the (attention x compile-backend) combinatorial
  smoke coverage on B-tier is preserved.
- Add test_allreduce_residual_rmsnorm_fusion -k "strategy_auto" on
  l0_dgx_b200 to cover AUTO -> NVLS dispatch on B-tier, which is
  structurally different from the H100 path.

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
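In test-db terms, the follow-up commit adds entries roughly of this shape (exact paths and -k strings are illustrative; the test names come from the commit message above):

```yaml
# l0_b200.yml: keep the Blackwell-only Triton tl.dot regression test
- unittest/auto_deploy/singlegpu/custom_ops/test_triton_paged_attention.py::TestSDPADispatch
# l0_dgx_b200.yml: cover the AUTO -> NVLS allreduce dispatch path,
# which is structurally different from the H100 path
- unittest/auto_deploy/multigpu/transformations/library/test_allreduce_residual_rmsnorm_fusion.py -k "strategy_auto"
```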
@suyoggupta suyoggupta requested a review from hnover-nv April 30, 2026 15:12
@suyoggupta
Collaborator

@QiJune : moving forward, is the expectation to explicitly add only the blackwell-specific tests to B200 L0? (NVFP4 / FP4 / TRTLLM-Gen)
cc: @galagam , @NVIDIA/trt-llm-torch-autodeploy-devs

Member

@lucaslie lucaslie left a comment


thank you!

@QiJune
Collaborator Author

QiJune commented May 1, 2026

/bot skip --comment "trivial changes"

@QiJune QiJune enabled auto-merge (squash) May 1, 2026 02:13
@tensorrt-cicd
Collaborator

PR_Github #46461 [ skip ] triggered by Bot. Commit: 5fd43cb

@QiJune
Collaborator Author

QiJune commented May 1, 2026

> @QiJune : moving forward, is the expectation to explicitly add only the blackwell-specific tests to B200 L0? (NVFP4 / FP4 / TRTLLM-Gen) cc: @galagam , @NVIDIA/trt-llm-torch-autodeploy-devs

Hi @suyoggupta, the short answer is yes!

A couple of guiding principles we're following:

  1. All HW-agnostic test cases should only have one copy. Because B200 currently has tighter resource constraints than H100, we are offloading these general tests from B200.
  2. If H100 and B200 use different FP8 kernel implementations for the same function, please test the kernel directly on B200, but avoid adding full E2E tests for this scenario.

For visibility, we are also actively working on deduplicating test cases (#13584, #13366), and will do more of this in the future.

@tensorrt-cicd
Collaborator

PR_Github #46461 [ skip ] completed with state SUCCESS. Commit: 5fd43cb
Skipping testing for commit 5fd43cb


@QiJune QiJune merged commit cf190ec into NVIDIA:main May 1, 2026
7 checks passed