[None][test] Fix DGX_B200 CI timeout by splitting multimodal tests an… by nv-guomingz · Pull Request #12978 · NVIDIA/TensorRT-LLM

nv-guomingz · 2026-04-13T03:59:28Z

…d correcting stale/missing test durations for pytest-split.

Summary by CodeRabbit

Tests
- Updated test duration measurements for various backend and multimodal test configurations.
- Reorganized multimodal test selection for improved test suite organization and efficiency.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

nv-guomingz · 2026-04-13T04:00:41Z

/bot run

coderabbitai · 2026-04-13T04:02:10Z

📝 Walkthrough

Updated test metadata across two configuration files: recorded revised duration measurements for additional unit/integration tests, and enumerated specific multimodal test files instead of broad directory references in the l0_b200 test suite.
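To illustrate why stale durations matter for shard balancing, here is a minimal sketch of duration-based splitting. It uses a greedy longest-first assignment, which is an assumption made for illustration and not necessarily the algorithm pytest-split or this CI uses; the test names and timings are made up.

```python
import heapq


def plan_shards(recorded, num_shards):
    """Assign tests to shards, longest first, always filling the lightest shard."""
    # Heap entries are (planned_total_seconds, shard_index, test_names).
    heap = [(0.0, i, []) for i in range(num_shards)]
    heapq.heapify(heap)
    for name, secs in sorted(recorded.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx, tests = heapq.heappop(heap)
        tests.append(name)
        heapq.heappush(heap, (total + secs, idx, tests))
    # Return the test lists ordered by shard index.
    return [tests for _, _, tests in sorted(heap, key=lambda s: s[1])]


def shard_totals(shards, durations):
    """Sum the given durations for each shard."""
    return [sum(durations[t] for t in shard) for shard in shards]


# Hypothetical recorded durations (seconds); test_c is stale.
recorded = {"test_a": 600.0, "test_b": 600.0, "test_c": 300.0, "test_d": 300.0}
# Hypothetical real runtimes: test_c actually takes 1500 s.
actual = dict(recorded, test_c=1500.0)

shards = plan_shards(recorded, 2)
planned_totals = shard_totals(shards, recorded)
actual_totals = shard_totals(shards, actual)
print("planned:", planned_totals)
print("actual: ", actual_totals)
```

In this sketch the plan looks balanced on paper, but one shard runs far longer than planned once the stale entry meets reality, which is the kind of skew that can push a shard past its timeout.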

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Duration Tracking `tests/integration/defs/.test_durations` | Added recorded durations for FlashInfer TRT-LLM collision test, MoE backend tests, and individual multimodal tests. Removed coarse-grained entries and increased durations for sampler and thop/parallel tests. |
| Test List Configuration `tests/integration/test_lists/test-db/l0_b200.yml` | Changed multimodal test selection from directory pattern to explicit enumeration of six specific test files (e.g., `test_mm_encoder_standalone.py`, `test_multimodal_runtime.py`, etc.). |
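Replacing a directory reference with per-file entries can be scripted. The sketch below is one possible way to produce such a list, not the procedure used in this PR: the helper name `enumerate_test_files` and the directory `unittest/multimodal_demo` are hypothetical, and only the two file names quoted in the summary above are taken from the source. It builds a throwaway directory so it runs anywhere.

```python
import tempfile
from pathlib import Path


def enumerate_test_files(repo_root, test_dir):
    """Return sorted, repo-relative paths of every test_*.py under test_dir."""
    base = Path(repo_root) / test_dir
    return sorted(
        p.relative_to(repo_root).as_posix() for p in base.rglob("test_*.py")
    )


# Demo against a temporary tree; the directory layout is made up.
with tempfile.TemporaryDirectory() as root:
    demo_dir = Path(root) / "unittest" / "multimodal_demo"
    demo_dir.mkdir(parents=True)
    for name in (
        "test_mm_encoder_standalone.py",
        "test_multimodal_runtime.py",
        "helper.py",  # not a test file, so it should be skipped
    ):
        (demo_dir / name).touch()
    entries = enumerate_test_files(root, "unittest/multimodal_demo")

# Print one list item per file, ready to paste into a test list.
for entry in entries:
    print(f"- {entry}")
```

Listing files individually lets each one carry its own duration entry, so the splitter can place them on different shards instead of treating the whole directory as one unit.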

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is incomplete. It contains the template structure but lacks substantive content in required sections like Description and Test Coverage. | Fill in the Description section explaining the timeout issue and solution, and the Test Coverage section listing relevant tests that safeguard these changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the main change: fixing DGX_B200 CI timeout by splitting multimodal tests and correcting test durations. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/.test_durations`:
- Around line 805-806: The .test_durations file is missing entries for the remaining test_moe_backend selectors (CUTEDSL, DEEPGEMM, DENSEGEMM), which skews pytest-split balancing. Add keys matching the existing pattern used for CUTLASS/TRTLLM:
  - "test_unittests.py::test_unittests_v2[unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend -k \"CUTEDSL\"]"
  - "test_unittests.py::test_unittests_v2[unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend -k \"DEEPGEMM\"]"
  - "test_unittests.py::test_unittests_v2[unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend -k \"DENSEGEMM\"]"

  Give each key a reasonable duration value (estimated from CUTLASS/TRTLLM or other similar tests) so pytest-split can balance shards correctly. Ensure the key formatting exactly matches the existing entries (module path, test name, and -k selector), and commit the updated .test_durations file to keep split and timeout estimates accurate.
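A check like the one this comment asks for can be automated. The following is a minimal sketch, assuming `.test_durations` is a JSON object mapping test IDs to seconds (the layout pytest-split uses). The key pattern is copied from the comment above; the helper `missing_backends`, the sample contents, and the duration values are made up for illustration.

```python
import json

# Key pattern quoted in the review comment; {backend} is the -k selector.
KEY_TEMPLATE = (
    "test_unittests.py::test_unittests_v2["
    "unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend"
    ' -k "{backend}"]'
)


def missing_backends(durations, backends):
    """Return the backends whose duration key is absent from the mapping."""
    return [b for b in backends if KEY_TEMPLATE.format(backend=b) not in durations]


# Stand-in for the file contents; the duration values are invented.
sample_text = json.dumps(
    {
        KEY_TEMPLATE.format(backend="CUTLASS"): 120.0,
        KEY_TEMPLATE.format(backend="TRTLLM"): 150.0,
    }
)
durations = json.loads(sample_text)

backends = ["CUTLASS", "TRTLLM", "CUTEDSL", "DEEPGEMM", "DENSEGEMM"]
missing = missing_backends(durations, backends)
print("missing duration entries:", missing)
```

Against the real file, `sample_text` would be replaced by the contents of `tests/integration/defs/.test_durations`, and the backend list would come from the selectors actually present in the test list.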

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 024246b5-b89d-4e3e-8640-961a8612f561

📥 Commits

Reviewing files that changed from the base of the PR and between ae84aad and cb38fa5.

📒 Files selected for processing (2)

tests/integration/defs/.test_durations
tests/integration/test_lists/test-db/l0_b200.yml

tensorrt-cicd · 2026-04-13T04:06:13Z

PR_Github #42944 [ run ] triggered by Bot. Commit: cb38fa5 Link to invocation

tensorrt-cicd · 2026-04-13T06:07:15Z

PR_Github #42944 [ run ] completed with state SUCCESS. Commit: cb38fa5
/LLM/main/L0_MergeRequest_PR pipeline #33602 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

nv-guomingz · 2026-04-13T07:03:20Z

/bot run

tensorrt-cicd · 2026-04-13T07:10:53Z

PR_Github #42978 [ run ] triggered by Bot. Commit: 845b19e Link to invocation

tensorrt-cicd · 2026-04-13T09:11:31Z

PR_Github #42978 [ run ] completed with state SUCCESS. Commit: 845b19e
/LLM/main/L0_MergeRequest_PR pipeline #33633 completed with status: 'FAILURE'

nv-guomingz · 2026-04-13T09:47:49Z

/bot run

tensorrt-cicd · 2026-04-13T09:54:40Z

PR_Github #43015 [ run ] triggered by Bot. Commit: 845b19e Link to invocation

tensorrt-cicd · 2026-04-13T12:14:29Z

PR_Github #43015 [ run ] completed with state SUCCESS. Commit: 845b19e
/LLM/main/L0_MergeRequest_PR pipeline #33664 completed with status: 'FAILURE'

nv-guomingz · 2026-04-13T14:07:37Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-13T14:13:43Z

PR_Github #43056 [ run ] triggered by Bot. Commit: 845b19e Link to invocation

tensorrt-cicd · 2026-04-13T18:34:24Z

PR_Github #43056 [ run ] completed with state SUCCESS. Commit: 845b19e
/LLM/main/L0_MergeRequest_PR pipeline #33698 completed with status: 'FAILURE'

nv-guomingz · 2026-04-14T01:47:28Z

/bot run

tensorrt-cicd · 2026-04-14T01:54:07Z

PR_Github #43138 [ run ] triggered by Bot. Commit: 845b19e Link to invocation

tensorrt-cicd · 2026-04-14T03:52:19Z

PR_Github #43138 [ run ] completed with state SUCCESS. Commit: 845b19e
/LLM/main/L0_MergeRequest_PR pipeline #33769 completed with status: 'FAILURE'

nv-guomingz · 2026-04-16T02:34:35Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-16T02:41:32Z

PR_Github #43623 [ run ] triggered by Bot. Commit: 33e864a Link to invocation

tensorrt-cicd · 2026-04-16T07:16:21Z

PR_Github #43623 [ run ] completed with state SUCCESS. Commit: 33e864a
/LLM/main/L0_MergeRequest_PR pipeline #34112 completed with status: 'FAILURE'

nv-guomingz · 2026-04-16T13:53:05Z

/bot run

tensorrt-cicd · 2026-04-16T13:59:13Z

PR_Github #43777 [ run ] triggered by Bot. Commit: 33e864a Link to invocation

tensorrt-cicd · 2026-04-16T21:18:18Z

PR_Github #43777 [ run ] completed with state FAILURE. Commit: 33e864a
/LLM/main/L0_MergeRequest_PR pipeline #34258 completed with status: 'FAILURE'

nv-guomingz · 2026-04-17T02:31:44Z

/bot run

tensorrt-cicd · 2026-04-17T02:38:24Z

PR_Github #43891 [ run ] triggered by Bot. Commit: 33e864a Link to invocation

tensorrt-cicd · 2026-04-17T02:53:43Z

PR_Github #43891 [ run ] completed with state FAILURE. Commit: 33e864a
/LLM/main/L0_MergeRequest_PR pipeline #34343 completed with status: 'FAILURE'

nv-guomingz · 2026-04-17T03:31:43Z

/bot run

tensorrt-cicd · 2026-04-17T03:37:29Z

PR_Github #43932 [ ] completed with state FAILURE. Commit: 33e864a
Not allowed on merged PR

Link to invocation

nv-guomingz · 2026-04-18T13:40:54Z

/bot run

tensorrt-cicd · 2026-04-18T13:46:33Z

PR_Github #44116 [ run ] triggered by Bot. Commit: 583f6f7 Link to invocation

tensorrt-cicd · 2026-04-18T15:10:26Z

PR_Github #44116 [ run ] completed with state SUCCESS. Commit: 583f6f7
/LLM/main/L0_MergeRequest_PR pipeline #34543 completed with status: 'FAILURE'

nv-guomingz · 2026-04-20T02:51:36Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-20T02:57:21Z

PR_Github #44267 [ run ] triggered by Bot. Commit: 583f6f7 Link to invocation

tensorrt-cicd · 2026-04-20T08:38:54Z

PR_Github #44267 [ run ] completed with state SUCCESS. Commit: 583f6f7
/LLM/main/L0_MergeRequest_PR pipeline #34686 completed with status: 'FAILURE'

nv-guomingz · 2026-04-20T10:07:10Z

/bot run

tensorrt-cicd · 2026-04-20T10:13:14Z

PR_Github #44438 [ run ] triggered by Bot. Commit: 583f6f7 Link to invocation

tensorrt-cicd · 2026-04-20T14:05:04Z

PR_Github #44438 [ run ] completed with state SUCCESS. Commit: 583f6f7
/LLM/main/L0_MergeRequest_PR pipeline #34843 completed with status: 'SUCCESS'

CI Report

Link to invocation

github-actions Bot assigned nv-guomingz Apr 13, 2026

nv-guomingz requested a review from xinhe-nv April 13, 2026 04:00

coderabbitai Bot reviewed Apr 13, 2026

Comment thread tests/integration/defs/.test_durations

xinhe-nv approved these changes Apr 13, 2026

nv-guomingz force-pushed the user/guomingz/opt_test branch from cb38fa5 to 845b19e Compare April 13, 2026 04:16

nv-guomingz enabled auto-merge (squash) April 13, 2026 04:18

nv-guomingz force-pushed the user/guomingz/opt_test branch from 845b19e to 33e864a Compare April 16, 2026 02:28

nv-guomingz closed this Apr 17, 2026

auto-merge was automatically disabled April 17, 2026 03:31
Pull request was closed

nv-guomingz reopened this Apr 18, 2026

[None][test] Fix DGX_B200 CI timeout by splitting multimodal tests an…

583f6f7

…d correcting stale/missing test durations for pytest-split. Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

nv-guomingz force-pushed the user/guomingz/opt_test branch from 33e864a to 583f6f7 Compare April 18, 2026 13:40

nv-guomingz merged commit 14539f1 into NVIDIA:main Apr 20, 2026
5 checks passed
