
[None][test] Fix DGX_B200 CI timeout by splitting multimodal tests an… #12978

Merged
nv-guomingz merged 1 commit into NVIDIA:main from nv-guomingz:user/guomingz/opt_test
Apr 20, 2026

@nv-guomingz
Collaborator

@nv-guomingz nv-guomingz commented Apr 13, 2026

…d correcting stale/missing test durations for pytest-split.

Summary by CodeRabbit

  • Tests
    • Updated test duration measurements for various backend and multimodal test configurations.
    • Reorganized multimodal test selection for improved test suite organization and efficiency.
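For context on why stale or missing durations matter: pytest-split divides tests into groups using recorded per-test durations, so an entry that underestimates a slow test can leave one group far longer than the others. The following is a minimal sketch of a greedy longest-first assignment, not pytest-split's actual implementation; the test names and all numbers are made up for illustration.

```python
import heapq


def split_by_duration(durations, groups):
    """Assign tests to groups, longest first, always into the lightest group."""
    heap = [(0.0, i) for i in range(groups)]  # (estimated load, group index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(groups)]
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        assignment[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return assignment


def group_times(assignment, actual):
    """Wall-clock time of each group given the real per-test durations."""
    return [sum(actual[name] for name in group) for group in assignment]


# Hypothetical numbers: one coarse entry is recorded as 100 s but really takes 900 s.
actual = {"multimodal_dir": 900.0, "a": 300.0, "b": 300.0, "c": 300.0}
stale = dict(actual, multimodal_dir=100.0)

stale_times = group_times(split_by_duration(stale, 2), actual)
fresh_times = group_times(split_by_duration(actual, 2), actual)
print(stale_times, fresh_times)
```

With the stale estimate, the slow entry is packed next to another test and that group runs for 1200 s; with accurate durations both groups finish in 900 s. Under these assumptions, that gap is the kind of imbalance that can push a single shard past a CI timeout.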

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@nv-guomingz
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough


Updated test metadata across two configuration files: recorded revised duration measurements for additional unit/integration tests, and enumerated specific multimodal test files instead of broad directory references in the l0_b200 test suite.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test Duration Tracking: tests/integration/defs/.test_durations | Added recorded durations for FlashInfer TRT-LLM collision test, MoE backend tests, and individual multimodal tests. Removed coarse-grained entries and increased durations for sampler and thop/parallel tests. |
| Test List Configuration: tests/integration/test_lists/test-db/l0_b200.yml | Changed multimodal test selection from directory pattern to explicit enumeration of six specific test files (e.g., test_mm_encoder_standalone.py, test_multimodal_runtime.py, etc.). |
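The move from a directory pattern to explicit files can be illustrated with a small helper that lists the `test_*.py` files under a directory, so that each file can be scheduled and timed as its own entry. This is only a sketch: the helper name, the `multimodal` directory name, and the output format are assumptions, and it does not reproduce the actual `l0_b200.yml` schema.

```python
import tempfile
from pathlib import Path


def enumerate_test_files(root, directory):
    """Return sorted, root-relative paths of test_*.py files directly under directory."""
    base = Path(root) / directory
    return sorted(p.relative_to(root).as_posix() for p in base.glob("test_*.py"))


# Demo on a throwaway tree; the "multimodal" directory name is hypothetical.
with tempfile.TemporaryDirectory() as root:
    folder = Path(root) / "multimodal"
    folder.mkdir()
    for name in ("test_mm_encoder_standalone.py", "test_multimodal_runtime.py", "helpers.py"):
        (folder / name).touch()
    entries = enumerate_test_files(root, "multimodal")

print(entries)
```

Listing files individually gives the splitter one duration per file instead of a single coarse figure for the whole directory.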

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is incomplete. It contains the template structure but lacks substantive content in required sections like Description and Test Coverage. | Fill in the Description section explaining the timeout issue and solution, and the Test Coverage section listing relevant tests that safeguard these changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the main change: fixing DGX_B200 CI timeout by splitting multimodal tests and correcting test durations. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/.test_durations`:
- Around line 805-806: The .test_durations file is missing entries for the
remaining test_moe_backend selectors (CUTEDSL, DEEPGEMM, DENSEGEMM) which skews
pytest-split balancing; add keys matching the existing pattern used for
CUTLASS/TRTLLM such as
"test_unittests.py::test_unittests_v2[unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend
-k \"CUTEDSL\"]",
"test_unittests.py::test_unittests_v2[unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend
-k \"DEEPGEMM\"]", and
"test_unittests.py::test_unittests_v2[unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend
-k \"DENSEGEMM\"]" with reasonable duration values (estimate based on
CUTLASS/TRTLLM or other similar tests) so pytest-split can balance shards
correctly. Ensure the key formatting exactly matches the existing entries
(module path, test name, and -k selector) and commit the updated .test_durations
entry to keep split/timeouts accurate.
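The check described in this comment can be sketched as a small script that builds the expected key for each backend selector and reports which ones are absent from the durations mapping. It assumes the durations file is a JSON object mapping test IDs to seconds; the helper name and the duration values are hypothetical, while the key pattern is copied from the comment above.

```python
import json

# Key pattern quoted in the review comment; only the selector varies.
KEY_TEMPLATE = (
    "test_unittests.py::test_unittests_v2["
    "unittest/_torch/modules/moe/test_moe_backend.py::test_moe_backend"
    ' -k "{selector}"]'
)


def missing_selectors(durations, selectors):
    """Return the selectors whose expected key is absent from the durations mapping."""
    return [s for s in selectors if KEY_TEMPLATE.format(selector=s) not in durations]


# Stand-in for the file contents: only CUTLASS and TRTLLM have (made-up) durations.
file_text = json.dumps({
    KEY_TEMPLATE.format(selector="CUTLASS"): 120.0,
    KEY_TEMPLATE.format(selector="TRTLLM"): 95.0,
})
durations = json.loads(file_text)

missing = missing_selectors(
    durations, ["CUTLASS", "TRTLLM", "CUTEDSL", "DEEPGEMM", "DENSEGEMM"]
)
print(missing)
```

Against a real checkout, `file_text` would instead be read from `tests/integration/defs/.test_durations`.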

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 024246b5-b89d-4e3e-8640-961a8612f561

📥 Commits

Reviewing files that changed from the base of the PR and between ae84aad and cb38fa5.

📒 Files selected for processing (2)
  • tests/integration/defs/.test_durations
  • tests/integration/test_lists/test-db/l0_b200.yml

Comment thread tests/integration/defs/.test_durations
@tensorrt-cicd
Collaborator

PR_Github #42944 [ run ] triggered by Bot. Commit: cb38fa5 Link to invocation

@nv-guomingz nv-guomingz force-pushed the user/guomingz/opt_test branch from cb38fa5 to 845b19e Compare April 13, 2026 04:16
@nv-guomingz nv-guomingz enabled auto-merge (squash) April 13, 2026 04:18
@tensorrt-cicd
Collaborator

PR_Github #42944 [ run ] completed with state SUCCESS. Commit: cb38fa5
/LLM/main/L0_MergeRequest_PR pipeline #33602 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42978 [ run ] triggered by Bot. Commit: 845b19e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42978 [ run ] completed with state SUCCESS. Commit: 845b19e
/LLM/main/L0_MergeRequest_PR pipeline #33633 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43015 [ run ] triggered by Bot. Commit: 845b19e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43015 [ run ] completed with state SUCCESS. Commit: 845b19e
/LLM/main/L0_MergeRequest_PR pipeline #33664 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43056 [ run ] triggered by Bot. Commit: 845b19e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43056 [ run ] completed with state SUCCESS. Commit: 845b19e
/LLM/main/L0_MergeRequest_PR pipeline #33698 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43138 [ run ] triggered by Bot. Commit: 845b19e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43138 [ run ] completed with state SUCCESS. Commit: 845b19e
/LLM/main/L0_MergeRequest_PR pipeline #33769 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz nv-guomingz force-pushed the user/guomingz/opt_test branch from 845b19e to 33e864a Compare April 16, 2026 02:28
@nv-guomingz
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #43623 [ run ] triggered by Bot. Commit: 33e864a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43623 [ run ] completed with state SUCCESS. Commit: 33e864a
/LLM/main/L0_MergeRequest_PR pipeline #34112 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43777 [ run ] triggered by Bot. Commit: 33e864a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43777 [ run ] completed with state FAILURE. Commit: 33e864a
/LLM/main/L0_MergeRequest_PR pipeline #34258 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #43891 [ run ] triggered by Bot. Commit: 33e864a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43891 [ run ] completed with state FAILURE. Commit: 33e864a
/LLM/main/L0_MergeRequest_PR pipeline #34343 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run

auto-merge was automatically disabled April 17, 2026 03:31

Pull request was closed

@tensorrt-cicd
Collaborator

PR_Github #43932 [ ] completed with state FAILURE. Commit: 33e864a
Not allowed on merged PR

Link to invocation

@nv-guomingz nv-guomingz reopened this Apr 18, 2026
…d correcting stale/missing test durations for pytest-split.

Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
@nv-guomingz
Collaborator Author

/bot run

@nv-guomingz nv-guomingz force-pushed the user/guomingz/opt_test branch from 33e864a to 583f6f7 Compare April 18, 2026 13:40
@tensorrt-cicd
Collaborator

PR_Github #44116 [ run ] triggered by Bot. Commit: 583f6f7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44116 [ run ] completed with state SUCCESS. Commit: 583f6f7
/LLM/main/L0_MergeRequest_PR pipeline #34543 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44267 [ run ] triggered by Bot. Commit: 583f6f7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44267 [ run ] completed with state SUCCESS. Commit: 583f6f7
/LLM/main/L0_MergeRequest_PR pipeline #34686 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@nv-guomingz
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #44438 [ run ] triggered by Bot. Commit: 583f6f7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44438 [ run ] completed with state SUCCESS. Commit: 583f6f7
/LLM/main/L0_MergeRequest_PR pipeline #34843 completed with status: 'SUCCESS'

CI Report

Link to invocation

@nv-guomingz nv-guomingz merged commit 14539f1 into NVIDIA:main Apr 20, 2026
5 checks passed

3 participants