[https://nvbugs/5911788][fix] Isolate single_gpu ray orchestrator tests to avoid CI timeouts #12616

Merged
shuyixiong merged 3 commits into NVIDIA:main from shuyixiong:user/shuyix/isolate_ray_tests
Apr 2, 2026

Conversation

@shuyixiong
Collaborator

@shuyixiong shuyixiong commented Mar 31, 2026

Summary by CodeRabbit

  • Tests
    • Updated test selection configuration for GPU testing infrastructure, refining which test cases are executed in the test suite.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@shuyixiong shuyixiong force-pushed the user/shuyix/isolate_ray_tests branch from 8a90efd to e362e6e Compare March 31, 2026 08:02
@shuyixiong
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40916 [ run ] triggered by Bot. Commit: e362e6e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40919 [ run ] triggered by Bot. Commit: e362e6e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40916 [ run ] completed with state ABORTED. Commit: e362e6e

Link to invocation

@shuyixiong
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40930 [ run ] triggered by Bot. Commit: e362e6e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40919 [ run ] completed with state ABORTED. Commit: e362e6e

Link to invocation

@shuyixiong
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40939 [ run ] triggered by Bot. Commit: 8c61bfd Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40930 [ run ] completed with state ABORTED. Commit: e362e6e

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40939 [ run ] completed with state SUCCESS. Commit: 8c61bfd
/LLM/main/L0_MergeRequest_PR pipeline #31931 completed with status: 'SUCCESS'

CI Report

Link to invocation

@shuyixiong shuyixiong marked this pull request as ready for review April 1, 2026 02:44
@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

Walkthrough

This PR updates test selection and waive configurations for single-GPU ray orchestrator tests in the integration test suite. The change replaces a directory-level test selection with explicit individual test cases and removes waive (SKIP) entries for several parameterized test variants.

Changes

| Cohort / File(s) | Summary |
| Test Selection and Waive Updates: tests/integration/test_lists/test-db/l0_h100.yml, tests/integration/test_lists/waives.txt | Updated test selection to run specific test cases instead of the entire unittest/_torch/ray_orchestrator/single_gpu directory. Removed SKIP waive entries for test_llm_partial_update_weights and test_llm_update_weights_with_quant_config cases across multiple model variants (TinyLlama, Qwen2.5, Qwen3, including FP8 variants). |
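
The selection change summarized above can be illustrated with a minimal Python sketch. It is not the PR's actual diff or the real TensorRT-LLM tooling: the function name `isolate_directory_entry`, the flat list-of-strings representation of a test list, and the file name `test_file.py` are all assumptions made for illustration. Only the directory path and the two test function names come from the summary.

```python
from typing import Iterable, List

# Directory path named in the summary above.
DIRECTORY_ENTRY = "unittest/_torch/ray_orchestrator/single_gpu"


def isolate_directory_entry(
    entries: Iterable[str], directory: str, explicit_cases: Iterable[str]
) -> List[str]:
    """Replace one directory-level entry with explicit test cases.

    Hypothetical helper: the order of unrelated entries is preserved, and
    every explicit case must live under the directory it replaces.
    """
    cases = list(explicit_cases)
    for case in cases:
        if not case.startswith(directory + "/"):
            raise ValueError(f"{case!r} is not under {directory!r}")

    result: List[str] = []
    for entry in entries:
        if entry == directory:
            # Expand the directory-level selection into individual cases.
            result.extend(cases)
        else:
            result.append(entry)
    return result


# Usage with placeholder data. "test_file.py" is a hypothetical file name;
# the two test function names are the ones mentioned in the summary.
before = ["unittest/some_other_suite", DIRECTORY_ENTRY]
after = isolate_directory_entry(
    before,
    DIRECTORY_ENTRY,
    [
        f"{DIRECTORY_ENTRY}/test_file.py::test_llm_partial_update_weights",
        f"{DIRECTORY_ENTRY}/test_file.py::test_llm_update_weights_with_quant_config",
    ],
)
```

Listing cases explicitly means each one can be scheduled, timed, and waived on its own, which is presumably the kind of isolation the PR title refers to.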

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| Description check | ⚠️ Warning | The PR description lacks substantive content in required sections: Description and Test Coverage are empty, with only the template checklist provided. | Add details to the Description section explaining why test isolation is needed to avoid CI timeouts, and document specific test cases in Test Coverage that validate the changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| Title check | ✅ Passed | The pull request title clearly indicates the main change: isolating single_gpu ray orchestrator tests to resolve CI timeout issues. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@shuyixiong shuyixiong requested a review from Superjomn April 1, 2026 02:49
@shuyixiong shuyixiong requested a review from Superjomn April 1, 2026 04:57
@shuyixiong
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41108 [ run ] triggered by Bot. Commit: da139f4 Link to invocation

Collaborator

@Superjomn Superjomn left a comment

LGTM

@tensorrt-cicd
Collaborator

PR_Github #41108 [ run ] completed with state FAILURE. Commit: da139f4
/LLM/main/L0_MergeRequest_PR pipeline #32082 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
@shuyixiong shuyixiong force-pushed the user/shuyix/isolate_ray_tests branch from da139f4 to 1178d95 Compare April 1, 2026 07:58
@shuyixiong
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41156 [ run ] triggered by Bot. Commit: 1178d95 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41156 [ run ] completed with state SUCCESS. Commit: 1178d95
/LLM/main/L0_MergeRequest_PR pipeline #32125 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@shuyixiong
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41190 [ run ] triggered by Bot. Commit: 1178d95 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41190 [ run ] completed with state SUCCESS. Commit: 1178d95
/LLM/main/L0_MergeRequest_PR pipeline #32152 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@shuyixiong
Collaborator Author

/bot run

@shuyixiong shuyixiong enabled auto-merge (squash) April 1, 2026 14:30
@tensorrt-cicd
Collaborator

PR_Github #41205 [ run ] triggered by Bot. Commit: 1178d95 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41205 [ run ] completed with state SUCCESS. Commit: 1178d95
/LLM/main/L0_MergeRequest_PR pipeline #32166 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@shuyixiong
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41347 [ run ] triggered by Bot. Commit: 1178d95 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41347 [ run ] completed with state SUCCESS. Commit: 1178d95
/LLM/main/L0_MergeRequest_PR pipeline #32293 completed with status: 'SUCCESS'

CI Report

Link to invocation

@shuyixiong shuyixiong merged commit fd09239 into NVIDIA:main Apr 2, 2026
5 checks passed
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…ts to avoid CI timeouts (NVIDIA#12616)

Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>