Skip to content

[https://nvbugs/6112508][fix] WAR for popen in QA env in disagg_test_utils#13634

Merged
crazydemo merged 2 commits intoNVIDIA:mainfrom
xwang233:fix/nvbugs-6112508-popen-war-disagg-test-utils
May 5, 2026
Merged

[https://nvbugs/6112508][fix] WAR for popen in QA env in disagg_test_utils#13634
crazydemo merged 2 commits intoNVIDIA:mainfrom
xwang233:fix/nvbugs-6112508-popen-war-disagg-test-utils

Conversation

@xwang233
Copy link
Copy Markdown
Collaborator

@xwang233 xwang233 commented Apr 30, 2026

Description

Restores a workaround that already exists in the sibling helper test_auto_scaling.py (NVBug 5821433, PRs #10989 / #11214) but was dropped when disagg_test_utils.py was created in PR #11242 — the helpers were copied over in their pre-WAR shape.

Under SWQA's pytest invocation (--capture=tee-sys), sys.stdout / sys.stderr are wrapped TextIOWrappers without a usable fileno(), so subprocess.Popen(stdout=sys.stdout, ...) raises io.UnsupportedOperation: fileno on the first worker spawn. Fix: leave stdout/stderr as None so the child inherits the parent's real fds. Identical in shape to PR #11214.

Also drops the two waives added against this NVBug in commit 4b39c38675 so QA picks the test back up after merge.

Test Coverage

QA-only failure (pytest -s locally cannot reproduce). Validation path is LLM_FUNCTION_CLUSTER_TEST qa/llm_function_stress.txt on main.

Summary by CodeRabbit

  • Tests
    • Improved subprocess stream handling for test workers.
    • Removed previously waived disaggregated stress test cases, indicating improved test stability.

…utils

Under SWQA's pytest invocation (--capture=tee-sys) sys.stdout/sys.stderr
are wrapped TextIOWrapper instances without a usable fileno(), so
subprocess.Popen(stdout=sys.stdout, stderr=sys.stderr) raises
io.UnsupportedOperation: fileno on the first worker spawn. Leaving
stdout/stderr as None lets the child inherit the parent's real fds.

Same workaround that PRs NVIDIA#10989 / NVIDIA#11214 applied to test_auto_scaling.py
for nvbugs/5821433. The WAR was dropped when disagg_test_utils.py was
created in PR NVIDIA#11242 by copying _run_worker / run_disagg_server from the
sibling file in their pre-WAR shape; this restores it.

Also drops the two waives added against this NVBug in waives.txt.

Signed-off-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>
@xwang233 xwang233 marked this pull request as ready for review April 30, 2026 03:03
@xwang233 xwang233 requested a review from a team as a code owner April 30, 2026 03:03
@xwang233 xwang233 requested a review from crazydemo April 30, 2026 03:03
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Walkthrough

Updates subprocess log/stream handling in disaggregated test utilities to leave child stdout/stderr as None instead of routing them through parent's sys.stdout/sys.stderr when log saving is disabled. Additionally, removes two previously waived disaggregated stress-test cases from the waive list.

Changes

Cohort / File(s) Summary
Subprocess Stream Handling
tests/integration/defs/disaggregated/disagg_test_utils.py
Modifies child process stdout/stderr routing: when save_log is disabled, streams are left as None to inherit real file descriptors instead of being redirected through parent's sys.stdout/sys.stderr. Removes unused sys import.
Test Waive List
tests/integration/test_lists/waives.txt
Removes two waived disaggregated stress-test parameterizations: test_disaggregated_stress_test[deepseek_r1_v2_fp4_stress] and test_disaggregated_stress_test[gpt_oss_120b_stress].

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the fix for a subprocess.Popen workaround in disagg_test_utils, referencing the specific NVBug ticket and being concise and specific.
Description check ✅ Passed The description thoroughly explains the issue (wrapped TextIOWrapper causing fileno() errors), the solution (leaving stdout/stderr as None), context from related PRs, and test coverage details, aligning well with the template's requirements.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integration/defs/disaggregated/disagg_test_utils.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the SPDX copyright year.

This file is modified in 2026, but the header still stops at 2024. Please bump it to the latest meaningful modification year so the file stays compliant.

Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/disaggregated/disagg_test_utils.py` at line 1, Update
the SPDX header year in the file's top comment: change the existing SPDX
copyright line that ends with "2022-2024" to include the current modification
year (e.g., "2022-2026") so the header reflects the file was modified in 2026;
edit the SPDX comment string at the top of
tests/integration/defs/disaggregated/disagg_test_utils.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tests/integration/defs/disaggregated/disagg_test_utils.py`:
- Line 1: Update the SPDX header year in the file's top comment: change the
existing SPDX copyright line that ends with "2022-2024" to include the current
modification year (e.g., "2022-2026") so the header reflects the file was
modified in 2026; edit the SPDX comment string at the top of
tests/integration/defs/disaggregated/disagg_test_utils.py accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6465e01c-7ca4-40de-ba85-1f6b084fbb3d

📥 Commits

Reviewing files that changed from the base of the PR and between 2bc8f7f and 3bfcc21.

📒 Files selected for processing (2)
  • tests/integration/defs/disaggregated/disagg_test_utils.py
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

@xwang233
Copy link
Copy Markdown
Collaborator Author

/bot run --skip-test

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46278 [ run ] triggered by Bot. Commit: 3bfcc21 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46278 [ run ] completed with state SUCCESS. Commit: 3bfcc21
/LLM/main/L0_MergeRequest_PR pipeline #36383 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@xwang233
Copy link
Copy Markdown
Collaborator Author

xwang233 commented May 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46667 [ run ] triggered by Bot. Commit: 3bfcc21 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46667 [ run ] completed with state SUCCESS. Commit: 3bfcc21
/LLM/main/L0_MergeRequest_PR pipeline #36708 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

…pen-war-disagg-test-utils

Signed-off-by: Xiao Wang <xiaow@nvidia.com>
Signed-off-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>

# Conflicts:
#	tests/integration/test_lists/waives.txt
@crazydemo crazydemo enabled auto-merge (squash) May 5, 2026 03:21
@xwang233
Copy link
Copy Markdown
Collaborator Author

xwang233 commented May 5, 2026

/bot run --stage-list

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46735 Bot args parsing error: usage: /bot run [--reuse-test [optionalpipeline-id]] [--disable-fail-fast]
[--skip-test] [--stage-list "A10-PyTorch-1, xxx"]
[--gpu-type "A30, H100_PCIe"] [--test-backend "pytorch, cpp"]
[--add-multi-gpu-test] [--only-multi-gpu-test]
[--disable-multi-gpu-test] [--post-merge]
[--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"]
[--debug] [--detailed-log] [--high-priority]
/bot run: error: argument --stage-list: expected one argument

Link to invocation

@crazydemo
Copy link
Copy Markdown
Collaborator

/bot run --stage-list ""

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46737 [ run ] triggered by Bot. Commit: ea15816 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46737 [ run ] completed with state SUCCESS. Commit: ea15816
/LLM/main/L0_MergeRequest_PR pipeline #36767 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@crazydemo
Copy link
Copy Markdown
Collaborator

/bot reuse-pipeline

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46746 [ reuse-pipeline ] triggered by Bot. Commit: ea15816 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #46746 [ reuse-pipeline ] completed with state SUCCESS. Commit: ea15816
Reusing PR_Github #46737 (Partly Tested) for commit ea15816

Link to invocation

@crazydemo crazydemo merged commit 50d509e into NVIDIA:main May 5, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants