Fix ToolTask hang when grandchild processes inherit pipe handles#13351

Merged: YuliiaKovalova merged 6 commits into main from dev/fix-tooltask-hang-grandchild on Mar 23, 2026

@YuliiaKovalova YuliiaKovalova commented Mar 9, 2026

Context

ToolTask can hang indefinitely when the tool it spawns creates grandchild processes that inherit stdout/stderr pipe handles. This is a long-standing issue reported in #2981 (opened 2018), with a previous fix attempt in #10297 that was reverted (#10395) because it caused output loss (#10378).

Root Cause

On .NET Framework (and .NET Core), the parameterless Process.WaitForExit() internally calls AsyncStreamReader.WaitUtilEOF(), which blocks until all write handles to the stdout/stderr pipes are closed. When a tool like cl.exe spawns a grandchild process (e.g., mspdbsrv.exe) that inherits the pipe handles, EOF is never reached even though the tool itself has exited - causing an infinite hang.

Why the previous fix (#10297) failed

PR #10297 changed proc.WaitForExit() to proc.WaitForExit(int.MaxValue). The int overload skips WaitUtilEOF(), preventing the hang. But it also skips waiting for all DataReceived callbacks to be delivered. For fast tools like command -v ls, the AsyncStreamReader hadn't delivered its output before the drain calls ran, causing ConsoleOutput to be empty.
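
The difference between the two overloads can be observed with a small sketch like the one below. It is not code from this PR: the class and method names are hypothetical, and it only counts how many stdout lines had been delivered at the moment the wait returned. With the parameterless overload the count should cover everything the tool wrote; with the int overload it may be lower for a fast tool, which is the output loss described above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Illustrative sketch only; these names are hypothetical and do not come from ToolTask.cs.
public static class WaitOverloadSketch
{
    // Starts a process with redirected stdout and returns how many output lines
    // had been delivered via OutputDataReceived when the chosen wait returned.
    public static int LinesSeenWhenWaitReturns(string fileName, string arguments, bool waitForPipeEof)
    {
        int delivered = 0;
        using (var proc = new Process())
        {
            proc.StartInfo = new ProcessStartInfo(fileName, arguments)
            {
                UseShellExecute = false,
                RedirectStandardOutput = true,
                CreateNoWindow = true,
            };

            // Data == null is the EOF sentinel; only real lines are counted.
            proc.OutputDataReceived += (sender, e) =>
            {
                if (e.Data != null)
                {
                    Interlocked.Increment(ref delivered);
                }
            };

            proc.Start();
            proc.BeginOutputReadLine();

            if (waitForPipeEof)
            {
                // Waits for the process AND for EOF on the redirected stream.
                // This is the call that can block while a grandchild holds the pipe.
                proc.WaitForExit();
            }
            else
            {
                // Waits for the process handle only; pending callbacks may not
                // have been delivered yet when this returns.
                proc.WaitForExit(int.MaxValue);
            }

            return Volatile.Read(ref delivered);
        }
    }
}
```

Calling it with `waitForPipeEof: false` against a very short-lived command is the case to watch: the returned count is not guaranteed to match what the tool printed.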

Changes Made

The fix (behind ChangeWave 18.6) uses the Data==null EOF sentinel that AsyncStreamReader sends via DataReceived when each pipe reaches EOF:

Step 1: proc.WaitForExit(int.MaxValue)

Waits for the process handle only, not pipe EOF. Since _toolExited already fired before WaitForProcessExit is called, the process is dead and this returns immediately.

Step 2: WaitHandle.WaitAll(eofEvents, 2000)

Waits for our own _standardOutputEOF / _standardErrorEOF events, which are set by ReceiveStandardErrorOrOutputData when Data==null arrives from the AsyncStreamReader.

  • Normal case (no grandchild): The pipe closes as soon as the tool exits, EOF arrives within milliseconds, and both events fire. All data, including the final partial line, is guaranteed to have been delivered. This provides identical guarantees to the original proc.WaitForExit().
  • Grandchild case: The grandchild holds the pipe open, so EOF does not arrive while it is running. The wait times out after 2 seconds and execution proceeds. The tool's line-by-line output was already delivered during the HandleToolNotifications loop.
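
The two steps above can be sketched outside of ToolTask roughly as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the class, method, and parameter names are hypothetical, and the change wave check is omitted. Only the shape is taken from the description: EOF events set when Data == null arrives, a handle-only wait, then a bounded WaitAll.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Illustrative sketch only; these names are hypothetical and do not come from ToolTask.cs.
public static class BoundedEofWaitSketch
{
    public static (List<string> Lines, bool SawEof) Run(string fileName, string arguments, int eofTimeoutMs = 2000)
    {
        var lines = new List<string>();

        // Deliberately not disposed: if a grandchild keeps the pipe open, the EOF
        // callback can arrive after this method has returned and still call Set().
        var stdoutEof = new ManualResetEvent(false);
        var stderrEof = new ManualResetEvent(false);

        using (var proc = new Process())
        {
            proc.StartInfo = new ProcessStartInfo(fileName, arguments)
            {
                UseShellExecute = false,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                CreateNoWindow = true,
            };

            proc.OutputDataReceived += (sender, e) => OnData(e.Data, stdoutEof, lines);
            proc.ErrorDataReceived += (sender, e) => OnData(e.Data, stderrEof, lines);

            proc.Start();
            proc.BeginOutputReadLine();
            proc.BeginErrorReadLine();

            // Step 1: wait for the process handle only, not for pipe EOF.
            proc.WaitForExit(int.MaxValue);

            // Step 2: bounded wait for the EOF sentinel on both streams.
            // Returns false if a timeout elapses before both events are set.
            bool sawEof = WaitHandle.WaitAll(new WaitHandle[] { stdoutEof, stderrEof }, eofTimeoutMs);

            lock (lines)
            {
                // Copy under the lock so late callbacks cannot mutate the result.
                return (new List<string>(lines), sawEof);
            }
        }
    }

    private static void OnData(string data, ManualResetEvent eof, List<string> lines)
    {
        if (data == null)
        {
            // EOF sentinel: every line for this stream has already been delivered.
            eof.Set();
            return;
        }

        lock (lines)
        {
            lines.Add(data);
        }
    }
}
```

In the normal case `SawEof` should be true almost immediately. In the grandchild case it would be false after roughly `eofTimeoutMs`, with whatever lines the tool wrote before exiting already collected.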

Why this doesn't lose data (unlike #10297)

Our _standardOutputEOF fires inside AsyncStreamReader.FlushMessageQueue(), which is called before the internal eofEvent.Set(). By the time our event fires, the AsyncStreamReader has already:

  1. Read all remaining bytes from the pipe buffer
  2. Decoded them into characters
  3. Flushed the final partial line from its StringBuilder
  4. Delivered every line (including the final one) via DataReceived callbacks

This is functionally equivalent to WaitUtilEOF - just observed from our callback instead of the internal event.

Testing

ToolTaskDoesNotHangWhenGrandchildInheritsPipeHandles

Spawns cmd.exe /c echo hello & start /b ping -n 120 127.0.0.1 > nul:

  • cmd.exe writes "hello" and exits immediately
  • ping inherits pipe handles and runs for 120 seconds
  • With fix: Test completes in ~2 seconds, "hello" is captured
  • Without fix (MSBUILDDISABLEFEATURESFROMVERSION=18.6): Test hangs until 30s timeout

ToolTaskCapturesAllOutputWithFix

Spawns cmd.exe /c echo line1 & echo line2 & echo line3 (no grandchild):

  • Verifies all three lines are captured - regression test for #10378

When a tool spawned by ToolTask creates child processes that inherit
stdout/stderr pipe handles, Process.WaitForExit() blocks forever because
the parameterless overload waits for pipe EOF via AsyncStreamReader.WaitUtilEOF().
The grandchild holds the pipe open indefinitely, causing the MSBuild node
to hang and orphan worker processes with file locks.

The fix (behind ChangeWave 18.6) replaces the parameterless WaitForExit()
with a two-step approach:
1. WaitForExit(int.MaxValue) - waits for process handle only, not pipe EOF
2. WaitAll on EOF sentinel events with bounded 2s timeout - waits for
   AsyncStreamReader to deliver Data=null (the EOF callback), which fires
   after all data including the final partial line has been flushed

This provides identical data guarantees to the original WaitForExit() in the
normal case (EOF arrives within milliseconds), while preventing infinite hangs
when grandchild processes hold pipe handles.

Fixes #2981
Copilot AI review requested due to automatic review settings March 9, 2026 21:25

Copilot AI left a comment

Pull request overview

Addresses a long-standing ToolTask hang caused by Process.WaitForExit() waiting for stdout/stderr pipe EOF when grandchild processes inherit the redirected pipe handles, by switching to a bounded EOF-drain wait observed via DataReceived EOF sentinels (behind ChangeWave 18.6).

Changes:

  • Add stdout/stderr EOF ManualResetEvents and use them to wait (bounded) for async stream-drain completion after waiting for the process handle.
  • Update the DataReceived handler to signal EOF events when Data == null.
  • Add regression tests for the hang scenario and for preserving tool output; introduce ChangeWave 18.6 and document it.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/Utilities/ToolTask.cs | Implements the Wave 18.6-guarded hang fix by decoupling the process-exit wait from the pipe-EOF wait, using EOF callbacks with a bounded timeout. |
| src/Utilities.UnitTests/ToolTask_Tests.cs | Adds regression tests for the grandchild-inherited-pipe hang and for output capture behavior. |
| src/Framework/ChangeWaves.cs | Introduces Wave18_6 and adds it to AllWaves. |
| documentation/wiki/ChangeWaves.md | Documents the new 18.6 change wave and its associated feature. |

- Update remarks to clarify hang affects both .NET Framework and modern .NET
- Shorten ping duration from 120 to 10 in test (still exceeds 2s EOF timeout)
- Use Shouldly assertions (ShouldContain) instead of engine.AssertLogContains

@adamsitnik adamsitnik left a comment

LGTM @YuliiaKovalova ! I left some comments, but they are all just nits.

@baronfel baronfel left a comment

This looks targeted, tested, and protected by a change wave. Nice work!

@YuliiaKovalova YuliiaKovalova merged commit 6c499a7 into main Mar 23, 2026
10 checks passed
@YuliiaKovalova YuliiaKovalova deleted the dev/fix-tooltask-hang-grandchild branch March 23, 2026 16:18
dfederm pushed a commit to dfederm/msbuild that referenced this pull request Apr 9, 2026
…net#13351)

Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>