Skip to content

Fix race condition: set _canceled before SignalCore in ProcessWaitState#127312

Merged
adamsitnik merged 4 commits intomainfrom
copilot/fix-process-safe-handle-test-failure
Apr 23, 2026
Merged

Fix race condition: set _canceled before SignalCore in ProcessWaitState#127312
adamsitnik merged 4 commits intomainfrom
copilot/fix-process-safe-handle-test-failure

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 23, 2026

Description

ProcessSafeHandle_WaitForExitOrKillOnCancellationAsync_KillsOnCancellation fails intermittently in CI because _canceled is set after SignalCore(PosixSignal.SIGKILL) returns. The reaping thread triggered by the kill signal can enter ChildReaped and read _canceled == false before the cancellation/timeout path writes true, producing exitStatus.Canceled == false.

Changes

  • Set _canceled = true before calling SignalCore(PosixSignal.SIGKILL) at all call sites (WaitForExitOrKillOnCancellationAsync, WaitForExitOrKillOnTimeoutCore on both Unix and Windows), so the reaping thread always observes the canceled state
  • Remove volatile from _canceled since the _gate lock acquire in ChildReaped provides the necessary memory barrier on the read side

@tmds
Copy link
Copy Markdown
Member

tmds commented Apr 23, 2026

Add Cancel(SafeProcessHandle) method to ProcessWaitState.Unix.cs that sets _canceled under _gate lock

Suggestion: remove the volatile, don't take a lock, and do this:

waitState._canceled = true;
SignalCore(PosixSignal.SIGKILL);

@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
See info in area-owners.md if you want to be subscribed.

The test ProcessSafeHandle_WaitForExitOrKillOnCancellationAsync_KillsOnCancellation
was failing intermittently because _canceled was being set outside the _gate lock,
while ChildReaped reads it under the lock to build ProcessExitStatus. This caused
a race where ChildReaped could read _canceled as false before the cancellation
callback set it to true.

Changes:
- Add Cancel() method to SafeProcessHandle (platform-specific implementations)
- On Unix, Cancel() delegates to ProcessWaitState.Cancel() which acquires _gate
  before setting _canceled, ensuring atomicity with ChildReaped
- On Windows, Cancel() wraps the existing SignalCore + Canceled property set
- Remove volatile from _canceled field since it's now protected by the lock
- Update all call sites to use Cancel() instead of directly setting Canceled

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/d23310ac-167d-4a45-a389-45918504a964

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 23, 2026 08:23
Copilot AI changed the title [WIP] Fix ProcessSafeHandle_WaitForExitOrKillOnCancellationAsync test failure in CI Fix race condition: set _canceled under _gate lock in ProcessWaitState Apr 23, 2026
Copilot AI requested a review from adamsitnik April 23, 2026 08:25
@adamsitnik adamsitnik marked this pull request as ready for review April 23, 2026 08:45
Copilot AI review requested due to automatic review settings April 23, 2026 08:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an intermittent CI failure by ensuring the “canceled” state is set atomically with process reaping on Unix, eliminating a race between the cancellation/timeout path and ChildReaped exit-status construction.

Changes:

  • Add a Unix ProcessWaitState.Cancel(SafeProcessHandle) that takes _gate before signaling and setting _canceled.
  • Introduce platform-specific SafeProcessHandle.Cancel() helpers and route cancellation/timeout call sites through them (or through the wait state on Unix).
  • Remove volatile from _canceled now that access is lock-protected.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessWaitState.Unix.cs Adds lock-protected cancel path and removes volatile on _canceled to make cancellation state consistent with ChildReaped.
src/libraries/System.Diagnostics.Process/src/Microsoft/Win32/SafeHandles/SafeProcessHandle.cs Switches cancellation callback to call the new Cancel() helper.
src/libraries/System.Diagnostics.Process/src/Microsoft/Win32/SafeHandles/SafeProcessHandle.Windows.cs Adds Cancel() helper and uses it for timeout kill path.
src/libraries/System.Diagnostics.Process/src/Microsoft/Win32/SafeHandles/SafeProcessHandle.Unix.cs Adds Cancel() helper that routes through ProcessWaitState.Cancel, and updates timeout kill path.

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/22fb6e3e-d0e3-4cf8-91b3-5b227dbbc8a4

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
@adamsitnik
Copy link
Copy Markdown
Member

Add Cancel(SafeProcessHandle) method to ProcessWaitState.Unix.cs that sets _canceled under _gate lock

Suggestion: remove the volatile, don't take a lock, and do this:


waitState._canceled = true;

SignalCore(PosixSignal.SIGKILL);

@copilot revert all the changes and implement this suggestion.

Copilot AI requested review from Copilot and removed request for Copilot April 23, 2026 11:21
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 23, 2026

Add Cancel(SafeProcessHandle) method to ProcessWaitState.Unix.cs that sets _canceled under _gate lock

...

Reverted all previous changes and implemented the simpler approach in 6bc7251: set _canceled = true before SignalCore(PosixSignal.SIGKILL) at all call sites (Unix timeout, Unix cancellation, Windows timeout, shared cancellation callback), removed volatile from _canceled. No lock needed. All 561 tests pass.

Add Cancel(SafeProcessHandle) method to ProcessWaitState.Unix.cs that sets _canceled under _gate lock

...

Reverted all previous changes and implemented the simpler approach in 6bc72519. At all 3 call sites, _canceled is now set to true before SignalCore(PosixSignal.SIGKILL), and volatile is removed. All 561 tests pass.

Copilot AI changed the title Fix race condition: set _canceled under _gate lock in ProcessWaitState Fix race condition: set _canceled before SignalCore in ProcessWaitState Apr 23, 2026
@adamsitnik
Copy link
Copy Markdown
Member

/ba-g timeouts are unrelated

@adamsitnik adamsitnik merged commit caa9958 into main Apr 23, 2026
84 of 92 checks passed
@adamsitnik adamsitnik deleted the copilot/fix-process-safe-handle-test-failure branch April 23, 2026 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ProcessSafeHandle_WaitForExitOrKillOnCancellationAsync_KillsOnCancellation failuring in CI

5 participants