Fix race condition: set _canceled before SignalCore in ProcessWaitState #127312
adamsitnik merged 4 commits into main
Suggestion: remove the volatile, don't take a lock, and do this:
Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
The test ProcessSafeHandle_WaitForExitOrKillOnCancellationAsync_KillsOnCancellation was failing intermittently because _canceled was being set outside the _gate lock, while ChildReaped reads it under the lock to build ProcessExitStatus. This caused a race where ChildReaped could read _canceled as false before the cancellation callback set it to true.

Changes:
- Add Cancel() method to SafeProcessHandle (platform-specific implementations)
- On Unix, Cancel() delegates to ProcessWaitState.Cancel(), which acquires _gate before setting _canceled, ensuring atomicity with ChildReaped
- On Windows, Cancel() wraps the existing SignalCore + Canceled property set
- Remove volatile from the _canceled field since it's now protected by the lock
- Update all call sites to use Cancel() instead of directly setting Canceled

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/d23310ac-167d-4a45-a389-45918504a964
Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Pull request overview
Fixes an intermittent CI failure by ensuring the “canceled” state is set atomically with process reaping on Unix, eliminating a race between the cancellation/timeout path and ChildReaped exit-status construction.
Changes:
- Add a Unix `ProcessWaitState.Cancel(SafeProcessHandle)` that takes `_gate` before signaling and setting `_canceled`.
- Introduce platform-specific `SafeProcessHandle.Cancel()` helpers and route cancellation/timeout call sites through them (or through the wait state on Unix).
- Remove `volatile` from `_canceled` now that access is lock-protected.
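The lock-protected cancel path described in this overview can be pictured with a small model. This is only a sketch under stated assumptions: `LockedCancelSketch`, `ReportedCanceled`, and the reaper thread started from inside `Cancel()` are hypothetical stand-ins, not the actual `ProcessWaitState` code, and the `Thread.Sleep` exists only to give the reaper time to reach the lock.

```csharp
using System;
using System.Threading;

// Hypothetical model of the lock-based approach; not the real ProcessWaitState.
sealed class LockedCancelSketch
{
    private readonly object _gate = new object();
    private bool _canceled; // guarded by _gate

    // What the reaping path observed when it built its (modeled) exit status.
    public bool ReportedCanceled { get; private set; }

    // Models ChildReaped: reads _canceled while holding _gate.
    public void ChildReaped()
    {
        lock (_gate)
        {
            ReportedCanceled = _canceled;
        }
    }

    // Models Cancel(): signal and flag update happen while _gate is held,
    // so the reaper cannot read _canceled in between.
    public void Cancel()
    {
        var reaper = new Thread(ChildReaped); // stands in for the reaping thread
        lock (_gate)
        {
            reaper.Start();      // stands in for sending the kill signal
            Thread.Sleep(50);    // the reaper is now most likely blocked on _gate
            _canceled = true;
        }
        reaper.Join();           // wait until the reaper has recorded its result
    }
}

static class Program
{
    static void Main()
    {
        var state = new LockedCancelSketch();
        state.Cancel();
        Console.WriteLine($"Canceled={state.ReportedCanceled}");
    }
}
```

Because the reaper can only enter its critical section after `Cancel()` releases `_gate`, it always sees the flag already set in this model.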
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/libraries/System.Diagnostics.Process/src/System/Diagnostics/ProcessWaitState.Unix.cs | Adds lock-protected cancel path and removes volatile on _canceled to make cancellation state consistent with ChildReaped. |
| src/libraries/System.Diagnostics.Process/src/Microsoft/Win32/SafeHandles/SafeProcessHandle.cs | Switches cancellation callback to call the new Cancel() helper. |
| src/libraries/System.Diagnostics.Process/src/Microsoft/Win32/SafeHandles/SafeProcessHandle.Windows.cs | Adds Cancel() helper and uses it for timeout kill path. |
| src/libraries/System.Diagnostics.Process/src/Microsoft/Win32/SafeHandles/SafeProcessHandle.Unix.cs | Adds Cancel() helper that routes through ProcessWaitState.Cancel, and updates timeout kill path. |
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/22fb6e3e-d0e3-4cf8-91b3-5b227dbbc8a4 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
@copilot revert all the changes and implement this suggestion.
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/e3d7eda5-6eea-46ce-8caf-8dbe57c1b899 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Reverted all previous changes and implemented the simpler approach in 6bc7251: set
/ba-g timeouts are unrelated
Description
`ProcessSafeHandle_WaitForExitOrKillOnCancellationAsync_KillsOnCancellation` fails intermittently in CI because `_canceled` is set after `SignalCore(PosixSignal.SIGKILL)` returns. The reaping thread triggered by the kill signal can enter `ChildReaped` and read `_canceled == false` before the cancellation/timeout path writes `true`, producing `exitStatus.Canceled == false`.
Changes
- Set `_canceled = true` before calling `SignalCore(PosixSignal.SIGKILL)` at all call sites (`WaitForExitOrKillOnCancellationAsync`, `WaitForExitOrKillOnTimeoutCore` on both Unix and Windows), so the reaping thread always observes the canceled state
- Remove `volatile` from `_canceled` since the `_gate` lock acquire in `ChildReaped` provides the necessary memory barrier on the read side
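The ordering issue can be reproduced deterministically in a small model. The sketch below is an illustration under stated assumptions, not the code from this PR: `WaitStateSketch`, `SignalKill`, and `ReportedCanceled` are hypothetical names, and the reaper is invoked synchronously from the signal call to force the worst-case interleaving that CI only hits occasionally.

```csharp
using System;

// Hypothetical stand-in for the wait state; not the real ProcessWaitState.
sealed class WaitStateSketch
{
    private readonly object _gate = new object();
    private bool _canceled;

    // What the reaping path saw when it built its (modeled) exit status.
    public bool? ReportedCanceled { get; private set; }

    // Models ChildReaped: reads _canceled under the lock.
    public void ChildReaped()
    {
        lock (_gate)
        {
            ReportedCanceled = _canceled;
        }
    }

    // Models SignalCore(PosixSignal.SIGKILL). Here the reaper runs before the
    // call returns, which is the interleaving that exposes the race.
    private void SignalKill() => ChildReaped();

    // Old order: signal first, then set the flag.
    public void CancelFlagAfterSignal()
    {
        SignalKill();
        _canceled = true;
    }

    // New order: set the flag first, then signal.
    public void CancelFlagBeforeSignal()
    {
        _canceled = true;
        SignalKill();
    }
}

static class Program
{
    static void Main()
    {
        var oldOrder = new WaitStateSketch();
        oldOrder.CancelFlagAfterSignal();
        Console.WriteLine($"flag after signal:  Canceled={oldOrder.ReportedCanceled}");

        var newOrder = new WaitStateSketch();
        newOrder.CancelFlagBeforeSignal();
        Console.WriteLine($"flag before signal: Canceled={newOrder.ReportedCanceled}");
    }
}
```

In the real code the reaper runs on a different thread, so the old order fails only when the reaper wins the race; writing the flag before the signal removes that window without holding a lock on the write side.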