Fixes some theoretical races and retentions in AsyncOperation and ManualResetValueTaskSource#127256
Fixes some theoretical races and retentions in AsyncOperation and ManualResetValueTaskSource#127256VSadov wants to merge 4 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @VSadov |
There was a problem hiding this comment.
Pull request overview
This PR updates the completion/continuation signaling mechanics for System.Threading.Channels.AsyncOperation and ManualResetValueTaskSourceCore<TResult> to reduce theoretical races and retained-state rooting by making completion primarily signaled via a sentinel and clearing captured state more aggressively.
Changes:
- Switch completion detection to use a volatile read of the continuation sentinel.
- Refactor
OnCompletedrace handling to queue continuations directly and clear stored context/state earlier. - Add
ClearRetainedStatehooks for pooled async operations to reduce retained references.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs | Adjusts completion detection/OnCompleted race path and introduces retained-state clearing for pooled operations. |
| src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Sources/ManualResetValueTaskSourceCore.cs | Removes _completed flag, uses sentinel-based completion, and refactors OnCompleted/SignalCompletion to align with the new completion model. |
Comments suppressed due to low confidence (1)
src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs:283
- SetCompletionAndInvokeContinuation sets
_continuation = s_completedSentinelvia a normal write, while completion is now observed viaVolatile.Read(ref _continuation)in IsCompleted/GetStatus. To avoid weak-memory visibility issues (e.g., other threads seeing completion without reliably seeing prior _error/_result writes, or seeing a stale non-sentinel value), set the sentinel using a release operation (Volatile.Write or an interlocked write) in both the non-EC and EC paths.
Action<object?> c = _continuation!;
_continuation = s_completedSentinel;
object? state = _continuationState;
_continuationState = null;
c(state);
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Threading.Channels/src/System/Threading/Channels/AsyncOperation.cs:283
IsCompletednow usesVolatile.Read(ref _continuation), but_continuationis set tos_completedSentinelvia a plain write inSetCompletionAndInvokeContinuation. Consider usingVolatile.Write(orInterlocked.Exchange) when publishing the completion sentinel so other threads that observe completion viaIsCompletedreliably see prior writes (e.g.,_error/ result) and don't observe stale state.
object? ctx = _capturedContext;
_capturedContext = null;
ExecutionContext? ec =
ctx is null ? null :
ctx as ExecutionContext ??
(ctx as CapturedSchedulerAndExecutionContext)?._executionContext;
if (ec is null)
{
Action<object?> c = _continuation!;
_continuation = s_completedSentinel;
object? state = _continuationState;
_continuationState = null;
c(state);
| _continuation = null; | ||
| Debug.Assert(_continuationState == null); | ||
| Debug.Assert(_capturedContext == null); | ||
| _continuationState = null; |
There was a problem hiding this comment.
I am still clearing _continuationState and _capturedContext here, not relying on assertions since the type is public.
The assertions are there to catch internal misuse.
|
|
||
| Action<object?>? continuation = | ||
| Volatile.Read(ref _continuation) ?? | ||
| Interlocked.CompareExchange(ref _continuation, ManualResetValueTaskSourceCoreShared.s_sentinel, null); |
There was a problem hiding this comment.
Here the MRVTS becomes officially completed. That is - unless it already has a continuation, in such case it becomes completed a few lines later, right before we deal with continuation.
There is no need for _completed field.
There was a problem hiding this comment.
This is the same pattern as used in AsyncOperation
| @@ -139,7 +139,7 @@ private CancellationToken CancellationToken | |||
| #endif | |||
|
|
|||
| /// <summary>Gets whether the operation has completed.</summary> | |||
| internal bool IsCompleted => ReferenceEquals(_continuation, s_completedSentinel); | |||
| internal bool IsCompleted => ReferenceEquals(Volatile.Read(ref _continuation), s_completedSentinel); | |||
There was a problem hiding this comment.
Performing an ordinary read could allow prefetching _error and we could end up assuming that a failed/cancelled operation actually cleanly completed.
|
I think this is ready for a review. Wasm failures appear unrelated. |
| // and this write is after that. | ||
| _continuation = ManualResetValueTaskSourceCoreShared.s_sentinel; |
There was a problem hiding this comment.
In SignalCompletion, _continuation = ManualResetValueTaskSourceCoreShared.s_sentinel; is a non-volatile write. Since IsCompleted uses Volatile.Read(ref _continuation) to publish/observe completion, this should be a releasing write (e.g., Volatile.Write or an interlocked exchange) to ensure other threads can't observe completion before _result / _error are visible (especially on weak memory models like ARM64).
| // and this write is after that. | |
| _continuation = ManualResetValueTaskSourceCoreShared.s_sentinel; | |
| // and this write is after that. Use a volatile write so observers that | |
| // use Volatile.Read(ref _continuation) can't observe completion before | |
| // the preceding writes are visible. | |
| Volatile.Write(ref _continuation, ManualResetValueTaskSourceCoreShared.s_sentinel); |
There was a problem hiding this comment.
As explained in the comments the interlocked operation that stores continuation already orders this write the way we want.
Possibly fixes: #127234
AsyncOperationin completed state, very rarely on ARM64. Seems like an ordering issue, since it is on ARM64.I see one case where it seems we miss a fence. Not sure if it is related. The actual failure is very hard to repro on purpose.
Possibly fixes: #126735
While logically runtime async captures the same state, there are some implementation details like in the
{_continuation, _continuationState}tuple, the continuation may retain less, but state may retain more, compared to regular async. There are code paths that clear continuation, but do not clear the state, thus can observe the difference.We should fix that regardless, even if #126735 is caused by something else.
The main idea is that continuation/state are needed only until the continuation runs. Retaining longer than that in poolable/reusable containers (i.e. until container is reused) can potentially lead to observable issues like last used connection not disposing/closing if no further activity in the app.
Same applies to the captured context - as soon as we do not need it, it would not hurt to clear fields holding to the context. No problem was observed due to this, but just in case...
This PR also makes AsyncOperation and ManualResetValueTaskSource to be slightly more similar. Mostly by using the completion sentinel as the only way to signal completion + some minor tweaks, cross-copying comments,...