Fix WebSocket close single-consumer violation #127183

Open
cittaz wants to merge 1 commit into dotnet:main from cittaz:fix/websocket-http2-close-receive-race

@cittaz cittaz commented Apr 20, 2026

Summary

Fixes #121157 (and likely the related #117267, which tracks the same assertion).

Adds a CAS guard so WaitForServerToCloseConnectionAsync runs at most once per WebSocket, eliminating a race that causes Http2Stream.TryReadFromBuffer's Debug.Assert(!_hasWaiter) to fire under concurrent CloseAsync + pending ReceiveAsync scenarios over HTTP/2 (RFC 8441).

Root cause

WaitForServerToCloseConnectionAsync has two call sites, both issuing _stream.ReadAsync for the RFC 6455 §7.1.1 "wait for server TCP close" step:

  1. HandleReceivedCloseAsync (line 1122) — inside the receive loop; holds _receiveMutex. Gated on !_isServer && _sentCloseFrame.
  2. SendCloseFrameAsync (line 1566) — after sending the close frame; does not hold _receiveMutex. Gated on !_isServer && _receivedCloseFrame.

When a user has a pending ReceiveAsync and concurrently calls CloseAsync, the two flags flip near-simultaneously. Both gates pass, both paths call WaitForServerToCloseConnectionAsync, both issue a read on the underlying stream.
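The interleaving described above can be replayed deterministically. This is a minimal sketch and not the real ManagedWebSocket code: the class CloseFlagsSketch and all of its members are hypothetical, and it assumes each path records its own close flag before testing the other path's flag.

```csharp
using System;

// Hypothetical stand-in for the two close flags and the two gates.
// Assumption: each path sets its own flag before checking the other one.
sealed class CloseFlagsSketch
{
    private bool _sentCloseFrame;
    private bool _receivedCloseFrame;

    // Counts how many times the "wait for server close" read would be issued.
    public int WaitCalls { get; private set; }

    // Receive path, step 1: a close frame arrived from the server.
    public void MarkCloseReceived() => _receivedCloseFrame = true;

    // Send path, step 1: our close frame went out.
    public void MarkCloseSent() => _sentCloseFrame = true;

    // Receive path, step 2: gate modeled on HandleReceivedCloseAsync.
    public void ReceivePathGate()
    {
        if (_sentCloseFrame)
        {
            WaitCalls++;
        }
    }

    // Send path, step 2: gate modeled on SendCloseFrameAsync.
    public void SendPathGate()
    {
        if (_receivedCloseFrame)
        {
            WaitCalls++;
        }
    }

    // Replays the problematic order: both flags flip, then both gates run.
    public static int ReplayRace()
    {
        var sketch = new CloseFlagsSketch();
        sketch.MarkCloseReceived();
        sketch.MarkCloseSent();
        sketch.ReceivePathGate();
        sketch.SendPathGate();
        return sketch.WaitCalls;
    }
}
```

Calling CloseFlagsSketch.ReplayRace() returns 2 in this model: each gate sees the other path's flag already set, so the wait is entered twice. If either path completes both of its steps before the other starts, only one gate passes.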

Over HTTP/1.1 the behavior is merely wrong (two consumers racing for the same socket bytes) and was historically tolerated by the error-handling paths. Over HTTP/2 (Http2Stream.Http2ReadWriteStream), the stream enforces a single-consumer invariant that trips Debug.Assert(!_hasWaiter) in TryReadFromBuffer, crashing the test process.

The scenario is explicitly listed as supported in the class-level thread-safety contract:

/// - It's acceptable to have a pending ReceiveAsync while
///   CloseOutputAsync or CloseAsync is called.

Fix

Guard WaitForServerToCloseConnectionAsync at its top with an interlocked compare-exchange on a new _waitedForServerClose flag. The first caller flips the flag and runs the wait; any subsequent caller sees the flag set and returns immediately.
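A run-once guard of this kind might look roughly like the sketch below. It is illustrative only and is not the diff in this PR: the class CloseWaitGuardSketch and the WaitsPerformed counter are hypothetical, the flag is assumed to be an int (0 = not run, 1 = claimed), and Task.Yield stands in for the real read on the underlying stream.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration of a compare-exchange run-once guard.
sealed class CloseWaitGuardSketch
{
    // 0 = wait not yet started, 1 = a caller has claimed the wait.
    private int _waitedForServerClose;

    // Test hook (hypothetical): how many callers actually ran the wait body.
    public int WaitsPerformed;

    public async Task WaitForServerToCloseConnectionAsync()
    {
        // Atomically flip 0 -> 1. CompareExchange returns the previous value,
        // so only the first caller observes 0 and proceeds.
        if (Interlocked.CompareExchange(ref _waitedForServerClose, 1, 0) != 0)
        {
            return; // Another caller already owns the wait; do nothing.
        }

        Interlocked.Increment(ref WaitsPerformed);

        // Stand-in for the single read that waits for the server to close.
        await Task.Yield();
    }
}
```

However many callers invoke the method concurrently, only one reaches the body, so at most one read is ever outstanding. The guard takes no lock, which is why it adds no lock-ordering concerns.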

Behavior preserved

  • RFC 6455 §7.1.1 "client waits for server TCP close" still runs exactly once for every WebSocket close handshake.
  • No public API or observable state-machine change.

Test-side comment clarification

The test that exercises this contract — RunClient_CloseAsync_DuringConcurrentReceiveAsync_ExpectedStates in CloseTest.cs — had a long-stale comment that misattributed where OperationCanceledException comes from in the abort branch. Updated the comment to accurately describe the outcomes and where the exception originates (ReceiveAsyncPrivate's catch block translates a stream exception to OperationCanceledException when _state == Aborted).

No test logic is changed.

Note for reviewers about work item visibility

I'm a community contributor and do not have access to the work-item logs for #121157 / #117267. I tried to identify the failing test, the race condition, and the fix by reading the code:

  • The stack trace in #121157 (System.Net.WebSockets.Client.Tests: Assertion failed: !_hasWaiter) narrows the failure to WaitForServerToCloseConnectionAsync running under Http2Stream.
  • Combined with the "HTTP/2 WebSocket tests under Helix" context and the class-level thread-safety doc, the test CloseAsync_DuringConcurrentReceiveAsync_ExpectedStates under the CloseTest_*_Http2Loopback classes matches exactly.

If a reviewer can confirm from the actual work-item logs that this test is the one triggering the assertion, I'll reference the run in this PR.

Local verification

  • Full libraries build (./build.cmd -subset libs -c Release): clean.
  • System.Net.WebSockets.Client.Tests build: clean.
  • Targeted test runs:
    • CloseTest_Invoker_Http2Loopback.CloseAsync_DuringConcurrentReceiveAsync_ExpectedStates (both useSsl values)
    • CloseTest_HttpClient_Http2Loopback.CloseAsync_DuringConcurrentReceiveAsync_ExpectedStates (both useSsl values)
    • Full CloseTest* regression run (no new failures)

When a user has a pending ReceiveAsync and calls CloseAsync over an
HTTP/2 WebSocket (RFC 8441), two code paths could both reach
WaitForServerToCloseConnectionAsync and each issue a ReadAsync on the
underlying Http2Stream:

 1. HandleReceivedCloseAsync (line 1122) — inside the receive loop;
    holds _receiveMutex.
 2. SendCloseFrameAsync (line 1566) — after sending the close frame;
    does not hold _receiveMutex.

Both paths gate on _sentCloseFrame and _receivedCloseFrame respectively.
In rare concurrent Close + Receive scenarios the two flags are set
near-simultaneously, both checks pass, and both paths try to read from
the stream. Http2Stream enforces a single-consumer invariant and trips
Debug.Assert(!_hasWaiter), crashing the test process.

The scenario is explicitly listed as supported in the class-level
thread-safety contract:

    /// - It's acceptable to have a pending ReceiveAsync while
    ///   CloseOutputAsync or CloseAsync is called.

Guard WaitForServerToCloseConnectionAsync with an Interlocked flag so
that only the first caller performs the wait; the other is a no-op.
This preserves RFC 6455 section 7.1.1 behavior (the wait still runs
exactly once), introduces no lock acquisition, and has no reentrancy
or lock-order implications.

Also clarify the long-stale comment in the test that exercises this
contract — the original attributed the OperationCanceledException path
to CloseAsync "receiving" the close frame, which is imprecise. The
exception originates from ReceiveAsyncPrivate's catch block when the
socket becomes Aborted during the close handshake (e.g. a 1s timeout
in WaitForServerToCloseConnectionAsync triggers Abort).

dotnet-policy-service (bot) added the community-contribution label on Apr 20, 2026.

@dotnet-policy-service commented:

Tagging subscribers to this area: @karelz, @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Labels

area-System.Net community-contribution Indicates that the PR has been added by a community member

Development

Successfully merging this pull request may close these issues.

System.Net.WebSockets.Client.Tests: Assertion failed: !_hasWaiter

2 participants