ASP.NET Core has tests where the client reads response data at an artificially slow rate, but still above the configured minimum rate enforced by the Kestrel HTTP server. These tests started becoming flaky on Linux, with the client observing a “Connection reset by peer” SocketException, when we updated the AspNetCore repo to depend on .NET Core 5.0.
In these tests, Kestrel calls Socket.Shutdown(SocketShutdown.Both) and then Socket.Dispose() immediately after the last Socket.SendAsync() Task completes. There isn’t any special LingerState or anything like that. I know the standard way to close a socket gracefully is to shut down the sending side, wait to receive a FIN (a 0-length read, with a timeout), and only then dispose the socket. However, this is the logic we’ve had in the Sockets transport since 2.0 and the libuv transport since 1.0; these tests weren’t flaky before, and they still aren’t flaky on Windows or macOS.
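For reference, here's the graceful close sequence described above, sketched in Python rather than C# (the socket calls map one-to-one; the `serve` helper and the payload are just for the sketch):

```python
import socket
import threading

def serve(listener):
    """Send a response, then perform a graceful close."""
    conn, _ = listener.accept()
    conn.sendall(b"response data")
    conn.shutdown(socket.SHUT_WR)   # close the sending side (queues a FIN)
    conn.settimeout(5)              # don't wait for the peer forever
    while conn.recv(4096):          # drain until the peer's FIN (b"")
        pass
    conn.close()                    # now it's safe to dispose

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve, args=(listener,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
data = b""
while True:
    chunk = client.recv(4096)
    if not chunk:                   # server's FIN
        break
    data += chunk
client.close()                      # our FIN unblocks the server's drain loop
print(data)
```

Kestrel's pattern skips the drain loop: it calls Shutdown(Both) and Dispose immediately after the last send completes, relying on the kernel to deliver the queued data and FIN.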
The PR (dotnet/aspnetcore#13532) in which I cleaned up the flaky tests a couple of days after taking the .NET Core 5.0 dependency goes into more detail about them. @jkotalik looked through changes made to Sockets after 3.0 that might explain this regression, and he found dotnet/corefx#38804, a PR titled “Socket: improve cross-platform behavior on Dispose.” I agree that this PR looks pretty suspicious.
I tried creating a minimal repro for this issue without Kestrel or any testing infrastructure, but so far I haven’t been successful. I thought that simply reading response data slowly from a Socket that was already shut down and disposed by the peer would be sufficient, but apparently there’s something more to this regression than I realize. Here’s a gist with my minimal repro attempt (that doesn’t repro yet).
Although you've called Send and Shutdown in the server, the data may still be in the kernel send buffer (so the peer hasn't observed a FIN close). Then when we call Disconnect, the data gets thrown away and the peer immediately sees a RST close.
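Python doesn't expose an equivalent of Socket.Disconnect, but SO_LINGER with a zero timeout produces the same abortive close: data still in flight is thrown away and the peer sees a RST instead of a FIN. A sketch of that failure mode, assuming Linux semantics (where a received RST poisons the connection for the reader):

```python
import socket
import struct
import threading
import time

def serve(listener):
    conn, _ = listener.accept()
    conn.sendall(b"x" * 65536)
    # Abortive close: SO_LINGER with a zero timeout makes close()
    # discard unsent data and emit a RST instead of a FIN.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    conn.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve, args=(listener,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
time.sleep(0.2)                    # let the RST arrive before we read
try:
    while client.recv(4096):       # on Linux this eventually fails:
        pass                       # the RST kills the connection
    result = "clean FIN"
except ConnectionResetError:
    result = "Connection reset by peer"
print(result)
```

This matches what the flaky tests observe: a slow reader hits the reset instead of reading the remaining response.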
The fix is to call Shutdown instead of Disconnect in TryUnblockSocket in case the user has already made an explicit call to Shutdown for the Send/Both end.
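The difference is that shutdown() queues a FIN behind whatever is still in the kernel send buffer, so nothing is discarded. Swapping the abortive close for shutdown-then-close lets even a deliberately slow reader receive every byte (again a Python sketch of the behavior, not the actual TryUnblockSocket code; the payload size is arbitrary):

```python
import socket
import threading
import time

PAYLOAD = 32768

def serve(listener):
    conn, _ = listener.accept()
    conn.sendall(b"y" * PAYLOAD)
    # Shutdown both ends, then close immediately (Kestrel's pattern).
    # The FIN is queued *behind* the buffered data, so the peer sees
    # all of it followed by a clean EOF -- no RST.
    conn.shutdown(socket.SHUT_RDWR)
    conn.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=serve, args=(listener,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
total = 0
while True:
    chunk = client.recv(4096)
    if not chunk:                  # clean EOF from the server's FIN
        break
    total += len(chunk)
    time.sleep(0.01)               # read deliberately slowly
client.close()
print(total)
```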
I think so. In the test, the client is still receiving data for several seconds after the server calls Socket.Shutdown(SocketShutdown.Both). Then again, so does the client in my gist, which I still haven't seen repro the issue.
@tmds @stephentoub