Possible Socket Regression in .NET Core 5.0 on Linux #30895

Closed
halter73 opened this issue Sep 18, 2019 · 4 comments · Fixed by dotnet/corefx#41250

Comments

@halter73
Member

ASP.NET Core has tests where the client reads response data at an artificially slow rate, but above the configured minimum rate enforced by the Kestrel HTTP server. These tests started becoming flaky, with the client observing a “Connection reset by peer” SocketException on Linux, when we updated the AspNetCore repo to depend on .NET Core 5.0.
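
(For context, Kestrel's enforced minimum is configured via `Limits.MinResponseDataRate`; the setup and rate values below are illustrative, not the ones the tests actually use.)

```csharp
using System;
using Microsoft.AspNetCore.Server.Kestrel.Core;

// Inside the usual IWebHostBuilder setup; the numbers here are hypothetical.
webBuilder.ConfigureKestrel(options =>
{
    options.Limits.MinResponseDataRate =
        new MinDataRate(bytesPerSecond: 240, gracePeriod: TimeSpan.FromSeconds(5));
});
```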

In these tests, Kestrel calls Socket.Shutdown(SocketShutdown.Both) and then Socket.Dispose() immediately after the last Socket.SendAsync() Task completes. There isn’t any special LingerState or anything like that. I know the standard way to close a socket is to close the sending side, wait to receive a FIN (a 0-length read with a timeout), and then dispose the socket, but this is the logic the Socket transport has used since 2.0 and the libuv transport since 1.0, and these tests weren’t flaky before and still aren’t flaky on Windows or macOS.
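
A minimal sketch of the two close sequences described above (illustrative helper methods, not Kestrel's actual code):

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

static class CloseSequences
{
    // Kestrel's sequence as described above: Shutdown(Both) and Dispose
    // immediately after the last send completes.
    public static async Task CloseImmediatelyAsync(Socket socket, ReadOnlyMemory<byte> lastChunk)
    {
        await socket.SendAsync(lastChunk, SocketFlags.None);
        socket.Shutdown(SocketShutdown.Both);
        socket.Dispose();
    }

    // The "standard" graceful sequence: close only the send side, then wait
    // for the peer's FIN (a 0-byte read; a real version would add a timeout)
    // before disposing.
    public static async Task CloseGracefullyAsync(Socket socket, ReadOnlyMemory<byte> lastChunk)
    {
        await socket.SendAsync(lastChunk, SocketFlags.None);
        socket.Shutdown(SocketShutdown.Send);
        var drain = new byte[512];
        while (await socket.ReceiveAsync(drain.AsMemory(), SocketFlags.None) != 0) { }
        socket.Dispose();
    }
}
```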

The PR (dotnet/aspnetcore#13532) where I cleaned up the flaky tests a couple of days after taking the .NET Core 5.0 dependency goes into more detail about them. @jkotalik looked through changes made to Sockets after 3.0 that might explain this regression, and he found dotnet/corefx#38804, a PR titled “Socket: improve cross-platform behavior on Dispose.” I agree that this PR looks pretty suspicious.

I tried creating a minimal repro for this issue without Kestrel or any testing infrastructure, but so far I haven’t been successful. I thought that simply reading response data slowly from a Socket that was already shut down and disposed by the peer would be sufficient, but apparently there’s something more to this regression than I realize. Here’s a gist with my minimal repro attempt (that doesn’t repro yet).
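
(The gist itself isn't included above; the following is an illustrative reconstruction of that kind of repro attempt, not the actual gist.)

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Illustrative only, not the actual gist: the server sends a payload, shuts
// down both directions, and disposes immediately; the client then reads slowly.
class SlowReadRepro
{
    static async Task Main()
    {
        using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
        listener.Listen(1);

        using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        await client.ConnectAsync(listener.LocalEndPoint);
        Socket server = await listener.AcceptAsync();

        // Server: send, Shutdown(Both), Dispose, as Kestrel does.
        await server.SendAsync(new byte[64 * 1024].AsMemory(), SocketFlags.None);
        server.Shutdown(SocketShutdown.Both);
        server.Dispose();

        // Client: read slowly. The question is whether this observes a clean
        // FIN (0-byte read) or throws "Connection reset by peer".
        var buffer = new byte[4096];
        while (await client.ReceiveAsync(buffer.AsMemory(), SocketFlags.None) != 0)
        {
            await Task.Delay(100); // artificially slow consumer
        }
        Console.WriteLine("Clean FIN observed; no repro.");
    }
}
```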

@tmds @stephentoub

@davidsh
Contributor

davidsh commented Sep 18, 2019

@wfurt

@wfurt
Member

wfurt commented Sep 18, 2019

cc: @tmds

@tmds
Member

tmds commented Sep 19, 2019

These tests started becoming flaky with the client observing a “Connection reset by peer”

This is caused by calling Disconnect:

https://github.com/dotnet/corefx/blob/19b304f7815894b13cb61e87e1c9eac49a474c7e/src/System.Net.Sockets/src/System/Net/Sockets/SafeSocketHandle.Unix.cs#L397

This happens in TryUnblockSocket, to cancel ongoing operations on the socket when the handle isn't released immediately on Dispose:

https://github.com/dotnet/corefx/blob/b49a8a9be1d53cd9e50cb68fd8540be25c65d433/src/System.Net.Sockets/src/System/Net/Sockets/SafeSocketHandle.cs#L176-L184

Although you've called Send and Shutdown in the server, the data may still be in the kernel send buffer (so the peer hasn't observed a FIN close). Then when we call Disconnect, the data gets thrown away and the peer immediately sees a RST close.
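
A self-contained way to observe that difference from the peer's side (plain user code, not corefx internals; a zero-timeout linger is used here to force the same kind of abortive close that Disconnect produces):

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Demonstrates FIN vs RST from the peer's perspective.
class FinVsRst
{
    static async Task Main()
    {
        using var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
        listener.Listen(1);

        using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        await client.ConnectAsync(listener.LocalEndPoint);
        Socket server = await listener.AcceptAsync();

        await server.SendAsync(new byte[64 * 1024].AsMemory(), SocketFlags.None);

        // Abortive close: buffered data is discarded and a RST is sent. A
        // graceful close (Shutdown(Send), drain, Dispose) would instead let
        // the client consume the data and then observe a 0-byte read.
        server.LingerState = new LingerOption(true, 0);
        server.Close();

        var buffer = new byte[4096];
        try
        {
            while (await client.ReceiveAsync(buffer.AsMemory(), SocketFlags.None) != 0) { }
            Console.WriteLine("Graceful FIN observed.");
        }
        catch (SocketException e)
        {
            // On Linux this typically reports ConnectionReset,
            // i.e. "Connection reset by peer".
            Console.WriteLine($"RST observed: {e.SocketErrorCode}");
        }
    }
}
```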

The fix is to call Shutdown instead of Disconnect in TryUnblockSocket in case the user has already made an explicit call to Shutdown for the Send/Both end.
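
A rough sketch of that direction, expressed against the public Socket API (the real change belongs inside corefx's SafeSocketHandle; the flag and method here are hypothetical):

```csharp
using System.Net.Sockets;

static class UnblockSketch
{
    // 'userCalledShutdownSend' stands in for state the handle would track
    // when the user calls Shutdown(SocketShutdown.Send) or Shutdown(Both).
    public static void Unblock(Socket socket, bool userCalledShutdownSend)
    {
        if (userCalledShutdownSend)
        {
            // The user already asked for a graceful FIN; shutting down rather
            // than disconnecting unblocks pending operations without a RST.
            socket.Shutdown(SocketShutdown.Both);
        }
        else
        {
            // Previous behavior: abortive disconnect, which discards buffered
            // send data and makes the peer see a RST.
            socket.Disconnect(reuseSocket: false);
        }
    }
}
```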

@halter73 does this match with the test?

@halter73
Member Author

@halter73 does this match with the test?

I think so. In the test, the client is still receiving data for several seconds after the server calls Socket.Shutdown(SocketShutdown.Both). Then again, so does the client in my gist, which I still haven't seen repro the issue.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020