kevent signals data available on a socket but ioctl(...,FIONREAD,...) returns 0 #19407

Closed
stephentoub opened this issue Nov 17, 2016 · 16 comments · Fixed by dotnet/corefx#18384
Labels
area-System.Net.Sockets bug disabled-test The test is disabled in source code against the issue os-mac-os-x macOS aka OSX test-run-core Test failures in .NET Core test runs tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly

Comments

@stephentoub
Member

There appears to be a race condition in the implementation of kqueues on macOS, such that kevent can wake up for a read on a socket (meaning at least 1 byte of data is available) but an ioctl(..., FIONREAD, ...) call immediately after that completion can still return 0. This can manifest as code like:

await socket.ReceiveAsync(Array.Empty<byte>(), 0, 0, SocketFlags.None);
int count = socket.Available;

getting a count of 0 instead of something larger.
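At the system-call level, the sequence described above can be sketched roughly as follows. This is a minimal illustration, not the corefx implementation: `poll()` is used as a portable stand-in for `kevent()`, and the helper names `bytes_available_after_ready` and `demo_count_on_socketpair` are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait until fd is reported readable, then ask the kernel how many bytes
   are queued. poll() is an assumption here, standing in for kevent().
   Returns the FIONREAD count, or -1 on timeout or error. */
static int bytes_available_after_ready(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN))
        return -1;

    int count = 0;
    if (ioctl(fd, FIONREAD, &count) < 0)
        return -1;
    /* The issue reports that this can be 0 on macOS after a kevent wakeup. */
    return count;
}

/* Hypothetical demo: send 5 bytes over a local socket pair and report
   what the receiving end sees. Returns -1 if setup fails. */
static int demo_count_on_socketpair(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    int result = -1;
    if (write(sv[0], "hello", 5) == 5)
        result = bytes_available_after_ready(sv[1], 1000);

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

A caller that sizes its buffer from the returned count has to treat 0 as a possible outcome rather than assuming the readiness notification guarantees a positive value.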

@karelz
Member

karelz commented Mar 2, 2017

@stephentoub how often does it happen? macOS is not a production environment; could we move it to Future?
cc @geoffkizer

@stephentoub
Member Author

stephentoub commented Mar 2, 2017

how often does it happen?

Very frequently, at least on 10.11; I've not tried on 10.12.

@karelz
Member

karelz commented Mar 2, 2017

OK, assigning to @geoffkizer for further look.

@geoffkizer
Contributor

Is this actually a bug? It seems like users just need to be aware that this happens and deal with it.

@stephentoub
Member Author

Is this actually a bug?

Here's the situation that caused this to be opened. You've got a high-scale server receiving on, let's say, 50,000 sockets. You don't want to have to create 50,000 buffers, so instead you issue a 0-length ReceiveAsync on each, so that you can be notified when there is data available to be read, and only then get a buffer to use. To know what size buffer to use, you check DataAvailable, but on macOS it sometimes returns 0 even though you were just notified that there's data available.

Certainly seems like a bug, but if nothing else, it's a noticeable difference in behavior between Windows/Linux and macOS.

I'm not exactly sure how you'd deal with it in a situation like that above. Spin until DataAvailable became positive? Spin in an asynchronous loop issuing ReadAsync/DataAvailable calls until it became positive? Something else?

@geoffkizer
Contributor

Spin in an asynchronous loop issuing ReadAsync/DataAvailable calls until it became positive?

Yes, this seems like the right approach. Most likely you've got an async loop that's consuming incoming data. If you get DataAvailable = 0, just continue to the next iteration of the loop.

I do agree this is annoying, but I'm not sure how we would fix it without introducing potential perf problems. Thoughts?

@stephentoub
Member Author

If you get DataAvailable = 0, just continue to the next iteration of the loop.

How do you differentiate "ReadAsync completed because there's data available" and "ReadAsync completed because the server is done sending data"?
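For illustration only, one way to tell these two cases apart at the raw socket level is a non-blocking one-byte peek, since `recv` returns 0 only at end of stream. This is a sketch under assumptions, not something the thread proposes: the enum and the function name `classify_socket` are hypothetical, and whether a peek is affected by the same macOS behavior is not established here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of a socket's receive side. */
enum peek_state { PEEK_DATA, PEEK_EOF, PEEK_NONE, PEEK_ERROR };

/* Peek at one byte without consuming it and without blocking. */
static enum peek_state classify_socket(int fd)
{
    assert(fd >= 0);

    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n > 0)
        return PEEK_DATA;   /* at least one byte is queued */
    if (n == 0)
        return PEEK_EOF;    /* peer has finished sending */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return PEEK_NONE;   /* nothing queued yet, stream still open */
    return PEEK_ERROR;
}
```

The cost is an extra system call per wakeup, which matters in the high-scale scenario described earlier in the thread.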

@geoffkizer
Contributor

That's a very good question. Seems like we need to do something here, then.

@geoffkizer
Contributor

Can you reproduce this easily?

@stephentoub
Member Author

I've not tried since Nov. At the time I was able to repro it almost on demand, just running a little repro in a loop.

@stephentoub
Member Author

Can you reproduce this easily?

Yes. I made the test run 1000 times and submitted it to CI:
stephentoub/corefx@dfba21c
https://ci.dot.net/job/dotnet_corefx/job/master/job/osx_debug_prtest/4001/
It failed basically every iteration.

@geoffkizer
Contributor

Do we know if FreeBSD/macOS actually supports this? I'm just speculating, but it seems possible that FreeBSD just happily completes any zero-byte read successfully, even when no data is available.

@stephentoub
Member Author

That may very well be the problem. In which case that's the underlying issue for which it'd be great to find a solution :)

@geoffkizer
Contributor

It would be easy to determine if that is the underlying issue -- the assert about IsCompleted would fire. Unfortunately the previous test run results no longer seem to be available.

@geoffkizer
Contributor

Per @Priya91, this is in fact failing on the IsCompleted assert.

Need to think about how to fix this.

@geoffkizer
Contributor

I think the right way to fix this is to special-case zero-length receive and check DataAvailable instead. If DataAvailable is 0, we need to enqueue the op as usual, and then check DataAvailable again.

There are some cases where we might be able to know in advance that data is available; e.g. we've received an epoll notification and haven't done a receive in the meantime. It's not clear to me if trying to do this optimization is worth it.
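The check, wait, check-again idea above might look roughly like this at the system-call level. It is a sketch under assumptions rather than the change that was eventually merged: a blocking `poll()` stands in for enqueueing the operation on the event loop, and `wait_for_available` is a hypothetical name.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Emulate a zero-length receive by consulting FIONREAD instead of
   issuing a zero-byte read.
   Returns the byte count if data is queued, 0 if readiness was reported
   but nothing is queued (for example, the peer closed), and -1 on
   timeout or error. */
static int wait_for_available(int fd, int timeout_ms)
{
    assert(fd >= 0);

    /* Fast path: data may already be queued. */
    int count = 0;
    if (ioctl(fd, FIONREAD, &count) < 0)
        return -1;
    if (count > 0)
        return count;

    /* Slow path: wait for a readiness notification. poll() is an
       assumption standing in for enqueueing the operation. */
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;

    /* Check again after the notification. */
    if (ioctl(fd, FIONREAD, &count) < 0)
        return -1;
    return count;
}
```

The fast path costs one `ioctl` per zero-length receive, which is the kind of overhead the optimization mentioned above would try to avoid.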

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 27, 2020
4 participants