Test 1564 failing intermittently on illumos #5037
Comments
You are saying that this failure is new in 7.69.0, though, and that it didn't happen in 7.68.0? If so, it smells like #5019 may have caused this regression.
Yes, this is a new failure with 7.69.0. I'll try reverting that change and see if the test starts working again.
It looks like the same test failure is present in 7.68.0 too. I just didn't see it when I ran the test suite during that upgrade; (un)lucky, I suppose.
Ah, OK. Then at least the commit mentioned above isn't the explanation...
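@citrus-it, any chance you can debug it? The test case first calls curl_multi_wakeup() a large number of times (1234567). Then it calls curl_multi_poll(), which is expected to return early. Then it calls curl_multi_poll() again, which is expected not to return early, since the first call should have drained the wakeup socketpair.
Yes, I can look into it over the next week and get back to you.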
I did a bit of digging last night and it looks like the … I added some sread() calls after the recv() loop and see this when the test works:
but when the test fails:
Interestingly, even in the loop, not all calls to …
Clearly, on this platform, an excessive number of earlier wakeup calls can linger in the socketpair pipe so that they cancel future …
Yes, it appears that EAGAIN indicates that no data is available right now, not that the pipe is necessarily empty. It's as if there is some latency there. I will ask some other developers about this.
I wrote a small program to test this behaviour and tried it on a few different platforms that I have to hand. On OmniOS, OpenIndiana and Solaris, the read() loop does not always fully drain the pipe. From the comment in the code, I assume Windows may be in the first category too.
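The attached program is not reproduced in this thread. What follows is a minimal sketch of the same kind of experiment, not the reporter's code: it assumes a non-blocking AF_UNIX stream socketpair filled with 1-byte writes (one byte per wakeup) until the writer gets EAGAIN, then drained with a read() loop that stops at the first EAGAIN, followed by a zero-timeout poll() to see whether anything was left behind. The names fill_and_drain() and struct drain_result are hypothetical, and plain read()/write() stand in for curl's sread()/swrite() wrappers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Outcome of one fill-then-drain run (hypothetical helper type). */
struct drain_result {
  long written;       /* 1-byte writes accepted before the writer stopped */
  long drained;       /* bytes read back before read() stopped */
  int still_readable; /* 1 if poll() still sees data after the drain loop */
};

static int set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);
  return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Fill a non-blocking socketpair with single bytes until the kernel pushes
   back, drain it with a read() loop that stops at the first EAGAIN, then ask
   poll() whether anything is left. Returns 0 on success, -1 on setup error. */
static int fill_and_drain(struct drain_result *res)
{
  int fds[2];
  char byte = 1;
  char buf[64];
  struct pollfd pfd;
  ssize_t n;

  assert(res != NULL);
  if(socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
    return -1;
  if(set_nonblocking(fds[0]) != 0 || set_nonblocking(fds[1]) != 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  /* Writer side: one byte per simulated wakeup, until EAGAIN/EWOULDBLOCK. */
  res->written = 0;
  for(;;) {
    if(write(fds[1], &byte, 1) == 1)
      res->written++;
    else if(errno != EINTR)
      break;
  }

  /* Reader side: stop at the first EAGAIN ("no data right now"). */
  res->drained = 0;
  for(;;) {
    n = read(fds[0], buf, sizeof(buf));
    if(n > 0)
      res->drained += n;
    else if(n < 0 && errno == EINTR)
      continue;
    else
      break;
  }

  /* Zero-timeout poll(): did the drain loop leave anything behind? */
  pfd.fd = fds[0];
  pfd.events = POLLIN;
  pfd.revents = 0;
  res->still_readable = poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN) != 0;

  close(fds[0]);
  close(fds[1]);
  return 0;
}
```

On a platform where read() only returns EAGAIN once the pipe is truly empty (Linux, for example), one would expect drained == written and still_readable == 0. Going by the measurements reported in this thread, OmniOS, OpenIndiana and Solaris do not always drain the pipe fully, which a run like this would presumably show as drained < written with still_readable set.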
Thanks for this excellent input and data. So maybe we can change the test to just not be that excessive? Does the test work if you change … ? We could possibly even consider changing the documentation to mention these findings, but it also seems like a rather extreme edge case for someone to call wakeup() on the handle this many times without it even being "active".
I did try 8192 writes, since I noticed that was the number successfully written on FreeBSD and MacOSX before write() started returning EAGAIN, whereas on the other platforms that number is 21504. I just ran the attached test program 100 times in a loop like this:
It was also fine with 512, 1024, 2048 and 4096. With 8192, I got:
So, yes, I don't think this is a problem with normal operation of the library. The test could be less aggressive :)
This test does A LOT of *wakeup() calls and then calls curl_multi_poll() twice. The first *poll() is then expected to return early and the second not, as the first is supposed to drain the socketpair pipe. It turns out, however, that when given "excessive" numbers of writes to the pipe, some operating systems (Solaris-based ones are known to) will return EAGAIN before the pipe is drained, which in our test case causes the second *poll() call to also abort early. This change attempts to avoid the OS-specific behaviors in the test by reducing the number of wakeup calls from 1234567 to 10.

Reported-by: Andy Fiddaman
Fixes #5037
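The sequence the commit message describes (many wakeups, then two polls, the first expected to return early and the second to time out) can be mimicked without libcurl on a plain socketpair. The sketch below is a hypothetical simulation, not libcurl's implementation: sim_wakeup(), sim_poll() and run_sequence() are made-up names, and it assumes that each wakeup writes a single byte (treating EAGAIN as "a wakeup is already pending") and that the poll drains the pipe with a read loop that stops at the first EAGAIN.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

static int set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);
  return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Simulated wakeup: write one byte; a full buffer (EAGAIN) is not an error. */
static int sim_wakeup(int wr)
{
  char byte = 1;
  for(;;) {
    if(write(wr, &byte, 1) == 1)
      return 0;
    if(errno == EINTR)
      continue;
    return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
  }
}

/* Simulated poll: wait up to timeout_ms; if readable, drain until read()
   stops (EAGAIN) and report an early return.
   Returns 1 = woken early, 0 = timed out, -1 = error. */
static int sim_poll(int rd, int timeout_ms)
{
  struct pollfd pfd;
  char buf[64];
  ssize_t n;
  int rc;

  pfd.fd = rd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  rc = poll(&pfd, 1, timeout_ms);
  if(rc < 0)
    return -1;
  if(rc == 0)
    return 0;
  do {
    n = read(rd, buf, sizeof(buf));
  } while(n > 0 || (n < 0 && errno == EINTR));
  return 1;
}

/* n_wakeups wakeups, then two polls. The test's expectation is
   *first == 1 (early return) and *second == 0 (full timeout). */
static int run_sequence(long n_wakeups, int timeout_ms, int *first, int *second)
{
  int fds[2];
  int ok = 1;
  long i;

  assert(first != NULL && second != NULL);
  if(socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
    return -1;
  if(set_nonblocking(fds[0]) != 0 || set_nonblocking(fds[1]) != 0)
    ok = 0;
  for(i = 0; ok && i < n_wakeups; i++)
    if(sim_wakeup(fds[1]) != 0)
      ok = 0;
  if(ok) {
    *first = sim_poll(fds[0], timeout_ms);
    *second = sim_poll(fds[0], timeout_ms);
    ok = *first >= 0 && *second >= 0;
  }
  close(fds[0]);
  close(fds[1]);
  return ok ? 0 : -1;
}
```

Called as run_sequence(10, 100, &first, &second) on a platform whose read loop fully drains the pipe, this gives first == 1 and second == 0. With only 10 wakeups the pipe stays far below the 8192 and 21504 byte counts measured earlier in the thread, which is one way to see why the reduced count sidesteps the OS-specific behavior: there is nothing large enough to be left behind for the second poll to see.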
@citrus-it, are you good with the suggested PR?
Yes, looks good to me, thanks.
I did this
Upgrading to curl 7.69.0 on OmniOS (an illumos distribution)
I expected the following
No new failing tests. The curl_multi_poll test (1564) is failing around 7/10 of the time with:
… always in the block which is disabled for Windows. I wonder if illumos/Solaris has the same asynchronous socketpair feature?
operating system
OmniOS bloody