TFTP retries too quickly on mingw-w64 (test 286) #12040
Comments
Created #12047 for researching this.
I tried applying this patch to delay the first data block ACK, to test my speculation that the missing ACK is the trigger, but on Linux everything worked as expected and test 286 succeeded. It would be great if someone could try this on msys2 and see whether it triggers the bug.
diff --git a/tests/server/tftpd.c b/tests/server/tftpd.c
index 670897c0d..c41aaf3e4 100644
--- a/tests/server/tftpd.c
+++ b/tests/server/tftpd.c
@@ -1271,6 +1271,7 @@ send_ack:
goto abort;
}
write_behind(test, pf->f_convert);
+ sleep(90); /* DEBUGGING #12040 */
for(;;) {
#ifdef HAVE_ALARM
alarm(rexmtval);
I caught this test failing (CMake, mingw-w64, gcc 9, Debug x64, Schannel, Static, Unity) with your extra logging added. Here's the start of the log (it continues in the same way for 2 msec until the retries are up):
The logging shows I don't see any relationship between the various time values above on the bit level (although it's interesting that 65023 has 15 of 16 binary 1 bits). Both numbers are set with time(2) and they're both using
One more data point: this test has been failing only on the CMake, mingw-w64, gcc 9, Debug x64, Schannel, Static, Unity build, and it started failing the day after the job was created. It fails in 30.2% of runs. It's likely an issue specific to that one environment; perhaps, dare I suggest it, a compiler bug. I don't think it's the only build using gcc 9, though. I can't tell which version of gcc 9 that job uses, but most of the Circle CI builds are on Ubuntu 20.04, which also runs a gcc 9 (9.3.0), so it could be a problem affecting only the specific point revision it's using. Whatever the root cause, I see no evidence that it's a problem with curl. My inclination is to just ignore the results of this test on this one build and be done with it.
It gets my vote as well. 👍
This test fails sometimes with a super fast retry loop due to what may just be a compiler bug. The test results are ignored on the one CI job where it occurs because there seems to be nothing we can do to fix it. Fixes curl#12040 Closes curl#12106
I did this
This CI run shows test 286 (TFTP send of boundary case 512 byte file) failing with a timeout. The logs show it sending the write request and (presumably) getting the response, then sending the first (and only) data block and waiting for the ACK, but the ACK is missing (possibly due to the server being slow). It then retries and goes through this in a loop 50 times before giving up.
This is what I expect should happen in this case except for one thing: it only waits about 40 µsec between retries. The entire retry attempt (50 retries) takes about 3 msec. Something is clearly wrong with the retry timer here.
What might be relevant here is that this is the very first data block whose ACK is missing. Could it be the right retry time hasn't been set yet and so it retries for 0 seconds?
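To make that speculation concrete, here is a minimal sketch of a time-based retry check. It is not curl's actual TFTP code: `struct retry_state`, `retry_due`, `retries_in_same_second`, and all field names are hypothetical and exist only for illustration. It assumes the retry decision compares the current time against the last send time plus a retry interval; if that interval were still 0 when the first ACK goes missing, every check would report a retry as due at once.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical state for illustration only; this is not curl's struct. */
struct retry_state {
  time_t last_send;  /* when the data block was last sent */
  int retry_time;    /* seconds to wait for an ACK; 0 if never initialized */
  int retries;       /* retries used so far */
  int retry_max;     /* give up after this many retries */
};

/* Returns 1 when a retransmit is due at time 'now', 0 otherwise. */
static int retry_due(const struct retry_state *s, time_t now)
{
  return now >= s->last_send + s->retry_time;
}

/* Counts how many retries fire while the clock stays fixed at 'now',
   i.e. how many retransmits happen without any real waiting. */
static int retries_in_same_second(struct retry_state *s, time_t now)
{
  int fired = 0;
  assert(s->retry_max >= 0);
  s->last_send = now;             /* initial send of the data block */
  while(s->retries < s->retry_max && retry_due(s, now)) {
    s->retries++;                 /* retransmit the block */
    s->last_send = now;
    fired++;
  }
  return fired;
}
```

Under these assumptions, a state with `retry_time` of 0 and `retry_max` of 50 burns all 50 retries without the clock advancing, which would match a retry loop that finishes in milliseconds, while a `retry_time` of 6 fires no retry until 6 seconds have passed. Whether curl's real timer can reach such a state is exactly the open question here.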
I expected the following
curl should wait 6 seconds between retries. That should be enough for even an extremely slow server to respond with an ACK, and the test should succeed.
curl/libcurl version
8.4.0-DEV
operating system
MSYS_NT-10.0-14393 APPVYR-WIN 3.0.7-338.x86_64 2019-07-11 10:58 UTC x86_64 Msys