Fix test 1238 causing shortly following tests 1242 and 1243 to fail #5364
The problem can be observed on AppVeyor (Windows) and CirrusCI (FreeBSD):
So the hypothesis is that the server spends more time on the test than the client.
The relevant tftpd server code can be found here:
Based on an analysis of the logs, the server code and a packet capture, the test seems misconfigured. It has one iteration too much.
output for: ./runtests.pl -n -p -v 1238
During the test I ran
Packet capture summary
33042 is taken from the debug output above. The columns shown are: 1) time since begin of the capture, 2) unique ip/port tuple identifier, 3) UDP source port, 4) UDP destination port, 5) extra info.
However if for whatever reason (timing, failure to deliver ICMP, etc.) the server receives an ACK, then the server will continue to read the next. In total the server could be spending about 6 seconds whereas the client fails within a second.
I believe that another way to fix this issue is to reduce the test data for test1238 from 1200 bytes (3 blocks) to two blocks (e.g. 600 bytes). Assuming that
Yes, but I would not call it a hypothesis since I observed it on 2 CIs and in my local tests.
That "read: fail" does not happen on FreeBSD or Windows. The server is stuck reading and will timeout after 5 tries of 5 second waits using the alarm functionality instead of read timeouts.
Why would there be an ICMP deliver? curl just does not send the data via UDP packet anymore and the server is stuck waiting for that. Since there is no connection handling in UDP unlike TCP, the server will never get a "connection close event" from the client and read will not return 0.
Anyway, I will try your suggestion about reducing the transfer size. But are you sure that doesn't change the purpose of test 1238 timing out during the transfer instead of after the transfer has already finished and reached the server?
setjmp.h should only be included if HAVE_SETJMP_H is defined. Add additional log statements to see wether reads and writes are blocking or finishing before an alarm signal is received. Assisted-by: Peter Wu Part of #5364
Just for reference here is the the output of tftp_server.log after applying the logging patch:
As you can see curl is timing out during the 2nd block (as you described with Round 2), but there is no "read: fail" during test 1238 which can lead to tftp still waiting for 1238 when 1242 starts.
I have now also tried your proposal and this does not help with the TFTP server still being busy:
Also the test explicitly states the size of 1200 as part of the test behavior:
Therefore I would propose my original patch with an additional comment about the reason why killing the server is required in order to never affect other afterwards following TFTP tests.
The tftpd server may still be busy if the total timeout of 25 seconds has not been reached or no sread error was received during or after the execution of the timeout test 1238. Once the next TFTP test comes around (eg. 1242 or 1243), those will fail because the tftpd server is still waiting on data from curl due to the UDP protocol being stateless and having no connection close. On Linux this error may not happen, because ICMP errors generated due to a swrite error can also be returned async on the next sread call instead. Therefore we will now just kill the tftpd server after test 1238 to make sure that the following tests are not affected. This enables us to no longer ignore tests 1242, 1243, 2002 and 2003 on the CI platforms CirrusCI and AppVeyor. Assisted-by: Peter Wu Closes curl#5364
More background why you may be seeing a "read: fail" on Linux while that is not related to the read itself, but the write instead:
As far as I can tell FreeBSD and Windows do not behave like that. Also see comments here: 000f721.
Regarding my quoted above comment of course, I now understand that you most likely refer to the ICMP error messages regarding send and not read failure according to what I described above.