New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data race in ftp test server #12002
Comments
Here's a similar log from another failed test (test 932, SMTP CRAM-MD5 graceful cancellation). In this one, the client receives the SMTP banner message 220 code twice in a row. The server log shows the server sending the 220, then
i.e. the client supposedly disconnected then immediately reconnected. The curl log shows no such disconnecte/reconnect. The sockfilt log shows the a connect/disconnect/connect happening before the first 220 is sent, then the second 220 is sent without any intervening disconnect. So, the problem looks to be in the server and not the sockfilt. This one is for a different build (still Windows) but CMake, VS2010 this time, but I can't say for certain that it's Windows-only. |
Here's my current working theory for what's happening in the test 932 case above. This matches the logs nicely, although I can't match up the timing because hi-res timing isn't available in the Windows sockfilt.
† The only open question is why in step 2 the client disconnects. Could it be this same race condition with the test server's verifyftp() call? In any case, it doesn't really matter because an unexpected disconnect could happen for any reason and the sockfilt+server would get out of sync and potentially stay out of sync. I can think of two easy(ish) ways to fix this:
The first one is probably the simplest. |
If a client disconnected and reconnected quickly, before the ftp server has a chance to respond, the protocol message/ack (ping/pong) sequence got out of sync, causing messages sent to the old client to be delivered to the new. A disconnect must now be acknowledged and intermediate requests thrown out until it is, which ensures that such synchronization problems can't occur. This problem could affect ftp, pop3, imap and smtp tests. Fixes #12002 Closes #12049
Thanks a lot for debugging and fixing this. 👍 |
If a client disconnected and reconnected quickly, before the ftp server had a chance to respond, the protocol message/ack (ping/pong) sequence got out of sync, causing messages sent to the old client to be delivered to the new. A disconnect must now be acknowledged and intermediate requests thrown out until it is, which ensures that such synchronization problems can't occur. This problem could affect ftp, pop3, imap and smtp tests. Fixes curl#12002 Closes curl#12049
I did this
I've seen many FTP tests failing that have appear to result in a control message intended for one FTP client to be given to the next client. This results in a test log like this one (test 1091). The client errors out with
curl: (8) Got a 226 ftp-server response when 220 was expected
in this case.What appears to happen (with timestamps from the sample
ftp_server.log
in the log above) is as follows:226 File transfer complete
status code on the control connection (which was just closed—race is somewhere here)There are some problems with the simple explanation of this scenario:
The
ftp_sockctrl.log
is probably the most explicit in showing what's going on. Some excerpts:I'm not sure if the client here is actually the test server or curl, but it doesn't really matter. It's showing that it's sending a 226 code (response to a completed file transfer) as the first thing to a new client that just connected.
I expected the following
Test servers should not have race conditions.
curl/libcurl version
8.4.0-DEV
operating system
The sample log comes from mingw-w64, and this problem likely happens more often on Windows, but it may have happen on other OSes as well.
The text was updated successfully, but these errors were encountered: