I've narrowed down an issue in our use of libcurl to this repro case. Getting the repro to trigger seems to be timing-dependent so it might take a couple of runs to hit.
This is CURL verbose output for a failed test run: fail.txt
The user-visible problem is that the response data to a simple http request occasionally is associated to another http request in flight (to the same host), when multiple requests are pipelined over the same connection. The curl pipelining maintains a queue of easy handles for feeding the results to, under some conditions it seems that the queue gets sorted into incorrect order. A necessary condition for triggering the error seems to be that there are request both in sending and receiving states, the repro case accomplishes this by pushing in new requests in response to reading responses.
I'm using current github master (commit 55f4aba) but the issue triggers with stable versions as well (tried 7_39_0 and 7_53_1). The issue seems platform independent, FWIW I'm running on Win10 with configuration "LIB Debug - DLL Windows SSPI".
I can get the issue to repro against a number of different HTTP servers and I've verified that there's nothing suspicious in the data returned by the server on the wire.
I'm happy to provide any further data that may be useful for tracking down the root cause.
The text was updated successfully, but these errors were encountered:
I spent some time root causing this and I believe I know what is happening.
The culprit seems to be USE_RECV_BEFORE_SEND_WORKAROUND. Received data may be buffered in connectdata.postponed and it might get silently discarded on connection reuse in the call to conn_reset_all_postponed_data() in url.c:6232
conn_reset_all_postponed_data(conn); /* reset unprocessed data */
This causes the next read to read the response of a more recent request, I suppose with larger payloads the buffered data might not be limited to full response boundaries either so garbage would be read in place of valid HTTP headers.
Since this requires USE_RECV_BEFORE_SEND_WORKAROUND my original statement of being platform independent is clearly wrong, the issue is Windows specific.
I don't understand what the motivation behind discarding buffered data upon connection reuse is, but just commenting out that offending line does seem to resolve the issue.
@jay I can test later, but I don't think that patch is going to work. The original problem is that there is postponed data in "conn" that gets reset. There is no postponed data in "old_conn" and how could there be, since that connection was just created. Your change overwrites existing postponed data with no postponed data, leaking the buffer(s) in doing so. Right?
Your change overwrites existing postponed data with no postponed data, leaking the buffer(s) in doing so. Right?
Yes I had it backwards, old_conn is a pointer to the temporary connection struct created after conn and conn holds the connection, so there's no reason to transfer. Your fix is fine then. Thanks for the great analysis and repro case.
locked as resolved and limited conversation to collaborators
May 6, 2018