HTTP pipelining and reusing easy handles causes mismatched request/response #1380

Closed
ahatala opened this Issue Apr 3, 2017 · 7 comments

Projects

None yet

3 participants

@ahatala
ahatala commented Apr 3, 2017

I've narrowed down an issue in our use of libcurl to this repro case. Getting the repro to trigger seems to be timing-dependent so it might take a couple of runs to hit.

This is CURL verbose output for a failed test run: fail.txt

The user-visible problem is that the response data to a simple http request occasionally is associated to another http request in flight (to the same host), when multiple requests are pipelined over the same connection. The curl pipelining maintains a queue of easy handles for feeding the results to, under some conditions it seems that the queue gets sorted into incorrect order. A necessary condition for triggering the error seems to be that there are request both in sending and receiving states, the repro case accomplishes this by pushing in new requests in response to reading responses.

I'm using current github master (commit 55f4aba) but the issue triggers with stable versions as well (tried 7_39_0 and 7_53_1). The issue seems platform independent, FWIW I'm running on Win10 with configuration "LIB Debug - DLL Windows SSPI".

I can get the issue to repro against a number of different HTTP servers and I've verified that there's nothing suspicious in the data returned by the server on the wire.

I'm happy to provide any further data that may be useful for tracking down the root cause.

ahatala commented Apr 3, 2017

I spent some time root causing this and I believe I know what is happening.

The culprit seems to be USE_RECV_BEFORE_SEND_WORKAROUND. Received data may be buffered in connectdata.postponed and it might get silently discarded on connection reuse in the call to conn_reset_all_postponed_data() in url.c:6232

conn_reset_all_postponed_data(conn); /* reset unprocessed data */

This causes the next read to read the response of a more recent request, I suppose with larger payloads the buffered data might not be limited to full response boundaries either so garbage would be read in place of valid HTTP headers.

Since this requires USE_RECV_BEFORE_SEND_WORKAROUND my original statement of being platform independent is clearly wrong, the issue is Windows specific.

I don't understand what the motivation behind discarding buffered data upon connection reuse is, but just commenting out that offending line does seem to resolve the issue.

@jay jay added the HTTP label Apr 3, 2017
Owner
jay commented Apr 3, 2017

Yeah that looks strange, it probably needs a move/copy_all_postponed_data instead of freeing it.

I suppose with larger payloads the buffered data might not be limited to full response boundaries either so garbage would be read in place of valid HTTP headers.

Can you explain how that could happen?

Owner
bagder commented Apr 3, 2017

That code section comes from commit 72d5e14, issue #657 and PR #668 ... Maybe we just need to add a condition to not do that if pipelining is used?

Owner
jay commented Apr 3, 2017

I think it would be better to just keep it regardless so it doesn't come back to bite some other way. I don't understand why it's being freed there.

@jay jay added a commit to jay/curl that referenced this issue Apr 4, 2017
@jay jay url: fix conn reuse when postponed data. draft1
- Don't free the postponed struct on conn reuse; transfer it to the
  replacement conn instead.

- Encapsulate postponed struct initialization.

Bug: curl#1380
Reported-by: Antti Hätälä
46c79fd
ahatala commented Apr 4, 2017

@jay I can test later, but I don't think that patch is going to work. The original problem is that there is postponed data in "conn" that gets reset. There is no postponed data in "old_conn" and how could there be, since that connection was just created. Your change overwrites existing postponed data with no postponed data, leaking the buffer(s) in doing so. Right?

@jay jay added a commit that closed this issue Apr 4, 2017
@jay Antti Hätälä + jay url: don't free postponed data on connection reuse
- Don't free postponed data on a connection that will be reused since
  doing so can cause data loss when pipelining.

Only Windows builds are affected by this.

Closes #1380
36e604f
@jay jay closed this in 36e604f Apr 4, 2017
Owner
jay commented Apr 4, 2017 edited

Your change overwrites existing postponed data with no postponed data, leaking the buffer(s) in doing so. Right?

Yes I had it backwards, old_conn is a pointer to the temporary connection struct created after conn and conn holds the connection, so there's no reason to transfer. Your fix is fine then. Thanks for the great analysis and repro case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment