Our project uses libcurl (7.61.1, on CentOS 6, via the Rust crate) for HTTP/2 transfers (with OpenSSL), and we reuse the same Curl_easy session for multiple transfers. Importantly, we reuse the session even if a transfer fails.
At several customers, we started to see transfers repeatedly timing out on one Curl_easy session, and this continued until we restarted the application. Strangely, the logs showed that each transfer stalled after it had sent its headers but before it sent the body.
I expected the following
That we shouldn't generally time out during a send, especially if we've already sent some headers!
I debugged the bug :-)
After many repros with a lot of extra logs added to the Curl code, I discovered that the transfers were failing because the Curl_easy's state.drain was permanently nonzero. (If this is set but there's nothing to drain, Curl just spins looking for something to drain until timeout.)
Many more temporary logs later, I discovered that the drain field can remain set on the Curl_easy when a transfer has failed, because the last time it's unset is at the beginning of Curl_http2_done(), but the nghttp2 functions called within that function can set it again.
I fixed the bug
Moving the call to drained_transfer() to a point later in Curl_http2_done() stopped our customers from getting into this permanent timeout state. Further monitoring with a few more logs confirmed that they were still occasionally seeing ordinary timeouts, and that in those cases the suspected nghttp2 functions were indeed setting drain after the point where it would previously have been cleared for the last time.
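To make the ordering problem concrete, here is a minimal toy model in C. It is not the actual Curl code: `struct toy_easy` and every `toy_*` function are hypothetical stand-ins I made up for illustration, and the real Curl_http2_done() does far more than this. The sketch only assumes what is described above, namely that one step clears drain and a later cleanup step can set it again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-handle state; NOT the real struct Curl_easy. */
struct toy_easy {
    int drain; /* nonzero means "there is still something to drain" */
};

/* Hypothetical analogue of drained_transfer(): clears the flag. */
static void toy_drained_transfer(struct toy_easy *e)
{
    assert(e != NULL);
    e->drain = 0;
}

/* Hypothetical analogue of a cleanup call that sets drain as a side effect. */
static void toy_session_cleanup(struct toy_easy *e)
{
    assert(e != NULL);
    e->drain++;
}

/* Ordering before the fix: clear first, then run the cleanup. */
static void toy_done_clear_first(struct toy_easy *e)
{
    toy_drained_transfer(e);
    toy_session_cleanup(e); /* drain is set again and nothing clears it */
}

/* Ordering after the fix: run the cleanup, then clear. */
static void toy_done_clear_last(struct toy_easy *e)
{
    toy_session_cleanup(e);
    toy_drained_transfer(e); /* last word goes to the clear */
}

/* In this model a reused handle can only send when drain is zero. */
static bool toy_can_send(const struct toy_easy *e)
{
    assert(e != NULL);
    return e->drain == 0;
}
```

Starting from a handle with drain set to 1, `toy_done_clear_first()` leaves drain nonzero, so `toy_can_send()` returns false for the next transfer on that handle. `toy_done_clear_last()` leaves drain at zero, so the reused handle can send again.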
I'll attach a pull request of what I did in our codebase, but do feel free to point out edge-cases where this won't work for the wider Curl userbase!
Various functions called within Curl_http2_done() can have the
side-effect of setting the Easy connection into drain mode (by calling
drain_this()). However, the last time we unset this for a transfer (by
calling drained_transfer()) is at the beginning of Curl_http2_done().
If the Curl_easy is reused for another transfer, it is then stuck in
drain mode permanently, which in practice makes it unable to write any
data in the new transfer.
This fix moves the last call to drained_transfer() to later in
Curl_http2_done(), after the functions that could potentially call for a