Spurious "Connection died, tried 5 times before giving up" using HTTP/2 #7400

alexcrichton opened this issue Jul 15, 2021 · 14 comments
@alexcrichton commented Jul 15, 2021

I did this

Currently the Cargo package manager for Rust uses Curl to download files over HTTP, namely dependency libraries. Recently Cargo, on Rust's nightly channel, updated from curl 7.76.0 to curl 7.77.0. Since updating users are reporting an increase in errors seen while downloading.

I've personally been seeing this in a number of CI builds that use Cargo with curl 7.77.0, and the error as reported by Cargo is:

error: failed to download from `https://crates.io/api/v1/crates/wat/1.0.38/download`

Caused by:
  [55] Failed sending data to the peer (Connection died, tried 5 times before giving up)

I'm able to reproduce this locally on a machine that just downloads dependencies in a loop; it frequently hits this error while downloading. Since this was an apparent regression in Curl, I bisected the commits between 7.76.0 and 7.77.0: for each commit I built that version of Curl into Cargo and then downloaded all of Cargo's own dependencies 100 times, marking the commit "bad" if any download failed. Bisection pointed at 252790c: downloads fail spuriously from that commit onward but succeeded at least 100 times on every commit before it.
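
The per-commit check described above can be sketched as a small bisect driver. This is only an illustration of the methodology, not the actual script used; the `./download-all-deps.sh` name is a hypothetical stand-in for "build this curl into Cargo and download all dependencies once".

```shell
#!/bin/sh
# Sketch of a bisect driver: run a download command N times and fail
# on the first error, which "git bisect run" then treats as "bad".
repeat_or_fail() {
  n=$1; shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" || return 1
    i=$((i + 1))
  done
}

# Illustrative usage during bisection (download-all-deps.sh is hypothetical):
#   git bisect start curl-7_77_0 curl-7_76_0
#   git bisect run sh -c 'make -j && repeat_or_fail 100 ./download-all-deps.sh'
```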

Unfortunately reproduction is unlikely to be super easy unless you want to dive into Cargo (which I'm assuming you'd rather not). Cargo uses Curl as a library, so Cargo's usage is pretty far removed from the curl command line tool, for example. In theory Cargo is making a pretty bland request of "please download all these files over HTTP/2" with some reasonable timeout and retry limits configured. Cargo enables HTTP/2 and pipewait with a maximum of 2 connections per host at this time.

To hopefully help in assisting with debugging this I captured debug logs from curl during a failed transfer. These debug logs were captured at 252790c with DEBUG_HTTP2 enabled as well.

I'm happy to help test out patches and/or get more debugging information if necessary. I can post reproduction steps as well but it requires using Rust to build Cargo and having a nontrivial setup to thread through a custom Curl dependency into Cargo.

I expected the following

Given the frequency of these errors it seems unlikely that they reflect genuine spurious network failures, so my expectation is that curl would internally retry or requeue the affected transfers as necessary rather than bubbling up an error.

curl/libcurl version

This relates to curl 7.77.0, specifically an apparent regression introduced with 252790c.

operating system

The system I tested this on was:

Linux 4.19.0-9-arm64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) aarch64 GNU/Linux

but this seems to appear frequently with GitHub Actions runners as well, and I've seen it locally on my macbook too.

@alexcrichton commented Jul 20, 2021

After some further debugging I think I've got an easier way to reproduce this. Given this urls.txt file I can reproduce the issue with the curl command line executable:

curl --remote-name-all -Ss $(cat /urls.txt) -L -Z --parallel-max 100 --http2-prior-knowledge
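
For anyone trying to reconstruct a similar urls.txt: the crates.io download endpoints follow a fixed URL pattern, so a list can be generated from crate/version pairs. The pairs below are illustrative examples, not the original list.

```shell
# Build a urls.txt of crates.io download endpoints from crate/version pairs.
# These three pairs are examples only; the original list had many more entries.
printf '%s %s\n' \
  wat 1.0.38 \
  quote 1.0.9 \
  serde 1.0.126 |
while read -r name version; do
  printf 'https://crates.io/api/v1/crates/%s/%s/download\n' "$name" "$version"
done > urls.txt

# Then reproduce as above (hits the network; each URL redirects to static.crates.io):
#   curl --remote-name-all -Ss $(cat urls.txt) -L -Z --parallel-max 100 --http2-prior-knowledge
```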

I've re-bisected using this command, and bisection again points at 252790c as the commit between 7.76 and 7.77 where errors started happening.

For reference I used this Dockerfile for bisection, and for nghttp2 it was using nghttp2/nghttp2@31253f4 (current HEAD at the time I ran this). I was also building locally with ./configure --disable-shared --enable-debug --with-openssl --with-nghttp2.

The custom build also yields:

# curl --version --verbose
curl 7.77.0-DEV (aarch64-unknown-linux-gnu) libcurl/7.77.0-DEV OpenSSL/1.1.1f zlib/1.2.11 nghttp2/1.40.0
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS Debug HSTS HTTP2 HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL TLS-SRP TrackMemory UnixSockets

and these are the verbose logs of a failed transfer

@alexcrichton commented Jul 20, 2021

Er sorry, I forgot to enable http2 debug logging for those logs. This is a failure log with DEBUG_HTTP2 defined in lib/http2.c.

Additionally, I'm not sure whether this is related to #5611; it may be the same issue.

@bagder commented Jul 29, 2021

curl --remote-name-all -Ss $(cat /urls.txt) -L -

Does this command line reliably reproduce the problem for you? I've run it many times now on my dev box (using current curl built from git) but it always succeeds. Does the problem still remain for you with curl 7.78.0?

bagder self-assigned this Jul 29, 2021

@alexcrichton commented Jul 29, 2021

I indeed cannot reproduce on the current master branch, but I believe the reason for that is that nghttp2 is no longer detected after b4b34db, so http2 isn't used (and there's no bug unless http2 is used).

Inside of config.log I see:

/usr/bin/ld: cannot find -l-lnghttp2

Is this a local configuration error for me? When you build locally do you have http2 enabled with nghttp2?

@alexcrichton commented Jul 29, 2021

For reference, though, on fbf2659 (current master), if I revert b4b34db I'm still able to reproduce the issue on my end. Curl still occasionally prints:

$ curl --remote-name-all -Ss $(cat /urls.txt) -L -Z --parallel-max 100 --http2-prior-knowledge
curl: (55) Connection died, tried 5 times before giving up

@bagder commented Jul 29, 2021

How silly; okay, #7515 should fix that diversion...

jay added a commit to jay/curl that referenced this issue Jul 30, 2021
- Use double brackets for m4 style escape of brackets in regex.

Follow-up to b4b34db where I forgot that m4 needs brackets escaped.

Bug: curl#7400 (comment)
Reported-by: Alex Crichton

Reported-by: Rui Pinheiro

Fixes curl#7514
Closes curl#7515
@jay commented Jul 30, 2021

I'm able to reproduce this on Windows, but rarely, and I'm unable to get it down to fewer URLs to eliminate the noise. Due to shell differences I had to put the URL list in a config-style file (e.g. lines prefixed with url =, like url = http://theurl) and pass them via config (curl ... -K urls.txt). I looked through a Wireshark-decrypted HTTP/2 capture and I see all the redirects and nothing unusual. I should note I did this manually and may have missed something, since I don't have an automated way to scan the packet capture for unusual circumstances; it's more of an "I know it when I see it".
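
The plain URL list can be converted to that -K config-file form mechanically. This is a sketch assuming one URL per line in urls.txt; the two sample URLs here are illustrative.

```shell
# Turn a one-URL-per-line list into curl's -K config syntax ("url = ...").
# Sample input (illustrative URLs only):
printf '%s\n' \
  'https://crates.io/api/v1/crates/wat/1.0.38/download' \
  'https://crates.io/api/v1/crates/quote/1.0.9/download' > urls.txt
sed 's/^/url = /' urls.txt > urls.cfg

# Then: curl -Ss -L -Z --parallel-max 100 --http2-prior-knowledge --remote-name-all -K urls.cfg
```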

@alexcrichton commented Jul 30, 2021

For your reproduction on Windows, was following redirects enabled? The URL https://crates.io/api/v1/crates/quote/1.0.9/download is expected to redirect to https://static.crates.io/crates/quote/quote-1.0.9.crate, and at least when I was testing locally I had to ensure redirects were followed to get it to fail more reliably.

@jay commented Jul 30, 2021

Yes, redirects were enabled, but it occurs to me that I had a capture filter on just crates.io and not on any redirected-to domains, so I'll have to check again.

@Logarithmus commented Aug 1, 2021

@alexcrichton would applying 252790c as a reverse patch to curl 7.78.0 fix the issue?

@alexcrichton commented Aug 2, 2021

To confirm, reverting both b4b34db (to get nghttp2 detected locally for me) as well as 252790c appears to fix the issue on my end.

@bagder commented Aug 8, 2021

That commit fixed #6862 though, so presumably reverting it will bring that bug back... Thoughts on this, @ngg?

@ngg commented Aug 9, 2021

There were 4 commits merged together in #6910, which solved #6862 (my original bug was that after a server returned an HTTP_1_1_REQUIRED error on an HTTP/2 stream, curl would sometimes retry the connection over HTTP/2 again):

605e842 http2: move the stream error field to the per-transfer storage
I'm pretty sure this commit is necessary in some form, as it is much more logical to store stream errors per transfer and not per connection.

9c18c0b multi: don't close connection HTTP_1_1_REQUIRED
This commit is necessary so that other streams are not closed when one stream requires HTTP/1.1; I think it is okay as well.

9cb4845 ConnectionExists: respect requests for h1 connections better
This commit is necessary as well, to choose an HTTP/1.1 connection from the pool when one is explicitly needed (either by forcing HTTP/1.1 via curl_easy_setopt or after downgrading the connection due to an error).

252790c http2: call the handle-closed function correctly on closed stream
This is the commit in question, and I don't really know what it actually does.
Its description says: "This was this one condition where the stream could be closed due to an error and the function would still wrongly just return 0 for it."
The comment in this branch, /* closed overrides paused */, is a bit strange because that case is seemingly already handled within the if(httpc->pause_stream_id) branch.
I don't think that reverting this change would alter curl's behavior around HTTP_1_1_REQUIRED, but I don't really know what it actually does or what conditions are required to trigger that branch.

@bagder commented Aug 16, 2021

Thanks @ngg. I'm going to go ahead and merge the revert and we'll keep our eyes open and see if we run into any problems going forward.

bagder closed this in ccffb6e Aug 16, 2021
benfred added a commit to benfred/py-spy that referenced this issue Aug 28, 2021
There seems to be an issue in curl curl/curl#7400
that causes building on freebsd to fail frequently. Circumvent this by
installing rust from rustup instead of the system package manager.