Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Libcurl does not seem to send user agent reliably #6312

Closed
Flameborn opened this issue Dec 12, 2020 · 35 comments
Closed

Libcurl does not seem to send user agent reliably #6312

Flameborn opened this issue Dec 12, 2020 · 35 comments

Comments

@Flameborn
Copy link

@Flameborn Flameborn commented Dec 12, 2020

Transmission does not always send a user agent string when announcing to a torrent tracker.

Even when specified in its config file, the sent user agent string is empty.

This is after upgrading to Curl (libcurl) 7.71.0 and above, using Transmission 2.94 and 3.0. Older Curl versions work as expected.

See a forum thread here: https://forum.transmissionbt.com/viewtopic.php?f=2&t=20449&sid=5ab5f6c305ead5a64dbd72b0608cf729

I expected the following

User agent sent when announcing.

curl/libcurl version

Curl 7.71.0 and above.

operating system

Arch Linux ARM (Linux idris 5.4.81-1-ARCH #1 SMP PREEMPT Mon Dec 7 20:38:46 UTC 2020 armv7l GNU/Linux)

@bagder
Copy link
Member

@bagder bagder commented Dec 12, 2020

Please tell us how to reproduce this. And if any of the details in that forum thread is important for the issue, I would appreciate if you would include them in the issue here.

@bagder bagder changed the title Libcurl does not seem to send user agent reliably when used via Transmission Libcurl does not seem to send user agent reliably Dec 12, 2020
@Flameborn
Copy link
Author

@Flameborn Flameborn commented Dec 12, 2020

I am unfortunately unable to specify exact steps to reproduce, as the tracker specifics are not known to me. As a result,I can only assume that this is a common issue with any tracker, possibly with HTTPS enabled. I am aware that this makes debugging and possibly fixing the issue a lot harder than it should be.

In general:

  • Install the transmission-cli package, along with core/curl (currently 7.74.0).
  • Set up a torrent tracker, such as https://github.com/skipperbent/easy-torrent-tracker. (Requires a web server, PHP and possibly a certificate for HTTPS).
  • Use transmission-create to create a torrent file, specifying the locally installed tracker.
  • Start transmission via systemctl.
  • Add the previously created torrent file via transmission-remote -a torrent-file-name.torrent
@bagder
Copy link
Member

@bagder bagder commented Dec 12, 2020

I can only assume that this is a common issue with any tracker,

That seems like a very far-going assumption that I can't see any evidence for.

We need a recipe for this issue in order to conclude that this is in fact a libcurl problem and not just a side effect of something in the application code. libcurl is after all used by a large number of applications and we've had no others report an issue like this during the half year 7.71.0 has been out.

The fact that no user-agent appears when you run the application is not a sure sign of a libcurl issue...

@Flameborn
Copy link
Author

@Flameborn Flameborn commented Dec 12, 2020

That seems like a very far-going assumption that I can't see any evidence for.

You are right, it is not guaranteed, but the fact that a Curl upgrade specifically causes this issue makes it likely that it was indeed a change in libcurl in 7.71.0. Transmission 2.94 was released on the 1st of May, 2018. This release and the subsequent 3.0 version (released on the 22nd of May 2020) work with Arch Linux's core/curl 7.70.0 and do not have issues when announcing.

Supposing that there were some API changes for lib curl in 7.71.0 and above, and Transmission is not using the lib curl API as expected, so the user agent is not set properly and it is indeed an application side effect as suggested, I believe that this is something that should be handled as it might break backward compatibility.

@bagder
Copy link
Member

@bagder bagder commented Dec 12, 2020

The sending of and setting user-agent has not changed in many years (maybe 15?), which is partly why I have a hard time to suspect thhat this is a libcurl regression. But I've been wrong before.

@jay
Copy link
Member

@jay jay commented Dec 13, 2020

I can't think of any other report like this. You could bisect.

@bagder
Copy link
Member

@bagder bagder commented Dec 16, 2020

I don't think we can do anything about this issue without either a way to reproduce ourselves, or a very detailed description of exactly what the app does (libcurl-wise) when it fails.

@Flameborn
Copy link
Author

@Flameborn Flameborn commented Dec 17, 2020

I have raised an issue for the Transmission client to request more information, as I believe they can provide details on how libcurl is used in the background.

@sveiki1
Copy link

@sveiki1 sveiki1 commented Jan 2, 2021

Can confirm that I'm experiencing the same issue on Arch Linux ARM and Transmission.

Issue seems to be present in:

Downgrading to curl-7.67.0-3-armv7h.pkg.tar.xz (SHA-1: cc5d4f89e05a945a74bfce36f84786eb9e68e64d) seems to be a workaround. I was unable to test other versions as those were the only ones present in the pacman cache in my machine.

@DontWorryScro
Copy link

@DontWorryScro DontWorryScro commented Jan 2, 2021

+1 to confirm this is an issue. I'm getting intermittent "banned client" errors due to this.

Edit: Using rTorrent:0.9.6/0.13.6

@bagder
Copy link
Member

@bagder bagder commented Jan 2, 2021

Adding more "me toos" to this issue doesn't help. We need more details. For example, show us the full HTTP requests done with a working libcurl and one with the problematic request? Perhaps even going so far to bisect to the exact libcurl change that introduced the problem.

No other users report these problems and I libcurl is used quite widely.

@bagder
Copy link
Member

@bagder bagder commented Jan 2, 2021

As seen in the Transmission issue thread, this problem is also not easily reproducible for all Transmission users or authors so apparently it's not just a libcurl upgrade that's needed to trigger this.

@AVerwer
Copy link

@AVerwer AVerwer commented Jan 4, 2021

I've also been experiencing this issue since 7.71. I used mitmproxy to look at the requests transmission makes. I've only tried with trackers using TLS. I started transmission after editing the service by adding:

[Service]
Environment=https_proxy=http://127.0.0.1:8080
Environment=CURL_CA_BUNDLE=/home/arjan/.mitmproxy/mitmproxy-ca-cert.pem

After which I downloaded some ubuntu images from: https://torrent.ubuntu.com/tracker_index
All torrents announced correctly, each sending a request with a user-agent.

After that I changed the tracker to: https://tracker.nanoha.org:443/announce
which i got from: https://github.com/ngosang/trackerslist/blob/master/trackers_all_https.txt
Here, some requests announced without a user-agent. After playing around a bit I found out that if multiple torrents announced within a certain time (pretty much at the same time), one sends a request with a user-agent, the others don't.

I've tried this with the ubuntu tracker, which all send a user-agent even if they announce at the same time.
The only difference I could see was that the ubuntu tracker uses HTTP/1.1 and the other one uses HTTP/2.0.

Now I'm not sure if this is intended/supposed to happen with HTTP/2.0. Is curl supposed to send a user-agent with each request, or is the server (using HTTP/2.0) supposed to know the user-agent is the same for all requests? I don't know enough about the specifications.

I could try a git bisect sometime later, but a quick search in the commits between curl-7_70_0 and curl-7_71_0 did show this issue which might be related? #5567

@bagder
Copy link
Member

@bagder bagder commented Jan 4, 2021

Ah yes, I had forgot about that change that indeed was introduced in 7.71.0. That change fixed a user-agent issue that otherwise could make curl use the same user-agent for multiple requests on the same connection as the agent would (wrongly) be associated with the connection and not with the particular individual transfers that could be using the same connection (when multiplexed like with HTTP/2).

This could thus make a mistake in the client-code using libcurl to change behavior. This could also explain why Transmission is still the only application that shows this problem half a year after that libcurl version was released.

Now I'm not sure if this is intended/supposed to happen with HTTP/2.0. Is curl supposed to send a user-agent with each request, or is the server (using HTTP/2.0) supposed to know the user-agent is the same for all requests?

libcurl sends no user-agent at all by default and HTTP doesn't mandate that one is used. It is an optional header. If the user-agent is set for a transfer in libcurl with the CURLOPT_USERAGENT option, that particular transfer will use that specific user-agent in the request.

@AVerwer
Copy link

@AVerwer AVerwer commented Jan 4, 2021

Thanks for the quick response :-)

Looking at this comment it appears to also be affecting rTorrent.

I've tried the code from this comment, Which is supposedly how transmission uses curl. I adjusted it by setting:

curl_easy_setopt(e, CURLOPT_URL, "https://tracker.nanoha.org:443/announce");
curl_easy_setopt(e, CURLOPT_CAINFO, "/home/arjan/.mitmproxy/mitmproxy-ca-cert.pem");
curl_easy_setopt(e, CURLOPT_PROXY, "127.0.0.1:8080");

and set #define MAX_PAUSED 100 to #define MAX_PAUSED 10 (It send a lot of requests :x)

The first 11 requests had a user-agent, the rest didn't. However, I'm not familiar enough with this to know if it's an implementation issue or not. I can imagine it's not up to you to debug the code of others. If I should continue this in the transmission issue let me know.

@bagder
Copy link
Member

@bagder bagder commented Jan 4, 2021

If you have a way to reproduce that sounds like an excellent step no matter what!

  1. Does this way use HTTP/2? You mention 11 requests with user-agent and "the rest" without.
  2. Are the requests done to different hosts or to the same?
  3. Would it be possible to set CURLOPT_VERBOSE to 1L for perhaps the 2 last working ones and the first one or two non-working ones to get trace output from libcurl's activities there?
  4. Is the code setting CURLOPT_USERAGENT for all transfers?
@AVerwer
Copy link

@AVerwer AVerwer commented Jan 4, 2021

  1. Yes. At least, looking at all the request bodies I could see in mitmproxy, they all used HTTP/2. The code I used send requests indefinitely, so I stopped the program (CTRL+C) and looked at the requests it send. I didn't count them all, but only the first 11 requests had a user-agent in the request headers.
  2. The same host. I used the tracker mentioned in the previous comments because I knew it supported HTTP/2.
  3. I can try, I'm not very familiar with C though.
  4. I assume so. The code I used from the comment in the transmission issue has curl_easy_setopt(e, CURLOPT_USERAGENT, "foobar/1.0"); and it doesn't look like that line gets skipped. And the user-agent matches the one I see in the requests by mitmproxy.
@jay
Copy link
Member

@jay jay commented Jan 5, 2021

I tried the code from that comment (adjusted to init CURL_GLOBAL_ALL) and let it run for 15 minutes but nothing happened. The python server is HTTP/1.0. I then changed to https://http2.golang.org/ECHO and it showed up pretty quickly in the verbose output:

* Using Stream ID: 1 (easy handle 0x5a5260)
> GET /ECHO HTTP/2
Host: http2.golang.org
user-agent: foobar/1.0
accept: */*
accept-encoding: identity

* STATE: DO => DO_DONE handle 0x5a5260; line 1940 (connection #13)
* multi changed, check CONNECT_PEND queue!
* STATE: DO_DONE => PERFORM handle 0x5a5260; line 2061 (connection #13)
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
* multi changed, check CONNECT_PEND queue!
* Connected to http2.golang.org (130.211.116.44) port 443 (#14)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x28e1508; line 1848 (connection
 #14)
* Marked for [keep alive]: HTTP default
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* STATE: SENDPROTOCONNECT => PROTOCONNECT handle 0x28e1508; line 1866 (connectio
n #14)
* STATE: INIT => CONNECT handle 0x29676b0; line 1654 (connection #-5000)
* Found bundle for host http2.golang.org: 0x55bf18 [can multiplex]
* Multiplexed connection found!
* Re-using existing connection! (#0) with host http2.golang.org
* STATE: CONNECT => DO handle 0x29676b0; line 1708 (connection #0)
* Using Stream ID: 3 (easy handle 0x29676b0)
> GET /ECHO HTTP/2
Host: http2.golang.org
accept: */*
accept-encoding: identity

wireshark:
wireshark capture

@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2021

Woah! What code is that exactly? Can you make zip for me to try?

@andreygursky
Copy link

@andreygursky andreygursky commented Jan 5, 2021

Here is a reproducer:
test-http2.go

package main

import (
	"log"
	"net/http"
)

func main() {
	srv := &http.Server{Addr: ":8000", Handler: http.HandlerFunc(handle)}

	log.Printf("Serving on https://0.0.0.0:8000")
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}

func handle(w http.ResponseWriter, r *http.Request) {
        log.Printf("UserAgent: %s", r.UserAgent())
	if (r.UserAgent() != "foobar/1.0") {
                log.Fatalf("Unexpected UserAgent")
	}
	w.Write([]byte("dummy"))
}
$ go build test-http2.go && ./test-http2
2021/01/05 13:47:19 Serving on https://0.0.0.0:8000
2021/01/05 13:47:21 UserAgent: foobar/1.0
2021/01/05 13:47:21 UserAgent: foobar/1.0
2021/01/05 13:47:21 UserAgent: 
2021/01/05 13:47:21 Unexpected UserAgent
@bagder bagder removed the needs-info label Jan 5, 2021
@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2021

Thanks, that's neat, but I was curious about the code for the client side...

@andreygursky
Copy link

@andreygursky andreygursky commented Jan 5, 2021

I've used the same client code test-https.c from my comment transmission/transmission#1563 (comment).

@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2021

Ah, I missed that part. Thanks, will run ahead and try this now!

@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2021

It reproduces for me too.

@andreygursky
Copy link

@andreygursky andreygursky commented Jan 5, 2021

Thanks for looking into it!

@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2021

It is a libcurl flaw and I know why it happens. Working on a fix.

bagder added a commit that referenced this issue Jan 5, 2021
... and not in the connection setup, as for multiplexed transfers the
connection setup might be skipped and then the transfer would end up
without the set user-agent!

Reported-by: Flameborn on github
Assisted-by: Andrey Gursky
Assisted-by: Jay Satiro
Fixes #6312
@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2021

Please try out #6417 if you can build a local libcurl. It is a fairly small patch. It fixes the problem in my tests.

@andreygursky
Copy link

@andreygursky andreygursky commented Jan 5, 2021

I've just built current git master, which fails the test. After applying the patch I've let the test running for 7 minutes without facing the bug. Looks good, thanks!

@AVerwer
Copy link

@AVerwer AVerwer commented Jan 5, 2021

Glad @jay and @andreygursky helped with making it reproduce-able. It would probably take me a bit longer. I hope my info helped speed things up.

I've build curl with the fix and can confirm the issue is now gone (even with transmission). Thanks @bagder

@Flameborn
Copy link
Author

@Flameborn Flameborn commented Jan 5, 2021

@bagder bagder closed this in 0936ecd Jan 5, 2021
@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2021

Thank you everyone! The fix has now landed in master.

StefanKarpinski added a commit to JuliaLang/Downloads.jl that referenced this issue Jan 29, 2021
This works around curl/curl#6312 which was the
root cause behind JuliaLang/Pkg.jl#2357. TDLR:
if you make two concurrent HTTP/2 requests to the same server using an
already-established connection with the `CURLOPT_USERAGENT` option set,
the user agent header only gets sent for one of the requests, not both.
This is normally fairly harmless, but GitHub refuses to serve downloads
unless the user agent header is set, so we were getting 403 errors.
StefanKarpinski added a commit to JuliaLang/Downloads.jl that referenced this issue Jan 29, 2021
* header tests: test setting the user agent header
* set `User-Agent` as a header in addition to `CURLOPT_USERAGENT`
* re-enable HTTP/2

This works around curl/curl#6312 which was the
root cause behind JuliaLang/Pkg.jl#2357. TDLR:
if you make two concurrent HTTP/2 requests to the same server using an
already-established connection with the `CURLOPT_USERAGENT` option set,
the user agent header only gets sent for one of the requests, not both.
This is normally fairly harmless, but GitHub refuses to serve downloads
unless the user agent header is set, so we were getting 403 errors.

Co-authored-by: Stefan Karpinski <stefan@karpinski.org>
@StefanKarpinski
Copy link

@StefanKarpinski StefanKarpinski commented Jan 29, 2021

We got hit by this issue in Julia as well (as of the upcoming Julia 1.6 release, the built-in download functionality uses libcurl). Since this is already fixed, this may be redundant but I wanted to note what we found in our analysis here:

  • only seems to happen with HTTP/2
  • occurs when an HTTP/2 connection to a particular server already exists
  • occurs when two or more concurrent requests to that server are made
  • the User-Agent header is only sent for one of the concurrent requests

This is, of course, usually harmless since most servers work fine without a user agent header, but in our case it turned out that the GitHub API refuses to serve downloads unless there is a user agent. We have worked around the issue by explicitly setting User-Agent as a header in addition to using the CURLOPT_USERAGENT option.

I think I understand the fix: it moves the logic that sends the user agent header from the connection setup to the request (transfer) setup. What I don't quite understand is why the bug only manifested when concurrent connections where made. I would think it would happen whenever any reuse of an already-established HTTP/2 connection was reused for a subsequent request. Is it because libcurl tears down the HTTP/2 connection when there are no more in-progress requests to that server but leaves the TCP connection around in case it gets used again? That's the only thing that makes sense to me. The concurrency + HTTP/2 requirement would then make sense since only with HTTP/2 are multiple requests multiplexed onto the same HTTP connection (as opposed to TCP connection), so you'd see the first multiplexed request with a user agent and the rest without. Am I understanding the bug and fix correctly?

@StefanKarpinski
Copy link

@StefanKarpinski StefanKarpinski commented Jan 29, 2021

I guess the part that my theory doesn't explain is why the bug doesn't manifest when two concurrent HTTP/2 requests are made to the same server when an existing connection doesn't already exist. Is that because each of the concurrent requests then gets their own connection? That's the only explanation for that difference I can think of.

@bagder
Copy link
Member

@bagder bagder commented Jan 29, 2021

The mistake was exactly that: it set the header only when the connection was first established so subsequent multiplexed requests over the same connection wouldn't get it. Or in fact, new transfer handles that reused the same connection would probably also be affected. This was a regression introduced in e15e513.

Is that because each of the concurrent requests then gets their own connection?

They do. If you add two requests at once, they prefer speed to action rather than minimizing the number of connections and will then go for one connection each. An application can change this behavior by setting CURLOPT_PIPEWAIT which if set, would make the second request wait for the first one to get the connection up and then do the second request multiplexed over the same connection. Thus prioritizing fewer connections.

@StefanKarpinski
Copy link

@StefanKarpinski StefanKarpinski commented Jan 29, 2021

Thanks for the explanation and thanks, of course, for the great software!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

8 participants