New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

curl to localhost takes more than 200 ms to connect #2281

Closed
pps83 opened this Issue Feb 3, 2018 · 18 comments

Comments

Projects
None yet
3 participants
@pps83
Copy link
Contributor

pps83 commented Feb 3, 2018

I use libcurl and easy interface and at some point I noticed that requests to localhost were unusually slow. When I added some logging, I noticed that all requests were taking more than 200ms. When I tried to use 127.0.0.1 then it would become instant. After some debugging I realized that libcurl tries to connect first to ipv6 and then ipv4 and somewhere this triggers huge delay.

I tried to log and seems that one of selects takes these 200ms:

   0:023 [tid:00004888] getaddrinfo took 0.004000 sec
   0:023 [tid:00004888] ^ last message was repeated 1 times
   0:023 [tid:00004888] ....
   0:023 [tid:00004854] before connect << == connect to ipv6
   0:023 [tid:00004854] after connect
   0:024 [tid:00004854] before select 1
   0:024 [tid:00004854] select done 1
   0:024 [tid:00004854] before select
   0:025 [tid:00004854] select done
   0:034 [tid:00004854] before select 1
   0:034 [tid:00004854] select done 1
   0:034 [tid:00004854] before select
   0:219 [tid:00004854] select done    <<====== 200 ms long select
   0:219 [tid:00004854] before select 1
   0:219 [tid:00004854] select done 1
   0:219 [tid:00004854] before connect << == connect to ipv4
   0:220 [tid:00004854] after connect
   0:220 [tid:00005500] accepted clientSock:860
   0:224 [tid:00004854] before select 1
   0:229 [tid:00004854] select done 1
   0:229 [tid:00005120] new client thread started (clientSock:860)
...

to confirm the problem exists I took latest curl binary and it seems to have the same problem. Even though I cannot get any useful logging to see timings, the total time I get is 280ms. The server I use is my own simple code that listens on ipv4 only.

@pps83

This comment has been minimized.

Copy link
Contributor

pps83 commented Feb 3, 2018

Looks like I can consistently repro this problem on multiple machines.

Here's how. Run the server:
node -e "require('http').createServer((req, res) => { res.end('OK\n') }).listen(1337, '127.0.0.1')"

and then connect to it using curl binary:

curl -w %{time_total} http://localhost:1337/test

This is the output that I get:

curl -w %{time_total} http://localhost:1337/test
OK
0.203000

If I change to http://127.0.0.1:1337/test then request becomes instant.

I use win10. This problem is also present in linux, although delay isn't that huge. Looks like it's the code issue and perhaps it should somewhere check if result is already available on the socket before selecting it

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 4, 2018

You didn't say which curl versions you've used nor what name resolver backend you use. I guess the threaded resolver?

While the threaded resolver is used, there is no socket for anyone to wait on so it has to wait a while and then check again. libcurl uses an exponential backoff so it first checks after one millisecond, then again 2 ms later then 4ms etc up all the way until it stays polling every 250ms. (code here)

This is what it does on my linux curl using the threaded resolver:

$ curl -w %{time_total} http://localhost:22/test
SSH-2.0-OpenSSH_7.6p1 Debian-3
Protocol mismatch.
curl: (56) Recv failure: Connection reset by peer
0.011593

$ curl -V
curl 7.58.0 (x86_64-pc-linux-gnu) libcurl/7.58.0 OpenSSL/1.0.2n zlib/1.2.8 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) libssh2/1.8.0 nghttp2/1.29.0 librtmp/2.3

@pps83

This comment has been minimized.

Copy link
Contributor

pps83 commented Feb 4, 2018

I dont't think it was resolver that resulted in this delay. As you could see from the quick logs that I added, at 0.023 there was a socket connect to ::1 result of local host resolution (eg after resolver completed). Then there were a few selects, perhaps checking if connected to ::1? Then one of these selects is 200ms long, right after there is a new connect, I assume this time it connects to 127.0.0.1 and everything goes fast from that point.

I use some ancient version of libcurl in my project, and then I took latest build of curl and it has exactly the same problem.

Regarding resolver. If there was some dns query that took 260ms, there will be 250ms penalty in (wait from 256 to 512ms?) Am I correct, or the code will use some way of notification as soon as dns resolution result become available?

curl - V

curl 7.58.0 (i386-pc-win32) libcurl/7.58.0 OpenSSL/1.1.0g (WinSSL) zlib/1.2.11 WinIDN libssh2/1.8.0 nghttp2/1.30.0
Release-Date: 2018-01-24

Please, try to use that simple node http server one liner, it specifically listens on ipv4 and not ipv6. SSH listens on both of them, and you won't see the problem.

These are a few lines that explain the problem:

curl -w %{time_total} http://localhost:1337/test
OK
0.234000
curl -w %{time_total} http://127.0.0.1:1337/test
OK
0.000000
curl -w %{time_total} http://[::1]:1337/test
0.203000curl: (7) Failed to connect to ::1 port 1337: Connection refused

As you can see, the localhost version has the same huge delay as ipv6 ::1 version. There is no way on earth I'd believe that it takes 200ms to figure our that local port isn't open and that's basically is the problem. Same huge delay connecting to some random unbound local port:

curl -w %{time_total} http://127.0.0.1:44321/test
0.219000curl: (7) Failed to connect to 127.0.0.1 port 44321: Connection refused

@jay

This comment has been minimized.

Copy link
Member

jay commented Feb 5, 2018

This has something to do with the happy eyeballs timeout in Windows I think. I tried the same thing but with IPv4 since I don't have IPv6 enabled in either Windows or the build of libcurl I tested with. My localhost resolves to 127.0.0.1 only. If I try 127.0.0.1 to a closed port it takes about 200ms for the connection refused in Windows. In Ubuntu it's 5ms. But in Windows if I change libcurl's HAPPY_EYEBALLS_TIMEOUT to 0 then it's about 15 to 30ms.

repro yes:
curl 7.59.0-DEV (i386-pc-win32) libcurl/7.59.0-DEV OpenSSL/1.0.2n nghttp2/1.30.0

repro no:
curl 7.58.0 (x86_64-pc-linux-gnu) libcurl/7.58.0 OpenSSL/1.0.2n zlib/1.2.8 nghttp2/1.30.0 librtmp/2.3

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 5, 2018

When I run this on my linux machine, I can see that using dual-stack takes longer time than when using -4, but the longer time is still in the 5-15 millisecond range while the IPv4-only transfers are always less than 1 millisecond(!)

I don't have a working localhost for IPv6 so using -6 makes it fail.

Conclusion: it seems to make my version slower too, even if not by as much.

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 5, 2018

This seems to be because when I resolve "localhost" dual-stack, curl ends up resolving the host name with the DNS server since I have no IPv6 address for it in /etc/hosts, but for IPv4 I do. So when only resolving IPv4, the local file is enough and that then (naturally) runs much faster...

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 5, 2018

@pps83

This comment has been minimized.

Copy link
Contributor

pps83 commented Feb 7, 2018

Not sure what you mean, but it doesn't seem to be related to resolver in my opinion. I don't have neither v4 nor v6 entries for localhost in hosts file. As you can see, curl to some random unbound port on ::1 or on 127.0.0.1 takes 200+ms on Win10 and that's the reason I get these outrageously long times. Also, if there is a resolver on ipv4 it still can return AAA records regardless if it runs on ipv4 or ipv6.

Basically, if localhost can be resolved to ::1 or to 127.0.0.1 curl has no other choice by to try one then the other and that's what it does: tries first ::1 and this part simply takes 200ms to attempt connection to unbound local port and then it tries next to connect to 127.0.0.1. The problem is that the connection attempt takes way too much time to figure out that the port isn't listening. I could try to write simpel socket code and see how much time is needed to detect that some port on localhost isn't listening.

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 13, 2018

@pps83: can you try to edit the HAPPY_EYEBALLS_TIMEOUT value in lib/connect.h to something much lower and see if that changes the timeout you see on windows?

@bagder bagder added the help wanted label Feb 18, 2018

@pps83

This comment has been minimized.

Copy link
Contributor

pps83 commented Feb 23, 2018

Sorry for delay, I'll try and report.

As a side note, is there a way to tell libcurl what dns timeout to use for requests? Let's say I wan to curl http://fdsjfsladjkf.facebook.com which should fail dns resolution (unless it's the man's personal obscure hidden homepage). In case of misconfigured dns settings dns resolution will take looong time until it times out for unexisting hosts (it would still work ok for exiting hosts). I would like to prevent curl from waiting for these long timeouts. Is that possible?

@pps83

This comment has been minimized.

Copy link
Contributor

pps83 commented Feb 23, 2018

Yes, modifying HAPPY_EYEBALLS_TIMEOUT to 10 ms changes time that libcurl takes to connect to localhost.

So, yes, it's related to localhost not hardcoded, and to happy eyeballs. If my server is listening on ::1 then it takes around 30 ms to connect to it through localhost because of dns involvement. If I connect through ::1 then it's 1ms. In case of server listening on 127.0.0.1 then connect through localhost will take HAPPY_EYEBALLS_TIMEOUT + 30ms (or whatever extra delay introduced by localhost resolution).

So, I guess, the other TODO should be to use different eyeball value for loopback connection, something like HAPPY_EYEBALLS_TIMEOUT/10?..
Interestingly, raw blocking socket connect to some random port on localhost takes 1 second to return an error.

Current default for HAPPY_EYEBALLS_TIMEOUT is 200ms, but some connection might take more than that to establish. Does this mean that they will fail, or curl after HAPPY_EYEBALLS_TIMEOUT ms tries to start a second connection on ipv4 without stopping first ipv6 attempt?

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 23, 2018

It sounds more like a bug to me. The HAPPY_EYEBALLS_TIMEOUT is meant to be used so that if the IPv6 connection is still in progress after that time, start the IPv4 connection attempt in parallel and then pick the one that completes first.

In your description, it sounds as if the IPv6 connection already failed (faster than 200ms) and it just waits before starting the IPv4 connection attempt. That's not how it is supposed to work.

Since dd027c8, which landed just a few days ago, this timeout is now configurable which thus allows you to reduce the impact of this behavior, but I still maintain that libcurl shouldn't behave like this.

@pps83

This comment has been minimized.

Copy link
Contributor

pps83 commented Feb 23, 2018

It sounds more like a bug to me

What are you referring to? I was only asking about curl behavior and how this timeout was used

In your description, it sounds as if the IPv6 connection already failed (faster than 200ms) and it just waits before starting the IPv4 connection attempt. That's not how it is supposed to work.

No, I think this works properly in curl, my connection didn't timeout in 200ms. Originally I wrote that I find it hardly possible that it takes that long to fail localhost connection, but my tests show that in fact on windows loopback connection to some random unbound port fails in 1000ms

IMO libcurl should:

  • reduce eyeball timeout to 5-10 ms for loopback connections (with dd027c8 I'll be able to do that manually).
  • not use dns api to resolve localhost

Would it make sense to add an option to override dns resolution timeout in curl? I think if you have 3 dns servers configured and one of the dns server is down and if you try to resolve some random hostname and if two these servers return 0 results, then system dns resolver will wait for reply from the third unexisting dns server until it times out. In win10 I get 10 second timeouts. Basically run libcurl code on some random PCs where I have zero expectation that network is properly configured and I've seen this particular error where it takes long time to timeout requests. Some of my http requests might take 10-20 seconds to complete, so I cannot set total timeout to 2-3 seconds, I want to limit dns resolution timeout only.

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 23, 2018

I think this works properly in curl, my connection didn't timeout in 200ms

Ok.

reduce eyeball timeout to 5-10 ms for loopback connections

Yeah, that'd be nice, but we can really only do that once we hard-code "localhost" to be local as we don't otherwise know when a local host is requested.

Would it make sense to add an option to override dns resolution timeout in curl?

There's a connection timeout that will timeout the resolving too - but then of course cause a failure when that happens. The stock API to resolve host names doesn't allow us to set a custom time-out for its server communications. That's all just magic handled for us without curl having a say about it.

@pps83

This comment has been minimized.

Copy link
Contributor

pps83 commented Feb 23, 2018

but we can really only do that once we hard-code "localhost" to be local as we don't otherwise know when a local host is requested.

I think loopback should be checked by comparing actual addr, e.g. 127\8, and whatever is the proper check for ipv6, that's what I meant.

There's a connection timeout that will timeout the resolving too - but then of course cause a failure when that happens. The stock API to resolve host names doesn't allow us to set a custom time-out for its server communications. That's all just magic handled for us without curl having a say about it.

connection timeout is the same as entire request timeout? Or once connected, there are no more timeouts? Imagine I connect and try to GET something, what if the server is too slow to serve me and I want my request to take no more than 3 seconds. But... at the same time, I really don't want it all fail in 3 seconds if it wasn't able to resolve host name in the first place, in this case I'd like it to fail within 500ms so I could retry another/alternative url. Regarding dns stock api: even if dns resolution cannot be stopped prematurely, request itself might already abort further processing and return error to the caller without waiting for dns resolution to finish.

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Feb 23, 2018

@pps83

This comment has been minimized.

Copy link
Contributor

pps83 commented Feb 23, 2018

regarding loopback and eyeball. All that eyeball timeout only applies after DNS resolution completes, doesn't it? As I understand what should happen when curl is requested to connect to localhost:123, first it tries to resolve localhost and ask both ipv6 and ipv4 addresses, and if both A and AAA returned, then try ipv6 first and after eyeball timeout attempt ipv4 in parallel if ipv6 didn't complete yet.

You said that

but we can really only do that once we hard-code "localhost"

Is that because curl doesn't request ipv6 and ipv4 resolutions of localhost at the same time, and has to independently resolve ipv6 and if the entire process doesn't succeed in eyeball timeout, then it start ipv4 resolution + connect. Not sure if stock dns resolution allows to request A and AAA at the same time, but dns protocol itself obviously allows to requests all these records in a single request.

@bagder

This comment has been minimized.

Copy link
Member

bagder commented Mar 1, 2018

Is that because curl doesn't request ipv6 and ipv4 resolutions of localhost at the same time

No, it gets both sets at the same time (getaddrinfo). I just don't think scanning the lists of returned addresses to make sure that they all are "local" and based on that change the timeout, sounds like a nice approach. But I agree that it would then at least work without a hard-coded "localhost" and even for cases where "localhost.example.com" returns only local addresses...

@bagder bagder closed this in 6bc8304 Jun 14, 2018

@lock lock bot locked as resolved and limited conversation to collaborators Sep 12, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.