Apply the Happy Eyeballs philosophy to parallel c-ares queries #3699
When it supports both IPv4 and IPv6, cURL follows the happy eyeballs algorithm by trying to connect via both protocols in a staggered fashion. When cURL is configured to use c-ares, this also means doing two simultaneous, parallel DNS requests, one for the A record and one for the AAAA record, so that cURL can have both the IPv4 and IPv6 address in hand. (Ideally, there'd be a single request, but that's not the case for a variety of reasons.)
Today, cURL issues both of these requests at the same time, which is good. However, this parallelism is encapsulated within the c-ares driver, so the larger cURL logic is unaware that there are actually two requests and not just a single (combined A and AAAA) DNS request. The rest of cURL waits for this meta request to finish before it tries to start any connections.
Unfortunately, when either one of the parallel c-ares requests takes a long time to finish, cURL sits and waits for it, even if the c-ares driver does have a viable DNS result from the other request. Practically speaking, this means that if you are on a network that claims to support IPv6 and allows binding such sockets and addresses locally, but whose DNS server is misconfigured and can't properly respond to AAAA requests, you can find yourself in a situation where you have the usable A record in hand, but you're waiting for a series of timeouts and/or failures from all the malfunctioning DNS servers in the list.
This PR makes a small tweak to the existing c-ares driver: it takes the same "happy eyeballs" philosophy used for the actual TCP connections (connect to one and then, after a short timeout, try the other) and applies it to the parallel DNS requests themselves. With this change, once a usable DNS response is in hand (be it A or AAAA), the c-ares driver waits only a short time before giving up on the second response and going with what it has.
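To make the idea concrete, here is a minimal sketch of the decision rule in C. It is not the code from this PR: the `dual_query` struct, the function names, and the `grace_ms` parameter are all hypothetical, and the length of the grace period is left to the caller because the description above does not state it. The sketch only models "finish when both queries are done, or when a usable answer has been in hand for the grace period".

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one pair of parallel lookups (A and AAAA).
   None of these names come from cURL or c-ares. */
struct dual_query {
  bool a_done;          /* A query finished, successfully or not */
  bool aaaa_done;       /* AAAA query finished, successfully or not */
  long first_usable_ms; /* when the first usable answer arrived, -1 if none */
};

void dual_query_init(struct dual_query *q)
{
  q->a_done = false;
  q->aaaa_done = false;
  q->first_usable_ms = -1;
}

/* Record that one of the two queries completed at time now_ms.
   'usable' means the response carried at least one address. */
void dual_query_complete(struct dual_query *q, bool is_aaaa,
                         bool usable, long now_ms)
{
  if(is_aaaa)
    q->aaaa_done = true;
  else
    q->a_done = true;

  /* Only the first usable answer starts the grace-period clock. */
  if(usable && q->first_usable_ms < 0)
    q->first_usable_ms = now_ms;
}

/* Return true when the resolver should stop waiting and hand over
   whatever it has. grace_ms is an assumed, caller-chosen value. */
bool dual_query_should_finish(const struct dual_query *q,
                              long now_ms, long grace_ms)
{
  /* Both answers are in: nothing left to wait for. */
  if(q->a_done && q->aaaa_done)
    return true;

  /* One usable answer is in hand and the other query has had its
     short grace period: stop waiting for it. */
  if(q->first_usable_ms >= 0 && now_ms - q->first_usable_ms >= grace_ms)
    return true;

  return false;
}
```

Note that a failed or empty first response does not start the clock in this sketch, so the resolver keeps waiting for the other query in that case. Whether the PR behaves the same way is documented in its comments, not here.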
The exact algorithm and rationale are documented in the comments inside the PR. The changes were specifically chosen to be simple and not overly aggressive. This might not be the best "fix", but it seems better than the status quo, and I hope that it can at least be the start of a discussion on how to handle real-world cases like this.