-
-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
connect: stop halving the remaining timeout when less than N ms left #11693
Conversation
Alternative take: only do /2 for the first attempt and then go all the way for the rest |
A more advanced and aggressive take: fire off two attempts in parallel, per IP family and allow both to use the full timeout period... |
First, I agree that the /2 way is non-optimal when more than 2 addresses are available. So, what would be a good approach to partition the time for attempts? If we believe there is a meaningful, minimal duration for an attempt, assigning less time is not productive. A ping to nghttp2.org from my place is 260ms for me. Statistics here and here list maximum ping times of 300ms, in extreme 360ms. Testing this site with eissing.org results in pings <300ms, except Seoul listed with >600ms. Which I interpret as 300ms being a good limit for most parts of the net to succeed. 1000ms seems a safe limit. For easy calculation, say someone starts a transfer with connect_timeout of 3sec. ipv4 and v6 are handled in parallel, so lets focus just on one family. What we wan then is:
Read: if an attempt does not succeed after 1sec, and we have more to try, we move on. The last address tried takes all the remaining time. We think 300ms is the minimum viable wait time. If we have less overall time than (addr_count *300ms), some will never be tried. But that should be ok. The connection timeout needs to be raised then. Does that make sense? (Maybe 1200ms and 600ms are better values, but I think the distribution should work as outlined in the table.) |
I was in Cornwall UK on a summer holiday maybe ten years ago. In my rental cabin there, I experienced 60000 millisecond pings to my mail server at home. Meaning: there are architectures that introduce substantial delays that cannot be explained simply by physical (cable) distance. Maybe some users still want to be able to connect over that network - assuming that TCP allows it. I don't think we can assume that 1000ms will be enough to get through... |
The last address will always get the remaining time. If you have a single address for your mail server, this proposal would wait the full connect timeout. Shortening the first attempts and giving the remaining time to the last address should give a shorter average connect time. If the probability of success is the same for all addresses. AFAIK, there is no priority for A records in DNS, so one cannot assume that the first addresses are "better". |
If the server needs 2000 ms to connect, and I set a 3000 ms timeout and that works when the host has a single IP. Then one day they instead announce 10 addresses and then curl can't connect anymore. Annoying, since 3000ms could be enough if we only knew how to spend the time... |
The proposed implementation would not work with 2 addresses, since each one would get 1500ms and this would fail in your example. So...I believe there is no time distribution working for all cases one can dream up, unless we go parallel for each address found. |
Right, I think what I primarily want is to make sure that not all the time is wasted on the first address if there are more than one. The current /2 method works for this. If all addresses timeout however, the /2 method makes it next to impossible to connect to address number N (5? 6?) unless the timeout is set very large from the beginning. My idea in this PR would be to avoid /2-shrinking the timeout down to silly levels. |
This is probably not the ideal approach, but I think it improves the current situation at least a little. I'm in favor of merging this so I made it a "real" PR now. And we can continue the search and think about how this can be improved further in the future. |
b3e7c2c
to
7522a72
Compare
I wouldn't get caught up in the ping times. Some transfers are taking place over local networks where there's hardly any latency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest have a macro like this in ALLCAPS to signal a variable can be evaluated multiple times, even if it's unlikely here it's still best practice. Like which is easier to spot the bug isspace(*foo++) ISSPACE(*foo++)
When curl wants to connect to a host, it always has a TIMEOUT. The maximum time it is allowed to spend until a connect is confirmed. curl will try to connect to each of the IP adresses returned for the host. Two loops, one for each IP family. During the connect loop, while curl has more than one IP address left to try within a single address family, curl has traditionally allowed (TIME-LEFT/2) for *this* connect attempt. This, to not get stuck on the initial addresses in case the timeout but still allow later addresses to get attempted. This has the downside that when users set a very short timeout and the host has a large number of IP addresses, the effective result might be that every attempt gets a little too short time. This change stop doing the divided-by-two if the total time left is below a threshold. This threshold is 600 milliseconds. Closes #11693
7522a72
to
0c802f4
Compare
Good call. I agree. |
When curl wants to connect to a host, it always has a TIMEOUT. The maximum time it is allowed to spend until a connect is confirmed. curl will try to connect to each of the IP adresses returned for the host. Two loops, one for each IP family. During the connect loop, while curl has more than one IP address left to try within a single address family, curl has traditionally allowed (time left/2) for *this* connect attempt. This, to not get stuck on the initial addresses in case the timeout but still allow later addresses to get attempted. This has the downside that when users set a very short timeout and the host has a large number of IP addresses, the effective result might be that every attempt gets a little too short time. This change stop doing the divided-by-two if the total time left is below a threshold. This threshold is 600 milliseconds. Closes curl#11693
When curl wants to connect to a host, it always has a TIMEOUT. The maximum time it is allowed to spend until a connect is confirmed.
curl will try to connect to each of the IP adresses returned for the host. Two loops, one for each IP family.
During the connect loop, while curl has more than one IP address left to try within a single address family, curl has traditionally allowed (TIME-LEFT/2) for this connect attempt. This, to not get stuck on the initial addresses in case the timeout but still allow later addresses to get attempted.
This has the downside that when users set a very short timeout and the host has a large number of IP addresses, the effective result might be that every attempt gets a little too short time.
This change proposes that we stop doing the divided-by-two if the total time left is below a threshold. This threshold is 600 milliseconds in this initial commit.
To discuss:
A) a good/bad idea?
B) what's the threshold?
C) is there instead a better fix to do?