New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong Curl_timeleft result leads to connection timeout when CURLOPT_TIMEOUT_MS / CURLOPT_CONNECTTIMEOUT_MS is close to LONG_MAX #5583
Comments
Could you please test this against the current |
Also |
Indeed, but shouldn't LONG_MAX be also a valid value, also? |
It should. |
It's done to reduce the number of calls to the time function. The problem here seems to be the mix of using the previous value and a refreshed value... |
@ioancea do you feel like taking a stab at a PR to fix this? |
@mback2k I've just checked it against the current master (aka hash @bagder Thanks for the explanation, I'd assumed it has something to do with #5574 :)
Yes, since the timestamps that mark the start of operations (i.e. progress.startop and progress.startsingle) are set within the loop of multi_runsingle, the |
Out of curiosity: is there actually a valid real-world use case for a connect timeout longer than 300 seconds? |
There's no clear rule there, but in general we stay in the loop until we run into a |
Setting a timeout to INT_MAX could cause an immediate error to get returned as timeout because of an overflow when different values of 'now' were used. This is primarily fixed by having Curl_pgrsTime() return the "now" when TIMER_STARTSINGLE is set so that the parent function will continue using that time. Reported-by: Ionuț-Francisc Oancea Fixes #5583
I did this
I reached into this problem by trying to simulate an "infinite" timeout for the connection. Since 0 means 300 seconds, it felt natural to force it to LONG_MAX to achieve the goal of "no connection timeout"
configure curl without threaded resolver (aka --disable-threaded-resolver) - I have similar requirements as here "./configure --enable-threaded-resolver --disable-rt" doesn't work #3294
the system should be under a bit of load but it also reproduces if the program is run through a profiler, like valgrind (it just needs a bit of delay between instructions)
set CURLOPT_TIMEOUT_MS or/and CURLOPT_CONNECTTIMEOUT_MS to LONG_MAX (or close enough to it)
if curl is compiled on debug, disable signals / alarms by setting CURLOPT_NOSIGNAL to 1L (this is to workaround the DEBUGASSERT from warnless.c:295)
I expected the following
The example should work since the timeout is "infinite" (aka LONG_MAX)
curl/libcurl version
Latest release: libcurl/7.70.0, Release-Date: 2020-04-29
operating system
I've run the example on WSL but it can be reproduced on genuine Linux systems as well.
Linux 4.19.84-microsoft-standard #1 SMP Wed Nov 13 11:44:37 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Analysis of the problem
Well, the title of the issue it's a bit misleading because the problem isn't in Curl_timeleft itself but in its parameters. See below my analysis
While checking for the root cause of the issue, I have debugged the following relevant parts:
the error is returned because Curl_timeleft from https://github.com/curl/curl/blob/curl-7_70_0/lib/multi.c#L1572 returns a negative value
the negative value results from here https://github.com/curl/curl/blob/curl-7_70_0/lib/connect.c#L233 due to an integer overflow -
timeout_ms -= Curl_timediff
actually overflows timeout_ms (which was set to LONG_MAX) because Curl_timediff returns a negative value [so that timeout_ms = timeout_ms - (-value) --> timeout_ms = timeout_ms + value]Why does Curl_timediff return a negative value? The short answer is: because the newer timestamp is older than the older one :). What happens is the following:
now timestamp (which later is passed as newer to Curl_timediff) gets set in curl_multi_perform https://github.com/curl/curl/blob/curl-7_70_0/lib/multi.c#L2349 then passed to the multi_runsingle function
inside multi_runsingle, the loop of the state machine sets (among other things) the timestamp that marks the start of connection https://github.com/curl/curl/blob/curl-7_70_0/lib/multi.c#L1636. Then, in the next loop, this timestamp is given to Curl_timediff as older - from https://github.com/curl/curl/blob/curl-7_70_0/lib/multi.c#L1572 to https://github.com/curl/curl/blob/curl-7_70_0/lib/connect.c#L233
I'm not proficient enough in curl's internals to understand why the
now
timestamp is calculated from outside of themulti_runsingle
function. At a 1st glance, it looks natural to me to always use the actual current time to calculate the time left for an operation but i'm sure there must be an explanation for the current design.Why is this problem only visible with
--disable-threaded-resolver
? It seems there's a special branch whenasync
is enabled (see https://github.com/curl/curl/blob/curl-7_70_0/lib/multi.c#L1662, w/o the threaded resolver, aka the else branch, rc is set toCURLM_CALL_MULTI_PERFORM
which makes the function reloop) which makes the multi_runsingle function to stop its loop right afterprogress.t_startsingle
gets set. Then, at the next run,now
will be recalculated so it will correctly end up being newer thanprogress.t_startsingle
.Questions
Is there a workaround for this problem rather than enabling the threaded resolver? From my tests, it would be enough to lower the value (e.g. instead of LONG_MAX, set LOG_MAX / 1000) in order not to easily overflow. Is this assumption correct?
Can this issue affect other things too? For example, i noticed the multi_runsingle function also sets the
progress.t_startop
(here https://github.com/curl/curl/blob/curl-7_70_0/lib/multi.c#L1624) which will be at a later time thannow
.The text was updated successfully, but these errors were encountered: