
Multi perform hang waiting for threaded resolver #4852

Closed
jay opened this issue Jan 27, 2020 · 5 comments

Comments

@jay jay commented Jan 27, 2020

Similar to #2975.

I did this

Two or three dozen transfers in a multi handle to the same host, concurrently. When the threaded resolver is used (maybe with c-ares as well?) this causes the same number of concurrent resolves. Occasionally one of the resolves does not receive a reply, or the reply is delayed by 5-10 seconds. (That is a network issue I haven't investigated yet; it possibly has to do with so many queries being sent at once.) The other resolves succeed, however, and since libcurl is opportunistic with the DNS cache all the transfers succeed. The problem: when the transfer that owns the pending resolve is done before its resolve is, libcurl hangs waiting for that resolver thread to end, which holds up every other transfer that hasn't completed.

I expected the following

As discussed in #2975, if the thread can clean up its own resources I don't think libcurl should hang waiting for it. Also, is it intentional that multiple resolves for the same host are sent out when pipewait is not used?

I wrote a small program to test the different methods; I only occasionally get somewhat arbitrary results when testing multi without pipewait.

curl/libcurl version

libcurl/7.69.0-DEV OpenSSL/1.0.2t nghttp2/1.40.0
Asynchronous DNS (threaded)

operating system

Windows 7 Enterprise

@bagder bagder added the name lookup label Jan 27, 2020
@bagder bagder commented Jan 27, 2020

Isn't this identical to #2975? By waiting for resolver threads the function may have a delayed return.

is it intentional that multiple resolves for the same host are sent out when pipewait is not used?

It is a known area we could improve. See TODO item 2.2 (added to TODO in 6d79722)

jay added a commit to jay/curl that referenced this issue Feb 5, 2020
This commit changes the behavior of libcurl to no longer block on
incomplete resolve threads from the parent thread (ie the user's thread
which is the multi perform thread) during multi_done.

Instead it now orphans those threads in a master list which is culled
periodically by soon-to-be exiting orphans to wait on and destroy those
that are in the process of or have since exited, which is fast. On
global cleanup we wait on and destroy any remaining threads, which may
be slow but at that point we cannot defer it any longer.

Ideally we would not wait for orphaned threads at all, but we have to
because after global cleanup the user may choose to unload the shared
library that is/contains libcurl.

Note: thread_wait_resolv (Curl_resolver_wait_resolv) which blocks
waiting for a resolver thread remains unchanged, since that is
essentially the purpose of the function. It is currently called by FTP
and SOCKS code, and continues to be a known issue that those two may
block waiting for async resolves to finish.

Fixes curl#4852
Closes #xxxx
@Totktonada Totktonada commented Mar 3, 2020

My two cents: we use libcurl as the base of the built-in HTTP client in tarantool (an application server plus database). It uses cooperative multitasking, so HTTP requests must not block a thread; a blocked thread means a complete freeze of the application. Any API that would let us continue executing other fibers (coroutines) would be fine.

However, a freeze on multi handle cleanup does not work for us either: it leads to the same problem, i.e. the whole application freezing at that point. We could track handles that need extra cleanup on our side and retry the cleanup periodically, if the retry were non-blocking.

Thanks in advance!

Our issue: tarantool/tarantool#4591

@jay jay commented Jul 22, 2020

My PR to fix this did not gain enough support. I still think some remedy for this issue is a good idea. Someone will have to implement something more agreeable, or eventually I will close this and document it in known issues (or TODO).

@andrewshadura andrewshadura commented Aug 27, 2020

I'm curious why #2975 got closed and locked; it's still an issue that hasn't been fixed.

@bagder bagder commented Aug 27, 2020

  1. We close issues that linger for a long time without being worked on. If they're significant enough, they should instead get added to TODO or KNOWN_BUGS
  2. We lock all closed issues after a period of time after closure to avoid spam and people doing follow-ups in old issues instead of creating new ones

In the #2975 case, it seems we never added it to KNOWN_BUGS, which looks like an oversight. But since this PR exists now, I don't think the closing of #2975 or the missing entry in KNOWN_BUGS matters: this PR is here, and is still open.
