windows libcurl, crash in getaddrinfo_thread. #997

Closed
zcg19 opened this Issue Sep 7, 2016 · 13 comments

Projects

None yet

3 participants

@zcg19
zcg19 commented Sep 7, 2016 edited

I did this

I use libcurl in a dll, and export a fuction. dll code sample:
int send_http_msg(const char * msg, int timeout_second)
{
curl_easy_init();

// ...
curl_easy_setopt();

curl_easy_perform();
curl_easy_cleanup();

}

The program load this DLL and call send_http_msg(msg, 3).
Then free this DLL when send_http_msg() is returned.

Here is a question.
The thread of getaddrinfo_thread() will can not be executed completely, when getaddrinfo() takes more than 3s. But DLL has been released. Then program will be crash.

I trace the code and find it will do Curl_thread_destroy() when timeout.

curl_threads.c
void Curl_thread_destroy(curl_thread_t hnd)
{
CloseHandle(hnd);
}

curl/libcurl version

libcurl1.46.0

operating system

windows(xp, win7)

@bagder bagder added the crash label Sep 7, 2016
@bagder
Member
bagder commented Sep 7, 2016

So can you please provide a full example source code that reproduces the problem?

@zcg19
zcg19 commented Sep 8, 2016

I wirte a sample and upload to github.
https://github.com/zcg19/test_curl

@zcg19 zcg19 closed this Sep 8, 2016
@zcg19 zcg19 reopened this Sep 8, 2016
@bagder
Member
bagder commented Sep 8, 2016

I'm not really a Windows person/developer myself so I can't really review the full effects and details of this recipe. I suspect this crash happens because of some of the funny DLL/thread tricks involved and not because of libcurl itself.

You pointed out a particular call to CloseHandle as the culprit. Are you suggesting it is used wrongly somehow?

@zcg19
zcg19 commented Sep 8, 2016

I mean that Curl_thread_destroy() do not to terminate thread of getaddrinfo_thread.
The thread is still running, but Dll has been released.

@zcg19
zcg19 commented Sep 8, 2016

You can change the DNS into an incorrect IP address, so that the getaddrinfo() will take more than 3 seconds, causing the thread of getaddrinfo_thread can not to exit.
And then run the program will cause dump.

@jay
Member
jay commented Sep 8, 2016

I have not looked at the dump but I'm sure this is an access violation because he already unloaded the DLL. Basically libcurl has thread(s) running in the background and if his DLL (which libcurl is a part of) is unloaded during that time there will surely be an access violation.

A possible fix for this could require him calling curl_global_cleanup and then in Curl_resolver_global_cleanup we do a graceful cleanup of any running threads which are usually going to be waiting on getaddrinfo to return. However ordinarily if an application is terminating and calls curl_global_cleanup I think they would not expect a hang which could happen if we start waiting for all getaddrinfo to return. And TerminateThread is no good because it can disrupt various synchronizations in winsock.

Is it possible to not unload the DLL? @captain-caveman2k @mback2k any ideas?

@zcg19
zcg19 commented Sep 8, 2016

My current modification plan is no longer to unload the DLL after it being loaded.

@bagder
Member
bagder commented Sep 11, 2016

Okay, so this problem happens because you rip out the carpet from underneath the thread so there's nothing to come back to. It would be a bit unfortunate to add that wait -for-the-last thread now since it'll introduce blocking behaviors. I could agree to us adding another flag to curl_global_init() for this purpose, but that would be a new feature and isn't strictly a bug fix.

I propose that we document this "limitation" somehow, although I'm not sure exactly where we would write this. Where would someone in this situation look for clues? The curl_global_cleanup man page? The docs talking about multi-threading?

@jay
Member
jay commented Sep 13, 2016

I think that this is probably a behavior we should deal with I'm just now sure how yet. There is FreeLibraryAndExitThread so conceivably we could increase the DLL's refcount for every thread and then call that function to exit the thread. One problem with that is _beginthreadex allocates per-thread structs for the CRT and ExitThread doesn't dealloc in some cases so then there's a memory leak. Another problem is the refcount only goes up to 32k before it (conceivably) wraps around and since other code can call LoadLibrary on the DLL there could be a wraparound.

I'm thinking we could have some sort of monitor thread that monitors until the threads are done and keeps the DLL loaded during that time by increasing the refcount only once. Then that monitor thread, created with CreateThread, can call FreeLibraryAndExitThread when it is the last thread. But it gets more complicated when taking into account global initialization and cleanup and what if we have a user who initializes and cleans up repeatedly, it could create a lot of threads.

@bagder
Member
bagder commented Sep 20, 2016

I propose the following documentation fix while we ponder on what kind of fix we can do for future versions. I figured the curl_global_cleanup would be the most suitable place to put this note.

diff --git a/docs/libcurl/curl_global_cleanup.3 b/docs/libcurl/curl_global_cleanup.3
index 2e3ff03..c669a25 100644
--- a/docs/libcurl/curl_global_cleanup.3
+++ b/docs/libcurl/curl_global_cleanup.3
@@ -40,10 +40,16 @@ This doesn't just mean no other thread that is using libcurl.  Because
 similarly thread unsafe, it could conflict with any other thread that uses
 these other libraries.

 See the description in \fIlibcurl(3)\fP of global environment requirements for
 details of how to use this function.
-
+.SH CAUTION
+If libcurl is built to use a threaded resolver and a recent libcurl transfered
+returned with a timeout error, there might still be a resolver thread alive at
+the time this function is called and your application is done with
+libcurl. This becomes a problem if the application forcibly unloads libcurl
+from memory before that thread has completed, as it might lead to crashes or
+other unintended consequences.
 .SH "SEE ALSO"
 .BR curl_global_init "(3), "
 .BR libcurl "(3), "
-
+.BR libcurl-thread "(3), "
@jay
Member
jay commented Sep 20, 2016

I think that's too narrow. How about this:

.SH WARNING
curl_global_cleanup does not block waiting for any libcurl-created threads to terminate. If a module containing libcurl is dynamically unloaded while libcurl-created threads are still running then your program may crash or other corruption may occur. We recommend you do not run libcurl from any module that may be unloaded dynamically. This behavior may be addressed in the future.

Also does this apply only to Windows or can this happen on other OSes? I assume dlclose would have the same problem.

@bagder bagder added a commit that referenced this issue Sep 20, 2016
@bagder bagder curl_global_cleanup.3: don't unload the lib with sub threads running
Discussed in #997

Assisted-by: Jay Satiro
aab94da
@bagder
Member
bagder commented Sep 20, 2016

thanks @jay, I just pushed a variation of that. in commit aab94da.

or can this happen on other OSes

I'm pretty sure it can happen for all systems that can run the threaded resolver.

@bagder bagder closed this Oct 15, 2016
@jay
Member
jay commented Dec 2, 2016

This is a problem even if the libcurl.dll isn't unloaded. If any thread from the threaded resolver is in the middle of getaddrinfo during or after WSACleanup() is called (curl_global_cleanup -> win32_cleanup) and nothing else initialized winsock then it may crash with an access violation.

I think if we have threads working we will have to fix it so we don't actually cleanup or unload at least until they terminate. OpenSSL fixed their issue by leaving the DLL loaded until process termination, and they did the same in Linux using some DSO method. Also see my comment in that issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment