-
-
Notifications
You must be signed in to change notification settings - Fork 6.7k
Windows DNS resolution: Curl crash when GetAddrInfoExW callback invoked on shutdown #13509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@pps83 any ideas? |
It looks to me like Curl_resolver_kill is supposed to wait for threads to terminate and doesn't do that for the GetAddrInfoExW threads. Lines 746 to 762 in de7b3e8
I am not sure how quick exit works in this circumstance. If we don't wait for thread cleanup in resolver_kill then our getaddrinfo thread or the winsock getaddrinfo thread (ie the GetAddrInfoExW async dns thread) can call back into winsock, even after the user calls curl_global_cleanup? |
The best way for me to fix is to reproduce the error. @Ch40zz is it easy for you to provide some minimal example to repro the issue? Overall, it was hard to understand how threading/asynch/dns logic worked in the curl, as some things were quite confusing. Basically, I wrote all that code making a logical assumption that winapi creates these threads. Then, for all places where libcurl manages the async data I needed to add equivalent code for winapi. So, overall, Here's the commit that was merged: a6bbc87 If you look for all the places
However, as @jay pointed out, it looks like I had to add code to handle complete_ev case where thread_hnd was touched to make sure thread_wait_resolv was called inside |
Sorry for the lengthy iterative "debugging". I don't know this code, and I didn't touch curl since that PR was merged. Perhaps, all the @bagder before merging the PR I think a call to cancel the request needs to be added, as it's not logically clear what needs to be added there. What do you think? Here's the alternative impl that ignores @Ch40zz can you try these changes with your code to see if the issue gets fixed? |
#13518 sadly did not fix the issue yet, I posted more info on the PR. |
Quick update from us after testing for a day: EDIT: repro with either curl or pure WinAPI generating the exact same stack trace when compiled with ASAN and MSVC can be found here: https://gist.github.com/Ch40zz/f3a33139f35fd608d71db5e4085e0bee EDIT2: Created a bug on MS Developer Community: https://developercommunity.visualstudio.com/t/ASAN:-heap-use-after-free-in-NdrFullPoin/10654169 |
Thanks. We'll consider the remaining issue a Windows bug until proven otherwise. |
I'm sorry to revive this closed issue but it seems to be the best place to write about my findings with regard to curl and the Staring with curl 8.6.0, our product which loads at runtime (LoadLibrary) a DLL that has curl statically linked, crashes in places that point to what @Ch40zz reported to MS. After digging into the issue, I'm confident the implementation in curl is correct and respects Microsofts's documentation. From my tests, I have seen that, after the "completion" callback ( To me this definitely looks like a buggy MS implementation (if |
Did you test with curl 8.8.0 to see if you get the same issue? |
Yes. I've also tested directly w/o curl, just with the |
Hello, I am adding my observations to this issue. As @ioancea investigated, this seems to be a bug on Microsoft's
After running the modified curl you can see that curl code is not waiting for the spawned thread to complete and the thread is still running code inside curl. There should be a way to completely wait for the async operation to complete (similar on how joining a thread blocks until nothing on that thread is executing anymore). |
Hello, @bagder, @jay. Sorry for pinging you but you are the only curl maintainers that replied to this issue. |
Though I consider this a Windows bug and not a bug in curl I think we should consider reverting GetAddrInfoExW use and go back to the threaded resolver. Thoughts? |
Although this is a Windows bug I would still expect a library to work correctly, especially if there are alternatives that could be used. In my opinion it would make sense to add a define to enable the buggy feature for people who explicitly need it but they opt in to a well known bug by doing so and accept the risks. Default should never run any bugged code for small performance gains. In the future, should MS ever fix the bug the code could be used again by default after adding an OS version check for when the bug was fully fixed in Windows. |
Given that it seems we cannot make it work with the async call, I agree.
While it could be convenient to some, it also adds a lot of work. At least if we want to make sure that it keeps working. |
@bagder perhaps this can be configurable? Should be OFF by default, but can be enabled with options/flags at runtime? This way, it can be tested with updated OSes if it's fixed. Can be build-time config as well. The other option: before doing wsacleanup curl could add some sleep (this is obviously isn't a proper fix, but might make internal pool to join threads before doing cleanup. Also, in indicated in test code to trigger the issue it might actually manually signal that the callback is finished at the end of the callback, so that curl cleanup could make sure that we don't try to wsacleanup while async dns callback is in progress. |
- For the threaded resolver backend on Windows, revert back to exclusively use the threaded resolver with libcurl-owned threading instead of GetAddrInfoExW with Windows-owned threading. Winsock (the Windows sockets library) has a bug where it does not wait for all of the name resolver threads it is managing to terminate before returning from WSACleanup. The threads continue to run and may cause a crash. This commit is effectively a revert of several commits that encompass all GetAddrInfoExW code in libcurl. A manual review of merge conflicts was used to resolve minor changes that had modified the code for aesthetic or build reasons in other commits. Prior to this change if libcurl was built with the threaded resolver backend for Windows, and Windows 8 or later was the operating system at runtime, and the caller was not impersonating another user, then libcurl would use GetAddrInfoExW to handle asynchronous name lookups. GetAddrInfoExW support was added in a6bbc87, which preceded 8.6.0, and prior to that the threaded resolver backend used libcurl-owned threading exclusively on Windows. Reported-by: Ionuț-Francisc Oancea Reported-by: Razvan Pricope Ref: https://developercommunity.visualstudio.com/t/ASAN:-heap-use-after-free-in-NdrFullPoin/10654169 Fixes curl#13509 (comment) Closes #xxxx --- Revert "asyn-thread: avoid using GetAddrInfoExW with impersonation" This reverts commit 0caadc1. Conflicts: lib/system_win32.c -- Revert "asyn-thread: fix curl_global_cleanup crash in Windows" This reverts commit 428579f. -- Revert "system_win32: fix a function pointer assignment warning" This reverts commit 26f002e. -- Revert "asyn-thread: use GetAddrInfoExW on >= Windows 8" This reverts commit a6bbc87. Conflicts: lib/asyn-thread.c lib/system_win32.c --
Sorry but I'm proposing it be removed entirely. #14794. |
FYI, any code that uses libcurl will fail with asan because of excessive threads created without internet connection. Negative dns store would fix the issue though (PR wasn't completed for some reason). |
Is there an issue for this and can you give me a link. Is it another bug in Winsock or a bug in curl? Does it cause a crash? |
Initially, I patched libcurl to avoid the issue described in #12481. In short, builds with asan that make lots of requests easy run out of ram because of too many threads created (I'm not talking about concurrent threads here). It takes only 20K requests on x86 to crash with OOM error (or 200K on x64). I tried it on windows only, but it's possible the issue exists on other OSes when running with asan (the easiest way to test if the issue exists is to run sample program I reported to microsoft: https://developercommunity.visualstudio.com/t/ASAN-causes-OOM-error-when-many-threads/10508715). |
Hi there, I'm from the msvc ASan team, found this thread through the linked bug reports on "developer community". At first glance, I agree with much of the analysis here that it seems something is buggy with the Winsock API (though please note I'm no authority on that). @ioancea, just an FYI that I asked you about your ASan-less reproducer here: https://developercommunity.visualstudio.com/t/ASAN:-heap-use-after-free-in-NdrFullPoin/10654169 Also took a brief peek at your bug report @pps83 (https://developercommunity.visualstudio.com/t/ASAN-causes-OOM-error-when-many-threads/10508715). But don't mean to hijack this thread further so we can continue the conversation in the devcommunity forums. Thanks! |
I did this
We are using libcurl in our C++ project on windows 10 and noticed strange issues after upgrading our curl from 8.3.0 to 8.6.0 or later. After spending some time reproducing the issue, we managed to get an ASAN stacktrace of the exact crash:
What happens here is that a DNS resolver thread created by
GetAddrInfoExW
did not terminate yet, however curl is already shutting down, deinitializing Winsock withWSACleanup()
leading to an access violation.After investigating recent changes we came to the conclusion that there is most likely a bug in the current async windows DNS resolution code around
GetAddrInfoExW
implemented in #12482.We tried a few things to verify this:
Our test looks as follows:
We made sure that everything is thread safe by using proper locking mechanisms, the issue is reproducable roughly 60% of the time per run. The bug is most likely the cause of
Curl_resolver_kill
->Curl_resolver_cancel()
->destroy_async_data()
which then has this code:curl/lib/asyn-thread.c
Lines 555 to 569 in 0199104
Closing the completion event handle in the first branch (not done) seems like a bad idea as the next call to
Curl_GetAddrInfoExCancel()
might not finish instantly. We then would wait forWaitForSingleObject(td->complete_ev, INFINITE);
but this will instantly return with an error as the handle is closed. This is however just a theory and not verified yet.I expected the following
Curl shutdown / multi close properly stops all running DNS requests and doesn't crash.
Tests for the Windows implementation would also be great.
curl/libcurl version
>= curl 8.6.0
operating system
Windows 10 22H2
The text was updated successfully, but these errors were encountered: