Windows DNS resolution: Curl crash when GetAddrInfoExW callback invoked on shutdown #13509

Closed
Ch40zz opened this issue Apr 30, 2024 · 21 comments
Labels: crash, help wanted, name lookup (DNS and related tech), Windows (Windows-specific)


@Ch40zz

Ch40zz commented Apr 30, 2024

I did this

We are using libcurl in our C++ project on Windows 10 and noticed strange issues after upgrading curl from 8.3.0 to 8.6.0 or later. After spending some time reproducing the issue, we managed to get an ASAN stack trace of the exact crash:

=================================================================
==41196==ERROR: AddressSanitizer: heap-use-after-free on address 0x12184f8b06cc at pc 0x7ffbe4b442c4 bp 0x00c7a3dfdcf0 sp 0x00c7a3dfd480
READ of size 36 at 0x12184f8b06cc thread T-1
==41196==WARNING: Failed to use and restart external symbolizer!
++++++++++++++++++++++++++++++++++++++++deinit
    #0 0x7ffbe4b442c3 in _asan_wrap_memmove+0x193 (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x1800342c3)
    #1 0x7ffd4bc46936 in NdrFullPointerFree+0xd36 (C:\WINDOWS\System32\RPCRT4.dll+0x180006936)
    #2 0x7ffd4bd1d87c in NdrMesSimpleTypeEncodeAll+0x6e2c (C:\WINDOWS\System32\RPCRT4.dll+0x1800dd87c)
    #3 0x7ffd4bd1aa74 in NdrMesSimpleTypeEncodeAll+0x4024 (C:\WINDOWS\System32\RPCRT4.dll+0x1800daa74)
    #4 0x7ffd4bd1dc6f in NdrClientCall3+0xef (C:\WINDOWS\System32\RPCRT4.dll+0x1800ddc6f)
    #5 0x7ffd4a348fb0 in DnsGetAdaptersInfo+0xe40 (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x180008fb0)
    #6 0x7ffd4a3486bf in DnsGetAdaptersInfo+0x54f (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x1800086bf)
    #7 0x7ffd4a34c616 in DnsQueryEx+0x9b6 (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x18000c616)
    #8 0x7ffd4a34bdc5 in DnsQueryEx+0x165 (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x18000bdc5)
    #9 0x7ffd4a65b162 in WSPStartup+0x1902 (C:\WINDOWS\system32\mswsock.dll+0x18000b162)
    #10 0x7ffd4a65af33 in WSPStartup+0x16d3 (C:\WINDOWS\system32\mswsock.dll+0x18000af33)
    #11 0x7ffd4a65a60d in WSPStartup+0xdad (C:\WINDOWS\system32\mswsock.dll+0x18000a60d)
    #12 0x7ffd4d7c9e79 in WSALookupServiceBeginW+0x1139 (C:\WINDOWS\System32\WS2_32.dll+0x180009e79)
    #13 0x7ffd4d8e0cd8 in RtlDeactivateActivationContext+0x2c8 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180070cd8)
    #14 0x7ffd4d8c31b9 in TpReleaseCleanupGroupMembers+0xad9 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800531b9)
    #15 0x7ffd4d717343 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #16 0x7ffd4d8c26b0 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

0x12184f8b06cc is located 140 bytes inside of 180-byte region [0x12184f8b0640,0x12184f8b06f4)
freed by thread T0 here:
    #0 0x7ffbe4b4f585 in _asan_wrap_RtlFreeHeap+0x4d5 (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x18003f585)
    #1 0x7ffd4a359747 in DnsApiFree+0x37 (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x180019747)
    #2 0x7ffd4a66cfff in Tcpip6_WSHGetWildcardSockaddr+0xc48f (C:\WINDOWS\system32\mswsock.dll+0x18001cfff)
    #3 0x7ffd4a65f6f6 in Tcpip6_WSHAddressToString+0xe6 (C:\WINDOWS\system32\mswsock.dll+0x18000f6f6)
    #4 0x7ffd4d7c1b41 in Ordinal487+0x1b41 (C:\WINDOWS\System32\WS2_32.dll+0x180001b41)
    #5 0x7ffd4d7d3350 in WSACreateEvent+0xf0 (C:\WINDOWS\System32\WS2_32.dll+0x180013350)
    #6 0x7ffd4d7d022b in WSCEnumProtocols+0x97b (C:\WINDOWS\System32\WS2_32.dll+0x18001022b)
    #7 0x7ffd4d7cfb33 in WSCEnumProtocols+0x283 (C:\WINDOWS\System32\WS2_32.dll+0x18000fb33)
    #8 0x7ffd4d7d03d9 in WahEnumerateHandleContexts+0xf9 (C:\WINDOWS\System32\WS2_32.dll+0x1800103d9)
    #9 0x7ffd4d7d04f5 in WSACleanup+0xe5 (C:\WINDOWS\System32\WS2_32.dll+0x1800104f5)
    #10 0x7ffbe6a5a472 in Curl_win32_cleanup C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\system_win32.c:164
    #11 0x7ffbe6a3eaba in curl_global_cleanup C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\easy.c:291
    ...
    #19 0x7ff65bae75ab in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #20 0x7ffd4d717343 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #21 0x7ffd4d8c26b0 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

previously allocated by thread T3 here:
    #0 0x7ffbe4b4ed29 in _asan_wrap_RtlAllocateHeap+0x179 (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x18003ed29)
    #1 0x7ffd4a65aafe in WSPStartup+0x129e (C:\WINDOWS\system32\mswsock.dll+0x18000aafe)
    #2 0x7ffd4a65a2a7 in WSPStartup+0xa47 (C:\WINDOWS\system32\mswsock.dll+0x18000a2a7)
    #3 0x7ffd4d7c9089 in WSALookupServiceBeginW+0x349 (C:\WINDOWS\System32\WS2_32.dll+0x180009089)
    #4 0x7ffd4d7c8e48 in WSALookupServiceBeginW+0x108 (C:\WINDOWS\System32\WS2_32.dll+0x180008e48)
    #5 0x7ffd4d7c83bf in GetNameInfoW+0x133f (C:\WINDOWS\System32\WS2_32.dll+0x1800083bf)
    #6 0x7ffd4d7c81c8 in GetNameInfoW+0x1148 (C:\WINDOWS\System32\WS2_32.dll+0x1800081c8)
    #7 0x7ffd4d7c6a76 in GetAddrInfoW+0xf56 (C:\WINDOWS\System32\WS2_32.dll+0x180006a76)
    #8 0x7ffd4d7c4390 in GetAddrInfoExW+0x4e0 (C:\WINDOWS\System32\WS2_32.dll+0x180004390)
    #9 0x7ffbe6a4a755 in init_resolve_thread C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\asyn-thread.c:657
    #10 0x7ffbe6a49cd5 in Curl_resolver_getaddrinfo C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\asyn-thread.c:940
    #11 0x7ffbe6a61049 in Curl_resolv C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\hostip.c:814
    #12 0x7ffbe6a551c6 in resolve_fresh C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\url.c:3297
    #13 0x7ffbe6a535b1 in create_conn C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\url.c:3802
    #14 0x7ffbe6a51979 in Curl_connect C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\url.c:3873
    #15 0x7ffbe6a407b8 in multi_runsingle C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\multi.c:2002
    #16 0x7ffbe6a3fc00 in curl_multi_perform C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\multi.c:2780
    ...
    #20 0x7ffbe6afd455 in thread_start<unsigned int (__cdecl*)(void *),1> minkernel\crts\ucrt\src\appcrt\startup\thread.cpp:97
    #21 0x7ffbe4b5ebde in _asan_default_suppressions__dll+0x122e (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x18004ebde)
    #22 0x7ffd4d717343 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #23 0x7ffd4d8c26b0 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

Thread T3 created by T0 here:
    #0 0x7ffbe4b60897 in _asan_wrap_CreateThread+0x77 (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x180050897)
    #1 0x7ffbe6afd59e in _beginthreadex minkernel\crts\ucrt\src\appcrt\startup\thread.cpp:209
    ...
    #10 0x7ff65bae75ab in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #11 0x7ffd4d717343 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #12 0x7ffd4d8c26b0 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

SUMMARY: AddressSanitizer: heap-use-after-free (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x1800342c3) in _asan_wrap_memmove+0x193
Shadow bytes around the buggy address:
  0x043b59796080: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x043b59796090: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x043b597960a0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x043b597960b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x043b597960c0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x043b597960d0: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fa
  0x043b597960e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043b597960f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043b59796100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043b59796110: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043b59796120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

What happens here is that a DNS resolver thread created by GetAddrInfoExW has not terminated yet, while curl is already shutting down and deinitializing Winsock with WSACleanup(). This leads to an access violation.

After investigating recent changes, we came to the conclusion that there is most likely a bug in the current async Windows DNS resolution code around GetAddrInfoExW, implemented in #12482.
We tried a few things to verify this:

  • Use Curl 8.5.0 or older => works
  • Use Curl 8.6.0 or newer => crashes
  • Use Curl 8.6.0 or newer with c-ares => works

Our test looks as follows:

  • Initialize curl, create 1 thread to process all curl multi requests
  • Create a second thread to push work to the first thread
  • Shut down all the new threads, join them, then deinit curl
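The test sequence above can be modelled with plain POSIX threads. This is a hypothetical sketch, not the reporter's code. `lib_init()` and `lib_cleanup()` stand in for `curl_global_init()` and `curl_global_cleanup()`, and `process_request()` stands in for driving a transfer through the multi interface. The sketch shows the ordering the test relies on: both application threads are joined before global cleanup runs, so any thread still alive at that point is not one the application owns.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for curl_global_init()/curl_global_cleanup(). */
static bool lib_ready = false;
static void lib_init(void)    { lib_ready = true; }
static void lib_cleanup(void) { lib_ready = false; }

#define QUEUE_CAP 16

/* Bounded work queue shared by the producer and the worker thread.
   q_head/q_tail only grow; slots are indexed modulo QUEUE_CAP. */
static int queue[QUEUE_CAP];
static unsigned q_head = 0, q_tail = 0;
static bool shutting_down = false;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;

static int processed = 0;   /* written only by the worker thread */

/* Hypothetical stand-in for driving one transfer with the multi interface. */
static void process_request(int id)
{
  (void)id;
  processed++;
}

/* Thread 1: processes all requests until asked to shut down. */
static void *worker(void *arg)
{
  (void)arg;
  pthread_mutex_lock(&q_lock);
  for(;;) {
    while(q_head == q_tail && !shutting_down)
      pthread_cond_wait(&q_cond, &q_lock);
    if(q_head == q_tail)               /* empty and shutting down: drained */
      break;
    int id = queue[q_head % QUEUE_CAP];
    q_head++;
    pthread_cond_broadcast(&q_cond);   /* a slot is free for the producer */
    pthread_mutex_unlock(&q_lock);
    process_request(id);
    pthread_mutex_lock(&q_lock);
  }
  pthread_mutex_unlock(&q_lock);
  return NULL;
}

/* Thread 2: pushes `count` work items to thread 1. */
static void *producer(void *arg)
{
  int count = *(const int *)arg;
  for(int i = 0; i < count; i++) {
    pthread_mutex_lock(&q_lock);
    while(q_tail - q_head == QUEUE_CAP)
      pthread_cond_wait(&q_cond, &q_lock);
    queue[q_tail % QUEUE_CAP] = i;
    q_tail++;
    pthread_cond_broadcast(&q_cond);
    pthread_mutex_unlock(&q_lock);
  }
  return NULL;
}

/* Init, run both threads, shut them down, join them, then deinit.
   Error handling is omitted for brevity. */
static int run_test(int requests)
{
  pthread_t t_worker, t_producer;

  lib_init();
  pthread_create(&t_worker, NULL, worker, NULL);
  pthread_create(&t_producer, NULL, producer, &requests);

  pthread_join(t_producer, NULL);     /* every request has been queued */
  pthread_mutex_lock(&q_lock);
  shutting_down = true;               /* worker drains the queue, then exits */
  pthread_cond_broadcast(&q_cond);
  pthread_mutex_unlock(&q_lock);
  pthread_join(t_worker, NULL);       /* no application thread is left */

  lib_cleanup();  /* real test: curl_global_cleanup(), which calls WSACleanup() */
  return processed;
}
```

With this ordering, a crash during cleanup points at threads that the application does not own and cannot join.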

We made sure that everything is thread-safe by using proper locking mechanisms, and the issue is reproducible roughly 60% of the time per run. The bug is most likely caused by Curl_resolver_kill() -> Curl_resolver_cancel() -> destroy_async_data(), which contains this code:

curl/lib/asyn-thread.c

Lines 555 to 569 in 0199104

if(!done) {
#ifdef _WIN32
  if(td->complete_ev)
    CloseHandle(td->complete_ev);
  else
#endif
  Curl_thread_destroy(td->thread_hnd);
}
else {
#ifdef _WIN32
  if(td->complete_ev) {
    Curl_GetAddrInfoExCancel(&td->tsd.w8.cancel_ev);
    WaitForSingleObject(td->complete_ev, INFINITE);
    CloseHandle(td->complete_ev);
  }

Closing the completion event handle in the first branch (not done) seems like a bad idea, as the next call to Curl_GetAddrInfoExCancel() might not finish instantly. We would then wait with WaitForSingleObject(td->complete_ev, INFINITE), but this returns immediately with an error because the handle is already closed. This is just a theory, however, and not verified yet.

I expected the following

Curl shutdown / multi close properly stops all running DNS requests and doesn't crash.
Tests for the Windows implementation would also be great.

curl/libcurl version

>= curl 8.6.0

operating system

Windows 10 22H2

@Ch40zz Ch40zz changed the title Windows DNS resolution: Curl crash when GetAddrInfoExW callback invoked Windows DNS resolution: Curl crash when GetAddrInfoExW callback invoked on shutdown Apr 30, 2024
@bagder bagder added crash Windows Windows-specific name lookup DNS and related tech help wanted labels Apr 30, 2024
@bagder
Member

bagder commented Apr 30, 2024

@pps83 any ideas?

@jay
Member

jay commented May 1, 2024

It looks to me like Curl_resolver_kill is supposed to wait for threads to terminate and doesn't do that for the GetAddrInfoExW threads.

curl/lib/asyn-thread.c

Lines 746 to 762 in de7b3e8

/*
 * Until we gain a way to signal the resolver threads to stop early, we must
 * simply wait for them and ignore their results.
 */
void Curl_resolver_kill(struct Curl_easy *data)
{
  struct thread_data *td = data->state.async.tdata;
  /* If we're still resolving, we must wait for the threads to fully clean up,
     unfortunately. Otherwise, we can simply cancel to clean up any resolver
     data. */
  if(td && td->thread_hnd != curl_thread_t_null
     && (data->set.quick_exit != 1L))
    (void)thread_wait_resolv(data, NULL, FALSE);
  else
    Curl_resolver_cancel(data);
}

I am not sure how quick exit works in this circumstance. If we don't wait for thread cleanup in resolver_kill, can our getaddrinfo thread or the Winsock getaddrinfo thread (i.e. the GetAddrInfoExW async DNS thread) call back into Winsock even after the user calls curl_global_cleanup?
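The gap jay describes can be written as a small decision table. This is a simplified, hypothetical model rather than libcurl code: `struct resolver_state`, `kill_action_860()`, `kill_action_proposed()` and the enum values are invented here. The model assumes, as described in this thread, that a GetAddrInfoExW lookup sets `complete_ev` instead of `thread_hnd`. Under that assumption, the 8.6.0 condition never waits for a GetAddrInfoExW lookup. The second function models the approach later proposed in #13518, which cancels the request and then waits, ignoring `quick_exit`.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the state Curl_resolver_kill() sees. */
struct resolver_state {
  bool has_thread;       /* td->thread_hnd != curl_thread_t_null */
  bool has_complete_ev;  /* td->complete_ev set: a GetAddrInfoExW lookup */
  bool quick_exit;       /* data->set.quick_exit == 1L */
};

enum kill_action {
  KILL_WAIT_THREAD,      /* thread_wait_resolv(): wait for the libcurl thread */
  KILL_CANCEL,           /* Curl_resolver_cancel() only, no thread_wait_resolv() */
  KILL_CANCEL_THEN_WAIT  /* cancel the Windows request, then wait for it */
};

/* Mirrors the condition in the Curl_resolver_kill() excerpt above. */
static enum kill_action kill_action_860(const struct resolver_state *s)
{
  if(s->has_thread && !s->quick_exit)
    return KILL_WAIT_THREAD;
  return KILL_CANCEL;
}

/* Models the approach described for #13518: a pending GetAddrInfoExW
   request is cancelled and then waited for, regardless of quick_exit. */
static enum kill_action kill_action_proposed(const struct resolver_state *s)
{
  if(s->has_complete_ev)
    return KILL_CANCEL_THEN_WAIT;
  return kill_action_860(s);
}
```

Take a lookup with `has_complete_ev` set and no thread handle: `kill_action_860()` returns `KILL_CANCEL`, so `Curl_resolver_kill()` itself never waits on the completion event.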

@pps83
Contributor

pps83 commented May 2, 2024

The best way for me to fix this is to reproduce the error. @Ch40zz, is it easy for you to provide a minimal example that reproduces the issue?

Overall, it was hard to understand how the threading/async/DNS logic worked in curl, as some things were quite confusing. Basically, I wrote all that code under the logical assumption that WinAPI creates these threads. Then, in every place where libcurl manages the async data, I needed to add equivalent code for WinAPI.

So, overall, every access to thread_hnd needs a corresponding, equivalent access to complete_ev. Either I didn't do something right, or libcurl had a bug dealing with thread_hnd and the bug got exposed when the equivalent WinAPI code was used.

Here's the commit that was merged: a6bbc87
I wrote that code quite a while ago, so I'll just try to analyze the diff.

If you look at all the places where complete_ev is accessed, you'll see that it's touched in the same places where thread_hnd gets handled:

  • line 549: Curl_thread_destroy(thread_hnd) internally simply calls CloseHandle(thread_hnd); I added CloseHandle(complete_ev).
  • line 558: Curl_thread_join(thread_hnd) waits AND closes the thread handle (this part is confusing). The equivalent code is WaitForSingleObject(complete_ev) and CloseHandle(complete_ev). Unlike regular threading code, the async DNS API has an option to cancel the request via Curl_GetAddrInfoExCancel(w8.cancel_ev), which is called before the "closing the thread" part.
  • line 707: similar to line 558 above, it simply waits and closes complete_ev.

However, as @jay pointed out, it looks like I also needed to handle the complete_ev case where thread_hnd was touched, to make sure thread_wait_resolv was called inside Curl_resolver_kill. This leads me to this PR: #13517

@pps83
Contributor

pps83 commented May 2, 2024

Sorry for the lengthy iterative "debugging". I don't know this code, and I didn't touch curl since that PR was merged.

Perhaps the whole quick_exit part is irrelevant for the WinAPI DNS request, as the cancel request should supposedly cancel the outstanding DNS request without waiting.

@bagder before merging the PR I think a call to cancel the request needs to be added, as it's not logically clear what needs to be added there. What do you think? Here's the alternative impl that ignores quick_exit and cancels the request before waiting: #13518

@Ch40zz can you try these changes with your code to see if the issue gets fixed?

@Ch40zz
Author

Ch40zz commented May 2, 2024

#13518 sadly did not fix the issue yet, I posted more info on the PR.
Needs a bit more investigation, we'll try to provide a repro asap.

@Ch40zz
Author

Ch40zz commented May 6, 2024

Quick update from us after testing for a day:
PR #13518 is correct and fixes a bug in curl. Our initial bug still occurs when building with ASAN, but it is no longer a curl issue: it also occurs when pasting the MSDN sample code for GetAddrInfoExW(). We suspect that either ASAN has a bug (a false positive) or the API itself is buggy and causes a heap-use-after-free. We will report the bug to Microsoft and hope it gets fixed soon.
Until then, people building with ASAN will run into the error on shutdown. We will link the bug in the main issue once it has been created.
Thanks for your help!

EDIT: a repro using either curl or pure WinAPI, which generates the exact same stack trace when compiled with ASAN and MSVC, can be found here: https://gist.github.com/Ch40zz/f3a33139f35fd608d71db5e4085e0bee

EDIT2: Created a bug on MS Developer Community: https://developercommunity.visualstudio.com/t/ASAN:-heap-use-after-free-in-NdrFullPoin/10654169

@jay jay closed this as completed in 428579f May 7, 2024
@jay
Member

jay commented May 7, 2024

Thanks. We'll consider the remaining issue a Windows bug until proven otherwise.

@ioancea

ioancea commented Jul 12, 2024

I'm sorry to revive this closed issue but it seems to be the best place to write about my findings with regard to curl and the GetAddrInfoEx async DNS API.

Starting with curl 8.6.0, our product, which loads at runtime (LoadLibrary) a DLL that has curl statically linked, crashes in places that point to what @Ch40zz reported to MS.
The basic crash flow is as follows (note: make sure to run on Windows 8+ so that GetAddrInfoEx gets used): curl global init --> perform an action (e.g. curl_easy_perform) that results in a DNS timeout --> curl global cleanup. If the above is run multiple times in the same process, the problem is more likely to surface, even without sanitizers such as ASAN.

After digging into the issue, I'm confident the implementation in curl is correct and respects Microsoft's documentation.
However, the actual MS implementation of the async DNS API doesn't appear to be properly synchronised, nor does it expose enough information for the user to be able to say "all the DNS lookup threads are finished, now it's safe to call WSACleanup".

From my tests, I have seen that, after the "completion" callback (LPLOOKUPSERVICE_COMPLETION_ROUTINE) is triggered, the actual DNS lookup thread might not be done with the work - especially when it's "interrupted" with GetAddrInfoExCancel. In these conditions, WSACleanup can be called while there are still WSA threads running, which currently leads to UB.

To me this definitely looks like a buggy MS implementation (if WSACleanup is supposed to wait for all its threads to finish) or an incomplete API (i.e. because the underlying thread can still access WSA things after the completion callback is triggered, the user should have a way to know when it's really done) and makes it impossible (for our use-case) to use the default curl threaded resolver on Windows. From what I have checked, I couldn't find an option to disable the usage of GetAddrInfoEx in order to stick with the old (curl pre 8.6.0) "one thread per lookup".
Do you think it would be a good idea to add such a possibility, i.e. a compile definition to disable the usage of the GetAddrInfoEx API in curl?

@pps83
Contributor

pps83 commented Jul 12, 2024

Did you test with curl 8.8.0 to see if you get the same issue?

@ioancea

ioancea commented Jul 12, 2024

Yes. I've also tested directly without curl, using just the GetAddrInfoEx API and starting from the async example at https://learn.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfoexa#example-code. I also wrote a short comment on https://developercommunity.visualstudio.com/t/ASAN:-heap-use-after-free-in-NdrFullPoin/10654169, which is waiting for moderator approval.

@razvan-pricope

razvan-pricope commented Sep 2, 2024

Hello, I am adding my observations to this issue. As @ioancea found, this seems to be a bug in Microsoft's GetAddrInfoExW / GetAddrInfoExCancel, because you cannot be sure that the async thread has really finished executing user code. I see no mention in the documentation of how to safely cancel and wait for an async GetAddrInfoExW call, and I couldn't find an easy way to do it through manual testing.
The issue still reproduces with curl 8.9.1 and can be made easier to trigger by adding a very long Sleep after SetEvent (https://github.com/curl/curl/blob/curl-8_9_1/lib/asyn-thread.c#L448)
For example changing the code to:

if(td->complete_ev) {
    SetEvent(td->complete_ev); /* Notify caller that the query completed */
    Sleep(10 * 1000); /* test only: keep this thread running curl code */
}

After running the modified curl, you can see that curl does not wait for the spawned thread to complete, and the thread is still running code inside curl. There should be a way to wait for the async operation to complete fully (similar to how joining a thread blocks until nothing is executing on that thread anymore).
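The point of the `Sleep()` experiment can be reproduced portably. In this sketch, POSIX threads stand in for the Windows-owned lookup thread. This is an assumption: no Winsock API is involved, and all names are hypothetical. The worker signals completion, like `SetEvent(td->complete_ev)`, and then keeps running, like the injected `Sleep()`. Waiting for the signal only proves that the signal was sent. Only joining the thread proves that it has stopped executing, and with GetAddrInfoExW, libcurl has no thread handle to join.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static bool completed = false;            /* plays the role of complete_ev */
static atomic_bool still_running = true;  /* set false only at the very end */

static void *lookup_thread(void *arg)
{
  (void)arg;
  pthread_mutex_lock(&lock);
  completed = true;                /* like SetEvent(td->complete_ev) */
  pthread_cond_signal(&done_cv);
  pthread_mutex_unlock(&lock);

  /* like the injected Sleep(): the thread keeps executing after it has
     signalled completion */
  struct timespec delay = { 0, 100L * 1000L * 1000L };   /* 100 ms */
  nanosleep(&delay, NULL);

  atomic_store(&still_running, false);
  return NULL;
}

/* Returns whether the thread was still running right after the completion
   wait returned; *running_after_join gets the same check after joining. */
static bool wait_then_check(bool *running_after_join)
{
  pthread_t t;
  pthread_create(&t, NULL, lookup_thread, NULL);

  pthread_mutex_lock(&lock);
  while(!completed)
    pthread_cond_wait(&done_cv, &lock);   /* like WaitForSingleObject() */
  pthread_mutex_unlock(&lock);
  bool running_after_wait = atomic_load(&still_running);

  pthread_join(t, NULL);                  /* only this proves it has ended */
  *running_after_join = atomic_load(&still_running);
  return running_after_wait;
}
```

In practice the first result is usually `true`, because the worker is still sleeping when the wait returns. `*running_after_join` is always `false`.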

@razvan-pricope

Hello, @bagder, @jay. Sorry for pinging you, but you are the only curl maintainers who replied to this issue.
Is there any possibility of reverting the changes introduced in #12482? The current implementation is flawed and can crash even with the fix introduced in #13509, as I mentioned above. We are using curl as a DLL that's loaded on demand, and we can't know when it's safe to unload the DLL because WSA callbacks could still be running curl code. As a workaround, we internally applied a patch that disables the new resolver on Windows 8+, but that's not going to work in the long run.
If it's not possible to revert, maybe an option to disable this new behavior? What is your take on this?

@jay
Copy link
Member

jay commented Sep 4, 2024

Though I consider this a Windows bug and not a bug in curl I think we should consider reverting GetAddrInfoExW use and go back to the threaded resolver. Thoughts?

@Ch40zz
Author

Ch40zz commented Sep 4, 2024

Though I consider this a Windows bug and not a bug in curl I think we should consider reverting GetAddrInfoExW use and go back to the threaded resolver. Thoughts?

Although this is a Windows bug, I would still expect a library to work correctly, especially if there are alternatives that could be used. In my opinion, it would make sense to add a define to enable the buggy feature for people who explicitly need it; by doing so, they opt in to a well-known bug and accept the risks. The default should never run buggy code for small performance gains. In the future, should MS ever fix the bug, the code could be used again by default after adding an OS version check for the release in which the bug was fully fixed.
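Here is a minimal sketch of that opt-in, assuming a build-time define. Everything in it is hypothetical: curl has no `CURL_OPT_IN_GETADDRINFOEXW` macro, and `pick_resolver()`, `struct os_version` and `FIXED_BUILD` are invented for illustration. No fixed Windows build is known, so `FIXED_BUILD` is a placeholder where 0 means "never". The Windows 8 and impersonation checks follow the conditions listed in the revert commit message in this thread.

```c
#include <stdbool.h>

/* Hypothetical build switch: define it to opt in to GetAddrInfoExW. */
/* #define CURL_OPT_IN_GETADDRINFOEXW 1 */

/* Hypothetical: first Windows build in which the WSACleanup() race is
   fixed. No such build is known, so 0 means "never considered fixed". */
#define FIXED_BUILD 0u

enum resolver_backend {
  RESOLVER_LIBCURL_THREAD,  /* libcurl-owned thread, joinable on cleanup */
  RESOLVER_GETADDRINFOEXW   /* Windows-owned threads, cannot be joined */
};

struct os_version {
  unsigned major, minor, build;
};

/* Windows 8 is version 6.2. */
static bool at_least_win8(struct os_version v)
{
  return v.major > 6 || (v.major == 6 && v.minor >= 2);
}

static enum resolver_backend pick_resolver(struct os_version v,
                                           bool impersonating)
{
  /* Preconditions curl used before the revert: Windows 8 or later and
     no impersonation. */
  if(!at_least_win8(v) || impersonating)
    return RESOLVER_LIBCURL_THREAD;
#if defined(CURL_OPT_IN_GETADDRINFOEXW)
  return RESOLVER_GETADDRINFOEXW;           /* user accepted the known bug */
#else
  if(FIXED_BUILD != 0u && v.build >= FIXED_BUILD)
    return RESOLVER_GETADDRINFOEXW;         /* future: OS known to be fixed */
  return RESOLVER_LIBCURL_THREAD;           /* safe default */
#endif
}
```

Passing the version in as plain numbers keeps the selection logic testable. Real code would obtain the version from a Windows version API.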

@bagder bagder reopened this Sep 4, 2024
@bagder
Member

bagder commented Sep 4, 2024

I think we should consider reverting GetAddrInfoExW use and go back to the threaded resolver

Given that it seems we cannot make it work with the async call, I agree.

it would make sense to add a define to enable the buggy feature for people who explicitly need it

While it could be convenient to some, it also adds a lot of work. At least if we want to make sure that it keeps working.

@pps83
Contributor

pps83 commented Sep 5, 2024

@bagder perhaps this can be configurable? It should be OFF by default but could be enabled with options/flags at runtime. That way it can be tested on updated OSes to see whether it's fixed. It could be a build-time config as well.

The other option: before calling WSACleanup, curl could add some sleep (this obviously isn't a proper fix, but it might give the internal pool time to join its threads before cleanup). Also, as indicated in the test code that triggers the issue, the callback might manually signal at its end that it is finished, so that curl cleanup could make sure we don't call WSACleanup while an async DNS callback is in progress.
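The second idea, signalling at the end of the callback so that cleanup can wait for it, amounts to an in-flight counter. Below is a minimal sketch in which POSIX threads stand in for the Windows thread pool. `lookup_begin()`, `lookup_callback()` and `wait_for_callbacks()` are hypothetical names. `lookup_callback()` has a thread start-routine signature only so the sketch can spawn it directly. The counter proves that the callback reached its last statement, not that the thread that invoked it has returned. It therefore narrows the window described by ioancea and razvan-pricope rather than closing it.

```c
#include <pthread.h>

static pthread_mutex_t inflight_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t inflight_zero = PTHREAD_COND_INITIALIZER;
static int inflight = 0;   /* lookups started whose callback has not finished */
static int results = 0;    /* stands in for work done inside the callback */

/* Call before starting each async lookup. */
static void lookup_begin(void)
{
  pthread_mutex_lock(&inflight_lock);
  inflight++;
  pthread_mutex_unlock(&inflight_lock);
}

/* Completion callback: the decrement is its final action. */
static void *lookup_callback(void *arg)
{
  (void)arg;
  pthread_mutex_lock(&inflight_lock);
  results++;                          /* ... handle the lookup result ... */
  if(--inflight == 0)
    pthread_cond_broadcast(&inflight_zero);
  pthread_mutex_unlock(&inflight_lock);
  /* Code in the calling (Windows-owned) thread may still run after this
     point; the counter cannot see it. */
  return NULL;
}

/* Call right before global cleanup, where WSACleanup() would run. */
static void wait_for_callbacks(void)
{
  pthread_mutex_lock(&inflight_lock);
  while(inflight > 0)
    pthread_cond_wait(&inflight_zero, &inflight_lock);
  pthread_mutex_unlock(&inflight_lock);
}
```

Each lookup calls `lookup_begin()` before it is started, and cleanup calls `wait_for_callbacks()` just before `WSACleanup()`.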

jay added a commit to jay/curl that referenced this issue Sep 5, 2024
- For the threaded resolver backend on Windows, revert back to
  exclusively use the threaded resolver with libcurl-owned threading
  instead of GetAddrInfoExW with Windows-owned threading.

Winsock (the Windows sockets library) has a bug where it does not wait
for all of the name resolver threads it is managing to terminate before
returning from WSACleanup. The threads continue to run and may cause a
crash.

This commit is effectively a revert of several commits that encompass
all GetAddrInfoExW code in libcurl. A manual review of merge conflicts
was used to resolve minor changes that had modified the code for
aesthetic or build reasons in other commits.

Prior to this change if libcurl was built with the threaded resolver
backend for Windows, and Windows 8 or later was the operating system at
runtime, and the caller was not impersonating another user, then libcurl
would use GetAddrInfoExW to handle asynchronous name lookups.

GetAddrInfoExW support was added in a6bbc87, which preceded 8.6.0, and
prior to that the threaded resolver backend used libcurl-owned threading
exclusively on Windows.

Reported-by: Ionuț-Francisc Oancea
Reported-by: Razvan Pricope

Ref: https://developercommunity.visualstudio.com/t/ASAN:-heap-use-after-free-in-NdrFullPoin/10654169

Fixes curl#13509 (comment)
Closes #xxxx

---

Revert "asyn-thread: avoid using GetAddrInfoExW with impersonation"

This reverts commit 0caadc1.

Conflicts:
	lib/system_win32.c

--

Revert "asyn-thread: fix curl_global_cleanup crash in Windows"

This reverts commit 428579f.

--

Revert "system_win32: fix a function pointer assignment warning"

This reverts commit 26f002e.

--

Revert "asyn-thread: use GetAddrInfoExW on >= Windows 8"

This reverts commit a6bbc87.

Conflicts:
	lib/asyn-thread.c
	lib/system_win32.c

--
@jay
Member

jay commented Sep 5, 2024

Sorry but I'm proposing it be removed entirely. #14794.

@pps83
Contributor

pps83 commented Sep 5, 2024

Sorry but I'm proposing it be removed entirely. #14794.

FYI, any code that uses libcurl will fail with ASan because of the excessive number of threads created when there is no internet connection. A negative DNS store would fix the issue, though (that PR wasn't completed for some reason).
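A negative DNS cache remembers recent lookup failures, so repeated requests for a name that just failed can fail fast instead of starting another resolver thread. Below is a minimal sketch, assuming a fixed-size table keyed by host name and a caller-supplied clock. The names, the table size and the 30-second TTL are hypothetical and do not describe the unfinished PR.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

#define NEG_CACHE_SIZE 32
#define NEG_TTL_SECONDS 30
#define HOST_MAX 256

struct neg_entry {
  char host[HOST_MAX];
  time_t expires;          /* 0 means the slot is free */
};

static struct neg_entry neg_cache[NEG_CACHE_SIZE];

/* djb2 string hash to choose a slot; a collision evicts the older entry. */
static size_t slot_for(const char *host)
{
  unsigned long h = 5381;
  for(const unsigned char *p = (const unsigned char *)host; *p; p++)
    h = h * 33 + *p;
  return h % NEG_CACHE_SIZE;
}

/* Record that resolving `host` failed at time `now`. */
static void neg_cache_add(const char *host, time_t now)
{
  struct neg_entry *e = &neg_cache[slot_for(host)];
  strncpy(e->host, host, HOST_MAX - 1);
  e->host[HOST_MAX - 1] = '\0';
  e->expires = now + NEG_TTL_SECONDS;
}

/* True if `host` failed recently, so no new resolver thread is needed. */
static bool neg_cache_hit(const char *host, time_t now)
{
  const struct neg_entry *e = &neg_cache[slot_for(host)];
  return e->expires != 0 && now < e->expires && strcmp(e->host, host) == 0;
}
```

A caller would check `neg_cache_hit(host, time(NULL))` before starting a lookup and call `neg_cache_add()` when a lookup fails. The cache only helps with repeated names.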

@jay
Copy link
Member

jay commented Sep 6, 2024

FYI, any code that uses libcurl will fail with asan because of excessive threads created without internet connection.

Is there an issue for this and can you give me a link. Is it another bug in Winsock or a bug in curl? Does it cause a crash?

@jay jay closed this as completed in eb8ad66 Sep 8, 2024
@pps83
Contributor

pps83 commented Sep 8, 2024

Is there an issue for this and can you give me a link. Is it another bug in Winsock or a bug in curl? Does it cause a crash?

Initially, I patched libcurl to avoid the issue described in #12481. In short, ASan builds that make lots of requests easily run out of RAM because too many threads get created (I'm not talking about concurrent threads here). It takes only 20K requests on x86 to crash with an OOM error (or 200K on x64). I tried it on Windows only, but the issue may also exist on other OSes when running with ASan (the easiest way to test whether it exists is to run the sample program I reported to Microsoft: https://developercommunity.visualstudio.com/t/ASAN-causes-OOM-error-when-many-threads/10508715).

@davidmrdavid

Hi there, I'm from the msvc ASan team, found this thread through the linked bug reports on "developer community". At first glance, I agree with much of the analysis here that it seems something is buggy with the Winsock API (though please note I'm no authority on that). @ioancea, just an FYI that I asked you about your ASan-less reproducer here: https://developercommunity.visualstudio.com/t/ASAN:-heap-use-after-free-in-NdrFullPoin/10654169

Also took a brief peek at your bug report @pps83 (https://developercommunity.visualstudio.com/t/ASAN-causes-OOM-error-when-many-threads/10508715). But don't mean to hijack this thread further so we can continue the conversation in the devcommunity forums. Thanks!
