Windows DNS resolution: Curl crash when GetAddrInfoExW callback invoked on shutdown #13509

Closed
Ch40zz opened this issue Apr 30, 2024 · 7 comments
Labels
crash help wanted name lookup DNS and related tech Windows Windows-specific


@Ch40zz

Ch40zz commented Apr 30, 2024

I did this

We are using libcurl in our C++ project on Windows 10 and noticed strange issues after upgrading curl from 8.3.0 to 8.6.0 or later. After spending some time reproducing the issue, we managed to get an ASAN stack trace of the exact crash:

=================================================================
==41196==ERROR: AddressSanitizer: heap-use-after-free on address 0x12184f8b06cc at pc 0x7ffbe4b442c4 bp 0x00c7a3dfdcf0 sp 0x00c7a3dfd480
READ of size 36 at 0x12184f8b06cc thread T-1
==41196==WARNING: Failed to use and restart external symbolizer!
++++++++++++++++++++++++++++++++++++++++deinit
    #0 0x7ffbe4b442c3 in _asan_wrap_memmove+0x193 (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x1800342c3)
    #1 0x7ffd4bc46936 in NdrFullPointerFree+0xd36 (C:\WINDOWS\System32\RPCRT4.dll+0x180006936)
    #2 0x7ffd4bd1d87c in NdrMesSimpleTypeEncodeAll+0x6e2c (C:\WINDOWS\System32\RPCRT4.dll+0x1800dd87c)
    #3 0x7ffd4bd1aa74 in NdrMesSimpleTypeEncodeAll+0x4024 (C:\WINDOWS\System32\RPCRT4.dll+0x1800daa74)
    #4 0x7ffd4bd1dc6f in NdrClientCall3+0xef (C:\WINDOWS\System32\RPCRT4.dll+0x1800ddc6f)
    #5 0x7ffd4a348fb0 in DnsGetAdaptersInfo+0xe40 (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x180008fb0)
    #6 0x7ffd4a3486bf in DnsGetAdaptersInfo+0x54f (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x1800086bf)
    #7 0x7ffd4a34c616 in DnsQueryEx+0x9b6 (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x18000c616)
    #8 0x7ffd4a34bdc5 in DnsQueryEx+0x165 (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x18000bdc5)
    #9 0x7ffd4a65b162 in WSPStartup+0x1902 (C:\WINDOWS\system32\mswsock.dll+0x18000b162)
    #10 0x7ffd4a65af33 in WSPStartup+0x16d3 (C:\WINDOWS\system32\mswsock.dll+0x18000af33)
    #11 0x7ffd4a65a60d in WSPStartup+0xdad (C:\WINDOWS\system32\mswsock.dll+0x18000a60d)
    #12 0x7ffd4d7c9e79 in WSALookupServiceBeginW+0x1139 (C:\WINDOWS\System32\WS2_32.dll+0x180009e79)
    #13 0x7ffd4d8e0cd8 in RtlDeactivateActivationContext+0x2c8 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180070cd8)
    #14 0x7ffd4d8c31b9 in TpReleaseCleanupGroupMembers+0xad9 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800531b9)
    #15 0x7ffd4d717343 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #16 0x7ffd4d8c26b0 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

0x12184f8b06cc is located 140 bytes inside of 180-byte region [0x12184f8b0640,0x12184f8b06f4)
freed by thread T0 here:
    #0 0x7ffbe4b4f585 in _asan_wrap_RtlFreeHeap+0x4d5 (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x18003f585)
    #1 0x7ffd4a359747 in DnsApiFree+0x37 (C:\WINDOWS\SYSTEM32\DNSAPI.dll+0x180019747)
    #2 0x7ffd4a66cfff in Tcpip6_WSHGetWildcardSockaddr+0xc48f (C:\WINDOWS\system32\mswsock.dll+0x18001cfff)
    #3 0x7ffd4a65f6f6 in Tcpip6_WSHAddressToString+0xe6 (C:\WINDOWS\system32\mswsock.dll+0x18000f6f6)
    #4 0x7ffd4d7c1b41 in Ordinal487+0x1b41 (C:\WINDOWS\System32\WS2_32.dll+0x180001b41)
    #5 0x7ffd4d7d3350 in WSACreateEvent+0xf0 (C:\WINDOWS\System32\WS2_32.dll+0x180013350)
    #6 0x7ffd4d7d022b in WSCEnumProtocols+0x97b (C:\WINDOWS\System32\WS2_32.dll+0x18001022b)
    #7 0x7ffd4d7cfb33 in WSCEnumProtocols+0x283 (C:\WINDOWS\System32\WS2_32.dll+0x18000fb33)
    #8 0x7ffd4d7d03d9 in WahEnumerateHandleContexts+0xf9 (C:\WINDOWS\System32\WS2_32.dll+0x1800103d9)
    #9 0x7ffd4d7d04f5 in WSACleanup+0xe5 (C:\WINDOWS\System32\WS2_32.dll+0x1800104f5)
    #10 0x7ffbe6a5a472 in Curl_win32_cleanup C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\system_win32.c:164
    #11 0x7ffbe6a3eaba in curl_global_cleanup C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\easy.c:291
    ...
    #19 0x7ff65bae75ab in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #20 0x7ffd4d717343 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #21 0x7ffd4d8c26b0 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

previously allocated by thread T3 here:
    #0 0x7ffbe4b4ed29 in _asan_wrap_RtlAllocateHeap+0x179 (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x18003ed29)
    #1 0x7ffd4a65aafe in WSPStartup+0x129e (C:\WINDOWS\system32\mswsock.dll+0x18000aafe)
    #2 0x7ffd4a65a2a7 in WSPStartup+0xa47 (C:\WINDOWS\system32\mswsock.dll+0x18000a2a7)
    #3 0x7ffd4d7c9089 in WSALookupServiceBeginW+0x349 (C:\WINDOWS\System32\WS2_32.dll+0x180009089)
    #4 0x7ffd4d7c8e48 in WSALookupServiceBeginW+0x108 (C:\WINDOWS\System32\WS2_32.dll+0x180008e48)
    #5 0x7ffd4d7c83bf in GetNameInfoW+0x133f (C:\WINDOWS\System32\WS2_32.dll+0x1800083bf)
    #6 0x7ffd4d7c81c8 in GetNameInfoW+0x1148 (C:\WINDOWS\System32\WS2_32.dll+0x1800081c8)
    #7 0x7ffd4d7c6a76 in GetAddrInfoW+0xf56 (C:\WINDOWS\System32\WS2_32.dll+0x180006a76)
    #8 0x7ffd4d7c4390 in GetAddrInfoExW+0x4e0 (C:\WINDOWS\System32\WS2_32.dll+0x180004390)
    #9 0x7ffbe6a4a755 in init_resolve_thread C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\asyn-thread.c:657
    #10 0x7ffbe6a49cd5 in Curl_resolver_getaddrinfo C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\asyn-thread.c:940
    #11 0x7ffbe6a61049 in Curl_resolv C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\hostip.c:814
    #12 0x7ffbe6a551c6 in resolve_fresh C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\url.c:3297
    #13 0x7ffbe6a535b1 in create_conn C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\url.c:3802
    #14 0x7ffbe6a51979 in Curl_connect C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\url.c:3873
    #15 0x7ffbe6a407b8 in multi_runsingle C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\multi.c:2002
    #16 0x7ffbe6a3fc00 in curl_multi_perform C:\vcpkg\buildtrees\curl\src\curl-8_6_0-f17d00bdb6.clean\lib\multi.c:2780
    ...
    #20 0x7ffbe6afd455 in thread_start<unsigned int (__cdecl*)(void *),1> minkernel\crts\ucrt\src\appcrt\startup\thread.cpp:97
    #21 0x7ffbe4b5ebde in _asan_default_suppressions__dll+0x122e (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x18004ebde)
    #22 0x7ffd4d717343 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #23 0x7ffd4d8c26b0 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

Thread T3 created by T0 here:
    #0 0x7ffbe4b60897 in _asan_wrap_CreateThread+0x77 (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x180050897)
    #1 0x7ffbe6afd59e in _beginthreadex minkernel\crts\ucrt\src\appcrt\startup\thread.cpp:209
    ...
    #10 0x7ff65bae75ab in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #11 0x7ffd4d717343 in BaseThreadInitThunk+0x13 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #12 0x7ffd4d8c26b0 in RtlUserThreadStart+0x20 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

SUMMARY: AddressSanitizer: heap-use-after-free (C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\clang_rt.asan_dynamic-x86_64.dll+0x1800342c3) in _asan_wrap_memmove+0x193
Shadow bytes around the buggy address:
  0x043b59796080: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x043b59796090: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x043b597960a0: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x043b597960b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x043b597960c0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
=>0x043b597960d0: fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fa
  0x043b597960e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043b597960f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043b59796100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043b59796110: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x043b59796120: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb

What happens here is that a DNS resolver thread created by GetAddrInfoExW has not terminated yet while curl is already shutting down and deinitializing Winsock with WSACleanup(), which leads to an access violation.
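The failure mode can be pictured with a small single-threaded C model. It is only a sketch under stated assumptions: `struct ws_model`, `lookup_begin()`, `lookup_complete()` and the two cleanup functions are hypothetical names, a flag stands in for the Winsock state that WSACleanup() frees, and "completing" a lookup stands in for the GetAddrInfoExW completion path touching that state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not curl or Winsock code. 'ws_alive' stands in for
   the Winsock-internal data that WSACleanup() frees; 'pending' counts
   async lookups whose completion has not run yet. */
struct ws_model {
  bool ws_alive;      /* true between startup and cleanup */
  int pending;        /* outstanding async lookups */
  int use_after_free; /* completions that ran after cleanup */
};

static void ws_startup(struct ws_model *m)
{
  m->ws_alive = true;
  m->pending = 0;
  m->use_after_free = 0;
}

static void lookup_begin(struct ws_model *m)
{
  assert(m->ws_alive); /* starting a lookup requires Winsock */
  m->pending++;
}

/* The completion touches Winsock state: if cleanup already ran, this is
   the kind of heap-use-after-free ASAN reported. */
static void lookup_complete(struct ws_model *m)
{
  assert(m->pending > 0);
  m->pending--;
  if(!m->ws_alive)
    m->use_after_free++;
}

/* Unsafe: tears down immediately, like calling WSACleanup() while a
   GetAddrInfoExW request is still in flight. */
static void ws_cleanup_unsafe(struct ws_model *m)
{
  m->ws_alive = false;
}

/* Safe: drain every outstanding lookup before tearing down. In real code
   "drain" means cancelling the request and waiting for its completion. */
static void ws_cleanup_safe(struct ws_model *m)
{
  while(m->pending > 0)
    lookup_complete(m);
  m->ws_alive = false;
}
```

Calling `ws_cleanup_unsafe()` with one lookup pending and completing that lookup afterwards bumps `use_after_free` to 1; `ws_cleanup_safe()` drains first and leaves it at 0.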

After investigating recent changes, we concluded that there is most likely a bug in the asynchronous Windows DNS resolution code built around GetAddrInfoExW, which was implemented in #12482.
We tried a few things to verify this:

  • Use Curl 8.5.0 or older => works
  • Use Curl 8.6.0 or newer => crashes
  • Use Curl 8.6.0 or newer with c-ares => works

Our test looks as follows:

  • Initialize curl, create 1 thread to process all curl multi requests
  • Create a second thread to push work to the first thread
  • Shut down all the new threads, join them, then deinit curl

We made sure that everything is thread safe by using proper locking. The issue reproduces in roughly 60% of runs. The bug is most likely caused by the call chain Curl_resolver_kill -> Curl_resolver_cancel() -> destroy_async_data(), which contains this code:

curl/lib/asyn-thread.c

Lines 555 to 569 in 0199104

  if(!done) {
#ifdef _WIN32
    if(td->complete_ev)
      CloseHandle(td->complete_ev);
    else
#endif
    Curl_thread_destroy(td->thread_hnd);
  }
  else {
#ifdef _WIN32
    if(td->complete_ev) {
      Curl_GetAddrInfoExCancel(&td->tsd.w8.cancel_ev);
      WaitForSingleObject(td->complete_ev, INFINITE);
      CloseHandle(td->complete_ev);
    }

Closing the completion event handle in the first branch (not done) seems like a bad idea, because a subsequent call to Curl_GetAddrInfoExCancel() might not finish instantly. We would then wait with WaitForSingleObject(td->complete_ev, INFINITE), but that call returns immediately with an error because the handle is already closed. This is just a theory, however, and not verified yet.
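The theory can be illustrated with a toy model of event-handle semantics. This is a hypothetical sketch, not the Win32 API: `struct event_model`, `event_wait()` and `request_cancel()` are invented for illustration, `event_wait()` stands in for `WaitForSingleObject(h, INFINITE)` and reports "would block" instead of actually blocking, and the model assumes that cancelling a request causes its completion to be signaled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a Win32 event handle; not the real API. */
struct event_model {
  bool open;     /* becomes false after the CloseHandle() equivalent */
  bool signaled; /* set when the lookup's completion fires */
};

enum wait_result {
  MODEL_WAIT_SIGNALED,    /* completion observed: safe to tear down */
  MODEL_WAIT_WOULD_BLOCK, /* still running: an INFINITE wait would block */
  MODEL_WAIT_FAILED       /* handle already closed: returns at once */
};

static void event_close(struct event_model *e)
{
  e->open = false;
}

/* Single-threaded stand-in for WaitForSingleObject(h, INFINITE). */
static enum wait_result event_wait(const struct event_model *e)
{
  if(!e->open)
    return MODEL_WAIT_FAILED;
  return e->signaled ? MODEL_WAIT_SIGNALED : MODEL_WAIT_WOULD_BLOCK;
}

/* Assumption of this model: cancelling makes the request complete,
   which signals the event (if it is still open). */
static void request_cancel(struct event_model *e)
{
  if(e->open)
    e->signaled = true;
}
```

Closing first makes the wait fail immediately instead of waiting for the lookup, whereas the cancel, wait, close order observes the completion before the handle goes away.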

I expected the following

Curl shutdown / multi close properly stops all running DNS requests and doesn't crash.
Tests for the Windows implementation would also be great.

curl/libcurl version

>= curl 8.6.0

operating system

Windows 10 22H2

@Ch40zz Ch40zz changed the title Windows DNS resolution: Curl crash when GetAddrInfoExW callback invoked Windows DNS resolution: Curl crash when GetAddrInfoExW callback invoked on shutdown Apr 30, 2024
@bagder bagder added crash Windows Windows-specific name lookup DNS and related tech help wanted labels Apr 30, 2024
@bagder
Member

bagder commented Apr 30, 2024

@pps83 any ideas?

@jay
Member

jay commented May 1, 2024

It looks to me like Curl_resolver_kill is supposed to wait for threads to terminate and doesn't do that for the GetAddrInfoExW threads.

curl/lib/asyn-thread.c

Lines 746 to 762 in de7b3e8

/*
 * Until we gain a way to signal the resolver threads to stop early, we must
 * simply wait for them and ignore their results.
 */
void Curl_resolver_kill(struct Curl_easy *data)
{
  struct thread_data *td = data->state.async.tdata;
  /* If we're still resolving, we must wait for the threads to fully clean up,
     unfortunately. Otherwise, we can simply cancel to clean up any resolver
     data. */
  if(td && td->thread_hnd != curl_thread_t_null
     && (data->set.quick_exit != 1L))
    (void)thread_wait_resolv(data, NULL, FALSE);
  else
    Curl_resolver_cancel(data);
}

I am not sure how quick exit works in this circumstance. If we don't wait for thread cleanup in Curl_resolver_kill, can our getaddrinfo thread or the Winsock getaddrinfo thread (i.e., the GetAddrInfoExW async DNS thread) call back into Winsock even after the user calls curl_global_cleanup?
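The observation can be reduced to the decision Curl_resolver_kill() makes. The sketch below is a hypothetical model, not curl code: `struct td_model` keeps only two booleans, and it assumes (as the discussion suggests) that on the GetAddrInfoExW path `thread_hnd` stays null while `complete_ev` is set. `kill_extended()` illustrates the direction discussed in this thread; it is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of the resolver state; only the fields the
   decision needs. Not curl's real struct thread_data. */
struct td_model {
  bool has_thread_hnd;  /* td->thread_hnd != curl_thread_t_null */
  bool has_complete_ev; /* td->complete_ev set (GetAddrInfoExW path) */
};

enum kill_action { KILL_WAIT, KILL_CANCEL };

/* Mirrors the condition in the quoted Curl_resolver_kill(). */
static enum kill_action kill_original(const struct td_model *td,
                                      bool quick_exit)
{
  if(td && td->has_thread_hnd && !quick_exit)
    return KILL_WAIT;
  return KILL_CANCEL;
}

/* Sketch: treat an outstanding GetAddrInfoExW request like a running
   resolver thread when deciding whether to wait. */
static enum kill_action kill_extended(const struct td_model *td,
                                      bool quick_exit)
{
  if(td && (td->has_thread_hnd || td->has_complete_ev) && !quick_exit)
    return KILL_WAIT;
  return KILL_CANCEL;
}
```

With only `complete_ev` set, the original condition falls through to the cancel path and never waits, so the Winsock-owned lookup can outlive the easy handle. Note that `kill_extended()` still skips waiting when `quick_exit` is set, which is exactly the open question above.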

@pps83
Contributor

pps83 commented May 2, 2024

The best way for me to fix this is to reproduce the error. @Ch40zz, is it easy for you to provide a minimal example that reproduces the issue?

Overall, it was hard to understand how the threading/async/DNS logic worked in curl, as some things were quite confusing. Basically, I wrote all that code under the assumption that WinAPI creates these threads. Then, everywhere libcurl manages the async data, I needed to add equivalent code for WinAPI.

So, overall, every access to thread_hnd needs a corresponding, equivalent access to complete_ev. Either I got something wrong, or libcurl had a bug in how it handled thread_hnd and that bug was exposed once the equivalent WinAPI code was used.

Here's the commit that was merged: a6bbc87
I wrote that code quite a while ago, so I'll just try to analyze the diff.

If you look at all the places where complete_ev is accessed, you'll see that it's touched in the same places where thread_hnd gets handled:

  • line 549: Curl_thread_destroy(thread_hnd); internally just calls CloseHandle(thread_hnd); I added CloseHandle(complete_ev);
  • line 558: Curl_thread_join(thread_hnd) waits AND closes the thread handle (this part is confusing). The equivalent code is WaitForSingleObject(complete_ev) and CloseHandle(complete_ev). Unlike regular threading code, the async DNS API has an option to cancel the request via Curl_GetAddrInfoExCancel(w8.cancel_ev), which is called before the "closing the thread" part.
  • line 707: similar to line 558 above, it simply waits for and closes complete_ev.
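The correspondence in the list above can be written down as a small model that logs which steps each teardown path performs. The names (`enum backend`, `struct step_log`, `teardown_detach()`, `teardown_join()`) are hypothetical; the logged strings only label the real calls named in the comments.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: record which low-level steps each teardown path
   performs, so the thread and GetAddrInfoExW paths can be compared. */
enum backend { BACKEND_THREAD, BACKEND_ASYNC_EVENT };

struct step_log {
  char buf[64]; /* comma-separated step names */
};

static void log_step(struct step_log *log, const char *step)
{
  if(log->buf[0])
    strncat(log->buf, ",", sizeof(log->buf) - strlen(log->buf) - 1);
  strncat(log->buf, step, sizeof(log->buf) - strlen(log->buf) - 1);
}

/* "Not done" path: release the handle without waiting. */
static void teardown_detach(enum backend b, struct step_log *log)
{
  if(b == BACKEND_THREAD)
    log_step(log, "thread_destroy"); /* CloseHandle(thread_hnd) */
  else
    log_step(log, "close_event");    /* CloseHandle(complete_ev) */
}

/* "Done" path: make sure the work has finished, then release. */
static void teardown_join(enum backend b, struct step_log *log)
{
  if(b == BACKEND_THREAD) {
    log_step(log, "thread_join");    /* Curl_thread_join(): wait + close */
  }
  else {
    log_step(log, "cancel");         /* Curl_GetAddrInfoExCancel() */
    log_step(log, "wait_event");     /* WaitForSingleObject(complete_ev) */
    log_step(log, "close_event");    /* CloseHandle(complete_ev) */
  }
}
```

Comparing the logs makes the gap easy to spot: wherever curl checks only `thread_hnd` before deciding whether to join, the async-event backend needs the equivalent `complete_ev` check, or its cancel/wait/close sequence never runs.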

However, as @jay pointed out, it looks like I also needed to handle the complete_ev case wherever thread_hnd was checked, to make sure thread_wait_resolv is called inside Curl_resolver_kill. This leads me to this PR: #13517

@pps83
Contributor

pps83 commented May 2, 2024

Sorry for the lengthy iterative "debugging". I don't know this code, and I didn't touch curl since that PR was merged.

Perhaps the whole quick_exit part is irrelevant for the WinAPI DNS request, since the cancel call should abort the outstanding DNS request without waiting.

@bagder, before merging the PR I think a call to cancel the request needs to be added, although it's not entirely clear what logically belongs there. What do you think? Here's the alternative implementation that ignores quick_exit and cancels the request before waiting: #13518

@Ch40zz can you try these changes with your code to see if the issue gets fixed?

@Ch40zz
Author

Ch40zz commented May 2, 2024

#13518 sadly did not fix the issue yet; I posted more info on the PR.
It needs a bit more investigation; we'll try to provide a repro ASAP.

@Ch40zz
Author

Ch40zz commented May 6, 2024

Quick update from us after testing for a day:
PR #13518 is correct and fixes a bug in curl. Our initial bug still occurs when building with ASAN, but it is no longer a curl issue: it also occurs with the MSDN sample code for GetAddrInfoExW() pasted in as-is. We suspect that either ASAN has a bug (a false positive) or the API itself is buggy and causes a heap-use-after-free. We will report the bug to Microsoft and hope it gets fixed soon.
Until then, people building with ASAN will run into the error on shutdown. We will link the bug report in this issue once it has been created.
Thanks for your help!

EDIT: repro with either curl or pure WinAPI generating the exact same stack trace when compiled with ASAN and MSVC can be found here: https://gist.github.com/Ch40zz/f3a33139f35fd608d71db5e4085e0bee

EDIT2: Created a bug on MS Developer Community: https://developercommunity.visualstudio.com/t/ASAN:-heap-use-after-free-in-NdrFullPoin/10654169

@jay jay closed this as completed in 428579f May 7, 2024
@jay
Member

jay commented May 7, 2024

Thanks. We'll consider the remaining issue a Windows bug until proven otherwise.

4 participants