Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ares_gethostbyname Appears to Leak from an 'ares_strdup' at the Conclusion of a Successful Query #439

Closed
gerickson opened this issue Nov 9, 2021 · 19 comments

Comments

@gerickson
Copy link

gerickson commented Nov 9, 2021

When invoking ares_gethostbyname for a forward DNS name-to-address lookup, there seems to be a leak from ares_strdup, as reported by the clang/LLVM address sanitizer:

Making check in examples
Making check in CFHost
make  CFHostExample
  CCLD     CFHostExample
/bin/bash ../../libtool --mode execute ./CFHostExample
Asynchronous lookups...
By name 'localhost' (Forward DNS)...
    Resolved names:
        0: localhost
    Resolved addresses:
        0: ::1
        1: 127.0.0.1

=================================================================
==1551143==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 10 byte(s) in 1 object(s) allocated from:
    #0 0x7fac81a14bc8 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
    #1 0x7fac81325487 in ares_strdup (/usr/local/lib/libcares.so.2+0x10487)

This was/is demonstrated against c-ares-1.18.1 using https://github.com/gerickson/opencfnetwork/blob/user/gerickson/github-issue-8-forward-dns-deadlock/third_party/CFNetwork/repo/Host/CFHost.c#L2770 along with the example application https://github.com/gerickson/opencfnetwork/blob/user/gerickson/github-issue-8-forward-dns-deadlock/examples/CFHost/CFHostExample.c.

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

ares_getnameinfo() is reverse not forward. Are you sure you're looking at the right path?

ares_getaddrinfo() is the forward query (or ares_gethostbyname(), which these days internally calls ares_getaddrinfo()).

Our test suite does run through Leak Sanitizer, and it's not showing anything:
https://cirrus-ci.com/task/4544613408047104

Can you provide the leak sanitizer output with debug symbols and the full trace?

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

Also, I'm assuming you mean 1.18.1 not 1.8.1 (as 1.8.1 never existed).

And hquery->name is free'd immediately before hquery in the 2 places where hquery itself is freed. So I really don't see any way that only hquery->name could be left hanging around, but you didn't provide the full leak output so maybe there is more relevant information in that.

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

I'm only seeing references to ares_gethostbyname() and not ares_getaddrinfo() in the code, but it looks like you convert the ares_gethostbyname() result into an addrinfo result. You might consider using ares_getaddrinfo(), it was added in the last couple of years.

@gvanem
Copy link
Contributor

gvanem commented Nov 9, 2021

Our test suite does run through Leak Sanitizer, and it's not showing anything:

Using Visual Leak Detector on Windows/MSVC, I see 7 leaks in ares-test.exe only.
None from any src/lib/*.c files. Here is the VLD report.

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

Yeah, those look to be leaks inside of the test framework gmock itself as far as I can tell. I guess specific to windows since we don't see them on Linux.

@gvanem
Copy link
Contributor

gvanem commented Nov 9, 2021

The comment in gmock-test-all.cc:

// Use the RAII idiom to flag mem allocs that are intentionally never
// deallocated. The motivation is to silence the false positive mem leaks
// that are reported by the debug version of MS's CRT which can only detect
// if an alloc is missing a matching deallocation.

is for some other leaks AFAICS. I used Release mode with VLD.

@gerickson gerickson changed the title ares_getnameinfo Appears to Leak 'hquery->name' at the Conclusion of a Successful Query ares_getnameinfo Appears to Leak from an 'ares_strdup' at the Conclusion of a Successful Query Nov 9, 2021
@gerickson
Copy link
Author

I'm only seeing references to ares_gethostbyname() and not ares_getaddrinfo() in the code, but it looks like you convert the ares_gethostbyname() result into an addrinfo result. You might consider using ares_getaddrinfo(), it was added in the last couple of years.

Confirmed; however, I was trying to ensure package compatibility with Ubuntu 20.04 which offers c-ares-1.15.0 which lacks ares_getaddrinfo.

@gerickson
Copy link
Author

The attached test program also demonstrates the issue and zeros in with valgrind more closely than with the clang/LLVM address sanitizer:

% clang++-11 -Werror -g -O0 -I/usr/local/include ares-test.cpp -L/usr/local/lib -lcares -o ares-test && ./ares-test localhost       
--> main argc 2 argv 0x7ffce792d4a8
argc 2 argv 0x7ffce792d4a8 optind 1
C-Ares Version: 1.18.1
Looking up 'localhost' w/ DNS...
--> hostby_completed_cb arg 0xca4eb0 status 0 timeouts 0 hostent 0xcb7560
hostname: localhost
127.0.0.1
<-- hostby_completed_cb 
--> hostby_completed_cb arg 0xca4eb0 status 0 timeouts 0 hostent 0xcb75f0
hostname: localhost
::1
<-- hostby_completed_cb 
ares_fds: nfds 0
ares_fds: nfds 0
<-- main 

% valgrind --leak-check=full ./ares-test localhost
==1601959== Memcheck, a memory error detector
==1601959== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1601959== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1601959== Command: ./ares-test localhost
==1601959== 
--> main argc 2 argv 0x1ffefffdf8
argc 2 argv 0x1ffefffdf8 optind 1
C-Ares Version: 1.18.1
Looking up 'localhost' w/ DNS...
--> hostby_completed_cb arg 0x4dcec80 status 0 timeouts 0 hostent 0x4de4fc0
hostname: localhost
127.0.0.1
<-- hostby_completed_cb 
--> hostby_completed_cb arg 0x4dcec80 status 0 timeouts 0 hostent 0x4de6910
hostname: localhost
::1
<-- hostby_completed_cb 
ares_fds: nfds 0
ares_fds: nfds 0
<-- main 
==1601959== 
==1601959== HEAP SUMMARY:
==1601959==     in use at exit: 10 bytes in 1 blocks
==1601959==   total heap usage: 51 allocs, 50 frees, 167,136 bytes allocated
==1601959== 
==1601959== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1601959==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1601959==    by 0x4872487: ares_strdup (in /usr/local/lib/libcares.so.2.5.1)
==1601959==    by 0x4865F37: ares__readaddrinfo (in /usr/local/lib/libcares.so.2.5.1)
==1601959==    by 0x4868065: next_lookup (in /usr/local/lib/libcares.so.2.5.1)
==1601959==    by 0x48689BC: ares_getaddrinfo (in /usr/local/lib/libcares.so.2.5.1)
==1601959==    by 0x486954A: ares_gethostbyname (in /usr/local/lib/libcares.so.2.5.1)
==1601959==    by 0x40163E: main (ares-test.cpp:281)
==1601959== 
==1601959== LEAK SUMMARY:
==1601959==    definitely lost: 10 bytes in 1 blocks
==1601959==    indirectly lost: 0 bytes in 0 blocks
==1601959==      possibly lost: 0 bytes in 0 blocks
==1601959==    still reachable: 0 bytes in 0 blocks
==1601959==         suppressed: 0 bytes in 0 blocks
==1601959== 
==1601959== For lists of detected and suppressed errors, rerun with: -s
==1601959== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

@gerickson gerickson changed the title ares_getnameinfo Appears to Leak from an 'ares_strdup' at the Conclusion of a Successful Query ares_gethostbyname Appears to Leak from an 'ares_strdup' at the Conclusion of a Successful Query Nov 9, 2021
@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

It looks like your test case needs MacOS. But on my MacOS instance, leak sanitizer doesn't work with the XCode compiler, and valgrind says it requires linux when trying to install via homebrew. I don't suppose you have a test case that will work on Linux do you?

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

Actually, installing llvm via homebrew and using that compiler does allow -fsanitize=leak ... but I'm getting no output indicating a leak when running your test.

I found this https://github.com/LouisBrunner/valgrind-macos so I'll try to use that valgrind package once its done compiling.

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

Nope, looks like that valgrind doesn't work on MacOS 11.6 which is what I have access to right now, valgrind itself throws an internal assert.

Can you compile c-ares with debug symbols so hopefully valgrind will give a line number within c-ares for the leak?

The leak you seem to be reporting is actually from reading /etc/hosts when looking up 'localhost'. Do you also get a leak with the same utility if you try looking up a remote host? If not, can you provide your /etc/hosts file? It may be something specific inside of that causing the leak.

@gerickson
Copy link
Author

gerickson commented Nov 9, 2021

It looks like your test case needs MacOS. But on my MacOS instance, leak sanitizer doesn't work with the XCode compiler, and valgrind says it requires linux when trying to install via homebrew. I don't suppose you have a test case that will work on Linux do you?

All of the above was generated on Ubuntu 20.04:

% lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.3 LTS
Release:	20.04
Codename:	focal

% uname -a
Linux ubuntu 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

The clang/LLVM and valgrind package are:

% dpkg -l | grep clang-11
ii  clang-11                                   1:11.1.0~++20210717113923+1fdec59bffc1-1~exp1~20210717214604.165 amd64        C, C++ and Objective-C compiler

% dpkg -l | grep valgrind
ii  valgrind                                   1:3.15.0-1ubuntu9.1                                              amd64        instrumentation framework for building dynamic analysis tools

@gerickson
Copy link
Author

Nope, looks like that valgrind doesn't work on MacOS 11.6 which is what I have access to right now, valgrind itself throws an internal assert.

Can you compile c-ares with debug symbols so hopefully valgrind will give a line number within c-ares for the leak?

The leak you seem to be reporting is actually from reading /etc/hosts when looking up 'localhost'. Do you also get a leak with the same utility if you try looking up a remote host? If not, can you provide your /etc/hosts file? It may be something specific inside of that causing the leak.

There are no leaks with an external, DNS-resolved host:

% valgrind --leak-check=full ./ares-test dns.google.com
==3849== Memcheck, a memory error detector
==3849== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3849== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==3849== Command: ./ares-test dns.google.com
==3849== 
--> main argc 2 argv 0x1ffefffeb8
argc 2 argv 0x1ffefffeb8 optind 1
C-Ares Version: 1.18.1
Looking up 'dns.google.com' w/ DNS...
--> sock_state_cb data 0x1ffefffd90 socket_fd 3 readable 1 writable 0
<-- sock_state_cb 
ares_getsock: socket 3 is readable
ares_fds: nfds 4
ares_fds: socket 3 is readable
ares_fds: nfds 4
tvp 4.989000
count 1
--> hostby_completed_cb arg 0x4dcec80 status 0 timeouts 0 hostent 0x4de6c20
hostname: dns.google.com
2001:4860:4860::8888
2001:4860:4860::8844
<-- hostby_completed_cb 
--> hostby_completed_cb arg 0x4dcec80 status 0 timeouts 0 hostent 0x4de7190
hostname: dns.google.com
8.8.4.4
8.8.8.8
<-- hostby_completed_cb 
--> sock_state_cb data 0x1ffefffd90 socket_fd 3 readable 0 writable 0
<-- sock_state_cb 
ares_fds: nfds 0
<-- main 
==3849== 
==3849== HEAP SUMMARY:
==3849==     in use at exit: 0 bytes in 0 blocks
==3849==   total heap usage: 70 allocs, 70 frees, 168,068 bytes allocated
==3849== 
==3849== All heap blocks were freed -- no leaks are possible
==3849== 
==3849== For lists of detected and suppressed errors, rerun with: -s
==3849== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The contents of /etc/hosts are:

% more /etc/hosts
127.0.0.1   localhost
127.0.1.1   ubuntu

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

@gerickson
Copy link
Author

As requested, here is a valgrind trace with debug symbols enabled in libcares:

% valgrind --leak-check=full ./ares-test localhost
==23991== Memcheck, a memory error detector
==23991== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==23991== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==23991== Command: ./ares-test localhost
==23991== 
--> main argc 2 argv 0x1ffefffec8
argc 2 argv 0x1ffefffec8 optind 1
C-Ares Version: 1.18.1
Looking up 'localhost' w/ DNS...
--> hostby_completed_cb arg 0x4dd5c80 status 0 timeouts 0 hostent 0x4debfc0
hostname: localhost
127.0.0.1
<-- hostby_completed_cb 
--> hostby_completed_cb arg 0x4dd5c80 status 0 timeouts 0 hostent 0x4ded910
hostname: localhost
::1
<-- hostby_completed_cb 
ares_fds: nfds 0
ares_fds: nfds 0
<-- main 
==23991== 
==23991== HEAP SUMMARY:
==23991==     in use at exit: 10 bytes in 1 blocks
==23991==   total heap usage: 51 allocs, 50 frees, 167,136 bytes allocated
==23991== 
==23991== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1
==23991==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23991==    by 0x48710C4: default_malloc (ares_library_init.c:48)
==23991==    by 0x4879C8E: ares_strdup (ares_strdup.c:33)
==23991==    by 0x4867739: ares__readaddrinfo (ares__readaddrinfo.c:62)
==23991==    by 0x486AE7E: file_lookup (ares_getaddrinfo.c:477)
==23991==    by 0x486AF78: next_lookup (ares_getaddrinfo.c:513)
==23991==    by 0x486B529: ares_getaddrinfo (ares_getaddrinfo.c:698)
==23991==    by 0x486C4F0: ares_gethostbyname (ares_gethostbyname.c:113)
==23991==    by 0x40163E: main (ares-test.cpp:281)
==23991== 
==23991== LEAK SUMMARY:
==23991==    definitely lost: 10 bytes in 1 blocks
==23991==    indirectly lost: 0 bytes in 0 blocks
==23991==      possibly lost: 0 bytes in 0 blocks
==23991==    still reachable: 0 bytes in 0 blocks
==23991==         suppressed: 0 bytes in 0 blocks
==23991== 
==23991== For lists of detected and suppressed errors, rerun with: -s
==23991== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

On your test case, I see these 2 includes that only exist on MacOS for me ...

#include <AssertMacros.h>

#include <CoreFoundation/CoreFoundation.h>

What do I need to get them on Linux?

@gerickson
Copy link
Author

On your test case, I see these 2 includes that only exist on MacOS for me ...

#include <AssertMacros.h>

#include <CoreFoundation/CoreFoundation.h>

What do I need to get them on Linux?

You can just delete the #include <CoreFoundation/CoreFoundation.h>; it's unneeded. For #include <AssertMacros.h>, those are provided by opencflite.

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

Actually, I think I know what is going on here based on your /etc/hosts file. Your code is doing 2 gethostbyname lookups, one for ipv4 and one for ipv6, and the ipv6 one doesn't have an entry for localhost, but it still succeeds due to the required fallback from RFC6761 Section 6.3. But that fallback is overwriting stuff that is already set. Let me push up a change and see if it fixes it for you.

@bradh352
Copy link
Member

bradh352 commented Nov 9, 2021

The /etc/hosts/ file on my debian11 vm has:

127.0.0.1	localhost
127.0.1.1	debian11

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

So I'm assuming most hosts out there are similar where ::1 has a localhost alias which is why this issue didn't come up in our test cases. Please test c25d4eb

@gerickson
Copy link
Author

@bradh352, looks great! Confirmed on the closure. Thanks for the fast turn and the collaboration on debugging this to root cause.

sergepetrenko pushed a commit to tarantool/c-ares that referenced this issue Jul 29, 2022
When an /etc/hosts lookup is performed, but fails with ENOTFOUND, and
a valid RFC6761 Section 6.3 fallback is performed, it could overwrite
variables that were already set and therefore leave the pointers
dangling, never to be cleaned up.

Clean up explicitly on ENOTFOUND when returning from the file parser.

Fixes: c-ares#439
Fix By: Brad House (@bradh352)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants