Attempt to limit CNAME loops #551

Draft
pemensik wants to merge 3 commits into master from 501-cname-loops

Conversation

pemensik
Member

Limit the number of chained CNAMEs. Do not allow querying a CNAME key that is already present in the CNAME chain.

Fixes #501
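A rough sketch of the idea (this is not the actual diff; the helper name and the chain pointer are invented for illustration, while AvahiSRBLookup, AvahiKey/avahi_key_equal and AVAHI_LOOKUPS_PER_BROWSER_MAX come from the code this PR touches):

```c
/* Illustrative only: walk an assumed chain of CNAME lookups starting at the
 * current lookup, refuse to follow a key that is already in the chain, and
 * cap the chain length. The cname_chain pointer is hypothetical. */
static int lookup_cname_would_loop(AvahiSRBLookup *l, AvahiKey *new_key) {
    unsigned count = 0;

    for (; l; l = l->cname_chain) {
        if (avahi_key_equal(l->key, new_key))
            return 1;   /* this key was already queried in the chain */
        count++;
    }

    return count >= AVAHI_LOOKUPS_PER_BROWSER_MAX;   /* chain is too long */
}
```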

}
return (count >= AVAHI_LOOKUPS_PER_BROWSER_MAX);
}

static void lookup_handle_cname(AvahiSRBLookup *l, AvahiIfIndex interface, AvahiProtocol protocol, AvahiLookupFlags flags, AvahiRecord *r) {
Member

This part of the code isn't covered by any tests at the moment, so it should certainly come with some tests. Then again it's kind of tricky: it involves a lot of reference counting, and https://github.com/evverx/avahi-private/issues/3 is related to CNAMEs too. I still think #501 can be moved to the next milestone where it can be fixed without any rush.

Member Author

Added a slightly polished version of the reproducer you provided. It seems we lack a nicer way to publish CNAME records. This seems simple enough to trigger crashes remotely, so I would like to get it fixed.

Member

It can easily be triggered remotely by sending unsolicited CNAMEs, and it should of course be fixed, but it's kind of tricky.
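For reference, the unsolicited response needed to hit this code path is tiny. A hand-built packet could look roughly like this (a sketch; the name cname0.local is just an example, and the UDP send to 224.0.0.251:5353 is omitted):

```c
/* Minimal mDNS response carrying a single self-referential CNAME answer.
 * The RDATA is a DNS compression pointer back to the owner name at offset 12,
 * so the record aliases "cname0.local" to itself. */
static const unsigned char self_cname_response[] = {
    0x00, 0x00,             /* ID, always 0 in mDNS */
    0x84, 0x00,             /* flags: response + authoritative answer */
    0x00, 0x00,             /* QDCOUNT: 0 questions (unsolicited) */
    0x00, 0x01,             /* ANCOUNT: 1 answer */
    0x00, 0x00, 0x00, 0x00, /* NSCOUNT, ARCOUNT */
    /* owner name "cname0.local" starts at offset 12 */
    0x06, 'c', 'n', 'a', 'm', 'e', '0',
    0x05, 'l', 'o', 'c', 'a', 'l', 0x00,
    0x00, 0x05,             /* TYPE: CNAME */
    0x00, 0x01,             /* CLASS: IN */
    0x00, 0x00, 0x00, 0x0a, /* TTL: 10 seconds */
    0x00, 0x02,             /* RDLENGTH */
    0xc0, 0x0c              /* RDATA: pointer to offset 12, i.e. to itself */
};
```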

Valgrind reported three use-after-frees and a memory leak as far as I can see:

2024-01-27T10:30:13.0372291Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225== Invalid read of size 8
2024-01-27T10:30:13.0372803Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x4888595: avahi_interface_monitor_get_hw_interface (iface.c:554)
2024-01-27T10:30:13.0373286Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48884D3: avahi_interface_monitor_get_interface (iface.c:540)
2024-01-27T10:30:13.0373734Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48890DC: avahi_interface_monitor_walk (iface.c:743)
2024-01-27T10:30:13.0374204Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B2C5B: avahi_querier_remove_for_all (querier.c:186)
2024-01-27T10:30:13.0374708Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B1840: lookup_stop (multicast-lookup.c:132)
2024-01-27T10:30:13.0375158Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B18C3: lookup_destroy (multicast-lookup.c:146)
2024-01-27T10:30:13.0375684Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B24C7: avahi_multicast_lookup_engine_free (multicast-lookup.c:345)
2024-01-27T10:30:13.0376097Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488EABD: avahi_server_free (server.c:1564)
2024-01-27T10:30:13.0376476Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1132C1: run_server (main.c:1312)
2024-01-27T10:30:13.0376824Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
2024-01-27T10:30:13.0377382Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==  Address 0x4f02628 is 8 bytes inside a block of size 64 free'd
2024-01-27T10:30:13.0377957Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
2024-01-27T10:30:13.0378342Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486D010: avahi_free (malloc.c:138)
2024-01-27T10:30:13.0378801Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488841E: avahi_interface_monitor_free (iface.c:528)
2024-01-27T10:30:13.0379212Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488EA16: avahi_server_free (server.c:1551)
2024-01-27T10:30:13.0379585Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1132C1: run_server (main.c:1312)
2024-01-27T10:30:13.0379940Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
2024-01-27T10:30:13.0380236Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==  Block was alloc'd at
2024-01-27T10:30:13.0380808Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
2024-01-27T10:30:13.0381176Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486CE69: xcalloc (malloc.c:95)
2024-01-27T10:30:13.0381568Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486CF41: avahi_malloc0 (malloc.c:120)
2024-01-27T10:30:13.0382061Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x4886640: avahi_new0_internal (malloc.h:59)
2024-01-27T10:30:13.0382502Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48882BB: avahi_interface_monitor_new (iface.c:495)
2024-01-27T10:30:13.0382907Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488E83B: avahi_server_new (server.c:1515)
2024-01-27T10:30:13.0383278Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1131AA: run_server (main.c:1265)
2024-01-27T10:30:13.0383624Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
2024-01-27T10:30:13.0383835Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==
2024-01-27T10:30:13.0384138Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225== Invalid read of size 8
2024-01-27T10:30:13.0384582Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x48AE0C5: entry_get (hashmap.c:59)
2024-01-27T10:30:13.0385019Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48AE5E9: avahi_hashmap_lookup (hashmap.c:119)
2024-01-27T10:30:13.0385520Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48885A7: avahi_interface_monitor_get_hw_interface (iface.c:554)
2024-01-27T10:30:13.0386010Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48884D3: avahi_interface_monitor_get_interface (iface.c:540)
2024-01-27T10:30:13.0386465Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48890DC: avahi_interface_monitor_walk (iface.c:743)
2024-01-27T10:30:13.0386919Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B2C5B: avahi_querier_remove_for_all (querier.c:186)
2024-01-27T10:30:13.0387413Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B1840: lookup_stop (multicast-lookup.c:132)
2024-01-27T10:30:13.0387858Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B18C3: lookup_destroy (multicast-lookup.c:146)
2024-01-27T10:30:13.0388393Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B24C7: avahi_multicast_lookup_engine_free (multicast-lookup.c:345)
2024-01-27T10:30:13.0388806Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488EABD: avahi_server_free (server.c:1564)
2024-01-27T10:30:13.0389176Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1132C1: run_server (main.c:1312)
2024-01-27T10:30:13.0389529Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
2024-01-27T10:30:13.0390094Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==  Address 0x4f026a0 is 0 bytes inside a block of size 1,024 free'd
2024-01-27T10:30:13.0390660Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
2024-01-27T10:30:13.0391052Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486D010: avahi_free (malloc.c:138)
2024-01-27T10:30:13.0391484Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48AE590: avahi_hashmap_free (hashmap.c:111)
2024-01-27T10:30:13.0391941Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x4888412: avahi_interface_monitor_free (iface.c:526)
2024-01-27T10:30:13.0392351Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488EA16: avahi_server_free (server.c:1551)
2024-01-27T10:30:13.0392720Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1132C1: run_server (main.c:1312)
2024-01-27T10:30:13.0393074Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
2024-01-27T10:30:13.0393371Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==  Block was alloc'd at
2024-01-27T10:30:13.0393938Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
2024-01-27T10:30:13.0394325Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486CE69: xcalloc (malloc.c:95)
2024-01-27T10:30:13.0394801Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486CF41: avahi_malloc0 (malloc.c:120)
2024-01-27T10:30:13.0395221Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48AE0AA: avahi_new0_internal (malloc.h:59)
2024-01-27T10:30:13.0395629Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48AE4C0: avahi_hashmap_new (hashmap.c:92)
2024-01-27T10:30:13.0396072Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x4888309: avahi_interface_monitor_new (iface.c:500)
2024-01-27T10:30:13.0396501Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488E83B: avahi_server_new (server.c:1515)
2024-01-27T10:30:13.0396875Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1131AA: run_server (main.c:1265)
2024-01-27T10:30:13.0397221Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
2024-01-27T10:30:13.0397493Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==
2024-01-27T10:30:13.0397797Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225== Invalid read of size 8
2024-01-27T10:30:13.0398180Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x48AE105: entry_get (hashmap.c:61)
2024-01-27T10:30:13.0398607Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48AE5E9: avahi_hashmap_lookup (hashmap.c:119)
2024-01-27T10:30:13.0399108Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48885A7: avahi_interface_monitor_get_hw_interface (iface.c:554)
2024-01-27T10:30:13.0399605Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48884D3: avahi_interface_monitor_get_interface (iface.c:540)
2024-01-27T10:30:13.0400053Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48890DC: avahi_interface_monitor_walk (iface.c:743)
2024-01-27T10:30:13.0400584Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B2C5B: avahi_querier_remove_for_all (querier.c:186)
2024-01-27T10:30:13.0401017Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B1840: lookup_stop (multicast-lookup.c:132)
2024-01-27T10:30:13.0401452Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B18C3: lookup_destroy (multicast-lookup.c:146)
2024-01-27T10:30:13.0401982Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B24C7: avahi_multicast_lookup_engine_free (multicast-lookup.c:345)
2024-01-27T10:30:13.0402395Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488EABD: avahi_server_free (server.c:1564)
2024-01-27T10:30:13.0402827Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1132C1: run_server (main.c:1312)
2024-01-27T10:30:13.0403182Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
2024-01-27T10:30:13.0403687Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==  Address 0x4f026d0 is 48 bytes inside a block of size 1,024 free'd
2024-01-27T10:30:13.0404258Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
2024-01-27T10:30:13.0404651Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486D010: avahi_free (malloc.c:138)
2024-01-27T10:30:13.0405069Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48AE590: avahi_hashmap_free (hashmap.c:111)
2024-01-27T10:30:13.0405520Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x4888412: avahi_interface_monitor_free (iface.c:526)
2024-01-27T10:30:13.0405931Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488EA16: avahi_server_free (server.c:1551)
2024-01-27T10:30:13.0406307Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1132C1: run_server (main.c:1312)
2024-01-27T10:30:13.0406656Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
2024-01-27T10:30:13.0406951Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==  Block was alloc'd at
2024-01-27T10:30:13.0407584Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
2024-01-27T10:30:13.0407952Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486CE69: xcalloc (malloc.c:95)
2024-01-27T10:30:13.0408338Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486CF41: avahi_malloc0 (malloc.c:120)
2024-01-27T10:30:13.0408749Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48AE0AA: avahi_new0_internal (malloc.h:59)
2024-01-27T10:30:13.0409161Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48AE4C0: avahi_hashmap_new (hashmap.c:92)
2024-01-27T10:30:13.0409602Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x4888309: avahi_interface_monitor_new (iface.c:500)
2024-01-27T10:30:13.0409999Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488E83B: avahi_server_new (server.c:1515)
2024-01-27T10:30:13.0410429Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x1131AA: run_server (main.c:1265)
2024-01-27T10:30:13.0410891Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x113FCD: main (main.c:1708)
...
2024-01-27T10:30:13.0416963Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225== HEAP SUMMARY:
2024-01-27T10:30:13.0417360Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==     in use at exit: 1,715 bytes in 29 blocks
2024-01-27T10:30:13.0417892Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==   total heap usage: 371,141 allocs, 371,112 frees, 217,050,956 bytes allocated
2024-01-27T10:30:13.0418086Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==
2024-01-27T10:30:13.0418688Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225== 133 (72 direct, 61 indirect) bytes in 1 blocks are definitely lost in loss record 12 of 17
2024-01-27T10:30:13.0419260Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
2024-01-27T10:30:13.0419626Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486CDC0: xmalloc (malloc.c:68)
2024-01-27T10:30:13.0420090Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x486CEB5: avahi_malloc (malloc.c:107)
2024-01-27T10:30:13.0420539Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x489FB74: avahi_new_internal (malloc.h:50)
2024-01-27T10:30:13.0420924Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x489FDBC: lookup_new (browse.c:99)
2024-01-27T10:30:13.0421303Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48A0CFF: lookup_add (browse.c:371)
2024-01-27T10:30:13.0421722Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48A102F: lookup_handle_cname (browse.c:435)
2024-01-27T10:30:13.0422168Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48A0728: lookup_multicast_callback (browse.c:280)
2024-01-27T10:30:13.0422701Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x48B23A1: avahi_multicast_lookup_engine_notify (multicast-lookup.c:317)
2024-01-27T10:30:13.0423194Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x4896A9C: avahi_cache_update (cache.c:363)
2024-01-27T10:30:13.0423625Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488B960: handle_response_packet (server.c:720)
2024-01-27T10:30:13.0424024Z Jan 27 10:30:12 fv-az1144-454 valgrind[46225]: ==46225==    by 0x488CB7F: dispatch_packet (server.c:1032)

Member

The ASan reports got lost. I fixed it in d4c2f40. Now it should be easier to figure out why the ASan/UBSan jobs fail.

Member Author

Was this failure caused by my changes? I think I have properly freed everything that was allocated, so this seems unrelated. Is there a simple way to run this on my machine?

Member

Was this failure caused by my changes?

The CI says that the smoke test fails with this patch applied: https://github.com/avahi/avahi/actions/runs/7677998609/job/20929598265?pr=551. Without ASan/UBSan or Valgrind it segfaulted as far as I can see:

Jan 27 14:04:42 fv-az730-940 avahi-daemon[23683]: Leaving mDNS multicast group on interface lo.IPv4 with address 127.0.0.1.
Jan 27 14:04:42 fv-az730-940 systemd[1]: avahi-daemon.service: Main process exited, code=dumped, status=11/SEGV
Jan 27 14:04:42 fv-az730-940 systemd[1]: avahi-daemon.service: Failed with result 'core-dump'.

Is there a simple way to run this on my machine?

I think it should be possible to run something like

sudo VALGRIND=true CC=gcc .github/workflows/build.sh build

but it's kind of destructive in that it reinstalls avahi, edits the unit file to make it compatible with valgrind and so on. Another option would be to compile avahi locally, run it under Valgrind and then run

.github/workflows/smoke-tests.sh

@pemensik
Member Author

Hmm, this is a tricky one. We need more flexible testing for this. At first I was able to trigger the crash using your reproducer; now it passes with just an error logged.

@pemensik pemensik added bug important High priority labels Jan 27, 2024
evverx added a commit to evverx/avahi that referenced this pull request Jan 27, 2024
When avahi-daemon fails under ASan/UBSan, the tests trying to reach it
via D-Bus start to fail too with cryptic error messages, and without ASan
reports it's hard to tell what exactly fails.

This patch is prompted by avahi#551 where
the smoke test failed with
```
** (process:23892): WARNING **: 10:26:43.529: Error initializing Avahi: Daemon not running
glib-integration: client.c:626: void avahi_client_free(AvahiClient *): Assertion `client' failed.
```
without any way to figure out what went wrong.

With this patch applied the following backtrace would have been shown:
```
avahi-daemon[23694]: browse.c: Found CNAME loop on interface 2, proto 1, query cname0.local        IN        AAAA
avahi-daemon[23694]: browse.c: Found CNAME loop on interface 2, proto 1, query cname0.local        IN        AAAA
avahi-daemon[23694]: =================================================================
avahi-daemon[23694]: ==23694==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b000000f70 at pc 0x7f5aac154542 bp 0x7ffe59141be0 sp 0x7ffe59141bd8
avahi-daemon[23694]: READ of size 4 at 0x60b000000f70 thread T0
avahi-daemon[23694]:     #0 0x7f5aac154541 in lookup_multicast_callback /home/runner/work/avahi/avahi/avahi-core/browse.c:268:12
avahi-daemon[23694]:     #1 0x7f5aac1bfa0a in avahi_multicast_lookup_engine_notify /home/runner/work/avahi/avahi/avahi-core/multicast-lookup.c:317:21
avahi-daemon[23694]:     #2 0x7f5aac115808 in avahi_cache_update /home/runner/work/avahi/avahi/avahi-core/cache.c:363:13
avahi-daemon[23694]:     #3 0x7f5aac0e9621 in handle_response_packet /home/runner/work/avahi/avahi/avahi-core/server.c:720:21
avahi-daemon[23694]:     #4 0x7f5aac0e3cf6 in dispatch_packet /home/runner/work/avahi/avahi/avahi-core/server.c:1032:9
avahi-daemon[23694]:     #5 0x7f5aac0e2116 in mcast_socket_event /home/runner/work/avahi/avahi/avahi-core/server.c:1093:13
avahi-daemon[23694]:     #6 0x7f5aac464b6c in avahi_simple_poll_dispatch /home/runner/work/avahi/avahi/avahi-common/simple-watch.c:585:13
avahi-daemon[23694]:     #7 0x7f5aac4651a8 in avahi_simple_poll_iterate /home/runner/work/avahi/avahi/avahi-common/simple-watch.c:605:14
avahi-daemon[23694]:     #8 0x5592a3ed3884 in run_server /home/runner/work/avahi/avahi/avahi-daemon/main.c:1279:18
avahi-daemon[23694]:     #9 0x5592a3ec4132 in main /home/runner/work/avahi/avahi/avahi-daemon/main.c:1708:13
avahi-daemon[23694]:     #10 0x7f5aabc29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
avahi-daemon[23694]:     #11 0x7f5aabc29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
avahi-daemon[23694]:     #12 0x5592a3e05054 in _start (/usr/sbin/avahi-daemon+0x71054) (BuildId: 0aa9e5ea43ef010d5f42e9109eabd1434ff1b3db)
...
```
evverx added a commit to evverx/avahi that referenced this pull request Jan 27, 2024
evverx added a commit that referenced this pull request Jan 27, 2024
@evverx
Member

evverx commented Jan 27, 2024

We need more flexible testing for this

Agreed, but I think the coverage job on the CI should be brought back first. Without coverage reports it's kind of hard to add new tests, because it's hard to figure out what exactly the tests cover and what's missing.

@pemensik pemensik force-pushed the 501-cname-loops branch 2 times, most recently from 43149ae to 3559ade on January 27, 2024 16:26
@pemensik
Member Author

Oh my. I am unable to find the correct way to abort a lookup in case of error. Calling the callback with a failure does not seem to be the correct way, but I haven't found in the code what should be done in case of an incorrect/bogus response.

@pemensik
Member Author

Ah, finally all tests have passed. Errors are handled just by waiting for a timeout. There does not seem to be a supported way to abort browsing earlier.

@pemensik pemensik changed the title from "WIP: Attempt to limit CNAME loops" to "Attempt to limit CNAME loops" on Jan 27, 2024
@pemensik pemensik added this to the v0.9 milestone Jan 29, 2024
@evverx
Member

evverx commented Jan 29, 2024

Judging from https://coveralls.io/builds/65341393/source?filename=avahi-core%2Fbrowse.c#L95 and https://coveralls.io/builds/65341393/source?filename=avahi-core%2Fbrowse.c#L457, the coverage isn't ideal, so I'll go ahead and add the "needs-tests" label. The performance optimization where identical queries are coalesced, which is where this bug originally came from, isn't covered either.

All in all I think the testing infrastructure should be up and running first. This issue has been publicly known since 2016 so I'm not sure why it should be rushed. It can be moved to the next milestone.

@pemensik
Member Author

If we have even an imperfect mitigation that stops the process from crashing, then that is what we want in the upcoming release. It's about bug fixes, and this would be one of them. I think it is better to have incomplete test coverage but fix the known, reproducible way to trigger the crash than to leave it completely open. We can always open a new issue for improving the coverage in future releases.

@evverx
Member

evverx commented Feb 12, 2024

I think it is better to have incomplete test coverage but fix the known, reproducible way to trigger the crash than to leave it completely open

The coverage should make sure that the bug is actually fixed, and more importantly it should help make sure that new bugs aren't introduced. It isn't ideal that avahi crashes, of course, but replacing crashes with memory corruption isn't better either. I'm concerned that the part of the code where CNAMEs were handled is essentially dead now and that it's necessary to pepper the code with NULL assignments.

@evverx
Member

evverx commented Feb 13, 2024

Just to clarify, I'm not saying that tests necessarily have to be added in this PR. What I'm saying is that personally I didn't have the time to review this PR thoroughly. I didn't poke the corner cases either. The "needs-tests" label just shows that in this particular case the smoke tests aren't enough. If you poked the corner cases downstream, that could probably suffice too. Then again, given that 0.9-rc1 is already released, I don't think patches like this should be added on top and included in the final release without having been in an "rc" release.

FWIW #286 can be added on top of this PR to make it easier to expose those code paths.

@evverx evverx left a comment
Member

Dunno. I built avahi with this patch applied and for some reason packets with non-recursive CNAMEs trigger:

browse.c: Found CNAME loop on interface 3, proto 0, query X.local       IN      CNAME
browse.c: Found CNAME loop on interface 3, proto 0, query X.local       IN      CNAME

Those packets came with CNAMEs pointing X.local to H.local

    Questions: 0
    Answer RRs: 1
    Authority RRs: 0
    Additional RRs: 0
    Answers
        X.local: type CNAME, class IN, cname H.local
            Name: X.local
            Type: CNAME (Canonical NAME for an alias) (5)
            .000 0000 0000 0001 = Class: IN (0x0001)
            0... .... .... .... = Cache flush: False
            Time to live: 10 (10 seconds)
            Data length: 9
            CNAME: H.local
    [Unsolicited: True]

I think something is broken somewhere. Let's put it on hold. I don't think it's ready.

@pemensik
Member Author

Do you think this would cause more issues than it solves? It depends on what H.local was pointing to.

I am starting to wonder whether we want to follow CNAMEs in mDNS at all. Should we instead ignore CNAMEs completely? Do we know of any application or service that uses this for something actually useful? SRV service records are kind of an alternative to CNAMEs anyway.

@pemensik
Member Author

Can I ask Michal @msekletar to look at this too?

@evverx
Member

evverx commented Mar 28, 2024

Should we instead ignore CNAMEs completely?

I don't think CNAMEs should be ignored completely. They should be handled properly though. https://datatracker.ietf.org/doc/html/rfc6762#section-16 explicitly mentions one use case

Where automatic equivalences are desired, this may be
achieved through the use of programmatically generated CNAME records.
For example, if a responder has an address record for an accented
name Y, and a querier issues a query for a name X, where X is the
same as Y with all the accents removed, then the responder may issue
a response containing two resource records: a CNAME record "X CNAME
Y", asserting that the requested name X (unaccented) is an alias for
the true (accented) name Y, followed by the address record for Y

so it's assumed that queriers can follow CNAMEs.

Do we know of any application or service that uses this for something actually useful?

There are at least two scripts I'm aware of that make it easier to publish CNAMEs to create aliases. There are feature requests like #319 too.
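Those scripts typically go through the generic record API and hand-encode the RDATA themselves. A hedged sketch of that approach with avahi-client, assuming an already established AvahiEntryGroup (the names and the helper are made up, error handling is minimal):

```c
#include <string.h>
#include <stdint.h>
#include <avahi-client/publish.h>
#include <avahi-common/address.h>
#include <avahi-common/defs.h>

/* Encode a dot-separated name into DNS wire format:
 * "alias.local" -> "\x05alias\x05local\x00". Returns 0 on error. */
static size_t dns_encode_name(const char *name, uint8_t *out, size_t out_size) {
    size_t used = 0;
    while (*name) {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t) (dot - name) : strlen(name);
        if (len == 0 || len > 63 || used + len + 2 > out_size)
            return 0;
        out[used++] = (uint8_t) len;
        memcpy(out + used, name, len);
        used += len;
        name += len;
        if (*name == '.')
            name++;
    }
    out[used++] = 0;
    return used;
}

/* Publish "alias" as a CNAME for "target" on a group that is already set up. */
static int publish_cname_alias(AvahiEntryGroup *group, const char *alias, const char *target) {
    uint8_t rdata[256];
    size_t rdata_len = dns_encode_name(target, rdata, sizeof(rdata));

    if (rdata_len == 0)
        return -1;

    return avahi_entry_group_add_record(
        group, AVAHI_IF_UNSPEC, AVAHI_PROTO_UNSPEC, 0,
        alias, AVAHI_DNS_CLASS_IN, AVAHI_DNS_TYPE_CNAME,
        60 /* TTL in seconds */, rdata, rdata_len);
}
```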

I haven't tested this patch extensively yet, so I can't say whether it works in all the corner cases, for example where queries are restarted. To me it seems that the patch works around the issue without addressing the underlying one. The underlying issue here boils down to the fact that n_lookups doesn't keep track of references, so

    if (b->n_lookups >= AVAHI_LOOKUPS_PER_BROWSER_MAX)
        /* We don't like cyclic CNAMEs */

is never reached in this particular case. It should look at the "ref" part to count things properly. All those "NULL"s indicate that the memory management part goes sideways too. Maybe it's OK to backport this downstream but I'm not sure it's suitable upstream.
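A hypothetical sketch of that suggestion, counting references rather than trusting b->n_lookups (the field names follow browse.c, but this traversal is an assumption, not the existing code):

```c
/* Count how many times the browser's lookups are actually referenced, so that
 * coalesced lookups (one node, several refs) are not undercounted. */
static unsigned browser_count_lookup_refs(AvahiSRecordBrowser *b) {
    unsigned count = 0;
    AvahiSRBLookup *l;

    for (l = b->lookups; l; l = l->lookups_next)
        count += (unsigned) l->ref;

    return count;
}
```

A check like `browser_count_lookup_refs(b) >= AVAHI_LOOKUPS_PER_BROWSER_MAX` would then trip even when an existing lookup is reused instead of a new one being created.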

Limit the number of chained CNAMEs. Do not allow querying a CNAME key that is
already present in the CNAME chain.

Add ability to report failure for broken CNAME chains.

Fixes avahi#501
We are missing a nice API to publish CNAME records, but test direct and
indirect loop detection in client-test.
Valgrind has revealed that some of the internal freeing can walk over
interfaces which were already freed.

Ensure freed structures are reset to NULL once freed. If anything
still tries to touch them, ensure it asserts.
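The pattern the last commit message describes, shown in isolation (a generic illustration, not the avahi code; the struct is made up for the example):

```c
#include <assert.h>
#include <stdlib.h>

struct monitor {
    void *hashmap;          /* stand-in for an owned sub-object */
};

static void monitor_teardown(struct monitor *m) {
    free(m->hashmap);
    m->hashmap = NULL;      /* a later use-after-free hits NULL, not freed memory */
}

static void *monitor_lookup(struct monitor *m) {
    assert(m->hashmap);     /* touching the monitor after teardown aborts loudly */
    return m->hashmap;
}
```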
@@ -272,35 +279,36 @@ int main (AVAHI_GCC_UNUSED int argc, AVAHI_GCC_UNUSED char *argv[]) {
assert(r >= 0);
assert(avahi_string_list_serialize(txt, NULL, 0) == sizeof(rdata));
error = avahi_entry_group_add_service_strlst(group, AVAHI_IF_UNSPEC, AVAHI_PROTO_UNSPEC, 0, "TestX", "_qotd._tcp", NULL, NULL, 123, txt);
assert(error == AVAHI_ERR_INVALID_RECORD);
//assert(error == AVAHI_ERR_INVALID_RECORD);


Why has this been commented out?

Member

I'm guessing it's because it was initially used to reproduce the CNAME bug locally by editing the line above. When the avahi crash was gone and the other tests were added it was probably left here accidentally.

(This PR isn't mergeable currently so there are various things that probably came from debugging sessions and need fixing)

@evverx evverx removed this from the v0.9 milestone May 28, 2024
@evverx evverx marked this pull request as draft May 28, 2024 21:28

Successfully merging this pull request may close these issues.

Uncontrolled recursion in lookup_handle_cname caused by recursive CNAMEs