core/wide-area: fix for CVE-2024-52615 #662
Conversation
The idea behind the changes is straightforward: to avoid reusing the same port, we have to open a new socket every time we send packets to DNS servers. I've run some basic tests using avahi-resolve, so now I am submitting here to get feedback from others as well as CI results. Btw, maybe there is a better way to go about this, or we could remove the wide-area feature completely, as I think it is very seldom used.
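A minimal sketch of the approach described above (the helper name `open_random_source_socket` is made up for illustration, not avahi code): binding a fresh UDP socket to port 0 lets the kernel pick a new random ephemeral source port for every outgoing query, which is what defeats source-port guessing in CVE-2024-52615.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a fresh UDP socket per outgoing DNS query
 * instead of reusing one long-lived socket (and thus one source port).
 * Binding to port 0 makes the kernel choose a random ephemeral port. */
static int open_random_source_socket(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in local;
    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = 0; /* 0 = kernel chooses an ephemeral port */

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Two such sockets open at the same time are guaranteed distinct source ports (absent `SO_REUSEPORT`), so each pending lookup gets its own unpredictable port at the cost of one descriptor per query.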
There is 1 new finding reported by OpenScanHub: https://openscanhub.fedoraproject.org/task/26177/log/added.html. It does not look like a bug at first glance, but you may want to take a look at it. Thanks!
I don't think it should be removed. It's part of https://www.rfc-editor.org/rfc/rfc6763 and, for example, https://www.rfc-editor.org/rfc/rfc8766.html specifies a network proxy that uses Multicast DNS to automatically populate the wide-area unicast Domain Name System namespace, to cover real-world use cases.

That being said, in its current form it should certainly be off by default, and the documentation should probably say that it's experimental (or something like that) and that avahi should talk to a local resolver to offload all the DNS stuff onto it. That doesn't fix all the weird interactions with mDNS though. Ideally it should be made to work properly and be covered with tests (currently it isn't tested at all upstream); as far as I understand, @pemensik was interested in that, but given that it isn't trivial I wouldn't expect it to be fixed anytime soon. I'd wait for @pemensik here. Personally I think that, with all its limited resources, avahi should focus on mDNS first.
I think there are places in the avahi codebase where this pattern is used (historically); that snippet came from the existing code and replaced the same warning, marked as fixed in https://openscanhub.fedoraproject.org/task/26177/log/fixed.html.
Actually, I've received a request for a fix of this CVE (in a RHEL context), so it would be good to move forward with the upstream inclusion as well.
@pemensik PTAL
pemensik
left a comment
Found just a small issue with an extra close() call so far.
The change adding randomization works: each query has a random source port and waits for the response on it. It is not ideal, since with many queries pending, each requires a separate socket. But since this is usually used by localhost clients only, I think that is sufficient for now.
Ideally there should be a somehow limited bank of random descriptors used by the process, which would start reusing an already open random socket once the (defined) maximum of random sockets is open. Could be 100-200 sockets. We could store an array on AvahiWideAreaLookupEngine, from which each lookup would borrow an fd, watch and usage-counted structure. Something like:

```c
struct AvahiSharedSocket {
    int fd;              /* initialize to -EBADF */
    int usage;           /* initialize to 1 on fd assignment, increment on reuse */
    AvahiWatch *watch;   /* watch is created per-socket, not per-lookup anyway */
    AvahiAddress server; /* need to reuse only the same target? */
};

struct AvahiWideAreaLookupEngine {
    /* ... */
    size_t last_reused_socket;
    AvahiSharedSocket random_sockets[AVAHI_WIDE_AREA_RANDSOCK_MAX];
};
```

But this would get complicated with minor details if sockets are to be reused properly. I propose to avoid it unless proven necessary. Clients doing requests are likely localhost and somewhat trusted. handle_packet and socket_event seem somewhat prepared for sharing multiple lookups on a single socket. After all, originally they shared one socket per address family.
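The borrow-from-a-bank idea above could be sketched roughly as follows. This is a hedged, standalone illustration, not avahi code: the names (`SocketBank`, `bank_borrow`) and the tiny `RANDSOCK_MAX` are made up, and real code would also handle the watch, the target server, and releasing slots.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#define RANDSOCK_MAX 4 /* the comment above suggests 100-200 in practice */

/* Simplified stand-in for the AvahiSharedSocket bank: open new sockets
 * while free slots remain, then reuse already open ones round-robin,
 * counting how many lookups share each slot. */
typedef struct SharedSocket {
    int fd;    /* -1 while the slot is free */
    int usage; /* 1 on assignment, incremented on each reuse */
} SharedSocket;

typedef struct SocketBank {
    size_t last_reused;
    SharedSocket slots[RANDSOCK_MAX];
} SocketBank;

static void bank_init(SocketBank *b) {
    b->last_reused = 0;
    for (size_t i = 0; i < RANDSOCK_MAX; i++) {
        b->slots[i].fd = -1;
        b->slots[i].usage = 0;
    }
}

/* Borrow a slot for a new lookup; returns the slot index, or -1 on error. */
static int bank_borrow(SocketBank *b) {
    for (size_t i = 0; i < RANDSOCK_MAX; i++) {
        if (b->slots[i].fd < 0) {
            int fd = socket(AF_INET, SOCK_DGRAM, 0);
            if (fd < 0)
                return -1;
            b->slots[i].fd = fd;
            b->slots[i].usage = 1;
            return (int)i;
        }
    }
    /* Bank full: reuse an already open socket, round-robin. */
    size_t i = b->last_reused;
    b->last_reused = (b->last_reused + 1) % RANDSOCK_MAX;
    b->slots[i].usage++;
    return (int)i;
}
```

The trade-off is visible here: once the bank is full, lookups start sharing source ports again, so the cap bounds descriptor use at the cost of some port randomness.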
Experiments with strace show quite a long time until those random sockets are closed, unless another query triggers cleanup of previous (dead) lookups. It seems to wait over 300 seconds to close those sockets. Especially when add_to_cache() is used in handle_packet: a positive answer was received and no other should arrive (or be processed). It may make sense to close the socket earlier for faster reuse, like in the send_to_dns_server() function. Unlike multicast queries, DNS queries should receive exactly one response at most. handle_packet should ensure at most one callback is emitted per lookup. Possible network duplicates would be received, but fail to find a non-dead lookup.

Closing the socket too early can prevent sending back ICMP errors, but currently those sockets stay open too long. The management of living lookups is quite non-obvious, which is why it works the way it does. I guess keeping the socket open longer than needed is safer for now. Ideally the socket should be closed 10-200 milliseconds after an answer with a known lookup id was received on it. But changing that here is risky. lookup_stop might change just the callback to null the first time and set l->time_event to some short timeout. Then, in sender_timeout_callback, it may detect
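The two-step teardown suggested above might look roughly like this. All names here are hypothetical, not actual avahi internals: `lookup_stop()` only silences the callback and arms a short grace deadline, and a later timer callback closes the fd once the deadline has passed. Time is an explicit parameter so the logic can be exercised without a real event loop.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

typedef struct Lookup {
    int fd;                 /* -1 once closed */
    bool dead;              /* no further callbacks may be emitted */
    long close_deadline_ms; /* earliest time the fd may be closed */
} Lookup;

/* Step 1: stop delivering callbacks, but keep the socket briefly so late
 * duplicates are drained instead of triggering ICMP port-unreachable. */
static void lookup_stop(Lookup *l, long now_ms, long grace_ms) {
    l->dead = true;
    l->close_deadline_ms = now_ms + grace_ms; /* e.g. 10-200 ms */
}

/* Step 2 (timer callback): close the socket once the grace period ends. */
static void lookup_reap(Lookup *l, long now_ms) {
    if (l->dead && l->fd >= 0 && now_ms >= l->close_deadline_ms) {
        close(l->fd);
        l->fd = -1;
    }
}
```

This keeps the "at most one callback per lookup" invariant (the `dead` flag) separate from descriptor lifetime, which is the part the current 300-second cleanup conflates.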
Hmm, maybe limiting the total number of outgoing random sockets is more important than I thought at first. It uses not only socket descriptors of the process, but also binds a unique local UDP port. It may prevent other programs running on the same host from making their own queries, for example the system DNS resolver. The limit of ~32k ephemeral ports is system-wide, not a per-process limitation.
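On Linux the system-wide pool in question is visible via a sysctl; a small sketch to inspect it (the helper name is made up, and the `/proc` path is Linux-specific):

```c
#include <assert.h>
#include <stdio.h>

/* Read Linux's system-wide ephemeral port range (typically 32768-60999,
 * i.e. roughly 28k usable ports shared by every process on the host).
 * Returns 0 on success, -1 if the sysctl file is unavailable. */
static int ephemeral_port_range(int *low, int *high) {
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (!f)
        return -1;
    int rc = (fscanf(f, "%d %d", low, high) == 2) ? 0 : -1;
    fclose(f);
    return rc;
}
```

Every socket avahi binds to a random port takes one entry out of this shared pool, which is why an unbounded per-query socket scheme can starve other resolvers on the same host.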
Actually, the number of UDP sockets used is implicitly limited by the number of open file descriptors, which is 1024 by default (soft limit).
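That implicit per-process cap can be queried with the standard `getrlimit` interface; a small hedged sketch (the helper name `fd_soft_limit` is made up):

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

/* Return the RLIMIT_NOFILE soft limit: the most file descriptors (and
 * therefore lookup sockets) this process can have open at once. */
static long fd_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return LONG_MAX;
    return (long)rl.rlim_cur;
}
```

So even without an explicit bank, the daemon cannot open more query sockets than this limit allows; whether failing `socket()` calls are handled gracefully is the open question raised below.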
@pemensik What do you think about my latest comment?
Okay, I made a mistake when checking the default service file descriptors. They are usually unlimited, though. But at least this default limit is more than enough to prevent other local services from being restricted by avahi lookups. We do not have to invent more complex logic if we already have a relatively strict limit on used descriptors.
I did not want to introduce a new simple limit on max lookups, but reusing an existing one is kind of okay. Yes, the existing implementation should not hit the file descriptor limit even with heavy usage of wide-area DNS. I have not tested how it would respond to a socket creation error. It would be great to test it somehow.
I've tried to test it with a local named instance and many parallel avahi-resolve invocations. I was unable to reach the limit, because local lookups are fast and I started to run into D-Bus-related issues before exhausting the number of open FDs. After the test, the number of file descriptors stabilized and returned to its "quiescent" state.
Okay, looks good! Merging it.
@pemensik The Packit CI failure should be investigated, but it seems unrelated to the proposed change. Here is a Fedora Rawhide scratch build w/o any changes which seems to fail with the same error: https://koji.fedoraproject.org/koji/taskinfo?taskID=134148227
Changes were addressed, nothing more needed.
Ah, okay, merged finally. I got confused by the UI while trying to cancel my own change request; I could not find the correct button. The Rawhide failures should be addressed, but this is definitely improving the situation. Thank you!
It's been failing since April (with FORTIFY_SOURCE): #699 |
I added the "merged-but-needs-fixing" label because it introduced #810. That's a really fast way for one unprivileged user to consume all the avahi file descriptors. In practice, in most cases