Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

D-Bus API Race #9

Closed
lathiat opened this issue Jul 16, 2015 · 26 comments
Closed

D-Bus API Race #9

lathiat opened this issue Jul 16, 2015 · 26 comments
Milestone

Comments

@lathiat
Copy link
Contributor

lathiat commented Jul 16, 2015

from #gnome-hackers/hadess

The current D-BUS API requires that you create an object, which then immediately starts returning signals.

The C Client appears to subscribe to all signals up front, however the Python API and for example frameworks like GDBUS would have you call 'connect_to_signal' once the object is created and the path is known. Signals might arrive in this time frame.

Investigate further if this actually occurs and what possible fix there might be, other than an obvious Start method. Does d-bus have some way around this, what do others do?

Example in avahi-python/avahi-bookmarks.in:
browser = dbus.Interface(self.bus.get_object(avahi.DBUS_NAME, self.server.ServiceBrowserNew(avahi.IF_UNSPEC, avahi.PROTO_UNSPEC, stype, domain, dbus.UInt32(0))), avahi.DBUS_INTERFACE_SERVICE_BROWSER)
browser.connect_to_signal('ItemNew', self.new_service)
browser.connect_to_signal('ItemRemove', self.remove_service)

@smcv
Copy link
Contributor

smcv commented Jul 24, 2015

If there is no method call that can be used to "catch up", then this can never be reliable with proxy objects (dbus-python's get_object(), GDBus's GDBusProxy, etc.), only with lower-level APIs in which a client adds a non-specific signal match before creating any service browsers.

The conventional way to resolve this in other D-Bus APIs is a GetItems() method that can be used to "catch up".

@lathiat
Copy link
Contributor Author

lathiat commented Jul 24, 2015

Thanks for the comment, I was hoping for info on the conventional method to solve that.

@smcv
Copy link
Contributor

smcv commented Jul 24, 2015

See also: the thread starting at https://mail.gnome.org/archives/gtk-devel-list/2015-July/msg00019.html

@rogerjames99
Copy link

I started the thread over at gnome. You can find my workaround here at

https://github.com/rogerjames99/unicornemu/blob/master/avahibrowser.py

@lathiat
Copy link
Contributor Author

lathiat commented Jul 27, 2015

calling GetItems seems to also be racy, as we will never know precisely which was the first signal they received. Some kind of .Start() would be ideal but that would break backwards compatibility.

@smcv
Copy link
Contributor

smcv commented Jul 27, 2015

The idea is that clients follow this sequence:

  • connect to signal(s)
  • call state-recovery method

which cannot miss any items, but might duplicate them: the worst that can happen is that they receive some signals after calling the state-recovery method, but before it returns.

Duplicate items can be addressed by having the client specifically ignore duplicates, or by having an "initialized" flag as described in https://mail.gnome.org/archives/gtk-devel-list/2015-July/msg00021.html.

@rogerjames99
Copy link

I had quick look at the code for the for the use case I have been looking at only. i.e. Creating a Server proxy then calling its ServiceBrowserNew method. In this case after sending back the response to this method the server will always send zero or more ItemNew messages followed by an AllowForNow message. So adding a catch up method would not break backwards compatibility for this case, provided the catch up also broadcast an AllForNow message. A new client implementation can synchronise by not finalising its local cache until it receives an AllForNow message after the catch up call.

I have not looked at the other race mentioned above.

@rogerjames99
Copy link

Just a thought. If the catch up broadcast an AllForNow at the start of the catch up sequence that would make client implementation even easier. As old clients that get all the messages should treat two consecutive AllForNow messages as an empty cache.

@smcv
Copy link
Contributor

smcv commented Jul 27, 2015

Suppose my network contains services called alpha, beta and gamma, and Avahi additionally detects delta after it has signalled AllForNow. <- X -> denotes a signal and <= X or X => denotes a method call or reply, Avahi is on the left, and time increases downwards.

The conventional fix for this sort of issue (in pseudocode) is like this:

                                       <= ServiceBrowserNew()
ServiceBrowserNew reply: /foo =>
<- ItemNew(alpha) ->
                                       <= GetItems
<- ItemNew(beta) ->
GetItems reply: alpha, beta =>
<- ItemNew(gamma) ->
<- AllForNow() ->
<- ItemNew(delta) ->

What @rogerjames99 seems to be suggesting something more like this. This would work, but is by no means conventional:

                                       <= ServiceBrowserNew()
ServiceBrowserNew reply: /foo =>
<- ItemNew(alpha) ->
                                       <= RepeatSignals()
<- ItemNew(beta) ->
RepeatSignals reply: (nothing) =>
<- ItemNew(alpha) ->
<- ItemNew(beta) ->
<- ItemNew(gamma) ->
<- AllForNow() ->
<- ItemNew(delta) ->

@smcv
Copy link
Contributor

smcv commented Jul 27, 2015

Another way to think about the conventional solution is this design principle: first write some methods, so that at any time, it's possible to retrieve the state of the object by calling its methods ("state recovery"); then add signals to provide change notification for the results of those methods (ideally, designing them so that the number of times the methods need to be called is minimized).

In the case of Avahi, this probably means that GetItems() should actually return a boolean for whether AllForNow() has been emitted yet, in addition to the actual list of items.

@rogerjames99
Copy link

A small correction to @smvc analysis. DBUS is seqential so what I expect to see is this:

                                                             <= ServiceBrowserNew()
ServiceBrowserNew reply: /foo =>
<- ItemNew(alpha) ->*
                                                             <= RepeatSignals()
<- ItemNew(beta) ->*
<- ItemNew(gamma) ->*
<- AllForNow() ->*
RepeatSignals reply: (nothing) =>
<- ItemNew(alpha) ->
<- ItemNew(beta) ->
<- ItemNew(gamma) ->
<- AllForNow() ->
<- ItemNew(delta) ->

or this:

                                                             <= ServiceBrowserNew()
ServiceBrowserNew reply: /foo =>
<- ItemNew(alpha) ->*
<- ItemNew(beta) ->*
<- ItemNew(gamma) ->*
<- AllForNow() ->*
<- ItemNew(delta) ->*
                                                             <= RepeatSignals()
RepeatSignals reply: (nothing) =>
<- ItemNew(alpha) ->
<- ItemNew(beta) ->
<- ItemNew(gamma) ->
<- ItemNew(delta) ->
<- AllForNow() ->

Depending on when the server sees the RepeatSignals(a.k.a CatchUp) request. This signals I have marked with an asterisk can potentially be missed by the client.

On further reflection I think the server should mark the end of the re-broadcast signals with a <- CacheExhausted -> signal not an <- AllForNow ->. So the end result is something like this:

                                                             <= ServiceBrowserNew()
ServiceBrowserNew reply: /foo =>
<- ItemNew(alpha) ->*
<- ItemNew(beta) ->*
<- ItemNew(gamma) ->*
<- AllForNow() ->*
<- ItemNew(delta) ->*
                                                             <= CatchUp()
CatchUp reply: (nothing) =>
<- ItemNew(alpha) ->
<- ItemNew(beta) ->
<- ItemNew(gamma) ->
<- ItemNew(delta) ->
<- CacheExhausted() ->

I will leave it to you guys who are actually familiar with this to decide whether this will break existing clients.

Roger

@rogerjames99
Copy link

One last off topic question. Is this now the official repository for avahi? Ubuntu is still citing git://git.0pointer.net/avahi.git as the upstream source.

@smcv
Copy link
Contributor

smcv commented Jul 28, 2015

I still think the version with GetItems() is better than having a RepeatSignals() method (or whatever you want to call the method - it's the meaning that matters).

The approach with GetItems() is certainly more conventional: for instance, the standard D-Bus Properties interface works that way (state-recovery via GetAll() and change-notification via PropertiesChanged).

@rogerjames99
Copy link

Either way is fine with me :-)

Maybe we should call the method Gnomic ;-)

Definition of gnomic in English: adjective. Expressed in or of the nature of short, pithy maxims or aphorisms: 'that most gnomic form, the aphorism'

Roger

@soemraws
Copy link

soemraws commented Sep 8, 2017

I was just wondering if the following I encountered is related to this race condition.

If I create a new ServiceBrowser and immediately register a handler to the signal "ItemNew", I generally do find the services I was expecting. However, if I now unregister the handler and call "Free" on the ServiceBrowser, then create a new ServiceBrowser and re-register a handler to the signal, I never (well after many tries) find the services.

I would expect that when freeing the ServiceBrowser, everything related to it is destroyed. Thus when creating a new ServiceBrowser, we are starting from a "clean slate" again and the original race condition would appear, i.e. I would generally find the service. However, it seems that it's a sure thing that I won't find it.

If it is unrelated, I can open a new issue.

Thanks,
Sumant

@hadess
Copy link
Contributor

hadess commented Sep 11, 2017

It's the same bug, but it's rightfully not behaving as you expect. Avahi, to speed things up, will cache responses to queries. So once you're done looking up a service, it will be quick to look up the second time. That's completely expected.

@soemraws
Copy link

Just to be sure I understand correctly, you're saying that due to the caching, in the second case we're sure to find nothing, because the signals are sent out (from the cache) immediately when the browser is instantiated, thus before the handler for the signals is registered?

I've been trying to figure out how to work around this, e.g. by registering the handlers before instantiating the browser. However, it seems that dbus needs the path (of the browser) when registering. Is there any other workaround?

@smcv
Copy link
Contributor

smcv commented Sep 11, 2017

However, it seems that dbus needs the path (of the browser) when registering. Is there any other workaround?

High-level (object-oriented) D-Bus APIs like GDBus' GDBusProxy and dbus-python's dbus.proxies.* need this. There is not going to be a workaround while using high-level D-Bus APIs. You will have to use low-level (message-oriented) APIs.

Quoting myself from https://mail.gnome.org/archives/gtk-devel-list/2015-July/msg00021.html:

Gio.DBusProxy cannot represent this API
successfully, and you will have to use lower-level functions like
GIO.DBusConnection.signal_subscribe to subscribe to the signals in
general, before you call ServiceBrowserNew; and then in the signal
callback, use the sending object path of the signals to associate them
with a particular service browser object.

@soemraws
Copy link

soemraws commented Sep 12, 2017

If there is not going to be a work-around for high-level D-Bus APIs, at least make the documentation consistent. Quoting from the (online) Avahi documentation:

the D-Bus API: an extensive D-Bus interface for browsing and registering mDNS/DNS-SD services using avahi-daemon. We recommend using this API for software written in any language other than C (e.g. Python).

There are many other languages that only have the object-oriented D-Bus interface. And I would hardly call the D-Bus interface extensive if it cannot work with the high-level (object-oriented) API.

In my opinion, a "Start" method would solve this issue and meet the promise and recommendation in the documentation.

Or, you know, just don't say it's extensive and only works with low-level D-Bus APIs.

@smcv
Copy link
Contributor

smcv commented Sep 12, 2017

I am not an Avahi developer, and even if I was, I couldn't fix the API of already-deployed instances of Avahi.

If you want to contribute a fixed D-Bus API to Avahi so that high-level D-Bus APIs can interact with it, #9 (comment) describes how I would recommend doing so.

Otherwise, this bug ("Avahi's D-Bus API doesn't work with high-level D-Bus APIs") remains open, and the workaround for that bug that can be used outside Avahi is to use low-level D-Bus APIs instead.

@soemraws
Copy link

Understood, I didn't mean you personally. I wouldn't mind contributing, if the low level approach you suggested didn't work. But it seems to work very well, so I'll be using that in my code instead. Seems like this will remain open.

Thanks.

@SilaUser
Copy link

Are there any intentions to fix this bug in the near future. Maybe by enhancing the DBus API with one of the proposed repeat() or getItems() functions.

@hadess
Copy link
Contributor

hadess commented Mar 22, 2018

If somebody has a test already written, that would help. I have some code written, completely untested, but having a test case would help me avoid having to write one myself.

Python or C using GDbus would be best.

@SilaUser
Copy link

Maybe Python Script works as a short test for ServiceBrowser. It demonstrates the race condition in my network.

@ghost ghost mentioned this issue Mar 26, 2018
@ghost
Copy link

ghost commented Apr 3, 2018

Is this repository abandoned @lathiat? There are lots of old pull requests without any reaction by the maintainer and the last bigger change happened in july 2017.

kdesysadmin pushed a commit to KDE/kdnssd that referenced this issue Nov 21, 2018
Summary:
avahi/avahi#9

Avahi has upstream signal races which can have any number of side effect
bugs on our end ranging from unreliable discovery, over missing servers
in kio slave listings, to outright deadlocking when waiting for a result
which has already raced passed us (specifically RemoteService is vulnerable
to that).

When we make a request to avahi it will immediately start sending
discovery signals even when we haven't even connected to the signals yet.
This means any number of signals may be lost in the time between our
request and us connecting to the signals.
As this applies to (all?) in-the-wild versions of avahi even if we got an
upstream fix today it'd not help the user until they get the new avahi
integrated.

To prevent us from losing the race all dbus signal listening is now done
via blanket connects.
Ordinarily we'd make a request, dbus would give us an object path, we'd
connect to the object path's signals and then get signals for our request.
Since the signals are racey we'll instead connect to all signals of a given
interface (e.g. org.freedesktop.Avahi.DomainBrowser) completely ignoring
the object path. We'll then "locally" filter all signals which belong to
the specific DomainBrowser object as identified by the signals' dbus object
path. This eliminates all risk of racing. Before we make the request
we setup our blanket connections and the slots won't get called until after
the event loop returns, which won't be until we've set "our" dbus object
path for filtering later in the relevant functions.

The obvious downsides of this are:

- boilerplatey code for handling this stuff
- runtime checked signal-slot compatibility
- much more traffic going into our slots (doesn't actually end up doing
  any heavy lifting)

Test Plan: - wrote a crappy test app to use all browser -> still browse (also service resolves)

Reviewers: mdawson, broulik, #frameworks

Reviewed By: broulik

Subscribers: kde-frameworks-devel

Tags: #frameworks

Differential Revision: https://phabricator.kde.org/D16298
@lathiat
Copy link
Contributor Author

lathiat commented Feb 18, 2020

This has now been solved in 0.8 by #175

@lathiat lathiat closed this as completed Feb 18, 2020
@lathiat lathiat added this to the v0.8 milestone Feb 18, 2020
evverx referenced this issue in evverx/avahi Sep 19, 2023
Fixes:
```
==93410==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f9e76f14c16 at pc 0x00000047208d bp 0x7ffee90a6a00 sp 0x7ffee90a61c8
READ of size 1110 at 0x7f9e76f14c16 thread T0
    #0 0x47208c in __interceptor_strlen (out/fuzz-domain+0x47208c) (BuildId: 731b20c1eef22c2104e75a6496a399b10cfc7cba)
    #1 0x534eb0 in avahi_strdup avahi/avahi-common/malloc.c:167:12
    #2 0x53862c in avahi_normalize_name_strdup avahi/avahi-common/domain.c:226:12
```
and
```
fuzz-domain: fuzz/fuzz-domain.c:38: int LLVMFuzzerTestOneInput(const uint8_t *, size_t): Assertion `avahi_domain_equal(s, t)' failed.
==101571== ERROR: libFuzzer: deadly signal
    #0 0x501175 in __sanitizer_print_stack_trace (/home/vagrant/avahi/out/fuzz-domain+0x501175) (BuildId: 682bf6400aff9d41b64b6e2cc3ef5ad600216ea8)
    #1 0x45ad2c in fuzzer::PrintStackTrace() (/home/vagrant/avahi/out/fuzz-domain+0x45ad2c) (BuildId: 682bf6400aff9d41b64b6e2cc3ef5ad600216ea8)
    #2 0x43fc07 in fuzzer::Fuzzer::CrashCallback() (/home/vagrant/avahi/out/fuzz-domain+0x43fc07) (BuildId: 682bf6400aff9d41b64b6e2cc3ef5ad600216ea8)
    #3 0x7f1581d7ebaf  (/lib64/libc.so.6+0x3dbaf) (BuildId: c9f62793b9e886eb1b95077d4f26fe2b4aa1ac25)
    #4 0x7f1581dcf883 in __pthread_kill_implementation (/lib64/libc.so.6+0x8e883) (BuildId: c9f62793b9e886eb1b95077d4f26fe2b4aa1ac25)
    #5 0x7f1581d7eafd in gsignal (/lib64/libc.so.6+0x3dafd) (BuildId: c9f62793b9e886eb1b95077d4f26fe2b4aa1ac25)
    #6 0x7f1581d6787e in abort (/lib64/libc.so.6+0x2687e) (BuildId: c9f62793b9e886eb1b95077d4f26fe2b4aa1ac25)
    #7 0x7f1581d6779a in __assert_fail_base.cold (/lib64/libc.so.6+0x2679a) (BuildId: c9f62793b9e886eb1b95077d4f26fe2b4aa1ac25)
    #8 0x7f1581d77186 in __assert_fail (/lib64/libc.so.6+0x36186) (BuildId: c9f62793b9e886eb1b95077d4f26fe2b4aa1ac25)
    #9 0x5344a4 in LLVMFuzzerTestOneInput /home/vagrant/avahi/fuzz/fuzz-domain.c:38:9
```

It's a follow-up to 94cb648
evverx referenced this issue in evverx/avahi Oct 22, 2023
It fixes the crash spotted
avahi#490 (comment).
The fuzz target was updated to exercise those code paths (among other
things). Without this commit it crashes with
```
fuzz-consume-record: malloc.c:250: void *avahi_memdup(const void *, size_t): Assertion `s' failed.
==72869== ERROR: libFuzzer: deadly signal
    #0 0x5031b5 in __sanitizer_print_stack_trace (avahi/out/fuzz-consume-record+0x5031b5) (BuildId: 69840d811c9ba9f74eea21e34786a2005c5dcc06)
    #1 0x45cd6c in fuzzer::PrintStackTrace() (avahi/out/fuzz-consume-record+0x45cd6c) (BuildId: 69840d811c9ba9f74eea21e34786a2005c5dcc06)
    #2 0x441c47 in fuzzer::Fuzzer::CrashCallback() (out/fuzz-consume-record+0x441c47) (BuildId: 69840d811c9ba9f74eea21e34786a2005c5dcc06)
    #3 0x7f189e97ebaf  (/lib64/libc.so.6+0x3dbaf) (BuildId: 3ebe8d97a0ed3e1f13476a02665c5a9442adcd78)
    #4 0x7f189e9cf883 in __pthread_kill_implementation (/lib64/libc.so.6+0x8e883) (BuildId: 3ebe8d97a0ed3e1f13476a02665c5a9442adcd78)
    #5 0x7f189e97eafd in gsignal (/lib64/libc.so.6+0x3dafd) (BuildId: 3ebe8d97a0ed3e1f13476a02665c5a9442adcd78)
    #6 0x7f189e96787e in abort (/lib64/libc.so.6+0x2687e) (BuildId: 3ebe8d97a0ed3e1f13476a02665c5a9442adcd78)
    #7 0x7f189e96779a in __assert_fail_base.cold (/lib64/libc.so.6+0x2679a) (BuildId: 3ebe8d97a0ed3e1f13476a02665c5a9442adcd78)
    #8 0x7f189e977186 in __assert_fail (/lib64/libc.so.6+0x36186) (BuildId: 3ebe8d97a0ed3e1f13476a02665c5a9442adcd78)
    #9 0x557bfc in avahi_memdup avahi/avahi-common/malloc.c:250:5
    #10 0x54895c in avahi_record_copy avahi/avahi-core/rr.c:469:45
```
evverx added a commit that referenced this issue Jan 27, 2024
When avahi-daemon fails under ASan/UBSan the tests trying to reach it
via D-Bus start to fail too with cryptic error messages and without ASan
reports it's hard to tell what exactly fails.

This patch is prompted by #551 where
the smoke test failed with
```
** (process:23892): WARNING **: 10:26:43.529: Error initializing Avahi: Daemon not running
glib-integration: client.c:626: void avahi_client_free(AvahiClient *): Assertion `client' failed.
```
without any way to figure out what went wrong.

With this patch applied the following backtrace would have been shown:
```
avahi-daemon[23694]: browse.c: Found CNAME loop on interface 2, proto 1, query cname0.local        IN        AAAA
avahi-daemon[23694]: browse.c: Found CNAME loop on interface 2, proto 1, query cname0.local        IN        AAAA
avahi-daemon[23694]: =================================================================
avahi-daemon[23694]: ==23694==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b000000f70 at pc 0x7f5aac154542 bp 0x7ffe59141be0 sp 0x7ffe59141bd8
avahi-daemon[23694]: READ of size 4 at 0x60b000000f70 thread T0
avahi-daemon[23694]:     #0 0x7f5aac154541 in lookup_multicast_callback /home/runner/work/avahi/avahi/avahi-core/browse.c:268:12
avahi-daemon[23694]:     #1 0x7f5aac1bfa0a in avahi_multicast_lookup_engine_notify /home/runner/work/avahi/avahi/avahi-core/multicast-lookup.c:317:21
avahi-daemon[23694]:     #2 0x7f5aac115808 in avahi_cache_update /home/runner/work/avahi/avahi/avahi-core/cache.c:363:13
avahi-daemon[23694]:     #3 0x7f5aac0e9621 in handle_response_packet /home/runner/work/avahi/avahi/avahi-core/server.c:720:21
avahi-daemon[23694]:     #4 0x7f5aac0e3cf6 in dispatch_packet /home/runner/work/avahi/avahi/avahi-core/server.c:1032:9
avahi-daemon[23694]:     #5 0x7f5aac0e2116 in mcast_socket_event /home/runner/work/avahi/avahi/avahi-core/server.c:1093:13
avahi-daemon[23694]:     #6 0x7f5aac464b6c in avahi_simple_poll_dispatch /home/runner/work/avahi/avahi/avahi-common/simple-watch.c:585:13
avahi-daemon[23694]:     #7 0x7f5aac4651a8 in avahi_simple_poll_iterate /home/runner/work/avahi/avahi/avahi-common/simple-watch.c:605:14
avahi-daemon[23694]:     #8 0x5592a3ed3884 in run_server /home/runner/work/avahi/avahi/avahi-daemon/main.c:1279:18
avahi-daemon[23694]:     #9 0x5592a3ec4132 in main /home/runner/work/avahi/avahi/avahi-daemon/main.c:1708:13
avahi-daemon[23694]:     #10 0x7f5aabc29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
avahi-daemon[23694]:     #11 0x7f5aabc29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
avahi-daemon[23694]:     #12 0x5592a3e05054 in _start (/usr/sbin/avahi-daemon+0x71054) (BuildId: 0aa9e5ea43ef010d5f42e9109eabd1434ff1b3db)
...
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants