avahi-browse bug with large LAN --terminate and --cache randomly never terminates #264
@bwfisher82 - did you ever find a way to better tune avahi for a large network? |
Nope. Not sure if it's multicast in general or this mDNS implementation.
When pinging the group address, you stop getting replies from everything
as you add more and more devices. At just 200 devices it takes a 30-second
ping to ff02::fb to be reasonably sure I got them all. I doubt it's an IPv6
issue, btw. We had to stop using this altogether and require users to
provide static IPs for the automation program. We briefly considered
multiple network segments, but never made it that far before we just went
with static addressing and no automatic detection on the network;
basically, we completely removed avahi/bonjour/mDNS. :/
|
We are seeing that hosts don't show up for a very long time (up to an hour)
via avahi-browse. However, if we restart the avahi-daemon on the remote
host or the one on which we are running avahi-browse, the host shows up
right away. This is on a network with only ~50 devices. Not a lot of
leads out there on Google.
|
I reproduced this issue by announcing a service pointing to an unresolvable host name and then sending a goodbye packet before the resolver timed out. Could anyone apply the following patch to see if it helps?

```diff
diff --git a/avahi-utils/avahi-browse.c b/avahi-utils/avahi-browse.c
index 4028ca0..f7542ff 100644
--- a/avahi-utils/avahi-browse.c
+++ b/avahi-utils/avahi-browse.c
@@ -284,8 +284,10 @@ static void remove_service(Config *c, ServiceInfo *i) {
 
     AVAHI_LLIST_REMOVE(ServiceInfo, info, services, i);
 
-    if (i->resolver)
+    if (i->resolver) {
         avahi_service_resolver_free(i->resolver);
+        n_resolving--;
+    }
 
     avahi_free(i->name);
     avahi_free(i->type);
@@ -331,6 +333,7 @@ static void service_browser_callback(
                 return;
 
             remove_service(c, info);
+            check_terminate(c);
 
             print_service_line(c, '-', interface, protocol, name, type, domain, 1);
             break;
```
|
@evverx I'm not sure how to apply a patch, exactly. We just install avahi from the RHEL/CentOS repos. I can grab this repo, check out a given branch/tag (the latest branch?), and apply the patch if I have instructions for that part. It may take some time because we don't currently have the relevant devices in our data centers, but we should soon-ish. |
…e resolvers fail/time out
Related to avahi#264. This PR addresses one particular scenario. There can be other scenarios preventing avahi-browse from stopping: avahi#444 (comment), but they should be identified and fixed one by one.
@fisherbe I opened #583, so it should be possible to get that patch by running the following commands:

```
git clone https://github.com/avahi/avahi
cd avahi
git fetch origin pull/583/head:browse-cache-terminate
git checkout browse-cache-terminate
```

then installing the build dependencies and running:

```
./bootstrap.sh
make
```

As mentioned there, the PR fixes one particular issue, and there can be other issues preventing avahi-browse from terminating. |
Hello,
Reporting an issue discussed in #avahi on freenode with lathiat:
I am experiencing an issue where the avahi-browse command never terminates when it should, randomly, on a large network.
I have a network with ~300 publishing devices. With ~100 devices it is fine; at around ~150 we start noticing the issue; and with ~300+ it is quite noticeable and easily reproducible, though it doesn't always happen.
My automation software regularly runs the avahi-browse command to pull detected node information, then connects to devices and performs various operations. As a workaround I am having it time out after a few seconds, but with this many devices (this much mDNS traffic?) it happens often enough that the web UI for the software becomes noticeably slow waiting for detection attempts to time out and retry.
The detection software I wrote is fairly basic, calling avahi-browse from Python with a timeout, and runs on a CentOS 7 server. The server is always up and, regarding other time-sync bugs, uses chronyd with CentOS Internet time sources, so that cause is highly unlikely, I think.
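For reference, that kind of Python-side wrapper can be sketched roughly as below. This is my own sketch, not the reporter's actual code: the `_workstation._tcp` service type and the 15-second budget are placeholders, and the parser only handles the semicolon-separated records produced by avahi-browse's `-p` (parsable) flag.

```python
import subprocess

def parse_browse_output(text):
    """Split avahi-browse -p output into resolved records.

    Each line is semicolon-separated; '=' lines are fully resolved
    entries carrying hostname, address, and port fields.
    """
    records = []
    for line in text.splitlines():
        fields = line.split(";")
        if fields[0] == "=" and len(fields) >= 9:
            records.append({
                "interface": fields[1],
                "protocol": fields[2],
                "name": fields[3],
                "type": fields[4],
                "domain": fields[5],
                "host": fields[6],
                "address": fields[7],
                "port": int(fields[8]),
            })
    return records

def browse(service_type="_workstation._tcp", timeout=15):
    """Run avahi-browse with a hard timeout as a workaround for the
    hangs described above; keeps whatever was parsed before the cutoff."""
    try:
        out = subprocess.run(
            ["avahi-browse", "-ltrp", service_type],
            capture_output=True, text=True, timeout=timeout,
        ).stdout
    except subprocess.TimeoutExpired as e:
        out = e.stdout or ""  # partial output when -t never fires
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
    return parse_browse_output(out)
```

The hard `timeout=` on `subprocess.run` is what keeps a never-terminating `avahi-browse -t` from blocking the caller indefinitely, at the cost of occasionally returning a partial device list.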
The command I am using is: avahi-browse -ltrp ._._tcp
This times out (against my 15 s timeout) about 3% of the time. If I exchange the -t option (terminate) for --cache, it does the same thing. I believe it is the -r resolving action that probably has the issue.
When I do hit the issue, I see output where it just continually re-resolves things it has already displayed, as if -t had not been used.
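One hypothetical way to cope with that run-away state from the calling side (a sketch of my own, not anything avahi provides) is to watch the parsable output stream and bail out as soon as the same resolved record shows up a second time:

```python
def stop_on_repeat(lines):
    """Yield avahi-browse -p output lines until a resolved ('=') record
    repeats, which is the symptom seen when -t fails to terminate."""
    seen = set()
    for line in lines:
        if line.startswith("="):
            if line in seen:
                return  # re-resolution of a known record: give up early
            seen.add(line)
        yield line
```

Feeding this generator from a line-buffered pipe would let the caller stop well before a wall-clock timeout fires.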
This is on a large, fully 10GigE network, if that matters. The server has 16 cores and 32 GiB of memory, if that matters. Probably not.
The avahi-daemon is started with -s and --debug currently.
The config file:

```
[server]
use-ipv4=no
use-ipv6=yes
allow-interfaces=ens256
deny-interfaces=ens192,ens224
enable-dbus=yes
disallow-other-stacks=yes
objects-per-client-max=2048
ratelimit-interval-usec=1000000
ratelimit-burst=1000
cache-entries-max=2048

[wide-area]
enable-wide-area=no

[publish]

[reflector]

[rlimits]
rlimit-core=0
rlimit-data=4194304
rlimit-fsize=0
rlimit-nofile=768
rlimit-stack=4194304
rlimit-nproc=3
```