Fixed unreliable DNS-SD discovery with ippfind -T 0 #1211
alexpevzner wants to merge 1 commit into OpenPrinting:2.4.x
The driverless backend uses the ippfind tool for DNS-SD discovery. When invoking ippfind, it uses the -T 0 option to maximize speed. In this mode, ippfind follows a heuristic: if no new events are received from Avahi for 500 milliseconds and all discovered services have already been resolved, it assumes discovery is complete and exits. However, Avahi sometimes responds too slowly, meaning a 500 ms silence does not necessarily indicate that no more services will be discovered shortly. As a result, ippfind may exit prematurely, increasing the likelihood of missing devices. This issue occurs particularly often when only the loopback interface is active, and the only available devices are those published by the ipp-usb daemon. To address this, the fix increases the minimum search time to 2500 milliseconds. This change has minimal impact on normal operations, since discovery typically takes 2000–2500 ms when devices are found, while eliminating unreliability caused by early termination.
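The stop condition described above can be sketched as a small predicate. This is only an illustration of the heuristic plus the proposed lower bound, not the actual ippfind code: the function name, its parameters, and the constant names are hypothetical, and only the 500 ms and 2500 ms values come from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants; only the values come from the PR description. */
#define QUIET_PERIOD_MS     500   /* silence treated as "discovery done" */
#define MIN_SEARCH_TIME_MS  2500  /* proposed lower bound on search time */

/*
 * Decide whether a "-T 0" style search may stop.
 * All times are milliseconds on the same monotonic clock;
 * last_event_ms equals start_ms when no event has arrived yet.
 */
static bool
search_is_done(long start_ms, long last_event_ms, long now_ms, int unresolved)
{
  assert(start_ms <= last_event_ms && last_event_ms <= now_ms);

  if (unresolved > 0)
    return (false);               /* still waiting for resolves */

  if (now_ms - last_event_ms < QUIET_PERIOD_MS)
    return (false);               /* an event arrived recently */

  /* The added condition: never stop before the minimum search time. */
  return (now_ms - start_ms >= MIN_SEARCH_TIME_MS);
}
```

With these values, a search that has been silent since the start stops at 2500 ms rather than at 500 ms, while a search that is still receiving events keeps running until 500 ms of silence has passed.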
The initial problem was that Sometimes it doesn't see network printers at all (or ipp-usb exported printers). Sometimes the printer is found, but the driverless driver is not offered. It happens on Fedora 41, it happens on our ROSA Linux, and it probably happens everywhere. I cannot figure out what has caused this change. My guess is that something has changed in Avahi. This patch helps.
I'm not sure this is the correct fix, and will do some investigation. That said, the "driverless" backend shouldn't be running ippfind - either use Avahi's C API to browse or (for CUPS 2.5 and later) use the corresponding cupsDNSSD APIs, which is what ippfind does. And that assumes that we want to continue using the driverless backend which is convoluting a bunch of things that cupsd already supports in CUPS 2.4...
This PR modifies Please note that Avahi's search completion indication is highly unreliable. If a search takes too long, Avahi may send an Additionally, the real semantics of From my experience, the most reliable method for DNS-SD discovery with Avahi is to allocate sufficient time (with an extra 1–2 seconds as a buffer if the last event arrives just before the initial timeout expires). In CUPS doesn’t suffer from this issue, so a 5-second timeout might be a better tradeoff between search speed and reliability. That said, based on
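One possible reading of the "sufficient time plus a buffer" approach mentioned above is a fixed deadline that gets pushed out when an event lands close to it. The sketch below is an assumption, not code from CUPS or Avahi: `extend_deadline` and `GRACE_MS` are hypothetical names, and the 1000 ms value is simply picked from the 1–2 second range mentioned in the comment.

```c
#include <assert.h>

/* Assumed buffer; the comment above suggests 1-2 seconds. */
#define GRACE_MS 1000

/*
 * Return the deadline to use after an event arrives at event_ms.
 * Times are milliseconds since the start of the search.
 */
static long
extend_deadline(long deadline_ms, long event_ms)
{
  assert(deadline_ms >= 0 && event_ms >= 0);

  /* Events that arrive after the deadline do not reopen the search. */
  if (event_ms > deadline_ms)
    return (deadline_ms);

  /* An event close to the deadline pushes it out, so that at least
   * GRACE_MS remain for follow-up events. */
  if (deadline_ms - event_ms < GRACE_MS)
    return (event_ms + GRACE_MS);

  return (deadline_ms);
}
```

For example, with a 5000 ms deadline, an event at 4800 ms moves the deadline to 5800 ms, while an event at 1000 ms leaves it unchanged.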
Yes, sure. That functionality used to work reliably for a long time. However, something broke recently, and I haven’t been able to pinpoint the exact change responsible. The issue reproduces on Fedora 41 with all updates installed and, surprisingly, especially when the machine is offline (with That said, Avahi never guaranteed that 500 ms of silence should be treated as the end of discovery, so relying on that is inherently unreliable. For example, the same device that The PR I’m proposing is simple, low-risk, and easy to review. It addresses a real, observable issue, so I’d appreciate it if it could be accepted.
With respect, simple timing-related fixes often mask a bigger problem. Let's do a little investigation to make sure we are actually fixing things. Can you try (on the same system) building the current CUPS master (2.5b1) code and running "tools/ippfind-static -T 0" to see if it reproduces with the new DNS-SD code?
IMO this is related to, or a dupe of, https://bugzilla.redhat.com/show_bug.cgi?id=1954469 and #620 and avahi/avahi#442. tl;dr: There is no reliable marker for "we're done" in mDNS, especially with Avahi. Additionally, the Avahi D-Bus API really generates two D-Bus messages for both answers, but "sometimes" the answers get bundled into one syscall, where the first answer is empty, and the next part contains the answer for the other Avahi call. It usually happened for me since my printer at home uses IPP only (no TLS), and since ipp-usb registers the services on plain IPP too, I imagine it behaves the same - though I did not see it often, since I use CUPS temporary queues and if I have to create a permanent queue, I take the URL from My plan was to find some workaround for this glibc behavior, but now I got an idea - what if driverless separated the requests? The first run would search for IPPS services and, if none are found, a second run would search for IPP services?
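The two-pass idea floated above could look roughly like the sketch below. It is only an illustration: `search_fn`, `two_pass_search`, and the two stand-in searches are hypothetical and perform no real discovery; only the `_ipps._tcp` and `_ipp._tcp` service types are real DNS-SD names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: runs one discovery pass for a DNS-SD service
 * type and returns the number of printers found. */
typedef int (*search_fn)(const char *regtype);

/*
 * Try IPPS first; fall back to plain IPP only when IPPS finds nothing.
 * On return, *used points at the service type that produced results,
 * or is NULL when both passes came back empty.
 */
static int
two_pass_search(search_fn search, const char **used)
{
  static const char * const regtypes[] = { "_ipps._tcp", "_ipp._tcp" };
  size_t i;

  assert(search != NULL && used != NULL);

  for (i = 0; i < sizeof(regtypes) / sizeof(regtypes[0]); i ++)
  {
    int found = search(regtypes[i]);

    if (found > 0)
    {
      *used = regtypes[i];
      return (found);
    }
  }

  *used = NULL;
  return (0);
}

/* Stand-in searches for illustration only (no real discovery happens). */
static int
fake_ipp_only(const char *regtype)
{
  return (strcmp(regtype, "_ipp._tcp") == 0 ? 1 : 0);
}

static int
fake_nothing(const char *regtype)
{
  (void)regtype;
  return (0);
}
```

Calling `two_pass_search(fake_ipp_only, &used)` models an IPP-only printer: the IPPS pass returns nothing, so the IPP pass runs. The cost of this design is that such printers pay for two full passes instead of one.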
Agreed. I've investigated things a little bit. I need to say that the initial problem (driverless behaving unreliably) is a very intermittent error. Yesterday it didn't work at all, today it seems to work well. Nothing has changed in the system configuration.
First of all, it is broken. It sends After hot-fixing that with replacing the After that experiment, I've instrumented the old In the poll_callback: In event loop in the main function: And in the browse_callback, in the AVAHI_BROWSER_NEW handler: And with these changes applied and with the disconnected network, I see the following trace: Please notice, the So looks like the positive To understand how it could happen, a deeper look into the libavahi implementation is required, but I didn't do that.
So I think for now:
Let me know if more investigation is required.
@alexpevzner as I wrote in my comment, IMO the problem is there is no really reliable milestone/status in mDNS where we can say we have the whole answer - see the investigation in the tickets I've shared.
I've read your comment and the links you've shared. This is very interesting and definitely related to the problem we are discussing here. After some thinking and analysis, I've come to the conclusion that we actually have two separate problems here. They look similar by their symptoms, but these problems are actually different. One problem is that when searching the real network, we don't have a reliable "end of discovery" indication. It happens not because of Avahi bugs, but because there is no such thing in the underlying DNS-SD protocol. Only some time-based approach can work here, giving some level of accuracy. Avahi tries to help us with the The second problem is that we lose even the locally-published (through the Our problem here is that we use to the So @michaelrsweet is correct here in his answer: my patch mitigates the first problem (it cannot be solved completely, but 500 ms for network discovery is not enough) and only masks the second problem. However, taking into account that in CUPS 2.5 this code is totally reworked and
@alexpevzner Actually, I was thinking to fix the underlying Avahi code in 2.4.x, based on the 2.5/3.0 Avahi/DNS-SD code we have and know is working.
I understand you. I still believe that setting Your new implementation looks very promising, but it has not been widely deployed yet. As a result, it’s hard to predict what issues might arise once it’s in broader use. Additionally, the new implementation seems too fast. In my setup, it completes network printer discovery in less than a second, whereas the old implementation takes 2–2.5 seconds for successful discovery. I’m concerned that not all printers, especially those on Wi-Fi where packet loss can occur, will respond that quickly. When time permits, I’ll examine your new code in more detail. For The worst-case scenario would be if a printer appears in the list of available network printers but fails to be rediscovered later. This could lead the system to omit the driverless driver option, directing users to manually select from a list of available drivers, where they might pick an incorrect one (though this may never happen if Avahi cache reads are always reliable in the new implementation).
… (Issue #1211)
- Export _cupsGetClock private API.
- Use _cupsGetClock in ippfind.
- Drop "avahi_have_data" and adopt CUPS 2.5/3.0 processing strategy.
@alexpevzner Try the latest changes: [2.4.x 8304d6b] Fix "unreliable" discovery with ippfind as used by driverless backend (Issue #1211) Still not as fast as 2.5/3.0 for some reason, but seems to work reliably...
Actually, I think the difference in performance is that 2.5/3.0 processes DNS-SD updates in a separate thread. Just pushed another change to remove the extra usleep...
It is not reliable. I've enabled the network, connected the printer to the For the first time, it was running for 0.3 s and has found the So I think the minimum search time needs to be enforced (and be at least 2.5 seconds).
IIUC this is no longer valid; the current work is tracked in #1214. If I got that wrong, I can reopen the PR.
I’d probably prefer to keep it open while #1214 is still ongoing, just to maintain visibility, since the first half of the discussion is here...
There are links to this PR in the newer one, so the discussion won't go anywhere. My understanding is that if the code is not going to be merged (AFAIK the newer one will be the final MR for the problem), then the PR should be closed. We probably hit a difference in meaning between an issue and a merge request.