When used with systemd-networkd, unbound does not start until systemd-networkd-wait-online.service times out #773

Closed
dryya opened this issue Oct 30, 2022 · 8 comments

Comments

@dryya

dryya commented Oct 30, 2022

Describe the bug

As described in this Arch Linux bug report, "unbound waits for the network to be on (as stipulated in its service file) and systemd waits for the DNS resolver to be up before declaring that the network is on. The cycle only breaks when systemd network initialization times out and finally the unbound service file is allowed to start." The behavior started to occur with commit afbc7bb. Unbound and the network still work perfectly fine afterwards; it's just that DNS resolution doesn't come up until after the timeout period for systemd's network target.

To reproduce

On Arch Linux, enable the systemd-networkd and unbound systemd services. systemd-resolved is disabled. I don't believe it's relevant, but I included a minimal resolvconf config file too.

/etc/unbound/unbound.conf
server:
	verbosity: 1
	trust-anchor-file: "/etc/unbound/trusted-key.key"
	tls-cert-bundle: "/etc/ssl/cert.pem"
	tls-system-cert: yes
python:
dynlib:
remote-control:
forward-zone:
	name: "."
	forward-tls-upstream: yes
	forward-addr: 1.1.1.1@853#cloudflare-dns.com

/etc/systemd/network/20-wired.network
[Match]
Name=enp31s0
[Network]
DHCP=yes
[DHCPv4]
UseDNS=no
[DHCPv6]
UseDNS=no

/etc/resolvconf.conf
name_servers="::1 127.0.0.1"
resolv_conf_options="trust-ad"

Some more information on what's happening via systemd logs:

Output from ❯ systemctl status systemd-networkd-wait-online.service:

× systemd-networkd-wait-online.service - Wait for Network to be Configured
     Loaded: loaded (/usr/lib/systemd/system/systemd-networkd-wait-online.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/systemd-networkd-wait-online.service.d
             └─override.conf
     Active: failed (Result: exit-code) since Sat 2022-10-29 22:49:12 CDT; 13min ago
       Docs: man:systemd-networkd-wait-online.service(8)
    Process: 621 ExecStart=/usr/lib/systemd/systemd-networkd-wait-online (code=exited, status=1/FAILURE)
   Main PID: 621 (code=exited, status=1/FAILURE)
        CPU: 9ms

22:47:12 arch systemd[1]: Starting Wait for Network to be Configured...
22:49:12 arch systemd-networkd-wait-online[621]: Timeout occurred while waiting for network connectivity.
22:49:12 arch systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
22:49:12 arch systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
22:49:12 arch systemd[1]: Failed to start Wait for Network to be Configured.

And journalctl --boot shows that unbound only starts afterwards:

Oct 29 22:49:12 arch systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Oct 29 22:49:12 arch systemd[1]: Failed to start Wait for Network to be Configured.
Oct 29 22:49:12 arch systemd[1]: Reached target Network is Online.
Oct 29 22:49:12 arch systemd[1]: Starting Validating, recursive, and caching DNS resolver...
Oct 29 22:49:12 arch unbound[1432]: [1432:0] notice: init module 0: subnetcache

System:

  • OS: Linux arch 6.0.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 26 Oct 2022 15:25:45 +0000 x86_64 GNU/Linux
  • unbound -V output:
❯ unbound -V
Version 1.17.0

Configure line: --prefix=/usr --sysconfdir=/etc --localstatedir=/var --sbindir=/usr/bin --disable-rpath --enable-dnscrypt --enable-dnstap --enable-pie --enable-relro-now --enable-subnet --enable-systemd --enable-tfo-client --enable-tfo-server --enable-cachedb --with-libhiredis --with-conf-file=/etc/unbound/unbound.conf --with-pidfile=/run/unbound.pid --with-rootkey-file=/etc/trusted-key.key --with-libevent --with-libnghttp2 --with-pyunbound
Linked libs: libevent 2.1.12-stable (it uses epoll), OpenSSL 1.1.1q  5 Jul 2022
Linked modules: dns64 cachedb subnetcache respip validator iterator
DNSCrypt feature available
TCP Fastopen feature available

BSD licensed, see LICENSE in source package for details.
Report bugs to unbound-bugs@nlnetlabs.nl or https://github.com/NLnetLabs/unbound/issues
@wcawijngaards
Member

There seems to be a loop in the service file: Wants= references the same targets that the ordering lines use, both network-online.target and nss-lookup.target. Perhaps the sensible approach is to spell out the intended ordering: unbound starts once network.target is done, and its startup completes before network-online.target is reached. It should also be ordered before nss-lookup.target, so that unbound is up before nss-lookup intends to do queries.

This depends somewhat on the meaning of the targets and on the rest of the systemd setup. Perhaps this change could be good?

diff --git a/contrib/unbound.service.in b/contrib/unbound.service.in
index ada5fac9..5a05c525 100644
--- a/contrib/unbound.service.in
+++ b/contrib/unbound.service.in
@@ -42,9 +42,8 @@
 [Unit]
 Description=Validating, recursive, and caching DNS resolver
 Documentation=man:unbound(8)
-After=network-online.target
-Before=nss-lookup.target
-Wants=network-online.target nss-lookup.target
+After=network.target
+Before=network-online.target nss-lookup.target
 
 [Install]
 WantedBy=multi-user.target

@dryya
Author

dryya commented Nov 1, 2022

I can confirm that this works for me on two machines (one using systemd-networkd and one with no network manager, just iwd) - unbound is up and running in three seconds! (I attempted something similar on my own, but I realize now it failed because the standard systemctl edit command won't remove previous Before entries, but instead adds on to them.) Thanks for the quick response!

@jm355

jm355 commented Dec 1, 2022

That fixed it for me as well!

@wcawijngaards
Member

The fix is committed to the repo. That should improve the systemd integration scripts for Unbound!

jedisct1 added a commit to jedisct1/unbound that referenced this issue Dec 13, 2022
@hugleo

hugleo commented Jan 12, 2023

This may be related to this commit: when I restart the server, the unbound service fails because the IPv6 network hasn't come up yet.

unbound[364]: [1673554420] unbound[364:0] error: can't bind socket: Cannot assign requested address for 2001:db8:0:2::2 port 53
unbound[364]: [1673554420] unbound[364:0] fatal error: could not open ports
systemd[1]: unbound.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: unbound.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Validating, recursive, and caching DNS resolver.

I need to restart the service to bring it up:
systemctl restart unbound.service

/etc/systemd/network/ens18.network
[Match]
Name=ens18

[Address]
Address=192.168.0.2/24

[Address]
Address=2001:db8:0:2::2/64

[Network]
Gateway=192.168.0.1
Gateway=2001:db8:0:2::1
DHCP=no
ConfigureWithoutCarrier=Yes

@ztNIE

ztNIE commented Jul 8, 2024

Hi @wcawijngaards,

I've encountered an issue where the Unbound service fails to start on boot, which may be related to the issue you've addressed.

TL;DR: After=network.target doesn't guarantee that interfaces are ready when Unbound attempts to bind to them. Changing the configuration to After=network-online.target appears to be the correct fix.

Details:
I have a custom dummy interface with IP 10.1.1.1, and Unbound cannot bind to it during boot time because the interface isn't ready yet. I fixed this issue by modifying the unit file (I'm using Unbound 1.rocky8 and Unbound 1.16.2) to this:

[Unit]
Description=Unbound recursive Domain Name Server
After=network.target
# This is the line I added
After=network-online.target
Before=nss-lookup.target

Before I changed the unit file (After=network.target), unbound could not start at boot time:

Jul 08 15:07:19 sre-pdns-primary systemd[1]: Starting Unbound recursive Domain Name Server...
Jul 08 15:07:19 sre-pdns-primary unbound-checkconf[844]: unbound-checkconf: no errors in /etc/unbound/unbound.conf
Jul 08 15:07:19 sre-pdns-primary systemd[1]: Started Unbound recursive Domain Name Server.
Jul 08 15:07:19 sre-pdns-primary unbound[855]: [1720415239] unbound[855:0] error: can't bind socket: Cannot assign requested address for 10.1.1.1 port 53
Jul 08 15:07:19 sre-pdns-primary unbound[855]: [1720415239] unbound[855:0] fatal error: could not open ports
Jul 08 15:07:19 sre-pdns-primary systemd[1]: unbound.service: Main process exited, code=exited, status=1/FAILURE
Jul 08 15:07:19 sre-pdns-primary systemd[1]: unbound.service: Failed with result 'exit-code'.

After I changed the unit file (After=network-online.target):

Jul 08 15:10:11 sre-pdns-primary systemd[1]: Starting Unbound recursive Domain Name Server...
Jul 08 15:10:11 sre-pdns-primary unbound-checkconf[2484]: unbound-checkconf: no errors in /etc/unbound/unbound.conf
Jul 08 15:10:11 sre-pdns-primary systemd[1]: Started Unbound recursive Domain Name Server.
Jul 08 15:10:11 sre-pdns-primary unbound[2489]: [1720415411] unbound[2489:0] debug: chdir to /etc/unbound
Jul 08 15:10:11 sre-pdns-primary unbound[2489]: [1720415411] unbound[2489:0] debug: drop user privileges, run as unbound
Jul 08 15:10:11 sre-pdns-primary unbound[2489]: [1720415411] unbound[2489:0] debug: switching log to /var/log/unbound/unbound.log

According to RHEL's documentation, network.target means that the service for setting up the network has started but doesn't guarantee that it's ready. In contrast, network-online.target is only reached after the network is connected, which seems to be the appropriate option for this use case.

In most cases, the current setting works because the interfaces come up before Unbound tries to bind to them. However, interfaces can be slow to come up, and then Unbound fails to start at boot time. Many users modify their own systemd unit file to fix this (it's more likely to happen with custom interfaces). Changing After=network.target to After=network-online.target may address the root cause of this issue.
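
The same ordering can probably be added without editing the packaged unit file. A minimal sketch as a drop-in, assuming the unit is named unbound.service and does not already list network-online.target under Before= (the file name wait-online.conf is a hypothetical choice):

```ini
# /etc/systemd/system/unbound.service.d/wait-online.conf
# (hypothetical file name; any *.conf file in this directory is read)
[Unit]
# Pull network-online.target into the boot transaction...
Wants=network-online.target
# ...and start unbound only after that target has been reached.
After=network-online.target
```

Drop-ins can only add dependencies, not remove them. If the installed unit has Before=network-online.target, as in the diff earlier in this thread, this drop-in would create an ordering cycle, and the whole unit would have to be replaced instead, for example with systemctl edit --full unbound.service. After creating the file by hand, run systemctl daemon-reload.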

@hugleo

hugleo commented Jul 8, 2024

I'm not seeing the problem with IPv4.
For IPv6, the root cause seems to be DAD (Duplicate Address Detection). A workaround is to disable it with: net.ipv6.conf.xxx.accept_dad = 0
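
With systemd-networkd and a static address, DAD can possibly also be turned off per address rather than through sysctl. A sketch based on the ens18.network file posted earlier, assuming a systemd version whose [Address] section supports DuplicateAddressDetection= (this option is not mentioned elsewhere in the thread and is untested here):

```ini
# /etc/systemd/network/ens18.network
# Same file as posted above; only the IPv6 [Address] section gains one line.
[Match]
Name=ens18

[Address]
Address=192.168.0.2/24

[Address]
Address=2001:db8:0:2::2/64
# Assumption: skip IPv6 Duplicate Address Detection for this address, so it
# is not left in the tentative state while unbound tries to bind to it.
DuplicateAddressDetection=none

[Network]
Gateway=192.168.0.1
Gateway=2001:db8:0:2::1
DHCP=no
ConfigureWithoutCarrier=Yes
```

Either way, disabling DAD gives up duplicate-address protection for that address, so it is a workaround rather than a fix for the start-up ordering.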

wcawijngaards added a commit that referenced this issue Jul 10, 2024
  network-online.target. Also for contrib/unbound_portable.service.in.
@wcawijngaards
Member

The commit d43760a adds the network-online.target to the contrib/unbound.service.in and contrib/unbound_portable.service.in unit files. Another workaround is to set ip-freebind: yes, which allows using interfaces that are down, or ip-transparent: yes.
