systemd unit needs Wants and Before #14059

Open
2 tasks done
lingfish opened this issue Apr 11, 2024 · 6 comments

@lingfish

  • Program: Recursor
  • Issue type: Bug report

Short description

Due to #13210, recursor starts after nss-lookup.target has been reached, which breaks other things like ntpd and WireGuard, as shown here:

Apr 11 12:23:58 hostname systemd[1]: Reached target nss-lookup.target - Host and Network Name Lookups.
Apr 11 12:24:14 hostname ntpd[989]: DNS: dns_check: DNS error: -2, Name or service not known
Apr 11 12:24:21 hostname systemd[1]: Reached target network-online.target - Network is Online.
Apr 11 12:24:21 hostname systemd[1]: Starting pdns-recursor.service - PowerDNS Recursor...
Apr 11 12:24:21 hostname wg-quick[1109]: Name or service not known: `hidden.host.name:999999'
Apr 11 12:24:21 hostname pdns-recursor[1039]: Apr 11 12:24:21 PowerDNS Recursor 5.1.0-alpha0.1346.master.g617d2f04d (C) PowerDNS.COM BV

Environment

  • Operating system: Debian
  • Software version: 5.1.0-alpha0.1346.master.g617d2f04d
  • Software source: PowerDNS repository

Steps to reproduce

Reproducible by having the above version installed.

Expected behaviour

Recursor should be ordered Before=nss-lookup.target so that other units waiting on that target work.

Actual behaviour

See above.

Other information

I believe the unit needs After, Wants, and Before, as per Debian's unit file for ISC bind.

I'm no expert on systemd unit dependency stuff, but I'm inclined to trust them.

See also this discussion that makes things clearer.

@omoerbeek
Member

It's not as simple as that. In some cases rec is used as the system resolver by the machine it is running on; in other cases it is only a service for other machines. Both use cases are valid and need different unit files.

@lingfish
Author

Sorry, I don't see the difference. Using Before makes rec a prerequisite that must start before the (system standard) nss-lookup.target is finally reached. Whether rec is a local resolver or one for a network (such as in my case), this ensures it starts after the network is up and before anything else that depends on nss-related stuff.

@omoerbeek
Member

Your log lines do suggest your rec is (also?) used as a local resolver, so I'm officially confused now. I'll let somebody who has more knowledge wrt systemd answer this.

@omoerbeek added the rec label Apr 11, 2024
@omoerbeek added this to the rec-5.1.0 milestone Apr 11, 2024
@lingfish
Author

Indeed I do, and so again, super important for rec to start before that target is reached.

Here's a little more from systemd.special(7):

       nss-lookup.target
           A target that should be used as synchronization point for all
           host/network name service lookups. Note that this is
           independent of UNIX user/group name lookups for which
           nss-user-lookup.target should be used. All services for which
           the availability of full host/network name resolution is
           essential should be ordered after this target, but not pull
           it in.

@dwfreed
Contributor

dwfreed commented Apr 11, 2024

As @omoerbeek mentioned, it's not possible to provide a single service file that meets everyone's needs. At present, pdns-recursor has no mechanism to ignore DNSSEC time validity checks, and so if your clock is too far off, DNSSEC fails to validate for basic things like the root zone or the TLD zones, and you can't resolve any names. To avoid this, an After=time-sync.target was added in #12248 so that users could set up something like systemd-time-wait-sync.service or similar to ensure time is synced before recursor starts. However, this created the following ordering loop (#13115):

pdns-recursor.service -> time-sync.target -> systemd-time-wait-sync.service (or similar) -> ntp.service (or similar) -> nss-lookup.target -> pdns-recursor.service

As time sync is critically important to DNSSEC, and it varies whether the pdns-recursor on a system is also used as that system's resolver, it was decided to remove the Wants=nss-lookup.target and Before=nss-lookup.target to break the loop (#13210).
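
The loop can be reproduced with a small graph model. This is only a sketch, not how systemd itself detects cycles: the edges are transcribed from the chain above, with ntp.service and systemd-time-wait-sync.service standing in for "or similar", and `find_cycle` and `without_before_nss_lookup` are hypothetical helper names.

```python
from typing import Dict, List, Optional

# "X": ["Y"] means X is ordered after Y (X waits for Y).
# Edges are taken from the chain quoted in the comment above.
ORDERED_AFTER: Dict[str, List[str]] = {
    "pdns-recursor.service": ["time-sync.target"],
    "time-sync.target": ["systemd-time-wait-sync.service"],
    "systemd-time-wait-sync.service": ["ntp.service"],
    "ntp.service": ["nss-lookup.target"],
    # This edge comes from Before=nss-lookup.target on the recursor unit.
    "nss-lookup.target": ["pdns-recursor.service"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one ordering cycle as a list of units, or None if there is none."""
    visiting: List[str] = []  # current depth-first path
    done = set()              # units fully explored without finding a cycle

    def visit(unit: str) -> Optional[List[str]]:
        if unit in visiting:
            # Close the loop: slice the path from the first occurrence.
            return visiting[visiting.index(unit):] + [unit]
        if unit in done:
            return None
        visiting.append(unit)
        for dep in graph.get(unit, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(unit)
        return None

    for unit in graph:
        cycle = visit(unit)
        if cycle:
            return cycle
    return None


def without_before_nss_lookup(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Copy of the graph with the Before=nss-lookup.target edge removed."""
    trimmed = {unit: list(deps) for unit, deps in graph.items()}
    trimmed["nss-lookup.target"].remove("pdns-recursor.service")
    return trimmed


if __name__ == "__main__":
    print(find_cycle(ORDERED_AFTER))
    print(find_cycle(without_before_nss_lookup(ORDERED_AFTER)))
```

With all five edges present the search walks back to pdns-recursor.service; with the Before= edge removed it finds no cycle, which is the effect #13210 was after.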

If your system has a reliable RTC, or another mechanism to set a reasonably close to accurate time (within an hour, preferably better) during startup that doesn't rely on DNS, then you can utilize systemd's drop-in mechanism to change the dependencies of pdns-recursor.service to remove the After=time-sync.target and add back the Before=nss-lookup.target and Wants=nss-lookup.target items. If you do not have a way to get reasonably close to accurate time during startup that doesn't rely on DNS, you could still make this change, but then you may run into the issue where DNSSEC fails to validate due to time being too far off, which may make it impossible for your NTP client to start until you've manually corrected the time; you'll have to decide if you're willing to accept that risk. Only you know your system, so only you can make this decision.
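
One way to apply that change, sketched under assumptions: per systemd.unit(5), dependency settings such as After= can be added from a drop-in but not reset, so removing After=time-sync.target appears to need a complete copy of the unit under /etc/systemd/system (for example via `systemctl edit --full pdns-recursor.service`). The helper `rewrite_ordering` and the text in `SAMPLE_UNIT` are hypothetical; the sample is a stand-in, not the packaged pdns-recursor.service.

```python
from typing import List

# Settings to add back, as described in the comment above.
ADDED = ["Wants=nss-lookup.target", "Before=nss-lookup.target"]

# Hypothetical stand-in for the packaged unit file, for illustration only.
SAMPLE_UNIT = """\
[Unit]
Description=PowerDNS Recursor
After=network-online.target time-sync.target

[Service]
ExecStart=/usr/sbin/pdns_recursor
"""


def rewrite_ordering(unit_text: str) -> str:
    """Drop time-sync.target from After= and add Wants=/Before=nss-lookup.target."""
    out: List[str] = []
    section = ""
    for line in unit_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Section header: remember it, and add the new settings under [Unit].
            section = stripped
            out.append(line)
            if section == "[Unit]":
                out.extend(ADDED)
            continue
        if section == "[Unit]" and stripped.startswith("After="):
            units = stripped[len("After="):].split()
            kept = [u for u in units if u != "time-sync.target"]
            if kept:
                out.append("After=" + " ".join(kept))
            continue
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(rewrite_ordering(SAMPLE_UNIT), end="")
```

The result would be saved as /etc/systemd/system/pdns-recursor.service, which takes precedence over the packaged copy, and picked up with `systemctl daemon-reload`. The DNSSEC time caveat above still applies to a unit edited this way.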

@lingfish
Author

Thanks for this breakdown, and I see your points.

Some interesting observations from me:

  • On Debian, ntpsec.service doesn't pin itself to time-sync.target, nor time-set.target, yet one might think it would. Reading systemd.special(7) again, it is specific that a service should only reach this target if the time is set, which of course, NTP for example may not have immediately after start, so I suspect that's why. A oneshot sync by ntpsec, and then going into polling mode would satisfy... but, on my system that has rec and ntpsec installed, I don't see either target:
hostname [11:43 AM] [j:0] /etc/ntpsec # systemctl -a | grep -E 'time |ntp'
  ntpsec-systemd-netif.path         loaded    active     waiting      ntpsec-systemd-netif.path
  initrd-parse-etc.service          loaded    inactive   dead         Mountpoints Configured in the Real Root
  ntpsec-rotate-stats.service       loaded    inactive   dead         Rotate ntpd stats
  ntpsec-systemd-netif.service      loaded    inactive   dead         ntpsec-systemd-netif.service
  ntpsec.service                    loaded    active     running      Network Time Service
  user-runtime-dir@1000.service     loaded    active     exited       User Runtime Directory /run/user/1000
  ntpsec-rotate-stats.timer         loaded    active     waiting      Rotate ntpd stats daily

       Services where accurate time is essential should be ordered after this unit, but not pull it in.

You'd assume that would mean "well, NTP will sort that", but it won't.

       This target provides stricter clock accuracy guarantees than time-set.target (see above), but likely requires network communication and thus introduces unpredictable delays. Services that require clock accuracy and where network communication delays are acceptable should use this target. Services that require a less accurate clock, and only approximate and roughly monotonic clock behaviour should use time-set.target instead.

Based on the above, and your statement "within an hour, preferably better", perhaps rec could use time-set.target instead?

Either way, I couldn't find a discussion around this in the doco. Considering the impact it just had on my boot (tunnels not coming up, time not coming up etc), perhaps it needs to be documented?


3 participants