Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unbound refactor the unit & add systemd support #101218

Merged
merged 10 commits into from Nov 8, 2020

Conversation

@andir
Copy link
Member

@andir andir commented Oct 20, 2020

Motivation for this change

This is motivated due to the current nasty state of the module and the uncertain startup state unbound might be in during boot-up on NixOS. It is somewhat related to #99901 as that might improve the situation around ACME during systemd switches (where the unbound unit was changed).

Things to be done
  • write a NixOS VM test to verify it still works
  • add a symlink pointing to the conf at /etc/unbound/unbound.conf. This will improve the unbound-control usability, we won't need to explicitly point to the config through the -c arg.
  • provide a module system option for the control socket (it should default to a root/unbound only unix socket; if configured)
  • write changelog entries
  • remove the remaining FIXME's and squash some more.
@flokli
Copy link
Contributor

@flokli flokli commented Oct 21, 2020

Generally I agree with this approach - I'd personally prefer calling the library variant libunbound, and unbound-systemd unbound (like we do with [lib]pulseaudio).

However, the test currently fails:

resolver: must succeed: drill  -t AAAA example.com @localhost
resolver # Error: error sending query: General LDNS error
resolver: output:
error:
Traceback (most recent call last):
  File "/nix/store/isl6j0894s1kyp4l9y6wjpnmfdc1f7n5-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 901, in run_tests
    exec(tests, globals())
  File "<string>", line 1, in <module>
  File "<string>", line 31, in <module>
  File "<string>", line 17, in test
  File "/nix/store/isl6j0894s1kyp4l9y6wjpnmfdc1f7n5-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 423, in succeed
    raise Exception(
Exception: command `drill  -t AAAA example.com @localhost` failed (exit code 1)
cleaning up

@andir
Copy link
Member Author

@andir andir commented Oct 21, 2020

Generally I agree with this approach - I'd personally prefer calling the library variant libunbound, and unbound-systemd unbound (like we do with [lib]pulseaudio).

However, the test currently fails:

resolver: must succeed: drill  -t AAAA example.com @localhost
resolver # Error: error sending query: General LDNS error
resolver: output:
error:
Traceback (most recent call last):
  File "/nix/store/isl6j0894s1kyp4l9y6wjpnmfdc1f7n5-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 901, in run_tests
    exec(tests, globals())
  File "<string>", line 1, in <module>
  File "<string>", line 31, in <module>
  File "<string>", line 17, in test
  File "/nix/store/isl6j0894s1kyp4l9y6wjpnmfdc1f7n5-nixos-test-driver/bin/.nixos-test-driver-wrapped", line 423, in succeed
    raise Exception(
Exception: command `drill  -t AAAA example.com @localhost` failed (exit code 1)
cleaning up

Yes, it is a WIP and I just headed to bed after pushing the latest revision.

@andir andir force-pushed the unbound-systemd branch 2 times, most recently from c7bc082 to 2ab0bb4 Oct 23, 2020
@andir
Copy link
Member Author

@andir andir commented Oct 24, 2020

Let's see if tests also pass on CI and not just locally.
@ofborg test unbound

@nixos-discourse
Copy link

@nixos-discourse nixos-discourse commented Nov 1, 2020

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/make-acme-renew-systemd-service-depend-on-dns-nss-lookup/7412/8

@andir andir force-pushed the unbound-systemd branch from 2ab0bb4 to cb27901 Nov 1, 2020
@andir
Copy link
Member Author

@andir andir commented Nov 1, 2020

Rebased & started cleaning up the commit messages so the release note changes can mostly be copy & pasted from those.

@andir andir force-pushed the unbound-systemd branch from e5d4620 to 89df4fa Nov 1, 2020
@andir
Copy link
Member Author

@andir andir commented Nov 1, 2020

There is now an option to set the socket path. cc @NinjaTrappeur could you have a look if this is what you'd want?

@andir andir force-pushed the unbound-systemd branch from 89df4fa to e5bd3c4 Nov 1, 2020
Copy link
Member

@NinjaTrappeur NinjaTrappeur left a comment

Yes. There's even a test, that's perfect!

@@ -86,6 +91,28 @@ in
description = "Use and update root trust anchor for DNSSEC validation.";
};

localControlSocketPath = mkOption {
default = null;
# FIXME: What is the proper type here so users can specify strings,
Copy link
Member

@NinjaTrappeur NinjaTrappeur Nov 2, 2020

I'm fine with keeping the current types.str alone. types.path can be relative to the $pwd at evaluation time. While it's what you want to point to a file living in your source tree, I always find awkward using this type to point to a runtime path.

Copy link
Member Author

@andir andir Nov 2, 2020

Same but often my taste doesn't seem to follow what some in the community are pushing for.

andir added 2 commits Nov 3, 2020
Systemd has to remain an optional (non-default) dependency as otherwise
we will have an unpleasant bootstrap cycle. Most (if not all) of the
(lib)unbound consumers will likely not care about unbound's systemd
integration that only affects the daemon mode, anyway.
This introduces an unbound variant that is built with systemd support.
That means it is able to signal readiness to systemd once it did start
or finished reloading. This likely allows us to close a small gap during
bootup where the service is assumed up but doesn't respond to DNS
queries just yet.
@andir andir force-pushed the unbound-systemd branch from 30152a2 to 94f766b Nov 3, 2020
@andir
Copy link
Member Author

@andir andir commented Nov 3, 2020

I decided to cut the scope of this PR and leave the openresolv configuration for now. It ended up to be a much bigger rabbit hole then I initially expected. I would also like to postpone the renaming of unbound to libunbound to another PR as those are more likely to introduce breakage. I can do one right after this is merged.

Compared the the previous version I now have written the release notes entries. The rendered version can be found at: https://s.rammhold.de/nixos-manual-unbound/release-notes.html#sec-release-21.03-notable-changes

I am including a screenshot of it for the lazy people :)

rn

The new localControlSocketPath option documentation can be found here: https://s.rammhold.de/nixos-manual-unbound/options.html#opt-services.unbound.localControlSocketPath

opt

I am marking this as ready for review and would appreciate any feedback.

@andir andir marked this pull request as ready for review Nov 3, 2020
@andir andir changed the title WIP: unbound refactor the unit & add systemd support unbound refactor the unit & add systemd support Nov 3, 2020
@andir andir force-pushed the unbound-systemd branch from a1e5270 to 15e4f92 Nov 3, 2020
andir added 8 commits Nov 3, 2020
Previously we just applied a very minimal set of restrictions and
trusted unbound to properly drop root privs and capabilities.

With this change I am (for the most part) just using the upstream
example unit file for unbound. The main difference is that we start
unbound was `unbound` user with the required capabilities instead of
letting unbound do the chroot & uid/gid changes.

The upstream unit configuration this is based on is a lot stricter with
all kinds of permissions then our previous variant. It also came with
the default of having the `Type` set to `notify`, therefore we are also
using the `unbound-with-systemd` package here. Unbound will start up,
read the configuration files and start listening on the configured ports
before systemd will declare the unit "running". This will likely help
with startup order and the occasional race condition during system
activation where the DNS service is started but not yet ready to answer
queries.

Aditionally to the much stricter runtime environmet I removed the
`/dev/urandom` mount lines we previously had in the code (that would
randomly fail during `stop`-phase).

The `preStart` script is now only required if we enabled the trust
anchor updates (which are still enabled by default).

Another beneefit of the refactoring is that we can now issue reloads via
either `pkill -HUP unbound` or `systemctl reload unbound` to reload the
running configuration without taking the daemon offline. A prerequisite
of this was that unbound configuration is available on a well known path
on the file system. I went for /etc/unbound/unbound.conf as that is the
default in the CLI tooling which in turn enables us to use
`unbound-control` without passing a custom configuration location.
Both of the configured paths should be implicit due to RuntimeDirectory
& StateDirectory.
This option allows users to specify a local UNIX control socket to
"remote control" the daemon. System users, that should be permitted to
access the daemon, must be in the `unbound` group in order to access the
socket. When a socket path is configured we are also creating the
required group.

Currently this only supports the UNIX socket mode while unbound actually
supports more advanced types. Users are still able to configure more
complex scenarios via the `extraConfig` attribute.

When this option is set to `null` (the default) it doesn't affect the
system configuration at all. The unbound defaults for control sockets
apply and no additional groups are created.
As part of this patch series a few changes have been made to the unbound
serivce the deserve proper documentation.
Other units depend on nss-lookup.target and expect the DNS resolution to
work once that target is reached. The previous version
`wants=nss-lookup.target` made this unit require the nss-lookup.target
to be reached before this was started.

Another change that we can probalby do is drop the before relationship
with the nss-lookup.target. That might just be implied with the current
version.
@andir andir force-pushed the unbound-systemd branch from 15e4f92 to 5903ea5 Nov 3, 2020
Copy link
Member

@NinjaTrappeur NinjaTrappeur left a comment

👍 The API barely moves, the module quality rise quite significantly.

Great job!

Overall LGTM.

I was wondering how to plug a passthru.tests to unbound. It'd be something nice to have. Sadly, I did not find any solution.

Obviously, we'd need the systemd support to run the VM test, which default to false in pkgs/tools/networking/unbonud/default.nix. I don't see any way to automatically run the VM test on a unbound/default.nix version update :( At least not a way to force ofborg to re-evaluate the test. Maybe I'm missing something.

@NinjaTrappeur NinjaTrappeur merged commit 5f5d38e into NixOS:master Nov 8, 2020
19 of 20 checks passed
@rissson
Copy link
Member

@rissson rissson commented Nov 8, 2020

Ugh this conflicts a lot with #89572 wish I caught it sooner.

EDIT: s/with/wish

@NinjaTrappeur
Copy link
Member

@NinjaTrappeur NinjaTrappeur commented Nov 9, 2020

Ugh, missed it as well.

On the other hand, the test is going to make your PR more easy to test.

@hmenke
Copy link
Member

@hmenke hmenke commented Nov 9, 2020

Backport to 20.09?

@andir
Copy link
Member Author

@andir andir commented Nov 10, 2020

@NixOS NixOS deleted a comment from andir Nov 22, 2020
@NixOS NixOS deleted a comment from andir Nov 22, 2020
pmeiyu pushed a commit to pmeiyu/nixpkgs that referenced this issue Dec 8, 2020
Unbound throws the following error after PR NixOS#101218:

--8<---------------cut here---------------start------------->8---
error: failed to list interfaces: getifaddrs: Address family not supported by protocol
fatal error: could not open ports
--8<---------------cut here---------------end--------------->8---

The solution is pulled from upstream:
NLnetLabs/unbound#351
pmeiyu pushed a commit to pmeiyu/nixpkgs that referenced this issue Dec 8, 2020
Unbound throws the following error due to PR NixOS#101218:

--8<---------------cut here---------------start------------->8---
error: failed to list interfaces: getifaddrs: Address family not supported by protocol
fatal error: could not open ports
--8<---------------cut here---------------end--------------->8---

The solution is pulled from upstream:
NLnetLabs/unbound#351
${lib.optionalString (cfg.localControlSocketPath != null) ''
remote-control:
control-enable: yes
control-interface: ${cfg.localControlSocketPath}
''}
${cfg.extraConfig}
Copy link
Contributor

@kvtb kvtb Mar 27, 2021

Introducing cfg.localControlSocketPath made impossible to write do-options (like do-ip6: no) in cfg.extraConfig, because those options must be BEFORE remote-control

Copy link
Contributor

@AluisioASG AluisioASG Mar 28, 2021

You can just begin a new server: block in extraConfig, at least I used to run a config like that.

Restart = "always";
RestartSec = "5s";
Copy link
Contributor

@kvtb kvtb Mar 27, 2021

Why removed?
unbound could crash (it happens) or just being started too slow that systemd marks the service as failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment