New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace several activation script snippets with declarative configuration #47563
Conversation
As far as I can tell, the systemd snippet hasn't depended on groups being initialized since 5d02c02 in 2015, when a `setfacl` call was removed.
Previously, the activation script was responsible for ensuring that /etc/machine-id exists. However, the only time it could not already exist is during stage-2-init, not while switching configurations, because one of the first things systemd does when starting up as PID 1 is to create this file. So I've moved the initialization to stage-2-init. Furthermore, since systemd will do the equivalent of systemd-machine-id-setup if /etc/machine-id doesn't have valid contents, we don't need to do that ourselves. We _do_, however, want to ensure that the file at least exists, because systemd also uses the non-existence of this file to guess that this is a first-boot situation. In that case, systemd tries to create some symlinks in /etc/systemd/system according to its presets, which it can't do because we've already populated /etc according to the current NixOS configuration. This is not necessary for any other activation script snippets, so it's okay to do it after stage-2-init runs the activation script. None of them declare a dependency on the "systemd" snippet. Also, most of them only create files or directories in ways that obviously don't need the machine-id set.
The default value for journald's Storage option is "auto", which determines whether to log to /var/log/journal based on whether that directory already exists. So NixOS has been unconditionally creating that directory in activation scripts. However, we can get the same behavior by configuring journald.conf to set Storage to "persistent" instead. In that case, journald will create the directory itself if necessary.
As far as I can tell, systemd has never used this directory, so I think this is a holdover from before udev merged into systemd.
These don't need to get cleaned up during activation; that can wait until systemd-tmpfiles-setup runs.
Nix 2.0 no longer uses these directories. /run/nix/current-load was moved to /nix/var/nix/current-load in 2017 (Nix commit d7653df). Anyway, src/build-remote/build-remote.cc will create the current-load directory if it doesn't exist already. /run/nix/remote-stores seems to have been deprecated since 2014 (Nix commit b1af336) when the documentation for $NIX_OTHER_STORES was removed, and support for it was dropped entirely in 2016 (Nix commit 4494000).
Anything that uses OpenGL starts after sysinit.target, so systemd-tmpfiles runs before anything that needs these symlinks.
I think pam_lastlog is the only thing that writes to these files in practice on a modern Linux system, so in a configuration that doesn't use that module, we don't need to create these files. I used tmpfiles.d instead of activation snippets to create the logs. It's good enough for upstream and other distros; it's probably good enough for us.
There is a lot of critical stuff changed here. This will need some testing. |
Tests can be run by calling |
@GrahamcOfBorg test boot.biosCdrom boot.netboot boot.uefiUsb |
@GrahamcOfBorg test boot.boot-stage1 |
No attempt on aarch64-linux (full log) The following builds were skipped because they don't evaluate on aarch64-linux: tests.boot.boot-stage1 Partial log (click to expand)
|
No attempt on x86_64-linux (full log) The following builds were skipped because they don't evaluate on x86_64-linux: tests.boot.boot-stage1 Partial log (click to expand)
|
@GrahamcOfBorg test boot-stage1 |
@GrahamcOfBorg test containers-imperative |
Success on x86_64-linux (full log) Attempted: tests.boot-stage1 Partial log (click to expand)
|
Success on x86_64-linux (full log) Attempted: tests.containers-imperative Partial log (click to expand)
|
Success on aarch64-linux (full log) Attempted: tests.containers-imperative Partial log (click to expand)
|
Success on aarch64-linux (full log) Attempted: tests.boot-stage1 Partial log (click to expand)
|
Failure on aarch64-linux (full log) Attempted: tests.boot.netboot The following builds were skipped because they don't evaluate on aarch64-linux: tests.boot.biosCdrom, tests.boot.uefiUsb Partial log (click to expand)
|
Success on x86_64-linux (full log) Attempted: tests.boot.biosCdrom, tests.boot.netboot, tests.boot.uefiUsb Partial log (click to expand)
|
I and @flokli are happy to have a closer look and review this. We were actually working on something very similar during the All Systems Go Hackfest this weekend and have some (very unstructured) notes here. https://hackmd.io/vFdKH1PpT5CbAp45SxOlAw# Guess the section about "Cleanup" is most relevant. Rest is just our rambling of stuff we learnt at the conference |
@@ -128,14 +128,6 @@ in | |||
'' | |||
# Various log/runtime directories. | |||
|
|||
mkdir -m 0755 -p /run/nix/current-load # for distributed builds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't done anywhere else is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the commit message it says that this was used in an very old version of nix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks. Missed the commit messages but they are very informative.
@@ -783,19 +784,6 @@ in | |||
|
|||
services.dbus.enable = true; | |||
|
|||
system.activationScripts.systemd = stringAfter [ "groups" ] | |||
'' | |||
mkdir -m 0755 -p /var/lib/udev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this was also used in pre-systemd times.
Excellent! |
@xaverdh Yes! Assuming that commit gets merged, I think there should also be a new config option for journald.conf's Storage setting so people can pick which mode they want. I just figured this pull request was invasive enough without adding that change to it as well. |
Motivation for this change
I find it difficult to understand what the dependencies are between the fragments of scripts that implement stage-2 init and activation. This branch illustrates several ways of eliminating some fragments of those scripts in favor of systemd tools like tmpfiles.d. In one case I found the snippet was not necessary at all, as far as I can tell.
As I reported in #47550, using tmpfiles.d is not entirely safe on NixOS, but so many things rely on it at this point that I think that just needs to be fixed, rather than continuing to use activation snippets to avoid the problem.
Things done
This series changes only modules that I'm personally using. I've tested these patches against my personal system config in QEMU, by first starting with an empty disk and verifying that first-boot comes up without errors; then rebooting the VM and verifying that stage-2-init still works with an already-initialized filesystem; and finally running
/run/current-system/bin/switch-to-configuration test
and verifying that activation on a running system works.I don't know which NixOS tests are relevant and haven't figured out how to run any of them yet, so presumably I need to go read some documentation somewhere...
sandbox
innix.conf
on non-NixOS)nix-shell -p nox --run "nox-review wip"
./result/bin/
)nix path-info -S
before and after)