Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace several activation script snippets with declarative configuration #47563

Merged
merged 8 commits into from Oct 2, 2018

Conversation

jameysharp
Copy link
Contributor

Motivation for this change

I find it difficult to understand what the dependencies are between the fragments of scripts that implement stage-2 init and activation. This branch illustrates several ways of eliminating some fragments of those scripts in favor of systemd tools like tmpfiles.d. In one case I found the snippet was not necessary at all, as far as I can tell.

As I reported in #47550, using tmpfiles.d is not entirely safe on NixOS, but so many things rely on it at this point that I think that just needs to be fixed, rather than continuing to use activation snippets to avoid the problem.

Things done

This series changes only modules that I'm personally using. I've tested these patches against my personal system config in QEMU, by first starting with an empty disk and verifying that first-boot comes up without errors; then rebooting the VM and verifying that stage-2-init still works with an already-initialized filesystem; and finally running /run/current-system/bin/switch-to-configuration test and verifying that activation on a running system works.

I don't know which NixOS tests are relevant and haven't figured out how to run any of them yet, so presumably I need to go read some documentation somewhere...

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Fits CONTRIBUTING.md.

As far as I can tell, the systemd snippet hasn't depended on groups
being initialized since 5d02c02 in
2015, when a `setfacl` call was removed.
Previously, the activation script was responsible for ensuring that
/etc/machine-id exists. However, the only time it could not already
exist is during stage-2-init, not while switching configurations,
because one of the first things systemd does when starting up as PID 1
is to create this file. So I've moved the initialization to
stage-2-init.

Furthermore, since systemd will do the equivalent of
systemd-machine-id-setup if /etc/machine-id doesn't have valid contents,
we don't need to do that ourselves.

We _do_, however, want to ensure that the file at least exists, because
systemd also uses the non-existence of this file to guess that this is a
first-boot situation. In that case, systemd tries to create some
symlinks in /etc/systemd/system according to its presets, which it can't
do because we've already populated /etc according to the current NixOS
configuration.

This is not necessary for any other activation script snippets, so it's
okay to do it after stage-2-init runs the activation script. None of
them declare a dependency on the "systemd" snippet. Also, most of them
only create files or directories in ways that obviously don't need the
machine-id set.
The default value for journald's Storage option is "auto", which
determines whether to log to /var/log/journal based on whether that
directory already exists. So NixOS has been unconditionally creating
that directory in activation scripts.

However, we can get the same behavior by configuring journald.conf to
set Storage to "persistent" instead. In that case, journald will create
the directory itself if necessary.
As far as I can tell, systemd has never used this directory, so I think
this is a holdover from before udev merged into systemd.
These don't need to get cleaned up during activation; that can wait
until systemd-tmpfiles-setup runs.
Nix 2.0 no longer uses these directories.

/run/nix/current-load was moved to /nix/var/nix/current-load in 2017
(Nix commit d7653df). Anyway,
src/build-remote/build-remote.cc will create the current-load directory
if it doesn't exist already.

/run/nix/remote-stores seems to have been deprecated since 2014 (Nix
commit b1af336) when the documentation
for $NIX_OTHER_STORES was removed, and support for it was dropped
entirely in 2016 (Nix commit 4494000).
Anything that uses OpenGL starts after sysinit.target, so
systemd-tmpfiles runs before anything that needs these symlinks.
I think pam_lastlog is the only thing that writes to these files in
practice on a modern Linux system, so in a configuration that doesn't
use that module, we don't need to create these files.

I used tmpfiles.d instead of activation snippets to create the logs.
It's good enough for upstream and other distros; it's probably good
enough for us.
@Mic92
Copy link
Member

Mic92 commented Sep 30, 2018

There is a lot of critical stuff changed here. This will need some testing.

@Mic92
Copy link
Member

Mic92 commented Sep 30, 2018

Tests can be run by calling nix-build on the file in questions. Some file contain test sets in which case you can run nix-build ./file.nix -A subtest.

@Mic92
Copy link
Member

Mic92 commented Sep 30, 2018

@GrahamcOfBorg test boot.biosCdrom boot.netboot boot.uefiUsb

@Mic92
Copy link
Member

Mic92 commented Sep 30, 2018

@GrahamcOfBorg test boot.boot-stage1

@GrahamcOfBorg
Copy link

No attempt on aarch64-linux (full log)

The following builds were skipped because they don't evaluate on aarch64-linux: tests.boot.boot-stage1

Partial log (click to expand)

Cannot nix-instantiate `tests.boot.boot-stage1' because:
error: attribute 'boot-stage1' in selection path 'tests.boot.boot-stage1' not found

@GrahamcOfBorg
Copy link

No attempt on x86_64-linux (full log)

The following builds were skipped because they don't evaluate on x86_64-linux: tests.boot.boot-stage1

Partial log (click to expand)

Cannot nix-instantiate `tests.boot.boot-stage1' because:
error: attribute 'boot-stage1' in selection path 'tests.boot.boot-stage1' not found

@Mic92
Copy link
Member

Mic92 commented Sep 30, 2018

@GrahamcOfBorg test boot-stage1

@Mic92
Copy link
Member

Mic92 commented Sep 30, 2018

@GrahamcOfBorg test containers-imperative

@GrahamcOfBorg
Copy link

Success on x86_64-linux (full log)

Attempted: tests.boot-stage1

Partial log (click to expand)

machine: exit status 1
syncing
machine: running command: sync
machine: exit status 0
test script finished in 7.98s
cleaning up
killing machine (pid 600)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/cdh89mnbdqgf3s7h4sl8bwrzbrqq7g2a-vm-test-run-boot-stage1

@GrahamcOfBorg
Copy link

Success on x86_64-linux (full log)

Attempted: tests.containers-imperative

Partial log (click to expand)

machine: exit status 1
syncing
machine: running command: sync
machine: exit status 0
test script finished in 34.24s
cleaning up
killing machine (pid 597)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/ky6mv5fn2i2jwwyqqxv1y28lawbkd9hq-vm-test-run-containers-imperative

@GrahamcOfBorg
Copy link

Success on aarch64-linux (full log)

Attempted: tests.containers-imperative

Partial log (click to expand)

machine: exit status 1
syncing
machine: running command: sync
machine: exit status 0
test script finished in 59.59s
cleaning up
killing machine (pid 631)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/a6nnc7x38bb6y8x9yb31cn66422lq2m0-vm-test-run-containers-imperative

@GrahamcOfBorg
Copy link

Success on aarch64-linux (full log)

Attempted: tests.boot-stage1

Partial log (click to expand)

machine: exit status 1
syncing
machine: running command: sync
machine: exit status 0
test script finished in 52.01s
cleaning up
killing machine (pid 631)
vde_switch: EOF on stdin, cleaning up and exiting
vde_switch: Could not remove ctl dir '/build/vde1.ctl': Directory not empty
/nix/store/x0f279gq59f6374j54pk7z3d8x18mj4v-vm-test-run-boot-stage1

@GrahamcOfBorg
Copy link

Failure on aarch64-linux (full log)

Attempted: tests.boot.netboot

The following builds were skipped because they don't evaluate on aarch64-linux: tests.boot.biosCdrom, tests.boot.uefiUsb

Partial log (click to expand)

building '/nix/store/012672nrqa16d0a92v194siirza1m9y0-vm-test-run-boot-netboot.drv'...
running the VM test script
machine: starting vm
machine# qemu-system-aarch64: -enable-kvm: No machine specified, and there is no default
machine# Use -machine help to list supported machines
error: QEMU died prematurely
Died at /nix/store/315069h7r2ph5j9qkdgf1mg6xp9jfvkh-nixos-test-driver/bin/.nixos-test-driver-wrapped line 112.
cleaning up
builder for '/nix/store/012672nrqa16d0a92v194siirza1m9y0-vm-test-run-boot-netboot.drv' failed with exit code 4
error: build of '/nix/store/012672nrqa16d0a92v194siirza1m9y0-vm-test-run-boot-netboot.drv' failed

@GrahamcOfBorg
Copy link

Success on x86_64-linux (full log)

Attempted: tests.boot.biosCdrom, tests.boot.netboot, tests.boot.uefiUsb

Partial log (click to expand)

machine# [  646.539777] [2829]: Failed to unmount /iso: Device or resource busy
machine# [  646.547642] ACPI: Preparing to enter system sleep state S5
machine# [  646.550209] reboot: Power down
collecting coverage data
syncing
test script finished in 677.64s
cleaning up
/nix/store/6ns4pb600xm5k0a0bvqwsh7lj15v56y1-vm-test-run-boot-bios-cdrom
/nix/store/1wkwi0rq21rg3fakn1x627zpkdp89sfq-vm-test-run-boot-netboot
/nix/store/a2scfnjwipf4zyb0046ixaglsbmdimg2-vm-test-run-boot-uefi-usb

@arianvp
Copy link
Member

arianvp commented Sep 30, 2018

I and @flokli are happy to have a closer look and review this. We were actually working on something very similar during the All Systems Go Hackfest this weekend and have some (very unstructured) notes here. https://hackmd.io/vFdKH1PpT5CbAp45SxOlAw#

Guess the section about "Cleanup" is most relevant. Rest is just our rambling of stuff we learnt at the conference

@@ -128,14 +128,6 @@ in
''
# Various log/runtime directories.

mkdir -m 0755 -p /run/nix/current-load # for distributed builds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't done anywhere else is it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the commit message it says that this was used in an very old version of nix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks. Missed the commit messages but they are very informative.

@@ -783,19 +784,6 @@ in

services.dbus.enable = true;

system.activationScripts.systemd = stringAfter [ "groups" ]
''
mkdir -m 0755 -p /var/lib/udev
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this was also used in pre-systemd times.

@xaverdh
Copy link
Contributor

xaverdh commented Oct 2, 2018

Excellent!
So 10e8650 will allow actually making proper use of "Storage=auto".
Until now auto was essentially an alias of persistent ;-)

@jameysharp
Copy link
Contributor Author

@xaverdh Yes! Assuming that commit gets merged, I think there should also be a new config option for journald.conf's Storage setting so people can pick which mode they want. I just figured this pull request was invasive enough without adding that change to it as well.

@Mic92 Mic92 merged commit b12c759 into NixOS:master Oct 2, 2018
@jameysharp jameysharp deleted the unscripted branch July 8, 2019 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants