nixos/nix-containers: bind container units to machined and dbus #120636

pennae · 2021-04-25T20:08:43Z

Motivation for this change

fix #109695 (containers don't stop properly during system shutdown). not sure whether this is the best way to do it, but can't think of something better that doesn't require moving away from machinectl commands again

Things done

currently there's a race during system poweroff, between containers shutting down via machinectl and machined/dbus being stopped. if machined or dbus are stopped before the containers units can dispath their stop commands the commands fail, leaving the containers alive. systemd will ultimately kill them hard after the stop timeout, which can have unpleasant consequences.

pennae · 2021-04-27T16:18:10Z

looked at this a bit more due to "shutdown during startup: issues: when a container is stopped before it has reached the in-container systemd it won't shut down properly. setting --kill-signal=SIGRTMIN+3 on nspawn should be sufficient to solve both that and what we tried to solve here, maybe without needing a stop command on the container unit at all. will have to test this some more.

pennae · 2021-04-27T19:44:46Z

investigated some more. it seems that we'll want this instead to fix both problems:

diff --git a/nixos/modules/system/boot/stage-2-init.sh b/nixos/modules/system/boot/stage-2-init.sh
index 50ee0b8841e..3f87c5c920c 100644
--- a/nixos/modules/system/boot/stage-2-init.sh
+++ b/nixos/modules/system/boot/stage-2-init.sh
@@ -1,5 +1,8 @@
 #! @shell@
 
+# Exit early if we're a container and asked to shut down.
+trap "exit 0" SIGRTMIN+3
+
 systemConfig=@systemConfig@
 
 export HOME=/root PATH="@path@"
diff --git a/nixos/modules/virtualisation/nixos-containers.nix b/nixos/modules/virtualisation/nixos-containers.nix
index f15d5875841..b7074e4460f 100644
--- a/nixos/modules/virtualisation/nixos-containers.nix
+++ b/nixos/modules/virtualisation/nixos-containers.nix
@@ -35,6 +35,9 @@ let
       ''
         #! ${pkgs.runtimeShell} -e
 
+        # Exit early if we're asked to shut down.
+        trap "exit 0" SIGRTMIN+3
+
         # Initialise the container side of the veth pair.
         if [ -n "$HOST_ADDRESS" ]   || [ -n "$HOST_ADDRESS6" ]  ||
            [ -n "$LOCAL_ADDRESS" ]  || [ -n "$LOCAL_ADDRESS6" ] ||
@@ -127,12 +130,16 @@ let
       ''}
 
       # Run systemd-nspawn without startup notification (we'll
-      # wait for the container systemd to signal readiness).
+      # wait for the container systemd to signal readiness)
+      # Kill signal handling means systemd-nspawn will pass a system-halt signal
+      # to the container systemd when it receives SIGTERM for container shutdown;
+      # containerInit and stage2 have to handle this as well.
       exec ${config.systemd.package}/bin/systemd-nspawn \
         --keep-unit \
         -M "$INSTANCE" -D "$root" $extraFlags \
         $EXTRA_NSPAWN_FLAGS \
         --notify-ready=yes \
+        --kill-signal=SIGRTMIN+3 \
         --bind-ro=/nix/store \
         --bind-ro=/nix/var/nix/db \
         --bind-ro=/nix/var/nix/daemon-socket \
@@ -259,13 +266,10 @@ let
     Slice = "machine.slice";
     Delegate = true;
 
-    # Hack: we don't want to kill systemd-nspawn, since we call
-    # "machinectl poweroff" in preStop to shut down the
-    # container cleanly. But systemd requires sending a signal
-    # (at least if we want remaining processes to be killed
-    # after the timeout). So send an ignored signal.
+    # We rely on systemd-nspawn turning a SIGTERM to itself into a shutdown
+    # signal (SIGRTMIN+3) for the inner container.
     KillMode = "mixed";
-    KillSignal = "WINCH";
+    KillSignal = "TERM";
 
     DevicePolicy = "closed";
     DeviceAllow = map (d: "${d.node} ${d.modifier}") cfg.allowedDevices;
@@ -752,8 +756,6 @@ in
 
       postStart = postStartScript dummyConfig;
 
-      preStop = "machinectl poweroff $INSTANCE";
-
       restartIfChanged = false;
 
       serviceConfig = serviceDirectives dummyConfig;

that will not eliminate the race between starting up a container and a stop job that replaces the running start job, but it makes it about as small as i can think of making it. also completely removes the need for machined or dbus during container shutdown. modifying stage2 doesn't feel quite right, but i really can't find a better way to do it.

github-actions bot added 6.topic: nixos 8.has: module (update) labels Apr 25, 2021

ofborg bot added 10.rebuild-darwin: 0 10.rebuild-linux: 1-10 labels Apr 25, 2021

SuperSandro2000 requested a review from Mic92 April 27, 2021 13:38

pennae marked this pull request as draft April 27, 2021 19:44

pennae mentioned this pull request Apr 28, 2021

nixos/nix-containers: use SIGTERM to stop containers #121021

Merged

10 tasks

pennae closed this Apr 28, 2021

pennae deleted the container-shutdown branch May 1, 2021 15:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nixos/nix-containers: bind container units to machined and dbus #120636

nixos/nix-containers: bind container units to machined and dbus #120636

pennae commented Apr 25, 2021

pennae commented Apr 27, 2021

pennae commented Apr 27, 2021

nixos/nix-containers: bind container units to machined and dbus #120636

nixos/nix-containers: bind container units to machined and dbus #120636

Conversation

pennae commented Apr 25, 2021

Motivation for this change

Things done

pennae commented Apr 27, 2021

pennae commented Apr 27, 2021