Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/nix-containers: bind container units to machined and dbus #120636

Closed
wants to merge 1 commit into from

Conversation

pennae
Copy link
Contributor

@pennae pennae commented Apr 25, 2021

Motivation for this change

fix #109695 (containers don't stop properly during system shutdown). not sure whether this is the best way to do it, but can't think of something better that doesn't require moving away from machinectl commands again

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

currently there's a race during system poweroff, between containers shutting
down via machinectl and machined/dbus being stopped. if machined or dbus are
stopped before the containers units can dispath their stop commands the commands
fail, leaving the containers alive. systemd will ultimately kill them hard after
the stop timeout, which can have unpleasant consequences.
@pennae
Copy link
Contributor Author

pennae commented Apr 27, 2021

looked at this a bit more due to "shutdown during startup: issues: when a container is stopped before it has reached the in-container systemd it won't shut down properly. setting --kill-signal=SIGRTMIN+3 on nspawn should be sufficient to solve both that and what we tried to solve here, maybe without needing a stop command on the container unit at all. will have to test this some more.

@pennae
Copy link
Contributor Author

pennae commented Apr 27, 2021

investigated some more. it seems that we'll want this instead to fix both problems:

diff --git a/nixos/modules/system/boot/stage-2-init.sh b/nixos/modules/system/boot/stage-2-init.sh
index 50ee0b8841e..3f87c5c920c 100644
--- a/nixos/modules/system/boot/stage-2-init.sh
+++ b/nixos/modules/system/boot/stage-2-init.sh
@@ -1,5 +1,8 @@
 #! @shell@
 
+# Exit early if we're a container and asked to shut down.
+trap "exit 0" SIGRTMIN+3
+
 systemConfig=@systemConfig@
 
 export HOME=/root PATH="@path@"
diff --git a/nixos/modules/virtualisation/nixos-containers.nix b/nixos/modules/virtualisation/nixos-containers.nix
index f15d5875841..b7074e4460f 100644
--- a/nixos/modules/virtualisation/nixos-containers.nix
+++ b/nixos/modules/virtualisation/nixos-containers.nix
@@ -35,6 +35,9 @@ let
       ''
         #! ${pkgs.runtimeShell} -e
 
+        # Exit early if we're asked to shut down.
+        trap "exit 0" SIGRTMIN+3
+
         # Initialise the container side of the veth pair.
         if [ -n "$HOST_ADDRESS" ]   || [ -n "$HOST_ADDRESS6" ]  ||
            [ -n "$LOCAL_ADDRESS" ]  || [ -n "$LOCAL_ADDRESS6" ] ||
@@ -127,12 +130,16 @@ let
       ''}
 
       # Run systemd-nspawn without startup notification (we'll
-      # wait for the container systemd to signal readiness).
+      # wait for the container systemd to signal readiness)
+      # Kill signal handling means systemd-nspawn will pass a system-halt signal
+      # to the container systemd when it receives SIGTERM for container shutdown;
+      # containerInit and stage2 have to handle this as well.
       exec ${config.systemd.package}/bin/systemd-nspawn \
         --keep-unit \
         -M "$INSTANCE" -D "$root" $extraFlags \
         $EXTRA_NSPAWN_FLAGS \
         --notify-ready=yes \
+        --kill-signal=SIGRTMIN+3 \
         --bind-ro=/nix/store \
         --bind-ro=/nix/var/nix/db \
         --bind-ro=/nix/var/nix/daemon-socket \
@@ -259,13 +266,10 @@ let
     Slice = "machine.slice";
     Delegate = true;
 
-    # Hack: we don't want to kill systemd-nspawn, since we call
-    # "machinectl poweroff" in preStop to shut down the
-    # container cleanly. But systemd requires sending a signal
-    # (at least if we want remaining processes to be killed
-    # after the timeout). So send an ignored signal.
+    # We rely on systemd-nspawn turning a SIGTERM to itself into a shutdown
+    # signal (SIGRTMIN+3) for the inner container.
     KillMode = "mixed";
-    KillSignal = "WINCH";
+    KillSignal = "TERM";
 
     DevicePolicy = "closed";
     DeviceAllow = map (d: "${d.node} ${d.modifier}") cfg.allowedDevices;
@@ -752,8 +756,6 @@ in
 
       postStart = postStartScript dummyConfig;
 
-      preStop = "machinectl poweroff $INSTANCE";
-
       restartIfChanged = false;
 
       serviceConfig = serviceDirectives dummyConfig;

that will not eliminate the race between starting up a container and a stop job that replaces the running start job, but it makes it about as small as i can think of making it. also completely removes the need for machined or dbus during container shutdown. modifying stage2 doesn't feel quite right, but i really can't find a better way to do it.

@pennae pennae marked this pull request as draft April 27, 2021 19:44
@pennae pennae closed this Apr 28, 2021
@pennae pennae deleted the container-shutdown branch May 1, 2021 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"A stop job is running for container" on host shut down
1 participant