From ac64ce994509aaad8c5b55254595a5f989ba24e9 Mon Sep 17 00:00:00 2001 From: aszlig Date: Sun, 10 Mar 2019 12:21:55 +0100 Subject: [PATCH 1/8] nixos: Add 'chroot' options to systemd.services Currently, if you want to properly chroot a systemd service, you could do it using BindReadOnlyPaths=/nix/store (which is not what I'd call "properly", because the whole store is still accessible) or use a separate derivation that gathers the runtime closure of the service you want to chroot. The former is the easier method and there is also a method directly offered by systemd, called ProtectSystem, which still leaves the whole store accessible. The latter however is a bit more involved, because you need to bind-mount each store path of the runtime closure of the service you want to chroot. This can be achieved using pkgs.closureInfo and a small derivation that packs everything into a systemd unit, which later can be added to systemd.packages. That's also what I did several times[1][2] in the past. However, this process got a bit tedious, so I decided that it would be generally useful for NixOS, so this very implementation was born. Now if you want to chroot a systemd service, all you need to do is: { systemd.services.yourservice = { description = "My Shiny Service"; wantedBy = [ "multi-user.target" ]; chroot.enable = true; serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice"; }; } If more than the dependencies for the ExecStart* and ExecStop* (which btw. also includes "script" and {pre,post}Start) need to be in the chroot, it can be specified using the chroot.packages option. By default (which uses the "full-apivfs"[3] confinement mode), a user namespace is set up as well and /proc, /sys and /dev are mounted appropriately. In addition - and by default - a /bin/sh executable is provided as well, which is useful for most programs that use the system() C library call to execute commands via shell. The shell providing /bin/sh is dash instead of the default in NixOS (which is bash), because it's way more lightweight and after all we're chrooting because we want to lower the attack surface and it should be only used for "/bin/sh -c something". Prior to submitting this here, I did a first implementation of this outside[4] of nixpkgs, which duplicated the "pathSafeName" functionality from systemd-lib.nix, just because it's only a single line. However, I decided to just re-use the one from systemd here and subsequently made it available when importing systemd-lib.nix, so that the systemd-chroot implementation also benefits from fixes to that functionality (which is now a proper function). Unfortunately, we do have a few limitations as well. The first being that DynamicUser doesn't work in conjunction with tmpfs, because it already sets up a tmpfs in a different path and simply ignores the one we define. We could probably solve this by detecting it and try to bind-mount our paths to that different path whenever DynamicUser is enabled. The second limitation/issue is that RootDirectoryStartOnly doesn't work right now, because it only affects the RootDirectory option and not the individual bind mounts or our tmpfs. It would be helpful if systemd would have a way to disable specific bind mounts as well or at least have some way to ignore failures for the bind mounts/tmpfs setup. Another quirk we do have right now is that systemd tries to create a /usr directory within the chroot, which subsequently fails. Fortunately, this is just an ugly error and not a hard failure. [1]: https://github.com/headcounter/shabitica/blob/3bb01728a0237ad5e7/default.nix#L43-L62 [2]: https://github.com/aszlig/avonc/blob/dedf29e092481a33dc/nextcloud.nix#L103-L124 [3]: The reason this is called "full-apivfs" instead of just "full" is to make room for a *real* "full" confinement mode, which is more restrictive even. [4]: https://github.com/aszlig/avonc/blob/92a20bece4df54625e/systemd-chroot.nix Signed-off-by: aszlig --- nixos/modules/module-list.nix | 1 + nixos/modules/security/systemd-chroot.nix | 160 ++++++++++++++++++++++ nixos/modules/system/boot/systemd-lib.nix | 9 +- nixos/tests/all-tests.nix | 1 + nixos/tests/systemd-chroot.nix | 129 +++++++++++++++++ 5 files changed, 295 insertions(+), 5 deletions(-) create mode 100644 nixos/modules/security/systemd-chroot.nix create mode 100644 nixos/tests/systemd-chroot.nix diff --git a/nixos/modules/module-list.nix b/nixos/modules/module-list.nix index 2c7b15e65c49..768bc40d1796 100644 --- a/nixos/modules/module-list.nix +++ b/nixos/modules/module-list.nix @@ -170,6 +170,7 @@ ./security/rtkit.nix ./security/wrappers/default.nix ./security/sudo.nix + ./security/systemd-chroot.nix ./services/admin/oxidized.nix ./services/admin/salt/master.nix ./services/admin/salt/minion.nix diff --git a/nixos/modules/security/systemd-chroot.nix b/nixos/modules/security/systemd-chroot.nix new file mode 100644 index 000000000000..befe2d3418c9 --- /dev/null +++ b/nixos/modules/security/systemd-chroot.nix @@ -0,0 +1,160 @@ +{ config, pkgs, lib, ... }: + +let + inherit (lib) types; + inherit (import ../system/boot/systemd-lib.nix { + inherit config pkgs lib; + }) mkPathSafeName; +in { + options.systemd.services = lib.mkOption { + type = types.attrsOf (types.submodule ({ name, config, ... }: { + options.chroot.enable = lib.mkOption { + type = types.bool; + default = false; + description = '' + If set, all the required runtime store paths for this service are + bind-mounted into a tmpfs-based + chroot + 2 + . + ''; + }; + + options.chroot.packages = lib.mkOption { + type = types.listOf (types.either types.str types.package); + default = []; + description = let + mkScOption = optName: ""; + in '' + Additional packages or strings with context to add to the closure of + the chroot. By default, this includes all the packages from the + ${lib.concatMapStringsSep ", " mkScOption [ + "ExecReload" "ExecStartPost" "ExecStartPre" "ExecStop" + "ExecStopPost" + ]} and ${mkScOption "ExecStart"} options. + + Only the latter + (${mkScOption "ExecStart"}) will be used if + ${mkScOption "RootDirectoryStartOnly"} is enabled. + + Also, the store paths listed in are + not included in the closure as + well as paths from other options except those listed + above. + ''; + }; + + options.chroot.withBinSh = lib.mkOption { + type = types.bool; + default = true; + description = '' + Whether to symlink dash as + /bin/sh to the chroot. + + This is useful for some applications, which for example use the + + system + 3 + library function to execute commands. + ''; + }; + + options.chroot.confinement = lib.mkOption { + type = types.enum [ "full-apivfs" "chroot-only" ]; + default = "full-apivfs"; + description = '' + The value full-apivfs (the default) sets up + private /dev, /proc, /sys and /tmp file systems in a separate user + name space. + + If this is set to chroot-only, only the file + system name space is set up along with the call to + chroot + 2 + . + + This doesn't cover network namespaces and is solely for + file system level isolation. + ''; + }; + + config = lib.mkIf config.chroot.enable { + serviceConfig = let + rootName = "${mkPathSafeName name}-chroot"; + in { + RootDirectory = pkgs.runCommand rootName {} "mkdir \"$out\""; + TemporaryFileSystem = "/"; + MountFlags = lib.mkDefault "private"; + } // lib.optionalAttrs config.chroot.withBinSh { + BindReadOnlyPaths = [ "${pkgs.dash}/bin/dash:/bin/sh" ]; + } // lib.optionalAttrs (config.chroot.confinement == "full-apivfs") { + MountAPIVFS = true; + PrivateDevices = true; + PrivateTmp = true; + PrivateUsers = true; + ProtectControlGroups = true; + ProtectKernelModules = true; + ProtectKernelTunables = true; + }; + chroot.packages = let + startOnly = config.serviceConfig.RootDirectoryStartOnly or false; + execOpts = if startOnly then [ "ExecStart" ] else [ + "ExecReload" "ExecStart" "ExecStartPost" "ExecStartPre" "ExecStop" + "ExecStopPost" + ]; + execPkgs = lib.concatMap (opt: let + isSet = config.serviceConfig ? ${opt}; + in lib.optional isSet config.serviceConfig.${opt}) execOpts; + in execPkgs ++ lib.optional config.chroot.withBinSh pkgs.dash; + }; + })); + }; + + config.assertions = lib.concatLists (lib.mapAttrsToList (name: cfg: let + whatOpt = optName: "The 'serviceConfig' option '${optName}' for" + + " service '${name}' is enabled in conjunction with" + + " 'chroot.enable'"; + in lib.optionals cfg.chroot.enable [ + { assertion = !cfg.serviceConfig.RootDirectoryStartOnly or false; + message = "${whatOpt "RootDirectoryStartOnly"}, but right now systemd" + + " doesn't support restricting bind-mounts to 'ExecStart'." + + " Please either define a separate service or find a way to run" + + " commands other than ExecStart within the chroot."; + } + { assertion = !cfg.serviceConfig.DynamicUser or false; + message = "${whatOpt "DynamicUser"}. Please create a dedicated user via" + + " the 'users.users' option instead as this combination is" + + " currently not supported."; + } + ]) config.systemd.services); + + config.systemd.packages = lib.concatLists (lib.mapAttrsToList (name: cfg: let + rootPaths = let + contents = lib.concatStringsSep "\n" cfg.chroot.packages; + in pkgs.writeText "${mkPathSafeName name}-string-contexts.txt" contents; + + chrootPaths = pkgs.runCommand "${mkPathSafeName name}-chroot-paths" { + closureInfo = pkgs.closureInfo { inherit rootPaths; }; + serviceName = "${name}.service"; + excludedPath = rootPaths; + } '' + mkdir -p "$out/lib/systemd/system" + serviceFile="$out/lib/systemd/system/$serviceName" + + echo '[Service]' > "$serviceFile" + + while read storePath; do + if [ -L "$storePath" ]; then + # Currently, systemd can't cope with symlinks in Bind(ReadOnly)Paths, + # so let's just bind-mount the target to that location. + echo "BindReadOnlyPaths=$(readlink -e "$storePath"):$storePath" + elif [ "$storePath" != "$excludedPath" ]; then + echo "BindReadOnlyPaths=$storePath" + fi + done < "$closureInfo/store-paths" >> "$serviceFile" + ''; + in lib.optional cfg.chroot.enable chrootPaths) config.systemd.services); +} diff --git a/nixos/modules/system/boot/systemd-lib.nix b/nixos/modules/system/boot/systemd-lib.nix index 68a40377ee13..28ad4f121bbe 100644 --- a/nixos/modules/system/boot/systemd-lib.nix +++ b/nixos/modules/system/boot/systemd-lib.nix @@ -9,12 +9,11 @@ in rec { shellEscape = s: (replaceChars [ "\\" ] [ "\\\\" ] s); + mkPathSafeName = lib.replaceChars ["@" ":" "\\" "[" "]"] ["-" "-" "-" "" ""]; + makeUnit = name: unit: - let - pathSafeName = lib.replaceChars ["@" ":" "\\" "[" "]"] ["-" "-" "-" "" ""] name; - in if unit.enable then - pkgs.runCommand "unit-${pathSafeName}" + pkgs.runCommand "unit-${mkPathSafeName name}" { preferLocalBuild = true; allowSubstitutes = false; inherit (unit) text; @@ -24,7 +23,7 @@ in rec { echo -n "$text" > $out/${shellEscape name} '' else - pkgs.runCommand "unit-${pathSafeName}-disabled" + pkgs.runCommand "unit-${mkPathSafeName name}-disabled" { preferLocalBuild = true; allowSubstitutes = false; } diff --git a/nixos/tests/all-tests.nix b/nixos/tests/all-tests.nix index 2ddb54bcc3d7..fe67e2453505 100644 --- a/nixos/tests/all-tests.nix +++ b/nixos/tests/all-tests.nix @@ -216,6 +216,7 @@ in switchTest = handleTest ./switch-test.nix {}; syncthing-relay = handleTest ./syncthing-relay.nix {}; systemd = handleTest ./systemd.nix {}; + systemd-chroot = handleTest ./systemd-chroot.nix {}; taskserver = handleTest ./taskserver.nix {}; telegraf = handleTest ./telegraf.nix {}; tomcat = handleTest ./tomcat.nix {}; diff --git a/nixos/tests/systemd-chroot.nix b/nixos/tests/systemd-chroot.nix new file mode 100644 index 000000000000..523e1ad9f4df --- /dev/null +++ b/nixos/tests/systemd-chroot.nix @@ -0,0 +1,129 @@ +import ./make-test.nix { + name = "systemd-chroot"; + + machine = { pkgs, lib, ... }: let + testServer = pkgs.writeScript "testserver.sh" '' + #!${pkgs.stdenv.shell} + export PATH=${lib.escapeShellArg "${pkgs.coreutils}/bin"} + ${lib.escapeShellArg pkgs.stdenv.shell} 2>&1 + echo "exit-status:$?" + ''; + + testClient = pkgs.writeScriptBin "chroot-exec" '' + #!${pkgs.stdenv.shell} -e + output="$(echo "$@" | nc -NU "/run/test$(< /teststep).sock")" + ret="$(echo "$output" | sed -nre '$s/^exit-status:([0-9]+)$/\1/p')" + echo "$output" | head -n -1 + exit "''${ret:-1}" + ''; + + mkTestStep = num: { description, config ? {}, testScript }: { + systemd.sockets."test${toString num}" = { + description = "Socket for Test Service ${toString num}"; + wantedBy = [ "sockets.target" ]; + socketConfig.ListenStream = "/run/test${toString num}.sock"; + socketConfig.Accept = true; + }; + + systemd.services."test${toString num}@" = { + description = "Chrooted Test Service ${toString num}"; + chroot = (config.chroot or {}) // { enable = true; }; + serviceConfig = (config.serviceConfig or {}) // { + ExecStart = testServer; + StandardInput = "socket"; + }; + } // removeAttrs config [ "chroot" "serviceConfig" ]; + + __testSteps = lib.mkOrder num '' + subtest '${lib.escape ["\\" "'"] description}', sub { + $machine->succeed('echo ${toString num} > /teststep'); + ${testScript} + }; + ''; + }; + + in { + imports = lib.imap1 mkTestStep [ + { description = "chroot-only confinement"; + config.chroot.confinement = "chroot-only"; + testScript = '' + $machine->succeed( + 'test "$(chroot-exec ls -1 / | paste -sd,)" = bin,nix', + 'test "$(chroot-exec id -u)" = 0', + 'chroot-exec chown 65534 /bin', + ); + ''; + } + { description = "full confinement with APIVFS"; + testScript = '' + $machine->fail( + 'chroot-exec ls -l /etc', + 'chroot-exec ls -l /run', + 'chroot-exec chown 65534 /bin', + ); + $machine->succeed( + 'test "$(chroot-exec id -u)" = 0', + 'chroot-exec chown 0 /bin', + ); + ''; + } + { description = "check existence of bind-mounted /etc"; + config.serviceConfig.BindReadOnlyPaths = [ "/etc" ]; + testScript = '' + $machine->succeed('test -n "$(chroot-exec cat /etc/passwd)"'); + ''; + } + { description = "check if User/Group really runs as non-root"; + config.serviceConfig.User = "chroot-testuser"; + config.serviceConfig.Group = "chroot-testgroup"; + testScript = '' + $machine->succeed('chroot-exec ls -l /dev'); + $machine->succeed('test "$(chroot-exec id -u)" != 0'); + $machine->fail('chroot-exec touch /bin/test'); + ''; + } + (let + symlink = pkgs.runCommand "symlink" { + target = pkgs.writeText "symlink-target" "got me\n"; + } "ln -s \"$target\" \"$out\""; + in { + description = "check if symlinks are properly bind-mounted"; + config.chroot.packages = lib.singleton symlink; + testScript = '' + $machine->fail('chroot-exec test -e /etc'); + $machine->succeed('chroot-exec cat ${symlink} >&2'); + $machine->succeed('test "$(chroot-exec cat ${symlink})" = "got me"'); + ''; + }) + { description = "check if StateDirectory works"; + config.serviceConfig.User = "chroot-testuser"; + config.serviceConfig.Group = "chroot-testgroup"; + config.serviceConfig.StateDirectory = "testme"; + testScript = '' + $machine->succeed('chroot-exec touch /tmp/canary'); + $machine->succeed('chroot-exec "echo works > /var/lib/testme/foo"'); + $machine->succeed('test "$(< /var/lib/testme/foo)" = works'); + $machine->succeed('test ! -e /tmp/canary'); + ''; + } + ]; + + options.__testSteps = lib.mkOption { + type = lib.types.lines; + description = "All of the test steps combined as a single script."; + }; + + config.environment.systemPackages = lib.singleton testClient; + + config.users.groups.chroot-testgroup = {}; + config.users.users.chroot-testuser = { + description = "Chroot Test User"; + group = "chroot-testgroup"; + }; + }; + + testScript = { nodes, ... }: '' + $machine->waitForUnit('multi-user.target'); + ${nodes.machine.config.__testSteps} + ''; +} From 0ba48f46dacf1d0771cb1995a9a0ff6c1bd2e4fb Mon Sep 17 00:00:00 2001 From: aszlig Date: Thu, 14 Mar 2019 15:26:10 +0100 Subject: [PATCH 2/8] nixos/systemd-chroot: Rename chroot to confinement Quoting @edolstra from [1]: I don't really like the name "chroot", something like "confine[ment]" or "restrict" seems better. Conceptually we're not providing a completely different filesystem tree but a restricted view of the same tree. I already used "confinement" as a sub-option and I do agree that "chroot" sounds a bit too specific (especially because not *only* chroot is involved). So this changes the module name and its option to use "confinement" instead of "chroot" and also renames the "chroot.confinement" to "confinement.mode". [1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704 Signed-off-by: aszlig --- nixos/modules/module-list.nix | 2 +- ...emd-chroot.nix => systemd-confinement.nix} | 26 +++++++++---------- nixos/tests/all-tests.nix | 2 +- ...emd-chroot.nix => systemd-confinement.nix} | 12 ++++----- 4 files changed, 21 insertions(+), 21 deletions(-) rename nixos/modules/security/{systemd-chroot.nix => systemd-confinement.nix} (88%) rename nixos/tests/{systemd-chroot.nix => systemd-confinement.nix} (92%) diff --git a/nixos/modules/module-list.nix b/nixos/modules/module-list.nix index 768bc40d1796..ab49bd549a87 100644 --- a/nixos/modules/module-list.nix +++ b/nixos/modules/module-list.nix @@ -170,7 +170,7 @@ ./security/rtkit.nix ./security/wrappers/default.nix ./security/sudo.nix - ./security/systemd-chroot.nix + ./security/systemd-confinement.nix ./services/admin/oxidized.nix ./services/admin/salt/master.nix ./services/admin/salt/minion.nix diff --git a/nixos/modules/security/systemd-chroot.nix b/nixos/modules/security/systemd-confinement.nix similarity index 88% rename from nixos/modules/security/systemd-chroot.nix rename to nixos/modules/security/systemd-confinement.nix index befe2d3418c9..dc53bbc4dbb9 100644 --- a/nixos/modules/security/systemd-chroot.nix +++ b/nixos/modules/security/systemd-confinement.nix @@ -8,7 +8,7 @@ let in { options.systemd.services = lib.mkOption { type = types.attrsOf (types.submodule ({ name, config, ... }: { - options.chroot.enable = lib.mkOption { + options.confinement.enable = lib.mkOption { type = types.bool; default = false; description = '' @@ -20,7 +20,7 @@ in { ''; }; - options.chroot.packages = lib.mkOption { + options.confinement.packages = lib.mkOption { type = types.listOf (types.either types.str types.package); default = []; description = let @@ -44,7 +44,7 @@ in { ''; }; - options.chroot.withBinSh = lib.mkOption { + options.confinement.withBinSh = lib.mkOption { type = types.bool; default = true; description = '' @@ -59,7 +59,7 @@ in { ''; }; - options.chroot.confinement = lib.mkOption { + options.confinement.mode = lib.mkOption { type = types.enum [ "full-apivfs" "chroot-only" ]; default = "full-apivfs"; description = '' @@ -81,16 +81,16 @@ in { ''; }; - config = lib.mkIf config.chroot.enable { + config = lib.mkIf config.confinement.enable { serviceConfig = let rootName = "${mkPathSafeName name}-chroot"; in { RootDirectory = pkgs.runCommand rootName {} "mkdir \"$out\""; TemporaryFileSystem = "/"; MountFlags = lib.mkDefault "private"; - } // lib.optionalAttrs config.chroot.withBinSh { + } // lib.optionalAttrs config.confinement.withBinSh { BindReadOnlyPaths = [ "${pkgs.dash}/bin/dash:/bin/sh" ]; - } // lib.optionalAttrs (config.chroot.confinement == "full-apivfs") { + } // lib.optionalAttrs (config.confinement.mode == "full-apivfs") { MountAPIVFS = true; PrivateDevices = true; PrivateTmp = true; @@ -99,7 +99,7 @@ in { ProtectKernelModules = true; ProtectKernelTunables = true; }; - chroot.packages = let + confinement.packages = let startOnly = config.serviceConfig.RootDirectoryStartOnly or false; execOpts = if startOnly then [ "ExecStart" ] else [ "ExecReload" "ExecStart" "ExecStartPost" "ExecStartPre" "ExecStop" @@ -108,7 +108,7 @@ in { execPkgs = lib.concatMap (opt: let isSet = config.serviceConfig ? ${opt}; in lib.optional isSet config.serviceConfig.${opt}) execOpts; - in execPkgs ++ lib.optional config.chroot.withBinSh pkgs.dash; + in execPkgs ++ lib.optional config.confinement.withBinSh pkgs.dash; }; })); }; @@ -116,8 +116,8 @@ in { config.assertions = lib.concatLists (lib.mapAttrsToList (name: cfg: let whatOpt = optName: "The 'serviceConfig' option '${optName}' for" + " service '${name}' is enabled in conjunction with" - + " 'chroot.enable'"; - in lib.optionals cfg.chroot.enable [ + + " 'confinement.enable'"; + in lib.optionals cfg.confinement.enable [ { assertion = !cfg.serviceConfig.RootDirectoryStartOnly or false; message = "${whatOpt "RootDirectoryStartOnly"}, but right now systemd" + " doesn't support restricting bind-mounts to 'ExecStart'." @@ -133,7 +133,7 @@ in { config.systemd.packages = lib.concatLists (lib.mapAttrsToList (name: cfg: let rootPaths = let - contents = lib.concatStringsSep "\n" cfg.chroot.packages; + contents = lib.concatStringsSep "\n" cfg.confinement.packages; in pkgs.writeText "${mkPathSafeName name}-string-contexts.txt" contents; chrootPaths = pkgs.runCommand "${mkPathSafeName name}-chroot-paths" { @@ -156,5 +156,5 @@ in { fi done < "$closureInfo/store-paths" >> "$serviceFile" ''; - in lib.optional cfg.chroot.enable chrootPaths) config.systemd.services); + in lib.optional cfg.confinement.enable chrootPaths) config.systemd.services); } diff --git a/nixos/tests/all-tests.nix b/nixos/tests/all-tests.nix index fe67e2453505..70103c4e6dad 100644 --- a/nixos/tests/all-tests.nix +++ b/nixos/tests/all-tests.nix @@ -216,7 +216,7 @@ in switchTest = handleTest ./switch-test.nix {}; syncthing-relay = handleTest ./syncthing-relay.nix {}; systemd = handleTest ./systemd.nix {}; - systemd-chroot = handleTest ./systemd-chroot.nix {}; + systemd-confinement = handleTest ./systemd-confinement.nix {}; taskserver = handleTest ./taskserver.nix {}; telegraf = handleTest ./telegraf.nix {}; tomcat = handleTest ./tomcat.nix {}; diff --git a/nixos/tests/systemd-chroot.nix b/nixos/tests/systemd-confinement.nix similarity index 92% rename from nixos/tests/systemd-chroot.nix rename to nixos/tests/systemd-confinement.nix index 523e1ad9f4df..448d34ec30bb 100644 --- a/nixos/tests/systemd-chroot.nix +++ b/nixos/tests/systemd-confinement.nix @@ -1,5 +1,5 @@ import ./make-test.nix { - name = "systemd-chroot"; + name = "systemd-confinement"; machine = { pkgs, lib, ... }: let testServer = pkgs.writeScript "testserver.sh" '' @@ -26,13 +26,13 @@ import ./make-test.nix { }; systemd.services."test${toString num}@" = { - description = "Chrooted Test Service ${toString num}"; - chroot = (config.chroot or {}) // { enable = true; }; + description = "Confined Test Service ${toString num}"; + confinement = (config.confinement or {}) // { enable = true; }; serviceConfig = (config.serviceConfig or {}) // { ExecStart = testServer; StandardInput = "socket"; }; - } // removeAttrs config [ "chroot" "serviceConfig" ]; + } // removeAttrs config [ "confinement" "serviceConfig" ]; __testSteps = lib.mkOrder num '' subtest '${lib.escape ["\\" "'"] description}', sub { @@ -45,7 +45,7 @@ import ./make-test.nix { in { imports = lib.imap1 mkTestStep [ { description = "chroot-only confinement"; - config.chroot.confinement = "chroot-only"; + config.confinement.mode = "chroot-only"; testScript = '' $machine->succeed( 'test "$(chroot-exec ls -1 / | paste -sd,)" = bin,nix', @@ -88,7 +88,7 @@ import ./make-test.nix { } "ln -s \"$target\" \"$out\""; in { description = "check if symlinks are properly bind-mounted"; - config.chroot.packages = lib.singleton symlink; + config.confinement.packages = lib.singleton symlink; testScript = '' $machine->fail('chroot-exec test -e /etc'); $machine->succeed('chroot-exec cat ${symlink} >&2'); From 46f7dd436f4b10d9c6cdde737d4da3ffce8e88be Mon Sep 17 00:00:00 2001 From: aszlig Date: Thu, 14 Mar 2019 19:07:03 +0100 Subject: [PATCH 3/8] nixos/confinement: Allow to configure /bin/sh Another thing requested by @edolstra in [1]: We should not provide a different /bin/sh in the chroot, that's just asking for confusion and random shell script breakage. It should be the same shell (i.e. bash) as in a regular environment. While I personally would even go as far to even have a very restricted shell that is not even a shell and basically *only* allows "/bin/sh -c" with only *very* minimal parsing of shell syntax, I do agree that people expect /bin/sh to be bash (or the one configured by environment.binsh) on NixOS. So this should make both others and me happy in that I could just use confinement.binSh = "${pkgs.dash}/bin/dash" for the services I confine. [1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704 Signed-off-by: aszlig --- .../modules/security/systemd-confinement.nix | 35 ++++++++++++------- nixos/tests/systemd-confinement.nix | 26 ++++++++++++++ 2 files changed, 49 insertions(+), 12 deletions(-) diff --git a/nixos/modules/security/systemd-confinement.nix b/nixos/modules/security/systemd-confinement.nix index dc53bbc4dbb9..a8367ca5eede 100644 --- a/nixos/modules/security/systemd-confinement.nix +++ b/nixos/modules/security/systemd-confinement.nix @@ -1,6 +1,7 @@ { config, pkgs, lib, ... }: let + toplevelConfig = config; inherit (lib) types; inherit (import ../system/boot/systemd-lib.nix { inherit config pkgs lib; @@ -44,12 +45,15 @@ in { ''; }; - options.confinement.withBinSh = lib.mkOption { - type = types.bool; - default = true; + options.confinement.binSh = lib.mkOption { + type = types.nullOr types.path; + default = toplevelConfig.environment.binsh; + defaultText = "config.environment.binsh"; + example = lib.literalExample "\${pkgs.dash}/bin/dash"; description = '' - Whether to symlink dash as - /bin/sh to the chroot. + The program to make available as /bin/sh inside + the chroot. If this is set to null, no + /bin/sh is provided at all. This is useful for some applications, which for example use the @@ -81,15 +85,14 @@ in { ''; }; - config = lib.mkIf config.confinement.enable { - serviceConfig = let - rootName = "${mkPathSafeName name}-chroot"; - in { + config = let + rootName = "${mkPathSafeName name}-chroot"; + inherit (config.confinement) binSh; + in lib.mkIf config.confinement.enable { + serviceConfig = { RootDirectory = pkgs.runCommand rootName {} "mkdir \"$out\""; TemporaryFileSystem = "/"; MountFlags = lib.mkDefault "private"; - } // lib.optionalAttrs config.confinement.withBinSh { - BindReadOnlyPaths = [ "${pkgs.dash}/bin/dash:/bin/sh" ]; } // lib.optionalAttrs (config.confinement.mode == "full-apivfs") { MountAPIVFS = true; PrivateDevices = true; @@ -108,7 +111,7 @@ in { execPkgs = lib.concatMap (opt: let isSet = config.serviceConfig ? ${opt}; in lib.optional isSet config.serviceConfig.${opt}) execOpts; - in execPkgs ++ lib.optional config.confinement.withBinSh pkgs.dash; + in execPkgs ++ lib.optional (binSh != null) binSh; }; })); }; @@ -146,6 +149,14 @@ in { echo '[Service]' > "$serviceFile" + # /bin/sh is special here, because the option value could contain a + # symlink and we need to properly resolve it. + ${lib.optionalString (cfg.confinement.binSh != null) '' + binsh=${lib.escapeShellArg cfg.confinement.binSh} + realprog="$(readlink -e "$binsh")" + echo "BindReadOnlyPaths=$realprog:/bin/sh" >> "$serviceFile" + ''} + while read storePath; do if [ -L "$storePath" ]; then # Currently, systemd can't cope with symlinks in Bind(ReadOnly)Paths, diff --git a/nixos/tests/systemd-confinement.nix b/nixos/tests/systemd-confinement.nix index 448d34ec30bb..63cb074d7ca1 100644 --- a/nixos/tests/systemd-confinement.nix +++ b/nixos/tests/systemd-confinement.nix @@ -106,6 +106,32 @@ import ./make-test.nix { $machine->succeed('test ! -e /tmp/canary'); ''; } + { description = "check if /bin/sh works"; + testScript = '' + $machine->succeed( + 'chroot-exec test -e /bin/sh', + 'test "$(chroot-exec \'/bin/sh -c "echo bar"\')" = bar', + ); + ''; + } + { description = "check if suppressing /bin/sh works"; + config.confinement.binSh = null; + testScript = '' + $machine->succeed( + 'chroot-exec test ! -e /bin/sh', + 'test "$(chroot-exec \'/bin/sh -c "echo foo"\')" != foo', + ); + ''; + } + { description = "check if we can set /bin/sh to something different"; + config.confinement.binSh = "${pkgs.hello}/bin/hello"; + testScript = '' + $machine->succeed( + 'chroot-exec test -e /bin/sh', + 'test "$(chroot-exec /bin/sh -g foo)" = foo', + ); + ''; + } ]; options.__testSteps = lib.mkOption { From 9e9af4f9c076f382bc40821551beaeb68ca071cd Mon Sep 17 00:00:00 2001 From: aszlig Date: Thu, 14 Mar 2019 19:48:20 +0100 Subject: [PATCH 4/8] nixos/confinement: Allow to include the full unit From @edolstra at [1]: BTW we probably should take the closure of the whole unit rather than just the exec commands, to handle things like Environment variables. With this commit, there is now a "fullUnit" option, which can be enabled to include the full closure of the service unit into the chroot. However, I did not enable this by default, because I do disagree here and *especially* things like environment variables or environment files shouldn't be in the closure of the chroot. For example if you have something like: { pkgs, ... }: { systemd.services.foobar = { serviceConfig.EnvironmentFile = ${pkgs.writeText "secrets" '' user=admin password=abcdefg ''; }; } We really do not want the *file* to end up in the chroot, but rather just the environment variables to be exported. Another thing is that this makes it less predictable what actually will end up in the chroot, because we have a "globalEnvironment" option that will get merged in as well, so users adding stuff to that option will also make it available in confined units. I also added a big fat warning about that in the description of the fullUnit option. [1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704 Signed-off-by: aszlig --- .../modules/security/systemd-confinement.nix | 27 ++++++++++++++++--- nixos/tests/systemd-confinement.nix | 13 +++++++++ 2 files changed, 37 insertions(+), 3 deletions(-) diff --git a/nixos/modules/security/systemd-confinement.nix b/nixos/modules/security/systemd-confinement.nix index a8367ca5eede..fc0ce020afc7 100644 --- a/nixos/modules/security/systemd-confinement.nix +++ b/nixos/modules/security/systemd-confinement.nix @@ -21,6 +21,22 @@ in { ''; }; + options.confinement.fullUnit = lib.mkOption { + type = types.bool; + default = false; + description = '' + Whether to include the full closure of the systemd unit file into the + chroot, instead of just the dependencies for the executables. + + While it may be tempting to just enable this option to + make things work quickly, please be aware that this might add paths + to the closure of the chroot that you didn't anticipate. It's better + to use to explicitly add additional store paths to the + chroot. + ''; + }; + options.confinement.packages = lib.mkOption { type = types.listOf (types.either types.str types.package); default = []; @@ -32,7 +48,9 @@ in { ${lib.concatMapStringsSep ", " mkScOption [ "ExecReload" "ExecStartPost" "ExecStartPre" "ExecStop" "ExecStopPost" - ]} and ${mkScOption "ExecStart"} options. + ]} and ${mkScOption "ExecStart"} options. If you want to have all the + dependencies of this systemd unit, you can use + . Only the latter (${mkScOption "ExecStart"}) will be used if @@ -87,7 +105,7 @@ in { config = let rootName = "${mkPathSafeName name}-chroot"; - inherit (config.confinement) binSh; + inherit (config.confinement) binSh fullUnit; in lib.mkIf config.confinement.enable { serviceConfig = { RootDirectory = pkgs.runCommand rootName {} "mkdir \"$out\""; @@ -111,7 +129,10 @@ in { execPkgs = lib.concatMap (opt: let isSet = config.serviceConfig ? ${opt}; in lib.optional isSet config.serviceConfig.${opt}) execOpts; - in execPkgs ++ lib.optional (binSh != null) binSh; + unitAttrs = toplevelConfig.systemd.units."${name}.service"; + allPkgs = lib.singleton (builtins.toJSON unitAttrs); + unitPkgs = if fullUnit then allPkgs else execPkgs; + in unitPkgs ++ lib.optional (binSh != null) binSh; }; })); }; diff --git a/nixos/tests/systemd-confinement.nix b/nixos/tests/systemd-confinement.nix index 63cb074d7ca1..b7b10fb36aac 100644 --- a/nixos/tests/systemd-confinement.nix +++ b/nixos/tests/systemd-confinement.nix @@ -132,6 +132,19 @@ import ./make-test.nix { ); ''; } + { description = "check if only Exec* dependencies are included"; + config.environment.FOOBAR = pkgs.writeText "foobar" "eek\n"; + testScript = '' + $machine->succeed('test "$(chroot-exec \'cat "$FOOBAR"\')" != eek'); + ''; + } + { description = "check if all unit dependencies are included"; + config.environment.FOOBAR = pkgs.writeText "foobar" "eek\n"; + config.confinement.fullUnit = true; + testScript = '' + $machine->succeed('test "$(chroot-exec \'cat "$FOOBAR"\')" = eek'); + ''; + } ]; options.__testSteps = lib.mkOption { From d13ad389b4a4ccaae3f3732f3735984814dbb851 Mon Sep 17 00:00:00 2001 From: aszlig Date: Fri, 15 Mar 2019 04:13:01 +0100 Subject: [PATCH 5/8] nixos/confinement: Explicitly set serviceConfig My implementation was relying on PrivateDevices, PrivateTmp, PrivateUsers and others to be false by default if chroot-only mode is used. However there is an ongoing effort[1] to change these defaults, which then will actually increase the attack surface in chroot-only mode, because it is expected that there is no /dev, /sys or /proc. If for example PrivateDevices is enabled by default, there suddenly will be a mounted /dev in the chroot and we wouldn't detect it. Fortunately, our tests cover that, but I'm preparing for this anyway so that we have a smoother transition without the need to fix our implementation again. Thanks to @Infinisil for the heads-up. [1]: https://github.com/NixOS/nixpkgs/issues/14645 Signed-off-by: aszlig --- .../modules/security/systemd-confinement.nix | 28 +++++++++++++------ 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/nixos/modules/security/systemd-confinement.nix b/nixos/modules/security/systemd-confinement.nix index fc0ce020afc7..49fde2dcc6d5 100644 --- a/nixos/modules/security/systemd-confinement.nix +++ b/nixos/modules/security/systemd-confinement.nix @@ -106,19 +106,31 @@ in { config = let rootName = "${mkPathSafeName name}-chroot"; inherit (config.confinement) binSh fullUnit; + wantsAPIVFS = lib.mkDefault (config.confinement.mode == "full-apivfs"); in lib.mkIf config.confinement.enable { serviceConfig = { RootDirectory = pkgs.runCommand rootName {} "mkdir \"$out\""; TemporaryFileSystem = "/"; MountFlags = lib.mkDefault "private"; - } // lib.optionalAttrs (config.confinement.mode == "full-apivfs") { - MountAPIVFS = true; - PrivateDevices = true; - PrivateTmp = true; - PrivateUsers = true; - ProtectControlGroups = true; - ProtectKernelModules = true; - ProtectKernelTunables = true; + + # https://github.com/NixOS/nixpkgs/issues/14645 is a future attempt + # to change some of these to default to true. + # + # If we run in chroot-only mode, having something like PrivateDevices + # set to true by default will mount /dev within the chroot, whereas + # with "chroot-only" it's expected that there are no /dev, /proc and + # /sys file systems available. + # + # However, if this suddenly becomes true, the attack surface will + # increase, so let's explicitly set these options to true/false + # depending on the mode. + MountAPIVFS = wantsAPIVFS; + PrivateDevices = wantsAPIVFS; + PrivateTmp = wantsAPIVFS; + PrivateUsers = wantsAPIVFS; + ProtectControlGroups = wantsAPIVFS; + ProtectKernelModules = wantsAPIVFS; + ProtectKernelTunables = wantsAPIVFS; }; confinement.packages = let startOnly = config.serviceConfig.RootDirectoryStartOnly or false; From 861a1cec60e202a2a2d17fd61bbfae0264168115 Mon Sep 17 00:00:00 2001 From: aszlig Date: Wed, 27 Mar 2019 20:16:44 +0100 Subject: [PATCH 6/8] nixos/confinement: Remove handling for StartOnly Noted by @Infinisil on IRC: infinisil: Question regarding the confinement PR infinisil: On line 136 you do different things depending on RootDirectoryStartOnly infinisil: But on line 157 you have an assertion that disallows that option being true infinisil: Is there a reason behind this or am I missing something I originally left this in so that once systemd supports that, we can just flip a switch and remove the assertion and thus support RootDirectoryStartOnly for our confinement module. However, this doesn't seem to be on the roadmap for systemd in the foreseeable future, so I'll just remove this, especially because it's very easy to add it again, once it is supported. Signed-off-by: aszlig --- nixos/modules/security/systemd-confinement.nix | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/nixos/modules/security/systemd-confinement.nix b/nixos/modules/security/systemd-confinement.nix index 49fde2dcc6d5..31b07b1b03d2 100644 --- a/nixos/modules/security/systemd-confinement.nix +++ b/nixos/modules/security/systemd-confinement.nix @@ -52,11 +52,7 @@ in { dependencies of this systemd unit, you can use . - Only the latter - (${mkScOption "ExecStart"}) will be used if - ${mkScOption "RootDirectoryStartOnly"} is enabled. - - Also, the store paths listed in are + The store paths listed in are not included in the closure as well as paths from other options except those listed above. @@ -133,8 +129,7 @@ in { ProtectKernelTunables = wantsAPIVFS; }; confinement.packages = let - startOnly = config.serviceConfig.RootDirectoryStartOnly or false; - execOpts = if startOnly then [ "ExecStart" ] else [ + execOpts = [ "ExecReload" "ExecStart" "ExecStartPost" "ExecStartPre" "ExecStop" "ExecStopPost" ]; From 52299bccf5a56f6af8a204a71c908c7b7623facb Mon Sep 17 00:00:00 2001 From: aszlig Date: Wed, 27 Mar 2019 20:27:02 +0100 Subject: [PATCH 7/8] nixos/confinement: Use PrivateMounts option So far we had MountFlags = "private", but as @Infinisil has correctly noticed, there is a dedicated PrivateMounts option, which does exactly that and is better integrated than providing raw mount flags. When checking for the reason why I used MountFlags instead of PrivateMounts, I found that at the time I wrote the initial version of this module (Mar 12 06:15:58 2018 +0100) the PrivateMounts option didn't exist yet and has been added to systemd in Jun 13 08:20:18 2018 +0200. Signed-off-by: aszlig --- nixos/modules/security/systemd-confinement.nix | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nixos/modules/security/systemd-confinement.nix b/nixos/modules/security/systemd-confinement.nix index 31b07b1b03d2..cd4eb81dbe19 100644 --- a/nixos/modules/security/systemd-confinement.nix +++ b/nixos/modules/security/systemd-confinement.nix @@ -107,7 +107,7 @@ in { serviceConfig = { RootDirectory = pkgs.runCommand rootName {} "mkdir \"$out\""; TemporaryFileSystem = "/"; - MountFlags = lib.mkDefault "private"; + PrivateMounts = lib.mkDefault true; # https://github.com/NixOS/nixpkgs/issues/14645 is a future attempt # to change some of these to default to true. From ada3239253444a6ac546f1e734988f117b3a289d Mon Sep 17 00:00:00 2001 From: aszlig Date: Wed, 27 Mar 2019 21:07:07 +0100 Subject: [PATCH 8/8] nixos/release-notes: Add entry about confinement First of all, the reason I added this to the "highlights" section is that we want users to be aware of these options, because in the end we really want to decrease the attack surface of NixOS services and this is a step towards improving that situation. The reason why I'm adding this to the changelog of the NixOS 19.03 release instead of 19.09 is that it makes backporting services that use these options easier. Doing the backport of the confinement module after the official release would mean that it's not part of the release announcement and potentially could fall under the radar of most users. These options and the whole module also do not change anything in existing services or affect other modules, so they're purely optional. Adding this "last minute" to the 19.03 release doesn't hurt and is probably a good preparation for the next months where we hopefully confine as much services as we can :-) I also have asked @samueldr and @lheckemann, whether they're okay with the inclusion in 19.03. While so far only @samueldr has accepted the change, we can still move the changelog entry to the NixOS 19.09 release notes in case @lheckemann rejects it. Signed-off-by: aszlig --- nixos/doc/manual/release-notes/rl-1903.xml | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/nixos/doc/manual/release-notes/rl-1903.xml b/nixos/doc/manual/release-notes/rl-1903.xml index a82724d7fb57..dddf4ed6016a 100644 --- a/nixos/doc/manual/release-notes/rl-1903.xml +++ b/nixos/doc/manual/release-notes/rl-1903.xml @@ -64,6 +64,17 @@ See: for details. + + + There is now a set of options for + , which allows to restrict services + into a + chroot + 2 + ed environment that only contains the store paths from + the runtime closure of the service. + +