Automatic UID allocation #3600

Open: edolstra wants to merge 9 commits into master from auto-uid-allocation

edolstra (Member) commented May 20, 2020

This adds an option, auto-allocate-uids, which provides an alternative to having a nixbld group of pre-created build users. When enabled, Nix allocates UIDs/GIDs dynamically, starting at 872415232.
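
To make the numbers concrete, here is a minimal Python sketch of carving per-build ID blocks out of a pool that starts at 872415232. Only that start value comes from the description above; the slot-based layout, the helper name uid_range_for_slot, and the assumption that an ordinary build needs a single ID are illustrative, not necessarily how the PR's C++ code is structured.

```python
# Hypothetical sketch: give each build "slot" a contiguous block of IDs.
# Only START_ID comes from the PR description; the slot layout is assumed.

START_ID = 872415232  # first automatically allocated UID/GID


def uid_range_for_slot(slot: int, ids_per_build: int) -> range:
    """Return the host UIDs reserved for build slot `slot` (0-based)."""
    if slot < 0 or ids_per_build <= 0:
        raise ValueError("slot must be >= 0 and ids_per_build > 0")
    first = START_ID + slot * ids_per_build
    return range(first, first + ids_per_build)


if __name__ == "__main__":
    # Assumed: 1 ID for an ordinary build, 65,536 for a uid-range build.
    for size in (1, 65536):
        r = uid_range_for_slot(2, size)
        print(f"slot 2, {size} ID(s): {r.start}..{r[-1]}")
```

Note that 872415232 = 0x34000000 = 13312 * 65536, so 65,536-ID blocks laid end to end from it stay aligned to multiples of 65,536.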

It also adds a system feature, uid-range, that causes a build to be executed as root in a UID namespace with 65,536 UIDs available, and a system feature, systemd-cgroup, that causes a build to be executed in a cgroup namespace where the systemd cgroup hierarchy is available. Together these allow things like systemd-nspawn and NixOS containers to run inside a Nix build.
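
Running a build "as root in a UID namespace with 65,536 UIDs" comes down to the kernel's user-namespace ID maps. The sketch below only builds and interprets the text of a /proc/<pid>/uid_map entry (first ID inside the namespace, first ID outside, count); actually writing that file needs the right privileges and is not shown. The host base ID 872415232 is just the start value mentioned above, and the helper names are hypothetical.

```python
# Sketch of the user-namespace ID mapping behind a "uid-range" build.
# A uid_map/gid_map line reads: <first ID inside> <first ID outside> <count>.
# Helper names are hypothetical; nothing here writes to /proc.


def uid_map_line(host_first_id: int, count: int = 65536) -> str:
    """Map namespace IDs 0..count-1 (root first) onto host IDs from host_first_id."""
    return f"0 {host_first_id} {count}\n"


def to_host_id(map_text: str, inner_id: int) -> int:
    """Translate an in-namespace ID to a host ID using uid_map-style text."""
    for line in map_text.splitlines():
        if not line.strip():
            continue
        inside, outside, count = (int(field) for field in line.split())
        if inside <= inner_id < inside + count:
            return outside + (inner_id - inside)
    raise LookupError(f"ID {inner_id} is not mapped")


if __name__ == "__main__":
    mapping = uid_map_line(872415232)
    print(mapping, end="")             # 0 872415232 65536
    print(to_host_id(mapping, 0))      # root in the build -> 872415232
    print(to_host_id(mapping, 1000))   # 872416232
```

Root inside the build is therefore an unprivileged host UID, which is what makes it acceptable to hand the build a whole range.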

Required nix.conf configuration:

experimental-features = auto-allocate-uids systemd-cgroup
auto-allocate-uids = true
system-features = uid-range systemd-cgroup
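
Because auto-allocate-uids is both an experimental-feature name and a setting, it is easy to set only one of them. A hypothetical Python checker for the three lines above could look like this; it understands only plain "key = value" lines and # comments, not the full nix.conf syntax.

```python
# Hypothetical checker for the nix.conf lines above. It only understands
# simple "key = value" lines and "#" comments, not full nix.conf syntax.


def parse_conf(text: str) -> dict[str, str]:
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_container_settings(text: str) -> list[str]:
    """List which of the required settings are absent or wrong."""
    s = parse_conf(text)
    problems = []
    experimental = s.get("experimental-features", "").split()
    for feature in ("auto-allocate-uids", "systemd-cgroup"):
        if feature not in experimental:
            problems.append(f"experimental-features lacks {feature}")
    if s.get("auto-allocate-uids") != "true":
        problems.append("auto-allocate-uids is not set to true")
    system = s.get("system-features", "").split()
    for feature in ("uid-range", "systemd-cgroup"):
        if feature not in system:
            problems.append(f"system-features lacks {feature}")
    return problems


if __name__ == "__main__":
    good = (
        "experimental-features = auto-allocate-uids systemd-cgroup\n"
        "auto-allocate-uids = true\n"
        "system-features = uid-range systemd-cgroup\n"
    )
    print(missing_container_settings(good))  # []
    # Setting the option without the feature gates is reported:
    print(missing_container_settings("auto-allocate-uids = true\n"))
```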

domenkozar (Member) commented May 20, 2020

A few questions/comments:

  1. Does it work on macOS?
  2. Why does it need to be experimental feature?
  3. Missing documentation for the nix.conf option

edolstra (Member, Author) commented May 20, 2020

No, this will never work on macOS since it requires cgroups.

It's an experimental feature so we can remove it if it turns out not to be a good idea, or change the interface (e.g. the names for the requiredSystemFeatures).

7c6f434c (Member) commented May 20, 2020

This needs the systemd cgroup hierarchy to already exist on the host, and moreover it needs the daemon to be inside this cgroup, right? (I'm unsure whether it could just create the necessary structure for the sandbox even if it is missing on the host.)

edolstra (Member, Author) commented May 20, 2020

Yes, it needs the systemd hierarchy to be mounted on /sys/fs/cgroup/systemd. We could create it if it doesn't exist (it's just mount -t cgroup -o none,name=systemd none /sys/fs/cgroup/systemd). I don't think it's possible for a process not to be in a cgroup hierarchy. (If you create a named hierarchy it will contain all processes in the system in the root cgroup.)
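
Whether that hierarchy is present can be checked from /proc/mounts, where each line lists the source, mount point, filesystem type and comma-separated options. A minimal Python sketch follows; the function name is hypothetical, and the sample lines are illustrative rather than captured from a real host (the exact option list varies).

```python
# Hypothetical check: is a cgroup v1 hierarchy named "systemd" mounted at the
# expected place? /proc/mounts fields: source, target, fstype, options, ...


def systemd_hierarchy_mounted(mounts_text: str,
                              mountpoint: str = "/sys/fs/cgroup/systemd") -> bool:
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _source, target, fstype, options = fields[:4]
        if (fstype == "cgroup" and target == mountpoint
                and "name=systemd" in options.split(",")):
            return True
    return False


if __name__ == "__main__":
    # Illustrative lines only; real option lists differ between hosts.
    v1 = "none /sys/fs/cgroup/systemd cgroup rw,relatime,name=systemd 0 0\n"
    v2_only = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
    print(systemd_hierarchy_mounted(v1))       # True
    print(systemd_hierarchy_mounted(v2_only))  # False
```

On a real host one would pass the contents of /proc/mounts; if the check fails, the mount command quoted above (run as root) creates the hierarchy.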

7c6f434c (Member) commented May 20, 2020

I think I have heard of people trying to run systemd on NixOS using only cgroup2; and of course I also wonder what will eventually be needed to run lighter-weight NixOS tests on non-NixOS Linux. I guess the need to create a none,name=systemd cgroup hierarchy could just be mentioned in the error message. Is there a simple test I could run on my non-systemd, Nixpkgs-kernel system to see whether systemd-nspawn indeed works in such setups?

edolstra (Member, Author) commented May 20, 2020

I initially used the unified hierarchy instead of the systemd hierarchy. The unified hierarchy works, but it creates an impurity: which controllers are available in the unified hierarchy depends on the host system configuration. So it seems better to use the systemd hierarchy, since it only allows process tracking and not resource control.
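
The difference between the two choices is visible in /proc/self/cgroup, whose lines have the form hierarchy-ID:controller-list:cgroup-path. A cgroup v1 named hierarchy such as the systemd one shows up as name=systemd with no resource controllers, while the unified (cgroup v2) hierarchy is the line with ID 0 and an empty controller list; the controllers it offers are listed separately in cgroup.controllers and vary from host to host. The parser below is a hypothetical sketch, run on illustrative sample text.

```python
# Hypothetical parser for /proc/<pid>/cgroup text (see cgroups(7)).
# Each line: hierarchy-ID:controller-list:cgroup-path. The cgroup v2
# (unified) hierarchy is the line with ID 0 and an empty controller list.


def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Map each hierarchy (controller list, or "unified") to the cgroup path."""
    result: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        key = "unified" if hierarchy_id == "0" and not controllers else controllers
        result[key] = path
    return result


if __name__ == "__main__":
    # Illustrative sample from a hybrid v1/v2 host.
    sample = (
        "12:name=systemd:/system.slice/nix-daemon.service\n"
        "4:cpu,cpuacct:/\n"
        "0::/system.slice/nix-daemon.service\n"
    )
    for hierarchy, path in parse_proc_cgroup(sample).items():
        print(f"{hierarchy:>12}  {path}")
```

Inside a new cgroup namespace these paths are shown relative to the namespace's root, so a process sitting at that root sees / rather than a host path such as /system.slice/nix-daemon.service.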

If you want to test it on a non-NixOS system, mount the systemd hierarchy and try to build this expression: https://gist.github.com/edolstra/5cb2ec5c79ac8faf208058fd9375b448

7c6f434c (Member) commented May 20, 2020

auto-allocate-uids needs to be both enabled and added to experimental-features; there is an error message saying auto-allocate-uids = true is needed, but a more generic message is what actually gets printed.

Once things are configured: I do get «Container nixos exited successfully.», and the target path is built with reasonable output (and there is a running systemd inside the build but no apparent attempts at escaping happen).

Logs: the firewall failing seems expected, and system-getty.slice failing because of it is slightly surprising. The following might mean I am doing something wrong on the host system, or maybe you get these too:

Failed to create symlink /sys/fs/cgroup/cpuacct: Read-only file system
Failed to create symlink /sys/fs/cgroup/cpu: Read-only file system
Failed to create symlink /sys/fs/cgroup/net_prio: Read-only file system
Failed to create symlink /sys/fs/cgroup/net_cls: Read-only file system

Overall, nix-store -r takes under 3 s of wall-clock time, which is just great; thanks for that feature.

7c6f434c (Member) commented May 20, 2020

Looked at the code and got an idea to test… auto-allocate-uids requires the systemd cgroup hierarchy to exist even if the systemd-cgroup feature is not enabled.

edolstra (Member, Author) commented May 20, 2020

> auto-allocate-uids requires systemd cgroup hierarchy to exist even if systemd-cgroup feature is not enabled.

Yeah, that's currently true; I didn't think about that. In principle, however, it could use any existing hierarchy, since it's only used for tracking processes.
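
Process tracking works with any hierarchy because cgroup membership is inherited across fork: every process a build spawns, including daemonized ones, stays listed in the build cgroup's cgroup.procs file (one PID per line). A minimal Python sketch, assuming a hypothetical per-build cgroup directory; signal_all sends a signal once to every listed process, and a real implementation would have to repeat it until the cgroup is empty, since processes can fork in the meantime.

```python
# Sketch of cgroup-based process tracking, assuming a hypothetical per-build
# cgroup directory. cgroup.procs lists one PID per line, and children stay in
# their parent's cgroup, so daemonized processes are listed too.
import os
import signal
from pathlib import Path


def cgroup_pids(cgroup_dir: Path) -> list[int]:
    """Return the PIDs currently listed in cgroup_dir/cgroup.procs."""
    return [int(tok) for tok in (cgroup_dir / "cgroup.procs").read_text().split()]


def signal_all(cgroup_dir: Path, sig: int = signal.SIGKILL) -> int:
    """Send `sig` once to every listed process; return how many were signalled.

    Processes may fork while this runs, so a caller would repeat it until
    cgroup_pids() comes back empty.
    """
    sent = 0
    for pid in cgroup_pids(cgroup_dir):
        try:
            os.kill(pid, sig)
            sent += 1
        except ProcessLookupError:
            pass  # exited between reading the list and signalling
    return sent
```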

7c6f434c (Member) commented May 20, 2020

… or even mount a «nix» hierarchy if the systemd one is not mounted, I guess…

I am OK with mounting the systemd hierarchy at boot; I guess a slightly more detailed error message is enough. I just have no cgroup hierarchies mounted by default, so it was very cheap for me to check this combination of options.

edolstra (Member, Author) commented May 20, 2020

> firewall failing seems expected

Actually that succeeds for me. Which makes me realize a big problem with this approach: whether certain networking features (firewall, NAT, ...) work depends on which kernel modules are loaded on the host system. I don't think there is any way to restrict access...
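
Which of these features are usable can be partly probed from /proc/modules, whose lines start with the module name. A rough Python sketch with illustrative sample text follows; modules compiled into the kernel do not appear in /proc/modules, and with module autoloading enabled the kernel may load a missing module on demand, so this is only a heuristic and not a way to restrict access.

```python
# Rough heuristic: which kernel modules a build might depend on are loaded?
# /proc/modules lines start with the module name; built-in modules are not
# listed there, and autoloading may load a module on first use anyway.


def loaded_modules(proc_modules_text: str) -> set[str]:
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def not_loaded(wanted: set[str], proc_modules_text: str) -> set[str]:
    """Return the wanted modules that do not appear in the given text."""
    return wanted - loaded_modules(proc_modules_text)


if __name__ == "__main__":
    # Illustrative excerpt; real lines carry size, refcount, users, state, address.
    sample = (
        "veth 32768 0 - Live 0x0000000000000000\n"
        "bridge 294912 0 - Live 0x0000000000000000\n"
    )
    print(sorted(not_loaded({"veth", "iptable_nat"}, sample)))  # ['iptable_nat']
```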

7c6f434c (Member) commented May 20, 2020

Erm. Does that require structured required features (and, in this instance, assertions about host kernel modules) for a proper-ish solution? I guess whoever cares could write a very minimal sinit VM to demonstrate whether some dependency is not listed…

edolstra (Member, Author) commented May 20, 2020

Or maybe a seccomp filter could be used to restrict access to undeclared features.

7c6f434c (Member) commented May 20, 2020

A seccomp filter sounds like something that requires ahead-of-time enumeration of all possible things that could go wrong.

edolstra (Member, Author) commented May 20, 2020

I just realized that the situation isn't that bad. Or rather, it was already bad and this doesn't make it worse. It was already possible to create network devices, firewall tables etc. depending on the host kernel configuration. For example:

with import <nixpkgs> {};

runCommand "foo"
  {
    buildInputs = [ pkgs.utillinux pkgs.iproute pkgs.iptables ];
  }
  ''
    unshare -m -n -U -r -- bash -c "
      set -e
      mkdir -p foo/run foo/nix/store
      mount --rbind /nix/store foo/nix/store
      ip link add foo-h type veth peer name foo-c
      chroot foo iptables -t nat -A POSTROUTING -p tcp -s 192.168.1.2
      chroot foo iptables -t nat -L
    "
    mkdir $out
  ''

This will even cause kernel modules like veth and iptable_nat to be loaded if they're not already.

7c6f434c (Member) commented May 20, 2020

Ah, it might be that the host dependency got exposed by the fact that I did not enable module autoloading. I guess this impurity was «mitigated» by there being little incentive to do such things in public expressions.

edolstra force-pushed the auto-uid-allocation branch from 90b4689 to e263fd4 on Jun 18, 2020
edolstra added 9 commits Oct 31, 2017
Rather than rely on a nixbld group, we now allocate UIDs/GIDs
dynamically starting at a configurable ID (872415232 by default).

Also, we allocate 2^18 UIDs and GIDs per build, and run the build as
root in its UID namespace. (This should not be the default since it
breaks some builds. We should probably make this conditional on a
requiredSystemFeature.) The goal is to be able to run (NixOS)
containers in a build. However, this will also require some cgroup
initialisation.

The 2^18 UIDs/GIDs are intended to provide enough ID space to run
multiple containers per build, e.g. for distributed NixOS tests.

Also, run builds in a cgroup namespace (ensuring /proc/self/cgroup
doesn't leak information about the outside world) and mount /sys. This
enables running systemd-nspawn and thus NixOS containers in a Nix
build.

2^18 was overkill. The idea was to enable multiple containers to run
inside a build. However, those containers can use the same UID range -
we don't really care about perfect isolation between containers inside
a build.

"uid-range" provides 65536 UIDs to a build and runs the build as root
in its user namespace. "systemd-cgroup" allows the build to mount the
systemd cgroup controller (needed for running systemd-nspawn and NixOS
containers).

Also, add a configuration option "auto-allocate-uids" which is needed
to enable these features, and some experimental feature gates.

So to enable support for containers you need the following in
nix.conf:

  experimental-features = auto-allocate-uids systemd-cgroup
  auto-allocate-uids = true
  system-features = uid-range systemd-cgroup

Maybe this should be a separate system feature... /sys exposes a lot
of impure info about the host system.

edolstra force-pushed the auto-uid-allocation branch from e263fd4 to 7349f25 on Jul 6, 2020
3 participants