New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setuid root does not work in sandbox #2522
Comments
Yes, that sounds likely. From the user_namespaces manpage:
So for this to work, there would have to be a uid mapping to the root user in the parent namespace. |
This would only be set when |
Curious what the use case is here, can you elaborate? It feels slightly wrong to be doing setuid things inside a build sandbox but I'm probably missing something. |
I should write a FAQ on this :-) We have a Hydra cluster running CI benchmarks on some open source telecom network stack called Snabb. The Nix job builds a benchmark report with Rmarkdown and that involves running many (thousands) benchmark jobs. The benchmark jobs need root privileges due to the nature of the software e.g. running userspace 10G/100G ethernet device drivers, resolving physical memory, performing DMA, etc. This cannot be wrapped in virtualization due to the performance-sensitive nature e.g. enabling the IOMMU could impact performance. Nix has been working amazingly for us based on the setup that @domenkozar cooked up. He has even had it running whole OpenStack clusters within the test environment. Have to do a NixCon talk about it one day. Meanwhile: Got to port the %!#@ thing to Nix 2.0 ;) |
Nix is also used to test pull requests including checking for breakage in our bespoke device drivers etc. So being able to run applications with superuser privileges inside Nix is really crucial to our testing strategy. |
Yes, having the mapping in general shouldn't hurt but it's best to be on the safe side. |
If Nix is run with the --allow-new-privileges option then it should be possible to run setuid binaries to gain root access. However, this did not work in practice because the root user/group did not exist in the sandbox due to exclusion from the kernel user-namespace. The uid would always be nixbld (1000) even when executing a setuid-root binary. This change adds uid/gid 0 to the sandbox userns only when --allow-new-privileges is enabled and makes setuid work as expected. Note: This makes it effective to *run* setuid executables but it does not make it possible to create them. This means that in practice to gain root in the sandbox you must both provide --allow-new-privileges and also add a suitable binary to the Nix sandbox path. Resolves NixOS#2522.
Related problem: Once I acquire uid/gid 0 inside the sandbox I still can't do other things like mount a hugetlbfs filesystem. Is this another kernel namespace causing hassle? (Can I somehow be rid of these restrictions once and for all somehow or is it better to take them one at a time?) |
Specifically the
|
This is a limitation the kernel puts in place. In a usernamespace processes even if root are only allowed to mount filesystems that have been white-listed. For that reason for example Ubuntu had a patch that allowed overlayfs. Bind mounts should be fine though. |
@Mic92 Can I relax this restriction somehow? How does white-listing work? I am running a program that expects to be able to mount filesystems within the sandbox. These are not mounted on the host and I don't want to mount them there either. |
It is set at compile time for example procfs set the FS_USERNS_MOUNT flag: |
The easiest way would be to ask nix to mount those from somewhere into the sandbox. |
Also though in the nix case this is still not ideal because in particular hugetablefs would be shared across builds. |
Tricky. On the one hand I don't really mind sharing the directory between sandboxes. On the other hand I want the application to have superuser privileges and not have to make special-case workarounds for each and every thing that it wants to do. Just now I lean towards patching the kernel to whitelist |
On further reflection it's probably not safe to share one So it might have to be the kernel patching route. |
You have to be aware that this flag is only set if it is safe to do so. |
For what definition of "safe" though? I am not concerned about security on the local machine here e.g. privilege escalation within the sandbox. I am concerned about machine crashed though :). If I bind-mount a shared directory from the host then I believe I'll also need to either prevent sandboxes from using the same PIDs (I'm assuming that they do now) or else patch the application I'm working with to protect against collisions with other processes having the same pid. This seems messy compared with a one-line kernel patch if that would work. |
... Could also be that updating the application to support running multiple instances inside userns namespaces with a shared hugetlbfs folder [pause for breath] is a legitimate enhancement that can be useful in other deployment contexts in the future. Today I can bypass userns if I really want to but maybe tomorrow I can't because a user wants to deploy in a more restricted environment. Have to ponder... |
By |
Thank you very much for the information. I'll consider my options until the next time I'm back at the keys. |
It might be even possible to compile a |
Reflection: This all seems to "just work" with |
Docker does not have usernamespaces enabled by default. |
Could be that what I need to do is disable usernamespaces? Have to take a step back and think about the goals/requirements for benchmarking system software with Nix where running on full bare metal takes priority over local privilege management on the build host. Otherwise it is just whack-a-mole. |
Well, you would need to rewrite some of nix's sandbox logic for that at the moment it depends on usernamespaces to map the builder's uid/gid on the host to uid 1000/gid 1000 |
This may be the right solution. The main downside I see is that untracked dependencies ("impurities") can creep into the builds. However, compared with fighting kernel features like userns that starts to seem pretty minor. Some other pros/cons that I overlook? |
Yeah, disabling user namespaces sounds like the way to go. It could be set per derivation using
Alternatively, you could just disable sandboxing, since you might not need it for benchmarking anyway? |
Can I have the chroot of the sandbox only? Is that compatible with the overall approach that Nix takes? |
You can always just run your daemon in a chroot that only sees
/nix/store (and /bin/sh and /usr/bin/env). Enumerating the store is
slow enough to notice in the benchmarking (and hard to write by
accident), and if the build has a path reference directly, it would be
in the sandbox anyway.
|
Not sure how your setup works, but you could also disable the sandbox just for the benchmark itself but not the build. Well you could even build a docker container with nix that contains the benchmark to have some test isolation: https://nixos.org/nixpkgs/manual/#sec-pkgs-dockerTools or a nixos-container based one: https://nixos.org/nixos/manual/index.html#ch-containers |
This is much trickier than I had anticipated :-) thanks for all the feedback everybody! Suppose we follow the route of running the build without a Nix sandbox but then having the build/test script construct a sandbox of its own. For example to create a chroot that shadows Is that a viable approach and what some of these suggestions are hinting at? This way we still have a modest level of test isolation, we avoid advanced kernel features like usernamespaces, and we avoid depending on the intricacies of |
@edolstra How would that be implemented? Just conditionalize some parts of |
Yes, basically make passing |
@edolstra I made a naieve attempt at conditionalizing the userns but nothing works with this attempt:
Maybe I disabled the usernamespace but other parts of |
I marked this as stale due to inactivity. → More info |
I closed this issue due to inactivity. → More info |
Running a setuid-root binary inside a Nix sandbox does not actually set the uid to 0. For example
whoami
will reportnixbld
andsecurity.wappers
programs will crash with an assertion failure when detecting that the effective uid does not match the file uid.Can reproduce either by calling
/run/wrappers/bin/sudo
in the sandbox (have to make it visible) or by running/run/as/root $(which id)
with my little asroot NixOS module.Seems to me like the
allow-new-privileges
option was added to make this work (#1429) but that this is not working with Nix from NixOS 18.09.Notes:
nosetuid
flag.allow-new-privileges = true
.Wild guess: Could this be due to userns sandboxing not providing a usable root user?
The text was updated successfully, but these errors were encountered: