Use seccomp to forge return values of *chown() syscalls #1131

aszlig · 2016-11-16T11:58:25Z

This should properly address ff0c0b6 so that we forge the return code of the chown() syscalls to be 0 while not actually running them. That way the builder can still run as (user-namespaced) root, which has the advantage that it allows for certain syscalls within that namespace we otherwise wouldn't have access to (like chroot).

This implementation is using libseccomp for generating the filter rules in an architecture-independent way.

My original proof of concept implementation can be found here: https://gist.github.com/aszlig/69c3d7cf92a2c05ee1230857c2734848

Original discussion: ff0c0b6#commitcomment-19619475

Cc: @edolstra, @copumpkin

This largely reverts c68e591. Running builds as root breaks "cp -p", since when running as root, "cp -p" assumes that it can successfully chown() files. But that's not actually the case since the user namespace doesn't provide a complete uid mapping. So it barfs with a fatal error message ("cp: failed to preserve ownership for 'foo': Invalid argument").

aszlig · 2016-11-16T15:37:18Z

Don't merge this yet! Need to fix POSIX ACL handling and fixup the revert, so I'll rebase and write another comment once finished.

This reverts commit ff0c0b6. We're going to use seccomp to allow "cp -p" and force chown-related syscalls to always return 0. Signed-off-by: aszlig <aszlig@redmoonstudios.org>

We're going to use libseccomp instead of creating the raw BPF program, because we have different syscall numbers on different architectures. Although our initial seccomp rules will be quite small, it really doesn't make sense to generate the raw BPF program, because we would need to duplicate it and/or branch on every single architecture we want to support. Signed-off-by: aszlig <aszlig@redmoonstudios.org>

What we basically want is a seccomp mode 2 BPF program like this but for every architecture:

    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_chown, 4, 0),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_fchown, 3, 0),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_fchownat, 2, 0),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_lchown, 1, 0),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ERRNO)

However, on 32-bit architectures we do have chown32, lchown32 and fchown32, so we'd need to add all the architecture blurb which libseccomp handles for us.

So we only need to make sure that we add the 32-bit seccomp arch while we're on x86_64 and otherwise we just stay at the native architecture which was set during seccomp_init(), which more or less replicates setting 32-bit personality during runChild().

The FORCE_SUCCESS() macro here could be a bit less ugly but I think repeating the seccomp_rule_add() all over the place is way uglier.

Another way would have been to create a vector of syscalls to iterate over, but that would make error messages uglier because we can either only print the (libseccomp-internal) syscall number or use seccomp_syscall_resolve_num_arch() to get the name or even make the vector a pair number/name, essentially duplicating everything again.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
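The jump offsets in the raw filter above are easy to get wrong, so here is a minimal sketch that traces them in user space without installing anything. It is not code from this PR: `run_filter` and `chown_filter_verdict` are hypothetical helpers, the syscall numbers are hard-coded x86_64 values (so the result does not depend on the host), and the tiny evaluator only understands the three instruction kinds the filter uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */

/* x86_64 syscall numbers, hard-coded for illustration only. */
enum { NR_CHOWN = 92, NR_FCHOWN = 93, NR_LCHOWN = 94, NR_FCHOWNAT = 260 };

/* Same shape as the filter in the commit message. SECCOMP_RET_ERRNO with
 * an errno value of 0 makes the caller see a return value of 0. */
static const struct sock_filter chown_filter[] = {
    BPF_STMT(BPF_LD + BPF_W + BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, NR_CHOWN, 4, 0),
    BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, NR_FCHOWN, 3, 0),
    BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, NR_FCHOWNAT, 2, 0),
    BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, NR_LCHOWN, 1, 0),
    BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ERRNO | 0),
};

/* Evaluate the filter for one syscall number. Only the load of
 * seccomp_data.nr, JEQ against a constant, and RET are handled. */
static uint32_t run_filter(const struct sock_filter *prog, size_t len, int nr)
{
    uint32_t acc = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *ins = &prog[pc];

        switch (ins->code) {
        case BPF_LD + BPF_W + BPF_ABS:
            assert(ins->k == offsetof(struct seccomp_data, nr));
            acc = (uint32_t)nr;
            break;
        case BPF_JMP + BPF_JEQ + BPF_K:
            /* Offsets are relative to the next instruction. */
            pc += (acc == ins->k) ? ins->jt : ins->jf;
            break;
        case BPF_RET + BPF_K:
            return ins->k;
        default:
            assert(!"unsupported instruction in this sketch");
        }
    }
    assert(!"filter ended without a return");
    return 0;
}

/* Verdict of the sketch filter for a given x86_64 syscall number. */
uint32_t chown_filter_verdict(int nr)
{
    return run_filter(chown_filter,
                      sizeof(chown_filter) / sizeof(chown_filter[0]), nr);
}
```

With these numbers, `chown_filter_verdict(92)` lands on the final `SECCOMP_RET_ERRNO` instruction, while a number that is not in the list (for example 212, which is chown32 on i386) falls through to `SECCOMP_RET_ALLOW`. That mismatch is the per-architecture problem the commit message delegates to libseccomp.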

Right now it only tests whether seccomp correctly forges the return value of chown, but the long-term goal is to test the full sandboxing functionality at some point in the future. Signed-off-by: aszlig <aszlig@redmoonstudios.org>

Commands such as "cp -p" also use fsetxattr() in addition to fchown(), so we need to make sure these syscalls always report success as well in order to avoid nasty "Invalid value" errors. Signed-off-by: aszlig <aszlig@redmoonstudios.org>

These syscalls are only available on 32-bit architectures, but libseccomp should handle them correctly even if we're on native architectures that do not have these syscalls. Signed-off-by: aszlig <aszlig@redmoonstudios.org>
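To make the architecture dependence concrete, the following sketch checks whether an (architecture, syscall number) pair belongs to the chown family on x86_64 or i386. It is only an illustration and not how libseccomp resolves syscalls internally: `is_chown_family` and the table layout are hypothetical, and the numbers are hard-coded from the x86_64 and i386 syscall tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/audit.h>  /* AUDIT_ARCH_X86_64, AUDIT_ARCH_I386 */

/* One table per architecture, keyed by the value a filter would see
 * in seccomp_data.arch. */
struct arch_syscalls {
    uint32_t arch;
    const int *numbers;
    size_t count;
};

/* x86_64: chown, fchown, lchown, fchownat */
static const int x86_64_chown[] = { 92, 93, 94, 260 };

/* i386: lchown, fchown, chown, lchown32, fchown32, chown32, fchownat */
static const int i386_chown[] = { 16, 95, 182, 198, 207, 212, 298 };

static const struct arch_syscalls tables[] = {
    { AUDIT_ARCH_X86_64, x86_64_chown,
      sizeof(x86_64_chown) / sizeof(x86_64_chown[0]) },
    { AUDIT_ARCH_I386, i386_chown,
      sizeof(i386_chown) / sizeof(i386_chown[0]) },
};

/* Return true if nr is a chown-family syscall on the given architecture.
 * Unknown architectures never match. */
bool is_chown_family(uint32_t arch, int nr)
{
    for (size_t i = 0; i < sizeof(tables) / sizeof(tables[0]); i++) {
        const struct arch_syscalls *t = &tables[i];

        if (t->arch != arch)
            continue;
        assert(t->numbers != NULL && t->count > 0);
        for (size_t j = 0; j < t->count; j++) {
            if (t->numbers[j] == nr)
                return true;
        }
    }
    return false;
}
```

The same number means different things on the two architectures: 212 is chown32 on i386 but not a chown syscall on x86_64, and 92 is chown on x86_64 but not on i386. A filter that only compares the syscall number, without also looking at the architecture, would therefore forge the wrong calls for 32-bit processes.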

aszlig · 2016-11-16T16:47:34Z

Okay, done... also running in production on https://headcounter.org/hydra/ :-)

domenkozar · 2016-12-11T11:12:17Z

Just for the record, did anyone take a look at https://github.com/projectatomic/bubblewrap?

vcunat · 2016-12-13T13:10:48Z

Oh, some changes around this on Hydra might be why these EPERM errors started: http://lists.science.uu.nl/pipermail/nix-dev/2016-December/022311.html

edolstra · 2016-12-15T15:13:27Z

I'm still wondering whether running as uid==0 in the user namespace is a good idea. For example, Hydra's make check now fails with

initdb: cannot be run as root

Of course we can work around this particular issue, but the general issue is that packages reasonably assume that uid 0 has capabilities (and risks) that the user namespace doesn't actually provide. So a non-zero uid might be better...

copumpkin · 2016-12-15T15:15:19Z

Yeah, I'd probably run as non-root, but still in a user namespace. The user namespace allows us to get other namespaces as non-root, and then we stay non-root inside the build.

aszlig · 2016-12-16T12:41:15Z

Okay, I guess it's a good idea to revert e883871 again and possibly also this PR, because seccomp does add a bit of overhead.

aszlig · 2016-12-16T12:51:47Z

Giving this a second thought, it might even make sense to allow builders to enter a user namespace as uid 0 on a per-derivation basis. That way we could use things like chroot within a builder environment without disrupting everything else.

aszlig · 2016-12-16T12:54:45Z

So for now I'd just revert this very PR (which includes e883871), because for the tests we already have NixOS/nixpkgs#20500 and maybe a few very specific packages that can't cope with user namespaces.

vcunat · 2016-12-18T08:22:30Z

Mass rebuilds keep discovering more failures – it's coreutils now:

checking whether mknod can create fifo without root privileges... configure: error: in `/tmp/nix-build-coreutils-8.26.drv-0/coreutils-8.26':
configure: error: you should not run configure as root (set FORCE_UNSAFE_CONFIGURE=1 in environment to bypass this check)

http://hydra.nixos.org/build/45043302

The check can be bypassed, but running the builds as uid 0 doesn't seem to turn out as a good default anyway. (Note that this is a gnulib-generated test, so it will probably be present in other packages as well.)

edolstra · 2016-12-19T13:21:49Z

Okay, I've reverted this. (3a4bd32)

aszlig force-pushed the seccomp branch from 6f16d00 to d75f0a1 Compare November 16, 2016 12:35

aszlig added 4 commits November 16, 2016 16:48

Run builds as root in user namespace again

e883871

release.nix: Add a test for sandboxing

651a18d

aszlig force-pushed the seccomp branch from d75f0a1 to 651a18d Compare November 16, 2016 16:09

aszlig added 2 commits November 16, 2016 17:29

seccomp: Forge return values for *chown32

4e1a2cd

aszlig mentioned this pull request Nov 17, 2016

nixos/tests: Use a patched QEMU for testing NixOS/nixpkgs#20500

Merged

vcunat mentioned this pull request Dec 14, 2016

NixOS Unstable and 16.09 Release Channel Not Updating NixOS/nixpkgs#21145

Closed

edolstra merged commit 4e1a2cd into NixOS:master Dec 15, 2016
