Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use seccomp to forge return values of *chown() syscalls #1131

Merged
merged 6 commits into from Dec 15, 2016

Conversation

aszlig
Copy link
Member

@aszlig aszlig commented Nov 16, 2016

This should properly address ff0c0b6 so that we forge the return code of the chown() syscalls to be 0 while not actually running them. That way the builder can still run as (user-namespaced) root, which has the advantage that it allows for certain syscalls within that namespace we otherwise wouldn't have access to (like chroot).

This implementation is using libseccomp for generating the filter rules in an architecture-independent way.

My original proof of concept implementation can be found here: https://gist.github.com/aszlig/69c3d7cf92a2c05ee1230857c2734848

Original discussion: ff0c0b6#commitcomment-19619475

Cc: @edolstra, @copumpkin

aszlig referenced this pull request Nov 16, 2016
This largely reverts c68e591. Running
builds as root breaks "cp -p", since when running as root, "cp -p"
assumes that it can succesfully chown() files. But that's not actually
the case since the user namespace doesn't provide a complete uid
mapping. So it barfs with a fatal error message ("cp: failed to
preserve ownership for 'foo': Invalid argument").
@aszlig
Copy link
Member Author

aszlig commented Nov 16, 2016

Don't merge this yet! Need to fix POSIX ACL handling and fixup the revert, so I'll rebase and write another comment once finished.

This reverts commit ff0c0b6.

We're going to use seccomp to allow "cp -p" and force chown-related
syscalls to always return 0.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
We're going to use libseccomp instead of creating the raw BPF program,
because we have different syscall numbers on different architectures.

Although our initial seccomp rules will be quite small it really doesn't
make sense to generate the raw BPF program because we need to duplicate
it and/or make branches on every single architecture we want to suuport.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
What we basically want is a seccomp mode 2 BPF program like this but for
every architecture:

  BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),
  BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_chown, 4, 0),
  BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_fchown, 3, 0),
  BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_fchownat, 2, 0),
  BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_lchown, 1, 0),
  BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
  BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ERRNO)

However, on 32 bit architectures we do have chown32, lchown32 and
fchown32, so we'd need to add all the architecture blurb which
libseccomp handles for us.

So we only need to make sure that we add the 32bit seccomp arch while
we're on x86_64 and otherwise we just stay at the native architecture
which was set during seccomp_init(), which more or less replicates
setting 32bit personality during runChild().

The FORCE_SUCCESS() macro here could be a bit less ugly but I think
repeating the seccomp_rule_add() all over the place is way uglier.

Another way would have been to create a vector of syscalls to iterate
over, but that would make error messages uglier because we can either
only print the (libseccomp-internal) syscall number or use
seccomp_syscall_resolve_num_arch() to get the name or even make the
vector a pair number/name, essentially duplicating everything again.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Right now it only tests whether seccomp correctly forges the return
value of chown, but the long-term goal is to test the full sandboxing
functionality at some point in the future.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Commands such as "cp -p" also use fsetxattr() in addition to fchown(),
so we need to make sure these syscalls always return successful as well
in order to avoid nasty "Invalid value" errors.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
These syscalls are only available in 32bit architectures, but libseccomp
should handle them correctly even if we're on native architectures that
do not have these syscalls.

Signed-off-by: aszlig <aszlig@redmoonstudios.org>
@aszlig
Copy link
Member Author

aszlig commented Nov 16, 2016

Okay, done... also running in production on https://headcounter.org/hydra/ :-)

@domenkozar
Copy link
Member

Just for the record, did anyone take a look at https://github.com/projectatomic/bubblewrap?

@vcunat
Copy link
Member

vcunat commented Dec 13, 2016

Oh, some changes around this on Hydra might be why these EPERM errors started: http://lists.science.uu.nl/pipermail/nix-dev/2016-December/022311.html

@edolstra
Copy link
Member

I'm still wondering whether running as uid==0 in the user namespace is a good idea. For example, Hydra's make check now fails with

initdb: cannot be run as root

Of course we can work around this particular issue, but the general issue is that packages reasonably assume that uid 0 has capabilities (and risks) that the user namespace doesn't actually provide. So a non-zero uid might be better...

@copumpkin
Copy link
Member

copumpkin commented Dec 15, 2016 via email

@aszlig
Copy link
Member Author

aszlig commented Dec 16, 2016

Okay, I guess it's a good idea to revert e883871 again and possibly also this PR, because seccomp does add a bit of overhead.

@aszlig
Copy link
Member Author

aszlig commented Dec 16, 2016

Giving this a second thought, it might even make sense to allow builders to enter a user namespace as uid 0 on a per-derivation basis. That way we could use things like chroot within a builder environment without disrupting everything else.

@aszlig
Copy link
Member Author

aszlig commented Dec 16, 2016

So for now I'd just revert this very PR (which includes e883871), because for the tests we already have NixOS/nixpkgs#20500 and maybe a few very specific packages that can't cope with user namespaces.

@vcunat
Copy link
Member

vcunat commented Dec 18, 2016

Mass rebuilds keep discovering more failures – it's coreutils now:

checking whether mknod can create fifo without root privileges... configure: error: in `/tmp/nix-build-coreutils-8.26.drv-0/coreutils-8.26':
configure: error: you should not run configure as root (set FORCE_UNSAFE_CONFIGURE=1 in environment to bypass this check)

http://hydra.nixos.org/build/45043302

The check can be bypassed, but running the builds as uid 0 doesn't seem to turn out as a good default anyway. (Note that this is a gnulib-generated test, so it will probably be present in other packages as well.)

@edolstra
Copy link
Member

Okay, I've reverted this. (3a4bd32)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants