Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "spt" (seccomp) target #310

Merged
merged 10 commits into from Jan 21, 2019

Conversation

Projects
None yet
6 participants
@mato
Copy link
Member

mato commented Jan 2, 2019

Add a new "spt" target ("sandboxed process tender") for x86_64 and
aarch64 Linux hosts.

This target is a userspace process running inside a strict seccomp
sandbox (up to 7 system calls with block and network I/O). The
"solo5-spt" tender serves essentially as a loader and the bindings
implement the Solo5 API by calling the host system call ABI directly.

Note that the "solo5-spt" tender is not modularized on purpose, as it
effectively ceases to operate once control passes to the Solo5
application.

Summary of changes:

tenders/spt: solo5-spt tender
bindings/spt: spt bindings
tests/test_seccomp: spt-specific test that the seccomp policy is being
enforced

@mato mato self-assigned this Jan 2, 2019

@mato mato requested review from ricarkol and djwillia Jan 2, 2019

@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 2, 2019

@djwillia @ricarkol:

This is an initial WIP but fully working version, which I'm posting for review. I will be on vacation for 10 days starting tomorrow, but would like to merge soon after getting back if you're happy with it. So please review, I will be responding to comments in the mean time, though with some delay.

This design is simpler to the version you did for nabla containers, in that the tender treats the process as an "empty shell", initialises resources, loads the Solo5 ELF (which is a normal static binary loaded at 1MB as for hvt), applies seccomp rules and then transfers control to the Solo5 guest, after which it is no longer involved. In fact, with some trampoline trickery, it should be possible to unmap the tender from the guest process memory space entirely.

This version passes all tests (and will also pass on the CI once it is updated to have libseccomp available), except:

  • test_notls: due to running as a user process -- this will probably need to be fixed by adding a "solo5_arch_set_tls_base()" or similar to the Solo5 API
  • test_zeropage: fails due to what looks like an error in tests.bats

Also, crt_init_early() is not currently called due to the TLS issue, so __stack_chk_guard, while present, is not randomized.

Further, there is no OPAM/Mirage integration yet, and there are still some TODO items marked in the code which need to be looked at.

Edit: CIs now have libseccomp and are behaving as expected, with the exception of Travis. Unclear what is going on there.

@cfcs
Copy link

cfcs left a comment

If we were to have future support for similar backends using Linux ebpf (extended BPF), OpenBSD pledge, the OSX sandboxing (that I don't know much about), etc - is the plan to split out the Linux seccomp-bpf-specific parts later, or would it make sense to do that from the beginning in this patchset?

Show resolved Hide resolved tenders/spt/spt_core.c
Show resolved Hide resolved tenders/spt/spt_core.c
Show resolved Hide resolved tenders/spt/spt_core.c Outdated
Show resolved Hide resolved bindings/spt/block.c
Show resolved Hide resolved bindings/spt/block.c
Show resolved Hide resolved bindings/spt/bindings.h Outdated
Show resolved Hide resolved bindings/spt/sys_linux_aarch64.c
Show resolved Hide resolved tenders/spt/spt_module_net.c
Show resolved Hide resolved tenders/spt/spt_core.c Outdated
Show resolved Hide resolved tenders/spt/spt_core.c
Show resolved Hide resolved tenders/spt/spt_core.c
@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 9, 2019

@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 9, 2019

@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 9, 2019

@ricarkol

This comment has been minimized.

Copy link
Collaborator

ricarkol commented Jan 9, 2019

Here it is:

kollerr@kollerr-ThinkPad-P50:~/research/solo5$ ldd ./tenders/spt/solo5-spt
	linux-vdso.so.1 =>  (0x00007ffca6fd8000)
	libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007f363c64a000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f363c280000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f363c88f000)

and:

kollerr@kollerr-ThinkPad-P50:~/research/unikernel.github.ibm.com$ cat /proc/`pgrep solo5-spt`/maps 
00400000-00403000 r-xp 00000000 08:02 3550490                            /home/kollerr/research/solo5/tenders/spt/solo5-spt
00602000-00603000 r-xp 00002000 08:02 3550490                            /home/kollerr/research/solo5/tenders/spt/solo5-spt
00603000-00604000 rwxp 00003000 08:02 3550490                            /home/kollerr/research/solo5/tenders/spt/solo5-spt
02104000-02125000 rwxp 00000000 00:00 0                                  [heap]
7f5b2fd3a000-7f5b2fefa000 r-xp 00000000 08:02 12980652                   /lib/x86_64-linux-gnu/libc-2.23.so
7f5b2fefa000-7f5b300fa000 ---p 001c0000 08:02 12980652                   /lib/x86_64-linux-gnu/libc-2.23.so
7f5b300fa000-7f5b300fe000 r-xp 001c0000 08:02 12980652                   /lib/x86_64-linux-gnu/libc-2.23.so
7f5b300fe000-7f5b30100000 rwxp 001c4000 08:02 12980652                   /lib/x86_64-linux-gnu/libc-2.23.so
7f5b30100000-7f5b30104000 rwxp 00000000 00:00 0 
7f5b30104000-7f5b30132000 r-xp 00000000 08:02 12980809                   /lib/x86_64-linux-gnu/libseccomp.so.2.3.1
7f5b30132000-7f5b30332000 ---p 0002e000 08:02 12980809                   /lib/x86_64-linux-gnu/libseccomp.so.2.3.1
7f5b30332000-7f5b30348000 r-xp 0002e000 08:02 12980809                   /lib/x86_64-linux-gnu/libseccomp.so.2.3.1
7f5b30348000-7f5b30349000 rwxp 00044000 08:02 12980809                   /lib/x86_64-linux-gnu/libseccomp.so.2.3.1
7f5b30349000-7f5b3036f000 r-xp 00000000 08:02 12980624                   /lib/x86_64-linux-gnu/ld-2.23.so
7f5b30549000-7f5b3054d000 rwxp 00000000 00:00 0 
7f5b3056e000-7f5b3056f000 r-xp 00025000 08:02 12980624                   /lib/x86_64-linux-gnu/ld-2.23.so
7f5b3056f000-7f5b30570000 rwxp 00026000 08:02 12980624                   /lib/x86_64-linux-gnu/ld-2.23.so
7f5b30570000-7f5b30571000 rwxp 00000000 00:00 0 
7fff1aea4000-7fff1aec5000 rwxp 00000000 00:00 0                          [stack]
7fff1aee9000-7fff1aeec000 r--p 00000000 00:00 0                          [vvar]
7fff1aeec000-7fff1aeee000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

You are right, that's the problem. 0 - 00400000 is not enough for 1GB. And just to double check, it is enough for 5MBs:

kollerr@kollerr-ThinkPad-P50:~/research/solo5$ ./tenders/spt/solo5-spt --mem=4 tests/test_hello/test_hello.spt 

            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Memory map: 4 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x102fff)
Solo5:     rodata @ (0x103000 - 0x103fff)
Solo5:       data @ (0x104000 - 0x105fff)
Solo5:       heap >= 0x106000 < stack < 0x400000

**** Solo5 standalone test_hello ****

Hello, World
Command line is: ''
Solo5: solo5_exit(0) called
@ricarkol

This comment has been minimized.

Copy link
Collaborator

ricarkol commented Jan 9, 2019

Not the prettiest solution, but that's why we ended up using a linker script for the tender in nabla: https://github.com/nabla-containers/solo5/blob/ukvm-linux-seccomp/ukvm/ukvm.lds#L21

That would be one of the nice things about loading the unikernel as a dynamic library: dlopen takes care of everything.

@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 14, 2019

@cfcs and others:

Thanks for the detailed review. I have addressed the error checking for all libseccomp calls, and some other small fixes in c5a5ffa.

@ricarkol:

Fix/workaround for the address space conflict issue coming up shortly. I'll also comment in full on the TLS issue, in the mean time I've disabled test_notls for spt.

@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 14, 2019

@ricarkol:

Regarding the address space conflicts: See 49c62a4 for a workaround, which uses -Ttext-segment=0x40000000 at link time to locate the solo5-spt at 1GB on systems without PIE support in the toolchain. This version now passes all tests on Travis, which is using Ubuntu Xenial (16.04 LTS) and does not support PIE.

In the long term the true solution to this problem, especially if we want to support spt on 32-bit architectures, is to a) require PIE support on the host and b) implement ASLR using static PIE on the Solo5 (guest) side as described in #304. Then we can mmap() guest memory without MAP_FIXED and avoid the problem altogether.

In the short term I think this is an acceptable workaround, LMK if you think otherwise.

Regarding your other proposed solutions/comments: As you can see, a linker script for spt is not necessary for this workaround, asking the linker to set the starting address of the tender's text segment is sufficient. As for using the host dlopen() and making the guest a "shared library", I'm not in favour of that for many reasons, among others that it introduces the (unsandboxed) host dynamic linker into the TCB which is undesirable.

Regarding TLS:

The issue, however, is that seccomp_load segfaults at gen_bpf_generate (i.e., some random place). So, I guess there is no other choice than allowing sys_arch_prctl to set FS/GS. Not sure if there should be a solo5_set_fs/gs though.

That's expected. You cannot touch the arch-specific TLS base (i.e. %fs on x86_64) and expect any call into libc to continue to work. There are various ways to solve this, however as I mentioned for the time being I've just disabled that test for spt. I will refactor the SSP init code in a separate commit to ensure that is done properly for spt.

As for providing an abstract solo5_set_arch_tls_base API, I think we will need one sooner or later to support language runtimes that insist on using TLS as part of their ABI. However, this is also a separate discussion.

Show resolved Hide resolved tenders/spt/spt_core.c Outdated
@ricarkol

This comment has been minimized.

Copy link
Collaborator

ricarkol commented Jan 14, 2019

As for providing an abstract solo5_set_arch_tls_base API, I think we will need one sooner or later to support language runtimes that insist on using TLS as part of their ABI. However, this is also a separate discussion.

Had this issue with golang. Hopefully we don't have to open a syscall. But in any case, that's for another PR. So, all good with me.

Regarding the "shared library" trick, my main reason is gdb. It would be great to find a way of tricking gdb into loading the symbols of the unikernel without having to do a manual symbol-file (I guess that's the only extra step). But in any case, that's for another PR. So, all good with me on this one as well.

@cfcs

This comment has been minimized.

Copy link

cfcs commented Jan 15, 2019

Should there be a solo5-bindings-spt.opam (and .pc.in etc) similar to solo5-bindings-hvt.opam?
Here's the two packages I had to install on my systems:

depexts: [
  "libseccomp-dev" {os-distribution = "debian"}
  "libseccomp-devel" {os-distribution = "fedora" | os-distribution = "centos" }
]
run ${TIMEOUT} --foreground 30s ${SPT_TENDER} --net=${NET} -- ${UNIKERNEL} limit
[ "$status" -eq 0 ]
}

@test "abort hvt" {

This comment has been minimized.

@cfcs

cfcs Jan 15, 2019

It seems like the @test "abort spt" case is missing in this file (tests/tests.bats), apart from that the test suite runs for me and crashes in _exception,_zeropage and _seccomp as expected.

This comment has been minimized.

@mato

mato Jan 15, 2019

Author Member

That's intentional. The "abort" test case is misnamed, and actually hvt-specific as it tests the functionality of the "dumpcore" module. I have it on my list to rename it and also remove the virtio invocation.

@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 15, 2019

@cfcs:

Should there be a solo5-bindings-spt.opam (and .pc.in etc) similar to solo5-bindings-hvt.opam?

Yes, those will come in a separate commit.

mato added some commits Dec 31, 2018

"spt" (seccomp) target
Add a new "spt" target ("sandboxed process tender") for x86_64 and
aarch64 Linux hosts.

This target is a userspace process running inside a strict seccomp
sandbox (up to 7 system calls with block and network I/O). The
"solo5-spt" tender serves essentially as a loader and the bindings
implement the Solo5 API by calling the host system call ABI directly.

Note that the "solo5-spt" tender is not modularized on purpose, as it
effectively ceases to operate once control passes to the Solo5
application.

Summary of changes:

tenders/spt: solo5-spt tender
bindings/spt: spt bindings
tests/test_seccomp: spt-specific test that the seccomp policy is being
enforced
spt: Handle host toolchains without PIE support
Apply a workaround for virtual address space conflicts when solo5-spt is
built with a host toolchain that does not support PIE. In this case we
manually set the solo5-spt tender text segment to start at 1GB, which
allows for guests up to that size.

Conversely, on host systems with PIE support, add some extra checks to
ensure that ASLR has mapped the tender binary at a high address (>=
4GB).

@mato mato force-pushed the mato:wip-spt branch from 49c62a4 to 4aaad10 Jan 18, 2019

mato added some commits Jan 18, 2019

spt: Tighten seccomp policy for ppoll, block ops
For ppoll(), we never use sigmask so enforce that it is always zero.

For block operations, enforce that reads/writes beyond the end of the
device (e.g. if backed by a regular file) are not allowed.
spt: Correct types for sys_...() functions
The sys_...() functions are intentionally weakly typed as they only pass
through values to/from the system call without interpretation. All
integer values are passed as (long) and all pointer values are passed as
(void *). This matches the convention musl libc uses for x86_64 and
aarch64.

This will need to be re-considered for 32-bit archs, and we should also
consider explicitly inlining these functions.

@mato mato force-pushed the mato:wip-spt branch from 4aaad10 to e6f0684 Jan 18, 2019

@mato mato removed the request for review from djwillia Jan 18, 2019

mato added some commits Jan 18, 2019

bindings: Split up crt_init_early()
Split up crt_init_early() into crt_init_tls() and crt_init_ssp() as we
cannot call the former on spt until we have Solo5 APIs in place for TLS
manipulation, or some other solution in place tender-side.
spt: OPAM integration
Add OPAM files, .pc and Makefile targets for OPAM integration.
tenders/spt: Link with -z noexecstack
...and refuse to run if personality includes READ_IMPLIES_EXEC at run
time.
@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 18, 2019

This is now almost ready to merge. I've addressed the remaining feedback, added OPAM integration and found and fixed an "interesting" Linux "feature" (see a87c50f).

I have preliminary branches with MirageOS changes required. Folks that would like to test, please ensure you have done an opam update and then install the following pins:

opam pin add mirage-solo5 https://github.com/mato/mirage-solo5#spt
opam pin add ocaml-freestanding.0.5.0 https://github.com/mato/ocaml-freestanding#spt
opam pin add solo5-bindings-spt.0.5.0 https://github.com/mato/solo5#wip-spt
opam pin add mirage https://github.com/mato/mirage#spt

If there are no major showstoppers, I will merge this over the weekend or latest on Monday, and (after adding some documentation) cut a Solo5 0.5.0 release.

Show resolved Hide resolved bindings/spt/block.c
Show resolved Hide resolved tenders/spt/spt_module_block.c Outdated
*/
if(size != block_size)
return SOLO5_R_EINVAL;
if(offset & (block_size - 1))

This comment has been minimized.

@cfcs

cfcs Jan 19, 2019

Why keep this check (alignment) on the C side if not in the seccomp rules?

This comment has been minimized.

@mato

mato Jan 19, 2019

Author Member

For the same reason as keeping the "invalid offset" check here (see previous comment) and additionally, the Solo5 test cases (test_blk) test for a SOLO5_R_EINVAL return.

This comment has been minimized.

@cfcs

cfcs Jan 19, 2019

I think it makes sense to want to return SOLO5_R_EINVAL instead of crashing, but I'm not a fan of having the checks duplicated like this; it complicates security review (well, and future maintenance), and having a discrepancy between what leads to an error and what is actually enforced also adds mental overhead: Now a reviewer has to reason about whether or not unaligned writes to or reads from a block device can be a security problem; they certainly lead to more code in the kernel being touched due to having to read two blocks and give you a buffer that consists of the requested overlap.

Another option would be to have two seccomp filter rule entries, so you first had the ALLOW rule that allows the syscall, and a second rule (the BPF engine will fall through to the next one when it fails to match the first rule) along the lines of:

rc = seccomp_rule_add(spt->sc_ctx,
                       /* return SOLO5_R_EINVAL to userland: */
                       SCMP_ACT_ERRNO(SOLO5_R_EINVAL),
                       SCMP_SYS(pread64),
                       1, /* arg count */
                       /* only match on operations dealing with diskfd: */
                       SCMP_A0(SCMP_CMP_EQ, diskfd)   );

This comment has been minimized.

@mato

mato Jan 21, 2019

Author Member

All good points. Unfortunately, your suggestion won't work with the current implementation as libseccomp does not guarantee the order in which rules are applied (see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1035748).

I've thought about the alternatives but don't have a clear answer yet, so will leave this as-is for now on the basis that:

  1. The seccomp ruleset we apply should be as minimal as possible, and composed only of rules which either ALLOW something or don't match at all and thus cause the filter to return the default SCMP_ACT_KILL.
  2. I'd like to keep the semantics of the Solo5 layer as "if you pass something which is incorrect, but not obviously malicious, you should get an error return back rather than crash".
  3. Conversely, if possible I'd like to ensure that we don't accidentally "mask errors" from the system call (by turning EWHATEVER into an EINVAL).

I think this is good enough for an initial "experimental" release and will file an issue for this so that we don't forget about it and can re-visit it in the (near) future. It may well be that the above points are not the behaviour we want in the long term.

This comment has been minimized.

@mato

mato Jan 21, 2019

Author Member

One other important point to make here is that reading or writing to an unaligned offset is not a security issue. The reason that limitation exists in the Solo5 API is twofold. First, to provide a lowest common denominator that supports porting bindings to targets which cannot deal with unaligned block access, such as virtio. Second, the Solo5 block API semantics are defined as "the operation will either succeed in it's entirety, or fail". In practice this currently means limiting reads or writes to one block per operation.

Regarding your argument about the security / host kernel surface impact. While it's theoretically possible to argue that an unaligned read or write on a target such as spt might execute more kernel code than purely aligned operations, in practice that is a very weak argument since the host kernel's I/O scheduler is free to do whatever it wants with the underlying operations at the device driver layer.

This comment has been minimized.

@cfcs

cfcs Jan 21, 2019

First: I don't see any problem with this patchset as it stands now either, don't let this block you from merging. :-)

Re: libseccomp not letting you order the rules: what the fuck, good catch!

  1. Fair enough., good point against SCMP_ACT_ERRNO(), would need a unique error number for SOLO5_R_EINVAL to avoid that.

  2. Also fair, although it's pretty context-dependent what is malicious or not: Crashing on supposed malicious values can turn bugs that normally would not have any impact into DoS vectors, effectively upgrading the impact of the bug. OTOH things falling through gracefully can also be a problem, and can enable worse things than crashing, which arguably is in the low end, and easier to detect + fix.
    In this case we have duplication of the enforcement logic, and the "soft" errors hiding the "hard" errors, which makes it trickier to unit test.

  1. Succeed or fail: Oh, speaking of that, would it make sense to open the block_fd with O_DSYNC or call fsync() in platform_exit() since we don't expose a sync primitive and don't do direct I/O?
  1. (writing to an unaligned offset is not a security issue): In the ideal world, you're right. In practice you're most probably also right, but in theory the more flexibility we have, the more exposed we are, and this exposure is multiplied by the number of different backends we support. See for example CVE-2011-1750.
Show resolved Hide resolved bindings/spt/sys_linux_aarch64.c
spt: block: Check lseek() result, use 64-bit types
Check return values from lseek() when setting up block device.
Additionally, define _FILE_OFFSET_BITS=64 which is the portable way of
specifying that we want a 64-bit off_t and associated interfaces.
spt: Add basic documentation
Add some basic documentation for "spt". Re-word the general text where
appropriate.

@mato mato changed the title (DO NOT MERGE) "spt" (seccomp) target Add "spt" (seccomp) target Jan 21, 2019

@mato

This comment has been minimized.

Copy link
Member Author

mato commented Jan 21, 2019

Added some initial documentation for spt in 6fd3f90. Looking good for an initial release, so merging.

@mato mato merged commit 7e499ee into Solo5:master Jan 21, 2019

4 checks passed

aarch64-Debian9-gcc630 Surf Build Server
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
vm-amd64-FreeBSD11_1-clang40 Surf Build Server
Details
vm-x86_64-Debian9-gcc630 Surf Build Server
Details

@mato mato deleted the mato:wip-spt branch Mar 22, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.