Skip to content

arachsys/ucontain

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ucontain
========

These bash scripts are a simplified reimplementation of my 2013 containers
package. They wrap unshare and nsenter from util-linux, newuidmap and
newgidmap from shadow-utils, pivot and seal from Arachsys init. Despite
their brevity, they deliver the key features with the same security story.

My aim is to demonstrate how to use namespaces and ad hoc containers in your
own scripts, so the notes below focus on implementation rather than usage.


contain
-------

This first re-execs itself in fresh namespaces with unshare, passing the
--map-auto and --map-root-user flags to set user namespace mappings using
setuid newuidmap/newgidmap from shadow-utils. It detects this is done when
its PID is 1 and UID is 0.

After making the required bind mounts, we must pivot into the new root
filesystem and detach the old one before running the container init. This
needs care as the umount binary can't be trusted after a pivot_root() call.

Even if umount from the old root is run by absolute path, the untrusted
dynamic linker from the new root would be used, and if old ld.so is invoked
directly on old umount, it must somehow be configured to search only paths
on the old root. Both ld.so and umount must ignore any configuration from
new /etc. This would be dangerously fragile, depending on implementation
details of ld.so, libc and umount.

An alternative is to fork before the unsharing the mount namespace. The
outer process can synchronise with the inner process using pipes or
signals, and use umount -N to detach the old root on behalf of the inner
process after it pivots to the new root. This is complicated and a little
ugly, but would work fine provided the inner process is not allowed to
proceed until the old root is safely detached.

However, unlike pivot_root from util-linux, pivot from Arachsys init can
safely detach the old root itself before returning, which sidesteps any
difficulty. We use this less convoluted option here.

This version of contain doesn't emulate a tty on /dev/console. Instead of
the special behaviour of the original when run privileged, it maps explicit
entries for root in /etc/subuid and /etc/subgid if they exist, and leaves
the invoking root uid and gid unmapped. Helper scripts are not directly
supported, but extra commands are easy to add to the contain script itself.


inject
------

A straightforward nsenter is nearly sufficient here, apart from the magic
treatment of /proc/PID/exe which would introduce a security hole. Although
stat() sees this as a symlink to the absolute path of the binary, open()
accesses the binary itself whether or not the symlink can be resolved in the
filesystem namespace of the opening process.

Inside a container, this leaks a path to any host binary that joins that
container's namespaces. CVE-2019-5736 describes an attack on runC
'privileged containers' using this route: as the host root user is
mapped within the container, a container process can open the relevant
/proc/PID/exe file for writing and compromise it.

The standard fix is for binaries to clone themselves to a sealed memfd and
re-exec from that before joining untrusted containers. We use seal from
Arachsys init to do this on behalf of nsenter.


isolate
-------

To run a command without network access, use an isolated network namespace.
If we are unprivileged, we must also create a pass-through user namespace
to allow the network namespace to be unshared.


pseudo
------

This is recreated by unshare --user. We pass --map-auto and --map-root-user
to set user namespace mappings using newuidmap/newgidmap, as with contain.

This version of pseudo doesn't implement the special behaviour of the
original when run privileged. Instead, it maps explicit entries for root
in /etc/subuid and /etc/subgid if they exist, and leaves the invoking root
uid and gid unmapped.


Copying
-------

These scripts were written by Chris Webb <chris@arachsys.com> and are
distributed as Free Software under the terms of the MIT license in COPYING.