This README describes the server component. Please refer to the top level README for an overview of all components.
The project's primary goal is to provide a simple API for managing isolated environments. These isolated environments -- or containers -- can be limited in terms of CPU usage, memory usage, disk usage, and network access. As of writing, the only supported OS is Linux.
Isolation is achieved by namespacing kernel resources that would otherwise be shared. The intended level of isolation is such that multiple containers present on the same host should not be aware of each other's presence. This means that these containers are given (among others) their own PID (Process ID) namespace, network namespace, and mount namespace.
Resource control is done using Control Groups. Every container is placed in its own control group, which is configured to give it an equal slice of CPU relative to other containers and to cap the amount of memory it may use.
The following sections give a brief summary of the techniques used to implement the Linux backend for Warden. A more detailed description can be found in the root/linux directory of this repository.
Every container is assigned a network interface which is one side of a virtual ethernet pair created on the host. The other side of the virtual ethernet pair is only visible on the host (from the root namespace). The pair is configured to use IPs in a small and static subnet. Traffic to and from the container can be forwarded using NAT. Additionally, all traffic can be filtered and shaped as needed, using readily available tools such as iptables.
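As a concrete illustration of the small, static subnets, the sketch below carves 4-address (/30) blocks out of a private pool, giving the host side and container side of each veth pair adjacent addresses. This is illustrative only; the pool base and block layout are assumptions, not necessarily Warden's actual defaults.

```ruby
require "ipaddr"
require "socket"

# Carve a 4-address (/30) block out of a private pool for container `index`.
# Within each block, the host side of the veth pair takes the first usable
# address and the container side takes the second.
def veth_addresses(index, pool_start = "10.254.0.0")
  base = IPAddr.new(pool_start).to_i + index * 4
  {
    host:      IPAddr.new(base + 1, Socket::AF_INET).to_s,  # host side of the pair
    container: IPAddr.new(base + 2, Socket::AF_INET).to_s,  # container side
  }
end
```

With this layout, `veth_addresses(0)` yields 10.254.0.1 for the host and 10.254.0.2 for the container; NAT and filtering rules would then reference these addresses.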
Every container gets a private root filesystem. This filesystem is created by stacking a read-only filesystem and a read-write filesystem. This is implemented using aufs on Ubuntu versions 10.04 through 11.10, and overlayfs on Ubuntu 12.04.
The read-only filesystem contains the minimal set of Ubuntu packages and Warden-specific modifications common to all containers. The read-write filesystem stores files overriding container-specific settings when necessary. Because all writes are applied to the read-write filesystem, containers can share the same read-only base filesystem.
The read-write filesystem is created by formatting a large sparse file. Because the size of this file is fixed, the filesystem that it contains cannot grow beyond this initial size.
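The fixed-size scratch image can be sketched as follows. The file is sparse, so it consumes almost no disk until written to; the formatting step is shown as a comment because it requires the ext4 userspace tools, and the choice of ext4 here is an assumption.

```ruby
require "tmpdir"

path = File.join(Dir.mktmpdir, "scratch.img")

# Create a sparse file with a fixed apparent size of 256 MB.
File.open(path, "w") { |f| f.truncate(256 * 1024 * 1024) }

apparent = File.size(path)               # 268435456 bytes
actual   = File.stat(path).blocks * 512  # near zero: no blocks allocated yet

# Formatting the image pins its maximum size; the filesystem inside can
# never grow beyond the apparent size chosen above:
#   system("mkfs.ext4", "-q", "-F", path)
```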
The Linux Containers (LXC) project has goals similar to those of Warden: isolation and resource control. Both use the same Linux kernel primitives to achieve their goals. In fact, early versions of Warden even used LXC.
The major difference between the two projects is that LXC is explicitly tied to Linux, whereas Warden backends can be implemented for any operating system that offers some way of isolating environments. Also, Warden is a daemon that manages containers and can be controlled via a simple API, rather than a set of tools that are executed individually.
While the Linux backend for Warden was initially implemented with LXC, the current version no longer depends on it. During development, we found that running LXC out of the box is a very opaque and static process. There is little control over when different parts of the container start process are executed, and how they relate to each other. Because Warden relies on a very small subset of the functionality that LXC offers, we decided to create a tool that only implements the functionality we need in under 1k LOC of C code. This tool executes preconfigured hooks at different stages of the container start process, such that required resources can be set up without worrying about concurrency issues. These hooks make the start process more transparent, allowing for easier debugging when parts of this process are not working as expected.
The entire lifecycle of containers is managed by Warden. The API allows users to create, configure, use, and destroy containers. Additionally, it can automatically clean up unused containers when needed.
Every container is identified by its handle, which is returned by Warden upon creating it. It is a hexadecimal representation of the IP address that is allocated for the container. Regardless of whether the backend providing the container functionality supports networking or not, an IP address will be allocated by Warden to identify a container.
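A plausible derivation of such a handle is shown below; the exact byte order and formatting are assumptions for illustration, since the text only specifies "a hexadecimal representation of the IP address".

```ruby
# Hypothetical handle derivation: each octet of the container's IP
# rendered as two lowercase hex digits.
def handle_for(ip)
  ip.split(".").map { |octet| "%02x" % octet.to_i }.join
end

handle_for("10.254.0.30")  # => "0afe001e"
```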
Once a container has been created and its handle returned to the caller, it is immediately ready for use. All resources will have been allocated, the necessary processes started, and all firewalling tables updated.
If Warden is configured to clean up containers after a period of inactivity, it uses the number of connections that have referenced the container as its metric. If the number of connections referencing the container drops to zero, the container is automatically destroyed after a preconfigured interval. If the container is referenced again in the meantime, this timer is cancelled.
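The reference-counting logic can be sketched like this. It is illustrative only, not Warden's actual implementation; the real daemon would use a timer rather than polling a clock.

```ruby
# Track connections referencing a container; the container becomes eligible
# for destruction once it has been unreferenced for longer than grace_time.
class GraceTracker
  def initialize(grace_time)
    @grace_time = grace_time
    @refs = 0
    @idle_since = Time.now
  end

  # A connection starts referencing the container: cancel any pending destroy.
  def acquire
    @refs += 1
    @idle_since = nil
  end

  # A connection goes away; start the idle clock when the count hits zero.
  def release(now = Time.now)
    @refs -= 1
    @idle_since = now if @refs.zero?
  end

  def destroy?(now = Time.now)
    return false if @grace_time.nil?  # null grace time disables cleanup
    !@idle_since.nil? && (now - @idle_since) >= @grace_time
  end
end
```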
The container can be used by running arbitrary scripts, copying files in and out, modifying firewall rules and modifying resource limits. A complete list of operations is discussed under "Interface".
When a container is destroyed -- either per user request, or automatically after being idle -- Warden first kills all unprivileged processes running inside the container. These processes first receive a TERM signal, followed by a KILL if they haven't exited after a couple of seconds. When these processes have terminated, the root of the container's process tree is sent a KILL. Once all resources the container used have been released, its files are removed and it is considered destroyed.
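The TERM-then-KILL sequence corresponds to a pattern like the following minimal sketch; the exact deadline value is an assumption.

```ruby
# Send TERM, poll briefly for exit, then escalate to KILL.
def graceful_kill(pid, deadline = 2.0)
  Process.kill("TERM", pid)
  waited = 0.0
  while waited < deadline
    # WNOHANG makes waitpid non-blocking: nil while the process still runs.
    return :terminated if Process.waitpid(pid, Process::WNOHANG)
    sleep 0.1
    waited += 0.1
  end
  Process.kill("KILL", pid)
  Process.waitpid(pid)
  :killed
end
```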
Warden uses a line-based JSON protocol to communicate with its clients, and does so over a Unix socket which is located at /tmp/warden.sock by default. Every command invocation is formatted as a JSON array, where the first element is the command name and subsequent elements can be any JSON object. The commands it responds to are as follows:
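A client interaction can be sketched as follows. Only the framing (one JSON array per line) is taken from the text above; the session shown in comments assumes a running Warden daemon at the default socket path.

```ruby
require "json"

# One command invocation = one line: a JSON array whose first element is
# the command name, followed by its arguments.
def encode_command(name, *args)
  JSON.generate([name, *args]) + "\n"
end

# Hypothetical session (requires a running Warden):
#   require "socket"
#   sock = UNIXSocket.new("/tmp/warden.sock")
#   sock.write(encode_command("create"))
#   handle = JSON.parse(sock.readline)
```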
Creates a new container. Returns the handle of the container, which is used to identify it.

The optional CONFIG parameter is a hash that specifies configuration options used during container creation. The supported options are:
If supplied, this specifies a set of paths to be bind mounted inside the container. The value must be an array. The elements in this array specify the bind mounts to execute, and are executed in order. Every element must be of the form:
[
  # Path in the host filesystem
  "/host/path",

  # Path in the container
  "/path/in/container",

  # Optional hash with options. The `mode` key specifies whether the bind
  # mount should be remounted as `ro` (read-only) or `rw` (read-write).
  {
    "mode" => "ro|rw"
  }
]
If specified, this setting overrides the default time a container may remain unreferenced by any client before it is destroyed. The value can either be the number of seconds as a floating point number or integer, or the null value to disable the grace time completely.
If specified, this setting overrides the default size of the container's scratch filesystem. The value is expected to be an integer.
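Putting the options together, a create invocation might be encoded like this. The option keys shown ("bind_mounts", "grace_time") are assumptions for illustration; consult config/linux.yml and the server source for the authoritative names.

```ruby
require "json"

# Hypothetical create invocation with a read-only bind mount and a
# five-minute grace time.
config = {
  "bind_mounts" => [
    ["/host/path", "/path/in/container", { "mode" => "ro" }],
  ],
  "grace_time" => 300,  # seconds; null would disable the grace time
}

line = JSON.generate(["create", config]) + "\n"
```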
Run the script SCRIPT in the container identified by HANDLE.
Returns a job identifier that can be used to reap its exit status at some point in the future. Also, the connection that issued the command may go away and reconnect later while still being able to reap the job.
The optional OPTS parameter is a hash that specifies options modifying the command being run. The supported options are:
If true, this specifies that the script should be run as root.
Reap the script identified by JOB_ID, running in the container identified by HANDLE.
Returns a 3-element tuple containing the integer exit status, a string containing its STDOUT, and a string containing its STDERR. These elements may be null when they cannot be determined (e.g. the script couldn't be executed, was killed, etc.).
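Since responses are plain JSON arrays, reaping reduces to destructuring the tuple; for example (with illustrative values):

```ruby
require "json"

# A successful script: exit status 0, some output on STDOUT, nothing on STDERR.
response = JSON.parse('[0, "hello\n", ""]')
status, stdout, stderr = response

# Any element may be null, e.g. when the script was killed before completing:
killed = JSON.parse('[null, null, null]')
```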
Stream STDOUT and STDERR of the script identified by JOB_ID, running in the container identified by HANDLE.
Returns a 2-element tuple containing the type of stream (STDOUT or STDERR) as the first element, and a chunk of the stream as the second element. Returns an empty tuple when no more data is available in the stream.
Set or get resource limits for the container identified by HANDLE.
The following resources can be limited:
- The memory limit is specified as a number of bytes. It is enforced using the control group associated with the container. When a container exceeds this limit, one or more of its processes will be killed by the kernel. Additionally, Warden will be notified that an OOM happened, and it subsequently tears down the container.
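For instance, capping a container at 512 MB means passing the byte count. The command and field names below are assumptions for illustration; the protocol only fixes the JSON-array framing.

```ruby
require "json"

# Hypothetical "limit" invocation capping memory at 512 MB.
limit_bytes = 512 * 1024 * 1024  # 536870912 bytes
line = JSON.generate(["limit", "some-handle", "mem", limit_bytes]) + "\n"
```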
Forward a port on the external interface of the host to the container identified by HANDLE.
Returns the port number that is mapped to the container. This port number is the same on the inside of the container.
Allow traffic from the container identified by HANDLE to the network address specified by ADDRESS. Additionally, the address may be masked to allow a network of addresses, and a port may be specified to only allow traffic to that port.

Returns ok.
Copy the contents at SRC_PATH on the host to DST_PATH in the container identified by HANDLE.

Returns ok.
File permissions and symbolic links will be preserved, while hardlinks will be materialized. If SRC_PATH contains a trailing /, only the contents of the directory will be copied. Otherwise, the outermost directory, along with its contents, will be copied. The unprivileged user will be the owner of the files in the container.
Copy the contents at SRC_PATH in the container identified by HANDLE to DST_PATH on the host.

Returns ok.
Its semantics are identical to copy HANDLE in, except with respect to file ownership. By default, the files on the host will be owned by root. If the OWNER argument is supplied (in the form USER:GROUP), files on the host will be chowned to this user/group after the copy has completed.
Stop processes running inside the container identified by HANDLE.

Returns ok.
Because all processes are taken down, unfinished scripts will likely terminate without an exit status being available.
Stop processes and destroy all resources associated with the container identified by HANDLE.

Returns ok.
Because everything related to the container is destroyed, artifacts from running an earlier script should be copied out before calling destroy.
Warden can be configured by passing a configuration file when it is started. An example configuration is located at config/linux.yml in the repository.
Warden runs on Ubuntu 10.04 and higher.
A backported kernel needs to be installed on 10.04. This kernel is available as linux-image-server-lts-backport-natty (substitute generic for server if you are running Warden on a desktop variant of Ubuntu 10.04).
Other dependencies are:
- build-essential (for compiling Warden's C bits)
- debootstrap (for bootstrapping the container's base filesystem)
- quota (for managing file system quotas)
Further bootstrapping of Warden can be done by running rake setup.
The included tests create and destroy real containers, so they require the system prerequisites to be in place. They need to be run as root if the backend under test requires it.
See root/<backend>/README.md for backend-specific information.
The project is licensed under the Apache 2.0 license (see the LICENSE file in the root directory of the repository).