To ensure clean separation of concerns, we have organized the units of containerd's behavior into components. Components are roughly organized into subsystems. Components that bridge subsystems may be referred to as modules. Modules typically provide cross-cutting functionality, such as persistent storage or event distribution. Understanding these components and their relationships is key to modifying and extending the system.
This document will cover very high-level interaction. For details on each module, please see the relevant design document.
The main goal of this architecture is to coordinate the creation and execution of bundles. Bundles contain configuration, metadata and root filesystem data and are consumed by the runtime. A bundle is the on-disk representation of a runtime container. Bundles are mutable and can be passed to other systems for modification or packed up and distributed. In practice, a bundle is simply a directory on the filesystem.
Note that while these architectural ideas are important to understanding the system, code layout may not reflect the exact architecture. These ideas should be used as a guide for placing functionality and behavior and understanding the thought behind the design.
External users interact with services, made available via a gRPC API. We have the following subsystems:
- Distribution: The distribution service supports pulling images.
- Bundle: The bundle service allows the user to extract and pack bundles from disk images.
- Runtime: The runtime service supports the execution of bundles, including the creation of runtime containers.
Typically, each subsystem will have one or more related controller components that implement the behavior of the subsystem. The behavior of the subsystem may be exported for access via corresponding services.
In addition to the subsystems, we have several components that may cross subsystem boundaries, referred to as modules. We have the following modules:
- Executor: The executor implements the actual container runtime.
- Supervisor: The supervisor monitors and reports container state.
- Metadata: Stores metadata in a graph database. Used to store any persistent references to images and bundles. Data entered into the database will have schemas coordinated between components to provide access to arbitrary data. Other functionality includes hooks for garbage collection of on-disk resources.
- Content: Provides access to content addressable storage. All immutable content will be stored here, keyed by content hash.
- Snapshot: Manages filesystem snapshots for container images. This is analogous to the graphdriver in Docker today. Layers are unpacked into snapshots.
- Events: Supports the collection and consumption of events for providing consistent, event-driven behavior and auditing. Events may be replayed to various modules.
- Metrics: Each component will export several metrics, accessible via the metrics API. (We may want to promote this to a subsystem.)
As discussed above, the concept of a bundle is central to containerd. Below is a diagram illustrating the data flow for bundle creation.
Let's take pulling an image as an example:
- Instruct the distribution layer to pull a particular image. The distribution layer places the image content into the content store. The image name and root manifest pointers are registered with the metadata store.
- Once the image is pulled, the user can instruct the bundle controller to unpack the image into a bundle. Consuming from the content store, layers from the image are unpacked into the snapshot component.
- When the snapshot for the rootfs of a container is ready, the bundle controller can use the image manifest and config to prepare the execution configuration. Part of this is entering mounts into the execution config from the snapshot module.
- The prepared bundle is then passed off to the runtime subsystem for execution. It reads the bundle configuration to create a running container.