diff --git a/docs/snapshotters/README.md b/docs/snapshotters/README.md
index 17d928c2ff647..96d143970f856 100644
--- a/docs/snapshotters/README.md
+++ b/docs/snapshotters/README.md
@@ -11,7 +11,7 @@ Generic:
 - `native`: Native file copying driver. Akin to Docker/Moby's "vfs" driver.
 
 Block-based:
-- `blockfile`: A driver using raw block files for each snapshot. Block files are copied from a parent or base empty block file. Mounting requires a virtual machine or support for loopback mounts.
+- [`blockfile`](./blockfile.md): A driver using raw block files for each snapshot. Block files are copied from a parent or base empty block file. Mounting requires a virtual machine or support for loopback mounts.
 - `devmapper`: ext4/xfs device mapper. See [`devmapper.md`](./devmapper.md).
 
 Filesystem-specific:
diff --git a/docs/snapshotters/blockfile.md b/docs/snapshotters/blockfile.md
new file mode 100644
index 0000000000000..cb2ca161bfc78
--- /dev/null
+++ b/docs/snapshotters/blockfile.md
@@ -0,0 +1,145 @@
+# Blockfile Snapshotter
+
+The blockfile snapshotter uses raw block files for each snapshot. Block files are
+copied from a parent or base empty block file. Mounting requires a virtual machine
+or support for loopback mounts.
+
+## Use Case
+
+Snapshotters serve the purpose of extracting an image from the OCI image store and
+creating a snapshot that is useful to containers. A snapshotter handles setting up
+the underlying infrastructure, such as preparing a directory or other filesystem
+setup, applying the layers to create a single mountable directory to serve as the
+container base, and mounting it into the container upon start.
+
+The most commonly used snapshotter is the overlayfs snapshotter, which is the
+default in containerd. The overlayfs snapshotter provides a directory on the host
+filesystem, which is then bind-mounted into the container.
+
+The blockfile snapshotter targets a use case where the container will run inside a
+VM. Specifically, the OCI image will be the filesystem for the container, like with
+a normal container, but the container itself will be run inside a VM.
+Since the VM cannot bind-mount directories from the host, the blockfile snapshotter
+creates a block file for the snapshot, which can be attached to the VM as a block
+device to get the contents into the guest.
+
+## Alternatives
+
+There are alternatives to the blockfile snapshotter for mounting directories into a
+VM. One alternative is a [virtiofs](https://virtio-fs.gitlab.io) driver,
+assuming your VMM supports it. Similarly, you can use
+[9p](https://www.kernel.org/doc/Documentation/filesystems/9p.txt) to mount a local
+directory into the VM, again assuming your VMM supports it.
+
+Additionally, the [devicemapper snapshotter](./devmapper.md) can be used to create
+snapshots on filesystem images in a devicemapper thin-pool.
+
+## Usage
+
+### Checking if the blockfile snapshotter is available
+
+To check if the blockfile snapshotter is available, run the following command:
+
+```bash
+$ ctr plugins ls | grep blockfile
+```
+
+### Configuration
+
+To configure the snapshotter, use the following configuration options in your
+containerd `config.toml`. Don't forget to restart containerd after changing the
+configuration.
+
+```toml
+  [plugins.'io.containerd.snapshotter.v1.blockfile']
+    scratch_file = "/opt/containerd/blockfile"
+    root_path = "/somewhere/on/disk"
+    fs_type = 'ext4'
+    mount_options = []
+    recreate_scratch = true
+```
+
+- `root_path`: The directory where the block files are stored. This directory must be writable by the containerd process.
+- `scratch_file`: The path to the empty file that will be used as the base for the block files. This file should exist before first using the snapshotter.
+- `fs_type`: The filesystem type to use for the block files. Currently supported are `ext4` and `xfs`.
+- `mount_options`: Additional mount options to use when mounting the block files.
+- `recreate_scratch`: If set to `true`, the snapshotter will recreate the scratch file if it is missing. If set to `false`, the snapshotter will fail if the scratch file is missing.
+
+### Creating the scratch file
+
+You can create a scratch file as follows. This example uses a 500MB scratch file.
+
+```bash
+$ # make a 500M file
+$ dd if=/dev/zero of=/opt/containerd/blockfile bs=1M count=500
+500+0 records in
+500+0 records out
+524288000 bytes (524 MB, 500 MiB) copied, 1.76253 s, 297 MB/s
+
+$ # format the file with ext4
+$ sudo mkfs.ext4 /opt/containerd/blockfile
+mke2fs 1.47.0 (5-Feb-2023)
+Discarding device blocks: done
+Creating filesystem with 512000 1k blocks and 128016 inodes
+Filesystem UUID: d9947ecc-722d-4627-9cf9-fa2a3b622106
+Superblock backups stored on blocks:
+	8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409
+
+Allocating group tables: done
+Writing inode tables: done
+Creating journal (8192 blocks): done
+Writing superblocks and filesystem accounting information: done
+```
+
+### Running a container
+
+To run a container using the blockfile snapshotter, you need to specify the
+snapshotter:
+
+```bash
+$ # ensure that the image we are using exists; it is a regular OCI image
+$ ctr image pull docker.io/library/busybox:latest
+$ # run the container with the provided snapshotter
+$ ctr run --rm -t --snapshotter blockfile docker.io/library/busybox:latest hello sh
+```
+
+To use it via the Go client API, it is identical to using any other snapshotter.
+The sketch below uses the containerd Go client; only the snapshotter name passed
+to `WithPullSnapshotter` and `WithSnapshotter` is blockfile-specific, and error
+handling is abbreviated:
+
+```go
+client, err := containerd.New("/run/containerd/containerd.sock")
+if err != nil {
+	return err
+}
+defer client.Close()
+ctx := namespaces.WithNamespace(context.Background(), "default")
+
+// Pull and unpack the image using the blockfile snapshotter.
+image, err := client.Pull(ctx, "docker.io/library/busybox:latest",
+	containerd.WithPullUnpack,
+	containerd.WithPullSnapshotter("blockfile"),
+)
+if err != nil {
+	return err
+}
+
+// Create a container whose rootfs snapshot is managed by the
+// blockfile snapshotter.
+container, err := client.NewContainer(ctx, "hello",
+	containerd.WithSnapshotter("blockfile"),
+	containerd.WithNewSnapshot("hello-snapshot", image),
+	containerd.WithNewSpec(oci.WithImageConfig(image)),
+)
+```
+
+## How It Works
+
+The blockfile snapshotter functions similarly to other snapshotters. It uses the
+provided image to find layers, and then applies each on top of the previous,
+creating a usable directory.
+
+The blockfile snapshotter is unique in two ways:
+
+1. It applies layers inside a disk image file, rather than on the host filesystem.
+1. It creates a block image file for each layer, applying the previous on top of it.
+
+For every layer the snapshotter creates a new blockfile, starting with a copy of the
+blockfile from the previous layer. If there is no previous layer, i.e. for the first
+layer, it copies the scratch file.
+
+For example, for an image with 3 layers - called A, B, C - the process is as follows:
+
+1. Copy the scratch file to a new blockfile for layer A.
+1. Loopback-mount the blockfile for layer A.
+1. Apply layer A to the mount.
+1. Unmount the blockfile for layer A.
+1. Copy the blockfile for layer A to a new blockfile for layer B.
+1. Loopback-mount the blockfile for layer B.
+1. Apply layer B to the mount.
+1. Unmount the blockfile for layer B.
+1. Copy the blockfile for layer B to a new blockfile for layer C.
+1. Loopback-mount the blockfile for layer C.
+1. Apply layer C to the mount.
+1. Unmount the blockfile for layer C.
+
+Each layer is thus unpacked into a new blockfile that already contains the contents
+of all previous layers.
+
+TODO: Does this mean a lot of wasted space? Does it remove previous layer blockfiles after they have been used? E.g. a 500MB scratch image copied for layer A, then for B, then for C, means 1.5GB of space used, even though the final disk image is only 500MB.