Docker Container Checkpoint and Restore
Docker has a rich set of commands to control the execution of a container, such as start, stop, restart, kill, pause, and unpause. What is currently missing, however, is the ability to checkpoint and restore a container.
Container Checkpoint and Restore (C/R) offers the following use cases:
- Ability to stop and start the Docker daemon (say for an upgrade) without having to stop the containers and restart them from scratch, losing the work they had done when they were stopped.
- Ability to reboot the system without having to restart the containers from scratch. The benefits are the same as in the first use case above.
- Ability to do "forensic debugging" of processes running in a container by examining their checkpoint images (open files, memory segments, etc.).
- Ability to migrate containers by restoring them on a different machine than where they were checkpointed.
Here is a 2-minute demo video of Docker C/R in action.
Instead of adding a huge amount of code to Docker to do C/R, this project uses the Checkpoint Restore In Userspace (CRIU) utility to bring native C/R functionality to Docker. CRIU is a powerful open source tool that has been in wide usage for several years.
The project started in April 2014. The initial effort went into identifying the features CRIU was missing in order to successfully checkpoint and restore a Docker container. Patches to address the missing features were merged upstream.
Starting with Docker 1.3, it was possible to C/R a Docker container externally. Details are provided here.
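As a rough illustration of what external C/R involves, the sketch below builds the command lines for a CRIU dump and restore of a process tree. It is not the procedure from the linked details: the helper names (`criu_dump_args`, `criu_restore_args`), the PID, and the image directory are hypothetical. Only basic CRIU options are used, and a real Docker container very likely needs further options (for example for its mounts and cgroups) that are not shown here.

```python
import shlex
from typing import List


def criu_dump_args(pid: int, images_dir: str, leave_running: bool = False) -> List[str]:
    """Build an argv list for `criu dump` of the process tree rooted at pid.

    Hypothetical helper for illustration; it only assembles arguments and
    does not run CRIU.
    """
    args = [
        "criu", "dump",
        "--tree", str(pid),            # root of the process tree to checkpoint
        "--images-dir", images_dir,    # where the checkpoint images are written
        "--log-file", "dump.log",
        "-v4",                         # verbose logging, useful when a dump fails
    ]
    if leave_running:
        # Keep the processes running after the dump instead of killing them.
        args.append("--leave-running")
    return args


def criu_restore_args(images_dir: str) -> List[str]:
    """Build an argv list for `criu restore` from a directory of images."""
    return [
        "criu", "restore",
        "--images-dir", images_dir,
        "--restore-detached",          # detach once the tree has been restored
        "--log-file", "restore.log",
        "-v4",
    ]


if __name__ == "__main__":
    # Assumed values: 1234 stands in for the container's init PID.
    print(shlex.join(criu_dump_args(1234, "/tmp/ckpt")))
    print(shlex.join(criu_restore_args("/tmp/ckpt")))
```

The functions only return argument lists, so the commands can be inspected before anything is run; passing a list to `subprocess.run` (as root, on a host with CRIU installed) would execute them.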
As a proof of concept, external Docker C/R was presented at Docker Meetup in September 2014.
The first version of native Docker C/R was presented at Linux Plumbers Conference in October 2014.
A more complete version of Docker C/R was demonstrated in a short video in January 2015.
The existing C/R code in Docker should be considered alpha, ready for detailed review and extensive testing to reach beta quality. It works with the native exec driver (i.e., libcontainer) but it will not be difficult to make it work with LXC because LXC already supports CRIU.
Currently it's not possible to C/R an interactive Docker container (using /dev/console) but work is underway in CRIU to address this issue.
Work has started in the criu branch of libcontainer to integrate the C/R changes into the new libcontainer, which has changed significantly. This is still work in progress, but here is a 5-minute video of a Redis container checkpointed on one machine and restored on another.
The criu branch of libcontainer was merged into master and the pull request was closed (https://github.com/docker/libcontainer/pull/204).
Ross Boucher rebased C/R support to Docker 1.7 and submitted a pull request (https://github.com/docker/docker/pull/13602).
C/R Support in Libcontainer
Docker's native execution driver, libcontainer, is a package with its own repository. For easier C/R code review and build purposes, libcontainer in this repo includes C/R changes.
Avoiding Regression Issues
To avoid introducing regressions, C/R support has been added without any real changes to the existing Docker code. As a result, some parts of the C/R logic resemble existing Docker logic; these parts can be merged with it, provided that doing so does not introduce any performance, stability, security, or other issues.
Comments Marked with XXX
When reviewing the source code for C/R support, please pay special attention to comments marked with XXX as these are questions that need answers and/or discussion. All XXX comments should be removed after the final code review.