New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to checkpoint/restore containers #469
Conversation
|
Can one of the admins verify this patch?
|
|
This is my first attempt to provide checkpoint/restore to podman. This needs conmon changes which are available here: cri-o/cri-o#1427 Right now I am mainly hoping for some feedback if my approach is correct or if this should be done differently. Container checkpointing and restoring works but the network does not yet work after the restore as I am not setting up the network namespace during restore. |
|
@adrianreber Added [WIP} to heading to prevent accidentally getting merged. |
libpod/container.go
Outdated
| // CreateContainer creates a new container | ||
| CreateContainer CreateMode = iota | ||
| // RestoreContainer restores an existing container | ||
| RestoreContainer CreateMode = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we see there being more of these in the future? I can't think of obvious ones, so I would lean towards just leaving this a bool argument in OCIRuntime.createContainer
libpod/container_api.go
Outdated
| //} | ||
| } | ||
|
|
||
| if c.state.State == ContainerStateRunning { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth including Paused in there too, I doubt we want to restore over a paused container
|
Fundamental approach looks right - it looks like our other API functions, which is a good thing I'm wondering if we want to add a build tag for this, to avoid having another hard dependency. Not sure yet. |
|
@mheon Thanks for the feedback, I will change the two things you mentioned and try to get networking running after restore and update this PR. |
libpod/container_api.go
Outdated
| // This also happens in Cechkpoint() | ||
| // Not sure where is the right place for this. | ||
| // If the container is stopped via runc, it needs to be here for a successful restart. | ||
| logrus.Debugf("Cleaning up %s", filepath.Join(c.bundlePath(), "attach")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, think you want a space before attach here and ctl on 765
|
☔ The latest upstream changes (presumably edbfb53) made this pull request unmergeable. Please resolve the merge conflicts. |
|
@adrianreber Is this still something you are interested in? Needs a rebase. |
|
@rhatdan I am still on it. I rebased it locally and I am currently trying to get the network in the same state as is was during the checkpoint. |
|
This will take some more time as runc and CRIU do not handle pre-created namespaces correctly like podman does. See: opencontainers/runc#1786 Once I have the necessary changes on CRIU and runc I will continue here. |
|
@adrianreber Thanks for the update. We'll be happy to accept patches once upstream issues are resolved! |
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.
If the network namespace is defined like
{
"type": "network",
"path": "/run/netns/test"
}
there is the expectation that the restored container is again running in
the network namespace specified with 'path'.
This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.
This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.
With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.
Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.
runc still falls back to the old behavior if CRIU older than 3.11 is
installed.
Fixes opencontainers#1786
Related to containers/podman#469
Signed-off-by: Adrian Reber <areber@redhat.com>
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.
If the network namespace is defined like
{
"type": "network",
"path": "/run/netns/test"
}
there is the expectation that the restored container is again running in
the network namespace specified with 'path'.
This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.
This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.
With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.
Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.
runc still falls back to the old behavior if CRIU older than 3.11 is
installed.
Fixes opencontainers#1786
Related to containers/podman#469
Signed-off-by: Adrian Reber <areber@redhat.com>
|
Necessary changes for runc are finally on their way: opencontainers/runc#1849 |
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.
If the network namespace is defined like
{
"type": "network",
"path": "/run/netns/test"
}
there is the expectation that the restored container is again running in
the network namespace specified with 'path'.
This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.
This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.
With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.
Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.
runc still falls back to the old behavior if CRIU older than 3.11 is
installed.
Fixes opencontainers#1786
Related to containers/podman#469
Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!
Signed-off-by: Adrian Reber <areber@redhat.com>
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.
If the network namespace is defined like
{
"type": "network",
"path": "/run/netns/test"
}
there is the expectation that the restored container is again running in
the network namespace specified with 'path'.
This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.
This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.
With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.
Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.
runc still falls back to the old behavior if CRIU older than 3.11 is
installed.
Fixes opencontainers#1786
Related to containers/podman#469
Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!
Signed-off-by: Adrian Reber <areber@redhat.com>
|
I have finally updated this PR to introduce checkpoint and restore in podman. All changes in required dependencies (conmon, runc, criu) have been committed to the respective repositories. Right now I am using all of those dependencies from git checkouts. For CRIU a one-line patch is still under discussion: https://lists.openvz.org/pipermail/criu/2018-September/042613.html I am also removing the [WIP] flag as for the initial implementation everything is ready. I can checkpoint and restore containers and after restore I can still (again) connect to the network service provided by the container, using the same IP address. I tried to address the review comments from the initial commit from March. Next steps from my point of view:
|
|
Ah, I did not run the code linter. Will do that and update the PR. |
libpod/container_api.go
Outdated
| } | ||
|
|
||
| // This needs to be deleted so that the restore works correctly. | ||
| // This also happens in Chechkpoint() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Chechkpoint to Checkpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in upcoming update. Thanks.
libpod/container_api.go
Outdated
| os.Remove(filepath.Join(c.bundlePath(), "stats-dump")) | ||
| os.Remove(filepath.Join(c.bundlePath(), "stats-restore")) | ||
| // Network status file | ||
| os.Remove(filepath.Join(c.bundlePath(), "network.status")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur with the no error out, but am just wondering if throwing a debug message on error might be helpful if things ever go wrong. I think it's probably fine as is, but thought I'd raise the question for consideration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test and man page changes?
docs/podman-container-restore.1.md
Outdated
| Keep all temporary log and statistics files created by CRIU during | ||
| checkpointing as well as restoring. These files are not deleted if restoring | ||
| fails fur further debugging. If restoring succeeds these files are | ||
| theoretically not needed, but if these files are needed podman can keep the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto podman to Podman
docs/tutorials/podman_tutorial.md
Outdated
| @@ -157,6 +157,28 @@ $ sudo podman top <container_id> | |||
| 101 31889 31873 0 09:21 ? 00:00:00 nginx: worker process | |||
| ``` | |||
|
|
|||
| ### Checkpointing the container | |||
| Checkpointing a container stops the container while writing the state of all processes in the container to disk. | |||
| With this a container can be later restored and continue running at exact the same point in time of the | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest: With this a container can later be restored and continue running at exactly the same point in time as the checkpoint.
docs/tutorials/podman_tutorial.md
Outdated
| ### Checkpointing the container | ||
| Checkpointing a container stops the container while writing the state of all processes in the container to disk. | ||
| With this a container can be later restored and continue running at exact the same point in time of the | ||
| checkpoint. This needs at least CRIU 3.11 installed on the system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest "needs at least CRIU 3.11" "capability requires CRIU 3.11 or later"
docs/tutorials/podman_tutorial.md
Outdated
| $ sudo podman container restore <container_id> | ||
| ``` | ||
|
|
||
| After restore the container should once again answer to request like it did before checkpointing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completely confused by this line. Perhaps:
After being restored, the container will answer requests again as it did before checkpointing.
docs/tutorials/podman_tutorial.md
Outdated
| $ sudo podman container restore <container_id> | ||
| ``` | ||
|
|
||
| After restore the container should once again answer to request like it did before checkpointing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A blog posting, especially with a asciinema type of demo would be awesome once this is up and running.
libpod/container_internal_linux.go
Outdated
| // Let's try to stat() CRIU's inventory file. If it does not exist, it makes | ||
| // no sense to try a restore. This is a minimal check if a checkpoint exist. | ||
| if _, err := os.Stat(filepath.Join(c.CheckpointPath(), "inventory.img")); os.IsNotExist(err) { | ||
| return errors.Wrapf(err, "It seems there does not exist a complete checkpoint, cannot restore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest: "A complete checkpoint for this container cannot be found."
|
@mheon I changed all the points from your review. Thanks for the feedback. @TomSweeneyRedHat I also fixed everything you found, thanks. Concerning a blog post, I am happy to write an article once this is ready. Would podman.io be the right place for this? All changes have been pushed, let's see if the CI still likes the current pull request. |
|
LGTM |
|
☔ The latest upstream changes (presumably #1578) made this pull request unmergeable. Please resolve the merge conflicts. |
|
@adrianreber Sorry but this PR needs a rebase. |
runc uses CRIU to support checkpoint and restore of containers. This brings an initial checkpoint/restore implementation to podman. None of the additional runc flags are yet supported and container migration optimization (pre-copy/post-copy) is also left for the future. The current status is that it is possible to checkpoint and restore a container. I am testing on RHEL-7.x and as the combination of RHEL-7 and CRIU has seccomp troubles I have to create the container without seccomp. With the following steps I am able to checkpoint and restore a container: # podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden # <-- this is actually a good answer # podman container checkpoint <container> # curl -I 10.22.0.78:8080 curl: (7) Failed connect to 10.22.0.78:8080; No route to host # podman container restore <container> # curl -I 10.22.0.78:8080 HTTP/1.1 403 Forbidden I am using CRIU, runc and conmon from git. All required changes for checkpoint/restore support in podman have been merged in the corresponding projects. To have the same IP address in the restored container as before checkpointing, CNI is told which IP address to use. If the saved network configuration cannot be found during restore, the container is restored with a new IP address. For CRIU to restore established TCP connections the IP address of the network namespace used for restore needs to be the same. For TCP connections in the listening state the IP address can change. During restore only one network interface with one IP address is handled correctly. Support to restore containers with more advanced network configuration will be implemented later. v2: * comment typo * print debug messages during cleanup of restore files * use createContainer() instead of createOCIContainer() * introduce helper CheckpointPath() * do not try to restore a container that is paused * use existing helper functions for cleanup * restructure code flow for better readability * do not try to restore if checkpoint/inventory.img is missing * git add checkpoint.go restore.go v3: * move checkpoint/restore under 'podman container' v4: * incorporated changes from latest reviews Signed-off-by: Adrian Reber <areber@redhat.com>
This adds the podman-container-checkpoint and podman-container-restore man pages. Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
|
Rebased and pushed |
|
Any further comments or changes necessary? Ready to be merged? |
|
I think we're good to merge |
|
/lgtm |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mheon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
LGTM, @adrianreber TYVM for getting this all together and for addressing all the comments. |
|
Thanks a lot for keeping on reviewing my changes and please let me now if I broke something. Thanks everyone! I have already the next patches prepared for checkpoint/restore in podman and will soon open the next PR(s). |
See: containers/podman#469 Signed-off-by: Adrian Reber <areber@redhat.com>
See: containers/podman#469 Signed-off-by: Adrian Reber <areber@redhat.com>
|
Very interesting and useful features. 👍 |
This is still work in progress and just published for initial feedback.
runc uses CRIU to support checkpoint and restore of containers. This
brings an initial checkpoint/restore implementation to podman.
None of the additional runc flags are yet supported and container
migration optimization (pre-copy/post-copy) is also left for the future.
The current status is that it is possible to checkpoint and restore a
container. I am testing on RHEL-7.x and as the combination of RHEL-7 and
CRIU has seccomp troubles I have to create the container without
seccomp.
With the following steps I am able to checkpoint and restore a
container:
I am using CRIU, runc and conmon from git. Especially conmon needs
additional changes for checkpoint/restore which are still in a PR.
Signed-off-by: Adrian Reber areber@redhat.com