New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to checkpoint/restore containers #469

Merged
merged 5 commits into from Oct 4, 2018

Conversation

@adrianreber
Copy link
Collaborator

adrianreber commented Mar 9, 2018

This is still work in progress and just published for initial feedback.

runc uses CRIU to support checkpoint and restore of containers. This
brings an initial checkpoint/restore implementation to podman.

None of the additional runc flags are yet supported and container
migration optimization (pre-copy/post-copy) is also left for the future.

The current status is that it is possible to checkpoint and restore a
container. I am testing on RHEL-7.x and as the combination of RHEL-7 and
CRIU has seccomp troubles I have to create the container without
seccomp.

With the following steps I am able to checkpoint and restore a
container:

 # cat unconf.json
 {}
 # podman run --security-opt="seccomp=unconf.json" -d registry.fedoraproject.org/f26/httpd
 # podman checkpoint <container>
 # podman restore <container>

I am using CRIU, runc and conmon from git. Especially conmon needs
additional changes for checkpoint/restore which are still in a PR.

Signed-off-by: Adrian Reber areber@redhat.com

@rh-atomic-bot

This comment has been minimized.

Copy link
Collaborator

rh-atomic-bot commented Mar 9, 2018

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once
@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Mar 9, 2018

This is my first attempt to provide checkpoint/restore to podman. This needs conmon changes which are available here: kubernetes-sigs/cri-o#1427

Right now I am mainly hoping for some feedback if my approach is correct or if this should be done differently.

Container checkpointing and restoring works but the network does not yet work after the restore as I am not setting up the network namespace during restore.

@rhatdan rhatdan changed the title Add support to checkpoint/restore containers [WIP] Add support to checkpoint/restore containers Mar 9, 2018

@rhatdan

This comment has been minimized.

Copy link
Member

rhatdan commented Mar 9, 2018

@adrianreber Added [WIP} to heading to prevent accidentally getting merged.

@adrianreber adrianreber force-pushed the adrianreber:master branch from 44afd73 to 4981e14 Mar 9, 2018

// CreateContainer creates a new container
CreateContainer CreateMode = iota
// RestoreContainer restores an existing container
RestoreContainer CreateMode = iota

This comment has been minimized.

@mheon

mheon Mar 9, 2018

Collaborator

Do we see there being more of these in the future? I can't think of obvious ones, so I would lean towards just leaving this a bool argument in OCIRuntime.createContainer

//}
}

if c.state.State == ContainerStateRunning {

This comment has been minimized.

@mheon

mheon Mar 9, 2018

Collaborator

Probably worth including Paused in there too, I doubt we want to restore over a paused container

@mheon

This comment has been minimized.

Copy link
Collaborator

mheon commented Mar 9, 2018

Fundamental approach looks right - it looks like our other API functions, which is a good thing

I'm wondering if we want to add a build tag for this, to avoid having another hard dependency. Not sure yet.

@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Mar 9, 2018

@mheon Thanks for the feedback, I will change the two things you mentioned and try to get networking running after restore and update this PR.

// This also happens in Cechkpoint()
// Not sure where is the right place for this.
// If the container is stopped via runc, it needs to be here for a successful restart.
logrus.Debugf("Cleaning up %s", filepath.Join(c.bundlePath(), "attach"))

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Mar 9, 2018

Collaborator

nit, think you want a space before attach here and ctl on 765

@rh-atomic-bot

This comment has been minimized.

Copy link
Collaborator

rh-atomic-bot commented Mar 13, 2018

☔️ The latest upstream changes (presumably edbfb53) made this pull request unmergeable. Please resolve the merge conflicts.

@rhatdan

This comment has been minimized.

Copy link
Member

rhatdan commented Apr 4, 2018

@adrianreber Is this still something you are interested in? Needs a rebase.

@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Apr 4, 2018

@rhatdan I am still on it. I rebased it locally and I am currently trying to get the network in the same state as is was during the checkpoint.

@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Apr 24, 2018

This will take some more time as runc and CRIU do not handle pre-created namespaces correctly like podman does. See: opencontainers/runc#1786

Once I have the necessary changes on CRIU and runc I will continue here.

@mheon

This comment has been minimized.

Copy link
Collaborator

mheon commented Apr 24, 2018

@adrianreber Thanks for the update. We'll be happy to accept patches once upstream issues are resolved!

@jwhonce jwhonce added the enhancement label Jul 6, 2018

adrianreber added a commit to adrianreber/runc that referenced this pull request Jul 24, 2018

criu: restore into existing namespace when specified
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes opencontainers#1786

Related to containers/libpod#469

Signed-off-by: Adrian Reber <areber@redhat.com>

adrianreber added a commit to adrianreber/runc that referenced this pull request Jul 24, 2018

criu: restore into existing namespace when specified
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes opencontainers#1786

Related to containers/libpod#469

Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Jul 24, 2018

Necessary changes for runc are finally on their way: opencontainers/runc#1849

adrianreber added a commit to adrianreber/runc that referenced this pull request Aug 14, 2018

criu: restore into existing namespace when specified
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes opencontainers#1786

Related to containers/libpod#469

Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!

Signed-off-by: Adrian Reber <areber@redhat.com>

adrianreber added a commit to adrianreber/runc that referenced this pull request Aug 22, 2018

criu: restore into existing namespace when specified
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes opencontainers#1786

Related to containers/libpod#469

Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!

Signed-off-by: Adrian Reber <areber@redhat.com>

@adrianreber adrianreber force-pushed the adrianreber:master branch from 4981e14 to 911f712 Sep 18, 2018

@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Sep 18, 2018

I have finally updated this PR to introduce checkpoint and restore in podman.

All changes in required dependencies (conmon, runc, criu) have been committed to the respective repositories. Right now I am using all of those dependencies from git checkouts.

For CRIU a one-line patch is still under discussion: https://lists.openvz.org/pipermail/criu/2018-September/042613.html

I am also removing the [WIP] flag as for the initial implementation everything is ready.

I can checkpoint and restore containers and after restore I can still (again) connect to the network service provided by the container, using the same IP address.

I tried to address the review comments from the initial commit from March. Next steps from my point of view:

  • more advanced network setups
  • offer more of the CRIU functionality exposed through runc also in podman (i.e. migrating established TCP connections)
  • handle container migration between hosts. Right now I am only checkpointing and restoring on one system
  • add support for pre-copy and post-copy container migration

@adrianreber adrianreber changed the title [WIP] Add support to checkpoint/restore containers Add support to checkpoint/restore containers Sep 18, 2018

@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Sep 18, 2018

Ah, I did not run the code linter. Will do that and update the PR.

}

// This needs to be deleted so that the restore works correctly.
// This also happens in Chechkpoint()

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 18, 2018

Collaborator

nit Chechkpoint to Checkpoint

This comment has been minimized.

@adrianreber

adrianreber Sep 18, 2018

Collaborator

Fixed in upcoming update. Thanks.

os.Remove(filepath.Join(c.bundlePath(), "stats-dump"))
os.Remove(filepath.Join(c.bundlePath(), "stats-restore"))
// Network status file
os.Remove(filepath.Join(c.bundlePath(), "network.status"))

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 18, 2018

Collaborator

I concur with the no error out, but am just wondering if throwing a debug message on error might be helpful if things ever go wrong. I think it's probably fine as is, but thought I'd raise the question for consideration.

This comment has been minimized.

@adrianreber

adrianreber Sep 18, 2018

Collaborator

Sounds like a good idea.

@TomSweeneyRedHat
Copy link
Collaborator

TomSweeneyRedHat left a comment

Test and man page changes?


Keep all temporary log and statistics files created by CRIU during
checkpointing as well as restoring. These files are not deleted if restoring
fails fur further debugging. If restoring succeeds these files are

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 28, 2018

Collaborator

fur to for here too.

Keep all temporary log and statistics files created by CRIU during
checkpointing as well as restoring. These files are not deleted if restoring
fails fur further debugging. If restoring succeeds these files are
theoretically not needed, but if these files are needed podman can keep the

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 28, 2018

Collaborator

ditto podman to Podman

@@ -157,6 +157,28 @@ $ sudo podman top <container_id>
101 31889 31873 0 09:21 ? 00:00:00 nginx: worker process
```

### Checkpointing the container
Checkpointing a container stops the container while writing the state of all processes in the container to disk.
With this a container can be later restored and continue running at exact the same point in time of the

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 28, 2018

Collaborator

Suggest: With this a container can later be restored and continue running at exactly the same point in time as the checkpoint.

### Checkpointing the container
Checkpointing a container stops the container while writing the state of all processes in the container to disk.
With this a container can be later restored and continue running at exact the same point in time of the
checkpoint. This needs at least CRIU 3.11 installed on the system.

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 28, 2018

Collaborator

Suggest "needs at least CRIU 3.11" "capability requires CRIU 3.11 or later"

$ sudo podman container restore <container_id>
```

After restore the container should once again answer to request like it did before checkpointing.

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 28, 2018

Collaborator

Completely confused by this line. Perhaps:

After being restored, the container will answer requests again as it did before checkpointing.

$ sudo podman container restore <container_id>
```

After restore the container should once again answer to request like it did before checkpointing.

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 28, 2018

Collaborator

A blog posting, especially with a asciinema type of demo would be awesome once this is up and running.

Show resolved Hide resolved libpod/container_internal_linux.go
// Let's try to stat() CRIU's inventory file. If it does not exist, it makes
// no sense to try a restore. This is a minimal check if a checkpoint exist.
if _, err := os.Stat(filepath.Join(c.CheckpointPath(), "inventory.img")); os.IsNotExist(err) {
return errors.Wrapf(err, "It seems there does not exist a complete checkpoint, cannot restore")

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Sep 28, 2018

Collaborator

Suggest: "A complete checkpoint for this container cannot be found."

@adrianreber adrianreber force-pushed the adrianreber:master branch from caec913 to b16d0bf Oct 1, 2018

@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Oct 1, 2018

@mheon I changed all the points from your review. Thanks for the feedback.

@TomSweeneyRedHat I also fixed everything you found, thanks. Concerning a blog post, I am happy to write an article once this is ready. Would podman.io be the right place for this?

All changes have been pushed, let's see if the CI still likes the current pull request.

@rhatdan

This comment has been minimized.

Copy link
Member

rhatdan commented Oct 2, 2018

LGTM

@rh-atomic-bot

This comment has been minimized.

Copy link
Collaborator

rh-atomic-bot commented Oct 3, 2018

☔️ The latest upstream changes (presumably #1578) made this pull request unmergeable. Please resolve the merge conflicts.

@rhatdan

This comment has been minimized.

Copy link
Member

rhatdan commented Oct 3, 2018

@adrianreber Sorry but this PR needs a rebase.

adrianreber added some commits Sep 18, 2018

Add support to checkpoint/restore containers
runc uses CRIU to support checkpoint and restore of containers. This
brings an initial checkpoint/restore implementation to podman.

None of the additional runc flags are yet supported and container
migration optimization (pre-copy/post-copy) is also left for the future.

The current status is that it is possible to checkpoint and restore a
container. I am testing on RHEL-7.x and as the combination of RHEL-7 and
CRIU has seccomp troubles I have to create the container without
seccomp.

With the following steps I am able to checkpoint and restore a
container:

 # podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd
 # curl -I 10.22.0.78:8080
 HTTP/1.1 403 Forbidden # <-- this is actually a good answer
 # podman container checkpoint <container>
 # curl -I 10.22.0.78:8080
 curl: (7) Failed connect to 10.22.0.78:8080; No route to host
 # podman container restore <container>
 # curl -I 10.22.0.78:8080
 HTTP/1.1 403 Forbidden

I am using CRIU, runc and conmon from git. All required changes for
checkpoint/restore support in podman have been merged in the
corresponding projects.

To have the same IP address in the restored container as before
checkpointing, CNI is told which IP address to use.

If the saved network configuration cannot be found during restore, the
container is restored with a new IP address.

For CRIU to restore established TCP connections the IP address of the
network namespace used for restore needs to be the same. For TCP
connections in the listening state the IP address can change.

During restore only one network interface with one IP address is handled
correctly. Support to restore containers with more advanced network
configuration will be implemented later.

v2:
 * comment typo
 * print debug messages during cleanup of restore files
 * use createContainer() instead of createOCIContainer()
 * introduce helper CheckpointPath()
 * do not try to restore a container that is paused
 * use existing helper functions for cleanup
 * restructure code flow for better readability
 * do not try to restore if checkpoint/inventory.img is missing
 * git add checkpoint.go restore.go

v3:
 * move checkpoint/restore under 'podman container'

v4:
 * incorporated changes from latest reviews

Signed-off-by: Adrian Reber <areber@redhat.com>
docs: add checkpoint and restore man pages
This adds the podman-container-checkpoint and
podman-container-restore man pages.

Signed-off-by: Adrian Reber <areber@redhat.com>
tutorial: add checkpoint/restore to tutorial
Signed-off-by: Adrian Reber <areber@redhat.com>
tests: add checkpoint/restore test
Signed-off-by: Adrian Reber <areber@redhat.com>
completions: add checkpoint/restore completions
Signed-off-by: Adrian Reber <areber@redhat.com>

@adrianreber adrianreber force-pushed the adrianreber:master branch from b16d0bf to dc987af Oct 3, 2018

@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Oct 3, 2018

Rebased and pushed

@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Oct 4, 2018

Any further comments or changes necessary? Ready to be merged?

@mheon

This comment has been minimized.

Copy link
Collaborator

mheon commented Oct 4, 2018

I think we're good to merge

@mheon

This comment has been minimized.

Copy link
Collaborator

mheon commented Oct 4, 2018

/lgtm

@mheon

This comment has been minimized.

Copy link
Collaborator

mheon commented Oct 4, 2018

/approve

@openshift-ci-robot

This comment has been minimized.

Copy link
Collaborator

openshift-ci-robot commented Oct 4, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@TomSweeneyRedHat

This comment has been minimized.

Copy link
Collaborator

TomSweeneyRedHat commented Oct 4, 2018

LGTM, @adrianreber TYVM for getting this all together and for addressing all the comments.

@openshift-merge-robot openshift-merge-robot merged commit 06a959f into containers:master Oct 4, 2018

9 checks passed

CAH 7-smoketested - Containerized (Podman in Docker) All tests passed.
Details
FAH28 - Containerized (Podman in Podman) All tests passed.
Details
Fedora 28 Cloud All tests passed.
Details
ci/prow/images Job succeeded.
Details
ci/prow/lint Job succeeded.
Details
ci/prow/validate Job succeeded.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
required 2/2 PASSES
Details
tide In merge pool.
Details
@adrianreber

This comment has been minimized.

Copy link
Collaborator

adrianreber commented Oct 4, 2018

Thanks a lot for keeping on reviewing my changes and please let me now if I broke something. Thanks everyone! I have already the next patches prepared for checkpoint/restore in podman and will soon open the next PR(s).

adrianreber added a commit to adrianreber/podman.io that referenced this pull request Oct 9, 2018

Add article about checkpoint/restore
See: containers/libpod#469

Signed-off-by: Adrian Reber <areber@redhat.com>

TomSweeneyRedHat added a commit to containers/podman.io that referenced this pull request Oct 10, 2018

Add article about checkpoint/restore (#25)
See: containers/libpod#469

Signed-off-by: Adrian Reber <areber@redhat.com>
@warmchang

This comment has been minimized.

Copy link
Contributor

warmchang commented Oct 27, 2018

Very interesting and useful features. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment