Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to checkpoint/restore containers #469

Merged
merged 5 commits into from Oct 4, 2018

Conversation

adrianreber
Copy link
Collaborator

@adrianreber adrianreber commented Mar 9, 2018

This is still work in progress and just published for initial feedback.

runc uses CRIU to support checkpoint and restore of containers. This
brings an initial checkpoint/restore implementation to podman.

None of the additional runc flags are yet supported and container
migration optimization (pre-copy/post-copy) is also left for the future.

The current status is that it is possible to checkpoint and restore a
container. I am testing on RHEL-7.x and as the combination of RHEL-7 and
CRIU has seccomp troubles I have to create the container without
seccomp.

With the following steps I am able to checkpoint and restore a
container:

 # cat unconf.json
 {}
 # podman run --security-opt="seccomp=unconf.json" -d registry.fedoraproject.org/f26/httpd
 # podman checkpoint <container>
 # podman restore <container>

I am using CRIU, runc and conmon from git. Especially conmon needs
additional changes for checkpoint/restore which are still in a PR.

Signed-off-by: Adrian Reber areber@redhat.com

@rh-atomic-bot
Copy link
Collaborator

Can one of the admins verify this patch?
I understand the following commands:

  • bot, add author to whitelist
  • bot, test pull request
  • bot, test pull request once

@adrianreber
Copy link
Collaborator Author

This is my first attempt to provide checkpoint/restore to podman. This needs conmon changes which are available here: cri-o/cri-o#1427

Right now I am mainly hoping for some feedback if my approach is correct or if this should be done differently.

Container checkpointing and restoring works but the network does not yet work after the restore as I am not setting up the network namespace during restore.

@rhatdan rhatdan changed the title Add support to checkpoint/restore containers [WIP] Add support to checkpoint/restore containers Mar 9, 2018
@rhatdan
Copy link
Member

rhatdan commented Mar 9, 2018

@adrianreber Added [WIP} to heading to prevent accidentally getting merged.

// CreateContainer creates a new container
CreateContainer CreateMode = iota
// RestoreContainer restores an existing container
RestoreContainer CreateMode = iota
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we see there being more of these in the future? I can't think of obvious ones, so I would lean towards just leaving this a bool argument in OCIRuntime.createContainer

//}
}

if c.state.State == ContainerStateRunning {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth including Paused in there too, I doubt we want to restore over a paused container

@mheon
Copy link
Member

mheon commented Mar 9, 2018

Fundamental approach looks right - it looks like our other API functions, which is a good thing

I'm wondering if we want to add a build tag for this, to avoid having another hard dependency. Not sure yet.

@adrianreber
Copy link
Collaborator Author

@mheon Thanks for the feedback, I will change the two things you mentioned and try to get networking running after restore and update this PR.

// This also happens in Cechkpoint()
// Not sure where is the right place for this.
// If the container is stopped via runc, it needs to be here for a successful restart.
logrus.Debugf("Cleaning up %s", filepath.Join(c.bundlePath(), "attach"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, think you want a space before attach here and ctl on 765

@rh-atomic-bot
Copy link
Collaborator

☔ The latest upstream changes (presumably edbfb53) made this pull request unmergeable. Please resolve the merge conflicts.

@rhatdan
Copy link
Member

rhatdan commented Apr 4, 2018

@adrianreber Is this still something you are interested in? Needs a rebase.

@adrianreber
Copy link
Collaborator Author

@rhatdan I am still on it. I rebased it locally and I am currently trying to get the network in the same state as is was during the checkpoint.

@adrianreber
Copy link
Collaborator Author

This will take some more time as runc and CRIU do not handle pre-created namespaces correctly like podman does. See: opencontainers/runc#1786

Once I have the necessary changes on CRIU and runc I will continue here.

@mheon
Copy link
Member

mheon commented Apr 24, 2018

@adrianreber Thanks for the update. We'll be happy to accept patches once upstream issues are resolved!

adrianreber added a commit to adrianreber/runc that referenced this pull request Jul 24, 2018
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes opencontainers#1786

Related to containers/podman#469

Signed-off-by: Adrian Reber <areber@redhat.com>
adrianreber added a commit to adrianreber/runc that referenced this pull request Jul 24, 2018
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes opencontainers#1786

Related to containers/podman#469

Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber
Copy link
Collaborator Author

Necessary changes for runc are finally on their way: opencontainers/runc#1849

adrianreber added a commit to adrianreber/runc that referenced this pull request Aug 14, 2018
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes opencontainers#1786

Related to containers/podman#469

Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!

Signed-off-by: Adrian Reber <areber@redhat.com>
adrianreber added a commit to adrianreber/runc that referenced this pull request Aug 22, 2018
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes opencontainers#1786

Related to containers/podman#469

Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!

Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber
Copy link
Collaborator Author

I have finally updated this PR to introduce checkpoint and restore in podman.

All changes in required dependencies (conmon, runc, criu) have been committed to the respective repositories. Right now I am using all of those dependencies from git checkouts.

For CRIU a one-line patch is still under discussion: https://lists.openvz.org/pipermail/criu/2018-September/042613.html

I am also removing the [WIP] flag as for the initial implementation everything is ready.

I can checkpoint and restore containers and after restore I can still (again) connect to the network service provided by the container, using the same IP address.

I tried to address the review comments from the initial commit from March. Next steps from my point of view:

  • more advanced network setups
  • offer more of the CRIU functionality exposed through runc also in podman (i.e. migrating established TCP connections)
  • handle container migration between hosts. Right now I am only checkpointing and restoring on one system
  • add support for pre-copy and post-copy container migration

@adrianreber adrianreber changed the title [WIP] Add support to checkpoint/restore containers Add support to checkpoint/restore containers Sep 18, 2018
@adrianreber
Copy link
Collaborator Author

Ah, I did not run the code linter. Will do that and update the PR.

}

// This needs to be deleted so that the restore works correctly.
// This also happens in Chechkpoint()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit Chechkpoint to Checkpoint

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in upcoming update. Thanks.

os.Remove(filepath.Join(c.bundlePath(), "stats-dump"))
os.Remove(filepath.Join(c.bundlePath(), "stats-restore"))
// Network status file
os.Remove(filepath.Join(c.bundlePath(), "network.status"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concur with the no error out, but am just wondering if throwing a debug message on error might be helpful if things ever go wrong. I think it's probably fine as is, but thought I'd raise the question for consideration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea.

Copy link
Member

@TomSweeneyRedHat TomSweeneyRedHat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test and man page changes?

Keep all temporary log and statistics files created by CRIU during
checkpointing as well as restoring. These files are not deleted if restoring
fails fur further debugging. If restoring succeeds these files are
theoretically not needed, but if these files are needed podman can keep the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto podman to Podman

@@ -157,6 +157,28 @@ $ sudo podman top <container_id>
101 31889 31873 0 09:21 ? 00:00:00 nginx: worker process
```

### Checkpointing the container
Checkpointing a container stops the container while writing the state of all processes in the container to disk.
With this a container can be later restored and continue running at exact the same point in time of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest: With this a container can later be restored and continue running at exactly the same point in time as the checkpoint.

### Checkpointing the container
Checkpointing a container stops the container while writing the state of all processes in the container to disk.
With this a container can be later restored and continue running at exact the same point in time of the
checkpoint. This needs at least CRIU 3.11 installed on the system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest "needs at least CRIU 3.11" "capability requires CRIU 3.11 or later"

$ sudo podman container restore <container_id>
```

After restore the container should once again answer to request like it did before checkpointing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completely confused by this line. Perhaps:

After being restored, the container will answer requests again as it did before checkpointing.

$ sudo podman container restore <container_id>
```

After restore the container should once again answer to request like it did before checkpointing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A blog posting, especially with a asciinema type of demo would be awesome once this is up and running.

libpod/container_internal_linux.go Show resolved Hide resolved
// Let's try to stat() CRIU's inventory file. If it does not exist, it makes
// no sense to try a restore. This is a minimal check if a checkpoint exist.
if _, err := os.Stat(filepath.Join(c.CheckpointPath(), "inventory.img")); os.IsNotExist(err) {
return errors.Wrapf(err, "It seems there does not exist a complete checkpoint, cannot restore")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest: "A complete checkpoint for this container cannot be found."

@adrianreber
Copy link
Collaborator Author

@mheon I changed all the points from your review. Thanks for the feedback.

@TomSweeneyRedHat I also fixed everything you found, thanks. Concerning a blog post, I am happy to write an article once this is ready. Would podman.io be the right place for this?

All changes have been pushed, let's see if the CI still likes the current pull request.

@rhatdan
Copy link
Member

rhatdan commented Oct 2, 2018

LGTM

@rh-atomic-bot
Copy link
Collaborator

☔ The latest upstream changes (presumably #1578) made this pull request unmergeable. Please resolve the merge conflicts.

@rhatdan
Copy link
Member

rhatdan commented Oct 3, 2018

@adrianreber Sorry but this PR needs a rebase.

runc uses CRIU to support checkpoint and restore of containers. This
brings an initial checkpoint/restore implementation to podman.

None of the additional runc flags are yet supported and container
migration optimization (pre-copy/post-copy) is also left for the future.

The current status is that it is possible to checkpoint and restore a
container. I am testing on RHEL-7.x and as the combination of RHEL-7 and
CRIU has seccomp troubles I have to create the container without
seccomp.

With the following steps I am able to checkpoint and restore a
container:

 # podman run --security-opt="seccomp=unconfined" -d registry.fedoraproject.org/f27/httpd
 # curl -I 10.22.0.78:8080
 HTTP/1.1 403 Forbidden # <-- this is actually a good answer
 # podman container checkpoint <container>
 # curl -I 10.22.0.78:8080
 curl: (7) Failed connect to 10.22.0.78:8080; No route to host
 # podman container restore <container>
 # curl -I 10.22.0.78:8080
 HTTP/1.1 403 Forbidden

I am using CRIU, runc and conmon from git. All required changes for
checkpoint/restore support in podman have been merged in the
corresponding projects.

To have the same IP address in the restored container as before
checkpointing, CNI is told which IP address to use.

If the saved network configuration cannot be found during restore, the
container is restored with a new IP address.

For CRIU to restore established TCP connections the IP address of the
network namespace used for restore needs to be the same. For TCP
connections in the listening state the IP address can change.

During restore only one network interface with one IP address is handled
correctly. Support to restore containers with more advanced network
configuration will be implemented later.

v2:
 * comment typo
 * print debug messages during cleanup of restore files
 * use createContainer() instead of createOCIContainer()
 * introduce helper CheckpointPath()
 * do not try to restore a container that is paused
 * use existing helper functions for cleanup
 * restructure code flow for better readability
 * do not try to restore if checkpoint/inventory.img is missing
 * git add checkpoint.go restore.go

v3:
 * move checkpoint/restore under 'podman container'

v4:
 * incorporated changes from latest reviews

Signed-off-by: Adrian Reber <areber@redhat.com>
This adds the podman-container-checkpoint and
podman-container-restore man pages.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber
Copy link
Collaborator Author

Rebased and pushed

@adrianreber
Copy link
Collaborator Author

Any further comments or changes necessary? Ready to be merged?

@mheon
Copy link
Member

mheon commented Oct 4, 2018

I think we're good to merge

@mheon
Copy link
Member

mheon commented Oct 4, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 4, 2018
@mheon
Copy link
Member

mheon commented Oct 4, 2018

/approve

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 4, 2018
@TomSweeneyRedHat
Copy link
Member

LGTM, @adrianreber TYVM for getting this all together and for addressing all the comments.

@openshift-merge-robot openshift-merge-robot merged commit 06a959f into containers:master Oct 4, 2018
@adrianreber
Copy link
Collaborator Author

Thanks a lot for keeping on reviewing my changes and please let me now if I broke something. Thanks everyone! I have already the next patches prepared for checkpoint/restore in podman and will soon open the next PR(s).

adrianreber added a commit to adrianreber/podman.io that referenced this pull request Oct 9, 2018
See: containers/podman#469

Signed-off-by: Adrian Reber <areber@redhat.com>
TomSweeneyRedHat pushed a commit to containers/podman.io_old that referenced this pull request Oct 10, 2018
See: containers/podman#469

Signed-off-by: Adrian Reber <areber@redhat.com>
@warmchang
Copy link
Contributor

Very interesting and useful features. 👍

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 27, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants