[WIP] Add support to migrate containers #2272

Open
wants to merge 10 commits into
base: master
from

9 participants
@adrianreber
Collaborator

commented Feb 5, 2019

This series adds container migration support to Podman.

The basic steps to migrate containers are:

 Source system:
  * podman container checkpoint -l -e /tmp/checkpoint.tar.gz
  * scp /tmp/checkpoint.tar.gz destination:/tmp

 Destination system:
  * podman pull 'container-image-as-on-source-system'
  * podman container restore -i /tmp/checkpoint.tar.gz
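The steps above could be wrapped in two small shell functions, one per host. This is only a sketch: the function names `export_and_copy` and `pull_and_restore` are hypothetical, and only the `podman` and `scp` invocations are taken from the description. It assumes the archive should land in the same directory on the destination.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the migration steps from the description.

# Source system: checkpoint the latest container (-l), export it to a
# single archive (-e), then copy the archive to the destination host.
export_and_copy() {
    archive=$1      # e.g. /tmp/checkpoint.tar.gz
    dest_host=$2    # e.g. destination
    podman container checkpoint -l -e "$archive" || return 1
    # Assumption: use the same directory on the destination as on the source.
    scp "$archive" "$dest_host:$(dirname "$archive")" || return 1
}

# Destination system: make sure the image is present, then restore
# the container from the imported archive (-i).
pull_and_restore() {
    image=$1        # the same image the container used on the source
    archive=$2
    podman pull "$image" || return 1
    podman container restore -i "$archive"
}
```

On the source this would be called as `export_and_copy /tmp/checkpoint.tar.gz destination`, and on the destination as `pull_and_restore <image> /tmp/checkpoint.tar.gz`.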

For the newly added test to work, an updated runc is required, which is still under review: opencontainers/runc#1968

Tries to solve: #1618

This PR includes the actual code, man pages, bash completion, tests, and a tutorial update.

Once this is merged I would also like to publish a related article on podman.io.

@openshift-ci-robot openshift-ci-robot requested review from baude and rhatdan Feb 5, 2019

@openshift-ci-robot

Collaborator

commented Feb 5, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: adrianreber
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: vrothberg


@adrianreber adrianreber force-pushed the adrianreber:migration branch 4 times, most recently from 331cae2 to 4d3fa26 Feb 5, 2019

Show resolved Hide resolved cmd/podman/restore.go Outdated
@rst0git


commented Feb 5, 2019

The restore of a looper example fails for me. Still investigating why...
restore.log

Show resolved Hide resolved libpod/container_api.go Outdated
Show resolved Hide resolved libpod/container_internal.go Outdated
Show resolved Hide resolved libpod/container_internal_linux.go
Show resolved Hide resolved libpod/container_internal_linux.go Outdated
Show resolved Hide resolved libpod/container_internal_linux.go Outdated
Show resolved Hide resolved libpod/runtime_ctr.go Outdated
Show resolved Hide resolved cmd/podman/restore.go Outdated
@openshift-merge-robot

Collaborator

commented Feb 6, 2019

/retest

@adrianreber

Collaborator Author

commented Feb 6, 2019

> The restore of a looper example fails for me. Still investigating why...
> restore.log

Are you using runc with the changes from opencontainers/runc#1968? Could be there is some runc code missing to create /proc just as it did not create bind mount mountpoints before opencontainers/runc#1968. Please let me know how you are testing it so I can try to recreate.

@rst0git


commented Feb 6, 2019

> Are you using runc with the changes from opencontainers/runc#1968?

Yes, I installed runc from opencontainers/runc#1968, criu from the criu-dev branch and podman from this PR.

> Could be there is some runc code missing to create /proc just as it did not create bind mount mountpoints before opencontainers/runc#1968. Please let me know how you are testing it so I can try to recreate.

I followed these steps to create, checkpoint, and restore a container:

sudo podman run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
sudo podman container checkpoint looper -e /tmp/checkpoint.tar.gz
sudo podman container rm looper
sudo podman container restore -i /tmp/checkpoint.tar.gz

Even though the restore has failed in the last step, a looper container is still created (with status Created). Therefore, I can still run sudo podman container start looper to start the container.

@adrianreber

Collaborator Author

commented Feb 6, 2019

@rst0git This is the same bug as in my runc pull request, except that it also applies to non-bind-mount mountpoints. It seems runc just creates all missing mountpoints even for read-only root filesystems. If you use a container image which includes all required mountpoints (in this case /run, /proc, and /sys are missing), it should work. I will update my runc PR to also handle missing non-bind-mount mountpoints.
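To see whether an image is affected, one could check a root filesystem directory for the three mountpoints named above. The helper below is a hypothetical sketch, not part of this PR; it only assumes that the container's root filesystem is available as a directory on the host.

```shell
#!/bin/sh
# Hypothetical helper: print each of /run, /proc and /sys that does not
# exist as a directory inside the given root filesystem directory.
missing_mountpoints() {
    rootfs=$1
    for mp in /run /proc /sys; do
        [ -d "$rootfs$mp" ] || echo "$mp"
    done
}
```

The directory could, for example, be the path printed by `podman mount looper`. Empty output means all three mountpoints exist in the image.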

@adrianreber

Collaborator Author

commented Feb 6, 2019

@mheon Thanks for the review. At first I was unsure if I can do all the necessary changes, but now, after I have thought about it, I think I can actually do everything you suggested and the result will be better. I hope. So thanks for pointing it out.

@openshift-merge-robot

Collaborator

commented Feb 6, 2019

/retest

@avagin


commented Feb 6, 2019

  • scp /tmp/checkpoint.tar.gz destination:/tmp
    What is inside /tmp/checkpoint.tar.gz? Do you snapshot a container rootfs?
@adrianreber

Collaborator Author

commented Feb 6, 2019

>   • scp /tmp/checkpoint.tar.gz destination:/tmp
>     What is inside /tmp/checkpoint.tar.gz? Do you snapshot a container rootfs?

The checkpoint directory and the container definition (spec, config, network). No filesystem (yet). I was thinking about doing an automatic commit of the highest layer and including it also. But right now it is only the output of CRIU and some json files.
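Since the export is an ordinary gzip-compressed tar file, its contents can be listed without restoring anything. A minimal sketch, with a hypothetical function name; the actual file names inside the archive are not stated in this thread, so none are assumed here:

```shell
#!/bin/sh
# Hypothetical helper: list the entries of an exported checkpoint archive
# without unpacking it.
list_checkpoint_archive() {
    archive=$1
    [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
    # -t lists entries, -z handles gzip compression, -f names the archive.
    tar -tzf "$archive"
}
```

For the archive from the description this would be `list_checkpoint_archive /tmp/checkpoint.tar.gz`.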

@adrianreber adrianreber force-pushed the adrianreber:migration branch 2 times, most recently from 74ab7b8 to 0ba0da7 Feb 6, 2019

@adrianreber

Collaborator Author

commented Feb 6, 2019

@mheon I tried to rework my changes to the newContainer() function. There is now a RestoreContainer() function. Please have a look if this is now a better approach.

@adrianreber

Collaborator Author

commented Feb 6, 2019

All CI errors are the expected errors as long as the necessary runc patches have not been merged.

Show resolved Hide resolved libpod/runtime_ctr.go Outdated
@mheon

Collaborator

commented Feb 6, 2019

I'll do a more thorough review tomorrow, but I'm generally in favor of the way NewContainer was split up.

Still a little iffy on copying over the OCI config... I need to check, but there shouldn't be much in there that isn't deterministically generated, so it might not be necessary.

We do need to make sure that the ContainerConfig we copied over makes sense before we use it - if the container is in a pod, we need a pod with the same ID present on the remote host, and the same holds for named volumes. We might want to make a generic sanity checker for ContainerConfig that we can use for both NewContainer and this.

@rh-atomic-bot

Collaborator

commented Feb 6, 2019

☔️ The latest upstream changes (presumably #2252) made this pull request unmergeable. Please resolve the merge conflicts.

@rh-atomic-bot

Collaborator

commented Mar 29, 2019

☔️ The latest upstream changes (presumably #2730) made this pull request unmergeable. Please resolve the merge conflicts.

@adrianreber adrianreber force-pushed the adrianreber:migration branch from a627b9e to 986b96b Apr 1, 2019

@adrianreber adrianreber changed the title from "Add support to migrate containers" to "[WIP] Add support to migrate containers" Apr 1, 2019

@adrianreber

Collaborator Author

commented Apr 1, 2019

Rebased and force-pushed. It seems almost all external problems (runc, CRIU, and SELinux) are solved. Some smaller discussions are still ongoing, so the tests might start to return green soon.

@rh-atomic-bot

Collaborator

commented Apr 3, 2019

☔️ The latest upstream changes (presumably #2833) made this pull request unmergeable. Please resolve the merge conflicts.

@adrianreber adrianreber force-pushed the adrianreber:migration branch from 986b96b to 7d2f56b Apr 4, 2019

@adrianreber

Collaborator Author

commented Apr 4, 2019

Rebased but still not ready. Still waiting on a few external dependencies.

@rh-atomic-bot

Collaborator

commented Apr 13, 2019

☔️ The latest upstream changes (presumably #2830) made this pull request unmergeable. Please resolve the merge conflicts.

@adrianreber adrianreber force-pushed the adrianreber:migration branch from 7d2f56b to 53e406f Apr 15, 2019

@adrianreber adrianreber force-pushed the adrianreber:migration branch from 53e406f to c7efe54 Apr 15, 2019

@adrianreber

Collaborator Author

commented Apr 15, 2019

Rebased, but there are still external dependencies (mainly CRIU for Fedora) which are not ready yet.

@rh-atomic-bot

Collaborator

commented Apr 16, 2019

☔️ The latest upstream changes (presumably #2946) made this pull request unmergeable. Please resolve the merge conflicts.

@adrianreber

Collaborator Author

commented Apr 16, 2019

@cevich which version of runc is used in the ubuntu circle ci image? Can it be updated? For this PR I would need to have at least a version of runc which includes opencontainers/runc#1968

adrianreber added some commits Feb 5, 2019

Fix restore options help text and comments
Signed-off-by: Adrian Reber <areber@redhat.com>
Add const string for /dev/shm
The linter was starting to complain that the string "/dev/shm" is used
in multiple places. This commit creates the const string DevShmPath,
which contains "/dev/shm".

Signed-off-by: Adrian Reber <areber@redhat.com>
Added helper functions for container migration
This adds a couple of functions and structure members needed in the next
commit to make container migration actually work. This just splits off
the functions which do not modify existing code.

Signed-off-by: Adrian Reber <areber@redhat.com>
Add test case for container migration
The difference between container checkpoint/restore and container
migration is that for migration the container which was checkpointed
must not exist during restore. To simulate migration the container
is removed ('podman rm -fa') before being restored. The migration test
performs the following steps:

 * podman run
 * podman container checkpoint -l -e /tmp/checkpoint.tar.gz
 * podman rm -fa
 * podman container restore -i /tmp/checkpoint.tar.gz

Signed-off-by: Adrian Reber <areber@redhat.com>
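The four test steps listed in the commit message above can be sketched as one shell function for trying the sequence by hand on a single host. The function name is hypothetical; the `podman` subcommands and flags are the ones named in the commit message. Note that `podman rm -fa` force-removes all containers on the host, so this is only suitable for a disposable test system.

```shell
#!/bin/sh
# Hypothetical helper: simulate a migration on a single host.
# Usage: simulate_migration ARCHIVE IMAGE [COMMAND...]
simulate_migration() {
    archive=$1
    shift
    # Start a detached container from the remaining arguments.
    podman run -d "$@" || return 1
    # Checkpoint the latest container and export it to the archive.
    podman container checkpoint -l -e "$archive" || return 1
    # WARNING: removes ALL containers, so the checkpointed one no longer exists.
    podman rm -fa || return 1
    # Restore the container from the exported archive.
    podman container restore -i "$archive"
}
```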
Added bash completion for container migration
Signed-off-by: Adrian Reber <areber@redhat.com>
Add man-pages for container migration
Signed-off-by: Adrian Reber <areber@redhat.com>
Include container migration into tutorial
Signed-off-by: Adrian Reber <areber@redhat.com>
Added support to migrate containers
This commit adds an option to the checkpoint command to export a
checkpoint into a tar.gz file, and an option to the restore command to
import a checkpoint tar.gz file. With all checkpoint artifacts in one
file it is possible to easily transfer a checkpoint, thus enabling
container migration in Podman. With the following steps it is possible
to migrate a running container from one system (source) to another
(destination).

 Source system:
  * podman container checkpoint -l -e /tmp/checkpoint.tar.gz
  * scp /tmp/checkpoint.tar.gz destination:/tmp

 Destination system:
  * podman pull 'container-image-as-on-source-system'
  * podman container restore -i /tmp/checkpoint.tar.gz

The exported tar.gz file contains the checkpoint image as created by
CRIU and a few additional JSON files describing the state of the
checkpointed container.

Now the container is running on the destination system with the same
state it had during checkpointing. If the container is kept running
on the source system with the checkpoint flag '-R', the result will be
that the same container is running on two different hosts.

Signed-off-by: Adrian Reber <areber@redhat.com>
migration: add possibility to restore a container with a new name
The option to restore a container from an external checkpoint archive
(podman container restore -i /tmp/checkpoint.tar.gz) restores a
container with the same name and same ID as it had before checkpointing.

This commit adds the option '--name,-n' to 'podman container restore'.
With this option the restored container gets the name specified after
'--name,-n' and a new ID. This way it is possible to restore one
container multiple times.

If a container is restored with a new name, Podman will not try to
request the same IP address for the container as it had during
checkpointing. This implicitly assumes that a container restored
from a checkpoint archive with a different name will be restored
multiple times, and restoring a container multiple times with the
same IP address will fail as each IP address can only be used once.

Signed-off-by: Adrian Reber <areber@redhat.com>
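Restoring one archive several times under different names, as the commit message above describes, might look like the following sketch. The function name and the `BASE-N` naming scheme are hypothetical; the `-i` and `-n` flags are the ones described in the commit message.

```shell
#!/bin/sh
# Hypothetical helper: restore the same checkpoint archive COUNT times,
# naming the copies BASE-1, BASE-2, ... so each gets a new name and ID.
restore_copies() {
    archive=$1
    base=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        podman container restore -i "$archive" -n "$base-$i" || return 1
        i=$((i + 1))
    done
}
```

For example, `restore_copies /tmp/checkpoint.tar.gz looper 3` would attempt to create `looper-1`, `looper-2`, and `looper-3`.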
Also download container images during restore
Previously, restoring a container from a checkpoint required that the
image the container is based on was already available (podman pull).

This commit makes podman container restore download the image if it
does not exist locally.

Signed-off-by: Adrian Reber <areber@redhat.com>
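The commit implements this inside Podman itself. Purely as an illustration of the same "pull only if missing" logic at the command line, a sketch could look like this. The function name is hypothetical, and it assumes the installed Podman provides the `podman image exists` subcommand.

```shell
#!/bin/sh
# Hypothetical helper: pull an image only if it is not already in
# local storage. This is a CLI-level analogue, not the PR's Go code.
ensure_image() {
    image=$1
    # 'podman image exists' exits with status 0 when the image is present.
    if podman image exists "$image"; then
        return 0
    fi
    podman pull "$image"
}
```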

@adrianreber adrianreber force-pushed the adrianreber:migration branch from c7efe54 to 75dc698 Apr 16, 2019
