Add support to migrate containers #2272

Merged
merged 9 commits into from
Jun 7, 2019

adrianreber
Collaborator

This series adds container migration support to Podman.

The basic steps to migrate containers are:

 Source system:
  * podman container checkpoint -l -e /tmp/checkpoint.tar.gz
  * scp /tmp/checkpoint.tar.gz destination:/tmp

 Destination system:
  * podman pull 'container-image-as-on-source-system'
  * podman container restore -i /tmp/checkpoint.tar.gz
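The steps above boil down to four commands spread over two hosts. The following is a minimal Go sketch of that sequence, assuming a hypothetical `planMigration` helper and an example image name. It only builds the argument vectors and does not execute anything. Each slice could be handed to `exec.Command` from `os/exec` on the matching host.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// migrationPlan holds the argv of every command, grouped by the host it runs on.
type migrationPlan struct {
	Source      [][]string // run on the source host
	Destination [][]string // run on the destination host
}

// planMigration (hypothetical helper) returns the command sequence from the
// PR description. archive is the checkpoint file, dest the scp target host
// and image the image the container was originally created from.
func planMigration(archive, dest, image string) migrationPlan {
	return migrationPlan{
		Source: [][]string{
			// Checkpoint the latest container and export it to archive.
			{"podman", "container", "checkpoint", "-l", "-e", archive},
			// Copy the archive into the same directory on the destination.
			{"scp", archive, dest + ":" + filepath.Dir(archive)},
		},
		Destination: [][]string{
			// Make the container image available on the destination.
			{"podman", "pull", image},
			// Restore the container from the copied archive.
			{"podman", "container", "restore", "-i", archive},
		},
	}
}

func main() {
	// "docker.io/library/alpine" is only an example image name.
	p := planMigration("/tmp/checkpoint.tar.gz", "destination", "docker.io/library/alpine")
	for _, cmd := range p.Source {
		fmt.Println("source:     ", strings.Join(cmd, " "))
	}
	for _, cmd := range p.Destination {
		fmt.Println("destination:", strings.Join(cmd, " "))
	}
}
```

Because the archive is copied into the directory it came from, the restore step can reuse the same path on the destination.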

For the newly added test to work, an updated runc is required, which is still under review: opencontainers/runc#1968

Tries to solve: #1618

This PR includes the actual code, man-pages, bash-completion, tests, tutorial update.

Once this is merged I would also like to publish a related article on podman.io.

cmd/podman/restore.go (outdated, resolved)
@rst0git
Contributor

rst0git commented Feb 5, 2019

The restore of a looper example fails for me. Still investigating why...
restore.log

libpod/container_api.go (outdated, resolved)
libpod/container_internal.go (outdated, resolved)
libpod/container_internal_linux.go (resolved)
libpod/container_internal_linux.go (outdated, resolved)
libpod/container_internal_linux.go (outdated, resolved)
libpod/runtime_ctr.go (outdated, resolved)
cmd/podman/restore.go (outdated, resolved)
@openshift-merge-robot
Collaborator

/retest

@adrianreber
Collaborator Author

> The restore of a looper example fails for me. Still investigating why...
> restore.log

Are you using runc with the changes from opencontainers/runc#1968? Could be there is some runc code missing to create /proc just as it did not create bind mount mountpoints before opencontainers/runc#1968. Please let me know how you are testing it so I can try to recreate.

@rst0git
Contributor

rst0git commented Feb 6, 2019

> Are you using runc with the changes from opencontainers/runc#1968?

Yes, I installed runc from opencontainers/runc#1968, criu from the criu-dev branch and podman from this PR.

> Could be there is some runc code missing to create /proc just as it did not create bind mount mountpoints before opencontainers/runc#1968. Please let me know how you are testing it so I can try to recreate.

I followed these steps to create, checkpoint, and restore a container:

sudo podman run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
sudo podman container checkpoint looper -e /tmp/checkpoint.tar.gz
sudo podman container rm looper
sudo podman container restore -i /tmp/checkpoint.tar.gz

Even though the restore has failed in the last step, a looper container is still created (with status Created). Therefore, I can still run sudo podman container start looper to start the container.

@adrianreber
Collaborator Author

@rst0git This is the same bug as in my runc pull request, except that it also affects non-bind mount mountpoints. It seems runc just creates all missing mountpoints, even for read-only root filesystems. If you use a container image that includes all required mountpoints (in this case /run, /proc and /sys are missing), it should work. I will update my runc PR to also handle missing non-bind mount mountpoints.
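Whether an image already ships the mountpoints that runc would otherwise create can be checked against its unpacked root filesystem. The following is a minimal Go sketch, assuming the rootfs is available as a local directory. `missingMountpoints` is a hypothetical helper and is not part of Podman or runc.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// missingMountpoints returns the mount destinations that do not exist
// inside rootfs. Any error other than "does not exist" is returned.
func missingMountpoints(rootfs string, mounts []string) ([]string, error) {
	var missing []string
	for _, m := range mounts {
		_, err := os.Stat(filepath.Join(rootfs, m))
		switch {
		case err == nil:
			// Mountpoint is present in the image.
		case errors.Is(err, fs.ErrNotExist):
			missing = append(missing, m)
		default:
			return nil, err
		}
	}
	return missing, nil
}

func main() {
	rootfs, err := os.MkdirTemp("", "rootfs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(rootfs)

	// Simulate an image that ships /proc but neither /run nor /sys.
	if err := os.Mkdir(filepath.Join(rootfs, "proc"), 0o755); err != nil {
		panic(err)
	}
	missing, err := missingMountpoints(rootfs, []string{"/proc", "/run", "/sys"})
	if err != nil {
		panic(err)
	}
	fmt.Println(missing) // prints [/run /sys]
}
```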

@adrianreber
Collaborator Author

@mheon Thanks for the review. At first I was unsure whether I could make all the necessary changes, but now that I have thought about it, I think I can actually do everything you suggested, and the result will be better. I hope. So thanks for pointing it out.

@openshift-merge-robot
Collaborator

/retest

@avagin

avagin commented Feb 6, 2019

> * scp /tmp/checkpoint.tar.gz destination:/tmp

What is inside /tmp/checkpoint.tar.gz? Do you snapshot a container rootfs?

@adrianreber
Collaborator Author

> > * scp /tmp/checkpoint.tar.gz destination:/tmp
>
> What is inside /tmp/checkpoint.tar.gz? Do you snapshot a container rootfs?

The checkpoint directory and the container definition (spec, config, network), but no filesystem (yet). I was thinking about automatically committing the highest layer and including it as well. Right now, though, it is only the output of CRIU and some JSON files.
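Since the export is a gzip-compressed tarball, its contents can be inspected with the Go standard library alone. The following is a minimal sketch. The entry names used in the demo archive are placeholders chosen only for illustration and are not Podman's actual file layout. `listArchive` and `buildDemoArchive` are hypothetical helpers.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// listArchive returns the entry names of a gzip-compressed tar stream.
func listArchive(r io.Reader) ([]string, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break // end of archive
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
	return names, nil
}

// buildDemoArchive writes a small tar.gz in memory with one tiny file per name.
func buildDemoArchive(names []string) (*bytes.Reader, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for _, name := range names {
		body := []byte("{}")
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	// Close the tar writer first so its footer is compressed, then gzip.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return bytes.NewReader(buf.Bytes()), nil
}

func main() {
	// Placeholder names: CRIU output under checkpoint/ plus JSON metadata.
	r, err := buildDemoArchive([]string{"checkpoint/inventory.img", "config.json", "spec.json", "network.json"})
	if err != nil {
		panic(err)
	}
	names, err := listArchive(r)
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

Pointing `listArchive` at an `*os.File` opened on /tmp/checkpoint.tar.gz would list a real export in the same way.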

@adrianreber adrianreber force-pushed the migration branch 2 times, most recently from 74ab7b8 to 0ba0da7 on February 6, 2019 19:26
@adrianreber
Collaborator Author

@mheon I tried to rework my changes to the newContainer() function. There is now a RestoreContainer() function. Please have a look if this is now a better approach.

@adrianreber
Collaborator Author

All CI errors are the expected errors as long as the necessary runc patches have not been merged.

libpod/runtime_ctr.go (outdated, resolved)
@mheon
Member

mheon commented Feb 6, 2019

I'll do a more thorough review tomorrow, but I'm generally in favor of the way NewContainer was split up.

Still a little iffy on copying over the OCI config... I need to check, but there shouldn't be much in there that isn't deterministically generated, so it might not be necessary.

We do need to make sure that the ContainerConfig we copied over makes sense before we use it - if the container is in a pod, we need a pod with the same ID present on the remote host, and the same holds for named volumes. We might want to make a generic sanity checker for ContainerConfig that we can use for both NewContainer and this.
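A generic sanity checker of the kind suggested here could collect every missing dependency before the copied config is used. The following is a minimal Go sketch. `ContainerConfig`, `Host` and `validateConfig` are hypothetical, trimmed-down stand-ins and do not reproduce libpod's real types. Only the pod and named-volume checks mentioned in the comment are modeled.

```go
package main

import (
	"fmt"
	"strings"
)

// ContainerConfig is a hypothetical, trimmed-down stand-in for the container
// configuration copied over in a checkpoint archive.
type ContainerConfig struct {
	Name         string
	Pod          string   // ID of the pod the container belongs to; empty if none
	NamedVolumes []string // named volumes the container uses
}

// Host is a hypothetical view of the objects that exist on the restoring host.
type Host interface {
	HasPod(id string) bool
	HasVolume(name string) bool
}

// validateConfig collects every dependency of cfg that is missing on h, so
// one check could be shared by container creation and restore.
func validateConfig(cfg ContainerConfig, h Host) error {
	var problems []string
	if cfg.Pod != "" && !h.HasPod(cfg.Pod) {
		problems = append(problems, fmt.Sprintf("pod %s not found", cfg.Pod))
	}
	for _, v := range cfg.NamedVolumes {
		if !h.HasVolume(v) {
			problems = append(problems, fmt.Sprintf("named volume %s not found", v))
		}
	}
	if len(problems) > 0 {
		return fmt.Errorf("container %s: %s", cfg.Name, strings.Join(problems, "; "))
	}
	return nil
}

// mapHost is an in-memory Host used for the demonstration.
type mapHost struct {
	pods, volumes map[string]bool
}

func (m mapHost) HasPod(id string) bool { return m.pods[id] }

func (m mapHost) HasVolume(name string) bool { return m.volumes[name] }

func main() {
	h := mapHost{pods: map[string]bool{"pod1": true}, volumes: map[string]bool{"data": true}}
	cfg := ContainerConfig{Name: "looper", Pod: "pod2", NamedVolumes: []string{"data", "logs"}}
	// prints: container looper: pod pod2 not found; named volume logs not found
	fmt.Println(validateConfig(cfg, h))
}
```

Reporting all problems at once, rather than stopping at the first, makes it easier to prepare the destination host in a single pass.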

@rh-atomic-bot
Collaborator

☔ The latest upstream changes (presumably #2252) made this pull request unmergeable. Please resolve the merge conflicts.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2019
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
When restoring a container from a checkpoint, the image the container
is based on had to be available already (podman pull).

This commit adds the image download to podman container restore if it
does not exist.

Signed-off-by: Adrian Reber <areber@redhat.com>
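The behavior described in this commit is a "pull only if missing" check. The following is a minimal Go sketch, assuming a hypothetical `ImageStore` interface in place of Podman's real image runtime. A fake store stands in for local storage and the registry.

```go
package main

import "fmt"

// ImageStore is a hypothetical interface over local image storage and a registry.
type ImageStore interface {
	Exists(name string) bool
	Pull(name string) error
}

// ensureImage pulls name only when it is not already present locally and
// reports whether a pull happened.
func ensureImage(s ImageStore, name string) (pulled bool, err error) {
	if s.Exists(name) {
		return false, nil
	}
	if err := s.Pull(name); err != nil {
		return false, fmt.Errorf("pulling %s: %w", name, err)
	}
	return true, nil
}

// fakeStore records pulls and treats every pulled image as present afterwards.
type fakeStore struct {
	local map[string]bool
	pulls int
}

func (f *fakeStore) Exists(name string) bool { return f.local[name] }

func (f *fakeStore) Pull(name string) error {
	f.pulls++
	f.local[name] = true
	return nil
}

func main() {
	s := &fakeStore{local: map[string]bool{}}
	// Restoring twice: only the first restore needs to download the image.
	for i := 0; i < 2; i++ {
		pulled, err := ensureImage(s, "docker.io/library/alpine")
		if err != nil {
			panic(err)
		}
		fmt.Println("pulled:", pulled) // true, then false
	}
	fmt.Println("registry pulls:", s.pulls) // 1
}
```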
@adrianreber
Collaborator Author

I think I was able to implement all the changes from the latest review, let's see if the tests still pass.

@adrianreber adrianreber force-pushed the migration branch 4 times, most recently from 49b624d to 0fc0218 on June 4, 2019 09:38
The option to restore a container from an external checkpoint archive
(podman container restore -i /tmp/checkpoint.tar.gz) restores a
container with the same name and same ID as it had before checkpointing.

This commit adds the option '--name,-n' to 'podman container restore'.
With this option the restored container gets the name specified after
'--name,-n' and a new ID. This way it is possible to restore one
container multiple times.

If a container is restored with a new name, Podman will not try to
request the IP address the container had during checkpointing. This
implicitly assumes that a container restored from a checkpoint archive
under a different name will be restored multiple times, and restoring
multiple copies with the same IP address would fail, because each IP
address can only be used once.

Signed-off-by: Adrian Reber <areber@redhat.com>
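The decision described in this commit can be written as a small planning function. The following is a minimal Go sketch. The types, the `planRestore` and `newID` helpers and the example IP address are hypothetical. The 64-character hex ID only mimics the usual container ID format and is not Podman's ID generator.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// checkpointedContainer is a hypothetical subset of what the archive records.
type checkpointedContainer struct {
	ID, Name, IP string
}

// restorePlan describes how the restored container is set up.
type restorePlan struct {
	ID, Name  string
	RequestIP string // empty means "let the network assign a new address"
}

// newID returns 64 random hex characters (32 random bytes).
func newID() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// planRestore keeps the original identity and IP for a plain restore. With
// a new name it assigns a fresh ID and drops the IP request, because the
// same archive may be restored several times.
func planRestore(c checkpointedContainer, newName string) (restorePlan, error) {
	if newName == "" {
		return restorePlan{ID: c.ID, Name: c.Name, RequestIP: c.IP}, nil
	}
	id, err := newID()
	if err != nil {
		return restorePlan{}, err
	}
	return restorePlan{ID: id, Name: newName}, nil
}

func main() {
	c := checkpointedContainer{ID: "abc123", Name: "looper", IP: "10.88.0.5"}
	same, err := planRestore(c, "")
	if err != nil {
		panic(err)
	}
	renamed, err := planRestore(c, "looper-copy")
	if err != nil {
		panic(err)
	}
	fmt.Printf("plain:   name=%s id=%s ip=%q\n", same.Name, same.ID, same.RequestIP)
	fmt.Printf("renamed: name=%s id=%.12s... ip=%q\n", renamed.Name, renamed.ID, renamed.RequestIP)
}
```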
@adrianreber
Collaborator Author

All tests green again (after a few retries).

@adrianreber
Collaborator Author

Any further comments regarding this PR?

@mheon
Member

mheon commented Jun 6, 2019

Sorry, we've been in a bit of a rush trying to get the recent CVE patched.

I'm good to merge with one more LGTM.

@rhatdan
Member

rhatdan commented Jun 7, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 7, 2019
@openshift-merge-robot openshift-merge-robot merged commit 3461287 into containers:master Jun 7, 2019
@rh-atomic-bot rh-atomic-bot mentioned this pull request Jun 7, 2019
@adrianreber
Collaborator Author

Thanks everyone for the reviews and the patience getting this merged.

// a container from one host to another
It("podman checkpoint container with export (migration)", func() {
// CRIU does not work with seccomp correctly on RHEL7
session := podmanTest.Podman([]string{"run", "-it", "--security-opt", "seccomp=unconfined", "-d", ALPINE, "top"})
Contributor

Hi @adrianreber, I was wondering what the reason for seccomp=unconfined is. Is there a GitHub issue for it?

Collaborator Author

@rst0git : This is a CRIU limitation on RHEL7: CRIU cannot handle seccomp with the RHEL7 kernel. I never really tried to understand why, but it does not work on the RHEL7 kernel. I was initially using RHEL7 as a development platform, which is why I worked around the seccomp limitation there. I am not sure it is necessary to fix it.
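One way to confine the workaround to RHEL7 would be to add the seccomp option to the test's `podman run` arguments only on that kernel. The following is a minimal Go sketch. Treating a `.el7` substring in the kernel release (as printed by `uname -r` or found in /proc/sys/kernel/osrelease) as "RHEL7" is an assumption, and the helpers are hypothetical. This is not how the Podman test suite actually decides.

```go
package main

import (
	"fmt"
	"strings"
)

// needsUnconfinedSeccomp is a rough heuristic: RHEL7 kernel release strings
// normally contain ".el7", e.g. "3.10.0-957.el7.x86_64".
func needsUnconfinedSeccomp(kernelRelease string) bool {
	return strings.Contains(kernelRelease, ".el7")
}

// runArgs builds the `podman run` arguments used by the migration test,
// adding "--security-opt seccomp=unconfined" only where it is needed.
func runArgs(kernelRelease, image string) []string {
	args := []string{"run", "-it"}
	if needsUnconfinedSeccomp(kernelRelease) {
		args = append(args, "--security-opt", "seccomp=unconfined")
	}
	return append(args, "-d", image, "top")
}

func main() {
	for _, rel := range []string{"3.10.0-957.el7.x86_64", "5.1.6-300.fc30.x86_64"} {
		fmt.Printf("%s: podman %s\n", rel, strings.Join(runArgs(rel, "alpine"), " "))
	}
}
```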

Contributor

Thank you for the explanation, Adrian! I was curious because the seccomp support seems to work on Fedora.

edsantiago added a commit to edsantiago/libpod that referenced this pull request Jun 10, 2019
Various small fixes to get BATS tests working again.
Split from containers#2947 because that one keeps getting stalled,
and I'm hoping these separate changes get approved.

I consider these changes urgent because RHEL8 gating
tests are failing, and will fail even more if/when containers#2272
gets picked up and packaged for RHEL8, and I consider
it important to have clean passing tests for RHEL8.

  * info test: 'insecure registries' is gone. A recent
    commit (d1a7378) changed the format of 'podman info',
    removing the 'insecure registries' key. Deal with it.

  * info test: remove check for .host.{Conmon,OCIRuntime}.package;
    the value on f28 and f29 is 'Unknown' (instead of an NVR).
    We can live without this check.

  * 'load' test: skip when running in CI, because stdin
    is not a tty.

  * container restore: fix arg processing. containers#2272 broke argument
    processing: 'podman container restore', with no args, should
    exit with 'argument required' error. Root cause is that the
    new --import option takes the place of an argument, so the
    checkAllAndLatest() call had to be changed to not exit on error.
    Workaround is (sigh) to copy/paste the skipped checkAllAndLatest()
    code, with minor tweaks to accommodate --import.

    *** FIXME FIXME FIXME! If I understand --import correctly,
    *** there should also be a check to prevent positional
    *** arguments with --import. Can someone please confirm/deny?

Signed-off-by: Ed Santiago <santiago@redhat.com>
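The argument rules discussed in this commit message can be expressed as one validation function. The following is a minimal Go sketch. `validateRestoreArgs` is a hypothetical re-implementation and not Podman's `checkAllAndLatest()`, and all error messages except "argument required" are illustrative. Rejecting positional arguments together with `--import` (`-i`) is an assumption answering the FIXME, not a confirmed rule.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRestoreArgs checks the positional arguments of a restore command.
// importPath is the value of --import; all and latest mirror --all/--latest.
func validateRestoreArgs(args []string, all, latest bool, importPath string) error {
	if importPath != "" {
		// Assumption: the archive already identifies the container, so
		// positional container names or IDs are rejected.
		if len(args) > 0 {
			return errors.New("--import does not accept container names or IDs")
		}
		return nil
	}
	if all && latest {
		return errors.New("--all and --latest cannot be used together")
	}
	if (all || latest) && len(args) > 0 {
		return errors.New("no arguments are needed with --all or --latest")
	}
	if !all && !latest && len(args) == 0 {
		// `podman container restore` with no arguments must fail.
		return errors.New("argument required")
	}
	return nil
}

func main() {
	cases := []struct {
		args   []string
		latest bool
		imp    string
	}{
		{nil, false, ""},
		{nil, false, "/tmp/checkpoint.tar.gz"},
		{[]string{"looper"}, false, "/tmp/checkpoint.tar.gz"},
		{nil, true, ""},
	}
	for _, c := range cases {
		err := validateRestoreArgs(c.args, false, c.latest, c.imp)
		fmt.Printf("args=%v latest=%v import=%q -> %v\n", c.args, c.latest, c.imp, err)
	}
}
```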
edsantiago added a commit to edsantiago/libpod that referenced this pull request Jun 11, 2019
Various small fixes to get BATS tests working again.
Split from containers#2947 because that one keeps getting stalled,
and I'm hoping these separate changes get approved.

I consider these changes urgent because RHEL8 gating
tests are failing, and will fail even more if/when containers#2272
gets picked up and packaged for RHEL8, and I consider
it important to have clean passing tests for RHEL8.

  * info test: 'insecure registries' is gone. A recent
    commit (d1a7378) changed the format of 'podman info',
    removing the 'insecure registries' key. Deal with it.

  * info test: remove check for .host.{Conmon,OCIRuntime}.package;
    the value on f28 and f29 is 'Unknown' (instead of an NVR).
    We can live without this check.

  * 'load' test: skip when running in CI, because stdin
    is not a tty.

  * container restore: fix arg processing. containers#2272 broke argument
    processing: 'podman container restore', with no args, should
    exit with 'argument required' error. Root cause is that the
    new --import option takes the place of an argument, so the
    checkAllAndLatest() call had to be changed to not exit on error.
    Workaround is (sigh) to copy/paste the skipped checkAllAndLatest()
    code, with minor tweaks to accommodate --import.

Signed-off-by: Ed Santiago <santiago@redhat.com>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 26, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

10 participants