Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recreate mountpoints during restore #342

Merged
merged 1 commit into from Apr 28, 2020

Conversation

adrianreber
Copy link
Contributor

If restoring a container from an unmodified root file-system certain mountpoints have to be re-created during restore. This happens when using Podman to restore from an exported checkpoint.

In the case of 'podman container checkpoint -l', the checkpointed container is not destroyed and the root file-system is kept.

If restoring from an exported checkpoint using 'podman container restore --import=export.tar.gz' the root file-system is re-created and certain mountpoints will be missing. A good example is
'/run/.containerenv'. Usually no container has the mountpoint for '/run/.containerenv' so crun creates the mountpoint and bind mounts something on top of it as described in config.json when the container is created.

During restore this mountpoint has to be created just as during initial container creation and this is what this commit now does. If the mountpoint is not created CRIU fails during restore when trying to
remount, in this case, the corresponding bind mount.

With this it is now possible to migrate containers using Podman.

Before enabling checkpoint in crun I would like to add support to let CRIU use the freezer instead of the current ptrace() based process pausing and I would like to wait for CRIU 3.14 and make crun depend on CRIU 3.14 to make sure Fedora 31 can use the Podman/crun/CRIU combination.

Related to #71

CC: @kolyshkin (just as you have been active around CRIU recently)

@giuseppe
Copy link
Member

thanks.

LGTM

@giuseppe
Copy link
Member

I am fine to merge it, but I'll wait for a review from @kolyshkin since you've CC'ed him

src/libcrun/criu.c Outdated Show resolved Hide resolved
int is_dir = 1;
size_t j;

/* cgroup restore should be handled by CRIU itself */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is oversimplification :)

Also I believe in case of cgroup v2 the /sys/fs/cgroup needs to be created (see e.g. opencontainers/runc@f7d0e37#diff-7b8effb45402944e445a664e4d9c296dR1188)

have you tested this on F31 with cgroup v2 unified (i.e. without systemd.unified_cgroup_hierarchy=0)? From a quick glance it should not work unless /sys/fs/cgroup mp is created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have only tested on F31 with cgroup2. Mounting /sys always provides /sys/fs/cgroup, so the comment is wrong, but it should work for cgroup1 and cgroup2. Actually, let me test i quickly... Yes, rebooted F31 with cgroup1 and it also works.

Works for cgroup1 as /sys/fs/cgroup is created by mounting sysfs and all other cgroup mountpoints are in a tmpfs restored by CRIU. For cgroup2 it is only one mountpoint: /sys/fs/cgroup

No idea if we need to think about the case of no sysfs but a mounted cgroup. That would not work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was dead wrong about the need to create a mountpoint for /sys/fs/cgroup.

src/libcrun/criu.c Outdated Show resolved Hide resolved
/* Check if the parent mount is on a tmpfs. CRIU restores
* all tmpfs. We do need to recreate directories on a tmpfs. */
if (strncmp (dest, dest_loop, strlen (dest_loop)) == 0 &&
strcmp (def->mounts[j]->type, "tmpfs") == 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tmpfs check need to be moved to right after for (j = 0, ..., i.e.

for (j = 0; j < def->mounts_len; j++)
 {
  cleanup_free char *dest_loop = NULL;

  if strcmp (def->mounts[j]->type, "tmpfs") != 0)
   continue;
...

(and looks like = NULL will be needed in this case, so please ignore my earlier comment about it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what you mean with 'the tmpfs check need to be moved right after'.

Ah, maybe that if it is on a tmpfs I do not need to compare the paths at all...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I mean as I wrote it in the example above. The current code could have avoided allocating memory for dest_loop, the copy, and the comparison.

src/libcrun/criu.c Outdated Show resolved Hide resolved
src/libcrun/criu.c Outdated Show resolved Hide resolved
{
int ret;

ret = crun_safe_ensure_file_at (root_fd, root, strlen (root), dest,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly I don't remember runc creating files, only directories.

The only way such files might appear here I can think of are masked path under /proc or /sys, and obviously those do not need to be created.

Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, runc does not create files, that is correct, but as crun at other places in the code did differentiate between files and directories I also used it here. One example is /run/.containerenv. I guess it is not really necessary, but I was just following existing crun code.

Copy link
Collaborator

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits and questions

If restoring a container from an unmodified root file-system certain
mountpoints have to be re-created during restore. This happens when
using Podman to restore from an exported checkpoint.

In the case of 'podman container checkpoint -l', the checkpointed
container is not destroyed and the root file-system is kept.

If restoring from an exported checkpoint using
'podman container restore --import=export.tar.gz' the root file-system
is re-created and certain mountpoints will be missing. A good example is
'/run/.containerenv'. Usually no container has the mountpoint for
'/run/.containerenv' so crun creates the mountpoint and bind mounts
something on top of it as described in config.json when the container is
created.

During restore this mountpoint has to be created just as during initial
container creation and this is what this commit now does. If the
mountpoint is not created CRIU fails during restore when trying to
remount, in this case, the corresponding bind mount.

With this it is now possible to migrate containers using Podman.

Signed-off-by: Adrian Reber <areber@redhat.com>
@adrianreber
Copy link
Contributor Author

@kolyshkin thanks for the review. I addressed a couple of your comments. Not sure about the one where you suggested moving the tmpfs check. Please have another look.

@giuseppe
Copy link
Member

@kolyshkin does the new version LGTY?

@giuseppe
Copy link
Member

I'll merge this PR. If there are further comments we can address them as follow-ups.

@giuseppe giuseppe merged commit f8ee151 into containers:master Apr 28, 2020
@adrianreber adrianreber mentioned this pull request Apr 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants