New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Recreate mountpoints during restore #342
Conversation
thanks. LGTM |
I am fine to merge it, but I'll wait for a review from @kolyshkin since you've CC'ed him |
int is_dir = 1; | ||
size_t j; | ||
|
||
/* cgroup restore should be handled by CRIU itself */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is oversimplification :)
Also I believe in case of cgroup v2 the /sys/fs/cgroup needs to be created (see e.g. opencontainers/runc@f7d0e37#diff-7b8effb45402944e445a664e4d9c296dR1188)
have you tested this on F31 with cgroup v2 unified (i.e. without systemd.unified_cgroup_hierarchy=0
)? From a quick glance it should not work unless /sys/fs/cgroup mp is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually have only tested on F31 with cgroup2. Mounting /sys
always provides /sys/fs/cgroup
, so the comment is wrong, but it should work for cgroup1 and cgroup2. Actually, let me test i quickly... Yes, rebooted F31 with cgroup1 and it also works.
Works for cgroup1 as /sys/fs/cgroup
is created by mounting sysfs and all other cgroup mountpoints are in a tmpfs restored by CRIU. For cgroup2 it is only one mountpoint: /sys/fs/cgroup
No idea if we need to think about the case of no sysfs but a mounted cgroup. That would not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was dead wrong about the need to create a mountpoint for /sys/fs/cgroup
.
/* Check if the parent mount is on a tmpfs. CRIU restores | ||
* all tmpfs. We do need to recreate directories on a tmpfs. */ | ||
if (strncmp (dest, dest_loop, strlen (dest_loop)) == 0 && | ||
strcmp (def->mounts[j]->type, "tmpfs") == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tmpfs check need to be moved to right after for (j = 0, ...
, i.e.
for (j = 0; j < def->mounts_len; j++)
 {
  cleanup_free char *dest_loop = NULL;
  if strcmp (def->mounts[j]->type, "tmpfs") != 0)
   continue;
...
(and looks like = NULL
will be needed in this case, so please ignore my earlier comment about it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand what you mean with 'the tmpfs check need to be moved right after'.
Ah, maybe that if it is on a tmpfs I do not need to compare the paths at all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I mean as I wrote it in the example above. The current code could have avoided allocating memory for dest_loop, the copy, and the comparison.
{ | ||
int ret; | ||
|
||
ret = crun_safe_ensure_file_at (root_fd, root, strlen (root), dest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly I don't remember runc creating files, only directories.
The only way such files might appear here I can think of are masked path under /proc
or /sys
, and obviously those do not need to be created.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, runc does not create files, that is correct, but as crun at other places in the code did differentiate between files and directories I also used it here. One example is /run/.containerenv
. I guess it is not really necessary, but I was just following existing crun code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some nits and questions
If restoring a container from an unmodified root file-system certain mountpoints have to be re-created during restore. This happens when using Podman to restore from an exported checkpoint. In the case of 'podman container checkpoint -l', the checkpointed container is not destroyed and the root file-system is kept. If restoring from an exported checkpoint using 'podman container restore --import=export.tar.gz' the root file-system is re-created and certain mountpoints will be missing. A good example is '/run/.containerenv'. Usually no container has the mountpoint for '/run/.containerenv' so crun creates the mountpoint and bind mounts something on top of it as described in config.json when the container is created. During restore this mountpoint has to be created just as during initial container creation and this is what this commit now does. If the mountpoint is not created CRIU fails during restore when trying to remount, in this case, the corresponding bind mount. With this it is now possible to migrate containers using Podman. Signed-off-by: Adrian Reber <areber@redhat.com>
b1db180
to
a84598a
Compare
@kolyshkin thanks for the review. I addressed a couple of your comments. Not sure about the one where you suggested moving the tmpfs check. Please have another look. |
@kolyshkin does the new version LGTY? |
I'll merge this PR. If there are further comments we can address them as follow-ups. |
If restoring a container from an unmodified root file-system certain mountpoints have to be re-created during restore. This happens when using Podman to restore from an exported checkpoint.
In the case of 'podman container checkpoint -l', the checkpointed container is not destroyed and the root file-system is kept.
If restoring from an exported checkpoint using 'podman container restore --import=export.tar.gz' the root file-system is re-created and certain mountpoints will be missing. A good example is
'/run/.containerenv'. Usually no container has the mountpoint for '/run/.containerenv' so crun creates the mountpoint and bind mounts something on top of it as described in config.json when the container is created.
During restore this mountpoint has to be created just as during initial container creation and this is what this commit now does. If the mountpoint is not created CRIU fails during restore when trying to
remount, in this case, the corresponding bind mount.
With this it is now possible to migrate containers using Podman.
Before enabling checkpoint in crun I would like to add support to let CRIU use the freezer instead of the current ptrace() based process pausing and I would like to wait for CRIU 3.14 and make crun depend on CRIU 3.14 to make sure Fedora 31 can use the Podman/crun/CRIU combination.
Related to #71
CC: @kolyshkin (just as you have been active around CRIU recently)