Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRIU support #71

Closed
giuseppe opened this issue Aug 12, 2019 · 35 comments
Closed

CRIU support #71

giuseppe opened this issue Aug 12, 2019 · 35 comments

Comments

@giuseppe
Copy link
Member

crun should support checkpointing and restoring running containers.

@giuseppe
Copy link
Member Author

@adrianreber I've not even looked into it so I've no idea at all. How much work would it be to add CRIU support to crun?

@adrianreber
Copy link
Contributor

Reading about crun I also already thought about it. Hard to tell how much work it would actually be, definitely doable

The bigger questions is how to communicate between crun and criu. runc starts criu in a special RPC mode and then transfers protobuf messages to criu. For a C based program there are two approaches, either the same as runc, but that requires protobuf-c to translate protobuf to C code and some code around starting criu and sending the messages. All this is already implemented in libcriu, crun could just directly link against libcriu.

How do you feel about adding a library dependency for criu support to crun? Or would you prefer to include the criu wrapper code directly into the crun code base (like one header and one (or two) C file) without adding an additional library?

@giuseppe
Copy link
Member Author

I think the libcriu approach is to prefer as I see it offers a nicer C integration.

I see no problems with the new dependency as long as we have a way to disable it at ./configure time as we can do with systemd. The dependency and the CRIU support can be enabled by default.

@rhatdan
Copy link
Member

rhatdan commented Aug 19, 2019

Since we plan on defaulting to crun in Fedora 31, I think we need to get going on this. or CRIU will not work with it correct?

@rhatdan
Copy link
Member

rhatdan commented Sep 11, 2019

@adrianreber @giuseppe Any movement on this, or will CRIU be broken in F31?

@adrianreber
Copy link
Contributor

I did not had a chance to work on this. So currently there is no CRIU support in crun from my side.

@rhatdan
Copy link
Member

rhatdan commented Sep 11, 2019

Well then it will be broken in F31.

@crosbymichael
Copy link
Contributor

Thinking about this for a while, it would be nice to move this functionality out of "runc/crun". I think it can be done. Most of the runc specific code is around state management of the container. It would be worth it to factor this out into higher layers and provide the correct hooks in crun to do this while being controlled from above.

@adrianreber
Copy link
Contributor

It would be worth it to factor this out into higher layers and provide the correct hooks in crun to do this while being controlled from above.

@crosbymichael This sounds interesting. Could you describe your idea a bit more in detail? I like the idea but I do not yet understand how this would look like. What would be in crun/runc, what would be in the upper layers? How would those higher layers interact with crun/runc?

@adrianreber
Copy link
Contributor

@giuseppe I started to integrate CRIU support into crun and I am hitting a problem that CRIU cannot checkpoint a process with the following error message:

(00.042961) sid=0 pgid=0 pid=1
(00.042964) Error (criu/cr-dump.c:1327): A session leader of 155391(1) is outside of its pid namespace

Looking closer at this I see a difference between a process running in crun (compared to runc). The following is from a Fedora 31 system running a RHEL7 based container:

[root@fedora01 rhel7-httpd]# runc run rhel7-httpd -d -b /runc/containers/rhel7-httpd/
[root@fedora01 rhel7-httpd]# runc exec rhel7-httpd cat /proc/1/status | grep NSsid
NSsid:	1
[root@fedora01 rhel7-httpd]# runc delete -f rhel7-httpd
[root@fedora01 rhel7-httpd]# crun run -d -b /runc/containers/rhel7-httpd/  rhel7-httpd 
[root@fedora01 rhel7-httpd]# crun exec rhel7-httpd cat /proc/1/status | grep NSsid
NSsid:	0

CRIU's error message show the same, that sid=0. CRIU does a getsid() from within the process to be checkpointed to figure out the SID.

Do you know if there a specific reason crun behaves differently than runc for this?

Checkpointing a runc container CRIU gives me following information about PID 1 in the container:

(00.064662) sid=1 pgid=1 pid=1

So sid and pgid is different when running crun.

CRIU has the following check in its code:

        if (item->sid == 0) {
                pr_err("A session leader of %d(%d) is outside of its pid namespace\n",
                        item->pid->real, vpid(item));
                goto err_cure;
        }

@giuseppe
Copy link
Member Author

@adrianreber could it be related to the setsid() call crun does in src/libcrun/linux.c?

@giuseppe
Copy link
Member Author

I'll take a look to understand what is going on

@giuseppe
Copy link
Member Author

I think the setsid should be inconditional, does this patch work for you?

diff --git a/src/libcrun/container.c b/src/libcrun/container.c
index a71bfdd..76645b4 100644
--- a/src/libcrun/container.c
+++ b/src/libcrun/container.c
@@ -676,14 +676,14 @@ container_init_setup (void *args, const char *notify_socket,
         }
     }
 
+  ret = setsid ();
+  if (UNLIKELY (ret < 0))
+    return crun_make_error (err, errno, "setsid");
+
   if (has_terminal)
     {
       cleanup_close int terminal_fd = -1;
 
-      ret = setsid ();
-      if (UNLIKELY (ret < 0))
-        return crun_make_error (err, errno, "setsid");
-
       fflush (stderr);
 
       terminal_fd = libcrun_set_terminal (container, err);
diff --git a/src/libcrun/linux.c b/src/libcrun/linux.c
index 145a77a..ea493c5 100644
--- a/src/libcrun/linux.c
+++ b/src/libcrun/linux.c
@@ -2838,12 +2838,6 @@ libcrun_join_process (libcrun_container_t *container, pid_t pid_to_join, libcrun
   for (i = 0; all_namespaces[i]; i++)
     close_and_reset (&fds[i]);
 
-  if (detach && setsid () < 0)
-    {
-      crun_make_error (err, errno, "setsid");
-      goto exit;
-    }
-
   /* We need to fork once again to join the PID namespace.  */
   pid = fork ();
   if (UNLIKELY (pid < 0))
@@ -2881,9 +2875,6 @@ libcrun_join_process (libcrun_container_t *container, pid_t pid_to_join, libcrun
         {
           cleanup_close int master_fd = -1;
 
-          if (setsid () < 0)
-            libcrun_fail_with_error (errno, "setsid");
-
           master_fd = open_terminal (container, &slave, err);
           if (UNLIKELY (master_fd < 0))
             {
@@ -2899,6 +2890,9 @@ libcrun_join_process (libcrun_container_t *container, pid_t pid_to_join, libcrun
             }
         }
 
+      if (setsid () < 0)
+        libcrun_fail_with_error (errno, "setsid");
+
       if (r != 0)
         _exit (EXIT_FAILURE);

@adrianreber
Copy link
Contributor

@giuseppe Thanks. With that patch I do not see the session leader error any more.

@adrianreber
Copy link
Contributor

@giuseppe Another thing I am having a problem is, is /dev/null. If stdin, stdout, stderr point to/dev/null runc reopens /dev/null inside of the container (func reOpenDevNull()). Does crun also do that? I haven't seen it so far.

It seems like /dev/null still points to the outside of the container.

@giuseppe
Copy link
Member Author

@adrianreber no, it is not currently done. I will add that

@giuseppe
Copy link
Member Author

@adrianreber is that currently blocking you?

@adrianreber
Copy link
Contributor

@adrianreber is that currently blocking you?

Yes, but knowing that it is missing I can look into it myself. If you have a patch I am happy to use that, but I can try to provide a patch myself.

@giuseppe
Copy link
Member Author

Yes, but knowing that it is missing I can look into it myself. If you have a patch I am happy to use that, but I can try to provide a patch myself.

ok thanks. CRIU support is very important, so if there is anything I can do to help with it, just let me know.

Is the mnt_id the only visible effect of re-opening /dev/null inside of the container?

@rst0git
Copy link
Contributor

rst0git commented Feb 12, 2020

@adrianreber I think the error occurs because in crun /dev/null is mounted on masked paths:

ret = do_mount (container, "/dev/null", pathfd, path, NULL, MS_BIND | MS_UNBINDABLE | MS_REC, NULL, 0, err);

The same is done in runc, but it also adds /dev/null as an external mount map for CRIU: https://github.com/opencontainers/runc/blob/f6fb7a0338c3ea8488bd9bd7cc7667b113aff8d8/libcontainer/container_linux.go#L870L872

@adrianreber
Copy link
Contributor

@rst0git I thought I am already correctly handling the masked paths. Let me have a closer look at this, but I can definitely see that runc uses /dev/null from the inside of the container and crun /dev/null from the outside.

@adrianreber
Copy link
Contributor

If I am pointing stdin, stdout, stderr to the /dev/null in the container if the point to /dev/null on the outside then I successfully checkpoint the container.

Basic checkpointing works for me now with @giuseppe setsid patch from above. I will open a PR for the /dev/null handling.

@giuseppe
Copy link
Member Author

Basic checkpointing works for me now with @giuseppe setsid patch from above.

that is great news! Do you want me to prepare a PR with the setsid patch above?

@adrianreber
Copy link
Contributor

that is great news! Do you want me to prepare a PR with the setsid patch above?

Yes, please.

@giuseppe
Copy link
Member Author

@adrianreber opened here: #267

@adrianreber
Copy link
Contributor

Question how to continue here. My main 'problem' is Podman. Podman currently figures out if the OCI runtime supports checkpointing by running the subcommand 'checkpoint' and if it is successful it assume this OCI runtime has all the features Podman needs. This basically means that I cannot add my checkpoint/restore patches to crun before if is ready for Podman to actually being used.

On the other hand this means that it will be one huge patch once I am ready. I think smaller patches/pull requests would make more sense as they are easier to review. Is there a way to introduce an experimental feature into crun? One idea I had was not mentioning it in crun --help and instead of calling the subcommands checkpoint and restore maybe prefix if with an underscore (_checkpoint, _restore). This way it could be tested by anyone interested but it would not be immediately obvious that it exists and Podman would also not detect it if it is prefixed with an underscore. Once everything is ready I would remove the prefix.

Would this be a workable approach for you? Any other ideas how to handle this best?

@giuseppe
Copy link
Member Author

yes I agree smaller patches are better, so we can already start playing with it.

I am fine with _checkpoint and _restore as well as using them something different like checkpoint-experimental

@adrianreber
Copy link
Contributor

@giuseppe If I want to pass multiple checkpoint/restore options from the subcommand parser down to libcrun, should I create a new structure in src/libcrun/container.h or make the new checkpoint/restore specific members part of the context structure?

@giuseppe
Copy link
Member Author

both are fine, as long as the new features are guarded by macros so it is possible to disable CRIU att build time if needed

@adrianreber
Copy link
Contributor

With the merge of #342 this is now almost done. Once CRIU 3.14 is released I will bump crun's CRIU minimum version to 3.14 and enable the checkpoint/restore subcommands.

@giuseppe
Copy link
Member Author

With the merge of #342 this is now almost done. Once CRIU 3.14 is released I will bump crun's CRIU minimum version to 3.14 and enable the checkpoint/restore subcommands.

I think we can already enable the subcommands and fix the dependency in the .spec file once I cut a new release. Do you think the dependency should be at compile time?

@adrianreber
Copy link
Contributor

I am just fixing one more thing. So unfortunately not ready to be enabled. I tried to run Podman's checkpoint/restore tests and almost all tests are failing because the tests are using -t and the crun with CRIU cannot not yet correctly connect the console-socket back. This needs one more fix.

@adrianreber
Copy link
Contributor

This change in CRIU (checkpoint-restore/criu#1063) to be able to run Podman's checkpoint/restore tests with crun. Once these CRIU changes are merged I can open the corresponding crun PR.

@rhatdan
Copy link
Member

rhatdan commented Sep 30, 2020

@adrianreber Any update on this issue?

@giuseppe
Copy link
Member Author

giuseppe commented Nov 9, 2020

closed by #480 🎉 🎉 🎉 🎉 🎉

@giuseppe giuseppe closed this as completed Nov 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants