CRIU support #71
Comments
@adrianreber I've not even looked into it so I've no idea at all. How much work would it be to add CRIU support to crun? |
Reading about crun, I had already thought about it. It is hard to tell how much work it would actually be, but it is definitely doable. The bigger question is how to communicate between crun and criu. runc starts criu in a special RPC mode and then transfers protobuf messages to criu. For a C-based program there are two approaches. One is the same as runc, but that requires protobuf-c to translate protobuf to C code, plus some code around starting criu and sending the messages. All of this is already implemented in libcriu, so the other approach is for crun to link directly against libcriu. How do you feel about adding a library dependency for criu support to crun? Or would you prefer to include the criu wrapper code directly in the crun code base (like one header and one (or two) C files) without adding an additional library? |
I think the libcriu approach is preferable, as it offers a nicer C integration. I see no problems with the new dependency as long as we have a way to disable it at |
Since we plan on defaulting to crun in Fedora 31, I think we need to get going on this. Otherwise CRIU will not work with it, correct? |
@adrianreber @giuseppe Any movement on this, or will CRIU be broken in F31? |
I did not have a chance to work on this, so currently there is no CRIU support in crun from my side. |
Well then it will be broken in F31. |
Thinking about this for a while, it would be nice to move this functionality out of "runc/crun". I think it can be done. Most of the runc specific code is around state management of the container. It would be worth it to factor this out into higher layers and provide the correct hooks in crun to do this while being controlled from above. |
@crosbymichael This sounds interesting. Could you describe your idea in a bit more detail? I like the idea but I do not yet understand what this would look like. What would be in crun/runc, and what would be in the upper layers? How would those higher layers interact with crun/runc? |
@giuseppe I started to integrate CRIU support into crun and I am hitting a problem that CRIU cannot checkpoint a process with the following error message:
Looking closer at this I see a difference between a process running in crun (compared to runc). The following is from a Fedora 31 system running a RHEL7 based container:
CRIU's error message shows the same, that Do you know if there is a specific reason crun behaves differently than runc for this? When checkpointing a runc container, CRIU gives me the following information about PID 1 in the container:
So CRIU has the following check in its code:
|
@adrianreber could it be related to the |
I'll take a look to understand what is going on |
I think the
diff --git a/src/libcrun/container.c b/src/libcrun/container.c
index a71bfdd..76645b4 100644
--- a/src/libcrun/container.c
+++ b/src/libcrun/container.c
@@ -676,14 +676,14 @@ container_init_setup (void *args, const char *notify_socket,
}
}
+ ret = setsid ();
+ if (UNLIKELY (ret < 0))
+ return crun_make_error (err, errno, "setsid");
+
if (has_terminal)
{
cleanup_close int terminal_fd = -1;
- ret = setsid ();
- if (UNLIKELY (ret < 0))
- return crun_make_error (err, errno, "setsid");
-
fflush (stderr);
terminal_fd = libcrun_set_terminal (container, err);
diff --git a/src/libcrun/linux.c b/src/libcrun/linux.c
index 145a77a..ea493c5 100644
--- a/src/libcrun/linux.c
+++ b/src/libcrun/linux.c
@@ -2838,12 +2838,6 @@ libcrun_join_process (libcrun_container_t *container, pid_t pid_to_join, libcrun
for (i = 0; all_namespaces[i]; i++)
close_and_reset (&fds[i]);
- if (detach && setsid () < 0)
- {
- crun_make_error (err, errno, "setsid");
- goto exit;
- }
-
/* We need to fork once again to join the PID namespace. */
pid = fork ();
if (UNLIKELY (pid < 0))
@@ -2881,9 +2875,6 @@ libcrun_join_process (libcrun_container_t *container, pid_t pid_to_join, libcrun
{
cleanup_close int master_fd = -1;
- if (setsid () < 0)
- libcrun_fail_with_error (errno, "setsid");
-
master_fd = open_terminal (container, &slave, err);
if (UNLIKELY (master_fd < 0))
{
@@ -2899,6 +2890,9 @@ libcrun_join_process (libcrun_container_t *container, pid_t pid_to_join, libcrun
}
}
+ if (setsid () < 0)
+ libcrun_fail_with_error (errno, "setsid");
+
if (r != 0)
_exit (EXIT_FAILURE); |
@giuseppe Thanks. With that patch I do not see the session leader error any more. |
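The patch above moves `setsid ()` so that it runs whether or not a terminal is configured. As a rough illustration of why that matters for a session leader check: a process is a session leader only when its session ID equals its own PID, and a forked child gets that property only by calling `setsid ()`. The sketch below is not crun code, and `child_is_session_leader` is a hypothetical helper name used purely for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of crun or CRIU): fork a child,
   optionally call setsid () in it, and report whether the child is the
   leader of its session.
   Returns 1 if it is a session leader, 0 if it is not, -1 on error.  */
int
child_is_session_leader (int call_setsid)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* A freshly forked child is not a process group leader, so
         setsid () is allowed to create a new session here.  */
      if (call_setsid && setsid () < 0)
        _exit (2);

      /* A session leader's session ID is equal to its own PID.  */
      _exit (getsid (0) == getpid () ? 1 : 0);
    }

  if (waitpid (pid, &status, 0) < 0 || !WIFEXITED (status))
    return -1;

  /* Exit codes above 1 signal a failure inside the child.  */
  if (WEXITSTATUS (status) > 1)
    return -1;

  return WEXITSTATUS (status);
}
```

Calling `child_is_session_leader (0)` models the old behaviour without a terminal (the child stays in the parent's session), while `child_is_session_leader (1)` models the behaviour after the patch.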
@giuseppe Another thing I am having a problem is, is It seems like |
@adrianreber no, it is not currently done. I will add that |
@adrianreber is that currently blocking you? |
Yes, but knowing that it is missing I can look into it myself. If you have a patch I am happy to use that, but I can try to provide a patch myself. |
ok thanks. CRIU support is very important, so if there is anything I can do to help with it, just let me know. Is the |
@adrianreber I think the error occurs because in crun Line 1020 in a1c0eff
The same is done in runc, but it also adds /dev/null as an external mount map for CRIU: https://github.com/opencontainers/runc/blob/f6fb7a0338c3ea8488bd9bd7cc7667b113aff8d8/libcontainer/container_linux.go#L870L872
|
@rst0git I thought I was already handling the masked paths correctly. Let me have a closer look at this, but I can definitely see that runc uses /dev/null from the inside of the container and crun uses /dev/null from the outside. |
If I am pointing stdin, stdout, stderr to the Basic checkpointing works for me now with @giuseppe setsid patch from above. I will open a PR for the |
that is great news! Do you want me to prepare a PR with the setsid patch above? |
Yes, please. |
@adrianreber opened here: #267 |
A question on how to continue here. My main 'problem' is Podman. Podman currently figures out whether the OCI runtime supports checkpointing by running the subcommand 'checkpoint', and if that is successful it assumes this OCI runtime has all the features Podman needs. This basically means that I cannot add my checkpoint/restore patches to crun before it is ready to actually be used by Podman. On the other hand, this means that it will be one huge patch once I am ready. I think smaller patches/pull requests would make more sense as they are easier to review. Is there a way to introduce an experimental feature into crun? One idea I had was not mentioning it in Would this be a workable approach for you? Any other ideas how to handle this best? |
yes I agree smaller patches are better, so we can already start playing with it. I am fine with |
@giuseppe If I want to pass multiple checkpoint/restore options from the subcommand parser down to libcrun, should I create a new structure in src/libcrun/container.h or make the new checkpoint/restore specific members part of the context structure? |
both are fine, as long as the new features are guarded by macros so it is possible to disable CRIU at build time if needed |
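A minimal sketch of what "a separate options structure, guarded by a macro" could look like. All names here are assumptions made for illustration: the macro `HAVE_CRIU`, the structure `checkpoint_options`, its members, and the function `container_checkpoint` are hypothetical and do not claim to match what crun actually uses.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of checkpoint/restore options that a subcommand
   parser could fill in and hand down to the library.  */
struct checkpoint_options
{
  const char *image_path;   /* where the checkpoint images are written */
  const char *work_path;    /* where logs and temporary files go */
  bool leave_running;       /* keep the container running afterwards */
};

#ifdef HAVE_CRIU /* assumed to be defined by the build system */

int
container_checkpoint (const struct checkpoint_options *opts)
{
  if (opts == NULL || opts->image_path == NULL)
    return -EINVAL;

  /* A real implementation would call into libcriu here.  */
  return 0;
}

#else

/* Built without CRIU: keep the symbol so callers still link, but report
   that the feature is not available.  */
int
container_checkpoint (const struct checkpoint_options *opts)
{
  (void) opts;
  return -ENOSYS;
}

#endif
```

With this layout the callers do not need their own `#ifdef` blocks: a build without the macro still compiles and links, and the subcommand simply fails with a "not supported" style error.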
With the merge of #342 this is now almost done. Once CRIU 3.14 is released I will bump crun's CRIU minimum version to 3.14 and enable the checkpoint/restore subcommands. |
I think we can already enable the subcommands and fix the dependency in the .spec file once I cut a new release. Do you think the dependency should be at compile time? |
I am just fixing one more thing. So unfortunately not ready to be enabled. I tried to run Podman's checkpoint/restore tests and almost all tests are failing because the tests are using |
This change in CRIU (checkpoint-restore/criu#1063) is needed to be able to run Podman's checkpoint/restore tests with crun. Once these CRIU changes are merged I can open the corresponding crun PR. |
@adrianreber Any update on this issue? |
closed by #480 🎉 🎉 🎉 🎉 🎉 |
crun should support checkpointing and restoring running containers.