docker checkpoint an experimental feature #718

Closed
ashu-mehra opened this issue Jun 8, 2019 · 12 comments

@ashu-mehra
Contributor

Does anyone know why docker checkpoint is an experimental feature and not recommended for production use, as per the official doc - https://docs.docker.com/engine/reference/commandline/checkpoint/ ?
Does it have anything to do with the stability of CRIU, is there a security concern, or is it something else?
And what can be done to make it possible to use CRIU in production environments?

@rppt
Member

rppt commented Jun 8, 2019

AFAIK, the experimental part is the integration of CRIU in docker. CRIU is used in production with other container engines, see e.g.
https://www.linuxplumbersconf.org/event/2/contributions/69/

cc @kolyshkin

@adrianreber
Member

Just yesterday my work on supporting container migration in Podman was merged (containers/podman#2272). This will be part of the upcoming Podman 1.4.0 release. Maybe this is something you could try out.

I just added some documentation to the CRIU wiki concerning CRIU's Podman integration at: https://criu.org/Podman

As I am actively working on Podman's checkpoint/restore support and container migration support I would be interested to know if this is something you could use.

@ashu-mehra
Contributor Author

@adrianreber thanks for providing the links.
I understand live migration is one of the main use cases for CRIU, but we are actually looking at using it for startup improvements, especially for JVM-based applications, i.e. create a checkpoint once the application's startup phase is over, and then restore each new instance of the application from that checkpoint.
Have you guys looked into that use case, or do you know if anyone is actively looking into it? Are there any (potential) problems (like security concerns, or usability/functional issues) that you think may come up when using CRIU like this?

@adrianreber
Member

@ashu-mehra If you look at https://criu.org/Podman there is a link to a recording: https://asciinema.org/a/249922

In that recording I am basically doing what you are describing.

I am starting Wildfly once. That takes about 8 seconds. Then I am checkpointing it and restoring it multiple times from the checkpoint. Starting Wildfly from the checkpoint only takes 4 seconds.

In that example I can reduce container start up time by 50%.
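The workflow described above (start once, checkpoint, then restore for every later start) might be scripted roughly as follows. This is only a sketch and not the commands from the recording: the container name `wildfly` is hypothetical, the helpers are made up for illustration, and it assumes the plain `podman container checkpoint` and `podman container restore` subcommands are available and run with sufficient privileges.

```python
import subprocess
import time

CONTAINER = "wildfly"  # hypothetical container name, not taken from the thread


def checkpoint_cmd(name):
    """Build the argv for checkpointing a running container."""
    return ["podman", "container", "checkpoint", name]


def restore_cmd(name):
    """Build the argv for restoring a previously checkpointed container."""
    return ["podman", "container", "restore", name]


def timed_run(cmd):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


def speedup_percent(cold_start_s, restore_s):
    """Startup time saved by restoring instead of starting cold, in percent."""
    return 100.0 * (cold_start_s - restore_s) / cold_start_s


# With the numbers quoted above: 8 s cold start, 4 s restore.
saving = speedup_percent(8, 4)
```

On a host with Podman and CRIU installed, `timed_run(checkpoint_cmd(CONTAINER))` followed by `timed_run(restore_cmd(CONTAINER))` would give a rough measurement. The timing covers only the command itself, not how long the application needs until it serves requests.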

There has been a FOSDEM talk about including CRIU into the JVM which also tries to reduce start up time: https://fosdem.org/2019/schedule/event/checkpoint_restore/

The biggest concern with the JVM is that the restored JVM somehow has to detect the new environment on the new system and adapt itself to it (e.g. number of CPUs, number of GC threads, hostname).
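To make the "detect the new environment" problem concrete, here is a small sketch of the idea in Python rather than inside a JVM. It assumes the application records a few host-dependent values at startup and compares them again after a restore. The function names are hypothetical, and a real runtime would have to check many more values than these two.

```python
import os
import socket


def snapshot_environment():
    """Collect host-dependent values an application may have cached at startup."""
    return {
        "cpu_count": os.cpu_count(),
        "hostname": socket.gethostname(),
    }


def changed_settings(before, after):
    """Return the names of values that differ between two snapshots."""
    return sorted(key for key in before if before[key] != after.get(key))


# Illustrative values only: a checkpoint taken on one host, restored on another.
at_checkpoint = {"cpu_count": 8, "hostname": "build-host"}
after_restore = {"cpu_count": 2, "hostname": "prod-host"}
stale = changed_settings(at_checkpoint, after_restore)
```

Anything listed in `stale` would need to be recomputed after the restore, for example the size of a thread pool that was derived from the CPU count.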

@chflood

chflood commented Jun 19, 2019

@ashu-mehra

I have a prototype of a Java API to allow you to call CRIU from Java.
You can see it here: https://github.com/chflood/CRIUForJava/

Comments welcome

Christine

@kirs

kirs commented Jun 24, 2019

@adrianreber super excited to see your work on podman pull + podman container restore, which accepts a snapshot that was possibly made on another host.

Is there anything similar planned for Docker? From what I understand, docker start --checkpoint only works on the same host as where the snapshot was collected. Is there a workaround?

@adrianreber
Member

@kirs I am not familiar with Docker's checkpoint/restore implementation, so I cannot say whether there are any plans to continue working on it.

Just for the record: podman pull is not really required before doing podman container restore. That will happen automatically.

@ashu-mehra
Contributor Author

@adrianreber @chflood
Thanks for providing your input on the startup use case for CRIU. We have had some internal discussions on using CRIU to improve the startup performance of applications running in containers, and some points were raised regarding the security of restored applications:

  1. Since every instance of the application restored from the checkpoint would be running with the same address map, would it nullify the benefit of address space layout randomization, thereby making the application vulnerable?
  2. The checkpoint can potentially contain application secrets and keys, depending on when the checkpointing is done, which is again a potential security issue.
  3. How would random number generation algorithms behave in restored applications? Would restoring make them predictable?

It would be good to hear the community's view on these issues, and whether something can or needs to be done.
Thanks again!
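The third question can be illustrated without CRIU at all. The sketch below is a simulation, not a CRIU test: it treats the saved state of a userspace pseudo-random generator as a stand-in for the memory captured in a checkpoint, and the seed value is arbitrary. Generators that ask the kernel for every value should not be captured in process memory this way, so the concern applies mainly to generators that keep their state inside the process.

```python
import random

# Stand-in for process memory captured in a checkpoint: the generator's state.
original = random.Random(1234)  # arbitrary seed, for illustration only
saved_state = original.getstate()


def restore_instance(state):
    """Simulate restoring one application instance from the saved state."""
    rng = random.Random()
    rng.setstate(state)
    return rng


# Two instances "restored" from the same checkpoint.
instance_a = restore_instance(saved_state)
instance_b = restore_instance(saved_state)

tokens_a = [instance_a.getrandbits(64) for _ in range(3)]
tokens_b = [instance_b.getrandbits(64) for _ in range(3)]

# Both instances continue from identical state, so their output matches.
same = tokens_a == tokens_b
```

Here `same` ends up `True`: every instance restored from the same state produces the same "random" values until something reseeds it.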

@adrianreber
Member

@ashu-mehra as you are explicitly tagging me in your question, I will answer, though probably not very helpfully. I guess that all your security concerns are valid. Depending on the control you have over your application, you could make sure that secrets are removed from memory before doing the checkpoint, and you could reseed your random number generators after the restore.
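A minimal sketch of those two mitigations, assuming the application controls when the checkpoint is taken and can run its own code right before it and right after the restore. The hook names `before_checkpoint` and `after_restore` are hypothetical and not part of CRIU or Podman. Overwriting a buffer in Python is best effort only, since other copies of the secret may still exist in memory.

```python
import os
import random


def scrub(secret: bytearray) -> None:
    """Overwrite a mutable secret in place (best effort; copies may remain)."""
    for i in range(len(secret)):
        secret[i] = 0


def reseed(rng: random.Random) -> None:
    """Replace the generator state with fresh entropy from the operating system."""
    rng.seed(os.urandom(32))


def before_checkpoint(secrets_in_memory):
    """Hypothetical hook: wipe secrets so they do not end up in the image."""
    for secret in secrets_in_memory:
        scrub(secret)


def after_restore(rng):
    """Hypothetical hook: make each restored instance diverge from its siblings."""
    reseed(rng)


# Usage: wipe a secret before checkpointing, reseed after restoring.
api_key = bytearray(b"not-a-real-key")
before_checkpoint([api_key])

rng = random.Random(1234)  # state that would be shared by all restored instances
after_restore(rng)
```

After `before_checkpoint`, the buffer holds only zero bytes, so the application has to load the secret again after the restore. Note that `random` is not a cryptographic generator, so the reseeding here only illustrates the ordering of the steps.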

@ashu-mehra
Contributor Author

@adrianreber - thanks for the quick response.

> Depending on the control you have over your application you could make sure that secrets are removed from the memory before doing the checkpoint and you could reseed your random generation algorithms after the restore.

Makes sense, and we had similar thoughts on tackling these, which is why I mentioned "depending on when the checkpointing is done".
Can the address space layout randomization concern be addressed in some way? One workaround could be to recreate the checkpoint periodically. Any other thoughts?

@avagin added the docker label Jul 22, 2019
@LeBovin

LeBovin commented May 10, 2020

Do we have any details about how experimental it is?

@adrianreber
Member

> Do we have any details about how experimental it is?

I am not aware of anyone working on it actively. At least, as far as I know, no one has contacted the CRIU community about it. As I have implemented the checkpoint/restore feature for Podman, I never really looked into the Docker code, so I do not know any details, but neither on the GitHub bug tracker nor on the CRIU mailing list is there any communication concerning the Docker checkpoint integration, only users having problems with it. If you need checkpoint/restore, please try out Podman's checkpoint/restore support.

I am closing this ticket, as we have not been able to answer that question here for almost a year, so we will probably not have an answer any time soon. Podman's checkpoint/restore support is not marked experimental, and if there are any problems I am happy to help.
