docker checkpoint an experimental feature #718
Comments
AFAIK, the experimental part is the integration of CRIU into Docker. CRIU itself is used in production with other container engines, see e.g. cc @kolyshkin
Just yesterday my work on supporting container migration in Podman was merged (containers/podman#2272). It will be part of the upcoming Podman 1.4.0 release. Maybe this is something you could try out. I just added some documentation about CRIU's Podman integration to the CRIU wiki: https://criu.org/Podman. As I am actively working on Podman's checkpoint/restore and container migration support, I would be interested to know whether this is something you could use.
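For anyone who wants to try the migration workflow mentioned above, here is a minimal Python sketch that only builds and prints the command lines (it does not run them). It assumes a Podman version that supports `podman container checkpoint --export` and `podman container restore --import` (check `--help` on your version), and that checkpoint/restore may need root, so in practice you may have to prefix the commands with sudo. Copying the archive with `scp` and restoring via `ssh`, as well as the helper names, are illustrative assumptions, not part of Podman.

```python
import shlex


def checkpoint_cmd(container: str, archive: str) -> list[str]:
    """Checkpoint `container` on the source host and export the checkpoint to `archive`."""
    return ["podman", "container", "checkpoint", f"--export={archive}", container]


def copy_cmd(archive: str, dest_host: str) -> list[str]:
    """Copy the archive to the same path on the destination host (scp is an assumption)."""
    return ["scp", archive, f"{dest_host}:{archive}"]


def restore_cmd(archive: str) -> list[str]:
    """Restore the container from the exported archive (run on the destination host)."""
    return ["podman", "container", "restore", f"--import={archive}"]


def migration_plan(container: str, archive: str, dest_host: str) -> list[str]:
    """Return the hypothetical migration steps, in order, as shell-quoted strings."""
    steps = [
        checkpoint_cmd(container, archive),
        copy_cmd(archive, dest_host),
        ["ssh", dest_host] + restore_cmd(archive),  # restore runs remotely via ssh
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for line in migration_plan("wildfly", "/tmp/wildfly-checkpoint.tar.gz", "host-b"):
        print(line)
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; swapping `shlex.join` for `subprocess.run(step, check=True)` would execute the steps.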
@adrianreber Thanks for providing the links.
@ashu-mehra If you look at https://criu.org/Podman, there is a link to a recording: https://asciinema.org/a/249922. In that recording I am basically doing what you are describing. I start Wildfly once, which takes about 8 seconds. Then I checkpoint it and restore it multiple times from the checkpoint. Starting Wildfly from the checkpoint only takes 4 seconds, so in that example I can reduce container startup time by 50%. There has been a FOSDEM talk about including CRIU in the JVM, which also aims to reduce startup time: https://fosdem.org/2019/schedule/event/checkpoint_restore/. The biggest concern with the JVM is that the restored JVM somehow has to detect the environment on the new system and adapt itself to it (e.g. number of CPUs, number of GC threads, hostname).
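The environment-adaptation problem described above can be illustrated outside the JVM with a small Python sketch: an application that derives settings from the host at startup has to recompute them after being restored on a different machine. `RuntimeEnvironment`, `App` and `App.on_restore` are hypothetical names, and the application (or its launcher) is assumed to call `on_restore` after a restore. `default_gc_threads` approximates HotSpot's documented default for parallel GC threads (N for up to 8 CPUs, 8 plus about 5/8 of the CPUs beyond that).

```python
import os
import socket
from dataclasses import dataclass


def default_gc_threads(ncpus: int) -> int:
    """Approximation of HotSpot's default parallel GC thread count."""
    if ncpus <= 8:
        return ncpus
    return 8 + (ncpus - 8) * 5 // 8


@dataclass
class RuntimeEnvironment:
    hostname: str
    cpus: int
    gc_threads: int


def detect_environment() -> RuntimeEnvironment:
    """Read host-specific facts that a restored process must not take from the checkpoint."""
    cpus = os.cpu_count() or 1
    return RuntimeEnvironment(socket.gethostname(), cpus, default_gc_threads(cpus))


class App:
    """Hypothetical application that caches environment-derived settings at startup."""

    def __init__(self) -> None:
        self.env = detect_environment()

    def on_restore(self) -> list[str]:
        """Hypothetical post-restore hook: re-detect the environment and list what changed."""
        new_env = detect_environment()
        changed = [
            field
            for field in ("hostname", "cpus", "gc_threads")
            if getattr(self.env, field) != getattr(new_env, field)
        ]
        self.env = new_env
        return changed
```

Returning the list of changed fields lets the application decide what to rebuild (for example, resizing a thread pool only when `cpus` changed).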
I have a prototype of a Java API that lets you call CRIU from Java. Comments welcome. Christine
@adrianreber Super excited to see your work on Is there anything similar planned for Docker? From what I understand,
@kirs I am not familiar with Docker's checkpoint/restore implementation, so I cannot say whether there are any plans to continue working on it. Just for the record:
@adrianreber @chflood
It would be good to hear the community's view on these issues, and whether anything can or needs to be done.
@ashu-mehra As you explicitly tagged me in your question, I will answer, though probably not very helpfully. I guess all your security concerns are valid. Depending on how much control you have over your application, you could make sure that secrets are removed from memory before taking the checkpoint, and you could reseed your random number generators after the restore.
@adrianreber Thanks for the quick response.
Makes sense, and we had similar thoughts on tackling these, which is why I mentioned
Do we have any details about how experimental it is?
I am not aware of anyone actively working on it. At least, as far as I know, no one has contacted the CRIU community about it. As I implemented the checkpoint/restore feature for Podman, I never really looked into the Docker code, so I do not know any details. However, neither the GitHub bug tracker nor the CRIU mailing list contains any communication concerning the Docker checkpoint integration, only reports from users having problems with it. If you need checkpoint/restore, please try out Podman's checkpoint/restore support. I am closing this ticket, as we have not been able to answer the question here for almost a year and will probably not have an answer any time soon. Podman's checkpoint/restore support is not marked experimental, and if there are any problems I am happy to help.
Does anyone know why docker checkpoint is an experimental feature and, according to the official documentation, not recommended for production use? https://docs.docker.com/engine/reference/commandline/checkpoint/
Does it have anything to do with the stability of CRIU, is there a security concern, or is it something else?
And what can be done to make CRIU usable in production environments?