criu couldn't checkpoint program state #1223
Comments
As mentioned in the other ticket, you might be doing it wrong. You should really tell us what exactly you are trying to do. Please include all the commands you are running. |
I apologize for the misunderstanding. I am trying to checkpoint a program running inside one container and resume it in another container. These are the commands I used previously, and at that time everything worked. However, when I tested again today, the state was not checkpointed and the program started from the beginning in the other container.
docker run -it --name test pollen5005/dcgan:latest
docker create -it --name test1 pollen5005/dcgan:latest |
Therefore, I checked with the common example.
docker logs looper
docker checkpoint create looper checkpoint1
(Restore)
docker logs looper
|
Looking at our latest CI run, I think I see a similar error, although I am not sure: https://travis-ci.org/github/checkpoint-restore/criu/jobs/732998447 In the CI test a docker container is checkpointed and restored. After that we run 'ps axf', but the result is almost always the same, so it seems the container is not started from the checkpoint but from the beginning. Our Podman CI run still seems to work as expected. Please try CentOS 8.2 with Podman as described in https://criu.org/Podman to see if it works better for you. I have never used docker's checkpoint/restore support myself, but maybe @avagin and @rst0git can have a look at our docker CI run to see whether my analysis is correct that it does not seem to be working. |
Thank you for your suggestions. |
Good to hear. |
Thanks a lot @adrianreber |
@upc-distribution What is the output of the following commands?
docker run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
docker checkpoint create looper checkpoint1
docker start --checkpoint checkpoint1 looper
docker logs looper |
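The final `docker logs looper` output from the commands above can also be checked mechanically. The helper below is only a sketch: `counter_resumed` is a hypothetical name, and it assumes the log contains the counter lines from both before the checkpoint and after the restore, so that a restart from the beginning shows up as the counter dropping back to 0.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker or criu): reads the looper
# counter lines on stdin and succeeds only if every value is greater
# than the one before it. A value that drops back means the container
# started from the beginning instead of from the checkpoint.
counter_resumed() {
    prev=-1
    while IFS= read -r line; do
        # Skip anything that is not a plain non-negative integer.
        case "$line" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$line" -le "$prev" ]; then
            echo "counter went from $prev back to $line: not restored from checkpoint" >&2
            return 1
        fi
        prev=$line
    done
    return 0
}
```

After the restore step it could be used as `docker logs looper | counter_resumed && echo "resumed from checkpoint"`. Note that `docker checkpoint` is only available when the daemon runs with experimental features enabled.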
@rst0git Hello Sir. I do apologize for my late reply. |
Hi @upc-distribution, this issue was caused by commit moby/moby@d4c6372, which was included in v19.03.13. However, there is another issue with upstream checkpoint/restore in moby (moby/moby#41531) and I hope both issues might be fixed in the next release. As a workaround, you can downgrade to v19.03.12 or earlier version. |
@rst0git Can you confirm that our docker tests are broken because of those bugs? Do we need to change our docker tests to catch errors like this? Maybe include the output of |
Yes
I think it would be better to fix the integration test in moby: moby/moby#38963 |
Hello sir, it seems that docker v19.03.12 has the same problem. |
Hi @Heming-Zhong, could you please try the following?
|
Hello, sir. I have tried your version of docker and it runs well. The docker server I originally installed is 19.03.12-ce from the Manjaro community repos. Does that mean the "ce" version can't handle checkpoint/restore properly? |
@Heming-Zhong For what it's worth, a quick check of the combination of docker-ce 19.03.13 and containerd 1.2.13 from the download.docker repository seems able to restore. I checked docker down to 19.03.10, and no version could restore with containerd 1.3.7 |
A friendly reminder that this issue had no activity for 30 days. |
The problem has been resolved with containerd v1.5.0-beta.0.
|
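Since the outcome in this thread depended on the exact combination of docker, containerd and kernel, it may help to attach the versions to a report. The following is a minimal sketch; `report_versions` is a hypothetical helper name, and it only calls the standard version commands of each tool, printing a placeholder when a tool is missing or does not answer.

```shell
#!/bin/sh
# Hypothetical helper: prints one line per component relevant to
# checkpoint/restore so the versions can be pasted into an issue.
report_versions() {
    echo "kernel: $(uname -r)"
    for tool in docker containerd criu; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not installed"
            continue
        fi
        case "$tool" in
            # Server version, i.e. the daemon that performs the checkpoint.
            docker) v=$(docker version --format '{{.Server.Version}}' 2>/dev/null) || v="" ;;
            containerd) v=$(containerd --version 2>/dev/null) || v="" ;;
            criu) v=$(criu --version 2>/dev/null | head -n 1) || v="" ;;
        esac
        echo "$tool: ${v:-unknown}"
    done
}
```

Calling `report_versions` prints four lines. Running `criu check` in addition reports whether the running kernel provides the features CRIU needs.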
Hello adrianreber,
May I ask you a question?
I am using criu 3.14 and docker 19.03.13 to checkpoint and restore my programs. It was working well, but now when I checkpoint and restore my program, the program state is not checkpointed and the program always starts from the beginning when I resume my container. No errors are shown. Could this be related to my kernel version? I did not note which version I used previously, but it is now 4.4.0-190-generic on Ubuntu 16.04. Do I need to downgrade or upgrade the kernel? Thank you very much.