Checkpointing containers is failing with docker and podman #1374

Closed
rajbhar opened this issue Feb 24, 2021 · 8 comments

@rajbhar
Contributor

rajbhar commented Feb 24, 2021

I am trying to checkpoint a simple looper container, but the checkpoint does not succeed. I tried both docker checkpoint create and podman container checkpoint. Earlier I was getting errors related to iptables-restore -w, which I fixed by building the latest iptables on my test machine.

podman run -d --name looper busybox /bin/sh -c \
     'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'

podman container checkpoint looper
sudo podman container checkpoint looper
2021-02-24T04:02:24.000360396Z: CRIU checkpointing failed -52
Please check CRIU logfile /var/lib/containers/storage/overlay-containers/610fbdcba9e3c5f501d31ca3805d88898797026899c81808a493bd0d1266885b/userdata/dump.log

Error: /usr/bin/crun checkpoint --image-path /var/lib/containers/storage/overlay-containers/610fbdcba9e3c5f501d31ca3805d88898797026899c81808a493bd0d1266885b/userdata/checkpoint --work-path /var/lib/containers/storage/overlay-containers/610fbdcba9e3c5f501d31ca3805d88898797026899c81808a493bd0d1266885b/userdata 610fbdcba9e3c5f501d31ca3805d88898797026899c81808a493bd0d1266885b failed: exit status 1

When using Docker, I see similar behavior:
sudo docker checkpoint create looper checkpoint1
Error response from daemon: Cannot checkpoint container looper: runc did not terminate successfully: criu failed: type NOTIFY errno 0 path= /run/containerd/io.containerd.runtime.v2.task/moby/ba594c352758682f7717891ef40945c6b8636dd73732485c788e452a515aa90c/criu-dump.log: unknown

I don't see any obvious error or failure message in the log files, and I observe the same behavior with Docker as with Podman. Checkpoint/restore (C/R) otherwise works on my test machine, but it fails with containers even for simple use cases. Any pointers are appreciated!

OS: Ubuntu 18.04
Kernel: 5.9-rc5
iptables v1.8.7 (legacy)
CRIU: 3.15, compiled from sources.

criu_error.log.log

@adrianreber
Member

It sounds like you are using a self-compiled kernel, and it seems to be missing the DIAG options that CRIU needs:

(00.126404) sockets: Sockects collect procedure family AF_INET proto IPPROTO_UDP: -2
(00.128717) sockets: Sockects collect procedure family AF_INET proto IPPROTO_UDPLITE: -2
(00.131282) sockets: Sockects collect procedure family AF_INET proto IPPROTO_RAW: -2
(00.134061) sockets: Sockects collect procedure family AF_INET6 proto IPPROTO_UDP: -2
(00.135498) sockets: Sockects collect procedure family AF_INET6 proto IPPROTO_UDPLITE: -2
(00.136922) sockets: Sockects collect procedure family AF_INET6 proto IPPROTO_RAW: -2
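
Since the original report mentions not seeing an obvious failure in the log, here is a minimal sketch of a filter that surfaces lines like the ones quoted above. The function name `criu_log_failures` is hypothetical, and the pattern is based only on the log lines shown in this thread, so other CRIU versions may format their messages differently. On Linux, a return code of -2 corresponds to -ENOENT.

```shell
#!/bin/sh
# Print (with line numbers) the lines of a CRIU log that look like failures:
#   - explicit "Error (file:line)" entries
#   - socket-collect lines that end in a negative return code, e.g. ": -2"
# NOTE: the pattern is an assumption derived from the log excerpts in this
# thread, not an official CRIU log grammar.
# Usage: criu_log_failures /path/to/dump.log
criu_log_failures() {
    grep -nE 'Error \(|collect procedure .*: -[0-9]+$' "$1"
}
```

For the Podman case in this report, the log to pass in would be the dump.log path printed in the error message.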

Looking at one of my test systems, I see the following modules loaded:

raw_diag               16384  0
udp_diag               16384  0
tcp_diag               16384  0
inet_diag              24576  3 tcp_diag,raw_diag,udp_diag
netlink_diag           16384  0
af_packet_diag         16384  0
unix_diag              16384  0

You probably need to turn on a couple of these options:

# grep DIAG /boot/config-5.9.15-200.fc33.x86_64
CONFIG_PACKET_DIAG=m
CONFIG_UNIX_DIAG=m
CONFIG_SMC_DIAG=m
CONFIG_XDP_SOCKETS_DIAG=m
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_INET_UDP_DIAG=m
CONFIG_INET_RAW_DIAG=m
CONFIG_INET_DIAG_DESTROY=y
CONFIG_INET_MPTCP_DIAG=m
CONFIG_INET_SCTP_DIAG=m
CONFIG_TIPC_DIAG=m
CONFIG_VSOCKETS_DIAG=m
CONFIG_NETLINK_DIAG=m

Not all of them are necessary for CRIU but some are.
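
To compare a custom kernel config against this list, a small check along the following lines may help. It is only a sketch: the function name `missing_diag_options` is hypothetical, and the set of options it checks is an assumption derived from the modules shown as loaded above (raw_diag, udp_diag, tcp_diag, inet_diag, netlink_diag, af_packet_diag, unix_diag), not an authoritative list of what CRIU requires.

```shell
#!/bin/sh
# Options to look for. ASSUMPTION: this list mirrors the *_diag modules
# shown as loaded above; it is not an official CRIU requirement list.
DIAG_OPTIONS="CONFIG_UNIX_DIAG CONFIG_PACKET_DIAG CONFIG_NETLINK_DIAG CONFIG_INET_DIAG CONFIG_INET_TCP_DIAG CONFIG_INET_UDP_DIAG CONFIG_INET_RAW_DIAG"

# Print every option from DIAG_OPTIONS that is not enabled in the given
# kernel config file, one per line. Returns 1 if any option is missing.
# Usage: missing_diag_options "/boot/config-$(uname -r)"
missing_diag_options() {
    config_file=$1
    status=0
    for opt in $DIAG_OPTIONS; do
        # Enabled means built in (=y) or built as a module (=m).
        if ! grep -qE "^${opt}=[ym]\$" "$config_file"; then
            echo "$opt"
            status=1
        fi
    done
    return $status
}
```

Both `=y` and `=m` count as enabled here, so a kernel with the options built in passes the check as well.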

@rajbhar
Contributor Author

rajbhar commented Feb 24, 2021

Thanks Adrian! I had most of them built into my kernel rather than as modules. I added the remaining ones from this list to my config. Now, when checkpointing with Docker, I see this error, which I am trying to debug:

(00.160697) Error (criu/libnetlink.c:55): -22 reported by netlink: Invalid argument
(00.161918) Error (criu/namespaces.c:1145): Namespaces dumping finished with error 59904
(00.162383) Unlock network
(00.162397) Running network-unlock scripts
(00.162408) RPC
(00.178721) Unfreezing tasks into 1
(00.178751) Unseizing 4955 into 1
(00.178775) Unseizing 7996 into 1
(00.178832) Error (criu/cr-dump.c:1780): Dumping FAILED.

PS: I am using libnftnl 1.1.9.

@adrianreber
Member

I would still blame your custom kernel. You are probably missing some NETLINK-related kernel configuration options.

@rst0git
Member

rst0git commented Feb 24, 2021

@rajbhar you can find on this page the kernel features required by CRIU: https://criu.org/Linux_kernel

@rajbhar
Contributor Author

rajbhar commented Feb 24, 2021

Thanks Radostin and Adrian. Although criu check was telling me "Looks good", I was still missing some of these. After rebuilding the kernel, I am able to checkpoint with Docker, but I still see an issue that I believe is not related to CRIU:
"Error response from daemon: custom checkpointdir is not supported". I am doing more debugging and testing and will close this issue soon if I don't run into any other CRIU-related errors.

If you are aware, could you please confirm whether checkpoint/restore with Docker / Podman works reliably on Ubuntu 18.04 LTS?
I am using the latest CRIU 3.15 dev branch for my development work with a plugin on bare metal. Next, I want to move my test scenario inside a simple container. I am dealing with a device, hence the plugin, which helps CRIU handle device VMAs etc.
Any recommendations for a stable OS/Kernel/Criu/Docker recipe for container migration?

@adrianreber
Member

> Any recommendations for a stable OS/Kernel/Criu/Docker recipe for container migration?

As long as you do not use the Ubuntu kernel, you should be on the safe side. I am the author of the Podman CRIU integration, so I am biased, but for migrating containers easily, Podman provides a much friendlier interface if you use the --export feature. With Podman it is one command to export a container and one command to import a container.
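
For illustration, the export/import pair could look like the sketch below. The helper names (`run`, `export_container`, `import_container`) and the `DRY_RUN` switch are hypothetical conveniences, not part of Podman. The sketch assumes a Podman version that supports `--export` on `podman container checkpoint` and `--import` on `podman container restore`, and it leaves copying the archive between hosts to the reader.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Source host: checkpoint the container and write it to an archive.
# Usage: export_container <container> <archive>
export_container() {
    run sudo podman container checkpoint --export="$2" "$1"
}

# Destination host, after the archive has been copied over:
# restore the container from the archive.
# Usage: import_container <archive>
import_container() {
    run sudo podman container restore --import="$1"
}
```

With the looper container from this issue, that would be `export_container looper /tmp/looper.tar.gz` on the source host and `import_container /tmp/looper.tar.gz` on the destination, where the archive path is only an example.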

If your problems are solved please close this ticket.

@rst0git
Member

rst0git commented Feb 25, 2021

> Error response from daemon: custom checkpointdir is not supported

was reported some time ago in moby/moby#37344

> Any recommendations for a stable OS/Kernel/Criu/Docker recipe for container migration?

CRIU - the latest version
OS/Kernel - I would recommend the latest Fedora or RHEL
Docker/containerd: You might need to use containerd v1.2.14 (See #1365 and #1223)

I agree with Adrian about Podman. If there are any issues, it would be much easier and faster to fix them because it has working continuous integration tests and fewer abstraction layers.

@rajbhar rajbhar closed this as completed Feb 26, 2021
@rajbhar
Contributor Author

rajbhar commented Feb 26, 2021

Thank you for the feedback and help!
