util.c Unable to mount tmpfs failed restore on AmazonLinux2 ami #2384

Open · bsmithai opened this issue Apr 9, 2024 · 19 comments

bsmithai commented Apr 9, 2024

Description
Dumped a simple counting runc container on an AmazonLinux2 EC2 instance (uname -r 5.15.148-97.158.amzn2.x86_64). The dump was successful, but on restore I get the following error:

(00.002249)      1: Found id extRootNetNS (fd 14) in inherit fd list
(00.002399)      1: timens: monotonic -7577 779467676
(00.002410)      1: timens: boottime -7577 779456720
(00.002453) Running setup-namespaces scripts
(00.002456) 	RPC
(00.002655)      1: Calling restore_sid() for init
(00.002660)      1: Restoring 1 to 1 sid
(00.002747)      1: Collecting 44/37 (flags 2)
(00.002755)      1: No tty-info.img image
(00.002758)      1:  `- ... done
(00.002759)      1: Collecting 45/51 (flags 0)
(00.002763)      1: No tty-data.img image
(00.002765)      1:  `- ... done
(00.002766)      1: Restoring namespaces 1 flags 0x6c028000
(00.002988)      1: kernel/hostname nr 24
(00.003068)      1: kernel/domainname nr 6
(00.003295)      1: Restoring IPC namespace
(00.003301)      1: Restoring IPC variables
(00.003451)      1: Restoring IPC shared memory
(00.003457)      1: No ipcns-shm-11.img image
(00.003459)      1: Restoring IPC message queues
(00.003463)      1: No ipcns-msg-11.img image
(00.003464)      1: Restoring IPC semaphores sets
(00.003468)      1: No ipcns-sem-11.img image
(00.003535)      1: No netns-ct-10.img image
(00.003544)      1: No netns-exp-10.img image
(00.003583)      1: mnt: Restoring mount namespace
(00.003745)      1: mnt-v2: 1141 make_yard /tmp/.criu.mntns.EQLivu
(00.003748)      1: criu/util.c make_yard path: /tmp/.criu.mntns.EQLivu 
(30.003925)      1: mnt: Move the root to /tmp/.criu.mntns.EQLivu
(30.003975)      1: mnt: mount.c put_root crtools-put-root.I8W2Xc 
(30.004244)      1: mnt-v2: 1199 make_yard /tmp/.criu.mntns.EQLivu 
(30.004247)      1: criu/util.c make_yard path: /tmp/.criu.mntns.EQLivu 
(60.004393)      1: Error (criu/util.c:1141): util.c Unable to mount tmpfs in /tmp/.criu.mntns.EQLivu: No such file or directory
(60.004932) Error (criu/cr-restore.c:1513): 117772 exited, status=1
(60.004964) Warn  (criu/cr-restore.c:2544): Unable to wait 117772: No child processes
(60.004981) Error (criu/cr-restore.c:2557): Restoring FAILED.
(60.005203) Error (criu/cgroup.c:1971): cg: cgroupd: recv req error: No such file or directory

I added some extra debug statements, so I apologize if that portion is confusing. From my understanding, there is a function in the mount-v2 code, pre_create_mount_namespaces(void), which calls get_empty_mntns(). In get_empty_mntns(), make_yard(path) is called, which mounts a directory under /tmp as a tmpfs and then makes it private. But later in pre_create_mount_namespaces(void), make_yard is called again with the same value for path, and this time it errors out saying that the path does not exist (DNE).
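
To show what I mean, here is a rough sketch of the pattern as I understand it from the logs (hypothetical names, not the actual CRIU code):

/* Rough sketch of the make_yard pattern as inferred from the logs
 * (not the actual CRIU code): mount a fresh tmpfs over an existing
 * directory, then mark the mount private so it does not propagate
 * to or from the parent mount namespace. */
#include <stdio.h>
#include <sys/mount.h>

static int make_yard_sketch(const char *path)
{
	if (mount("none", path, "tmpfs", 0, NULL) < 0) {
		perror("Unable to mount tmpfs"); /* the failure seen above */
		return -1;
	}
	if (mount("none", path, NULL, MS_PRIVATE, NULL) < 0) {
		perror("Unable to make yard private");
		return -1;
	}
	return 0;
}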

What I don't understand is the part between the start of get_empty_mntns() and the loop over the ns_id list: we filter for mount ns_ids, join the namespace created by get_empty_mntns(), call unshare(CLONE_NEWNS) again, and finally call make_yard, which errors out saying the path DNE, even though it is the same path that was passed to get_empty_mntns().

I also added a sleep call in the make_yard function to verify that the tmp directory exists. It did exist both times make_yard was called, but I am wondering if the directory is somehow not visible to the process calling unshare.

Steps to reproduce the issue:

  1. Start EC2 instance under uname -r 5.15.148-97.158.amzn2.x86_64
  2. Deploy a pod onto instance
  3. exec into pod with /bin/sh
  4. Mount host ec2 instance to /host and join all namespaces
  5. Chroot onto /host
  6. Start any basic runc container
  7. Conduct a runc checkpoint with these opts:
     tcp-close
     skip-in-flight
     manage-cgroups=ignore
     criu leave running on
  8. Restore with default opts, excluding the runc.conf from above
  9. Error

Describe the results you received:
Error in CRIU restore saying that a directory does not exist

Describe the results you expected:
A successful restore of a runc container given the chrooted environment

Additional information you deem important (e.g. issue happens only occasionally):

CRIU logs and information:

CRIU full dump/restore logs:

(00.000000) Parsing config file /etc/criu/runc.conf
(00.000048) Version: 3.19 (gitid v3.19)
(00.000053) Running on ip-192-168-116-129.us-east-2.compute.internal Linux 5.15.148-97.158.amzn2.x86_64 #1 SMP Mon Jan 29 21:29:43 UTC 2024 x86_64
(00.000055) Would overwrite RPC settings with values from /etc/criu/runc.conf
(00.000077) Loaded kdat cache from /run/criu.kdat
(00.000103) Hugetlb size 2 Mb is supported but cannot get dev's number
(00.000413) Added ipc:/proc/3796/ns/ipc join namespace
(00.000423) Added uts:/proc/3796/ns/uts join namespace
(00.000440) Parsing config file /etc/criu/runc.conf
(00.000460) Will skip in-flight TCP connections
(00.000462) Will drop all TCP connections on restore
(00.000470) rlimit: RLIMIT_NOFILE unlimited for self
(00.000529) cpu: x86_family 6 x86_vendor_id GenuineIntel x86_model_id Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
(00.000534) cpu: fpu: xfeatures_mask 0x5 xsave_size 832 xsave_size_max 832 xsaves_size 0
(00.000537) cpu: fpu: x87 floating point registers     xstate_offsets      0 / 0      xstate_sizes    160 / 160   
(00.000540) cpu: fpu: AVX registers                    xstate_offsets    576 / 576    xstate_sizes    256 / 256   
(00.000542) cpu: fpu:1 fxsr:1 xsave:1 xsaveopt:1 xsavec:0 xgetbv1:0 xsaves:0
(00.000570) kernel pid_max=4194304
(00.000572) Reading image tree
(00.000594) Add mnt ns 13 pid 1
(00.000596) Add net ns 10 pid 1
(00.000598) Add pid ns 9 pid 1
(00.000601) pstree pid_max=1
(00.000608) Will restore in 6c020000 namespaces
(00.000610) NS mask to use 6c020000
(00.000693) Collecting 51/56 (flags 3)
(00.000704) No memfd.img image
(00.000706)  `- ... done
(00.000707) Collecting 40/54 (flags 2)
(00.000719) Collected [bin/sh] ID 0x1
(00.000722) Collected [lib/libc.so.6] ID 0x2
(00.000725) Collected [lib/libresolv.so.2] ID 0x3
(00.000727) Collected [lib/libm.so.6] ID 0x4
(00.000729) Collected [lib/ld-linux-x86-64.so.2] ID 0x5
(00.000732) Collected [dev/null] ID 0x6
(00.000735) Collected pipe entry ID 0x7 PIPE ID 0xf0a7
(00.000745) Found id pipe:[61607] (fd 2) in inherit fd list
(00.000749) Collected pipe entry ID 0x8 PIPE ID 0xf0a8
(00.000754) Found id pipe:[61608] (fd 4) in inherit fd list
(00.000757) Collected [.] ID 0x9
(00.000760) Collected [.] ID 0xa
(00.000763)  `- ... done
(00.000765) Collecting 46/68 (flags 0)
(00.000770) No remap-fpath.img image
(00.000771)  `- ... done
(00.000776) No apparmor.img image
(00.000889) Running pre-restore scripts
(00.000899) 	RPC
(00.000964) cg: cgroud: Daemon started
(00.001284) net: Saved netns fd for links restore
(00.001332) mnt: Reading mountpoint images (id 13 pid 1)
(00.001345) mnt: 		Will mount 381 from /
(00.001350) mnt: 		Will mount 381 @ /tmp/.criu.mntns.EQLivu/mnt-0000000381 /sys/firmware
(00.001352) mnt: 	Read 381 mp @ /sys/firmware
(00.001357) mnt: 		Will mount 329 from /dev/null (E)
(00.001359) mnt: 		Will mount 329 @ /tmp/.criu.mntns.EQLivu/mnt-0000000329 /proc/timer_list
(00.001361) mnt: 	Read 329 mp @ /proc/timer_list
(00.001366) mnt: 		Will mount 258 from /dev/null (E)
(00.001368) mnt: 		Will mount 258 @ /tmp/.criu.mntns.EQLivu/mnt-0000000258 /proc/latency_stats
(00.001369) mnt: 	Read 258 mp @ /proc/latency_stats
(00.001372) mnt: 		Will mount 257 from /dev/null (E)
(00.001374) mnt: 		Will mount 257 @ /tmp/.criu.mntns.EQLivu/mnt-0000000257 /proc/keys
(00.001375) mnt: 	Read 257 mp @ /proc/keys
(00.001378) mnt: 		Will mount 256 from /dev/null (E)
(00.001379) mnt: 		Will mount 256 @ /tmp/.criu.mntns.EQLivu/mnt-0000000256 /proc/kcore
(00.001381) mnt: 	Read 256 mp @ /proc/kcore
(00.001383) mnt: 		Will mount 255 from /
(00.001385) mnt: 		Will mount 255 @ /tmp/.criu.mntns.EQLivu/mnt-0000000255 /proc/acpi
(00.001386) mnt: 	Read 255 mp @ /proc/acpi
(00.001391) mnt: 		Will mount 254 from /sysrq-trigger
(00.001393) mnt: 		Will mount 254 @ /tmp/.criu.mntns.EQLivu/mnt-0000000254 /proc/sysrq-trigger
(00.001394) mnt: 	Read 254 mp @ /proc/sysrq-trigger
(00.001396) mnt: 		Will mount 227 from /sys
(00.001398) mnt: 		Will mount 227 @ /tmp/.criu.mntns.EQLivu/mnt-0000000227 /proc/sys
(00.001399) mnt: 	Read 227 mp @ /proc/sys
(00.001402) mnt: 		Will mount 223 from /irq
(00.001408) mnt: 		Will mount 223 @ /tmp/.criu.mntns.EQLivu/mnt-0000000223 /proc/irq
(00.001410) mnt: 	Read 223 mp @ /proc/irq
(00.001412) mnt: 		Will mount 222 from /fs
(00.001414) mnt: 		Will mount 222 @ /tmp/.criu.mntns.EQLivu/mnt-0000000222 /proc/fs
(00.001415) mnt: 	Read 222 mp @ /proc/fs
(00.001417) mnt: 		Will mount 221 from /bus
(00.001421) mnt: 		Will mount 221 @ /tmp/.criu.mntns.EQLivu/mnt-0000000221 /proc/bus
(00.001423) mnt: 	Read 221 mp @ /proc/bus
(00.001426) mnt: 		Will mount 502 from /run/containerd/io.containerd.grpc.v1.cri/sandboxes/2012d57a0306eae3cc97f82ff3467c096048cb36aac91ad27bdb813f335b447d/shm (E)
(00.001427) mnt: 		Will mount 502 @ /tmp/.criu.mntns.EQLivu/mnt-0000000502 /dev/shm
(00.001429) mnt: 	Read 502 mp @ /dev/shm
(00.001432) mnt: 		Will mount 501 from /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/2012d57a0306eae3cc97f82ff3467c096048cb36aac91ad27bdb813f335b447d/resolv.conf (E)
(00.001434) mnt: 		Will mount 501 @ /tmp/.criu.mntns.EQLivu/mnt-0000000501 /etc/resolv.conf
(00.001435) mnt: 	Read 501 mp @ /etc/resolv.conf
(00.001438) mnt: 		Will mount 500 from /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/2012d57a0306eae3cc97f82ff3467c096048cb36aac91ad27bdb813f335b447d/hostname (E)
(00.001439) mnt: 		Will mount 500 @ /tmp/.criu.mntns.EQLivu/mnt-0000000500 /etc/hostname
(00.001441) mnt: 	Read 500 mp @ /etc/hostname
(00.001447) mnt: 		Will mount 499 from /var/lib/kubelet/pods/dc845343-b8d1-4988-8d90-ec77b6b6597c/containers/count-up-container/91b32408 (E)
(00.001449) mnt: 		Will mount 499 @ /tmp/.criu.mntns.EQLivu/mnt-0000000499 /dev/termination-log
(00.001450) mnt: 	Read 499 mp @ /dev/termination-log
(00.001453) mnt: 		Will mount 498 from /var/lib/kubelet/pods/dc845343-b8d1-4988-8d90-ec77b6b6597c/etc-hosts (E)
(00.001455) mnt: 		Will mount 498 @ /tmp/.criu.mntns.EQLivu/mnt-0000000498 /etc/hosts
(00.001456) mnt: 	Read 498 mp @ /etc/hosts
(00.001458) mnt: 		Will mount 497 from /
(00.001460) mnt: 		Will mount 497 @ /tmp/.criu.mntns.EQLivu/mnt-0000000497 /sys/fs/cgroup
(00.001462) mnt: 	Read 497 mp @ /sys/fs/cgroup
(00.001464) mnt: 		Will mount 496 from /
(00.001465) mnt: 		Will mount 496 @ /tmp/.criu.mntns.EQLivu/mnt-0000000496 /sys
(00.001467) mnt: 	Read 496 mp @ /sys
(00.001471) mnt: 		Will mount 495 from /
(00.001473) mnt: 		Will mount 495 @ /tmp/.criu.mntns.EQLivu/mnt-0000000495 /dev/mqueue
(00.001475) mnt: 	Read 495 mp @ /dev/mqueue
(00.001477) mnt: 		Will mount 494 from /
(00.001478) mnt: 		Will mount 494 @ /tmp/.criu.mntns.EQLivu/mnt-0000000494 /dev/pts
(00.001480) mnt: 	Read 494 mp @ /dev/pts
(00.001482) mnt: 		Will mount 493 from /
(00.001484) mnt: 		Will mount 493 @ /tmp/.criu.mntns.EQLivu/mnt-0000000493 /dev
(00.001485) mnt: 	Read 493 mp @ /dev
(00.001487) mnt: 		Will mount 492 from /
(00.001489) mnt: 		Will mount 492 @ /tmp/.criu.mntns.EQLivu/mnt-0000000492 /proc
(00.001490) mnt: 	Read 492 mp @ /proc
(00.001496) mnt: 		Will mount 491 from /
(00.001498) mnt: 		Will mount 491 @ /tmp/.criu.mntns.EQLivu/mnt-0000000491 /
(00.001499) mnt: 	Read 491 mp @ /
(00.001504) mnt: Building mountpoints tree
(00.001505) mnt: 	Building plain mount tree
(00.001507) mnt: 		Working on 491->220
(00.001508) mnt: 		Working on 492->491
(00.001510) mnt: 		Working on 493->491
(00.001511) mnt: 		Working on 494->493
(00.001512) mnt: 		Working on 495->493
(00.001514) mnt: 		Working on 496->491
(00.001515) mnt: 		Working on 497->496
(00.001516) mnt: 		Working on 498->491
(00.001518) mnt: 		Working on 499->493
(00.001519) mnt: 		Working on 500->491
(00.001520) mnt: 		Working on 501->491
(00.001522) mnt: 		Working on 502->493
(00.001523) mnt: 		Working on 221->492
(00.001524) mnt: 		Working on 222->492
(00.001526) mnt: 		Working on 223->492
(00.001527) mnt: 		Working on 227->492
(00.001528) mnt: 		Working on 254->492
(00.001530) mnt: 		Working on 255->492
(00.001531) mnt: 		Working on 256->492
(00.001532) mnt: 		Working on 257->492
(00.001534) mnt: 		Working on 258->492
(00.001535) mnt: 		Working on 329->492
(00.001536) mnt: 		Working on 381->496
(00.001538) mnt: 	Resorting children of 491 in mount order
(00.001542) mnt: 	Resorting children of 498 in mount order
(00.001544) mnt: 	Resorting children of 500 in mount order
(00.001545) mnt: 	Resorting children of 501 in mount order
(00.001546) mnt: 	Resorting children of 492 in mount order
(00.001549) mnt: 	Resorting children of 221 in mount order
(00.001550) mnt: 	Resorting children of 222 in mount order
(00.001551) mnt: 	Resorting children of 223 in mount order
(00.001553) mnt: 	Resorting children of 227 in mount order
(00.001554) mnt: 	Resorting children of 254 in mount order
(00.001555) mnt: 	Resorting children of 255 in mount order
(00.001556) mnt: 	Resorting children of 256 in mount order
(00.001558) mnt: 	Resorting children of 257 in mount order
(00.001559) mnt: 	Resorting children of 258 in mount order
(00.001560) mnt: 	Resorting children of 329 in mount order
(00.001561) mnt: 	Resorting children of 493 in mount order
(00.001563) mnt: 	Resorting children of 494 in mount order
(00.001564) mnt: 	Resorting children of 495 in mount order
(00.001566) mnt: 	Resorting children of 499 in mount order
(00.001567) mnt: 	Resorting children of 502 in mount order
(00.001568) mnt: 	Resorting children of 496 in mount order
(00.001570) mnt: 	Resorting children of 497 in mount order
(00.001571) mnt: 	Resorting children of 381 in mount order
(00.001572) mnt: Done:
(00.001574) mnt: [/](491->220)
(00.001575) mnt:  [/etc/hosts](498->491)
(00.001577) mnt:  <--
(00.001578) mnt:  [/etc/hostname](500->491)
(00.001580) mnt:  <--
(00.001581) mnt:  [/etc/resolv.conf](501->491)
(00.001583) mnt:  <--
(00.001584) mnt:  [/proc](492->491)
(00.001585) mnt:   [/proc/bus](221->492)
(00.001587) mnt:   <--
(00.001588) mnt:   [/proc/fs](222->492)
(00.001590) mnt:   <--
(00.001591) mnt:   [/proc/irq](223->492)
(00.001592) mnt:   <--
(00.001594) mnt:   [/proc/sys](227->492)
(00.001595) mnt:   <--
(00.001596) mnt:   [/proc/sysrq-trigger](254->492)
(00.001598) mnt:   <--
(00.001599) mnt:   [/proc/acpi](255->492)
(00.001601) mnt:   <--
(00.001602) mnt:   [/proc/kcore](256->492)
(00.001603) mnt:   <--
(00.001605) mnt:   [/proc/keys](257->492)
(00.001606) mnt:   <--
(00.001607) mnt:   [/proc/latency_stats](258->492)
(00.001609) mnt:   <--
(00.001610) mnt:   [/proc/timer_list](329->492)
(00.001611) mnt:   <--
(00.001613) mnt:  <--
(00.001614) mnt:  [/dev](493->491)
(00.001616) mnt:   [/dev/pts](494->493)
(00.001617) mnt:   <--
(00.001618) mnt:   [/dev/mqueue](495->493)
(00.001620) mnt:   <--
(00.001621) mnt:   [/dev/termination-log](499->493)
(00.001622) mnt:   <--
(00.001624) mnt:   [/dev/shm](502->493)
(00.001625) mnt:   <--
(00.001626) mnt:  <--
(00.001628) mnt:  [/sys](496->491)
(00.001629) mnt:   [/sys/fs/cgroup](497->496)
(00.001631) mnt:   <--
(00.001632) mnt:   [/sys/firmware](381->496)
(00.001633) mnt:   <--
(00.001635) mnt:  <--
(00.001636) mnt: <--
(00.001641) mnt: 	The mount 221 is bind for 492 (@/proc/bus -> @/proc)
(00.001643) mnt: 	The mount 222 is bind for 492 (@/proc/fs -> @/proc)
(00.001644) mnt: 	The mount 223 is bind for 492 (@/proc/irq -> @/proc)
(00.001646) mnt: 	The mount 227 is bind for 492 (@/proc/sys -> @/proc)
(00.001647) mnt: 	The mount 254 is bind for 492 (@/proc/sysrq-trigger -> @/proc)
(00.001649) mnt: 	The mount 256 is bind for 493 (@/proc/kcore -> @/dev)
(00.001651) mnt: 	The mount 257 is bind for 493 (@/proc/keys -> @/dev)
(00.001652) mnt: 	The mount 258 is bind for 493 (@/proc/latency_stats -> @/dev)
(00.001654) mnt: 	The mount 329 is bind for 493 (@/proc/timer_list -> @/dev)
(00.001655) mnt: 	The mount 499 is bind for 498 (@/dev/termination-log -> @/etc/hosts)
(00.001657) mnt: 	The mount 500 is bind for 498 (@/etc/hostname -> @/etc/hosts)
(00.001658) mnt: 	The mount 501 is bind for 498 (@/etc/resolv.conf -> @/etc/hosts)
(00.001660) mnt: Start with 491:/
(00.001665) mnt-v2: Inspecting sharing on 491 shared_id 0 master_id 160 (@/)
(00.001672) mnt-v2: Detected external slavery for shared group (0, 160) with source /run/containerd/runc/k8s.io/runc-restore/criu-root
(00.001674) mnt: Mountpoint 491 (@/) moved to the root yard
(00.001689) No pidns-9.img image
(00.001745) Warn  (criu/cr-restore.c:1301): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
(00.001748) Forking task with 1 pid (flags 0x6c028000)
(00.001750) Creating process using clone3()
(00.002055) PID: real 117772 virt 1
(00.002166) Wait until namespaces are created
(00.002249)      1: Found id extRootNetNS (fd 14) in inherit fd list
(00.002399)      1: timens: monotonic -7577 779467676
(00.002410)      1: timens: boottime -7577 779456720
(00.002453) Running setup-namespaces scripts
(00.002456) 	RPC
(00.002655)      1: Calling restore_sid() for init
(00.002660)      1: Restoring 1 to 1 sid
(00.002747)      1: Collecting 44/37 (flags 2)
(00.002755)      1: No tty-info.img image
(00.002758)      1:  `- ... done
(00.002759)      1: Collecting 45/51 (flags 0)
(00.002763)      1: No tty-data.img image
(00.002765)      1:  `- ... done
(00.002766)      1: Restoring namespaces 1 flags 0x6c028000
(00.002988)      1: kernel/hostname nr 24
(00.003068)      1: kernel/domainname nr 6
(00.003295)      1: Restoring IPC namespace
(00.003301)      1: Restoring IPC variables
(00.003451)      1: Restoring IPC shared memory
(00.003457)      1: No ipcns-shm-11.img image
(00.003459)      1: Restoring IPC message queues
(00.003463)      1: No ipcns-msg-11.img image
(00.003464)      1: Restoring IPC semaphores sets
(00.003468)      1: No ipcns-sem-11.img image
(00.003535)      1: No netns-ct-10.img image
(00.003544)      1: No netns-exp-10.img image
(00.003583)      1: mnt: Restoring mount namespace
(00.003745)      1: mnt-v2: 1141 make_yard /tmp/.criu.mntns.EQLivu
(00.003748)      1: criu/util.c make_yard path: /tmp/.criu.mntns.EQLivu 
(30.003925)      1: mnt: Move the root to /tmp/.criu.mntns.EQLivu
(30.003975)      1: mnt: mount.c put_root crtools-put-root.I8W2Xc 
(30.004244)      1: mnt-v2: 1199 make_yard /tmp/.criu.mntns.EQLivu 
(30.004247)      1: criu/util.c make_yard path: /tmp/.criu.mntns.EQLivu 
(60.004393)      1: Error (criu/util.c:1141): util.c Unable to mount tmpfs in /tmp/.criu.mntns.EQLivu: No such file or directory
(60.004932) Error (criu/cr-restore.c:1513): 117772 exited, status=1
(60.004964) Warn  (criu/cr-restore.c:2544): Unable to wait 117772: No child processes
(60.004981) Error (criu/cr-restore.c:2557): Restoring FAILED.
(60.005203) Error (criu/cgroup.c:1971): cg: cgroupd: recv req error: No such file or directory

Output of `criu --version`:

criu --version
Version: 3.19
GitID: v3.19

Output of `criu check --all`:

criu check --all
Warn  (criu/cr-check.c:1346): Nftables based locking requires libnftables and set concatenations support
Looks good but some kernel features are missing
which, depending on your process tree, may cause
dump or restore failure.

Additional environment details:

rst0git (Member) commented Apr 12, 2024

(00.003583)      1: mnt: Restoring mount namespace
(00.003745)      1: mnt-v2: 1141 make_yard /tmp/.criu.mntns.EQLivu(00.003748)      1: criu/util.c make_yard path: /tmp/.criu.mntns.EQLivu 
(30.003925)      1: mnt: Move the root to /tmp/.criu.mntns.EQLivu
(30.003975)      1: mnt: mount.c put_root crtools-put-root.I8W2Xc 
(30.004244)      1: mnt-v2: 1199 make_yard /tmp/.criu.mntns.EQLivu 
(30.004247)      1: criu/util.c make_yard path: /tmp/.criu.mntns.EQLivu 
(60.004393)      1: Error (criu/util.c:1141): util.c Unable to mount tmpfs in /tmp/.criu.mntns.EQLivu: No such file or directory

@bsmithai Does enabling MntnsCompatMode fix this problem?

cc @Snorch

bsmithai (Author) commented

Yes, enabling MntnsCompatMode seemed to be a temporary fix for the issue.

adrianreber (Member) commented

In combination with Kubernetes I also had problems with the v2 mount code (#2023). In the end it was not an error in the v2 code, but just something not set up correctly.

I would also be curious whether a newer kernel has additional fixes that might be missing from your 5.15.

nravic commented Apr 12, 2024

@adrianreber is the infrastructure container in that issue the same thing as a pause container?

adrianreber (Member) commented

@adrianreber is the infrastructure container in that issue the same thing as a pause container?

Yes.

Snorch (Member) commented Apr 16, 2024

I added some extra debug statements, so I apologize if that portion is confusing. From my understanding, there is a function in the mount-v2 code, pre_create_mount_namespaces(void), which calls get_empty_mntns(). In get_empty_mntns(), make_yard(path) is called, which mounts a directory under /tmp as a tmpfs and then makes it private. But later in pre_create_mount_namespaces(void), make_yard is called again with the same value for path, and this time it errors out saying that the path does not exist (DNE).

The code might be a bit hard to read, but you did a great job of narrowing down the problematic stack.

First let me explain how pre_create_mount_namespaces works:

  • get_empty_mntns just creates a new mount namespace, which is almost empty

note: if we changed the mnt_roots path in make_yard(mnt_roots); cr_pivot_root(mnt_roots); to any other available temporary directory path, this function would still create exactly the same namespace; I just reused the mnt_roots path there because it is always easily available at this point.

  • "almost empty" means that it has a tmpfs root mount, just created empty by make_yard and "chrooted" into with cr_pivot_root, with the mnt_roots path pre-created in it with mkdirpat (here it really is important that mnt_roots is used and not some other random path)

  • next, in the loop over ns_ids, we enter the empty mntns and clone it, then we mount a private tmpfs at the mnt_roots path, which should already be there (see the sketch below)
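
In rough C, that loop looks something like this (a simplified sketch with made-up names and signatures, based only on the description above, not the actual implementation):

/* Simplified sketch of the loop described above (hypothetical names,
 * not the actual CRIU code): each mount ns_id gets its own copy of the
 * empty mntns, with a private tmpfs mounted at mnt_roots, which was
 * pre-created inside the empty mntns and so exists in every copy. */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>

static int loop_over_ns_ids_sketch(int empty_mntns_fd,
				   const char *mnt_roots, int nr_mount_ns_ids)
{
	for (int i = 0; i < nr_mount_ns_ids; i++) {
		/* return to the empty mntns created by get_empty_mntns */
		if (setns(empty_mntns_fd, CLONE_NEWNS) < 0)
			return -1;
		/* clone it into a fresh mntns for this ns_id */
		if (unshare(CLONE_NEWNS) < 0)
			return -1;
		/* mount a private tmpfs at mnt_roots; this is the step
		 * failing with ENOENT in this issue */
		if (mount("none", mnt_roots, "tmpfs", 0, NULL) < 0 ||
		    mount("none", mnt_roots, NULL, MS_PRIVATE, NULL) < 0)
			return -1;
	}
	return 0;
}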

So it looks really strange that you hit a problem in make_yard in the loop over ns_ids. I would recommend adding explicit debug messages everywhere to re-confirm: Snorch@7c6d053

bsmithai (Author) commented

@Snorch This makes a lot of sense! Thank you for the breakdown. And yes, I can add some more explicit debug messages now that I understand the idea. I'll get back to you with those statements sometime this week!

bsmithai (Author) commented Apr 17, 2024

@Snorch Oh, I guess a few questions I had are:

  1. In get_empty_mntns, when we call unshare(CLONE_NEWNS), is the calling process already joining the mntns through the unshare call? And if so, why do we need to call ret = setns(nsfd, nd->cflag); in the iteration?
  2. What purpose does the subsequent unshare(CLONE_NEWNS) call serve, since it seems the initial empty mount namespace was already created?
  3. When would nsid have multiple mount namespaces? Just if the process tree has multiple processes that had unshared their mount namespaces?

Sorry if you have to reiterate some of what you've already said.

Snorch (Member) commented Apr 19, 2024

is the calling process already joining the mntns through the unshare call

You can read man 2 unshare for more info; unshare(CLONE_NEWNS) is not joining, but creating a new copy of the current mntns.

why do we need to call ret = setns(nsfd, nd->cflag); in the iteration?

Because on each iteration we create a new copy of the empty mntns, with its own private mount of its own tmpfs in mnt_roots, and we need to return to the empty mntns to make the next copy.

unshare(CLONE_NEWNS) again serves what purpose

To create one more mntns.

When would nsid have multiple mount namespaces? Just if the process tree has multiple processes that had unshared their mount namespaces?

Yes, nowadays tons of apps create their own mount namespaces here and there =) (basically we support nested containerization for mount namespaces)
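
If it helps, here is a minimal standalone illustration of the difference (a toy example of my own, not CRIU code; needs root):

/* Toy example: unshare(CLONE_NEWNS) creates a copy of the current
 * mntns, setns() joins an existing one. A tmpfs mounted after the
 * unshare() is visible only in the copy; setns() back to the original
 * mntns and it is gone. Run as root. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
	/* keep a handle on the original mntns so we can setns() back */
	int orig = open("/proc/self/ns/mnt", O_RDONLY);

	if (orig < 0 || unshare(CLONE_NEWNS) < 0) {
		perror("unshare");
		return 1;
	}
	/* make all mounts private so nothing propagates to the original */
	mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);
	mount("none", "/mnt", "tmpfs", 0, NULL);
	system("grep /mnt /proc/self/mountinfo"); /* tmpfs is visible */
	setns(orig, CLONE_NEWNS);                 /* join the original mntns */
	system("grep /mnt /proc/self/mountinfo"); /* tmpfs line is gone */
	return 0;
}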

bsmithai (Author) commented May 5, 2024

So it looks really strange that you hit a problem in make_yard in the loop over ns_ids. I would recommend adding explicit debug messages everywhere to re-confirm: Snorch@7c6d053

@Snorch I added the debug statements:

The only one that trips is the one in pre_create_mount_namespaces:

(00.038663) mnt-v2: Detected external slavery for shared group (0, 16) with source /sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope
(00.038665) mnt-v2: Detected external slavery for shared group (0, 15) with source /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope
(00.038667) mnt-v2: Detected external slavery for shared group (0, 14) with source /sys/fs/cgroup/blkio/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope
(00.038669) mnt-v2: Detected external slavery for shared group (0, 13) with source /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope
(00.038671) mnt-v2: Detected external slavery for shared group (0, 12) with source /sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope
(00.038673) mnt-v2: Detected external slavery for shared group (0, 11) with source /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope
(00.038675) mnt-v2: Detected external slavery for shared group (0, 10) with source /sys/fs/cgroup/perf_event/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope
(00.038677) mnt-v2: Detected external slavery for shared group (0, 9) with source /sys/fs/cgroup/systemd/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope
(00.038679) mnt-v2: Detected external slavery for shared group (0, 139) with source /run/containerd/runc/k8s.io/ajsdjkabsdadasfasd/criu-root
(00.038681) mnt: Mountpoint 1117 (@/) moved to the root yard
(00.038696) No pidns-9.img image
(00.038752) Warn  (criu/cr-restore.c:1301): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
(00.038755) Forking task with 1 pid (flags 0x6c028000)
(00.038757) Creating process using clone3()
(00.039114) PID: real 476056 virt 1
(00.039220) cg: cgroud: Daemon started
(00.039301) Wait until namespaces are created
(00.039389)      1: Found id extRootNetNS (fd 15) in inherit fd list
(00.039535)      1: timens: monotonic -25 364970289
(00.039546)      1: timens: boottime -25 364958937
(00.039592) Running setup-namespaces scripts
(00.039595)     RPC
(00.039789)      1: Calling restore_sid() for init
(00.039794)      1: Restoring 1 to 1 sid
(00.057726)      1: Collecting 44/37 (flags 2)
(00.057749)      1: No tty-info.img image
(00.057753)      1:  `- ... done
(00.057754)      1: Collecting 45/51 (flags 0)
(00.057757)      1: No tty-data.img image
(00.057759)      1:  `- ... done
(00.057761)      1: Restoring namespaces 1 flags 0x6c028000
(00.058025)      1: kernel/hostname nr 23
(00.058099)      1: kernel/domainname nr 6
(00.058309)      1: Restoring IPC namespace
(00.058315)      1: Restoring IPC variables
(00.058445)      1: Restoring IPC shared memory
(00.058449)      1: No ipcns-shm-11.img image
(00.058454)      1: Restoring IPC message queues
(00.058457)      1: No ipcns-msg-11.img image
(00.058459)      1: Restoring IPC semaphores sets
(00.058462)      1: No ipcns-sem-11.img image
(00.058530)      1: No netns-ct-10.img image
(00.058549)      1: No netns-exp-10.img image
(00.058590)      1: mnt: Restoring mount namespace
(00.058855)      1: mnt: Move the root to /tmp/.criu.mntns.du1Clc
(00.059102)      1: Error (criu/util.c:1137): Unable to mount tmpfs in /tmp/.criu.mntns.du1Clc: No such file or directory
(00.059107)      1: Error (criu/mount-v2.c:1202): mnt-v2: DEBUG[pre_create_mount_namespaces]: Failed to mount tmpfs on /tmp/.criu.mntns.du1Clc
(00.078356) Error (criu/cr-restore.c:1513): 476056 exited, status=1
(00.078410) Warn  (criu/cr-restore.c:2544): Unable to wait 476056: No child processes
(00.078473) Error (criu/cr-restore.c:2557): Restoring FAILED.
(00.094459) Error (criu/cgroup.c:1972): cg: cgroupd: recv req error: No such file or directory

Snorch (Member) commented May 6, 2024

Ok, that is really strange. Next steps to debug:

  1. Add sleep(100) to get_empty_mntns just after mkdirpat
  2. Run CRIU restore
  3. Catch it with https://github.com/Snorch/linux-helpers/blob/master/catch_sleeping_with_gdb.sh
./catch_sleeping_with_gdb.sh criu
(it will print the caught pid for convenience)
  4. Save cat /proc/<pid>/mountinfo, ls /proc/<pid>/root/tmp, ls -l /proc/<pid>/{cwd,root} and attach them here
  5. In gdb, p mnt_roots for verification
  6. Exit gdb

Then repeat:

  1'. Add sleep(100) (remove the old sleep) to pre_create_mount_namespaces, in the make_yard error path before goto err
  2'. Do steps 2-6 for it

Maybe we will understand some more with this information...

bsmithai (Author) commented May 6, 2024

@Snorch Here are the results:

sleep in get_empty_mntns

cat /proc/736549/mountinfo
1457 1338 0:273 / / rw,relatime - tmpfs none rw

ls /proc/736549/root/tmp

ls -l /proc/736549/{cwd,root}
lrwxrwxrwx 1 root root 0 May  6 15:20 /proc/736549/cwd -> /host
lrwxrwxrwx 1 root root 0 May  6 15:20 /proc/736549/root -> /host

(gdb) p mnt_roots
$1 = 0x0

sleep in pre_create_mount_namespaces in make_yard error

cat /proc/741001/mountinfo
1360 1359 0:143 / / rw,relatime master:111 - overlay overlay rw,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/190/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/189/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/188/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/187/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/186/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/185/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/172/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/203/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/203/work
1361 1360 0:268 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
1362 1360 0:269 / /dev rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755
1363 1362 0:270 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666
1364 1362 0:16 / /dev/mqueue rw,nosuid,nodev,noexec,relatime - mqueue mqueue rw
1365 1362 202:1 /var/lib/kubelet/pods/107ba602-7d78-462d-a84e-6173f35a33b7/containers/binary-container/f118e8cc /dev/termination-log rw,noatime - xfs /dev/xvda1 rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
1366 1362 0:20 / /dev/shm rw,nosuid,nodev - tmpfs tmpfs rw
1367 1360 0:18 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
1368 1367 0:271 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw,mode=755
1369 1368 0:24 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime master:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
1370 1368 0:26 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime master:10 - cgroup cgroup rw,perf_event
1371 1368 0:27 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime master:11 - cgroup cgroup rw,net_cls,net_prio
1372 1368 0:28 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime master:12 - cgroup cgroup rw,pids
1373 1368 0:29 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime master:13 - cgroup cgroup rw,cpu,cpuacct
1374 1368 0:30 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime master:14 - cgroup cgroup rw,blkio
1375 1368 0:31 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime master:15 - cgroup cgroup rw,cpuset
1376 1368 0:32 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime master:16 - cgroup cgroup rw,memory
1377 1368 0:33 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/misc rw,nosuid,nodev,noexec,relatime master:17 - cgroup cgroup rw,misc
1378 1368 0:34 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime master:18 - cgroup cgroup rw,hugetlb
1379 1368 0:35 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime master:19 - cgroup cgroup rw,devices
1380 1368 0:36 /kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod107ba602_7d78_462d_a84e_6173f35a33b7.slice/cri-containerd-ffd8567a6964d8b179a7a4d05b67a522a09945be29f9cbb63a85fd14e12f8423.scope /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime master:20 - cgroup cgroup rw,freezer
1381 1360 202:1 /var/lib/kubelet/pods/107ba602-7d78-462d-a84e-6173f35a33b7/etc-hosts /etc/hosts rw,noatime - xfs /dev/xvda1 rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
1382 1360 202:1 /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/ce3706fe77ebb5225f1f4855f4f190a07cbdefe04d05f3051009e12064b900da/hostname /etc/hostname rw,noatime - xfs /dev/xvda1 rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
1383 1360 202:1 /var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/ce3706fe77ebb5225f1f4855f4f190a07cbdefe04d05f3051009e12064b900da/resolv.conf /etc/resolv.conf rw,noatime - xfs /dev/xvda1 rw,attr2,inode64,logbufs=8,logbsize=32k,noquota
1384 1360 0:134 / /run/secrets/kubernetes.io/serviceaccount ro,relatime - tmpfs tmpfs rw,size=1518252k
1385 1360 0:273 / /host rw,relatime - tmpfs none rw

ls /proc/741001/root/tmp

ls -l /proc/741001/{cwd,root}
lrwxrwxrwx 1 root root 0 May  6 15:33 /proc/741001/cwd -> /
lrwxrwxrwx 1 root root 0 May  6 15:33 /proc/741001/root -> /

(gdb) p mnt_roots
$1 = 0x0

Snorch (Member) commented May 7, 2024

Error (criu/util.c:1137): Unable to mount tmpfs in /tmp/.criu.mntns.du1Clc: No such file or directory

and

# after mkdirpat in get_empty_mntns
(gdb) p mnt_roots
$1 = 0x0

should not be happening at the same time; that is just impossible... Did you change something else since #2384 (comment) that leads to a zero mnt_roots?

/proc/736549/{cwd,root}
lrwxrwxrwx 1 root root 0 May  6 15:20 /proc/736549/cwd -> /host
lrwxrwxrwx 1 root root 0 May  6 15:20 /proc/736549/root -> /host

The function cr_pivot_root explicitly does a chdir to the new root if the root argument is not NULL; the info above clearly states that it does not, confirming that mnt_roots is NULL...

1+2: Function get_empty_mntns is not ready for a zero mnt_roots and obviously will do something completely undefined in this case. AFAICS create_mnt_roots should always be called before get_empty_mntns(), so either we would have failed earlier or mnt_roots is non-zero.
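
To make the chdir point concrete, the behavior I'm describing reads roughly like this (a simplified sketch, not the actual cr_pivot_root source):

/* Simplified sketch (not the actual cr_pivot_root source): the chdir
 * into the new root only happens when the root argument is non-NULL.
 * With mnt_roots == NULL it is skipped, which matches the observed
 * /proc/<pid>/cwd and /proc/<pid>/root still pointing at /host. */
#include <stdio.h>
#include <unistd.h>

static int cr_pivot_root_sketch(const char *root)
{
	if (root) {
		if (chdir(root) < 0) {
			perror("chdir new root");
			return -1;
		}
	}
	/* ...the real function then pivots into the current directory... */
	return 0;
}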

cat /proc/741001/mountinfo
1360 1359 0:143 / / rw,relatime master:111 - overlay overlay rw,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/190/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/189/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/188/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/187/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/186/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/185/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/172/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/203/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/203/work
1361 1360 0:268 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
1362 1360 0:269 / /dev rw,nosuid - tmpfs tmpfs rw,size=65536k,mode=755

It should be in the empty mntns copy at this point, having something similar to:

cat /proc/736549/mountinfo
1457 1338 0:273 / / rw,relatime - tmpfs none rw

Sorry, but I don't understand anything now.

bsmithai (Author) commented May 7, 2024

I am still getting this result:

(100.05186)      1: Error (criu/util.c:1137): Unable to mount tmpfs in /tmp/.criu.mntns.N57r6S: No such file or directory
(100.05187)      1: Error (criu/mount-v2.c:1203): mnt-v2: DEBUG[pre_create_mount_namespaces]: Failed to mount tmpfs on /tmp/.criu.mntns.N57r6S

At one point I switched back to mount-v1 using the mntns-compat-mode CRIU opt, but I changed back to mount-v2 for this. Both are failing with similar issues.

I just did it again with the sleep after mkdirpat, and mnt_roots is null in gdb, but when I pr_debug it, it's not?

The resulting restore log updates, though, and has:

(100.05186)      1: Error (criu/util.c:1137): Unable to mount tmpfs in /tmp/.criu.mntns.N57r6S: No such file or directory
(100.05187)      1: Error (criu/mount-v2.c:1203): mnt-v2: DEBUG[pre_create_mount_namespaces]: Failed to mount tmpfs on /tmp/.criu.mntns.N57r6S

Snorch (Member) commented May 7, 2024

I just did it again with the sleep after mkdirpat, and mnt_roots is null in gdb, but when I pr_debug it, it's not?

It's really strange...

Maybe my script catch_sleeping_with_gdb.sh fails to resolve the proper binary path and gdb gets the wrong symbols... try:

./catch_sleeping_with_gdb.sh criu </full/path/to/your/self-compiled/criu/binary>

Also, can you check:

# sleep in get_empty_mntns
ls /proc/<pid>/{cwd,root}
ls /proc/<pid>/{cwd,root}/tmp
ls /host/
# sleep in pre_create_mount_namespaces in make_yard error
ls /proc/<pid>/root/host/
ls /proc/<pid>/root/host/tmp

Also, maybe it's worth trying on top of git checkout d553fad2dcbaf3aff0c41c4a7de61b4cb203784c, without shadow stack (not sure if it's involved, just a blind guess), or even git checkout v3.19.

nravic commented May 7, 2024

@Snorch just for some more info - we had no issues with this about two weeks ago. I'm not sure how relevant it is that between now and then AWS auto-updated the cluster. Here's the most recent patch: https://github.com/aws/eks-distro/releases/tag/v1-29-eks-10

Unfortunately there's no easy way to use an older version to test the regression hypothesis. FWIW, this same container and workload checkpoint/restores fine on our regular machines and inside a virtualized k3s cluster.

Snorch (Member) commented May 9, 2024

JFYI: I tried to build and install the amazonlinux kernel on my node and all zdtm tests pass fine on it. But maybe you just have a different config.

You can try running zdtm in your amazon environment:

./test/zdtm.py run -a --ignore-taint --keep-going

to see if it passes mount tests or not...

bsmithai (Author) commented May 9, 2024

I am getting the following error:

./test/zdtm.py run -a --ignore-taint --keep-going
Traceback (most recent call last):
  File "./test/zdtm.py", line 29, in <module>
    import pycriu as crpc
  File "/criu/test/pycriu/__init__.py", line 1, in <module>
    from . import rpc_pb2 as rpc
  File "/criu/test/pycriu/rpc_pb2.py", line 17, in <module>
    serialized_pb='\n\trpc.proto\"O\n\x15\x63riu_page_server_info\x12\x0f\n\x07\x61\x64\x64ress\x18\x01 \x01(\t\x12\x0c\n\x04port\x18\x02 \x01(\x05\x12\x0b\n\x03pid\x18\x03 \x01(\x05\x12\n\n\x02\x66\x64\x18\x04 \x01(\x05\"/\n\x0e\x63riu_veth_pair\x12\r\n\x05if_in\x18\x01 \x02(\t\x12\x0e\n\x06if_out\x18\x02 \x02(\t\")\n\rext_mount_map\x12\x0b\n\x03key\x18\x01 \x02(\t\x12\x0b\n\x03val\x18\x02 \x02(\t\"@\n\x0ejoin_namespace\x12\n\n\x02ns\x18\x01 \x02(\t\x12\x0f\n\x07ns_file\x18\x02 \x02(\t\x12\x11\n\textra_opt\x18\x03 \x01(\t\"%\n\ninherit_fd\x12\x0b\n\x03key\x18\x01 \x02(\t\x12\n\n\x02\x66\x64\x18\x02 \x02(\x05\")\n\x0b\x63group_root\x12\x0c\n\x04\x63trl\x18\x01 \x01(\t\x12\x0c\n\x04path\x18\x02 \x02(\t\"\x18\n\x07unix_sk\x12\r\n\x05inode\x18\x01 \x02(\r\"\xa7\r\n\tcriu_opts\x12\x19\n\rimages_dir_fd\x18\x01 \x02(\x05:\x02-1\x12\x12\n\nimages_dir\x18\x44 \x01(\t\x12\x0b\n\x03pid\x18\x02 \x01(\x05\x12\x15\n\rleave_running\x18\x03 \x01(\x08\x12\x13\n\x0b\x65xt_unix_sk\x18\x04 \x01(\x08\x12\x17\n\x0ftcp_established\x18\x05 \x01(\x08\x12\x17\n\x0f\x65vasive_devices\x18\x06 \x01(\x08\x12\x11\n\tshell_job\x18\x07 \x01(\x08\x12\x12\n\nfile_locks\x18\x08 \x01(\x08\x12\x14\n\tlog_level\x18\t \x01(\x05:\x01\x32\x12\x10\n\x08log_file\x18\n \x01(\t\x12\"\n\x02ps\x18\x0b \x01(\x0b\x32\x16.criu_page_server_info\x12\x16\n\x0enotify_scripts\x18\x0c \x01(\x08\x12\x0c\n\x04root\x18\r \x01(\t\x12\x12\n\nparent_img\x18\x0e \x01(\t\x12\x11\n\ttrack_mem\x18\x0f \x01(\x08\x12\x12\n\nauto_dedup\x18\x10 \x01(\x08\x12\x13\n\x0bwork_dir_fd\x18\x11 \x01(\x05\x12\x12\n\nlink_remap\x18\x12 \x01(\x08\x12\x1e\n\x05veths\x18\x13 \x03(\x0b\x32\x0f.criu_veth_pair\x12\x1b\n\x07\x63pu_cap\x18\x14 \x01(\r:\n4294967295\x12\x13\n\x0b\x66orce_irmap\x18\x15 \x01(\x08\x12\x10\n\x08\x65xec_cmd\x18\x16 \x03(\t\x12\x1f\n\x07\x65xt_mnt\x18\x17 \x03(\x0b\x32\x0e.ext_mount_map\x12\x16\n\x0emanage_cgroups\x18\x18 \x01(\x08\x12\x1d\n\x07\x63g_root\x18\x19 \x03(\x0b\x32\x0c.cgroup_root\x12\x13\n\x0brst_sibling\x18\x1a \x01(\x08\x12\x1f\n\ninherit_fd\x18\x1b \x03(\x0b\x32\x0b.inherit_fd\x12\x14\n\x0c\x61uto_ext_mnt\x18\x1c \x01(\x08\x12\x13\n\x0b\x65xt_sharing\x18\x1d \x01(\x08\x12\x13\n\x0b\x65xt_masters\x18\x1e \x01(\x08\x12\x10\n\x08skip_mnt\x18\x1f \x03(\t\x12\x11\n\tenable_fs\x18  \x03(\t\x12\x1d\n\x0bunix_sk_ino\x18! \x03(\x0b\x32\x08.unix_sk\x12*\n\x13manage_cgroups_mode\x18\" \x01(\x0e\x32\r.criu_cg_mode\x12\x1c\n\x0bghost_limit\x18# \x01(\r:\x07\x31\x30\x34\x38\x35\x37\x36\x12\x18\n\x10irmap_scan_paths\x18$ \x03(\t\x12\x10\n\x08\x65xternal\x18% \x03(\t\x12\x10\n\x08\x65mpty_ns\x18& \x01(\r\x12 \n\x07join_ns\x18\' \x03(\x0b\x32\x0f.join_namespace\x12\x14\n\x0c\x63group_props\x18) \x01(\t\x12\x19\n\x11\x63group_props_file\x18* \x01(\t\x12\x1e\n\x16\x63group_dump_controller\x18+ \x03(\t\x12\x15\n\rfreeze_cgroup\x18, \x01(\t\x12\x0f\n\x07timeout\x18- \x01(\r\x12\x1a\n\x12tcp_skip_in_flight\x18. 
\x01(\x08\x12\x14\n\x0cweak_sysctls\x18/ \x01(\x08\x12\x12\n\nlazy_pages\x18\x30 \x01(\x08\x12\x11\n\tstatus_fd\x18\x31 \x01(\x05\x12\x19\n\x11orphan_pts_master\x18\x32 \x01(\x08\x12\x13\n\x0b\x63onfig_file\x18\x33 \x01(\t\x12\x11\n\ttcp_close\x18\x34 \x01(\x08\x12\x13\n\x0blsm_profile\x18\x35 \x01(\t\x12\x12\n\ntls_cacert\x18\x36 \x01(\t\x12\x11\n\ttls_cacrl\x18\x37 \x01(\t\x12\x10\n\x08tls_cert\x18\x38 \x01(\t\x12\x0f\n\x07tls_key\x18\x39 \x01(\t\x12\x0b\n\x03tls\x18: \x01(\x08\x12\x18\n\x10tls_no_cn_verify\x18; \x01(\x08\x12\x13\n\x0b\x63group_yard\x18< \x01(\t\x12\x32\n\rpre_dump_mode\x18= \x01(\x0e\x32\x13.criu_pre_dump_mode:\x06SPLICE\x12\x16\n\x0epidfd_store_sk\x18> \x01(\x05\x12\x19\n\x11lsm_mount_context\x18? \x01(\t\x12\x39\n\x0cnetwork_lock\x18@ \x01(\x0e\x32\x19.criu_network_lock_method:\x08IPTABLES\x12\x19\n\x11mntns_compat_mode\x18\x41 \x01(\x08\x12\x1b\n\x13skip_file_rwx_check\x18\x42 \x01(\x08\x12\x14\n\x0cunprivileged\x18\x43 \x01(\x08\x12\x15\n\rleave_stopped\x18\x45 \x01(\x08\x12\x15\n\rdisplay_stats\x18\x46 \x01(\x08\x12\x15\n\rlog_to_stderr\x18G \x01(\x08\"\"\n\x0e\x63riu_dump_resp\x12\x10\n\x08restored\x18\x01 \x01(\x08\" \n\x11\x63riu_restore_resp\x12\x0b\n\x03pid\x18\x01 \x02(\x05\"*\n\x0b\x63riu_notify\x12\x0e\n\x06script\x18\x01 \x01(\t\x12\x0b\n\x03pid\x18\x02 \x01(\x05\"K\n\rcriu_features\x12\x11\n\tmem_track\x18\x01 \x01(\x08\x12\x12\n\nlazy_pages\x18\x02 \x01(\x08\x12\x13\n\x0bpidfd_store\x18\x03 \x01(\x08\"\x9c\x01\n\x08\x63riu_req\x12\x1c\n\x04type\x18\x01 \x02(\x0e\x32\x0e.criu_req_type\x12\x18\n\x04opts\x18\x02 \x01(\x0b\x32\n.criu_opts\x12\x16\n\x0enotify_success\x18\x03 \x01(\x08\x12\x11\n\tkeep_open\x18\x04 \x01(\x08\x12 \n\x08\x66\x65\x61tures\x18\x05 \x01(\x0b\x32\x0e.criu_features\x12\x0b\n\x03pid\x18\x06 \x01(\r\"\xb7\x02\n\tcriu_resp\x12\x1c\n\x04type\x18\x01 \x02(\x0e\x32\x0e.criu_req_type\x12\x0f\n\x07success\x18\x02 \x02(\x08\x12\x1d\n\x04\x64ump\x18\x03 \x01(\x0b\x32\x0f.criu_dump_resp\x12#\n\x07restore\x18\x04 \x01(\x0b\x32\x12.criu_restore_resp\x12\x1c\n\x06notify\x18\x05 \x01(\x0b\x32\x0c.criu_notify\x12\"\n\x02ps\x18\x06 \x01(\x0b\x32\x16.criu_page_server_info\x12\x10\n\x08\x63r_errno\x18\x07 \x01(\x05\x12 \n\x08\x66\x65\x61tures\x18\x08 \x01(\x0b\x32\x0e.criu_features\x12\x11\n\tcr_errmsg\x18\t \x01(\t\x12\x1e\n\x07version\x18\n \x01(\x0b\x32\r.criu_version\x12\x0e\n\x06status\x18\x0b \x01(\x05\"x\n\x0c\x63riu_version\x12\x14\n\x0cmajor_number\x18\x01 \x02(\x05\x12\x14\n\x0cminor_number\x18\x02 \x02(\x05\x12\r\n\x05gitid\x18\x03 \x01(\t\x12\x10\n\x08sublevel\x18\x04 \x01(\x05\x12\r\n\x05\x65xtra\x18\x05 \x01(\x05\x12\x0c\n\x04name\x18\x06 \x01(\t*_\n\x0c\x63riu_cg_mode\x12\n\n\x06IGNORE\x10\x00\x12\x0b\n\x07\x43G_NONE\x10\x01\x12\t\n\x05PROPS\x10\x02\x12\x08\n\x04SOFT\x10\x03\x12\x08\n\x04\x46ULL\x10\x04\x12\n\n\x06STRICT\x10\x05\x12\x0b\n\x07\x44\x45\x46\x41ULT\x10\x06*@\n\x18\x63riu_network_lock_method\x12\x0c\n\x08IPTABLES\x10\x01\x12\x0c\n\x08NFTABLES\x10\x02\x12\x08\n\x04SKIP\x10\x03*-\n\x12\x63riu_pre_dump_mode\x12\n\n\x06SPLICE\x10\x01\x12\x0b\n\x07VM_READ\x10\x02*\xe5\x01\n\rcriu_req_type\x12\t\n\x05\x45MPTY\x10\x00\x12\x08\n\x04\x44UMP\x10\x01\x12\x0b\n\x07RESTORE\x10\x02\x12\t\n\x05\x43HECK\x10\x03\x12\x0c\n\x08PRE_DUMP\x10\x04\x12\x0f\n\x0bPAGE_SERVER\x10\x05\x12\n\n\x06NOTIFY\x10\x06\x12\x10\n\x0c\x43PUINFO_DUMP\x10\x07\x12\x11\n\rCPUINFO_CHECK\x10\x08\x12\x11\n\rFEATURE_CHECK\x10\t\x12\x0b\n\x07VERSION\x10\n\x12\x0c\n\x08WAIT_PID\x10\x0b\x12\x14\n\x10PAGE_SERVER_CHLD\x10\x0c\x12\x13\n\x0fSINGLE_PRE_DUMP\x10\r')
  File "/usr/local/lib64/python3.7/site-packages/google/protobuf/descriptor.py", line 1066, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: expected bytes, str found

Snorch (Member) commented May 11, 2024

./lib/pycriu/rpc_pb2.py is an autogenerated file; on my system it has bytes explicitly:

DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\trpc.proto ...

Maybe you just need to rebuild everything cleanly (git clean -dxf; make; ./test/zdtm.py run -a --ignore-taint --keep-going), or maybe there is some real error with the python3-protobuf package on your system...
