mnt-v2 errors with Kubernetes #2023
Thanks! Here is what we might also need to understand the problem:
Can you please look at
Can you please check on that system whether the functions called from do_set_group are inlined or not, so we can say for sure whether we didn't reach them or they were just not visible to gftrace:
For now I can assume (see
The whole setup looks like this. Initially we have a pod with two containers. One container is the so-called infrastructure container, and the other container is the one being checkpointed. The infrastructure container and the target container both bind mount the .containerenv file into their own mount namespace from the host. During restore, a new pod is created with a new infrastructure container, which bind mounts the new .containerenv file. For the restored container I am changing the source of the .containerenv mount to point to the new file, and CRIU should remount that new file to the old location. This is why things are set up the way they are. Changing the source of a bind mount happens a lot during container restore for Kubernetes, because multiple mount points that are mounted from the host into the container are pod specific. As the restore is done into a new pod, all of these bind mounts are changed. I hope that makes some sense.
I don't think it's the first two, so the most promising candidate is the superblock check failing.
From what you are explaining and what I see in the logs:
the mount-v2 engine is trying to copy sharing from the external source path to mount 887,
where mount 887 has the mountpoint /etc/hostname in the container. Here we see an inconsistency: why from '.containerenv' to 'hostname'? From these other lines I can assume that the same file was bind-mounted in both places on dump:
That's why copying sharing between '.containerenv' and 'hostname' is reasonable, and it should have worked if the external mounts had been provided to CRIU correctly. But it seems that on restore you try to put different files into them -> don't do that. /run/containers/storage/overlay-containers/bd2b0db91a38663ed6e13aeb15709be5d9598f95d17aa8904ce8ad1d2f8bc564/userdata/hostname and /var/lib/containers/storage/overlay-containers/bd2b0db91a38663ed6e13aeb15709be5d9598f95d17aa8904ce8ad1d2f8bc564/userdata/.containerenv should be the same file at the same path in its filesystem. If one file was externally mounted in several places in the container, then on restore the external options should all point to the same file from the same filesystem, with the same inode number; otherwise even the file restore code would fail.
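The "same file, same inode" requirement described above can be verified before attempting a restore. The following is a small sketch (not part of the original thread) that compares two paths by device and inode number, the same identity check the kernel effectively relies on:

```python
import os

def same_file(path_a: str, path_b: str) -> bool:
    """Return True if both paths resolve to the same file on the same
    filesystem, i.e. identical (st_dev, st_ino) pairs."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

if __name__ == "__main__":
    # Hypothetical external-mount source paths; substitute the real
    # paths passed to CRIU's external mount options.
    print(same_file("/etc/hostname", "/etc/hostname"))
```

If this returns False for two paths that back the same in-container mountpoint, the external mount mapping given to CRIU is inconsistent in the way described above.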
Thanks for your analysis. It looks like I might be doing something wrong. I will take a look at the higher-level code and report back here.
It looks like the container definition I am creating is correctly set up:

```json
{
    "destination": "/etc/resolv.conf",
    "type": "bind",
    "source": "/run/containers/storage/overlay-containers/f3eea75d926796a9d76a2fbe4c1a3b6632d3b06c9caeca015fc44c8f05080205/userdata/resolv.conf",
    "options": [
        "rw",
        "bind",
        "nodev",
        "nosuid",
        "noexec"
    ]
},
{
    "destination": "/etc/hostname",
    "type": "bind",
    "source": "/run/containers/storage/overlay-containers/f3eea75d926796a9d76a2fbe4c1a3b6632d3b06c9caeca015fc44c8f05080205/userdata/hostname",
    "options": [
        "rw",
        "bind"
    ]
},
{
    "destination": "/run/.containerenv",
    "type": "bind",
    "source": "/var/lib/containers/storage/overlay-containers/f3eea75d926796a9d76a2fbe4c1a3b6632d3b06c9caeca015fc44c8f05080205/userdata/.containerenv",
    "options": [
        "rw",
        "bind"
    ]
}
```

and the
So it seems I am telling runc the right new bind mount mappings, and CRIU also lists them in the log file. All three files exist on the file system:
But I do not understand the part about the sharing. What does this mean:
And what does it mean that it points to
I tried to collect some more data, and now I found something interesting. If I restore the container in Kubernetes, it works; if I restore the container just in CRI-O, it fails. Attached are both log files. Do you see why one works and the other doesn't? Maybe I am not correctly setting up a resource, mount, or directory. Let me know if you see any differences.
I encountered the same error while trying to restore with Kubernetes. My mountinfo while restoring shows different file systems for the different files:
If I understand @Snorch correctly, this is the cause of the failure. If I look into the mountinfo of the running container (switching to the mount namespace with nsenter), it looks like this:
So they all come from the same tmpfs, but on restore the mount setup is somehow different. Is this actually a bug in runc?
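To compare which filesystem each mountpoint comes from, the relevant fields of /proc/self/mountinfo (or /proc/&lt;pid&gt;/mountinfo after entering the namespace with nsenter) can be extracted with a short script. This is a sketch for illustration, not something used in the thread:

```python
def mountinfo_fs(path="/proc/self/mountinfo"):
    """Map each mountpoint to its (filesystem type, mount source).

    Per proc(5), each mountinfo line contains optional fields that are
    terminated by a lone '-'; the fstype and the mount source follow
    right after that separator. The mountpoint is the fifth field.
    """
    result = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            sep = fields.index("-")  # end of the optional fields
            mountpoint = fields[4]
            fstype, source = fields[sep + 1], fields[sep + 2]
            result[mountpoint] = (fstype, source)
    return result

# Example (mountpoints here are from the container in question):
# mounts = mountinfo_fs()
# for mp in ("/etc/hostname", "/etc/resolv.conf", "/run/.containerenv"):
#     print(mp, mounts.get(mp))
```

If the three mountpoints report different (fstype, source) pairs during restore but the same pair in the original container, that matches the inconsistency described above.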
Thanks for highlighting this. That helps.
No, not really. I need to think about this a bit.
So this seems to be related to pods with or without infrastructure containers. If you checkpoint a container from a pod without an infrastructure container, you have to restore it in a pod without an infrastructure container. And if the original pod had an infrastructure container, then the restored container also needs to be in a pod with an infrastructure container. It is not clear why the location of
@Snorch thanks for your time looking at this. @hesch thanks for chiming in. Closing as it does not seem to be a CRIU problem.
@Snorch Some time ago we talked about this in the chat. Trying to restore a container with Kubernetes fails with:
Using `mntns-compat-mode` in the configuration file during restore, I can restore my container. You said you needed a trace with https://github.com/Snorch/linux-helpers/blob/master/gftrace.sh:
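For reference, a minimal sketch of such a configuration file. Note that the file path and the exact spelling are assumptions based on CRIU's plain-text configuration-file format, not something confirmed in this thread:

```
# Assumed path: /etc/criu/runc.conf (a CRIU configuration file that
# runc picks up). Options use the CLI name without the leading "--".
# Fall back to the older mount-namespace restore engine instead of
# the mount-v2 engine:
mntns-compat-mode
```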
And finally here is the trace:
This happens with CRIU 3.17.1 and kernel 6.0.11-300.fc37.x86_64.
Let me know if you need more debug data.