Pods with memory requests/limits set cannot start on 1.23.1 + Ubuntu 20.04 + cri-o 1.23.0 #5527
Comments
That is bizarre. @giuseppe do you have any insight? We use podman's IsCgroup2UnifiedMode to verify we're on cgroupv2. @glitchcrab can you run a sanity check?
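For reference, a minimal Go sketch of the kind of check IsCgroup2UnifiedMode performs (a simplified illustration, not the podman implementation verbatim): statfs /sys/fs/cgroup and compare the filesystem magic against CGROUP2_SUPER_MAGIC.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// isCgroup2UnifiedMode reports whether /sys/fs/cgroup is a cgroup2 mount,
// which is what a unified-hierarchy (cgroup v2) system looks like.
func isCgroup2UnifiedMode() (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false, err
	}
	return st.Type == unix.CGROUP2_SUPER_MAGIC, nil
}

func main() {
	unified, err := isCgroup2UnifiedMode()
	fmt.Println("cgroup2 unified mode:", unified, err)
}
```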
@haircommander sure, this is the output:
Interesting. Where is your systemd cgroup mounted, and what is the output of
Oh, and what kernel args did you use to set cgroupv2?
The system is not using the cgroupv2 unified hierarchy. With cgroupv2 the output should look like:
Please make sure to pass systemd.unified_cgroup_hierarchy=1 on the kernel command line.
Apologies, I didn't realise I hadn't configured the machines properly. I've now done that:
And the same issue occurs, albeit with a slightly different error:
This is with cri-o 1.23 installed again. Node configuration remains the same as mentioned in my original issue details, and this only affects pods with requests/limits set. My coredns deployment is configured like this:
Can we have the cri-o logs now? It seems cri-o is trying to set swap for the container (maybe swap is on) but the memsw cgroup isn't configured. We should handle this situation gracefully but clearly aren't.
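For context, a rough sketch of how a memory+swap limit is typically applied on a cgroup v1 hierarchy (an illustration, not CRI-O's actual code; the cgroup path and helper are hypothetical). The second write fails when the kernel does not expose memsw accounting, which matches the failure described here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// setMemoryAndSwap applies memory and memory+swap limits on a cgroup v1
// memory controller. cgroupPath stands in for the container's directory
// under /sys/fs/cgroup/memory.
func setMemoryAndSwap(cgroupPath string, memLimit, memSwapLimit int64) error {
	write := func(name string, v int64) error {
		return os.WriteFile(filepath.Join(cgroupPath, name),
			[]byte(strconv.FormatInt(v, 10)), 0o644)
	}
	if err := write("memory.limit_in_bytes", memLimit); err != nil {
		return fmt.Errorf("setting memory limit: %w", err)
	}
	// This is the write that fails when the kernel does not provide memsw
	// accounting: the file simply does not exist in the cgroup directory.
	if err := write("memory.memsw.limit_in_bytes", memSwapLimit); err != nil {
		return fmt.Errorf("setting memory+swap limit: %w", err)
	}
	return nil
}

func main() {
	// Hypothetical container cgroup path, for illustration only.
	err := setMemoryAndSwap("/sys/fs/cgroup/memory/example", 256<<20, 256<<20)
	fmt.Println(err)
}
```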
These are the debug logs from about 1-2s before I attempted to schedule a pod on that node: crio-debug.log
Though this issue was opened for an upgrade to 1.23.1, the same problem arises when creating a cluster from scratch. I'm new to the Kubernetes world and learning how to create a cluster following the guides on kubernetes.io. I was struggling to create a cluster in my test lab with 1.23 until I found this issue; using version 1.22 solved it. First it was related to the cgroups configuration on stock Ubuntu 20.04.3, and later to swap memory, as stated above by the topic starter.
Sorry, I should have been clearer. I'll need the logs from the beginning of the cri-o run (specifically, I want to see whether we're correctly detecting that memsw is not set up).
cgroupv1 or cgroupv2?
I guess it is cgroupv2: I've set systemd.unified_cgroup_hierarchy=1 in GRUB, and the command stat -f -c%T /sys/fs/cgroup shows cgroup2fs.
Yeah, that sounds like cgroupv2. I am guessing CRI-O reports a log line like
Also, just to check, is swap enabled?
I've disabled swap as a first step of Kubernetes preparation. ;)
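As an aside, a small Go sketch (a generic illustration, not something CRI-O does in this exact form) for checking from code whether any swap area is active, by reading /proc/swaps:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// swapActive reports whether any swap area is currently enabled by
// checking for entries in /proc/swaps beyond its header line.
func swapActive() (bool, error) {
	f, err := os.Open("/proc/swaps")
	if err != nil {
		return false, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Scan() // skip the header line
	for scanner.Scan() {
		if strings.TrimSpace(scanner.Text()) != "" {
			return true, nil
		}
	}
	return false, scanner.Err()
}

func main() {
	active, err := swapActive()
	fmt.Println("swap active:", active, err)
}
```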
I may try again in a few days. I'll let you know the outcome.
OK, it was a quick test. :) Here is the output of kubectl get pods --all-namespaces, and the relevant logs are:
This debug log is from the node booting with debug logging enabled: debug.log
@glitchcrab what's the output of
Oopsies, this is definitely just a bug in cri-o: #5539 (we used to do this check but accidentally dropped it when swap support was added).
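The kind of guard the fix reintroduces looks roughly like the following: detect once whether the node exposes the memsw controller, and skip swap configuration otherwise. This is a paraphrase for illustration, not the actual #5539 diff; the function, package, and path are assumptions.

```go
package node

import (
	"os"
	"path/filepath"
	"sync"
)

var (
	memSwapOnce sync.Once
	hasMemSwap  bool
)

// CgroupHasMemorySwap reports whether the node exposes memory+swap
// accounting, checking once and caching the result. The cgroup v1 path
// below is the usual location, used here as an assumption.
func CgroupHasMemorySwap() bool {
	memSwapOnce.Do(func() {
		p := filepath.Join("/sys/fs/cgroup/memory", "memory.memsw.limit_in_bytes")
		_, err := os.Stat(p)
		hasMemSwap = err == nil
	})
	return hasMemSwap
}

// Callers would then guard the swap limit roughly like:
//
//	if memSwapLimit > 0 && node.CgroupHasMemorySwap() {
//	    // ... write memory.memsw.limit_in_bytes ...
//	}
```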
I am building a Kubernetes cluster for the first time with the same configuration and was baffled by this error for several days. Extremely lucky to find this post!
The fix is merged in the main branch; I'm backporting to 1.23 and intend to cut a 1.23.1 soon.
Description
This cluster was originally created with k8s 1.22.2 on Ubuntu 20.04 VMs using kubeadm with no special config. When upgrading to 1.23.1, pods with resource requests and/or limits set fail to start with the following error:
cri-o logs show the following:
Steps to reproduce the issue:
Describe the results you received:
Pods with resource limits/requests set fail to start.
Describe the results you expected:
Pods should start.
Additional information you deem important (e.g. issue happens only occasionally):
I note that when running crio manually, I see the following logs:
This feels somewhat relevant, because the reason the pod cannot start is the lack of memory.memsw.limit_in_bytes; as I understand it, this is related to swap (which is disabled). I'm also puzzled by the log about the cgroupv2 configuration being false: crio is configured to use systemd as the cgroup manager, and systemd is using cgroupv2. Downgrading cri-o to 1.22 allows pods to start as normal.
Output of crio --version:
Additional environment details (AWS, VirtualBox, physical, etc.):
I'm unsure if it's related, but containers-common was also upgraded at the same time from 1-21 to 1-22.