Kubectl top pods doesn't work with cri-o on K8s v1.30.0-rc.2 #8034
Comments
The problem is the same with K8s
@uablrek thank you for the report! Can I use the metrics server with hack/local-up-cluster.sh? When using
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Then I have to edit the deployment. But then I still see:
> kubectl logs -f metrics-server-d994c478f-9zs5k
…
E0418 08:09:07.743449 1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="127.0.0.1"
I0418 08:09:12.047455 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0418 08:09:22.049166 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0418 08:09:22.738901 1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="127.0.0.1"
I0418 08:09:32.047754 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

Do I have to adjust anything else? 🤔
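For reference, the usual deployment edit on local or test clusters is to add kubelet-scrape flags to the metrics-server container args. The flag below is only an assumption about what that edit was, and since a 403 is an authorization failure rather than a TLS one, it may also need kubelet/RBAC authorization fixes:

# sketch: append a scrape flag to the metrics-server container args
$ kubectl -n kube-system patch deployment metrics-server --type=json \
    -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'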
Not sure with hack/local-up-cluster.sh. But I had to add a number of options to the api-server. I copied from kubernetes-sigs/metrics-server#1014. Here is my commit Nordix/xcluster@c0ff147. It's for xcluster, but you can take it as a hint 😃
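For anyone following along: the api-server options referenced in kubernetes-sigs/metrics-server#1014 are the standard aggregation-layer flags. A sketch with illustrative certificate paths (adjust to your own PKI; this is not a copy of the Nordix commit):

kube-apiserver ... \
  --requestheader-client-ca-file=/srv/kubernetes/pki/front-proxy-ca.crt \
  --requestheader-allowed-names=front-proxy-client \
  --requestheader-extra-headers-prefix=X-Remote-Extra- \
  --requestheader-group-headers=X-Remote-Group \
  --requestheader-username-headers=X-Remote-User \
  --proxy-client-cert-file=/srv/kubernetes/pki/front-proxy-client.crt \
  --proxy-client-key-file=/srv/kubernetes/pki/front-proxy-client.key \
  --enable-aggregator-routing=true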
I tried it in KinD, and the metrics-server starts OK. But I used K8s v1.29.1, and KinD uses containerd, so "kubectl top pods -n kube-system" worked.
Naturally, @aojea has made a gist on how to use crio in KinD 😄
Oh, you had already commented that. I have to go now, but if I get the time I will check later whether I get the error with KinD.
I made a mistake testing crio v1.29.2: I used my old crio config and use
What is your crio and kubelet config? I wonder if CRI stats was turned on (where cri-o doesn't currently report metrics, but will in 1.30.0).
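For context, whether the kubelet reports pod and container stats from the CRI (instead of cAdvisor) is controlled by the PodAndContainerStatsFromCRI feature gate. A minimal KubeletConfiguration snippet to look for, assuming a config file is used at all:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodAndContainerStatsFromCRI: true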
No kubelet config, but started with:
Crio config: crio.conf.txt

Please note that it works with the same start/config on K8s v1.29.3. And it also works with K8s v1.30.0 and containerd. A difference is that I use
Try this:
kubectl -n kube-system get apiservices.apiregistration.k8s.io |grep metrics-server
Just check that there is an API for metrics; if not, reinstall the metrics-server.
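A healthy registration looks roughly like this (names come from the default components.yaml; availability and age will of course differ):

$ kubectl get apiservice v1beta1.metrics.k8s.io
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        5m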
Please note that since "kubectl top pods" works with K8s v1.30.0 and containerd, it is unlikely that something is wrong with my K8s config. The metrics-server is also running and ready with K8s v1.30.0 and crio 1.29.2, but getting stats for pods fails. "kubectl top nodes" always works fine.
I don't see the metrics API with this crio.
I tried it and it is OK:
[root@master ~]# kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
[root@master ~]# crio version
INFO[2024-04-19 15:45:34.034627675+08:00] Starting CRI-O, version: 1.28.4, git: c5fc2a463053cf988db2aebe9b762700484922e5(clean)
Version: 1.28.4
GitCommit: c5fc2a463053cf988db2aebe9b762700484922e5
GitCommitDate: 2024-02-22T19:17:55Z
GitTreeState: clean
BuildDate: 1970-01-01T00:00:00Z
GoVersion: go1.20.10
Compiler: gc
Platform: linux/amd64
Linkmode: static
BuildTags:
static
netgo
osusergo
exclude_graphdriver_btrfs
exclude_graphdriver_devicemapper
seccomp
apparmor
selinux
LDFlags: unknown
SeccompEnabled: true
AppArmorEnabled: false
[root@master ~]# kubectl top pods -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-7db6d8ff4d-7ks25 3m 14Mi
coredns-7db6d8ff4d-gf8s4 2m 12Mi
etcd-master 21m 50Mi
kube-apiserver-master 41m 346Mi
kube-controller-manager-master 16m 60Mi
kube-proxy-7j8b2 1m 17Mi
kube-scheduler-master 2m 23Mi
metrics-server-7bcfbbf584-z5fw2 3m 19Mi
[root@master ~]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 267m 6% 1359Mi 19%
[root@master ~]#
I have now installed K8s v1.30.0 with
I should mention that I am only testing, not running a production cluster. So, no urgency on my part.
Works fine here in a 1-node test cluster bootstrapped with

$ crio version
INFO[2024-04-25 07:34:27.402979362Z] Starting CRI-O, version: 1.29.3, git: 12c618780c42414d92d6a8dc8d09c16337668eb2(clean)
Version: 1.29.3
GitCommit: 12c618780c42414d92d6a8dc8d09c16337668eb2
GitCommitDate: 2024-04-19T14:33:22Z
GitTreeState: clean
BuildDate: 1970-01-01T00:00:00Z
GoVersion: go1.21.7
Compiler: gc
Platform: linux/amd64
Linkmode: static
BuildTags:
static
netgo
osusergo
exclude_graphdriver_btrfs
exclude_graphdriver_devicemapper
seccomp
apparmor
selinux
LDFlags: unknown
SeccompEnabled: true
AppArmorEnabled: false
$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
$ helm list -n kube-system
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
metrics-server kube-system 2 2024-04-25 07:32:03.844304125 +0000 UTC deployed metrics-server-3.12.1 0.7.1
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ubuntu-test 41m 2% 1497Mi 39%
$ kubectl top pod -A
NAMESPACE NAME CPU(cores) MEMORY(bytes)
default nginx 0m 3Mi
kube-system calico-kube-controllers-ddf655445-6m4g9 1m 10Mi
kube-system calico-node-tzwb5 8m 114Mi
kube-system coredns-7db6d8ff4d-9wc2f 1m 12Mi
kube-system coredns-7db6d8ff4d-zs9cb 1m 12Mi
kube-system etcd-ubuntu-test 7m 36Mi
kube-system kube-apiserver-ubuntu-test 17m 227Mi
kube-system kube-controller-manager-ubuntu-test 4m 44Mi
kube-system kube-proxy-jzb9c 1m 11Mi
kube-system kube-scheduler-ubuntu-test 1m 16Mi
kube-system metrics-server-68cfccbdf6-8vh97 1m 15Mi
If you can't reproduce this, please close the issue as "not reproducible". It may be a problem with my environment, but it is very clear that something has changed in K8s v1.30, so others may encounter this problem in the future. My aim was to test the horizontal autoscaler, and I can do that with containerd, so it's no problem for me.
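(For context, a horizontal-autoscaler smoke test only needs the metrics API to serve pod CPU; something along these lines, where the deployment name is a placeholder:)

$ kubectl autoscale deployment my-app --cpu-percent=50 --min=1 --max=5
$ kubectl get hpa my-app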
There seems to be a general "get resources" problem with the combination K8s v1.30 + crio. Eviction test with:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod-evicted
spec:
  containers:
  - name: alpine
    image: alpine
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "sleep 10; dd if=/dev/zero of=file bs=1M count=10; sleep 10000"]
    resources:
      limits:
        ephemeral-storage: 5Mi
      requests:
        ephemeral-storage: 5Mi

doesn't work. The pod runs forever. However, the pod gets evicted with K8s v1.29.4 + crio, or with containerd and any K8s version (including master).
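A sketch of how the test above is driven (the manifest filename is arbitrary); on a working setup the pod should leave Running once the 10M write exceeds the 5Mi ephemeral-storage limit, showing up as Evicted or Error:

$ kubectl apply -f test-pod-evicted.yaml
$ kubectl get pod test-pod-evicted -w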
@uablrek I can't reproduce that with the mentioned pod using CRI-O d5666af and Kubernetes c6b6163e2e8:
And the kubelet logs:
It must be something in my environment then, but I can't understand what. Since I get it working with other combinations, it must be something that happened in K8s v1.30.0, but it may be unrelated to crio. My run differs from yours:
It gets evicted after a minute or so, goes to "Error", and never restarts. BTW, I built crio from source (on master) but got the same problem. I also couldn't even start containers with "crun" (my default), but "runc" worked. My guess was that the new crio needs a newer crun, but I didn't investigate further.
Do you use a non-standard crio socket path? It seems this patch fixes this problem for cadvisor metrics: https://github.com/kubernetes/kubernetes/pull/118704/files
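A quick way to check which CRI endpoint the kubelet talks to (the config-file path below is the kubeadm default and just an assumption; cri-o's own default socket is unix:///var/run/crio/crio.sock):

$ ps -ef | grep -o -- '--container-runtime-endpoint=[^ ]*'
$ grep containerRuntimeEndpoint /var/lib/kubelet/config.yaml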
I don't think so:
What happened?
When using K8s v1.30.0-rc.2 and crio, pod metrics never become available:
Tested with crio 1.28.1 and 1.29.2 with the same result. Metrics-server v0.7.1 is used, and it seems to run without problems: no errors in the metrics-server logs.
What did you expect to happen?
"kubectl top pods" should work. It works with K8s v1.29.x
How can we reproduce it (as minimally and precisely as possible)?
Install the metrics-server using the components.yaml method.
Anything else we need to know?
It works with K8s v1.29.3 and cri-o.
It also works with K8s v1.30.0-rc.2 and containerd 1.7.11 and 2.0.0-beta.2.
CRI-O and Kubernetes version
See above
OS version
Linux 6.8.0
Additional environment details (AWS, VirtualBox, physical, etc.)