
Kubectl top pods doesn't work with cri-o on K8s v1.30.0-rc.2 #8034

Open
uablrek opened this issue Apr 17, 2024 · 24 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


uablrek commented Apr 17, 2024

What happened?

When using K8s v1.30.0-rc.2 and cri-o, pod metrics never become available:

# kubectl top pods
error: Metrics not available for pod default/tserver-6854544b4b-fjmmg, age: 8m4.082260059s

Tested with crio 1.28.1 and 1.29.2 with the same result.

Metrics-server v0.7.1 is used, and seems to run without problems:

# kubectl get pod -n kube-system   metrics-server-d994c478f-sxbf4
NAME                             READY   STATUS    RESTARTS   AGE
metrics-server-d994c478f-sxbf4   1/1     Running   0          12m

No errors in the metrics-server logs.
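As an aside, the age in the error message is a Go-style duration string. If you need to script around it (e.g. to alert when pods stay metric-less past a threshold), a minimal parser sketch — a hypothetical helper, not part of kubectl or metrics-server:

```python
import re

# Seconds per unit; "ms" must be tried before "m" in the regex alternation.
_UNITS = {"h": 3600.0, "ms": 0.001, "m": 60.0, "s": 1.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|ms|m|s)")

def parse_go_duration(text: str) -> float:
    """Parse a Go-style duration such as '8m4.082260059s' into seconds."""
    return sum(float(value) * _UNITS[unit] for value, unit in _TOKEN.findall(text))

print(parse_go_duration("8m4.082260059s"))  # 484.082260059
```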

What did you expect to happen?

"kubectl top pods" should work. It works with K8s v1.29.x.

How can we reproduce it (as minimally and precisely as possible)?

  • Install K8s v1.30.0-rc.2 (or K8s built on master)
  • Install the metrics-server. I use the components.yaml method
  • Wait until the metrics-server is running 1/1, and then wait a while longer
  • Do "kubectl top pods" in some namespace with pods
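The steps above can be sketched as commands (assuming a v1.30.0 cluster with cri-o is already up; the sleep is arbitrary slack for the first scrape):

```shell
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl -n kube-system rollout status deployment/metrics-server
sleep 60    # leave time for at least one scrape interval
kubectl top pods
```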

Anything else we need to know?

It works with K8s v1.29.3 and cri-o.
It also works with K8s v1.30.0-rc.2 and containerd 1.7.11 and 2.0.0-beta.2.

CRI-O and Kubernetes version

See above

OS version

Linux 6.8.0

Additional environment details (AWS, VirtualBox, physical, etc.)

uablrek added the kind/bug label Apr 17, 2024

uablrek commented Apr 18, 2024

The problem is the same with K8s v1.30.0, released yesterday Apr 17.


saschagrunert commented Apr 18, 2024

@uablrek thank you for the report! Can I use the metrics server with hack/local-up-cluster.sh?

When using hack/local-up-cluster.sh and installing the metrics server like that:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Then I have to edit the deployment metrics-server to add --kubelet-insecure-tls to the container args.
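For reference, the flag can be added without hand-editing the deployment (a sketch; assumes the stock deployment name from components.yaml):

```shell
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
```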

But then I still see:

> kubectl logs -f metrics-server-d994c478f-9zs5k

E0418 08:09:07.743449       1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="127.0.0.1"
I0418 08:09:12.047455       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
I0418 08:09:22.049166       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
E0418 08:09:22.738901       1 scraper.go:149] "Failed to scrape node" err="request failed, status: \"403 Forbidden\"" node="127.0.0.1"
I0418 08:09:32.047754       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

Do I have to adjust anything else? 🤔


uablrek commented Apr 18, 2024

Not sure with hack/local-up-cluster.sh. But I had to add a number of options to the api-server. I copied from kubernetes-sigs/metrics-server#1014. Here is my commit Nordix/xcluster@c0ff147. It's for xcluster, but you can take it as a hint 😃
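For context, these are the aggregation-layer flags that kube-apiserver typically needs for an extension API server like metrics-server (a sketch; the certificate paths are illustrative, and the linked issue and commit have the exact details):

```shell
kube-apiserver \
  --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
  --requestheader-allowed-names=front-proxy-client \
  --requestheader-username-headers=X-Remote-User \
  --requestheader-group-headers=X-Remote-Group \
  --requestheader-extra-headers-prefix=X-Remote-Extra- \
  --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt \
  --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key \
  --enable-aggregator-routing=true \
  ...
```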


uablrek commented Apr 18, 2024

I tried it in KinD, and the metrics-server starts ok. But I used K8s v1.29.1, and KinD uses containerd so "kubectl top pods -n kube-system" worked


uablrek commented Apr 18, 2024

Naturally @aojea has made a gist on how to use cri-o in KinD 😄


uablrek commented Apr 18, 2024

Oh, you had already commented that. I have to go now, but if I get time I will check later whether I see the error with KinD.


uablrek commented Apr 18, 2024

I made a mistake testing crio v1.29.2: I used my old crio config, which uses crun instead of crio-crun. I corrected it, but it doesn't seem to matter.

@haircommander

What is your crio and kubelet config? I wonder if CRI stats was turned on (where cri-o doesn't currently report metrics, but will in 1.30.0).


uablrek commented Apr 19, 2024

No kubelet config, but started with:

kubelet --address=:: \
  --container-runtime-endpoint=unix:///var/run/crio/crio.sock \
  --image-service-endpoint=unix:///var/run/crio/crio.sock \
  --node-ip=192.168.1.2,fd00::192.168.1.2 --register-node=true \
  --kubeconfig /etc/kubernetes/kubeconfig.token \
  --cluster-dns=192.168.1.2 --cluster-domain=xcluster \
  --runtime-cgroups=/ --kubelet-cgroups=/

Crio config: crio.conf.txt

Please note that it works with the same start/config on K8s v1.29.3. And it also works with K8s v1.30.0 and containerd.

A difference is that I use crun for cri-o, and runc for containerd. I will see if I can check that


pycgo commented Apr 19, 2024

Try this:

kubectl -n kube-system get apiservices.apiregistration.k8s.io |grep metrics-server
v1beta1.metrics.k8s.io kube-system/metrics-server

Just check that the API service for metrics exists; if not, reinstall metrics-server.


uablrek commented Apr 19, 2024

# kubectl -n kube-system get apiservices.apiregistration.k8s.io |grep metrics-server
v1beta1.metrics.k8s.io                  kube-system/metrics-server   True        2m2s

Please note that since "kubectl top pods" works with K8s v1.30.0 and containerd, it is unlikely that something is wrong with my K8s config. The metrics-server is running and ready with K8s v1.30.0 and crio 1.29.2, but getting stats for pods fails. "kubectl top nodes" always works fine.

# kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
# containerd -v
containerd github.com/containerd/containerd v1.7.11 64b8a811b07ba6288238eefc14d898ee0b5b99ba
# kubectl top pods
NAME                       CPU(cores)   MEMORY(bytes)   
tserver-6854544b4b-8lv6j   0m           3Mi             
tserver-6854544b4b-b8qv9   0m           3Mi             
tserver-6854544b4b-kcxcs   0m           3Mi             
tserver-6854544b4b-vfq9x   0m           3Mi             
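One way to narrow this down is to bypass metrics-server and read the kubelet's resource metrics endpoint directly, since that is what metrics-server scrapes (node name below is a placeholder):

```shell
# Pod-level samples (pod_cpu_usage_seconds_total, pod_memory_working_set_bytes)
# should appear here; if they are missing, the problem is in the
# kubelet/runtime stats pipeline rather than in metrics-server.
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics/resource | grep ^pod_
```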


uablrek commented Apr 19, 2024

# kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
# crio version
INFO[2024-04-19 07:13:00.852114859Z] Starting CRI-O, version: 1.29.2, git: d317b5dc918bbfbc78481072a0d93e572aa8d0e8(clean) 
Version:        1.29.2
GitCommit:      d317b5dc918bbfbc78481072a0d93e572aa8d0e8
GitCommitDate:  2024-02-22T19:23:38Z
GitTreeState:   clean
BuildDate:      1970-01-01T00:00:00Z
GoVersion:      go1.21.1
Compiler:       gc
Platform:       linux/amd64
Linkmode:       static
BuildTags:      
  static
  netgo
  osusergo
  exclude_graphdriver_btrfs
  exclude_graphdriver_devicemapper
  seccomp
  apparmor
  selinux
LDFlags:          unknown
SeccompEnabled:   true
AppArmorEnabled:  false
# kubectl get pod -n kube-system  -l k8s-app=metrics-server
NAME                             READY   STATUS    RESTARTS   AGE
metrics-server-d994c478f-mwkjf   1/1     Running   0          2m59s
# kubectl top pods
error: metrics not available yet
# (for some minutes, then...)
# kubectl top pods
error: Metrics not available for pod default/tserver-6854544b4b-2nmgw, age: 2m36.007635117s
# (forever. well at least >10m)


pycgo commented Apr 19, 2024

(quoting the cri-o version and "kubectl top pods" output from the previous comment)

I don't see the API for metrics in this cri-o setup. What does the following show?

kubectl -n kube-system get apiservices.apiregistration.k8s.io |grep m


uablrek commented Apr 19, 2024

# kubectl -n kube-system get apiservices.apiregistration.k8s.io |grep m
v1.                                     Local                        True        9m14s
v1.admissionregistration.k8s.io         Local                        True        9m14s
v1.apiextensions.k8s.io                 Local                        True        9m14s
v1.apps                                 Local                        True        9m14s
v1.authentication.k8s.io                Local                        True        9m14s
v1.authorization.k8s.io                 Local                        True        9m14s
v1.autoscaling                          Local                        True        9m14s
v1.batch                                Local                        True        9m14s
v1.certificates.k8s.io                  Local                        True        9m14s
v1.coordination.k8s.io                  Local                        True        9m14s
v1.discovery.k8s.io                     Local                        True        9m14s
v1.events.k8s.io                        Local                        True        9m14s
v1.flowcontrol.apiserver.k8s.io         Local                        True        9m14s
v1.networking.k8s.io                    Local                        True        9m14s
v1.node.k8s.io                          Local                        True        9m14s
v1.policy                               Local                        True        9m14s
v1.rbac.authorization.k8s.io            Local                        True        9m14s
v1.scheduling.k8s.io                    Local                        True        9m14s
v1.storage.k8s.io                       Local                        True        9m14s
v1alpha1.admissionregistration.k8s.io   Local                        True        9m14s
v1alpha1.authentication.k8s.io          Local                        True        9m14s
v1alpha1.internal.apiserver.k8s.io      Local                        True        9m14s
v1alpha1.networking.k8s.io              Local                        True        9m14s
v1alpha1.storage.k8s.io                 Local                        True        9m14s
v1alpha2.resource.k8s.io                Local                        True        9m14s
v1beta1.admissionregistration.k8s.io    Local                        True        9m14s
v1beta1.authentication.k8s.io           Local                        True        9m14s
v1beta1.metrics.k8s.io                  kube-system/metrics-server   True        9m9s
v1beta3.flowcontrol.apiserver.k8s.io    Local                        True        9m14s
v2.autoscaling                          Local                        True        9m14s


pycgo commented Apr 19, 2024

I tried it and it works:

[root@master ~]# kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
[root@master ~]# crio version
INFO[2024-04-19 15:45:34.034627675+08:00] Starting CRI-O, version: 1.28.4, git: c5fc2a463053cf988db2aebe9b762700484922e5(clean) 
Version:        1.28.4
GitCommit:      c5fc2a463053cf988db2aebe9b762700484922e5
GitCommitDate:  2024-02-22T19:17:55Z
GitTreeState:   clean
BuildDate:      1970-01-01T00:00:00Z
GoVersion:      go1.20.10
Compiler:       gc
Platform:       linux/amd64
Linkmode:       static
BuildTags:      
  static
  netgo
  osusergo
  exclude_graphdriver_btrfs
  exclude_graphdriver_devicemapper
  seccomp
  apparmor
  selinux
LDFlags:          unknown
SeccompEnabled:   true
AppArmorEnabled:  false

[root@master ~]# kubectl top pods -n kube-system 
NAME                              CPU(cores)   MEMORY(bytes)   
coredns-7db6d8ff4d-7ks25          3m           14Mi            
coredns-7db6d8ff4d-gf8s4          2m           12Mi            
etcd-master                       21m          50Mi            
kube-apiserver-master             41m          346Mi           
kube-controller-manager-master    16m          60Mi            
kube-proxy-7j8b2                  1m           17Mi            
kube-scheduler-master             2m           23Mi            
metrics-server-7bcfbbf584-z5fw2   3m           19Mi            
[root@master ~]# kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
master   267m         6%     1359Mi          19%       
[root@master ~]# 


uablrek commented Apr 19, 2024

I have now installed K8s v1.30.0 with kubeadm, and I still get this problem (containerd works).


uablrek commented Apr 23, 2024

I should mention that I am only testing, not running a production cluster, so there is no urgency on my part.


lbogdan commented Apr 25, 2024

Works fine here in a 1-node test cluster bootstrapped with kubeadm, with cri-o 1.29.3, kubernetes 1.30.0, and metrics-server v0.7.1 (chart version 3.12.1):

$ crio version
INFO[2024-04-25 07:34:27.402979362Z] Starting CRI-O, version: 1.29.3, git: 12c618780c42414d92d6a8dc8d09c16337668eb2(clean)
Version:        1.29.3
GitCommit:      12c618780c42414d92d6a8dc8d09c16337668eb2
GitCommitDate:  2024-04-19T14:33:22Z
GitTreeState:   clean
BuildDate:      1970-01-01T00:00:00Z
GoVersion:      go1.21.7
Compiler:       gc
Platform:       linux/amd64
Linkmode:       static
BuildTags:
  static
  netgo
  osusergo
  exclude_graphdriver_btrfs
  exclude_graphdriver_devicemapper
  seccomp
  apparmor
  selinux
LDFlags:          unknown
SeccompEnabled:   true
AppArmorEnabled:  false

$ kubectl version
Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0

$ helm list -n kube-system
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
metrics-server  kube-system     2               2024-04-25 07:32:03.844304125 +0000 UTC deployed        metrics-server-3.12.1   0.7.1

$ kubectl top node
NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ubuntu-test   41m          2%     1497Mi          39%

$ kubectl top pod -A
NAMESPACE     NAME                                      CPU(cores)   MEMORY(bytes)
default       nginx                                     0m           3Mi
kube-system   calico-kube-controllers-ddf655445-6m4g9   1m           10Mi
kube-system   calico-node-tzwb5                         8m           114Mi
kube-system   coredns-7db6d8ff4d-9wc2f                  1m           12Mi
kube-system   coredns-7db6d8ff4d-zs9cb                  1m           12Mi
kube-system   etcd-ubuntu-test                          7m           36Mi
kube-system   kube-apiserver-ubuntu-test                17m          227Mi
kube-system   kube-controller-manager-ubuntu-test       4m           44Mi
kube-system   kube-proxy-jzb9c                          1m           11Mi
kube-system   kube-scheduler-ubuntu-test                1m           16Mi
kube-system   metrics-server-68cfccbdf6-8vh97           1m           15Mi


uablrek commented Apr 25, 2024

If you can't reproduce it, please close this issue as "not reproducible". It may be a problem with my environment, but it is very clear that something changed in K8s v1.30, so others may encounter this problem in the future.

My aim was to test the horizontal autoscaler, and I can do that with containerd. No problem.


uablrek commented Apr 28, 2024

There seems to be a general "get resources" problem with the combination K8s v1.30 + cri-o. Eviction test with:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod-evicted
spec:
  containers:
  - name: alpine
    image: alpine
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh", "-c", "sleep 10; dd if=/dev/zero of=file bs=1M count=10; sleep 10000"]
    resources:
      limits:
        ephemeral-storage: 5Mi
      requests:
        ephemeral-storage: 5Mi

doesn't work. The pod runs forever.

However, the pod gets evicted with K8s v1.29.4 + cri-o, or with containerd and any K8s version (including master).
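To reproduce, apply the manifest and watch the pod; with a working stats pipeline it should leave Running once the 10M file exceeds the 5Mi ephemeral-storage limit (the manifest file name below is illustrative):

```shell
kubectl apply -f test-pod-evicted.yaml
kubectl get pod test-pod-evicted -w
```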

@saschagrunert

@uablrek I can't reproduce that with the mentioned pod using CRI-O d5666af and Kubernetes c6b6163e2e8:

> k get pods
NAME               READY   STATUS                   RESTARTS   AGE
test-pod-evicted   0/1     ContainerStatusUnknown   1          4m10s

And the kubelet logs:

I0429 11:13:43.162982  735336 status_manager.go:984] "Pod status is inconsistent with cached status for pod, a reconciliation should be triggered" pod="default/test-pod-evicted" statusDiff=<
	  &v1.PodStatus{
	- 	Phase:             "Running",
	+ 	Phase:             "Failed",
	  	Conditions:        {{Type: "PodReadyToStartContainers", Status: "True", LastTransitionTime: {Time: s"2024-04-29 11:12:22 +0200 CEST"}}, {Type: "Initialized", Status: "True", LastTransitionTime: {Time: s"2024-04-29 11:12:16 +0200 CEST"}}, {Type: "Ready", Status: "True", LastTransitionTime: {Time: s"2024-04-29 11:12:22 +0200 CEST"}}, {Type: "ContainersReady", Status: "True", LastTransitionTime: {Time: s"2024-04-29 11:12:22 +0200 CEST"}}, ...},
	- 	Message:           "",
	+ 	Message:           "Pod ephemeral local storage usage exceeds the total limit of containers 5Mi. ",
	- 	Reason:            "",
	+ 	Reason:            "Evicted",
	  	NominatedNodeName: "",
	  	HostIP:            "127.0.0.1",
	  	... // 2 identical fields
	  	PodIPs:                     {{IP: "10.88.0.3"}, {IP: "2001:db8:4860::3"}},
	  	StartTime:                  s"2024-04-29 11:12:16 +0200 CEST",
	- 	InitContainerStatuses:      nil,
	+ 	InitContainerStatuses:      []v1.ContainerStatus{},
	  	ContainerStatuses:          {{Name: "alpine", State: {Running: &{StartedAt: {Time: s"2024-04-29 11:12:21 +0200 CEST"}}}, Ready: true, Image: "docker.io/library/alpine:latest", ...}},
	  	QOSClass:                   "BestEffort",
	- 	EphemeralContainerStatuses: nil,
	+ 	EphemeralContainerStatuses: []v1.ContainerStatus{},
	  	Resize:                     "",
	  	ResourceClaimStatuses:      nil,
	  }
 


uablrek commented Apr 29, 2024

It must be something in my environment then, but I can't figure out what. Since other combinations work, it must be something that changed in K8s v1.30.0, though it may be unrelated to crio.

My run differs from yours:

# kubectl get pods
NAME               READY   STATUS   RESTARTS   AGE
test-pod-evicted   0/1     Error    0          3m36s

It gets evicted after a minute or so, goes to "Error" and never restarts.

BTW, I built crio from source (on master) and get the same problem. I also couldn't start containers at all with "crun" (my default), but "runc" worked. My guess is that the new crio needs a newer crun, but I didn't investigate further.


aojea commented Apr 29, 2024

Do you use a non-standard crio socket path? It seems this patch fixes this problem for cadvisor metrics: https://github.com/kubernetes/kubernetes/pull/118704/files


uablrek commented Apr 29, 2024

> Do you use a non-standard crio socket path?

I don't think so:

# In /etc/crio/crio.conf
listen = "/var/run/crio/crio.sock"
# And kubelet started with:
--container-runtime-endpoint=unix:///var/run/crio/crio.sock
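A quick way to confirm that the socket path is the one cri-o actually serves (crictl ships alongside the kubelet tooling):

```shell
ls -l /var/run/crio/crio.sock
crictl --runtime-endpoint unix:///var/run/crio/crio.sock version
```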
