cpu and memory cgroup hierarchy not unified #2632

Closed
chrstnwhlrt opened this issue Jul 19, 2019 · 8 comments

Comments

@chrstnwhlrt

Description

Trying to set up a fresh Ubuntu 18.04 (minimal) based Kubernetes cluster using CRI-O leads to

server.go:273] failed to run Kubelet: failed to get the kubelet's cgroup: cpu and memory cgroup hierarchy not unified. cpu: /system.slice, memory: /

Steps to reproduce the issue:

  1. Install Ubuntu 18.04
  2. Add the Kubernetes apt repository and the projectatomic PPA
  3. Install kubelet kubeadm kubectl cri-o-1.14
  4. Set up /etc/default/kubelet as described in https://github.com/cri-o/cri-o/blob/master/tutorials/kubeadm.md (see the example config after this list)
  5. systemctl enable crio
  6. Reboot
  7. kubeadm init --pod-network-cidr=192.168.0.0/16
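For reference, a minimal sketch of what /etc/default/kubelet ends up containing when following that tutorial. The flag values are assumptions reconstructed from the tutorial and from the deprecation warnings visible in the journal output below, not a verbatim copy (the tutorial also sets --feature-gates):

    # /etc/default/kubelet -- sketch, not a verbatim copy of the tutorial
    KUBELET_EXTRA_ARGS=--container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock --cgroup-driver=systemd --runtime-request-timeout=5m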

Describe the results you received:

kubeadm init output:

[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
	- 'docker ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

journalctl -xeu kubelet

-- The start-up result is RESULT.
Jul 19 12:40:52 master0 kubelet[1797]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 12:40:52 master0 kubelet[1797]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 12:40:52 master0 kubelet[1797]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 12:40:52 master0 kubelet[1797]: Flag --runtime-request-timeout has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 12:40:52 master0 kubelet[1797]: Flag --resolv-conf has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 12:40:52 master0 kubelet[1797]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 12:40:52 master0 kubelet[1797]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 12:40:52 master0 kubelet[1797]: Flag --runtime-request-timeout has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jul 19 12:40:52 master0 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jul 19 12:40:52 master0 kubelet[1797]: I0719 12:40:52.753585    1797 server.go:425] Version: v1.15.1
Jul 19 12:40:52 master0 kubelet[1797]: I0719 12:40:52.753729    1797 plugins.go:103] No cloud provider specified.
Jul 19 12:40:52 master0 kubelet[1797]: I0719 12:40:52.753741    1797 server.go:791] Client rotation is on, will bootstrap in background
Jul 19 12:40:52 master0 kubelet[1797]: I0719 12:40:52.754932    1797 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Jul 19 12:40:52 master0 kubelet[1797]: F0719 12:40:52.755353    1797 server.go:273] failed to run Kubelet: failed to get the kubelet's cgroup: cpu and memory cgroup hierarchy not unified.  cpu: /system.slice, memory: /
Jul 19 12:40:52 master0 systemd[1]: kubelet.service: Failed with result 'exit-code'.
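The fatal line comes from the kubelet inspecting its own cgroup placement and finding the cpu and memory controllers in different paths. A quick way to look at the same data (a suggested check, not part of the original report) is to read the process's cgroup file:

    # show the kubelet's cgroup path per controller
    cat /proc/$(pgrep -x kubelet)/cgroup | grep -E ':(cpu,cpuacct|memory):'
    # per the error above this reports roughly:
    #   ...:cpu,cpuacct:/system.slice
    #   ...:memory:/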

Describe the results you expected:

Successfully initialized Kubernetes master

Additional information you deem important (e.g. issue happens only occasionally):

Output of crio --version:

crio version 1.14.7

Output of dpkg -l|grep -i kube:

ii  cri-o-1.14                            1.14.7-3~ubuntu18.04~ppa1         amd64        OCI-based implementation of Kubernetes Container Runtime Interface.
ii  kubeadm                               1.15.1-00                         amd64        Kubernetes Cluster Bootstrapping Tool
ii  kubectl                               1.15.1-00                         amd64        Kubernetes Command Line Tool
ii  kubelet                               1.15.1-00                         amd64        Kubernetes Node Agent
ii  kubernetes-cni                        0.7.5-00                          amd64        Kubernetes CNI

Additional environment details (AWS, VirtualBox, physical, etc.):

On-prem dedicated root server

@saschagrunert
Member

saschagrunert commented Jul 19, 2019

Hi @christian667, this is an issue with the latest kubelet rather than CRI-O. Can you try to apply this patch, which unfortunately did not make it into v1.15.1: kubernetes/kubernetes#80121

@chrstnwhlrt
Author

@saschagrunert Thanks for the quick response. Will this be part of the next minor kubelet release, and when will it be merged into the stable packages? Should I use cri-o 1.13 (which seems to work) for deployment and upgrade to 1.15 later once the kubelet fix is available, wait for the fix to become publicly available (is that a matter of days?), or better build kubelet myself?

@mariusgrigoriu
Contributor

You should be able to get around this with the --runtime-cgroups and --kubelet-cgroups kubelet flags.

@chrstnwhlrt
Author

@mariusgrigoriu Which arguments should I provide for --runtime-cgroups and --kubelet-cgroups? /systemd/system.slice?

@mariusgrigoriu
Contributor

Depends on your cgroup layout. It looks like you might be putting the kubelet into /system.slice.

Our cgroup hierarchy looks like this in systemd-cgls:

Control group /:
-.slice
├─kubeletreserved.slice
│ ├─kubeletreserved-runtime.slice
│ │ └─crio.service
│ │   ├─ 1006 /bin/bash /var/run/torcx/unpack/crio/bin/crio-wrapper
│ │   ├─ 1007 /var/run/torcx/unpack/crio/bin/crio --log-level=info --enable-metrics
│ │   └─32530 sleep 20
│ └─kubeletreserved-kubelet.slice
│   └─1050 /kubelet --allow-privileged --anonymous-auth=false --read-only-port=0 --cgroup-driver=cgroupfs 
├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 15
└─system.slice
  ├─sshd.service

And our kubelet flags look like this:

        --kube-reserved-cgroup=kubeletreserved.slice \
        --runtime-cgroups=/kubeletreserved.slice/kubeletreserved-runtime.slice \
        --kubelet-cgroups=/kubeletreserved.slice/kubeletreserved-kubelet.slice \

I'm not saying this is correct, but it is based on the k8s upstream recommendations and seems to work for us.
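For context, a layout like that is typically produced by a dedicated systemd slice plus Slice= overrides on the runtime and kubelet units. The unit names below are hypothetical, a sketch of how such a hierarchy could be created rather than our exact configuration:

    # /etc/systemd/system/kubeletreserved.slice (hypothetical unit)
    [Unit]
    Description=Slice reserved for the kubelet and the container runtime

    # /etc/systemd/system/crio.service.d/10-slice.conf (hypothetical drop-in)
    [Service]
    Slice=kubeletreserved-runtime.slice

    # /etc/systemd/system/kubelet.service.d/10-slice.conf (hypothetical drop-in)
    [Service]
    Slice=kubeletreserved-kubelet.slice

systemd derives the parent/child relationship from the dash-separated slice names, so the intermediate slices do not need their own unit files unless you want to attach resource limits to them.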

@chrstnwhlrt
Author

This worked for me just fine:

        --kube-reserved-cgroup=kubeletreserved.slice \
        --runtime-cgroups=/kubeletreserved.slice/kubeletreserved-runtime.slice \
        --kubelet-cgroups=/kubeletreserved.slice/kubeletreserved-kubelet.slice \
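In case it helps anyone else: on the Ubuntu packages these flags can go into /etc/default/kubelet, which the kubelet systemd unit reads as KUBELET_EXTRA_ARGS. A sketch, assuming the slice layout described above:

    # /etc/default/kubelet -- sketch; append to any existing KUBELET_EXTRA_ARGS
    KUBELET_EXTRA_ARGS=--kube-reserved-cgroup=kubeletreserved.slice --runtime-cgroups=/kubeletreserved.slice/kubeletreserved-runtime.slice --kubelet-cgroups=/kubeletreserved.slice/kubeletreserved-kubelet.slice

    # then restart the kubelet to pick it up
    sudo systemctl restart kubelet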

Should I close this issue?

@saschagrunert
Member

If there are no further issues right now, yes. I will follow up with the fix in the next patch release of Kubernetes. Thanks 🙏

@chrstnwhlrt
Author

Thanks for the help!
