kubelite process takes huge amount of CPU #2186

Open · muxi opened this issue Apr 20, 2021 · 50 comments

Comments

@muxi

muxi commented Apr 20, 2021

After getting the 1.21 package from the Debian 10 repo, the kubelite process is eating up a lot of CPU. The process keeps using >100% CPU in top, and the average uptime load is ~2 instead of <1 in previous versions.

inspection-report-20210420_092101.tar.gz

@navodissa

I'm facing the same issue and I'm using Ubuntu 18.04.

@balchua
Collaborator

balchua commented Apr 20, 2021

@muxi from the tarball you provided, I can see that kubelite is not healthy, causing it to crashloop. That's why it is using up that much CPU. But I couldn't determine what is causing it to crashloop.

@muxi
Author

muxi commented Apr 20, 2021

@balchua thanks for looking into it. For now I am just reverting to v1.20.5, which works for me. If there is anything else I can provide to help debug this problem, let me know.

@balchua
Collaborator

balchua commented Apr 21, 2021

When you installed 1.20.5, did you purge the snap?
For example: sudo snap remove microk8s --purge
Just curious.

@muxi
Author

muxi commented Apr 21, 2021

When you installed 1.20.5, did you purge the snap?

Did you mean 1.20.5 or 1.21? The install of 1.21 was (surprisingly) done automatically by snap. The rollback to 1.20.5 was done with snap revert. No purge was ever run.
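For reference, the rollback described here is snap's built-in revert, which switches back to the previously installed revision without purging any data (a minimal sketch):

sudo snap revert microk8s
snap list microk8s    # confirm the active revision and channel afterwards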

@balchua
Collaborator

balchua commented Apr 21, 2021

Hi @muxi, I can see that you are using the latest/stable channel. It seems like you have a long-lived cluster. It's highly recommended to stick to a specific channel, for example 1.20/stable or 1.21/stable, to avoid unexpected incompatibilities coming from different versions of Kubernetes.
I was wondering if you could get the chance to do a clean install of MicroK8s, i.e. remove with purge, then install from the 1.21/stable channel.
Thanks.
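A minimal sketch of the clean reinstall suggested above (channel as recommended; note that add-ons and workloads would have to be re-created afterwards):

sudo snap remove microk8s --purge
sudo snap install microk8s --classic --channel=1.21/stable
microk8s status --wait-ready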

@Aaron-Ritter

I face the same after upgrading to 1.21/edge. The CPU usage is especially concerning, averaging around 10% in our test environment, which runs far fewer pods than our production system (1.19/stable), where kube-apiserver averages about 2% CPU load. Memory consumption is around 20% higher too.

k8s-test-n2   Ready    <none>   112d   v1.21.0-3+dc123ff2da727a

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 2462 root      20   0 3304.4m 939.7m  99.1m S   9.3   9.6  53:25.36 kubelite
k8s-test-n1   Ready    <none>   217d   v1.21.0-3+dc123ff2da727a

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
21136 root      20   0 3311.2m 908.5m  99.1m S   8.3   9.3  51:36.99 kubelite
k8s-test-m    Ready    <none>   214d   v1.21.0-3+dc123ff2da727a

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
31545 root      20   0 3444.9m   1.0g 101.5m S  10.6  10.5  94:01.05 kubelite

@balchua
Collaborator

balchua commented May 5, 2021

It's hard to compare kubelite with the kube-apiserver alone. Kubelite packs all the Kubernetes components into one binary:
kube-controller-manager, scheduler, proxy, apiserver, kubelet, plus dqlite.
Running the Kubernetes services as goroutines instead of as standalone processes does reduce the overall CPU and memory usage.

@ktsakalozos thoughts?

@ktsakalozos
Member

@balchua is right; in terms of CPU load it is the sum of all k8s services and the datastore (dqlite). In terms of memory usage it should be about 200MB less than the total memory used by a setup where each service starts on its own.

@Aaron-Ritter

@balchua @ktsakalozos Looking at the average and max CPU consumption of our test setup over the last 7 days, it actually looks like overall CPU consumption dropped slightly. Without the explanation from @balchua it is at first surprising to see kubelite pop up in top, but overall there seems to be no real visible increase, and maybe even a slight drop. Once 1.21.1 is released we will upgrade our prod environment.

@krichter722
Contributor

krichter722 commented May 14, 2021

I'm also seeing the high CPU usage, which I understand is a crash loop. It prevents me from starting microk8s after rebooting or stopping it.

The issue is probably that in snap, 1.21/stable has revision 2128 while 1.20/stable has revision 2143, so the higher Kubernetes version points to an older snap revision. Maybe that causes a (partial/incomplete) incompatible downgrade.

inspection-report-20210514_123833.tar.gz
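The channel-to-revision mapping can be checked directly with snap (a sketch; the revisions listed today will differ from the 2128/2143 pair mentioned above):

snap info microk8s | grep stable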

@balchua
Collaborator

balchua commented May 14, 2021

@krichter722 yes you are right, kubelite is crashlooping with this error.

Mai 14 12:38:23 mereet.com microk8s.daemon-kubelite[181888]: Error: start node: raft_start(): io: load closed segment 0000000021384298-0000000021384473: entries batch 176 starting at byte 5799224: entries count in preamble is zero

Is this a new setup, or is it an upgrade from a previous version?

Maybe @ktsakalozos or @MathieuBordere can shed more light on this one.

@krichter722
Contributor

krichter722 commented May 14, 2021

The issue occurred after upgrading to 1.21 on a long-running instance that had been on 1.20 and might have been upgraded before that. A fresh install of 1.21 works, as does a fresh install of 1.20 (smoke test: microk8s.status and kubectl get pods --all-namespaces, which wasn't possible before with kubelite crashlooping), as does an upgrade from a fresh 1.20 to 1.21 (same smoke test).

So, the issue is "resolved" for me by purging the microk8s snap installation and reinstalling; however, light should be shed on the revision numbers, as well as on issues with upgrading an installation from 1.20 to 1.21 that was not freshly installed.

@balchua
Collaborator

balchua commented May 14, 2021

Thank you @krichter722 for the information. The revision number is normal. I think v1.20.6 came out after 1.21/stable was cut.
There is an upgrade test in the CI, ranging from 1.17 (I think) up to the latest.

@tsipo

tsipo commented Jun 3, 2021

I am facing a similar - and worse - issue. Not only is the CPU consumption of kubelite between 70% and 130%, it also takes microk8s minutes to start (after which it stabilizes at ~70% CPU), AND kubelite consumes 22-23GB of memory, which is >70% of my machine.
inspection-report-20210603_173003.tar.gz

@balchua
Collaborator

balchua commented Jun 6, 2021

@tsipo thank you for providing the inspect tarball.
IMHO, the high CPU is caused by kubelite crashing. I saw these logs.

Jun 03 16:49:48 rreshef-linux microk8s.daemon-kubelite[9911]: E0603 16:49:48.005191    9911 leaderelection.go:325] error retrieving resource lock kube-system/kube-controller-manager: Get "https://127.0.0.1:16443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=5s": context deadline exceeded
Jun 03 16:49:48 rreshef-linux microk8s.daemon-kubelite[9911]: I0603 16:49:48.005245    9911 leaderelection.go:278] failed to renew lease kube-system/kube-controller-manager: timed out waiting for the condition
Jun 03 16:49:48 rreshef-linux microk8s.daemon-kubelite[9911]: I0603 16:49:48.905442    9911 event.go:291] "Event occurred" object="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="rreshef-linux_653179cd-8cc7-40dc-8b98-db77e0586259 stopped leading"
Jun 03 16:49:49 rreshef-linux microk8s.daemon-kubelite[9911]: I0603 16:49:49.266364    9911 garbagecollector.go:160] Shutting down garbage collector controller
Jun 03 16:49:49 rreshef-linux microk8s.daemon-kubelite[9911]: I0603 16:49:49.366886    9911 leaderelection.go:278] failed to renew lease kube-system/kube-scheduler: timed out waiting for the condition
Jun 03 16:49:49 rreshef-linux microk8s.daemon-kubelite[9911]: F0603 16:49:49.366929    9911 server.go:205] leaderelection lost

Can you try adding the following

--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=40s

to these files /var/snap/microk8s/current/args/kube-controller-manager and /var/snap/microk8s/current/args/kube-scheduler and then restart MicroK8s.
I am not sure if this will resolve the issue.
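A sketch of how the suggested flags could be appended and the services restarted (paths and values as in the comment above; back up the files first - a later comment in this thread reports these files already containing a 60s/30s pair, so check for duplicates):

for f in kube-controller-manager kube-scheduler; do
  echo '--leader-elect-lease-duration=60s' | sudo tee -a /var/snap/microk8s/current/args/$f
  echo '--leader-elect-renew-deadline=40s' | sudo tee -a /var/snap/microk8s/current/args/$f
done
microk8s stop && microk8s start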

@tsipo

tsipo commented Jun 6, 2021

@balchua Thanks for your prompt reply; it's unfortunately too late. I have already purged the previous installation of microk8s and installed the same version (I'm on the latest/edge channel) anew. Now there are no problems - CPU consumption is reasonable, and so is memory consumption (down to a few hundred MBs). So the issue was surely with the upgrade path from the previous version to the current one on that channel.
BTW, my biggest problem was not the CPU consumption, though 70%-130% is a bit high (but I have 8 cores). It's the memory consumption that killed me - it got up to 22-23GB, and this is my dev machine, which runs other apps too (and has 32GB in total).

@balchua
Collaborator

balchua commented Jun 6, 2021

Hi @tsipo, thanks for giving us an update. If you ever find something strange, feel free to create an issue.

@vdavy

vdavy commented Jun 19, 2021

Hi, I got the same problem with microk8s 1.21 running on Debian 11 testing: kubelite eats all my CPU.
Here is the report: inspection-report-20210618_210554.tar.gz

When it auto-upgraded to 1.21, I had to upgrade to Debian 11, otherwise it wouldn't start. The kernel version is now 5.10.0-7-amd64 #1 SMP Debian 5.10.40-1 (2021-05-28) x86_64 GNU/Linux.
The node is very slow to start and pretty much unusable, so I am going to revert to the 1.20 branch while waiting for a fix.

I'm running only one node and already tried to purge the cluster as mentioned above (a pain in the ... to reinstall everything and, what a pity, it didn't solve the problem).

Please note I'm open to trying and testing fixes.

@thenets

thenets commented Jul 5, 2021

Same problem here. I'm using a fresh Ubuntu 20.04 install.

This is my version (snap):

Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.1-3+ba118484dd39df", GitCommit:"ba118484dd39df570e55e47f082e523cda7583e5", GitTreeState:"clean", BuildDate:"2021-06-11T05:09:28Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.1-3+ba118484dd39df", GitCommit:"ba118484dd39df570e55e47f082e523cda7583e5", GitTreeState:"clean", BuildDate:"2021-06-11T05:06:35Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

@krichter722
Contributor

I experienced rather high CPU consumption from a DNS forward loop as well on 1.21, after the crash loop mentioned above was no longer an issue for me. I set up calico and an IPv4+IPv6 dual stack, but I don't see why this could not happen with other setups as well, e.g. if your ISP provides strange DNS settings.

Therefore I needed to add --resolv-conf=/run/systemd/resolve/resolv.conf to /var/snap/microk8s/current/args/kubelet and set up the coredns configmap accordingly; see https://coredns.io/plugins/loop/ and https://microk8s.io/docs/addon-dns for a detailed explanation.

For me the issue was a huge number of probe failures, which also contributed to the high CPU usage, which then probably led to even more probe failures; this might be worth taking a look at when investigating high CPU usage.
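A sketch of the kubelet change described above (flag and path as given in the comment; the CoreDNS configmap edit depends on your environment, see the linked docs):

echo '--resolv-conf=/run/systemd/resolve/resolv.conf' | sudo tee -a /var/snap/microk8s/current/args/kubelet
microk8s stop && microk8s start
microk8s kubectl -n kube-system edit configmap/coredns    # adjust forwarders per the coredns loop plugin docs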

@AmilaDevops

AmilaDevops commented Jul 28, 2021

Hi, does anyone know whether, after upgrading my k8s cluster to v1.21, this kubelite process issue could be related to the kernel version of my Ubuntu OS? I'm getting this 100% CPU issue on only one node of my microk8s cluster (which has a higher kernel version compared to the other nodes). All other nodes are fine.

Thanks

@YpeKingma

Just installed microk8s 1.22 stable on Ubuntu via snap, and top reports 10-30% CPU for kubelite on an i3, 4 cores, 2GHz.
It's tolerable, but that much should not be needed to keep a single node available.

Would it make sense to try the above suggestion:
--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=40s
?

@YpeKingma

For the record, both files referred to above, kube-controller-manager and kube-scheduler in the directory /var/snap/microk8s/current/args,
have these lines:
--leader-elect-lease-duration=60s
--leader-elect-renew-deadline=30s
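A quick way to confirm what is currently set on a given install (a sketch):

grep -H leader-elect /var/snap/microk8s/current/args/kube-controller-manager /var/snap/microk8s/current/args/kube-scheduler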

@bttger

bttger commented Mar 31, 2022

I also came across this issue because I was wondering about relatively high CPU usage from the Kubelite process. I have about 20-30% load on idle after a fresh install. (dns, storage, ingress enabled)

@AlexsJones
Contributor

@YpeKingma the resource usage will be related to the intensity of the workloads running in Kubernetes.
Are you saying this is running nothing, or are there workloads scheduled?

@YpeKingma

@AlexsJones There were no workloads scheduled.

@ktsakalozos
Member

@YpeKingma it would also help us know the hardware specs and the version of kubernetes you run. Would you be able to attach a microk8s inspect tarball? Thank you.
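For anyone following along, that information can be gathered with standard commands (a sketch; microk8s inspect prints the path of the tarball it generates):

snap list microk8s    # installed version, revision and channel
lscpu && free -h      # CPU and memory specs
microk8s inspect      # generates the inspection tarball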

@castellanoj

duplicate #3026

@barrettj12
Contributor

I also came across this issue because I was wondering about relatively high CPU usage from the Kubelite process. I have about 20-30% load on idle after a fresh install. (dns, storage, ingress enabled)

I'm also experiencing this. I have microk8s running but with literally nothing to do (idle), so its CPU/memory usage should be extremely minimal. Nonetheless, I find it consistently using 20% of my CPU.

As a temporary fix, you can stop microk8s and restart it when you actually need it. It goes without saying that this should not be necessary for a user to do.
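The temporary workaround mentioned above, as commands (a sketch):

microk8s stop     # stops all MicroK8s services and frees the CPU
microk8s start    # bring it back up when needed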

@barrettj12
Contributor

Looking at microk8s kubectl top pod -A, it seems that calico-node is accounting for most of the CPU/memory usage.

@bttger

bttger commented Apr 29, 2022

Looking at microk8s kubectl top pod -A, it seems that calico-node is accounting for most of the CPU/memory usage.

Have you run top on the host machine? I also see that calico-node uses the most among the running pods, but overall the usage of the kubelite process is much higher (ranging from 3x to 10x higher than calico).

@barrettj12
Contributor

Looking at microk8s kubectl top pod -A, it seems that calico-node is accounting for most of the CPU/memory usage.

Have you run top on the host machine? I also see that calico-node uses the most among the running pods, but overall the usage of the kubelite process is much higher (ranging from 3x to 10x higher than calico).

So kubelite is excluded from the processes listed by microk8s kubectl top pod -A? I wasn't aware of that; you are probably right.
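kubelite runs as a host process (a snap-managed service) rather than as a pod, so it does not show up in kubectl top pod output; a sketch of checking both views:

microk8s kubectl top pod -A                  # per-pod usage (needs the metrics-server addon)
top -p "$(pgrep -d, -f kubelite)"            # host-level usage of the kubelite process itself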

@alexmarshallces

I also came across this issue because I was wondering about relatively high CPU usage from the Kubelite process. I have about 20-30% load on idle after a fresh install. (dns, storage, ingress enabled)

I'm also experiencing this. I have microk8s running but with literally nothing to do (idle), so its CPU/memory usage should be extremely minimal. Nonetheless, I find it consistently using 20% of my CPU.

As a temporary fix, you can stop microk8s and restart it when you actually need it. It goes without saying that this should not be necessary for a user to do.

Adding to this, I'm also seeing similar metrics: 20-40% CPU usage with nothing actually running on the cluster, and calico taking up the majority of the processing, with similar add-ons enabled: dns, storage, ingress. Is there any progress on this? Are there any other discussion threads where this performance issue is discussed and, ideally, resolved?

@Atem18

Atem18 commented Nov 11, 2022

Same here with Ubuntu 22.04 on a Raspberry Pi 4 Model B 4GB, with three nodes and HA enabled.

@jglick

jglick commented Nov 11, 2022

For purposes of local development and testing I have switched from Microk8s to Kind, for this reason among others.

@Atem18

Atem18 commented Nov 11, 2022

Yes, but what about people wanting to use it in production?

@AlexsJones
Contributor

AlexsJones commented Nov 12, 2022

Same here with Ubuntu 22.04 on a Raspberry Pi 4 Model B 4GB, with three nodes and HA enabled.

HA enabled - do you mean a control plane on two nodes and the third as a worker, or all three running as the control plane?

@Atem18

Atem18 commented Nov 12, 2022

Same here with Ubuntu 22.04 on a Raspberry Pi 4 Model B 4GB, with three nodes and HA enabled.

HA enabled - do you mean a control plane on two nodes and the third as a worker, or all three running as the control plane?

Hi, I used the following tutorial: https://ubuntu.com/tutorials/getting-started-with-kubernetes-ha?&_ga=2.187560111.665053589.1668255625-380658086.1668255625#1-overview

So I think the control plane is running on all three nodes.

@mikezerosix

I have the same problem: kubelite is running near 100% CPU. OK, I was cheap and ran the master on a Raspberry Pi 3. I just set it up running version 1.23, as 1.25 did not work on it, claiming cgroups were not enabled. There are no containers deployed and it can barely cope with one node joining.

kubelite is at 75% and the next ones are containerd at 15% + 10%; sometimes dqlite jumps up.
BTW, can I kill/disable containerd on the master? Isn't that unnecessary on the master?

@neoaggelos
Member

neoaggelos commented Dec 17, 2022

Hi @mikezerosix

BTW, can I kill/disable containerd on the master? Isn't that unnecessary on the master?

Control plane nodes are also registered as workers by default; a workaround to prevent workloads from being scheduled there is to drain and taint the control plane nodes with:

microk8s kubectl drain $node
microk8s kubectl taint node $node key1=value1:NoSchedule

Also, can you share an inspection report so that we can have a look at it?
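If the control plane node should later accept workloads again, the above can be reversed (a sketch, using the same placeholder key/value as in the taint example):

microk8s kubectl uncordon $node
microk8s kubectl taint node $node key1=value1:NoSchedule-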

@augusto

augusto commented Jun 14, 2023

I have the same issue. Installed recently on Ubuntu 22.04, running in a VM with 6 cores and 32GB of RAM.
I'm running microk8s from snap, 1.27/stable. I have disabled HA and enabled dns; no pods are running.

The two main processes eating CPU are kubelite and etcd, and they use roughly 20% of it.

Any idea how to resolve this?

@ktsakalozos
Member

Hi @augusto. Kubernetes services (API server, proxy, kubelet, scheduler, controller manager) will always produce some load even when idle. For example, the K8s services (all under kubelite) constantly query the state of the cluster to figure out whether there is work for them to do. Depending on the hardware you are running MicroK8s on, 20% may be expected. Could you share a microk8s inspect tarball so that we can check whether this is the case or whether there is a problem with the cluster?

@fybmain

fybmain commented Jul 21, 2023

@ktsakalozos How frequent is the polling? Can the user adjust the polling frequency?
I assigned all six P-cores of a Core i7-12650H to my VM running microk8s; kubelite consumes about 10% of CPU on average.

@masterkain

masterkain commented Jul 22, 2023

I see the same thing on microk8s 1.27 on Ubuntu 23:

  • /snap/microk8s/5372/kubelite --scheduler-args-file=/var/snap/microk8s/5372/args/kube-scheduler --controller-manage... => 10% cpu average
  • /snap/microk8s/5372/bin/k8s-dqlite --storage-dir=/var/snap/microk8s/5372/var/kubernetes/backe... => 7% cpu average

This is on an i5-12600KF.

I must say that using ~20% CPU 24/7 is not really ideal.

@ktsakalozos
Member

@fybmain @masterkain I am not aware of any way to tune the frequency at which the (many) k8s services check for state changes, but it seems a reasonable ask. Maybe upstream Kubernetes has an answer. If there is a way to tune this, you can certainly do it in MicroK8s.

The percentages we are talking about here are relative to one core, right? What tool do you use to measure them?
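One caveat when comparing numbers: by default top reports %CPU relative to a single core (so a multi-threaded process can exceed 100%), and Shift+I toggles it to a share of total capacity. A sketch of a per-process check that sidesteps the ambiguity (note that ps averages %CPU over the process lifetime):

ps -o %cpu,%mem,rss,etime,cmd -p "$(pgrep -d, -f kubelite)"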

@masterkain

@ktsakalozos thanks for the reply - nothing fancy, just a quick glance at top, so the 20% is across all cores. This is on my bare-metal Ubuntu server homelab with just microk8s on it. In an effort to reduce wattage/billing costs I started reviewing some things and happened to stumble upon this kind of usage.

@augusto

augusto commented Jul 30, 2023

Sorry for the late reply, @ktsakalozos! I've torn down the microk8s VM I had. Not sure if this is of any use, but I installed K3s and out of the box its CPU utilisation was ~10% (compared to ~20% in MicroK8s after removing HA). Somehow K3s manages to use 50% less CPU than microk8s.

@luispabon

luispabon commented Dec 20, 2023

Here's a comparison of k3s vs microk8s (HA off) on identical VMs on the same host with a similar number of pods running:

[screenshot: CPU usage comparison of k3s vs microk8s]

MicroK8s is a new install on a pristine system, and k3s has an uptime of over 2 months.

@BxL221

BxL221 commented Apr 28, 2024

When you installed 1.20.5, did you purge the snap? For example: sudo snap remove microk8s --purge. Just curious.

Great, thanks. I had the same problem with Ubuntu Server 22.04 LTS, and a purge solved it.

I saw that microk8s used more than 500% CPU in htop.
