Kubelet CPU usage linearly increases #750
Comments
Thanks for the report. We are planning to use the Cluster Autoscaler, so this issue means extra awareness is needed when a cluster does not seem to cool down automatically after a typical amount of time.
It is through kubernetes wizardry. You can make a DaemonSet which executes shell commands as a host process with full host access. You can use it to patch systemd and leave a flag so the next run of the DaemonSet knows it does not need to update again. It's a terrible hack, but it works as a solution until Microsoft patches the standard VM image.
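Roughly, the pattern is a privileged pod running with hostPID: true that breaks into the host's namespaces via nsenter; a minimal sketch (the command at the end is just an example of a host-level action, not the exact patching logic):

```bash
# Sketch only: run from inside a container started with hostPID: true and
# privileged: true. With hostPID, PID 1 is the host's init (systemd), so
# nsenter can enter the host's mount/UTS/IPC/net/PID namespaces and execute
# commands directly on the node, e.g. to inspect or patch systemd units.
nsenter --target 1 --mount --uts --ipc --net --pid -- systemctl status kubelet
```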
@marekr Could you elaborate on what kind of workaround you're referring to? We currently have a DaemonSet cleaning up cgroups intermittently, but that of course isn't a (good) fix. We checked other options, like fixing the host image/systemd used, but:
Executing commands on the host from a container is ill-advised, so we did not think it through seriously; we're also not sure how to do that short of exposing the crontab to the container and adding a line there... We're open to workaround options if they can be implemented from a k8s deployment perspective.
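For completeness, the crontab idea would look roughly like this, assuming the node image runs cron, the pod mounts the host's /etc/cron.d via a hostPath volume at /host-cron.d, and the cleanup script path is a placeholder:

```bash
# Rough sketch: from a pod that mounts the host's /etc/cron.d at /host-cron.d,
# install an hourly cron entry that runs on the node itself.
# /usr/local/bin/clean-cgroups.sh is a placeholder for whatever cleanup
# script has been staged on the host.
cat > /host-cron.d/cgroup-cleanup <<'EOF'
0 * * * * root /usr/local/bin/clean-cgroups.sh
EOF
chmod 0644 /host-cron.d/cgroup-cleanup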
@sg3s Sorry for the delayed response - we're looking at this to see if we can mitigate it by patching systemd. I have your internal support ticket and have communicated this to engineering.
@jnoller is there any update on this? As we're running lots of cronjobs, this is kind of a showstopper...
One thing we changed recently to help with job leaking is that we reduced the garbage collection setting (see the changelog), which should help with large numbers of jobs. I'm also digging in and seeing where the bug is / what mitigations are available.
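In the meantime, on the user side, capping CronJob history with the standard successfulJobsHistoryLimit / failedJobsHistoryLimit fields also limits how many finished Jobs pile up between garbage collection runs; a rough sketch (example names only, not the AKS-side change mentioned above):

```bash
# Sketch of a user-side mitigation: keep only one finished Job per CronJob
# so fewer pods/cgroups accumulate between garbage-collection runs.
kubectl apply -f - <<'EOF'
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: task
            image: busybox
            command: ["sh", "-c", "echo hello"]
EOF
```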
This is currently happening to us and crashing our cluster weekly. I would really appreciate that DaemonSet workaround if you could provide a complete implementation. A fix on the AKS side would be even better!
@reaperes's systemd cgroup cleanup code from kubernetes/kubernetes#64137 seemed the cleanest and most surgical of all the workarounds that I've found documented for this and the related issues, so I've converted it into a DaemonSet that runs the fix hourly on every node in a cluster. You could set any interval you like, of course, but the script isn't very resource intensive and hourly seemed reasonable. It actually takes about a day or so for the CPU load to become noticeable in my cluster and a week or so for it to crash a node. I've been running this for a few days now in my staging cluster and it appears to keep the CPU load under control.
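For anyone who wants the general shape of it, here is a rough sketch of the DaemonSet wrapper (placeholder names and image; the actual cleanup script from the upstream issue is assumed to be baked into the image as /cleanup.sh, so this is not the exact manifest I'm running):

```bash
# Rough sketch of an hourly cgroup-cleanup DaemonSet. It needs host cgroup
# access, hence hostPID, the privileged container, and the hostPath mount.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cgroup-cleanup
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cgroup-cleanup
  template:
    metadata:
      labels:
        app: cgroup-cleanup
    spec:
      hostPID: true
      containers:
      - name: cleanup
        image: example.azurecr.io/cgroup-cleanup:latest  # placeholder image
        securityContext:
          privileged: true
        command: ["/bin/sh", "-c"]
        args:
        - "while true; do /cleanup.sh; sleep 3600; done"
        volumeMounts:
        - name: cgroup
          mountPath: /sys/fs/cgroup
      volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
EOF
```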
Action required from @Azure/aks-pm |
Issue needing attention of @Azure/aks-leads |
@Azure/aks-pm issue needs labels |
stale-close |
We were asked to open an issue on the public tracker following our interactions with support engineers (118113025003216) & slack discussion.
What happened:
We've been experiencing degraded performance & mainly CPU creep over longer periods of time on clusters rolled out several months ago.
The CPU creep causes our cluster workers to become unresponsive/NotReady as the CPU load approaches 100%.
The upstream issue is the following:
kubernetes/kubernetes#64137
We've been having this issue on all clusters since at least mid-October. We've been trying to manage our production and test cluster loads; since those clusters are actively used and have more significant workloads, they reached 100% CPU load on multiple nodes, which also makes their monitoring graphs very hard to read.
The clearest example is our staging cluster, which never reached 100% CPU. See the graph below for the past 3 months. We scaled this cluster up / changed the node before we knew what caused the issue, but the problem was obviously also present on the new node.
We rarely make large changes or experience high load on this cluster, so the line should be roughly horizontal rather than trending upward.
What you expected to happen:
We expect CPU usage not to increase significantly as long as conditions/configuration do not change.
How to reproduce it (as minimally and precisely as possible):
Reproduction steps are noted in the upstream issue.
From what I understand, this occurs when you run node images with certain systemd + kernel version combinations (see the upstream issue), which cause the kubelet to leak system slices.
The problem is then exacerbated when you use (Cron)Jobs in k8s because you're constantly scheduling pods.
Anything else we need to know?:
This should already be a known issue being worked on by the other vendors responsible for the mentioned components. This issue exists because, within AKS, it is not possible for a customer to customize or patch said components.
Environment:
- Kubernetes version (use kubectl version):
- Size of cluster (how many worker nodes are in the cluster?):
  - Production: 2-3
  - Staging: 1
  - Test: 1-2
Workloads roughly consist of the following:
No service meshing or other exciting components that would make this a special setup.
Thanks to our monitoring partner, we were able to confirm the upstream issue with the following instructions. I have not seen these explicitly noted in a ticket, so I thought I should include them here for others trying to confirm the issue.
On some combinations of systemd + kernel versions (see the upstream issue), the kubelet leaks system slices and slows down to a crawl. This can be confirmed by inspecting the contents of the metrics/cadvisor endpoint (the full path, if the read-only port is enabled, is http://localhost:10255/metrics/cadvisor). A "normal" payload is under a MB (or a couple of MB).
These leaked slices have an empty container_name label; you can confirm this is the problem by comparing the total number of lines with the number of lines that have an empty label.
Our cadvisor metrics were 15+ MB after the hosts had been running uninterrupted for several days.
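To make the check concrete, something along these lines works from the node, assuming the read-only port is enabled; the leaked entries show up with container_name="":

```bash
# Payload size: a "healthy" node is typically under a couple of MB.
curl -s http://localhost:10255/metrics/cadvisor | wc -c

# Compare the total number of metric lines with the number of lines that
# carry an empty container_name label (the leaked slices).
curl -s http://localhost:10255/metrics/cadvisor | wc -l
curl -s http://localhost:10255/metrics/cadvisor | grep -c 'container_name=""'
```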