How to modify the default prometheus-to-sd resource limits? #327
Comments
/cc @loburm
Hi Artem, this container is used by GKE engineers to collect operational metrics from system components and shouldn't influence user workloads. Previously we had a bug in prometheus-to-sd that caused a memory leak and high CPU usage, so the decision was made to minimize the potential impact by setting those limits. I'm really sorry for the alerts this component is causing in your cluster. Our team is working on removing it completely and replacing it with an OpenTelemetry agent.
Hi @loburm,
We're seeing similar issues on our clusters in GKE.
The only way I found to limit the resources is to set a `LimitRange`:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: kube-system-resource-limits
  namespace: kube-system
spec:
  limits:
  - default:
      memory: 200Mi
    defaultRequest:
      memory: 20Mi
    type: Container
```

Every other change affecting the pod seems to be reverted by GKE. I've put this in place because of a memory leak. I hope this helps someone.
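If you try this approach, the object can be applied and checked with kubectl (the file name below is hypothetical); note that `LimitRange` defaults only apply to containers that don't already declare their own requests/limits:

```
# Apply the LimitRange above (saved to a file of your choosing)
# and confirm the defaults are registered in kube-system.
kubectl apply -f kube-system-resource-limits.yaml
kubectl describe limitrange kube-system-resource-limits -n kube-system
```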
The memory leak seems to have gone away after a recent update.
@jamesproud I experienced some issues at the end of June that went away shortly after Google pushed an update to the event-exporter-gke pod on the 29th of June; the disruption was clearly visible in our monitoring graphs, and everything looks good since the last revision. My cluster is on the stable release channel. Not sure if this helps.
Hi, this DaemonSet is causing a lot of issues when running virtual nodes with virtual-kubelet. Please change the tolerations, or provide a way for me to patch it. I have a virtual node that runs pods in Azure Container Instances, so it does not make sense to have this running on the virtual node at all.
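For reference, a minimal sketch of the scheduling mechanics at play, assuming the DaemonSet carries a blanket toleration and that the virtual node carries the conventional `type: virtual-kubelet` label (verify both on your cluster). A node-affinity rule like this is what would keep the pods off the virtual node, if the spec were patchable:

```yaml
spec:
  template:
    spec:
      # A blanket toleration like this matches every taint, which is
      # why the DaemonSet lands even on tainted virtual-kubelet nodes.
      tolerations:
      - operator: Exists
      # Hypothetical exclusion: keep pods off nodes labeled as
      # virtual-kubelet (label key/value assumed; check your nodes).
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: NotIn
                values:
                - virtual-kubelet
```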
It is perfectly fine to set low defaults for the resources, but you really should let us raise them. The only other option is to start silencing the alerts, which is not something we would like to do.
Hi,
Several of my GKE clusters experience constant CPUThrottlingHigh alerts, coming from prometheus-to-sd pods.
This DaemonSet has incredibly low CPU requests/limits by default.
When I try to edit the prometheus-to-sd daemonset and increase these values, they get reverted to defaults.
Questions:
Is it possible to modify these default values in any way?
Why are they so low? Such low values are very likely to cause CPU throttling and increase the monitoring noise from GKE clusters.
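To make the throttling mechanics concrete, here is a small illustrative sketch (not GKE code) of the ratio that alerts like CPUThrottlingHigh are built on: the kernel's CFS scheduler divides CPU time into periods, and a container with a tiny CPU limit gets throttled in a large fraction of them. The function names and the 25% threshold are assumptions modeled on common kube-prometheus defaults:

```python
# Illustrative sketch of the CPUThrottlingHigh ratio.
# cgroup v1 exposes cpu.stat counters nr_periods (CFS scheduling
# periods elapsed) and nr_throttled (periods in which the container
# hit its CPU quota); the alert fires on their ratio.

def throttling_ratio(nr_periods: int, nr_throttled: int) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    if nr_periods == 0:
        return 0.0
    return nr_throttled / nr_periods

def fires_cpu_throttling_high(nr_periods: int, nr_throttled: int,
                              threshold: float = 0.25) -> bool:
    """Assumed alert condition: throttled in more than 25% of periods."""
    return throttling_ratio(nr_periods, nr_throttled) > threshold

# With a very low CPU limit, even brief metric scrapes exhaust the
# per-period quota, so a large share of periods ends up throttled:
print(fires_cpu_throttling_high(nr_periods=1000, nr_throttled=400))  # True
```

This is why a tiny CPU limit on a periodically busy container produces constant alert noise even when its average usage looks negligible.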
Similar report by a different user: https://stackoverflow.com/questions/58182345/cpu-throttling-on-default-gke-pods
My GKE cluster version is v1.15.11-gke.5