Cadvisor metrics not present when using newer AKS Node image versions #3216
Comments
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
AKS v1.25.4 has the same issue: no CPU or memory metrics are available in Prometheus :(
This issue can be closed. I've found where the problem lies: in our Terraform that defines the nodes, we set a kubelet_config. Every node pool where this is set is missing the CFS quota metrics. If I log into one of those nodes and check the kubelet config, I can see the value is false:
cat ./etc/default/kubeletconfig.json | grep CFS
"cpuCFSQuota": false,
cat ./etc/default/kubeletconfig.json | grep -i log
"containerLogMaxSize": "100M", <-- we are only changing this value, but cpuCFSQuota is false, hence no CFS stats
On node pools where we do not set a kubelet_config, there is no kubeletconfig.json file, so the node accepts the defaults described in the Azure documentation (https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration), meaning CFS quota is true and you get the stats.
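For anyone wanting to repeat the check across several nodes, here is a minimal sketch of the same grep wrapped in a function. The function name `cfs_quota_state` and its output labels are made up for illustration; the default path is the one from the comment above, and the assumption that a missing file means the defaults apply comes from that same comment.

```shell
# Hypothetical helper: report how cpuCFSQuota is set in a kubelet config file.
# Assumption: a missing file means the node runs with the default settings.
cfs_quota_state() {
  config_file="${1:-./etc/default/kubeletconfig.json}"

  # No custom kubelet config on this node pool.
  if [ ! -f "$config_file" ]; then
    echo "default"
    return 0
  fi

  # Match the JSON key regardless of whitespace around the colon.
  if grep -Eq '"cpuCFSQuota"[[:space:]]*:[[:space:]]*false' "$config_file"; then
    echo "disabled"
  elif grep -Eq '"cpuCFSQuota"[[:space:]]*:[[:space:]]*true' "$config_file"; then
    echo "enabled"
  else
    # File exists but does not mention cpuCFSQuota.
    echo "unset"
  fi
}
```

Running `cfs_quota_state` with no argument on an affected node should print `disabled` if the file looks like the one quoted above.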
Since upgrading to newer AKS node image versions, the cAdvisor metrics have stopped reporting on container_cpu_cfs:
metrics no longer reporting:
If I log onto a container running on a newer worker node image, I can see that cpu.cfs_quota_us is set to -1:
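As a rough aid for reading that value: a quota of -1 means no CFS quota is enforced on the cgroup, while a non-negative quota divided by the period gives the CPU limit. The sketch below is illustrative only; the function name `describe_cfs_quota` is hypothetical, and the 100000 microsecond period is the kernel default, assumed here when no period is passed.

```shell
# Hypothetical helper: interpret a cpu.cfs_quota_us value.
# Assumption: period defaults to 100000 us (the kernel default) if not given.
describe_cfs_quota() {
  quota_us="$1"
  period_us="${2:-100000}"

  if [ "$quota_us" -lt 0 ]; then
    # -1 means the cgroup has no CFS bandwidth limit.
    echo "no CFS quota enforced"
  else
    # Convert quota/period into millicores (1000 millicores = 1 CPU).
    echo "limit: $((quota_us * 1000 / period_us)) millicores"
  fi
}
```

Inside a container on a cgroup v1 node (an assumption; the path differs on cgroup v2), it could be called as `describe_cfs_quota "$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)"`.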
If I log onto a container on an older worker node image, running the same deployment and container spec, the CPU metrics are visible: