Cadvisor metrics not present when using newer AKS Node image versions #3216

Closed
davidg-jellyfish opened this issue Sep 16, 2022 · 12 comments
Labels: action-required, bug, Needs Attention 👋 (Issues needs attention/assignee/owner)

davidg-jellyfish commented Sep 16, 2022

Since upgrading to newer AKS node image versions, the cAdvisor container_cpu_cfs_* metrics have stopped reporting.

Metrics no longer reported:

container_cpu_cfs_periods_total
container_cpu_cfs_throttled_periods_total
container_cpu_cfs_throttled_seconds_total
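
One way to confirm which of the metrics listed above are absent on a given node is to scan the raw cAdvisor exposition text (for example, text saved from `kubectl get --raw /api/v1/nodes/<node>/proxy/metrics/cadvisor`). The following is a minimal sketch, not something from the original report: the function name `missing_cfs_metrics` and the sample text are hypothetical, and the parsing assumes the plain Prometheus text format.

```python
# The three metric names come from the issue; everything else is illustrative.
EXPECTED_CFS_METRICS = (
    "container_cpu_cfs_periods_total",
    "container_cpu_cfs_throttled_periods_total",
    "container_cpu_cfs_throttled_seconds_total",
)


def missing_cfs_metrics(exposition_text: str) -> list[str]:
    """Return the expected CFS metric names that have no sample line."""
    seen = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line starts with the metric name, followed by '{' or a space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        seen.add(name)
    return [m for m in EXPECTED_CFS_METRICS if m not in seen]


# Hypothetical scrape output in which only one CFS metric is present.
sample = """\
# TYPE container_cpu_cfs_periods_total counter
container_cpu_cfs_periods_total{pod="demo"} 42
container_cpu_usage_seconds_total{pod="demo"} 1.5
"""
print(missing_cfs_metrics(sample))
```

An empty list would mean all three CFS metrics are being exported for at least one container on that node.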

If I log into a container running on a node with the newer image, I can see that cpu.cfs_quota_us is set to -1:

cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
-1

If I log into a container on a node with the older image, running the same deployment and container spec, the quota is set and the CPU metrics are visible:

cat /sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us
50000

AKS version: 1.22
Node images:
AKSUbuntu-1804gen2containerd-2022.08.10 <-- containers running on these nodes have the issue
AKSUbuntu-1804gen2containerd-2022.07.04 <-- containers running on these nodes appear to report the container CFS CPU cAdvisor metrics correctly

cadvisor_version_info{dockerVersion="Unknown", endpoint="https-metrics", job="kubelet", kernelVersion="5.4.0-1090-azure", metrics_path="/metrics/cadvisor", namespace="kube-system", osVersion="Ubuntu 18.04.6 LTS"}

cadvisor_version_info{dockerVersion="Unknown", endpoint="https-metrics", job="kubelet", kernelVersion="5.4.0-1089-azure", metrics_path="/metrics/cadvisor", namespace="kube-system", osVersion="Ubuntu 18.04.6 LTS"}
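
For readers comparing the two cpu.cfs_quota_us readings above: a negative value means no CFS quota is enforced for the cgroup, while a positive value is the runtime in microseconds allowed per scheduling period. The sketch below interprets both readings. It assumes the kernel's default period of 100000 µs, since the issue does not show cpu.cfs_period_us, and the helper name `describe_cfs_quota` is hypothetical.

```python
# Assumption: default CFS period of 100 ms; the issue does not show the
# actual cpu.cfs_period_us value on these nodes.
DEFAULT_CFS_PERIOD_US = 100_000


def describe_cfs_quota(quota_us: int, period_us: int = DEFAULT_CFS_PERIOD_US) -> str:
    """Describe what a cpu.cfs_quota_us reading means for the cgroup."""
    if quota_us < 0:
        # -1 is the kernel's marker for "no bandwidth limit".
        return "no CFS quota enforced"
    cpus = quota_us / period_us
    return f"limited to {cpus:g} CPU(s) per period"


print(describe_cfs_quota(-1))      # reading from the newer node image
print(describe_cfs_quota(50000))   # reading from the older node image
```

With no quota in place there is nothing to throttle, which is consistent with the CFS throttling metrics disappearing on the affected nodes.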

ghost commented Oct 20, 2022

Action required from @Azure/aks-pm

ghost added the Needs Attention 👋 label on Oct 20, 2022

ghost commented Nov 5, 2022

Issue needing attention of @Azure/aks-leads

ghost commented Nov 20, 2022

Issue needing attention of @Azure/aks-leads

ghost commented Dec 5, 2022

Issue needing attention of @Azure/aks-leads

ghost commented Dec 20, 2022

Issue needing attention of @Azure/aks-leads

ghost commented Jan 4, 2023

Issue needing attention of @Azure/aks-leads

ghost commented Jan 20, 2023

Issue needing attention of @Azure/aks-leads

ghost commented Feb 4, 2023

Issue needing attention of @Azure/aks-leads

aerott commented Feb 4, 2023

AKS v1.25.4 has the same issue: no CPU or memory metrics available in Prometheus :(

ghost commented Feb 19, 2023

Issue needing attention of @Azure/aks-leads

ghost commented Mar 7, 2023

Issue needing attention of @Azure/aks-leads

davidg-jellyfish (author) commented

This issue can be closed.

I've found where the issue lies: in the Terraform that defines our node pools, we set a kubelet_config. Every node pool where this is set is missing the CFS quota metrics. If I log into one of those nodes and check the kubelet config, I can see that cpuCFSQuota is false:

cat ./etc/default/kubeletconfig.json | grep CFS
    "cpuCFSQuota": false,
cat ./etc/default/kubeletconfig.json | grep -i log
    "containerLogMaxSize": "100M",   <-- this is the only value we change, yet cpuCFSQuota is false, hence no CFS stats

On node pools where we do not set a kubelet_config, there is no kubeletconfig.json file, so the kubelet uses the defaults described in the Azure documentation (https://learn.microsoft.com/en-us/azure/aks/custom-node-configuration), meaning CFS quota is enabled and you get the stats.
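
The check described above can be scripted. The following is a minimal sketch, assuming the file is plain JSON with the `cpuCFSQuota` key shown in the grep output, and treating a missing key as enabled, per the defaults described above. The function name `cfs_quota_enabled` and the sample content are hypothetical.

```python
import json


def cfs_quota_enabled(config_text: str) -> bool:
    """Return the effective cpuCFSQuota setting from kubelet config JSON.

    A missing key is treated as True, mirroring the default described in
    the comment above (an assumption, not verified against the kubelet).
    """
    config = json.loads(config_text)
    return bool(config.get("cpuCFSQuota", True))


# Hypothetical file content mirroring the two keys shown in the issue.
sample_config = '{"cpuCFSQuota": false, "containerLogMaxSize": "100M"}'
print(cfs_quota_enabled(sample_config))                       # affected pool
print(cfs_quota_enabled('{"containerLogMaxSize": "100M"}'))   # key absent
```

On a real node, the text would come from the kubeletconfig.json file inspected above. The thread does not say what change was made in Terraform; one plausible fix, stated here as an assumption, is to set `cpu_cfs_quota_enabled = true` explicitly in the azurerm `kubelet_config` block so that customizing another value does not turn the quota off.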

ghost locked as resolved and limited conversation to collaborators on Apr 9, 2023