
[enhancement] Add support for monitoring node runtime & system resource usage #101

Closed
stevehipwell opened this issue Feb 16, 2024 · 5 comments
Labels: enhancement (New feature or request)

@stevehipwell

Describe the enhancement you'd like

I'd like the nodes dashboard to show runtime and system resource usage, as exported by the kubelet.

Additional context

This requires that the cAdvisor metrics for cgroup slices aren't being dropped. For this to work with the Kube Prometheus Stack, the kubelet ServiceMonitor's cAdvisorMetricRelabelings value needs to be overridden to keep the required series.
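For reference, a values override along these lines might achieve that with the kube-prometheus-stack Helm chart. This is a sketch only: the exact default rules vary by chart version and should be checked against the version in use; the key point is to reproduce the default cAdvisorMetricRelabelings list without the rule that drops series that have an id but no pod.

```yaml
# values.yaml override for the kube-prometheus-stack Helm chart (sketch).
# The chart's default cAdvisorMetricRelabelings include a rule dropping
# cgroup-slice series (id set, pod empty); reproduce the default list here
# WITHOUT that rule so /runtime.slice and /system.slice series are kept.
kubelet:
  serviceMonitor:
    cAdvisorMetricRelabelings:
      # ...copy the remaining default rules from your chart version here...
      # Omit the default rule that looks roughly like this:
      # - sourceLabels: [id, pod]
      #   action: drop
      #   regex: '.+;;'
```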

@stevehipwell stevehipwell added the enhancement New feature or request label Feb 16, 2024
dotdc (Owner) commented Feb 28, 2024

Hi @stevehipwell,

Could you share which metrics you have in mind and what your use-case is?
If these metrics are disabled by default in kube-prometheus-stack, it might be because of their cardinality.

stevehipwell (Author) commented Feb 28, 2024

@dotdc I'd like to see the metrics showing runtime and system resource utilization; the panels could be hidden, with a tip on how to enable them, if the metrics aren't present. A rough stab at the queries, based on an EKS cluster, follows.

Runtime Usage

(container_memory_rss{job="kubelet", id="/runtime.slice/kubelet.service"} + on(instance) container_memory_rss{job="kubelet", id="/runtime.slice/containerd.service"}) / (1024 * 1024)

(rate(container_cpu_usage_seconds_total{job="kubelet", id="/runtime.slice/kubelet.service"}[5m]) + on(instance) rate(container_cpu_usage_seconds_total{job="kubelet", id="/runtime.slice/containerd.service"}[5m])) * 1000

System Usage (mem only)

(container_memory_rss{job="kubelet", id="/"} - on(instance) (container_memory_rss{job="kubelet", id="/kubepods.slice"} + on(instance) container_memory_rss{job="kubelet", id="/runtime.slice/kubelet.service"} + on(instance)  container_memory_rss{job="kubelet", id="/runtime.slice/containerd.service"})) / (1024 * 1024)
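For completeness, a CPU counterpart to the system-memory query could follow the same subtraction approach. This is a hypothetical sketch reusing the selectors above; the result is in cores.

```promql
# System CPU usage in cores: whole node minus pods minus runtime (sketch).
rate(container_cpu_usage_seconds_total{job="kubelet", id="/"}[5m])
  - on(instance) (
      rate(container_cpu_usage_seconds_total{job="kubelet", id="/kubepods.slice"}[5m])
    + on(instance) rate(container_cpu_usage_seconds_total{job="kubelet", id="/runtime.slice/kubelet.service"}[5m])
    + on(instance) rate(container_cpu_usage_seconds_total{job="kubelet", id="/runtime.slice/containerd.service"}[5m])
  )
```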

AFAIK they're not high-cardinality metrics, and they were disabled via a fairly opaque discussion.

dotdc (Owner) commented Apr 12, 2024

@stevehipwell

I have the metrics, but they don't have the same id path; here's an example:

{
    __name__="container_memory_rss",
    container="prometheus",
    endpoint="https-metrics",
    id="/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/kubelet-kubepods-besteffort-....slice/cri-containerd-....scope",
    image="quay.io/prometheus/prometheus:v2.51.0",
    instance="172.18.0.5:10250",
    job="kubelet",
    metrics_path="/metrics/cadvisor",
    name="...",
    namespace="monitoring",
    node="k8s-worker-003",
    pod="prometheus-kube-prometheus-stack-prometheus-0",
    service="kube-prometheus-stack-kubelet"
}

Do you think you can make generic queries (with sum() by or a regex) that would work everywhere?

stevehipwell (Author) commented

@dotdc the metrics above are container metrics; you need the following metrics instead. Because AKS currently uses system.slice instead of runtime.slice, I've added that to the metric selectors.

  • {job="kubelet", id=~"/(?:runtime|system).slice/kubelet.service"}
  • {job="kubelet", id=~"/(?:runtime|system).slice/containerd.service"}
  • {job="kubelet", id="/"}
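Combining these selectors, a generic runtime-memory query might look like the following. A sketch only: escaping the dots in the regex is optional but stricter, and sum by (instance) collapses the kubelet and containerd series into one value per node.

```promql
# Runtime (kubelet + containerd) RSS in MiB per node, covering both the
# runtime.slice (EKS) and system.slice (AKS) id layouts. Sketch only.
sum by (instance) (
  container_memory_rss{job="kubelet", id=~"/(?:runtime|system)\\.slice/(?:kubelet|containerd)\\.service"}
) / (1024 * 1024)
```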

dotdc (Owner) commented Apr 20, 2024

@stevehipwell I'm sorry, but I don't understand what you're trying to achieve.

Which information are you missing in this dashboard (panel title)?
Could you create the panel and draft a PR so I can take a look?
I would also need the cAdvisorMetricRelabelings configuration you mentioned earlier.

@dotdc dotdc closed this as completed May 31, 2024