Reduce metrics #2387

brancz · 2019-02-06T13:39:23Z

As described in google/cadvisor#1925, the container network metrics are disabled, so they are always set to 0 and cause more confusion than benefit. This PR drops them at ingestion time and also drops unnecessarily high-cardinality metrics from the Kubernetes API.
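For readers unfamiliar with dropping metrics at ingestion time, here is a minimal sketch of the idea in Python. It is not the code from this PR: the pattern list and the sample data are illustrative assumptions, and `re.fullmatch` only approximates how Prometheus anchors relabeling regexes (Prometheus uses RE2, not Python's `re`).

```python
import re

# Hypothetical drop patterns, for illustration only. The actual patterns
# used by this PR live in its diff and are not reproduced here.
DROP_NAME_PATTERNS = [
    r"container_network_.*",         # stands in for metrics that are always 0
    r"example_high_cardinality_.*",  # made-up name for a high-cardinality family
]


def should_drop(metric_name, patterns=DROP_NAME_PATTERNS):
    """Return True if the metric name fully matches any drop pattern.

    fullmatch mimics the anchored matching of a relabeling rule with
    action "drop" applied to the metric name.
    """
    return any(re.fullmatch(p, metric_name) for p in patterns)


def filter_samples(samples):
    """Keep only the samples whose metric name is not dropped."""
    return [s for s in samples if not should_drop(s[0])]


# Each sample is (metric name, labels, value); values are invented.
samples = [
    ("container_network_tcp_usage_total", {"pod": "a"}, 0.0),
    ("container_cpu_usage_seconds_total", {"pod": "a"}, 12.5),
    ("example_high_cardinality_bucket", {"le": "0.1"}, 3.0),
]

kept = filter_samples(samples)
print([name for name, _, _ in kept])
# prints ['container_cpu_usage_seconds_total']
```

Dropped samples are never stored, which is why this reduces memory and storage cost rather than just hiding series at query time.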

@metalmatze @mxinden @squat @s-urbaniak

s-urbaniak · 2019-02-06T14:38:27Z

LGTM 👌


- Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
- Remove some k8s API metrics with very high cardinality
- Drop metrics that cadvisor does not collect, but exposes anyway

Upstream ref: prometheus-operator/prometheus-operator#2387


- Replace hard-coded roles with a ClusterRole for now
  * We might want to revisit this if multi-tenancy issues become relevant
- Replaced some files with Ansible templated files so we can set prometheus instance name from ansible var
- Convert Prometheus manifests to Ansible j2 templates and some tweaks
  * Define Prometheus instance name and Prometheus server version as Ansible vars
  * Reduce to 1 replica of Prometheus
  * Adjust requests and limits
- Remove hard-coded serverName for API servers
  * If we want to limit to servername we need to template this somehow
  * Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
- Add Ansible role for prometheus_operator
  * Generates the files only for operator and prometheus yet, no apply
- Add prometheus operator namespace
- Network policy for Prometheus
- Mount kubelet CA cert and use it in the service monitoring of kubelet /metrics and /metrics/cadvisor
- rules
  * Fix hard-coded prometheus job name in alerts
  * Add namespace to a bunch of prometheus rules, synced with upstream
- Prometheus metrics scraping adjustments
  * Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
  * Remove some k8s API metrics with very high cardinality
  * Drop metrics that cadvisor does not collect, but exposes anyway
  Upstream ref: prometheus-operator/prometheus-operator#2387
- Make Prometheus storage persistent and bump it to 100Gi
- Bump Prometheus retention from default 24 hours to one month

brancz added 3 commits February 6, 2019 14:29

kube-prometheus: Drop disabled and high cardinality metrics

986d387

kube-prometheus: Update jsonnet deps

68e5344

kube-prometheus: Re-generate

f01276d

s-urbaniak approved these changes Feb 6, 2019


brancz merged commit 7b73aa0 into prometheus-operator:master Feb 6, 2019

brancz deleted the reduce-metrics branch February 6, 2019 15:02

dghubble added a commit to poseidon/typhoon that referenced this pull request Feb 11, 2019

Drop metrics that are unset, high cardinality, or extraneous

49089d2

* prometheus-operator/prometheus-operator#2387
* prometheus-operator/prometheus-operator#1959

dghubble mentioned this pull request Feb 11, 2019

Improve Prometheus relabeling and drop extraneous metrics poseidon/typhoon#397

Merged

dghubble added a commit to poseidon/typhoon that referenced this pull request Feb 11, 2019

Drop metrics that are unset, high cardinality, or extraneous

b13a651

* prometheus-operator/prometheus-operator#2387
* prometheus-operator/prometheus-operator#1959

brancz mentioned this pull request Feb 18, 2019

Prometheus in prometheus-operator consume 1G+ memory #1760

Closed

salamachinas mentioned this pull request Mar 14, 2019

[stable/prometheus-operator] kubelet.serviceMonitor.cAdvisorMetricRelabelings doesn't work with https helm/charts#12215

Closed

arajkumar mentioned this pull request Jun 28, 2021

Bug 1950810: Remove high cardinality metrics from cadvisor and apiserver openshift/cluster-monitoring-operator#1251

Merged
