Reduce metrics #2387

Merged
merged 3 commits into from
Feb 6, 2019

@brancz brancz commented Feb 6, 2019

As described in google/cadvisor#1925, the container network metrics are disabled, so they are always reported as 0 and cause more confusion than benefit. This PR drops them at ingestion time and also drops unnecessary high-cardinality metrics from the Kubernetes API.
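
For readers unfamiliar with dropping metrics at ingestion time, here is a minimal sketch of the idea in Python. It only mimics how a Prometheus `drop` relabel rule on the metric name behaves; it is not the configuration changed in this PR. The pattern, the metric names, and the helpers `keep_sample` and `filter_scrape` are assumptions made for illustration.

```python
import re

# Hypothetical drop pattern. The metric names are illustrative and are
# not taken from this PR's diff.
DROP_PATTERN = re.compile(r"container_network_(tcp|udp)_usage_total")


def keep_sample(metric_name: str) -> bool:
    """Return False when a 'drop' rule would discard the sample."""
    # Prometheus anchors relabel regexes at both ends, so fullmatch
    # (not search) mirrors that behaviour.
    return DROP_PATTERN.fullmatch(metric_name) is None


def filter_scrape(samples):
    """Keep only the (name, value) pairs that survive the drop rule."""
    return [(name, value) for name, value in samples if keep_sample(name)]


# Example scrape: two always-zero network metrics and one useful metric.
scraped = [
    ("container_network_tcp_usage_total", 0.0),
    ("container_network_udp_usage_total", 0.0),
    ("container_cpu_usage_seconds_total", 12.5),
]
kept = filter_scrape(scraped)
```

Because the match is anchored, a name that merely contains the pattern as a prefix is kept. Samples dropped this way are never stored, which is why this reduces series count rather than just hiding the series at query time.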

@metalmatze @mxinden @squat @s-urbaniak

@s-urbaniak

LGTM 👌

@brancz brancz merged commit 7b73aa0 into prometheus-operator:master Feb 6, 2019
@brancz brancz deleted the reduce-metrics branch February 6, 2019 15:02
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request Feb 11, 2019
- Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
- Remove some k8s API metrics with very high cardinality
- Drop metrics that cadvisor does not collect but exposes anyway

Upstream ref: prometheus-operator/prometheus-operator#2387
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request Apr 10, 2019
- Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
- Remove some k8s API metrics with very high cardinality
- Drop metrics that cadvisor does not collect but exposes anyway

Upstream ref: prometheus-operator/prometheus-operator#2387
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request Apr 10, 2019
- Replace hard-coded roles with a ClusterRole for now
  * We might want to revisit this if multi-tenancy issues become relevant
- Replaced some files with Ansible templated files so we can set prometheus instance name from ansible var
- Convert Prometheus manifests to Ansible j2 templates and some tweaks
  * Define Prometheus instance name and Prometheus server version as Ansible vars
  * Reduce to 1 replica of Prometheus
  * Adjust requests and limits
- Remove hard-coded serverName for API servers
  * If we want to limit to servername we need to template this somehow
  * Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
- Add Ansible role for prometheus_operator
  * Generates the files only for operator and prometheus yet, no apply
- Add prometheus operator namespace
- Network policy for Prometheus
- Mount kubelet CA cert and use it in the service monitoring of kubelet /metrics and /metrics/cadvisor
- rules
  * Fix hard-coded prometheus job name in alerts
  * Add namespace to a bunch of prometheus rules, synced with upstream
- Prometheus metrics scraping adjustments
  * Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
  * Remove some k8s API metrics with very high cardinality
  * Drop metrics that cadvisor does not collect but exposes anyway
    Upstream ref: prometheus-operator/prometheus-operator#2387
- Make Prometheus storage persistent and bump it to 100Gi
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request Apr 10, 2019
- Replace hard-coded roles with a ClusterRole for now
  * We might want to revisit this if multi-tenancy issues become relevant
- Replaced some files with Ansible templated files so we can set prometheus instance name from ansible var
- Convert Prometheus manifests to Ansible j2 templates and some tweaks
  * Define Prometheus instance name and Prometheus server version as Ansible vars
  * Reduce to 1 replica of Prometheus
  * Adjust requests and limits
- Remove hard-coded serverName for API servers
  * If we want to limit to servername we need to template this somehow
  * Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
- Add Ansible role for prometheus_operator
  * Generates the files only for operator and prometheus yet, no apply
- Add prometheus operator namespace
- Network policy for Prometheus
- Mount kubelet CA cert and use it in the service monitoring of kubelet /metrics and /metrics/cadvisor
- rules
  * Fix hard-coded prometheus job name in alerts
  * Add namespace to a bunch of prometheus rules, synced with upstream
- Prometheus metrics scraping adjustments
  * Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
  * Remove some k8s API metrics with very high cardinality
  * Drop metrics that cadvisor does not collect but exposes anyway
    Upstream ref: prometheus-operator/prometheus-operator#2387
- Make Prometheus storage persistent and bump it to 100Gi
- Bump Prometheus retention from default 24 hours to one month
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request Apr 11, 2019
- Replace hard-coded roles with a ClusterRole for now
  * We might want to revisit this if multi-tenancy issues become relevant
- Replaced some files with Ansible templated files so we can set prometheus instance name from ansible var
- Convert Prometheus manifests to Ansible j2 templates and some tweaks
  * Define Prometheus instance name and Prometheus server version as Ansible vars
  * Reduce to 1 replica of Prometheus
  * Adjust requests and limits
- Remove hard-coded serverName for API servers
  * If we want to limit to servername we need to template this somehow
  * Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
- Add Ansible role for prometheus_operator
  * Generates the files only for operator and prometheus yet, no apply
- Add prometheus operator namespace
- Network policy for Prometheus
- Mount kubelet CA cert and use it in the service monitoring of kubelet /metrics and /metrics/cadvisor
- rules
  * Fix hard-coded prometheus job name in alerts
  * Add namespace to a bunch of prometheus rules, synced with upstream
- Prometheus metrics scraping adjustments
  * Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
  * Remove some k8s API metrics with very high cardinality
  * Drop metrics that cadvisor does not collect but exposes anyway
    Upstream ref: prometheus-operator/prometheus-operator#2387
- Make Prometheus storage persistent and bump it to 100Gi
- Bump Prometheus retention from default 24 hours to one month