Reduce metrics #2387
Merged
LGTM 👌
s-urbaniak approved these changes on Feb 6, 2019
dghubble added a commit to poseidon/typhoon that referenced this pull request on Feb 11, 2019
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request on Feb 11, 2019
- Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
- Remove some k8s API metrics with very high cardinality
- Drop metrics that cadvisor does not collect, but exposes anyway

Upstream ref: prometheus-operator/prometheus-operator#2387
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request on Apr 10, 2019
- Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
- Remove some k8s API metrics with very high cardinality
- Drop metrics that cadvisor does not collect, but exposes anyway

Upstream ref: prometheus-operator/prometheus-operator#2387
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request on Apr 10, 2019
- Replace hard-coded roles with a ClusterRole for now
  * We might want to revisit this if multi-tenancy issues become relevant
- Replaced some files with Ansible templated files so we can set prometheus instance name from ansible var
- Convert Prometheus manifests to Ansible j2 templates and some tweaks
  * Define Prometheus instance name and Prometheus server version as Ansible vars
  * Reduce to 1 replica of Prometheus
  * Adjust requests and limits
- Remove hard-coded serverName for API servers
  * If we want to limit to servername we need to template this somehow
  * Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
- Add Ansible role for prometheus_operator
  * Generates the files only for operator and prometheus yet, no apply
- Add prometheus operator namespace
- Network policy for Prometheus
- Mount kubelet CA cert and use it in the service monitoring of kubelet /metrics and /metrics/cadvisor
- rules
  * Fix hard-coded prometheus job name in alerts
  * Add namespace to a bunch of prometheus rules, synced with upstream
- Prometheus metrics scraping adjustments
  * Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
  * Remove some k8s API metrics with very high cardinality
  * Drop metrics that cadvisor does not collect, but exposes anyway
  Upstream ref: prometheus-operator/prometheus-operator#2387
- Make Prometheus storage persistent and bump it to 100Gi
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request on Apr 10, 2019
- Replace hard-coded roles with a ClusterRole for now
  * We might want to revisit this if multi-tenancy issues become relevant
- Replaced some files with Ansible templated files so we can set prometheus instance name from ansible var
- Convert Prometheus manifests to Ansible j2 templates and some tweaks
  * Define Prometheus instance name and Prometheus server version as Ansible vars
  * Reduce to 1 replica of Prometheus
  * Adjust requests and limits
- Remove hard-coded serverName for API servers
  * If we want to limit to servername we need to template this somehow
  * Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
- Add Ansible role for prometheus_operator
  * Generates the files only for operator and prometheus yet, no apply
- Add prometheus operator namespace
- Network policy for Prometheus
- Mount kubelet CA cert and use it in the service monitoring of kubelet /metrics and /metrics/cadvisor
- rules
  * Fix hard-coded prometheus job name in alerts
  * Add namespace to a bunch of prometheus rules, synced with upstream
- Prometheus metrics scraping adjustments
  * Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
  * Remove some k8s API metrics with very high cardinality
  * Drop metrics that cadvisor does not collect, but exposes anyway
  Upstream ref: prometheus-operator/prometheus-operator#2387
- Make Prometheus storage persistent and bump it to 100Gi
- Bump Prometheus retention from default 24 hours to one month
haskjold pushed a commit to Uninett/kubernetes-terraform that referenced this pull request on Apr 11, 2019
- Replace hard-coded roles with a ClusterRole for now
  * We might want to revisit this if multi-tenancy issues become relevant
- Replaced some files with Ansible templated files so we can set prometheus instance name from ansible var
- Convert Prometheus manifests to Ansible j2 templates and some tweaks
  * Define Prometheus instance name and Prometheus server version as Ansible vars
  * Reduce to 1 replica of Prometheus
  * Adjust requests and limits
- Remove hard-coded serverName for API servers
  * If we want to limit to servername we need to template this somehow
  * Ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#tlsconfig
- Add Ansible role for prometheus_operator
  * Generates the files only for operator and prometheus yet, no apply
- Add prometheus operator namespace
- Network policy for Prometheus
- Mount kubelet CA cert and use it in the service monitoring of kubelet /metrics and /metrics/cadvisor
- rules
  * Fix hard-coded prometheus job name in alerts
  * Add namespace to a bunch of prometheus rules, synced with upstream
- Prometheus metrics scraping adjustments
  * Adjust network metrics to filter on veth* rather than hard-coded 'eth0'
  * Remove some k8s API metrics with very high cardinality
  * Drop metrics that cadvisor does not collect, but exposes anyway
  Upstream ref: prometheus-operator/prometheus-operator#2387
- Make Prometheus storage persistent and bump it to 100Gi
- Bump Prometheus retention from default 24 hours to one month
As described in google/cadvisor#1925, the container network metrics are disabled, so they are always set to 0 and cause more confusion than benefit. This PR drops them at ingestion time, and also drops unnecessary high-cardinality metrics from the Kubernetes API.
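
To make "dropping at ingestion time" concrete, here is a minimal Python sketch of what a Prometheus metric relabeling rule with a drop action does to scraped samples. It is not the change in this PR: the pattern `container_network_(tcp|udp)_usage_total` is an assumed, illustrative choice, and `apply_drop` and `DROP_PATTERN` are hypothetical names introduced only for this example.

```python
import re

# Assumed, illustrative pattern; the actual list dropped by this PR may differ.
DROP_PATTERN = r"container_network_(tcp|udp)_usage_total"


def apply_drop(samples, pattern=DROP_PATTERN):
    """Mimic a relabeling rule with a drop action keyed on the metric name.

    Prometheus anchors relabeling regexes at both ends, so fullmatch is
    used here rather than search.
    """
    regex = re.compile(pattern)
    return [s for s in samples if not regex.fullmatch(s["__name__"])]


# Hypothetical scraped samples: two always-zero series and one useful series.
samples = [
    {"__name__": "container_network_tcp_usage_total", "value": 0.0},
    {"__name__": "container_network_udp_usage_total", "value": 0.0},
    {"__name__": "container_network_receive_bytes_total", "value": 1234.0},
]

kept = apply_drop(samples)
print([s["__name__"] for s in kept])
```

Because the samples are filtered before storage, the dropped series never reach the TSDB, which is what distinguishes this approach from filtering at query time.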
@metalmatze @mxinden @squat @s-urbaniak