Commit
Proper aggregation of istio metrics using prometheus federation (kyma-project#9700)

* switch to federation for istio metrics by adding a second prometheus
rakesh-garimella authored and dbadura committed Oct 19, 2020
1 parent f58f4c6 commit 182dc1c
Showing 38 changed files with 2,205 additions and 2,467 deletions.
5 changes: 3 additions & 2 deletions docs/monitoring/05-03-prometheus.md
@@ -15,5 +15,6 @@ This table lists the configurable parameters, their descriptions, and default va

| Parameter | Description | Default value |
|-----------|-------------|---------------|
| **retention** | Specifies a period for which Prometheus stores the metrics in-memory. This retention time applies to in-memory storage only. Prometheus stores the recent data in-memory for the specified amount of time to avoid reading the entire data from disk.| `2h` |
| **storageSpec.volumeClaimTemplate.spec.resources.requests.storage** | Specifies the size of a Persistent Volume Claim (PVC). | `4Gi` |
| **prometheusSpec.retention** | Specifies a period for which Prometheus stores the metrics.| `1d` |
| **prometheusSpec.retentionSize** | Specifies the maximum number of bytes that storage blocks can use. The oldest data will be removed first.| `2GB` |
| **prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage** | Specifies the size of a PersistentVolumeClaim (PVC). | `10Gi` |
17 changes: 8 additions & 9 deletions docs/monitoring/05-04-production-profile.md
@@ -27,15 +27,15 @@ The table shows the parameters of each profile and their values:

Parameter | Description | Default profile| Production profile | Local profile|
|-----------|-------------|----------------|--------------------|--------------|
| **retentionSize** | Maximum number of bytes that storage blocks can use. The oldest data will be removed first. | `2GB` | `15GB` | `500MB` |
| **retention** | Time period for which Prometheus stores metrics in an in-memory database. Prometheus stores the recent data for the specified amount of time to avoid reading all data from the disk. This parameter only applies to in-memory storage.|`1d`| `30d` | `2h`|
| **prometheusSpec.volumeClaimTemplate.spec.resources.requests.storage** | Amount of storage requested by the Prometheus Pod. |`10Gi`| `20Gi` | `1Gi` |
| **prometheusSpec.resources.limits.cpu** | Maximum number of CPUs available for the Prometheus Pod to use. | `600m`| `1` | `150m`|
| **prometheusSpec.resources.limits.memory** | Maximum amount of memory available for the Prometheus Pod to use. |`1500Mi` | `3Gi` |`800Mi`|
| **prometheusSpec.resources.requests.cpu** | Number of CPUs requested by the Prometheus Pod to operate.| `300m`| `300m` | `100m` |
| **prometheusSpec.resources.requests.memory** | Amount of memory requested by the Prometheus Pod to operate. | `1000Mi`| `1Gi` | `200Mi` |
| **prometheus.prometheusSpec.retentionSize** | Maximum number of bytes that storage blocks can use. The oldest data will be removed first. | `2GB` | `15GB` | `256MB` |
| **prometheus.prometheusSpec.retention** | Time period for which Prometheus stores the metrics. |`1d`| `30d` | `2h`|
| **prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage** | Amount of storage requested by the Prometheus Pod. |`10Gi`| `20Gi` | `1Gi` |
| **prometheus.prometheusSpec.resources.limits.cpu** | Maximum number of CPUs available for the Prometheus Pod to use. | `600m`| `1` | `150m`|
| **prometheus.prometheusSpec.resources.limits.memory** | Maximum amount of memory available for the Prometheus Pod to use. |`2Gi` | `3Gi` |`800Mi`|
| **prometheus.prometheusSpec.resources.requests.cpu** | Number of CPUs requested by the Prometheus Pod to operate.| `200m`| `300m` | `100m` |
| **prometheus.prometheusSpec.resources.requests.memory** | Amount of memory requested by the Prometheus Pod to operate. | `600Mi`| `1Gi` | `200Mi` |
| **alertmanager.alertmanagerSpec.retention** | Time period for which Alertmanager retains data.| `120h` | `240h` | `1h` |
| **grafana.persistence.enabled**| Storing grafana database on a PersistentVolume?|`true`|`true`|`false`|
| **grafana.persistence.enabled**| Parameter that enables storing Grafana database on a PersistentVolume |`true`|`true`|`false`|
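For orientation, the production column of the table above can be read as a nested set of chart values. The fragment below is only a sketch, assuming the dotted parameter names map one-to-one onto nested Helm values; the repository itself applies these settings through dotted keys in an override ConfigMap, so treat the layout as illustrative rather than as the project's own file.

```yaml
# Sketch only: production-profile values from the table, written as nested Helm values.
# The nesting is an assumption derived from the dotted parameter names.
prometheus:
  prometheusSpec:
    retention: 30d          # time period for which Prometheus stores the metrics
    retentionSize: 15GB     # oldest data is removed first once this size is reached
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 20Gi
    resources:
      limits:
        cpu: "1"
        memory: 3Gi
      requests:
        cpu: 300m
        memory: 1Gi
alertmanager:
  alertmanagerSpec:
    retention: 240h
grafana:
  persistence:
    enabled: true
```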

## Use profiles

@@ -113,4 +113,3 @@ You can deploy a Kyma cluster with Monitoring configured to use the production p
</details>
</div>
@@ -69,6 +69,7 @@ data:
prometheus.prometheusSpec.resources.limits.memory: "3Gi"
prometheus.prometheusSpec.resources.requests.cpu: "300m"
prometheus.prometheusSpec.resources.requests.memory: "1Gi"
prometheusIstio.server.resources.limits.memory: "3Gi"
alertmanager.alertmanagerSpec.retention: "240h"

---
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -1236,3 +1237,4 @@ data:
"uid": "G8wLrJIZk",
"version": 5
}
{{- end }}
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -1833,3 +1834,4 @@ data:
"uid": "vu8e0VWZk",
"version": 22
}
{{- end }}
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -1601,4 +1602,5 @@ data:
"title": "Istio / Pilot",
"uid": "3--MLVZZk",
"version": 11
}
}
{{- end }}
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -2612,3 +2613,4 @@ data:
"uid": "LJ_uJAvmk",
"version": 1
}
{{- end }}
@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.dashboards.enabled) }}
apiVersion: v1
kind: ConfigMap
metadata:
@@ -2314,3 +2315,4 @@ data:
"uid": "UbsSZTDik",
"version": 1
}
{{- end }}

This file was deleted.

@@ -1,3 +1,4 @@
{{- if (and .Values.monitoring.enabled .Values.monitoring.istioServiceMonitor.enabled) }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
@@ -20,7 +21,9 @@ spec:
- istio-system
endpoints:
- path: /metrics
interval: 30s
{{- if .Values.monitoring.istioServiceMonitor.scrapeInterval }}
interval: {{ .Values.monitoring.istioServiceMonitor.scrapeInterval }}
{{- end }}
relabelings:
- sourceLabels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
separator: ;
@@ -31,4 +34,4 @@ spec:
- sourceLabels: [ __name__ ]
regex: ^(envoy_cluster_upstream_cx_active|envoy_cluster_upstream_cx_connect_fail|envoy_cluster_upstream_cx_rx_bytes_total|envoy_cluster_upstream_cx_total|envoy_cluster_upstream_cx_tx_bytes_total|envoy_server_hot_restart_epoch|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|grpc_io_server_completed_rpcs|grpc_io_server_server_latency_bucket|istio_build|istio_mcp_request_acks_total|istio_mcp_request_nacks_total|mixer_runtime_dispatch_duration_seconds_bucket|mixer_runtime_dispatches_total|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|istio_build|istio_mcp_request_acks_total|pilot_conflict_inbound_listener|pilot_conflict_outbound_listener_http_over_current_tcp|pilot_conflict_outbound_listener_tcp_over_current_http|pilot_conflict_outbound_listener_tcp_over_current_tcp|pilot_proxy_convergence_time_bucket|pilot_services|pilot_virt_services|pilot_xds_push_context_errors|pilot_total_xds_rejects|pilot_total_xds_internal_errors|pilot_xds_write_timeout|pilot_xds_lds_reject|pilot_xds_rds_reject|pilot_xds_push_timeout_failures|pilot_xds_eds_instances|pilot_xds_eds_reject|pilot_xds|pilot_xds_push_timeout|pilot_xds_push_errors|pilot_xds_cds_reject|pilot_xds_pushes|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|galley_istio_authentication_meshpolicies|galley_istio_networking_destinationrules|galley_istio_networking_gateways|galley_istio_networking_virtualservices|galley_runtime_processor_events_processed_total|galley_runtime_processor_snapshot_events_total_bucket|galley_runtime_processor_snapshots_published_total|galley_runtime_state_type_instances_total|galley_runtime_strategy_on_change_total|galley_runtime_strategy_timer_max_time_reached_total|galley_runtime_strategy_timer_quiesce_reached_total|galley_runtime_strategy_timer_resets_total|galley_source_kube_dynamic_converter_failure_total|galley_source_kube_dynamic_converter_success_total|galley_source_kube_event_error_total|galley_source_kube_event_success_total|galley_validation_http_error|galley_validation_cert_key_update_errors|galley_validation_cert_key_updates|galley_validation_passed|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|istio_build|istio_mcp_clients_total|istio_mcp_request_acks_total|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|citadel_secret_controller_secret_deleted_cert_count|citadel_secret_controller_svc_acc_deleted_cert_count|citadel_secret_controller_svc_acc_created_cert_count|citadel_server_authentication_failure_count|citadel_server_csr_count|citadel_secret_controller_csr_err_count|citadel_server_csr_parsing_err_count|citadel_server_success_cert_issuance_count|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|grpc_server_handled_total|grpc_server_handling_seconds_bucket|grpc_server_started_total|istio_build|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|secret_deleted_cert_count|svc_acc_created_cert_count|svc_acc_deleted_cert_count|go_goroutines|go_memstats_alloc_bytes|go_memstats_heap_alloc_bytes|go_memstats_heap_inuse_bytes|go_memstats_heap_sys_bytes|go_memstats_stack_inuse_bytes|grpc_io_server_completed_rpcs|grpc_io_server_server_latency_bucket|istio_build|istio_mcp_request_acks_total|mixer_runtime_dispatch_duration_seconds_bucket|mixer_runtime_dispatches_total|process_cpu_seconds_total|process_max_fds|process_open_fds|process_resident_memory_bytes|process_start_time_seconds|process_virtual_memory_bytes|istio_request_bytes_bucket|istio_request_bytes_sum|istio_request_duration_milliseconds_bucket|istio_requests_total|istio_response_bytes_bucket|istio_response_bytes_sum|istio_tcp_received_bytes_total|istio_tcp_sent_bytes_total)$
action: keep
{{- end }}
8 changes: 8 additions & 0 deletions resources/istio/values.yaml
@@ -17,3 +17,11 @@ istio:
installer:
image: eu.gcr.io/kyma-project/istio-installer
tag: 19e240cd

monitoring:
enabled: true
dashboards:
enabled: true
istioServiceMonitor:
enabled: true
scrapeInterval: ""
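
The new `monitoring.istioServiceMonitor.scrapeInterval` value drives the conditional `interval` field in the ServiceMonitor template changed in this commit. With the empty default, the field is not rendered and Prometheus falls back to its global scrape interval. A minimal sketch of an override that restores the previously hard-coded `30s` might look like this; the override file itself is hypothetical:

```yaml
# Hypothetical values override for resources/istio (sketch, not part of this commit).
monitoring:
  enabled: true
  dashboards:
    enabled: true
  istioServiceMonitor:
    enabled: true
    # Non-empty, so the template renders `interval: 30s` on the /metrics endpoint.
    scrapeInterval: "30s"
```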
5 changes: 5 additions & 0 deletions resources/monitoring/charts/prometheus-istio/Chart.yaml
@@ -0,0 +1,5 @@
apiVersion: v1
name: prometheus-istio
version: 11.16.2
appVersion: 2.21.0
description: Prometheus is a monitoring system and time series database.
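
The federation wiring itself is not visible in the files shown here. As a rough illustration of what "federation" means in Prometheus terms, the main server would scrape the `/federate` endpoint of the second (Istio) Prometheus and pull only selected series. The sketch below uses standard Prometheus scrape configuration; the job name, the `match[]` selector, and the target address are all assumptions, and the commit may express the same thing through a different mechanism (for example a ServiceMonitor).

```yaml
# Sketch of a federation scrape job on the main Prometheus (all names are hypothetical).
scrape_configs:
  - job_name: istio-federation            # hypothetical job name
    honor_labels: true                    # keep labels as exposed by the source server
    metrics_path: /federate               # Prometheus federation endpoint
    params:
      'match[]':
        - '{__name__=~"istio_.*"}'        # hypothetical selector for the series to pull
    static_configs:
      - targets:
          - prometheus-istio-server.example-namespace:80   # hypothetical Service address
```

Pulling only aggregated or selected series through `match[]` keeps the high-cardinality raw Istio metrics in the second server and out of the main one.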
