From 0999e493e32de5777942b10e61aa9903317b1846 Mon Sep 17 00:00:00 2001
From: Vijit Singhal
Date: Mon, 10 Feb 2020 17:09:09 -0800
Subject: [PATCH 1/6] add common issues in troubleshooting doc

---
 deploy/docs/Troubleshoot_Collection.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/deploy/docs/Troubleshoot_Collection.md b/deploy/docs/Troubleshoot_Collection.md
index 7a4c3df525..69baffa269 100644
--- a/deploy/docs/Troubleshoot_Collection.md
+++ b/deploy/docs/Troubleshoot_Collection.md
@@ -257,6 +257,16 @@ helm install stable/prometheus-operator --name prometheus-operator --namespace s
 There’s an issue with backwards compatibility in the current version of the prometheus-operator helm chart that requires us to override the selectors for kube-scheduler and kube-controller-manager in order to see metrics from them. If you are not seeing metrics from these two targets, try running the commands in the "Configure Prometheus" section [here](./Non_Helm_Installation.md#missing-metrics-for-controller-manager-or-scheduler).
 
+### Prometheus stuck in `Terminating` state after running `helm del collection`
+Delete the pod forcefully by adding `--force --grace-period=0` to the `kubectl delete pod` command.
+
+
+### Validation error in helm installation
+``` bash
+Error: validation failed: [unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1"
+```
+This is a known race condition with Helm and there is no permanent fix at this time. If this happens, just re-run the `helm install` command with the `--no-crd-hook` flag added.
+
 ### Rancher
 
 If you are running the out of the box rancher monitoring setup, you cannot run our Prometheus operator alongside it. The Rancher Prometheus Operator setup will actually kill and permanently terminate our Prometheus Operator instance and will prevent the metrics system from coming up.
 

From cd3b2da315cbccf83d82e74c7b883131ff2bc26c Mon Sep 17 00:00:00 2001
From: Vijit Singhal
Date: Mon, 10 Feb 2020 17:09:28 -0800
Subject: [PATCH 2/6] add common tasks in the best practices doc

---
 deploy/docs/Best_Practices.md | 43 +++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/deploy/docs/Best_Practices.md b/deploy/docs/Best_Practices.md
index 6e88390f91..b4b9e3384c 100644
--- a/deploy/docs/Best_Practices.md
+++ b/deploy/docs/Best_Practices.md
@@ -85,3 +85,46 @@ $ helm upgrade collection sumologic/sumologic --reuse-values -f values.yaml
 See the following links to official Fluentd buffer documentation:
 - https://docs.fluentd.org/configuration/buffer-section
 - https://docs.fluentd.org/buffer/file
+
+### Excluding Logs From Specific Components
+
+You can exclude specific logs from being sent to Sumo Logic by specifying the following parameters either in the `values.yaml` file or the `helm install` command.
+```
+excludeContainerRegex
+excludeHostRegex
+excludeNamespaceRegex
+excludePodRegex
+```
+
+ - These parameters take Ruby regular expressions, so all Ruby regex rules apply. Unlike regex in the Sumo collector, you do not need to match the entire line. When specifying multiple patterns, put them inside parentheses and pipe-separate them.
+ - For things like pods and containers, you will need to use a star at the end because the string is dynamic. Example:
+```bash
+excludePodRegex: "(dashboard.*|sumologic.*)"
+```
+ - For things like namespaces, you won’t need to use a star at the end since there is no dynamic string.
+Example:
+```bash
+excludeNamespaceRegex: "(sumologic|kube-public)"
+```
+
+### Add a local file to fluent-bit configuration
+
+If you want to capture logs from a container that writes them to a local file, you will need to ensure the logs get mounted to the host so fluent-bit can be configured to capture them from the host.
+
+Example: In the fluent-bit overrides file (https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/master/deploy/fluent-bit/overrides.yaml), in the `rawConfig` section, you have to add a new input specifying the file path, e.g.
+
+```bash
+[INPUT]
+    Name    tail
+    Path    /var/log/syslog
+```
+Reference: https://fluentbit.io/documentation/0.12/input/tail.html
+
+### Filtering Prometheus Metrics by Namespace in the Remote Write Config
+If you want to filter metrics by namespace, you can do so in the Prometheus remote write config. Here is an example of dropping kube-state-metrics for specific namespaces.
+```bash
+    - action: drop
+      regex: kube-state-metrics;(namespace1|namespace2)
+      sourceLabels: [job, namespace]
+```
+The above section should be added to each of the kube-state remote write blocks.
\ No newline at end of file

From 79391d576c0b0ac166650f0ffee829829cffab2b Mon Sep 17 00:00:00 2001 From: Travis CI Date: Tue, 11 Feb 2020 01:16:18 +0000 Subject: [PATCH 3/6] Generate new overrides yaml/libsonnet file(s). --- ...kube-prometheus-sumo-logic-mixin.libsonnet | 144 +----------------- 1 file changed, 1 insertion(+), 143 deletions(-) diff --git a/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet b/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet index 22b0fa582c..7e28cc3bfe 100644 --- a/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet +++ b/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet @@ -4,149 +4,7 @@ clusterName: "kubernetes" }, sumologicCollector:: { - remoteWriteConfigs+: [ - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.state", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kube-state-metrics;(?:kube_statefulset_status_observed_generation|kube_statefulset_status_replicas|kube_statefulset_replicas|kube_statefulset_metadata_generation|kube_daemonset_status_current_number_scheduled|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_number_misscheduled|kube_daemonset_status_number_unavailable|kube_daemonset_metadata_generation|kube_deployment_metadata_generation|kube_deployment_spec_paused|kube_deployment_spec_replicas|kube_deployment_spec_strategy_rollingupdate_max_unavailable|kube_deployment_status_replicas_available|kube_deployment_status_observed_generation|kube_deployment_status_replicas_unavailable|kube_node_info|kube_node_spec_unschedulable|kube_node_status_allocatable|kube_node_status_capacity|kube_node_status_condition|kube_pod_container_info|kube_pod_container_resource_requests|kube_pod_container_resource_limits|kube_pod_container_status_ready|kube_pod_container_status_terminated_reason|kube_pod_container_status_waiting_reason|kube_pod_container_status_restarts_total|kube_pod_status_phase)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.controller-manager", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kubelet;cloudprovider_.*_api_request_duration_seconds.*", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.scheduler", - writeRelabelConfigs: [ - { - action: "keep", - regex: 
"kube-scheduler;scheduler_(?:e2e_scheduling|binding|scheduling_algorithm)_latency_microseconds.*", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.apiserver", - writeRelabelConfigs: [ - { - action: "keep", - regex: "apiserver;(?:apiserver_request_count|apiserver_request_latencies.*|etcd_request_cache_get_latencies_summary.*|etcd_request_cache_add_latencies_summary.*|etcd_helper_cache_hit_count|etcd_helper_cache_miss_count)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.kubelet", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kubelet;(?:kubelet_docker_operations_errors|kubelet_docker_operations_latency_microseconds|kubelet_running_container_count|kubelet_running_pod_count|kubelet_runtime_operations_latency_microseconds.*)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.container", - writeRelabelConfigs: [ - { - action: "labelmap", - regex: "container_name", - replacement: "container" - }, - { - action: "drop", - regex: "POD", - sourceLabels: [ - "container" - ] - }, - { - action: "keep", - regex: "kubelet;.+;(?:container_cpu_load_average_10s|container_cpu_system_seconds_total|container_cpu_usage_seconds_total|container_cpu_cfs_throttled_seconds_total|container_memory_usage_bytes|container_memory_swap|container_memory_working_set_bytes|container_spec_memory_limit_bytes|container_spec_memory_swap_limit_bytes|container_spec_memory_reservation_limit_bytes|container_spec_cpu_quota|container_spec_cpu_period|container_fs_usage_bytes|container_fs_limit_bytes|container_fs_reads_bytes_total|container_fs_writes_bytes_total|)", - sourceLabels: [ - "job", - "container", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.container", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kubelet;(?:container_network_receive_bytes_total|container_network_transmit_bytes_total|container_network_receive_errors_total|container_network_transmit_errors_total|container_network_receive_packets_dropped_total|container_network_transmit_packets_dropped_total)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.node", - writeRelabelConfigs: [ - { - action: "keep", - regex: "node-exporter;(?:node_load1|node_load5|node_load15|node_cpu_seconds_total|node_memory_MemAvailable_bytes|node_memory_MemTotal_bytes|node_memory_Buffers_bytes|node_memory_SwapCached_bytes|node_memory_Cached_bytes|node_memory_MemFree_bytes|node_memory_SwapFree_bytes|node_ipvs_incoming_bytes_total|node_ipvs_outgoing_bytes_total|node_ipvs_incoming_packets_total|node_ipvs_outgoing_packets_total|node_disk_reads_completed_total|node_disk_writes_completed_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_free_bytes|node_filesystem_size_bytes|node_filesystem_files)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.operator.rule", - writeRelabelConfigs: [ - { - action: "keep", - regex: 
"cluster_quantile:apiserver_request_latencies:histogram_quantile|instance:node_cpu:rate:sum|instance:node_filesystem_usage:sum|instance:node_network_receive_bytes:rate:sum|instance:node_network_transmit_bytes:rate:sum|instance:node_cpu:ratio|cluster:node_cpu:sum_rate5m|cluster:node_cpu:ratio|cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile|cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile|cluster_quantile:scheduler_binding_latency:histogram_quantile|node_namespace_pod:kube_pod_info:|:kube_pod_info_node_count:|node:node_num_cpu:sum|:node_cpu_utilisation:avg1m|node:node_cpu_utilisation:avg1m|node:cluster_cpu_utilisation:ratio|:node_cpu_saturation_load1:|node:node_cpu_saturation_load1:|:node_memory_utilisation:|:node_memory_MemFreeCachedBuffers_bytes:sum|:node_memory_MemTotal_bytes:sum|node:node_memory_bytes_available:sum|node:node_memory_bytes_total:sum|node:node_memory_utilisation:ratio|node:cluster_memory_utilisation:ratio|:node_memory_swap_io_bytes:sum_rate|node:node_memory_utilisation:|node:node_memory_utilisation_2:|node:node_memory_swap_io_bytes:sum_rate|:node_disk_utilisation:avg_irate|node:node_disk_utilisation:avg_irate|:node_disk_saturation:avg_irate|node:node_disk_saturation:avg_irate|node:node_filesystem_usage:|node:node_filesystem_avail:|:node_net_utilisation:sum_irate|node:node_net_utilisation:sum_irate|:node_net_saturation:sum_irate|node:node_net_saturation:sum_irate|node:node_inodes_total:|node:node_inodes_free:", - sourceLabels: [ - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics", - writeRelabelConfigs: [ - { - action: "keep", - regex: "(?:up|prometheus_remote_storage_.*|fluentd_.*|fluentbit.*)", - sourceLabels: [ - "__name__" - ] - } - ] - } - ], + remoteWriteConfigs+: , }, prometheus+:: { prometheus+: { From e551db9e38e5a0d284e3ef2355e85737a0cd4120 Mon Sep 17 00:00:00 2001 From: Vijit Singhal <56007827+vsinghal13@users.noreply.github.com> Date: Tue, 11 Feb 2020 10:39:32 -0800 Subject: [PATCH 4/6] Update kube-prometheus-sumo-logic-mixin.libsonnet --- ...kube-prometheus-sumo-logic-mixin.libsonnet | 144 +++++++++++++++++- 1 file changed, 143 insertions(+), 1 deletion(-) diff --git a/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet b/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet index 7e28cc3bfe..22b0fa582c 100644 --- a/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet +++ b/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet @@ -4,7 +4,149 @@ clusterName: "kubernetes" }, sumologicCollector:: { - remoteWriteConfigs+: , + remoteWriteConfigs+: [ + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.state", + writeRelabelConfigs: [ + { + action: "keep", + regex: 
"kube-state-metrics;(?:kube_statefulset_status_observed_generation|kube_statefulset_status_replicas|kube_statefulset_replicas|kube_statefulset_metadata_generation|kube_daemonset_status_current_number_scheduled|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_number_misscheduled|kube_daemonset_status_number_unavailable|kube_daemonset_metadata_generation|kube_deployment_metadata_generation|kube_deployment_spec_paused|kube_deployment_spec_replicas|kube_deployment_spec_strategy_rollingupdate_max_unavailable|kube_deployment_status_replicas_available|kube_deployment_status_observed_generation|kube_deployment_status_replicas_unavailable|kube_node_info|kube_node_spec_unschedulable|kube_node_status_allocatable|kube_node_status_capacity|kube_node_status_condition|kube_pod_container_info|kube_pod_container_resource_requests|kube_pod_container_resource_limits|kube_pod_container_status_ready|kube_pod_container_status_terminated_reason|kube_pod_container_status_waiting_reason|kube_pod_container_status_restarts_total|kube_pod_status_phase)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.controller-manager", + writeRelabelConfigs: [ + { + action: "keep", + regex: "kubelet;cloudprovider_.*_api_request_duration_seconds.*", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.scheduler", + writeRelabelConfigs: [ + { + action: "keep", + regex: "kube-scheduler;scheduler_(?:e2e_scheduling|binding|scheduling_algorithm)_latency_microseconds.*", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.apiserver", + writeRelabelConfigs: [ + { + action: "keep", + regex: "apiserver;(?:apiserver_request_count|apiserver_request_latencies.*|etcd_request_cache_get_latencies_summary.*|etcd_request_cache_add_latencies_summary.*|etcd_helper_cache_hit_count|etcd_helper_cache_miss_count)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.kubelet", + writeRelabelConfigs: [ + { + action: "keep", + regex: "kubelet;(?:kubelet_docker_operations_errors|kubelet_docker_operations_latency_microseconds|kubelet_running_container_count|kubelet_running_pod_count|kubelet_runtime_operations_latency_microseconds.*)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.container", + writeRelabelConfigs: [ + { + action: "labelmap", + regex: "container_name", + replacement: "container" + }, + { + action: "drop", + regex: "POD", + sourceLabels: [ + "container" + ] + }, + { + action: "keep", + regex: "kubelet;.+;(?:container_cpu_load_average_10s|container_cpu_system_seconds_total|container_cpu_usage_seconds_total|container_cpu_cfs_throttled_seconds_total|container_memory_usage_bytes|container_memory_swap|container_memory_working_set_bytes|container_spec_memory_limit_bytes|container_spec_memory_swap_limit_bytes|container_spec_memory_reservation_limit_bytes|container_spec_cpu_quota|container_spec_cpu_period|container_fs_usage_bytes|container_fs_limit_bytes|container_fs_reads_bytes_total|container_fs_writes_bytes_total|)", + sourceLabels: [ + "job", + "container", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.container", + writeRelabelConfigs: [ + { + action: "keep", + regex: 
"kubelet;(?:container_network_receive_bytes_total|container_network_transmit_bytes_total|container_network_receive_errors_total|container_network_transmit_errors_total|container_network_receive_packets_dropped_total|container_network_transmit_packets_dropped_total)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.node", + writeRelabelConfigs: [ + { + action: "keep", + regex: "node-exporter;(?:node_load1|node_load5|node_load15|node_cpu_seconds_total|node_memory_MemAvailable_bytes|node_memory_MemTotal_bytes|node_memory_Buffers_bytes|node_memory_SwapCached_bytes|node_memory_Cached_bytes|node_memory_MemFree_bytes|node_memory_SwapFree_bytes|node_ipvs_incoming_bytes_total|node_ipvs_outgoing_bytes_total|node_ipvs_incoming_packets_total|node_ipvs_outgoing_packets_total|node_disk_reads_completed_total|node_disk_writes_completed_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_free_bytes|node_filesystem_size_bytes|node_filesystem_files)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.operator.rule", + writeRelabelConfigs: [ + { + action: "keep", + regex: "cluster_quantile:apiserver_request_latencies:histogram_quantile|instance:node_cpu:rate:sum|instance:node_filesystem_usage:sum|instance:node_network_receive_bytes:rate:sum|instance:node_network_transmit_bytes:rate:sum|instance:node_cpu:ratio|cluster:node_cpu:sum_rate5m|cluster:node_cpu:ratio|cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile|cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile|cluster_quantile:scheduler_binding_latency:histogram_quantile|node_namespace_pod:kube_pod_info:|:kube_pod_info_node_count:|node:node_num_cpu:sum|:node_cpu_utilisation:avg1m|node:node_cpu_utilisation:avg1m|node:cluster_cpu_utilisation:ratio|:node_cpu_saturation_load1:|node:node_cpu_saturation_load1:|:node_memory_utilisation:|:node_memory_MemFreeCachedBuffers_bytes:sum|:node_memory_MemTotal_bytes:sum|node:node_memory_bytes_available:sum|node:node_memory_bytes_total:sum|node:node_memory_utilisation:ratio|node:cluster_memory_utilisation:ratio|:node_memory_swap_io_bytes:sum_rate|node:node_memory_utilisation:|node:node_memory_utilisation_2:|node:node_memory_swap_io_bytes:sum_rate|:node_disk_utilisation:avg_irate|node:node_disk_utilisation:avg_irate|:node_disk_saturation:avg_irate|node:node_disk_saturation:avg_irate|node:node_filesystem_usage:|node:node_filesystem_avail:|:node_net_utilisation:sum_irate|node:node_net_utilisation:sum_irate|:node_net_saturation:sum_irate|node:node_net_saturation:sum_irate|node:node_inodes_total:|node:node_inodes_free:", + sourceLabels: [ + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics", + writeRelabelConfigs: [ + { + action: "keep", + regex: "(?:up|prometheus_remote_storage_.*|fluentd_.*|fluentbit.*)", + sourceLabels: [ + "__name__" + ] + } + ] + } + ], }, prometheus+:: { prometheus+: { From cd52aeaf753b6368ac7e37b61882a83765f0a922 Mon Sep 17 00:00:00 2001 From: Travis CI Date: Tue, 11 Feb 2020 18:43:57 +0000 Subject: [PATCH 5/6] Generate new overrides yaml/libsonnet file(s). 
--- ...kube-prometheus-sumo-logic-mixin.libsonnet | 144 +----------------- 1 file changed, 1 insertion(+), 143 deletions(-) diff --git a/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet b/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet index 22b0fa582c..7e28cc3bfe 100644 --- a/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet +++ b/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet @@ -4,149 +4,7 @@ clusterName: "kubernetes" }, sumologicCollector:: { - remoteWriteConfigs+: [ - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.state", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kube-state-metrics;(?:kube_statefulset_status_observed_generation|kube_statefulset_status_replicas|kube_statefulset_replicas|kube_statefulset_metadata_generation|kube_daemonset_status_current_number_scheduled|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_number_misscheduled|kube_daemonset_status_number_unavailable|kube_daemonset_metadata_generation|kube_deployment_metadata_generation|kube_deployment_spec_paused|kube_deployment_spec_replicas|kube_deployment_spec_strategy_rollingupdate_max_unavailable|kube_deployment_status_replicas_available|kube_deployment_status_observed_generation|kube_deployment_status_replicas_unavailable|kube_node_info|kube_node_spec_unschedulable|kube_node_status_allocatable|kube_node_status_capacity|kube_node_status_condition|kube_pod_container_info|kube_pod_container_resource_requests|kube_pod_container_resource_limits|kube_pod_container_status_ready|kube_pod_container_status_terminated_reason|kube_pod_container_status_waiting_reason|kube_pod_container_status_restarts_total|kube_pod_status_phase)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.controller-manager", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kubelet;cloudprovider_.*_api_request_duration_seconds.*", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.scheduler", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kube-scheduler;scheduler_(?:e2e_scheduling|binding|scheduling_algorithm)_latency_microseconds.*", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.apiserver", - writeRelabelConfigs: [ - { - action: "keep", - regex: "apiserver;(?:apiserver_request_count|apiserver_request_latencies.*|etcd_request_cache_get_latencies_summary.*|etcd_request_cache_add_latencies_summary.*|etcd_helper_cache_hit_count|etcd_helper_cache_miss_count)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.kubelet", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kubelet;(?:kubelet_docker_operations_errors|kubelet_docker_operations_latency_microseconds|kubelet_running_container_count|kubelet_running_pod_count|kubelet_runtime_operations_latency_microseconds.*)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.container", - writeRelabelConfigs: [ - { - action: "labelmap", - regex: "container_name", - replacement: "container" - }, - { - action: "drop", - regex: "POD", - sourceLabels: [ - "container" - ] - }, - { - action: "keep", - regex: 
"kubelet;.+;(?:container_cpu_load_average_10s|container_cpu_system_seconds_total|container_cpu_usage_seconds_total|container_cpu_cfs_throttled_seconds_total|container_memory_usage_bytes|container_memory_swap|container_memory_working_set_bytes|container_spec_memory_limit_bytes|container_spec_memory_swap_limit_bytes|container_spec_memory_reservation_limit_bytes|container_spec_cpu_quota|container_spec_cpu_period|container_fs_usage_bytes|container_fs_limit_bytes|container_fs_reads_bytes_total|container_fs_writes_bytes_total|)", - sourceLabels: [ - "job", - "container", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.container", - writeRelabelConfigs: [ - { - action: "keep", - regex: "kubelet;(?:container_network_receive_bytes_total|container_network_transmit_bytes_total|container_network_receive_errors_total|container_network_transmit_errors_total|container_network_receive_packets_dropped_total|container_network_transmit_packets_dropped_total)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.node", - writeRelabelConfigs: [ - { - action: "keep", - regex: "node-exporter;(?:node_load1|node_load5|node_load15|node_cpu_seconds_total|node_memory_MemAvailable_bytes|node_memory_MemTotal_bytes|node_memory_Buffers_bytes|node_memory_SwapCached_bytes|node_memory_Cached_bytes|node_memory_MemFree_bytes|node_memory_SwapFree_bytes|node_ipvs_incoming_bytes_total|node_ipvs_outgoing_bytes_total|node_ipvs_incoming_packets_total|node_ipvs_outgoing_packets_total|node_disk_reads_completed_total|node_disk_writes_completed_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_free_bytes|node_filesystem_size_bytes|node_filesystem_files)", - sourceLabels: [ - "job", - "__name__" - ] - } - ] - }, - { - url: $._config.sumologicCollectorSvc + "prometheus.metrics.operator.rule", - writeRelabelConfigs: [ - { - action: "keep", - regex: "cluster_quantile:apiserver_request_latencies:histogram_quantile|instance:node_cpu:rate:sum|instance:node_filesystem_usage:sum|instance:node_network_receive_bytes:rate:sum|instance:node_network_transmit_bytes:rate:sum|instance:node_cpu:ratio|cluster:node_cpu:sum_rate5m|cluster:node_cpu:ratio|cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile|cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile|cluster_quantile:scheduler_binding_latency:histogram_quantile|node_namespace_pod:kube_pod_info:|:kube_pod_info_node_count:|node:node_num_cpu:sum|:node_cpu_utilisation:avg1m|node:node_cpu_utilisation:avg1m|node:cluster_cpu_utilisation:ratio|:node_cpu_saturation_load1:|node:node_cpu_saturation_load1:|:node_memory_utilisation:|:node_memory_MemFreeCachedBuffers_bytes:sum|:node_memory_MemTotal_bytes:sum|node:node_memory_bytes_available:sum|node:node_memory_bytes_total:sum|node:node_memory_utilisation:ratio|node:cluster_memory_utilisation:ratio|:node_memory_swap_io_bytes:sum_rate|node:node_memory_utilisation:|node:node_memory_utilisation_2:|node:node_memory_swap_io_bytes:sum_rate|:node_disk_utilisation:avg_irate|node:node_disk_utilisation:avg_irate|:node_disk_saturation:avg_irate|node:node_disk_saturation:avg_irate|node:node_filesystem_usage:|node:node_filesystem_avail:|:node_net_utilisation:sum_irate|node:node_net_utilisation:sum_irate|:node_net_saturation:sum_irate|node:node_net_saturation:sum_irate|node:node_inodes_total:|node:node_inodes_free:", - sourceLabels: [ - "__name__" - ] - } - ] - }, - { 
- url: $._config.sumologicCollectorSvc + "prometheus.metrics", - writeRelabelConfigs: [ - { - action: "keep", - regex: "(?:up|prometheus_remote_storage_.*|fluentd_.*|fluentbit.*)", - sourceLabels: [ - "__name__" - ] - } - ] - } - ], + remoteWriteConfigs+: , }, prometheus+:: { prometheus+: { From f85bd93196429bac5c9da74caed278fc9a963e8c Mon Sep 17 00:00:00 2001 From: Vijit Singhal <56007827+vsinghal13@users.noreply.github.com> Date: Tue, 11 Feb 2020 19:36:24 -0800 Subject: [PATCH 6/6] Update kube-prometheus-sumo-logic-mixin.libsonnet --- ...kube-prometheus-sumo-logic-mixin.libsonnet | 144 +++++++++++++++++- 1 file changed, 143 insertions(+), 1 deletion(-) diff --git a/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet b/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet index 7e28cc3bfe..22b0fa582c 100644 --- a/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet +++ b/deploy/kubernetes/kube-prometheus-sumo-logic-mixin.libsonnet @@ -4,7 +4,149 @@ clusterName: "kubernetes" }, sumologicCollector:: { - remoteWriteConfigs+: , + remoteWriteConfigs+: [ + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.state", + writeRelabelConfigs: [ + { + action: "keep", + regex: "kube-state-metrics;(?:kube_statefulset_status_observed_generation|kube_statefulset_status_replicas|kube_statefulset_replicas|kube_statefulset_metadata_generation|kube_daemonset_status_current_number_scheduled|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_number_misscheduled|kube_daemonset_status_number_unavailable|kube_daemonset_metadata_generation|kube_deployment_metadata_generation|kube_deployment_spec_paused|kube_deployment_spec_replicas|kube_deployment_spec_strategy_rollingupdate_max_unavailable|kube_deployment_status_replicas_available|kube_deployment_status_observed_generation|kube_deployment_status_replicas_unavailable|kube_node_info|kube_node_spec_unschedulable|kube_node_status_allocatable|kube_node_status_capacity|kube_node_status_condition|kube_pod_container_info|kube_pod_container_resource_requests|kube_pod_container_resource_limits|kube_pod_container_status_ready|kube_pod_container_status_terminated_reason|kube_pod_container_status_waiting_reason|kube_pod_container_status_restarts_total|kube_pod_status_phase)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.controller-manager", + writeRelabelConfigs: [ + { + action: "keep", + regex: "kubelet;cloudprovider_.*_api_request_duration_seconds.*", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.scheduler", + writeRelabelConfigs: [ + { + action: "keep", + regex: "kube-scheduler;scheduler_(?:e2e_scheduling|binding|scheduling_algorithm)_latency_microseconds.*", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.apiserver", + writeRelabelConfigs: [ + { + action: "keep", + regex: "apiserver;(?:apiserver_request_count|apiserver_request_latencies.*|etcd_request_cache_get_latencies_summary.*|etcd_request_cache_add_latencies_summary.*|etcd_helper_cache_hit_count|etcd_helper_cache_miss_count)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.kubelet", + writeRelabelConfigs: [ + { + action: "keep", + regex: 
"kubelet;(?:kubelet_docker_operations_errors|kubelet_docker_operations_latency_microseconds|kubelet_running_container_count|kubelet_running_pod_count|kubelet_runtime_operations_latency_microseconds.*)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.container", + writeRelabelConfigs: [ + { + action: "labelmap", + regex: "container_name", + replacement: "container" + }, + { + action: "drop", + regex: "POD", + sourceLabels: [ + "container" + ] + }, + { + action: "keep", + regex: "kubelet;.+;(?:container_cpu_load_average_10s|container_cpu_system_seconds_total|container_cpu_usage_seconds_total|container_cpu_cfs_throttled_seconds_total|container_memory_usage_bytes|container_memory_swap|container_memory_working_set_bytes|container_spec_memory_limit_bytes|container_spec_memory_swap_limit_bytes|container_spec_memory_reservation_limit_bytes|container_spec_cpu_quota|container_spec_cpu_period|container_fs_usage_bytes|container_fs_limit_bytes|container_fs_reads_bytes_total|container_fs_writes_bytes_total|)", + sourceLabels: [ + "job", + "container", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.container", + writeRelabelConfigs: [ + { + action: "keep", + regex: "kubelet;(?:container_network_receive_bytes_total|container_network_transmit_bytes_total|container_network_receive_errors_total|container_network_transmit_errors_total|container_network_receive_packets_dropped_total|container_network_transmit_packets_dropped_total)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.node", + writeRelabelConfigs: [ + { + action: "keep", + regex: "node-exporter;(?:node_load1|node_load5|node_load15|node_cpu_seconds_total|node_memory_MemAvailable_bytes|node_memory_MemTotal_bytes|node_memory_Buffers_bytes|node_memory_SwapCached_bytes|node_memory_Cached_bytes|node_memory_MemFree_bytes|node_memory_SwapFree_bytes|node_ipvs_incoming_bytes_total|node_ipvs_outgoing_bytes_total|node_ipvs_incoming_packets_total|node_ipvs_outgoing_packets_total|node_disk_reads_completed_total|node_disk_writes_completed_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_free_bytes|node_filesystem_size_bytes|node_filesystem_files)", + sourceLabels: [ + "job", + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics.operator.rule", + writeRelabelConfigs: [ + { + action: "keep", + regex: 
"cluster_quantile:apiserver_request_latencies:histogram_quantile|instance:node_cpu:rate:sum|instance:node_filesystem_usage:sum|instance:node_network_receive_bytes:rate:sum|instance:node_network_transmit_bytes:rate:sum|instance:node_cpu:ratio|cluster:node_cpu:sum_rate5m|cluster:node_cpu:ratio|cluster_quantile:scheduler_e2e_scheduling_latency:histogram_quantile|cluster_quantile:scheduler_scheduling_algorithm_latency:histogram_quantile|cluster_quantile:scheduler_binding_latency:histogram_quantile|node_namespace_pod:kube_pod_info:|:kube_pod_info_node_count:|node:node_num_cpu:sum|:node_cpu_utilisation:avg1m|node:node_cpu_utilisation:avg1m|node:cluster_cpu_utilisation:ratio|:node_cpu_saturation_load1:|node:node_cpu_saturation_load1:|:node_memory_utilisation:|:node_memory_MemFreeCachedBuffers_bytes:sum|:node_memory_MemTotal_bytes:sum|node:node_memory_bytes_available:sum|node:node_memory_bytes_total:sum|node:node_memory_utilisation:ratio|node:cluster_memory_utilisation:ratio|:node_memory_swap_io_bytes:sum_rate|node:node_memory_utilisation:|node:node_memory_utilisation_2:|node:node_memory_swap_io_bytes:sum_rate|:node_disk_utilisation:avg_irate|node:node_disk_utilisation:avg_irate|:node_disk_saturation:avg_irate|node:node_disk_saturation:avg_irate|node:node_filesystem_usage:|node:node_filesystem_avail:|:node_net_utilisation:sum_irate|node:node_net_utilisation:sum_irate|:node_net_saturation:sum_irate|node:node_net_saturation:sum_irate|node:node_inodes_total:|node:node_inodes_free:", + sourceLabels: [ + "__name__" + ] + } + ] + }, + { + url: $._config.sumologicCollectorSvc + "prometheus.metrics", + writeRelabelConfigs: [ + { + action: "keep", + regex: "(?:up|prometheus_remote_storage_.*|fluentd_.*|fluentbit.*)", + sourceLabels: [ + "__name__" + ] + } + ] + } + ], }, prometheus+:: { prometheus+: {