diff --git a/.changelog/3140.added.txt b/.changelog/3140.added.txt
new file mode 100644
index 0000000000..81ca362ced
--- /dev/null
+++ b/.changelog/3140.added.txt
@@ -0,0 +1 @@
+docs(prometheus): Added a section on prometheus sharding
\ No newline at end of file
diff --git a/docs/prometheus.md b/docs/prometheus.md
index a7893bcb12..d95bb3a960 100644
--- a/docs/prometheus.md
+++ b/docs/prometheus.md
@@ -20,6 +20,7 @@ installed.
 - [Using existing Kube Prometheus Stack](#using-existing-kube-prometheus-stack)
 - [Build Prometheus Configuration](#build-prometheus-configuration)
 - [Using a load balancing proxy for Prometheus remote write](#using-a-load-balancing-proxy-for-prometheus-remote-write)
+- [Horizontal Scaling (Sharding)](#horizontal-scaling-sharding)
 - [Troubleshooting](#troubleshooting)
   - [UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com"](#upgrade-failed-failed-to-create-resource-internal-error-occurred-failed-calling-webhook-prometheusrulemutatemonitoringcoreoscom)
   - [Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Prometheus.spec)](#error-unable-to-build-kubernetes-objects-from-release-manifest-error-validating--error-validating-data-validationerrorprometheusspec)
@@ -351,6 +352,29 @@ intervention to scale.
 A simpler alternative is to put a HTTP load balancer between Prometheus and the metrics metadata Service. This is enabled in `values.yaml`
 via the `sumologic.metrics.remoteWriteProxy.enabled` key.
 
+## Horizontal Scaling (Sharding)
+
+Horizontal scaling, also known as sharding, is supported by setting up a configuration parameter which allows running several prometheus
+servers in agent mode to gather your data.
+
+To define the number of shards, configure the following parameter under the `kube-prometheus-stack` subchart in the `user-values.yaml` file:
+
+```yaml
+kube-prometheus-stack:
+  prometheus:
+    prometheusSpec:
+      shards: 3
+```
+
+For configuring an existing prometheus deployment, please add the following to your `user-values.yaml` file:
+
+```yaml
+prometheusSpec:
+  shards: 3
+```
+
+**Note:** Sharding prometheus servers will cause recording rule metrics which require global aggregations (across nodes) to stop working which may also impact the Kubernetes dashboard.
+
 ## Troubleshooting
 
 ### UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com"
diff --git a/docs/troubleshoot-collection.md b/docs/troubleshoot-collection.md
index d0d4098c7d..167e157151 100644
--- a/docs/troubleshoot-collection.md
+++ b/docs/troubleshoot-collection.md
@@ -403,7 +403,8 @@ The duplicated pod deletion command is there to make sure the pod is not stuck i
 ### Out of memory (OOM) failures for Prometheus Pod
 
 If you observe that Prometheus Pod needs more and more resources (out of memory failures - OOM killed Prometheus) and you are not able to
-increase them then you may need to horizontally scale Prometheus. :construction: Add link to Prometheus sharding doc here.
+increase them then you may need to horizontally scale Prometheus. For details please refer to -
+[Prometheus sharding](./prometheus.md#horizontal-scaling-sharding).
 
 ### Prometheus: server returned HTTP status 404 Not Found: 404 page not found
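
Reviewer note on the sharding section added above: the **Note** about global aggregations may be easier to follow with a small illustration. The sketch below is not part of the patch. It assumes that shards split scrape targets with Prometheus `hashmod` relabeling on the target address (MD5 of the value, last 8 bytes taken modulo the shard count). The target addresses and the helper names `shard_for` and `partition` are hypothetical.

```python
import hashlib


def shard_for(address: str, shards: int) -> int:
    """Approximate `hashmod` relabeling (assumed behaviour): MD5 the label
    value, read the last 8 bytes as a big-endian integer, take it modulo
    the number of shards."""
    digest = hashlib.md5(address.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % shards


def partition(targets, shards):
    """Assign every target to exactly one shard."""
    assignment = {i: [] for i in range(shards)}
    for target in targets:
        assignment[shard_for(target, shards)].append(target)
    return assignment


# Hypothetical node-exporter style targets, for illustration only.
targets = [f"10.0.0.{i}:9100" for i in range(1, 13)]
assignment = partition(targets, 3)  # mirrors `shards: 3` from the patch

for shard, owned in assignment.items():
    # A recording rule evaluated on this shard only sees `owned`,
    # so a cluster-wide sum or count is computed over a subset.
    print(f"shard {shard}: {len(owned)} of {len(targets)} targets")
```

Under these assumptions each shard holds a disjoint subset of the targets, which is why a rule that aggregates across all nodes cannot produce a cluster-wide value on any single shard.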