Add prometheus monitors for schedulers and workers (#689)
* Add prometheus monitors for schedulers and workers

* Document enabling prometheus scraping
jacobtomlinson committed Mar 29, 2023
1 parent bbc05ab commit 2906547
Showing 5 changed files with 162 additions and 0 deletions.
@@ -30,6 +30,24 @@ The following table lists the configurable parameters of the Dask-kubernetes-ope
| `tolerations` | Tolerations | `[]` |
| `affinity` | Affinity | `{}` |
| `kopfArgs` | Command line flags to pass to kopf on start up | `["--all-namespaces"]` |
| `metrics.scheduler.enabled` | Enable scheduler metrics. Pip package [prometheus-client](https://pypi.org/project/prometheus-client/) should be present on scheduler. | `false` |
| `metrics.scheduler.serviceMonitor.enabled` | Enable scheduler servicemonitor. | `false` |
| `metrics.scheduler.serviceMonitor.namespace` | Deploy servicemonitor in different namespace, e.g. monitoring. | `""` |
| `metrics.scheduler.serviceMonitor.namespaceSelector` | Selector to select which namespaces the Endpoints objects are discovered from. | `{}` |
| `metrics.scheduler.serviceMonitor.additionalLabels` | Additional labels to add to the ServiceMonitor metadata. | `{}` |
| `metrics.scheduler.serviceMonitor.interval` | Interval at which metrics should be scraped. | `"15s"` |
| `metrics.scheduler.serviceMonitor.jobLabel` | The label to use to retrieve the job name from. | `""` |
| `metrics.scheduler.serviceMonitor.targetLabels` | TargetLabels transfers labels on the Kubernetes Service onto the target. | `["dask.org/cluster-name"]` |
| `metrics.scheduler.serviceMonitor.metricRelabelings` | MetricRelabelConfigs to apply to samples before ingestion. | `[]` |
| `metrics.worker.enabled` | Enable workers metrics. Pip package [prometheus-client](https://pypi.org/project/prometheus-client/) should be present on workers. | `false` |
| `metrics.worker.podMonitor.enabled` | Enable workers podmonitor | `false` |
| `metrics.worker.podMonitor.namespace` | Deploy podmonitor in different namespace, e.g. monitoring. | `""` |
| `metrics.worker.podMonitor.namespaceSelector` | Selector to select which namespaces the Endpoints objects are discovered from. | `{}` |
| `metrics.worker.podMonitor.additionalLabels` | Additional labels to add to the PodMonitor metadata. | `{}` |
| `metrics.worker.podMonitor.interval` | Interval at which metrics should be scraped. | `"15s"` |
| `metrics.worker.podMonitor.jobLabel` | The label to use to retrieve the job name from. | `""` |
| `metrics.worker.podMonitor.podTargetLabels` | PodTargetLabels transfers labels on the Kubernetes Pod onto the target. | `["dask.org/cluster-name", "dask.org/workergroup-name"]` |
| `metrics.worker.podMonitor.metricRelabelings` | MetricRelabelConfigs to apply to samples before ingestion. | `[]` |
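As a rough illustration of how these parameters interact, the sketch below mirrors two behaviours of the chart templates in this commit: a monitor is only rendered when both the component's `enabled` flag and its monitor's `enabled` flag are true, and an empty `namespaceSelector` falls back to the release namespace. The function names and the `DEFAULTS` dictionary are hypothetical and exist only for this example; Helm itself does the real rendering.

```python
import copy

# Subset of the chart defaults from the table above (illustrative only).
DEFAULTS = {
    "metrics": {
        "scheduler": {
            "enabled": False,
            "serviceMonitor": {"enabled": False, "namespaceSelector": {}, "interval": "15s"},
        },
        "worker": {
            "enabled": False,
            "podMonitor": {"enabled": False, "namespaceSelector": {}, "interval": "15s"},
        },
    }
}

# Schedulers are scraped through a ServiceMonitor, workers through a PodMonitor.
MONITOR_KEY = {"scheduler": "serviceMonitor", "worker": "podMonitor"}


def merge(base, override):
    """Recursively merge override into a copy of base (hypothetical helper)."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def monitor_rendered(values, component):
    """Mirror the template guard: both enabled flags must be true."""
    section = values["metrics"][component]
    return bool(section["enabled"] and section[MONITOR_KEY[component]]["enabled"])


def namespace_selector(values, component, release_namespace):
    """Mirror the template fallback: an empty selector means the release namespace only."""
    selector = values["metrics"][component][MONITOR_KEY[component]]["namespaceSelector"]
    return selector if selector else {"matchNames": [release_namespace]}


overrides = {"metrics": {"worker": {"enabled": True, "podMonitor": {"enabled": True}}}}
values = merge(DEFAULTS, overrides)
print(monitor_rendered(values, "worker"))     # True
print(monitor_rendered(values, "scheduler"))  # False
```

Setting only `metrics.worker.enabled` without `metrics.worker.podMonitor.enabled` would leave `monitor_rendered` false in this sketch, which matches the `and` condition at the top of each template.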



@@ -0,0 +1,41 @@
{{- if and .Values.metrics.worker.enabled .Values.metrics.worker.podMonitor.enabled -}}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: {{ include "dask_kubernetes_operator.fullname" . }}-worker-podmonitor
  {{- with .Values.metrics.worker.podMonitor.namespace }}
  namespace: {{ . | quote }}
  {{- end }}
  labels:
    {{- include "dask_kubernetes_operator.labels" . | nindent 4 }}
    dask.org/component: worker
    {{- with .Values.metrics.worker.podMonitor.additionalLabels }}
    {{- . | toYaml | nindent 4 }}
    {{- end }}
spec:
  podMetricsEndpoints:
    - interval: {{ .Values.metrics.worker.podMonitor.interval }}
      port: http-dashboard
      {{- with .Values.metrics.worker.podMonitor.metricRelabelings }}
      metricRelabelings:
        {{- . | toYaml | nindent 8 }}
      {{- end }}
  {{- if .Values.metrics.worker.podMonitor.namespaceSelector }}
  namespaceSelector:
    {{- .Values.metrics.worker.podMonitor.namespaceSelector | toYaml | nindent 4 }}
  {{- else }}
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  {{- end }}
  {{- with .Values.metrics.worker.podMonitor.jobLabel }}
  jobLabel: {{ . }}
  {{- end }}
  {{- with .Values.metrics.worker.podMonitor.podTargetLabels }}
  podTargetLabels:
    {{- . | toYaml | nindent 4 }}
  {{- end }}
  selector:
    matchLabels:
      dask.org/component: "worker"
{{- end }}
@@ -0,0 +1,41 @@
{{- if and .Values.metrics.scheduler.enabled .Values.metrics.scheduler.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "dask_kubernetes_operator.fullname" . }}-scheduler-servicemonitor
  {{- with .Values.metrics.scheduler.serviceMonitor.namespace }}
  namespace: {{ . | quote }}
  {{- end }}
  labels:
    {{- include "dask_kubernetes_operator.labels" . | nindent 4 }}
    dask.org/component: scheduler
    {{- with .Values.metrics.scheduler.serviceMonitor.additionalLabels }}
    {{- . | toYaml | nindent 4 }}
    {{- end }}
spec:
  endpoints:
    - interval: {{ .Values.metrics.scheduler.serviceMonitor.interval }}
      port: http-dashboard
      {{- with .Values.metrics.scheduler.serviceMonitor.metricRelabelings }}
      metricRelabelings:
        {{- . | toYaml | nindent 8 }}
      {{- end }}
  {{- if .Values.metrics.scheduler.serviceMonitor.namespaceSelector }}
  namespaceSelector:
    {{- .Values.metrics.scheduler.serviceMonitor.namespaceSelector | toYaml | nindent 4 }}
  {{- else }}
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
  {{- end }}
  {{- with .Values.metrics.scheduler.serviceMonitor.jobLabel }}
  jobLabel: {{ . }}
  {{- end }}
  {{- with .Values.metrics.scheduler.serviceMonitor.targetLabels }}
  targetLabels:
    {{- . | toYaml | nindent 4 }}
  {{- end }}
  selector:
    matchLabels:
      dask.org/component: "scheduler"
{{- end }}
@@ -49,3 +49,39 @@ affinity: {} # Affinity

kopfArgs: # Command line flags to pass to kopf on start up
  - --all-namespaces

metrics:
  scheduler:
    enabled: false # Enable scheduler metrics. Pip package [prometheus-client](https://pypi.org/project/prometheus-client/) should be present on scheduler.
    serviceMonitor:
      enabled: false # Enable scheduler servicemonitor.
      namespace: "" # Deploy servicemonitor in different namespace, e.g. monitoring.
      namespaceSelector: {} # Selector to select which namespaces the Endpoints objects are discovered from.
      # Default: scrape .Release.Namespace only
      # To scrape all, use the following:
      # namespaceSelector:
      #   any: true
      additionalLabels: {} # Additional labels to add to the ServiceMonitor metadata.
      interval: 15s # Interval at which metrics should be scraped.
      jobLabel: "" # The label to use to retrieve the job name from.
      targetLabels: # TargetLabels transfers labels on the Kubernetes Service onto the target.
        - dask.org/cluster-name
      metricRelabelings: [] # MetricRelabelConfigs to apply to samples before ingestion.
  worker:
    enabled: false # Enable workers metrics. Pip package [prometheus-client](https://pypi.org/project/prometheus-client/) should be present on workers.
    podMonitor:
      enabled: false # Enable workers podmonitor
      namespace: "" # Deploy podmonitor in different namespace, e.g. monitoring.
      namespaceSelector: {} # Selector to select which namespaces the Endpoints objects are discovered from.
      # Default: scrape .Release.Namespace only
      # To scrape all, use the following:
      # namespaceSelector:
      #   any: true
      # metrics will apply to the additional worker groups as well
      additionalLabels: {} # Additional labels to add to the PodMonitor metadata.
      interval: 15s # Interval at which metrics should be scraped.
      jobLabel: "" # The label to use to retrieve the job name from.
      podTargetLabels: # PodTargetLabels transfers labels on the Kubernetes Pod onto the target.
        - dask.org/cluster-name
        - dask.org/workergroup-name
      metricRelabelings: [] # MetricRelabelConfigs to apply to samples before ingestion.
26 changes: 26 additions & 0 deletions doc/source/operator_installation.rst
@@ -74,6 +74,32 @@ You can also just install it into a single namespace by setting the following op
    NOTES:
    Operator has been installed successfully.

Prometheus
^^^^^^^^^^

The operator Helm chart also contains optional ``ServiceMonitor`` and ``PodMonitor`` resources that enable Prometheus scraping of Dask components.
Because not all clusters have the Prometheus operator installed, these are disabled by default. You can enable them with the following config options.

.. code-block:: yaml

    metrics:
      scheduler:
        enabled: true
        serviceMonitor:
          enabled: true
      worker:
        enabled: true
        podMonitor:
          enabled: true

You'll also need to ensure the container images you choose for your Dask components have the ``prometheus_client`` library installed.
If you're using the official Dask images you can install this at runtime.

.. code-block:: python

    from dask_kubernetes.operator import KubeCluster

    cluster = KubeCluster(name="monitored", env={"EXTRA_PIP_PACKAGES": "prometheus_client"})

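Once a cluster is running, one way to check that metrics are being exposed is to fetch the ``/metrics`` path from the scheduler's dashboard port and look at the metric names it returns. The sketch below is not part of the operator: it assumes the dashboard has been made reachable locally (for instance with ``kubectl port-forward``) at the hypothetical URL ``http://localhost:8787/metrics``, and both helper names are invented for illustration. The parser only extracts metric names from the Prometheus text format.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="x"} 1.0` or `name 1.0`.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return names


def fetch_metric_names(url="http://localhost:8787/metrics", timeout=5):
    """Fetch a metrics endpoint (the URL is an assumption) and return its metric names."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

If ``prometheus_client`` is installed on the scheduler, calling ``fetch_metric_names()`` against a port-forwarded dashboard would be expected to return a non-empty set; an error or an empty result suggests the library is missing from the image.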
Installing with Manifests
-------------------------
