Extend prometheus metrics endpoint (#2792) #2833

gsailer · 2019-07-12T12:10:15Z

This PR introduces the following prometheus metrics:

scheduler
- number of tasks that were received
- number of unrunnable tasks
worker
- tasks: stored, ready, waiting and serving
- number of connections to other workers
- number of worker threads
- latency
- median tick duration
- median task duration
- median transfer bandwidth
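
As a rough illustration of what the worker metrics above might look like when scraped, here is a minimal stdlib-only sketch that renders gauges in the Prometheus text exposition format. The metric names (`dask_worker_tasks`, `dask_worker_threads`), the `state` label, the help strings, and the sample values are assumptions made for illustration, not names confirmed by this PR. The PR itself builds its metrics with `GaugeMetricFamily` from `prometheus_client` rather than formatting text by hand.

```python
def render_gauge(name, description, samples):
    """Render one gauge in the Prometheus text exposition format.

    `samples` maps a tuple of (label, value) pairs to a numeric value;
    an empty tuple means the sample carries no labels.
    """
    lines = [f"# HELP {name} {description}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        if labels:
            rendered = ",".join(f'{key}="{val}"' for key, val in labels)
            lines.append(f"{name}{{{rendered}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines)


# Hypothetical task counts per state on a single worker.
task_counts = {"stored": 4, "ready": 2, "waiting": 1, "serving": 0}

tasks_text = render_gauge(
    "dask_worker_tasks",  # hypothetical metric name
    "Number of tasks at worker.",
    {(("state", state),): count for state, count in task_counts.items()},
)
threads_text = render_gauge(
    "dask_worker_threads",  # hypothetical metric name
    "Number of worker threads.",
    {(): 8},
)
print(tasks_text)
print(threads_text)
```

Using one metric with a `state` label, rather than one metric per state, keeps the series easy to sum or filter in queries; whether the PR does exactly this is not stated in the description.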

The requirement of crick for the digest-based metrics is handled by checking if crick is available.
If crick is not available, a message is logged at info level noting that crick is missing, and the metrics that require crick are not exposed.
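
The availability check described above could be sketched as follows. This shows the general pattern only and is not the code from this PR: the function names, the metric identifiers, and the log wording are hypothetical, and the split into digest-based and other metrics is assumed from the "median" entries in the list above.

```python
import importlib.util
import logging

logger = logging.getLogger(__name__)

# Hypothetical identifiers; the PR description does not give exact names.
BASIC_METRICS = ["tasks", "connections", "threads", "latency"]
DIGEST_METRICS = [
    "tick_duration_median",
    "task_duration_median",
    "transfer_bandwidth_median",
]


def crick_available():
    """Return True if the optional crick package can be found."""
    return importlib.util.find_spec("crick") is not None


def available_worker_metrics(has_crick=None):
    """List the metrics to expose, skipping digest-based ones without crick."""
    if has_crick is None:
        has_crick = crick_available()
    if has_crick:
        return BASIC_METRICS + DIGEST_METRICS
    # Info level: a missing optional dependency is not an error.
    logger.info("crick is not installed; digest-based metrics are not exposed.")
    return list(BASIC_METRICS)


print(available_worker_metrics())
```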

Regarding what @mrocklin mentioned in the issue (#2792) about not exposing the digest metrics individually: the problem is that the metric names would then not comply with Prometheus naming conventions. Additionally, expressive descriptions would not be available if the metrics were not exposed individually.
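
To make the naming concern concrete: in the classic Prometheus data model, metric names must match the regex `[a-zA-Z_:][a-zA-Z0-9_:]*`. The check below is a minimal sketch; the candidate names, including the `dask_worker_` prefix, are hypothetical examples and not taken from this PR.

```python
import re

# Metric-name pattern from the classic Prometheus data model.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name):
    """Return True if `name` is a legal Prometheus metric name."""
    return METRIC_NAME.fullmatch(name) is not None


# Hypothetical names: raw digest keys versus an individually named metric.
candidates = [
    "tick-duration",
    "transfer bandwidth",
    "dask_worker_tick_duration_median_seconds",
]
for name in candidates:
    print(name, is_valid_metric_name(name))
```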

The number of tasks in each state and the number of threads are exposed on the workers' /metrics endpoints.

mrocklin · 2019-07-12T15:58:51Z

@jacobtomlinson would you mind reviewing this?

jacobtomlinson

This all seems reasonable to me. Thanks for putting in the effort!

I've made a couple of comments but generally happy.

jacobtomlinson · 2019-07-15T11:45:30Z

distributed/dashboard/worker_html.py

-        #         'Number of connections currently open.',
-        #         value=???,
-        #     )
+        from prometheus_client.core import GaugeMetricFamily


Switching to importing here seems reasonable, but does this mean there is now a stray unused import somewhere?

gsailer · 2019-07-16

Previously, the import was done in the init of PrometheusHandler, where the _PrometheusCollector is registered in the registry.
All imports that are there are still required by PrometheusHandler, which is the only place the changed _PrometheusCollector class is used.
So, in my opinion, there is no dangling prometheus_client import anywhere.

jacobtomlinson

This all looks good. Thanks again!

mrocklin · 2019-07-16T15:24:05Z

Merging this in. Thanks @sublinus ! Thanks also to @jacobtomlinson for the review.

Also, I notice that this is your first code contribution to this repository. Welcome!

* upstream/master: (33 commits)
  SpecCluster: move init logic into start (dask#2850)
  Dont reuse closed worker in get_worker (dask#2841)
  Add alternative SSHCluster implementation (dask#2827)
  Extend prometheus metrics endpoint (dask#2792) (dask#2833)
  Include type name in SpecCluster repr (dask#2834)
  Don't make False add-keys report to scheduler (dask#2421)
  Add Nanny to worker docs (dask#2826)
  Respect security configuration in LocalCluster (dask#2822)
  bump version to 2.1.0
  Fix typo that prevented error message (dask#2825)
  Remove dask-mpi (dask#2824)
  Updates to use update_graph in task journey docs (dask#2821)
  Fix Client repr with memory_info=None (dask#2816)
  Fix case where key, rather than TaskState, could end up in ts.waiting_on (dask#2819)
  Use Keyword-only arguments (dask#2814)
  Relax check for worker references in cluster context manager (dask#2813)
  Add HTTPS support for the dashboard (dask#2812)
  CLN: Use dask.utils.format_bytes (dask#2810)
  bump version to 2.0.1
  Add python_requires entry to setup.py (dask#2807)
  ...

gsailer added 5 commits July 12, 2019 14:46

Expose tasks prometheus metric at scheduler

0cad98d

Add basic task metrics to worker

fe2c0e0

Number of tasks in states and number of threads are exposed on the workers /metrics endpoints.

Add worker metrics and reformat tasks

4a0bd2a

Change prometheus worker test to check for a specific metric

f4d7bf2

Change unused import handling to linter ignore notation

eb8e591

gsailer force-pushed the prometheus-metrics branch from 2d42cfb to eb8e591 Compare July 12, 2019 12:46

jacobtomlinson approved these changes Jul 15, 2019

Change log mesage in case of missing crick

b044bc4

jacobtomlinson approved these changes Jul 16, 2019

mrocklin merged commit af64e07 into dask:master Jul 16, 2019

gsailer deleted the prometheus-metrics branch July 24, 2019 15:19

arpit1997 mentioned this pull request Sep 18, 2019

document prometheus support and available metrics #3065

Closed
