-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend prometheus metrics endpoint (#2792) #2833
Conversation
Number of tasks in states and number of threads are exposed on the workers /metrics endpoints.
2d42cfb
to
eb8e591
Compare
@jacobtomlinson would you mind reviewing this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all seems reasonable to me. Thanks for putting in the effort!
I've made a couple of comments but generally happy.
# 'Number of connections currently open.', | ||
# value=???, | ||
# ) | ||
from prometheus_client.core import GaugeMetricFamily |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching to importing here seems reasonable, but does this mean there is now a stray unused import somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the import was done at the init of PrometheusHandler when the PrometheusCollector is registered in the registry.
All imports which are there are currently still required in the PrometheusHandler, which was the only point the changed _PrometheusCollector class is used.
So there is no dangling import for prometheus_client anywhere in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good. Thanks again!
Merging this in. Thanks @sublinus ! Thanks also to @jacobtomlinson for the review. Also, I notice that this is your first code contribution to this repository. Welcome! |
* upstream/master: (33 commits) SpecCluster: move init logic into start (dask#2850) Dont reuse closed worker in get_worker (dask#2841) Add alternative SSHCluster implementation (dask#2827) Extend prometheus metrics endpoint (dask#2792) (dask#2833) Include type name in SpecCluster repr (dask#2834) Don't make False add-keys report to scheduler (dask#2421) Add Nanny to worker docs (dask#2826) Respect security configuration in LocalCluster (dask#2822) bump version to 2.1.0 Fix typo that prevented error message (dask#2825) Remove dask-mpi (dask#2824) Updates to use update_graph in task journey docs (dask#2821) Fix Client repr with memory_info=None (dask#2816) Fix case where key, rather than TaskState, could end up in ts.waiting_on (dask#2819) Use Keyword-only arguments (dask#2814) Relax check for worker references in cluster context manager (dask#2813) Add HTTPS support for the dashboard (dask#2812) CLN: Use dask.utils.format_bytes (dask#2810) bump version to 2.0.1 Add python_requires entry to setup.py (dask#2807) ...
This PR introduces the following prometheus metrics:
The requirement of crick for the digest-based metrics is handled by checking if crick is available.
If crick is not available there is a log message on loglevel info regarding the missing crick and the metrics, which require crick are not exposed.
Regarding what @mrocklin mentioned in the issue(#2792) about the digest metrics not being exposed individually. The problem here is that the metrics names would not be compliant with prometheus naming. Additionally expressive descriptions would not be available either if the metrics were not exposed individually.