As part of the work on dask-sql, there has been some demand for machine-readable logs of worker metrics (such as GPU utilization / memory usage) coupled with the tasks those workers are currently running or have recently run, along with additional task metadata such as when each task was scheduled, started, and completed. With this data readily available, it would be easier to diagnose why certain tasks are bottlenecks in a given computation by tracking what was happening on the worker while the task was running.
To give an idea of what might be wanted here, some RAPIDS folks have developed and are currently using dask-metrics for this purpose, which generates per-worker CSV files containing this information (with only GPU-relevant metrics). @jakirkham also suggested adding something like an "N slowest running tasks" table to the performance reports, although I think we would want the granular data as well.
For context, all of this information is readily available through the scheduler, though it would need to be merged together manually:
```python
cluster.scheduler.get_task_stream()                # task metadata, including the workers that ran each task
await cluster.scheduler.get_worker_monitor_info()  # timestamped worker metrics
```
Some options I've considered for this:
- Adding task metadata to the `SystemMonitor` or `WorkerState` metrics; not sure if/how this could be done, but it would make it easier to stream this data somewhere
- Adding a scheduler function to merge the existing task/worker metadata and return it in a machine readable format
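As a rough sketch of the second option: assuming task-stream records carry `worker` and `startstops` fields and the monitor info maps each worker to parallel lists keyed by metric name (including a `time` list of timestamps) — these field names are illustrative, not a confirmed scheduler API — the merge could look something like this:

```python
from bisect import bisect_left


def merge_task_metrics(task_stream, monitor_info):
    """Attach worker metric samples to each task's compute window.

    ``task_stream``: list of dicts with ``key``, ``worker`` and
    ``startstops`` entries (each with ``action``, ``start``, ``stop``).
    ``monitor_info``: dict mapping worker address to a dict of
    parallel lists, one of which is ``time`` (sorted timestamps).
    """
    merged = []
    for task in task_stream:
        worker = task["worker"]
        metrics = monitor_info.get(worker, {})
        times = metrics.get("time", [])
        for ss in task["startstops"]:
            if ss["action"] != "compute":
                continue
            # Select the metric samples falling inside the task's
            # compute window via binary search on the timestamps.
            lo = bisect_left(times, ss["start"])
            hi = bisect_left(times, ss["stop"])
            merged.append({
                "key": task["key"],
                "worker": worker,
                "start": ss["start"],
                "stop": ss["stop"],
                "samples": {
                    name: values[lo:hi]
                    for name, values in metrics.items()
                    if name != "time"
                },
            })
    return merged


# Hypothetical data mimicking the two scheduler calls above
task_stream = [{
    "key": "inc-123",
    "worker": "tcp://10.0.0.1:8786",
    "startstops": [{"action": "compute", "start": 2.0, "stop": 5.0}],
}]
monitor_info = {
    "tcp://10.0.0.1:8786": {
        "time": [1.0, 3.0, 4.0, 6.0],
        "gpu_utilization": [10, 80, 95, 15],
    }
}

rows = merge_task_metrics(task_stream, monitor_info)
print(rows[0]["samples"]["gpu_utilization"])  # → [80, 95]
```

The flat dicts this produces would serialize directly to CSV or JSON lines, which seems close to what dask-metrics already emits per worker.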
It would be nice to have some discussion on whether this is doable and worthwhile for troubleshooting performance in Distributed.
cc @randerzander