Fine performance metrics dashboard crashes w/ spill activity #7875

Closed
crusaderky opened this issue Jun 1, 2023 · 0 comments · Fixed by #7878

This code is sized so that it triggers spill/unspill activity:

import dask.array as da
import distributed

client = distributed.Client(n_workers=4, threads_per_worker=1, memory_limit="2 GiB")
a = da.random.random((14_000, 14_000))
b = (a @ a.T).sum()
b.compute()

If I open the Fine Performance Metrics dashboard after running it, I get a blank page with a 500 Internal Server Error, while on stderr I get:

BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('activity', 11), ('angle', 11), ('color', 0), ('text', 11), ('value', 11)
2023-06-01 15:33:53,229 - bokeh.core.property.validation - ERROR - Keyword argument sequences for broadcasting must all be the same lengths. Got lengths: [0, 11]
Traceback (most recent call last):
  File "/home/crusaderky/github/distributed/distributed/utils.py", line 760, in wrapper
    return func(*args, **kwargs)
  File "/home/crusaderky/github/distributed/distributed/dashboard/components/scheduler.py", line 3510, in update
    task_exec_barchart = self._build_task_execution_by_prefix_chart(
  File "/home/crusaderky/github/distributed/distributed/dashboard/components/scheduler.py", line 3598, in _build_task_execution_by_prefix_chart
    renderers = barchart.vbar_stack(
  File "/home/crusaderky/miniconda3/envs/distributed39/lib/python3.9/site-packages/bokeh/plotting/figure.py", line 588, in vbar_stack
    for kw in double_stack(stackers, "bottom", "top", **kw):
  File "/home/crusaderky/miniconda3/envs/distributed39/lib/python3.9/site-packages/bokeh/plotting/_stack.py", line 83, in double_stack
    raise ValueError("Keyword argument sequences for broadcasting must all be the same lengths. Got lengths: %r" % sorted(list(lengths)))
ValueError: Keyword argument sequences for broadcasting must all be the same lengths. Got lengths: [0, 11]

I reproduced the above with both bokeh 2 and bokeh 3.

The problem disappears if I size either my problem or the cluster so that spill/unspill events are never generated.
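For context, the "Got lengths: [0, 11]" in the ValueError means vbar_stack was handed one empty per-stacker sequence next to an 11-element one. Below is a minimal, standalone sketch of that Bokeh failure mode, assuming purely for illustration that the empty sequence is the colour list (as the ('color', 0) entry in the BokehUserWarning above hints) and using a handful of example activity names; this is not the dashboard code itself:

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Example per-stacker activities; the real chart above has 11 of them.
activities = ["thread-cpu", "disk-read", "disk-write"]
source = ColumnDataSource({"x": ["chunk_sum"], **{a: [1.0] for a in activities}})

p = figure(x_range=["chunk_sum"])
p.vbar_stack(
    activities,
    x="x",
    width=0.9,
    color=[],                 # empty sequence (length 0) -> mismatch
    legend_label=activities,  # non-empty sequence (length 3)
    source=source,
)
# ValueError: Keyword argument sequences for broadcasting must all be the same
# lengths. Got lengths: [0, 3]
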
I'm attaching a trimmed-down dump of the metrics, which will let you reproduce the crash deterministically. Just copy-paste the snippet below into a Jupyter notebook and open the Fine Performance Metrics dashboard page:

import distributed

client = distributed.Client()
metrics = {
    ("execute", "chunk_sum-aggregate-sum", "disk-read", "seconds"): 1.0051379720016484,
    ("execute", "chunk_sum-aggregate-sum", "disk-read", "count"): 16.0,
    ("execute", "chunk_sum-aggregate-sum", "disk-read", "bytes"): 2059931767.0,
    ("execute", "chunk_sum-aggregate-sum", "disk-write", "seconds"): 0.1692888050001784,
    ("execute", "chunk_sum-aggregate-sum", "disk-write", "count"): 2.0,
    ("execute", "chunk_sum-aggregate-sum", "disk-write", "bytes"): 268435938.0,
}
client.cluster.scheduler.cumulative_worker_metrics.clear()
client.cluster.scheduler.cumulative_worker_metrics.update(metrics)