Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make disk access in system monitor configurable #6537

Merged
merged 2 commits into from
Jun 10, 2022

Conversation

mrocklin
Copy link
Member

@mrocklin mrocklin commented Jun 8, 2022

Disk access is of marginal value, and somewhat of a pain to deal with.
This makes it configurable.

We might also ask ourselves if we should turn it off by default.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mrocklin -- I expect CI will complain that we need to add a corresponding entry to distributed-schema.yaml

Disk access is of marginal value, and somewhat of a pain to deal with.
This makes it configurable.

We might also ask ourselves if we should turn it off by default.
@mrocklin
Copy link
Member Author

mrocklin commented Jun 8, 2022 via email

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mrocklin, this looks reasonable to me. The only issue I can see is existing code that expects disk information to always be reported. For example, if I go to the /individual-workers-disk-timeseries figure in the dashboard when distributed.admin.system-monitor.disk is set to False I get a 500: Internal Server Error error on the browser and

2022-06-08 16:57:28,292 - tornado.application - ERROR - Uncaught exception GET /individual-workers-disk-timeseries (127.0.0.1)
HTTPServerRequest(protocol='http', host='127.0.0.1:8787', method='GET', uri='/individual-workers-disk-timeseries', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/Users/james/mambaforge/envs/distributed/lib/python3.9/site-packages/tornado/web.py", line 1704, in _execute
    result = await result
  File "/Users/james/mambaforge/envs/distributed/lib/python3.9/site-packages/bokeh/server/views/doc_handler.py", line 54, in get
    session = await self.get_session()
  File "/Users/james/mambaforge/envs/distributed/lib/python3.9/site-packages/bokeh/server/views/session_handler.py", line 144, in get_session
    session = await self.application_context.create_session_if_needed(session_id, self.request, token)
  File "/Users/james/mambaforge/envs/distributed/lib/python3.9/site-packages/bokeh/server/contexts.py", line 243, in create_session_if_needed
    self._application.initialize_document(doc)
  File "/Users/james/mambaforge/envs/distributed/lib/python3.9/site-packages/bokeh/application/application.py", line 194, in initialize_document
    h.modify_document(doc)
  File "/Users/james/mambaforge/envs/distributed/lib/python3.9/site-packages/bokeh/application/handlers/function.py", line 143, in modify_document
    self._func(doc)
  File "cytoolz/functoolz.pyx", line 250, in cytoolz.functoolz.curry.__call__
  File "/Users/james/projects/dask/distributed/distributed/dashboard/components/scheduler.py", line 3982, in individual_doc
    fig = cls(scheduler, sizing_mode="stretch_both", **kwargs)
  File "/Users/james/projects/dask/distributed/distributed/utils.py", line 767, in wrapper
    return func(*args, **kwargs)
  File "/Users/james/projects/dask/distributed/distributed/dashboard/components/scheduler.py", line 1049, in __init__
    update(self.source, self.get_data())
  File "/Users/james/projects/dask/distributed/distributed/dashboard/components/scheduler.py", line 1176, in get_data
    read_bytes_disk += ws.metrics["read_bytes_disk"]
KeyError: 'read_bytes_disk'

logged in my local Python session

@mrocklin
Copy link
Member Author

mrocklin commented Jun 8, 2022

That can already be the case if, for example, the system doesn't have a disk (somewhat common in HPC scenarios).

I've gone ahead and fixed the dashboard and modified tests to cover this scenario.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 8, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±0         15 suites  ±0   6h 4m 59s ⏱️ - 12m 27s
  2 854 tests +1    2 770 ✔️  - 1    82 💤 +1  2 +1 
21 144 runs  +7  20 197 ✔️ +6  945 💤 ±0  2 +1 

For more details on these failures, see this check.

Results for commit 0ed6b47. ± Comparison against base commit 9e4e3ab.

@mrocklin
Copy link
Member Author

@jrbourbeau good to merge?

@jrbourbeau jrbourbeau merged commit c62acab into dask:main Jun 10, 2022
@mrocklin mrocklin deleted the sysmon-disk branch June 10, 2022 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants