Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server close faster #6415

Merged
merged 4 commits into from
May 24, 2022
Merged

Server close faster #6415

merged 4 commits into from
May 24, 2022

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented May 23, 2022

This supersedes #6414
closes #6413
xref closes #6365
Follow up of #4805 where I dealt with other ongoing handlers but didn't special case terminate / close

This offers a few benefits over #6414

  • Removes any special treatments of any op / handler-names by referencing the currently running asyncio task.
  • It removes polling. Subjectively this makes the code a bit better readable and intentions are clearer. This also prepares us to eventually deal with exceptions. I haven't implemented any exception handling this this would require much more thorough testing to define the behavior of the server after an exception was caught (e.g. ensure it is still closing properly)

cc @mrocklin @hendrikmakait

mrocklin and others added 2 commits May 22, 2022 21:51
This permits servers to allow "terminate" comms to persist  when
shutting down.

Closes dask#6413
@github-actions
Copy link
Contributor

github-actions bot commented May 23, 2022

Unit Test Results

       15 files  ±  0         15 suites  ±0   6h 41m 50s ⏱️ - 3m 45s
  2 808 tests +  2    2 725 ✔️ +  1    79 💤  - 1  4 +2 
20 818 runs  +15  19 886 ✔️ +15  928 💤  - 2  4 +2 

For more details on these failures, see this check.

Results for commit 2551272. ± Comparison against base commit 9bb999d.

♻️ This comment has been updated with latest results.

Comment on lines +421 to +425
assert w.status in {
Status.closing,
Status.closed,
Status.failed,
}, w.status
Copy link
Member Author

@fjetter fjetter May 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the scheduler.close does no longer use an explicit comm to close the remote worker.
This behavior was introduced in #6363

@mrocklin
Copy link
Member

Seems to be fast on subsequent close calls

In [1]: from dask.distributed import LocalCluster

In [2]: %%time
   ...: with LocalCluster(processes=False, n_workers=3):
   ...:     pass
   ...: 
CPU times: user 853 ms, sys: 120 ms, total: 972 ms
Wall time: 2.29 s

In [3]: %%time
   ...: with LocalCluster(processes=False, n_workers=3):
   ...:     pass
   ...: 
CPU times: user 106 ms, sys: 20.3 ms, total: 127 ms
Wall time: 132 ms

👍

@fjetter
Copy link
Member Author

fjetter commented May 23, 2022

Im surprised to see the first one taking that long. In my tests, the first call took about 150ms and the second one 30ms

distributed/core.py Outdated Show resolved Hide resolved
@fjetter
Copy link
Member Author

fjetter commented May 23, 2022

timing for me

Python 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from dask.distributed import LocalCluster

In [2]: %%time
   ...: with LocalCluster(processes=False, n_workers=0):
   ...:     pass
   ...:
CPU times: user 237 ms, sys: 52.1 ms, total: 289 ms
Wall time: 358 ms

In [3]: %%time
   ...: with LocalCluster(processes=False, n_workers=0):
   ...:     pass
   ...:
CPU times: user 20.5 ms, sys: 2.52 ms, total: 23 ms
Wall time: 22.9 ms

I just realize in #6415 (comment) there are three workers. Maybe the first call is slow/fast enough to actually get a worker process up such that the closing takes longer

@mrocklin
Copy link
Member

mrocklin commented May 23, 2022 via email

@fjetter fjetter merged commit 60f0886 into dask:main May 24, 2022
@fjetter fjetter deleted the server_close_faster branch May 24, 2022 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subsecond cluster startup time
4 participants