Adaptive: recommend close workers when any are idle #2330

Merged: martindurant merged 10 commits into dask:master from delgadom:recommend_close_workers_adaptive on Apr 24, 2019
Changes from 5 commits
Commits (10):
1a97a2f  adaptive: recommend close workers if idle (delgadom)
3ea0d5c  adaptive: check for idle workers before recommending scale up (delgadom)
2487e2e  adaptive: check for waiting tasks in should_scale_up (delgadom)
ef70b0e  revert to changing needs_cpu only (delgadom)
519785c  performance bump in adaptive.needs_cpu by looping through workers (delgadom)
62b94f0  switch to checking number of cores, not workers (delgadom)
597c166  Merge branch 'master' into recommend_close_workers_adaptive
9c16405  apply black
a085e8e  remove xfail
3bb8e25  flake
@@ -123,17 +123,32 @@ def needs_cpu(self):
         Notes
         -----
         Returns ``True`` if the occupancy per core is some factor larger
-        than ``startup_cost``.
+        than ``startup_cost`` and the number of tasks exceeds the number of
+        workers
         """
         total_occupancy = self.scheduler.total_occupancy
         total_cores = sum([ws.ncores for ws in self.scheduler.workers.values()])
 
         if total_occupancy / (total_cores + 1e-9) > self.startup_cost * 2:
             logger.info("CPU limit exceeded [%d occupancy / %d cores]",
                         total_occupancy, total_cores)
-            return True
-        else:
-            return False
+
+            num_workers = len(self.scheduler.workers)
+
+            tasks_processing = 0
+
+            for w in self.scheduler.workers.values():
+                tasks_processing += len(w.processing)
+
+            if tasks_processing > num_workers:
+                logger.info(
+                    "pending tasks exceed number of workers "
+                    "[%d tasks / %d workers]",
+                    tasks_processing, num_workers)
+
+                return True
+
+        return False
 
     def needs_memory(self):
         """

Review comment (attached to the new logging message): I think this message, or probably the one above, should be changed.
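The behavior of the modified needs_cpu can be sketched as a standalone function. This is a minimal illustration, not distributed's real API: the worker dicts below stand in for the scheduler-side WorkerState objects, and the function signature is an assumption made for the sketch.

```python
# Sketch of the scale-up check from the diff above, pulled out of the
# Adaptive class. Worker dicts are illustrative stand-ins for
# distributed's scheduler-side worker state; names are assumptions.

def needs_cpu(workers, total_occupancy, startup_cost):
    """Return True when occupancy per core is high *and* there are
    more tasks in flight than workers to run them."""
    total_cores = sum(w["ncores"] for w in workers)

    # First gate: expected work per core must exceed twice the cost
    # of starting a new worker.
    if total_occupancy / (total_cores + 1e-9) > startup_cost * 2:
        num_workers = len(workers)
        tasks_processing = sum(len(w["processing"]) for w in workers)

        # Second gate (added in this PR): only recommend scaling up
        # if pending tasks outnumber the workers we already have.
        if tasks_processing > num_workers:
            return True

    return False

workers = [
    {"ncores": 1, "processing": ["t1", "t2"]},
    {"ncores": 1, "processing": ["t3"]},
]
print(needs_cpu(workers, total_occupancy=100.0, startup_cost=1.0))  # True
```

With high occupancy and three in-flight tasks on two workers, both gates pass; with negligible occupancy the first gate already fails, so no scale-up is recommended.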
Shouldn't you use total_cores for the number of tasks comparison? Do you have one thread per worker in your setup?
Hmm, I'm not sure I follow, but I am on a dask_kubernetes setup with 1:1 threads to workers, so maybe my use case isn't universal. I think I see, though: in many cases the number of dask processes/threads which can take on tasks is greater than the number of workers? Is total_cores the number of workers * (threads per worker or procs per worker)?
Yes!
I think in most cases having several threads per worker is better, except for GIL-bound tasks. Dask-kubernetes proposes 2 threads by default, which is low, but it depends on the VMs you use:
https://github.com/dask/dask-kubernetes/blob/master/dask_kubernetes/kubernetes.yaml#L24-L26
With dask-jobqueue, I often use 4, 8 or even 24 cores per worker.
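The cores-versus-workers distinction being discussed can be shown with toy numbers. The setup here (3 workers with 4 threads each) is hypothetical, chosen only to make the two comparisons diverge:

```python
# Toy illustration of the cores-vs-workers distinction.
# Each worker reports its thread count as "ncores"; total capacity
# is the sum over workers, not the worker count.

workers = [{"ncores": 4}, {"ncores": 4}, {"ncores": 4}]

num_workers = len(workers)                      # 3
total_cores = sum(w["ncores"] for w in workers)  # 12

# With 10 tasks in flight, comparing against the worker count would
# recommend scaling up, while comparing against thread capacity
# shows there is still room.
tasks_processing = 10
print(tasks_processing > num_workers)   # True  -> would scale up
print(tasks_processing > total_cores)   # False -> capacity still free
```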
Ok thanks! Just so I understand before modifying the code again... should I take this to mean that dask uses the term "core" to refer specifically to the number of tasks that can be handled simultaneously by a worker, regardless of the hardware? Or are you doing something clever with the relationship between physical cores and CPU needs that I'm not following? Thanks again for the helping hand in this :)
Yes, see https://github.com/dask/distributed/blob/master/distributed/cli/dask_worker.py#L62-L63, https://github.com/dask/distributed/blob/master/distributed/cli/dask_worker.py#L217.
I like it better when it is called nthreads, like in dask-worker options.
OK, this makes a lot of sense. So I should count the number of available threads or processes, not the number of workers.
A Worker = 1 process = several threads, on the scheduler side. So just using the already computed total_cores should be fine.
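That conclusion matches the later commit "switch to checking number of cores, not workers". A hedged one-screen sketch of the check after that switch, again with illustrative worker dicts rather than distributed's real scheduler state:

```python
# Sketch of the check after switching to core counts (per the commit
# "switch to checking number of cores, not workers"). Worker dicts
# are illustrative stand-ins for scheduler-side state.

def tasks_exceed_capacity(workers):
    total_cores = sum(w["ncores"] for w in workers)
    tasks_processing = sum(len(w["processing"]) for w in workers)
    # Scale up only when in-flight tasks exceed total thread capacity.
    return tasks_processing > total_cores

busy = [{"ncores": 2, "processing": ["a", "b", "c"]}]   # 3 tasks, 2 threads
quiet = [{"ncores": 4, "processing": ["a"]}]            # 1 task, 4 threads
print(tasks_exceed_capacity(busy))   # True
print(tasks_exceed_capacity(quiet))  # False
```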