Using grouped workers and Adaptive #1987
Comments
Yeah, there are definitely bugs here. Thanks for raising the issue.
What happens if you change your interval to something very fast, like
I suspect that they refer to the number of python
Yup, I agree.
That's right (see dask/dask-jobqueue#11)
That is my assumption now too. This should be documented (here and in jobqueue, I think).
@jhamman is this now resolved?
Yes, @mrocklin. We can close this now.
I am seeing what appears to be some buggy behavior when using `Adaptive` with grouped workers.

A fully reproducible example is a bit tough here because this also involves dask-jobqueue (dask/dask-jobqueue#26), but hopefully I can lay out what I see as potential problems and we can go from there.
Here's my current workflow:
In jobqueue, this leads to calling the following command for each scale up call:
(hence the grouped workers)
Problem description:
Initializing the cluster / client goes as expected. The problem occurs when using the Adaptive scheduler.
`minimum=2` calls scale up twice, which is translated into two groups of workers (24 in total). These workers come online and are immediately culled.

So problem 1 may just be a semantics issue. Do the `minimum`/`maximum` kwargs to `Adaptive` correspond to individual workers (processes), and not to grouped workers (executions of `dask-worker`)?
Problem 2 is perhaps a bit harder to see. Even if we're treating each group incorrectly, one of the two worker groups should have survived, leaving me with 12 processes/workers. But all the workers are culled, so this seems like a bug.
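
To make the two readings of `minimum` concrete, here is a minimal arithmetic sketch. It is not how `Adaptive` or dask-jobqueue is implemented. The helper `jobs_for_workers` and the constant names are hypothetical, and the figure of 12 processes per group is inferred from the "two groups, 24 workers in total" observation above.

```python
import math

# Assumption: each grouped job (one execution of dask-worker) starts
# 12 worker processes, inferred from "two groups -> 24 workers".
PROCESSES_PER_JOB = 12
ADAPTIVE_MINIMUM = 2


def jobs_for_workers(target_workers: int, processes_per_job: int) -> int:
    """Hypothetical helper: grouped jobs needed to reach at least
    `target_workers` individual worker processes."""
    if processes_per_job < 1:
        raise ValueError("processes_per_job must be at least 1")
    return math.ceil(target_workers / processes_per_job)


# Reading 1: minimum counts groups, so minimum=2 submits two jobs.
workers_if_groups = ADAPTIVE_MINIMUM * PROCESSES_PER_JOB

# Reading 2: minimum counts individual processes, so one job already
# satisfies minimum=2.
jobs_if_processes = jobs_for_workers(ADAPTIVE_MINIMUM, PROCESSES_PER_JOB)

# Under reading 2, two submitted jobs leave this many processes above
# the requested minimum.
surplus_processes = workers_if_groups - ADAPTIVE_MINIMUM

print(workers_if_groups, jobs_if_processes, surplus_processes)
```

Under either reading, at least one group of 12 processes would be expected to stay alive, which is why culling every worker looks like a separate bug rather than a units mismatch.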
I should note that manually scaling the `PBSCluster` using `scale_up`/`scale_down` works just fine.

cc @mrocklin @guillaumeeb