Cluster.scale is not robust to multiple calls #2257

Closed
guillaumeeb opened this issue Sep 18, 2018 · 2 comments

@guillaumeeb
Member

As experienced in dask/dask-jobqueue#112 and the related PR dask/dask-jobqueue#97, Cluster.scale behaves unpredictably when it is called several times in a row.

I suspect part of this problem is due to how asynchronism is used here:

If we want scale to run asynchronously, I propose simply adding a _scale() method here (a coroutine?) that scale() calls asynchronously. In _scale, we would read the state and perform the modifications at the same time:

def _scale(self, n):
    with log_errors():
        if n >= len(self.scheduler.workers):
            self.scale_up(n)
        else:
            to_close = self.scheduler.workers_to_close(
                n=len(self.scheduler.workers) - n)
            logger.debug("Closing workers: %s", to_close)
            self.scheduler.retire_workers(workers=to_close)
            self.scale_down(to_close)
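
To illustrate why reading the state and modifying it in the same step matters, here is a minimal asyncio sketch. ToyCluster, scale_racy and _apply are hypothetical names made up for this example and are not part of distributed; the real code runs on the scheduler's event loop, so take this only as a model of the suspected race, not of the actual implementation.

```python
import asyncio


class ToyCluster:
    """Hypothetical stand-in for a cluster; not the distributed API."""

    def __init__(self):
        self.workers = []

    def scale_racy(self, n):
        # The state is read now, but the change is applied later.
        delta = n - len(self.workers)
        return asyncio.ensure_future(self._apply(delta))

    async def _apply(self, delta):
        await asyncio.sleep(0)  # stands in for the deferred callback
        if delta >= 0:
            self.workers.extend(object() for _ in range(delta))
        else:
            del self.workers[delta:]

    def scale(self, n):
        # Only schedules the coroutine; no state is read here.
        return asyncio.ensure_future(self._scale(n))

    async def _scale(self, n):
        await asyncio.sleep(0)  # same deferral as above
        # Read and modify with no await in between, so no other
        # call can interleave between the two.
        delta = n - len(self.workers)
        if delta >= 0:
            self.workers.extend(object() for _ in range(delta))
        else:
            del self.workers[n:]


async def demo(method_name):
    cluster = ToyCluster()
    method = getattr(cluster, method_name)
    # Two calls in a row, before either one has been applied.
    await asyncio.gather(method(4), method(2))
    return len(cluster.workers)


def run_demo(method_name):
    return asyncio.run(demo(method_name))
```

In this toy model, run_demo("scale_racy") ends with 6 workers because both calls computed their delta from an empty cluster, whereas run_demo("scale") ends with the 2 workers requested by the last call.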

@jhamman @mrocklin any opinion, advice?

@mrocklin
Member

mrocklin commented Sep 18, 2018 via email

@GenevieveBuckley
Contributor

Closing this issue in favour of #2235
