Should dask-mpi section be marked as not necessary in the docs? #3889

Open
guillaumeeb opened this issue Aug 20, 2018 · 2 comments
Labels
documentation Improve or add to documentation

Comments

@guillaumeeb
Member

See http://dask.pydata.org/en/latest/setup/hpc.html#using-mpi.

While working on and with dask-jobqueue, in particular on doc issues (see #118), I stumbled on the question of using dask-jobqueue for batch processing. I believe this is OK and will work in many cases (I've already used it this way), but it won't in some others, where dask-mpi would be more appropriate. See the part of the doc I'm working on:

While dask-jobqueue can perfectly well be used for batch processing, it is
better suited to interactive processing, using tools like IPython or Jupyter
notebooks. Batch processing with dask-jobqueue can be tricky in some cases,
depending on how your cluster is configured and which resources and queues you
have access to: the scheduler might wait a long time before any workers
connect, and you could end up with less computing power than you expected.
Another good solution for batch processing on HPC systems using dask is the
dask-mpi <http://dask.pydata.org/en/latest/setup/hpc.html#using-mpi>_
command.
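For concreteness, here is a minimal sketch of the two patterns being contrasted. The cluster parameters, queue name, and scheduler-file path are made up for illustration:

```python
# Interactive pattern with dask-jobqueue: the Python process acts as the
# scheduler and submits one batch job per worker (or group of workers).
from dask.distributed import Client
from dask_jobqueue import PBSCluster

cluster = PBSCluster(queue="regular", cores=24, memory="100GB",
                     walltime="01:00:00")
cluster.scale(10)         # submits worker jobs; they may sit in the queue for a while
client = Client(cluster)  # computations start with however many workers have connected

# Batch pattern with dask-mpi: one multi-node job starts the scheduler and all
# workers together through MPI, so the resources become available at the same
# time, e.g.
#
#   mpirun -np 25 dask-mpi --scheduler-file scheduler.json
#
# and the user script then connects with:
#
#   client = Client(scheduler_file="scheduler.json")
```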

Maybe we should clarify this between us and also in the Dask docs?

cc @jhamman

@jhamman
Member

jhamman commented Aug 20, 2018

I generally agree with the comments above. Though dask/distributed#2138, if implemented, would alleviate many of the concerns raised here.

@guillaumeeb
Member Author

At first I thought so too, but I'm not so sure anymore. Depending on how the cluster is used, resolving that issue could lead to the scale method never returning, or returning at a resource optimum that decays right afterwards, depending on job start times and walltimes. A different outcome, but largely the same concerns as described above.

The only way to solve this with dask-jobqueue would be to make it able to launch multi-node jobs with workers on them, but we've currently agreed that this is out of scope for dask-jobqueue.
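To illustrate the failure mode, here is a hypothetical "scale and wait" pattern (cluster parameters invented, and wait_for_workers used only to sketch the blocking behaviour a waiting scale would have):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(cores=36, memory="120GB", walltime="00:30:00")
cluster.scale(100)            # asks the queue for 100 workers
client = Client(cluster)

# May block forever if the queue never grants that many nodes, and workers
# hitting their walltime can drop the count back below 100 right after it
# returns.
client.wait_for_workers(100)
```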

@GenevieveBuckley GenevieveBuckley added the documentation Improve or add to documentation label Oct 12, 2021