Skip to content

Could dask-mpi run the client script too? #2402

@kmpaul

Description

@kmpaul

I've been dealing with an issue that...well, I was convinced shouldn't be an issue, so I never said anything about it until dask/dask-blog#5. And after a discussion with @guillaumeeb, I was convinced that maybe I'm not as crazy (or as ill-informed) as I thought I was. So, here's the issue...

I've been trying to figure out a way of launching the Dask Scheduler, Workers, and the Client script in the same MPI environment. Currently, the way dask-mpi works is that the Scheduler and the Workers are started, and you separately connect your client (in your separate script) to the Scheduler via, for example, the scheduler.json file.

I discussed with @guillaumeeb one approach that should work, something like the following:

# [PBS header info requesting N MPI processes]

mpirun -np N dask-mpi [dask-mpi options] &
python my_dask_script.py

However, this launches Scheduler/Worker processes on all N allocated MPI processes, and then the python my_dask_script.py process could, potentially, run on the same process as the Scheduler, for example. If you have a compute-intensive client script, this could be problematic.

What I was originally hoping for was a solution that allowed something more like this:

# [PBS header info requesting N MPI processes]

mpirun -np N dask-mpi [dask-mpi options] --script my_dask_script

But after thinking about it for a while, I found that what I really wanted was something that worked like this:

# [PBS header info requesting N MPI processes]

mpirun -np N python my_dask_mpi_script.py

where the my_dask_mpi_script.py has something like an import dask-mpi line that does the following:

  1. let's MPI rank 0 pass through,
  2. launches the Scheduler on MPI rank 1 and runs an IOLoop until the rank 0 process is complete,
  3. launches the Workers on MPI ranks >1, which also run until rank 0 process is complete.

At this point, I feel like I could write this myself...except that I don't know how to implement the "run the IOLoop until rank 0 process is complete" part.

Any thoughts? Are there different solutions? Would you recommend something different?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions