Closed
Description
Problem
Currently, it's not possible to use dstack with mpirun on clusters. mpirun is required for many important workloads, including nccl-tests.
Solution
Support mpi: true so that dstack sets up the worker nodes first and only then the master node, while also letting the master node connect to each worker node via SSH (assuming host network mode is used).
This would allow tasks running on such fleets to use MPI freely.
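To illustrate what the master node would need once the workers are reachable over SSH, here is a minimal sketch that builds an Open MPI hostfile and the matching mpirun command line. It is not dstack code. The helper names (`build_hostfile`, `build_mpirun_command`), the IP addresses, the slot count, and the choice to pass the node list explicitly are all assumptions made for illustration. How dstack would expose the node addresses is not specified in this issue.

```python
import shlex


def build_hostfile(node_ips, slots_per_node):
    """Return Open MPI hostfile text: one 'host slots=N' line per node."""
    return "".join(f"{ip} slots={slots_per_node}\n" for ip in node_ips)


def build_mpirun_command(hostfile_path, node_ips, slots_per_node, program):
    """Return a shell-quoted mpirun command that starts one rank per slot."""
    total_ranks = len(node_ips) * slots_per_node
    argv = ["mpirun", "--hostfile", hostfile_path, "-np", str(total_ranks), *program]
    return shlex.join(argv)


if __name__ == "__main__":
    # Hypothetical node addresses and GPU count; replace with real values.
    nodes = ["10.0.0.2", "10.0.0.3"]
    print(build_hostfile(nodes, 8), end="")
    # all_reduce_perf is one of the nccl-tests binaries.
    print(build_mpirun_command("hostfile", nodes, 8, ["./all_reduce_perf", "-b", "8", "-e", "1G"]))
```

The command only works if the node running mpirun can reach every host in the hostfile over passwordless SSH, which is why the proposed startup order (workers first, then master) matters.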
Workaround
Before implementing this, it's important to quickly test the approach on a cluster using a simple docker run.
Would you like to help us implement this feature by sending a PR?
No