
[Feature]: Support mpi with fleets #2368

@peterschmidt85


Problem

Currently, dstack cannot be used with mpirun on clusters. mpirun is required for many important workloads, including nccl-tests.

Solution

Support mpi: true so that dstack sets up the worker nodes first and only then the master node, and lets the master node connect to each worker node via SSH (this implies the host network mode is used).
This would allow MPI to be used freely from tasks running on such fleets.
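
To illustrate what the master node could do once every worker is reachable over SSH, here is a minimal sketch that builds an Open MPI hostfile and the matching mpirun command line. It is not dstack's implementation. The node IPs, the slot count, the `hostfile` path, and the helper names `build_hostfile` and `build_mpirun_command` are all assumptions made for illustration, and the flags shown are Open MPI's.

```python
import shlex
from typing import List


def build_hostfile(node_ips: List[str], slots_per_node: int) -> str:
    """Return Open MPI hostfile text: one '<host> slots=<n>' line per node."""
    return "\n".join(f"{ip} slots={slots_per_node}" for ip in node_ips) + "\n"


def build_mpirun_command(hostfile_path: str, total_procs: int, program: List[str]) -> str:
    """Return a shell-quoted mpirun command line (Open MPI flags)."""
    args = [
        "mpirun",
        "--allow-run-as-root",  # containers commonly run as root
        "--hostfile", hostfile_path,
        "-n", str(total_procs),
        *program,
    ]
    return shlex.join(args)


# Hypothetical two-node cluster with 8 GPUs per node (illustrative values only).
nodes = ["10.0.0.1", "10.0.0.2"]
gpus_per_node = 8

print(build_hostfile(nodes, gpus_per_node), end="")
print(build_mpirun_command(
    "hostfile",
    len(nodes) * gpus_per_node,
    ["./all_reduce_perf", "-b", "8", "-e", "8G", "-f", "2", "-g", "1"],
))
```

mpirun launches the remote processes over SSH from the node where it is invoked, which is why the proposal needs the workers to be up before the master starts.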

Workaround

Before implementing this, it's important to quickly test the approach on a cluster using a plain docker run.
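
As part of such a quick manual test, one could first confirm from the master node that each worker's SSH port accepts TCP connections. The sketch below only checks reachability and does not authenticate. The function name `check_ssh_reachable`, the default port 22, and the timeout are assumptions, not anything dstack defines.

```python
import socket
from typing import Dict, List


def check_ssh_reachable(hosts: List[str], port: int = 22, timeout: float = 3.0) -> Dict[str, bool]:
    """Map each host to True if a TCP connection to `port` succeeds, else False."""
    results: Dict[str, bool] = {}
    for host in hosts:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            results[host] = False
    return results
```

For example, `check_ssh_reachable(["10.0.0.1", "10.0.0.2"])` (hypothetical worker IPs) returns a dictionary keyed by host. Any `False` entry means mpirun would fail to launch processes on that worker.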

Would you like to help us implement this feature by sending a PR?

No
