NCCL tests do not run correctly with EFA on the dstack-cuda VM image, and this may be hard to fix or configure. The proposal is to use the AWS DLAMI by default, since it works with EFA and has everything dstack needs.
The DLAMI is not supposed to work with p3 instances (V100), so we may continue using our own image there and deprecate V100 support in the meantime.
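The selection rule above could be sketched roughly as follows. This is only an illustration of the proposal, not dstack's actual code: the function name `select_vm_image` and the image identifiers are hypothetical placeholders, and treating `p3dn` as a V100 family alongside `p3` is an assumption.

```python
# Hypothetical identifiers; the real AMI names/IDs are not part of the proposal.
DSTACK_CUDA_IMAGE = "dstack-cuda"
AWS_DLAMI_IMAGE = "aws-dlami"

# V100 instance families. Including "p3dn" is an assumption (it also uses V100).
V100_FAMILIES = {"p3", "p3dn"}


def select_vm_image(instance_type: str) -> str:
    """Pick the VM image for an AWS instance type such as 'p4d.24xlarge'."""
    # The family is the part before the first dot, e.g. 'p3' in 'p3.2xlarge'.
    family = instance_type.split(".", 1)[0].lower()
    if family in V100_FAMILIES:
        # DLAMI is not expected to work here, so keep the dstack-cuda image.
        return DSTACK_CUDA_IMAGE
    # Default: the DLAMI, which works with EFA.
    return AWS_DLAMI_IMAGE
```

With this rule, only V100 instances stay on the dstack-cuda image, so the special case can be dropped once V100 is deprecated.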