Currently, enabling InfiniBand support for a Dask cluster using UCX requires that the workers and the client be configured with the same connection arguments as the scheduler. As an example of a realistic configuration in a CUDA environment, a user might need to specify options to enable `tcp-over-ucx`, `nvlink`, `infiniband`, and potentially `rdmacm`. In addition, they might need to manually specify a value for `UCX_MAX_RNDV_RAILS`. These options currently have to be specified separately for the scheduler, the workers, and the client. That is painful for users who are testing several different configurations, and it is easy to forget one or more of the options on one of the three components.
The more I think through the problem, the less I can see a reason why the values for these arguments would ever differ between a client or worker and the scheduler. Storing these configuration arguments in the `scheduler.json` file would make deployment much more straightforward and alleviate a lot of the pain.
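To make the idea concrete, here is a rough sketch of what I have in mind, using only the standard library. Everything in it is an assumption for illustration: the `ucx` key, the option names inside it, and the helpers `write_scheduler_file`, `load_ucx_options`, and `ucx_environment` do not exist in Dask today, and the address is a placeholder.

```python
import json
import os
import tempfile

# Hypothetical key: the scheduler file has no such entry today.
UCX_KEY = "ucx"


def write_scheduler_file(path, address, ucx_options):
    """Write a scheduler-file-like JSON document with the UCX options attached."""
    info = {"address": address, UCX_KEY: ucx_options}
    with open(path, "w") as f:
        json.dump(info, f, indent=2)


def load_ucx_options(path):
    """Return the UCX options stored by the scheduler, or {} if none are present."""
    with open(path) as f:
        info = json.load(f)
    return info.get(UCX_KEY, {})


def ucx_environment(ucx_options):
    """Build the environment variables a worker or client would need to set."""
    env = {}
    rails = ucx_options.get("max_rndv_rails")
    if rails is not None:
        env["UCX_MAX_RNDV_RAILS"] = str(rails)
    return env


if __name__ == "__main__":
    # The scheduler would write its options once ...
    options = {
        "tcp_over_ucx": True,
        "nvlink": True,
        "infiniband": True,
        "rdmacm": False,
        "max_rndv_rails": 1,
    }
    path = os.path.join(tempfile.mkdtemp(), "scheduler.json")
    write_scheduler_file(path, "ucx://10.0.0.1:8786", options)

    # ... and each worker and the client would read them back
    # instead of repeating the same flags.
    loaded = load_ucx_options(path)
    print(loaded)
    print(ucx_environment(loaded))
```

With something like this, the options would be given once when starting the scheduler, and workers and clients that already point at the scheduler file would pick them up from there.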
Tagging @rlratzel and @pentschev for awareness and further thoughts.