
scheduler.json to store additional connection arguments to the scheduler #4877

@cjnolet

Currently, enabling InfiniBand support for a Dask cluster using UCX requires that the workers and the client be configured with the same connection arguments as the scheduler. As a realistic example in a CUDA environment, a user might need to specify options to enable tcp-over-ucx, NVLink, InfiniBand, and potentially rdmacm. In addition, they might need to set a value for UCX_MAX_RNDV_RAILS manually. These options currently have to be specified separately for the scheduler, the workers, and the client, which becomes painful for users who are testing several different configurations and might not know that they need to specify one or more of them.

The more I think through the problem, the less I can think of a reason why the values for these arguments would ever differ between a client or worker and the scheduler. Storing these configuration arguments in the scheduler.json file would make deployment much more straightforward and alleviate a lot of the pain.
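To make the idea concrete, here is a minimal sketch of what I have in mind. It is not existing Dask behavior: the `connection_args` key, the helper names, and the option names inside the dictionary are all hypothetical, and it only assumes that scheduler.json is a JSON file with an `address` entry.

```python
import json
import os
import tempfile

# Hypothetical key name: the scheduler file has no such entry today.
CONNECTION_ARGS_KEY = "connection_args"


def write_scheduler_file(path, address, connection_args):
    """Write a scheduler file that also records the scheduler's connection arguments."""
    info = {"address": address, CONNECTION_ARGS_KEY: dict(connection_args)}
    with open(path, "w") as f:
        json.dump(info, f, indent=2)


def load_connection_info(path, overrides=None):
    """Return (address, connection_args) for a client or worker.

    Values stored by the scheduler act as defaults; explicit overrides win.
    A file without the hypothetical key yields an empty argument dict.
    """
    with open(path) as f:
        info = json.load(f)
    args = dict(info.get(CONNECTION_ARGS_KEY, {}))
    args.update(overrides or {})
    return info["address"], args


def demo():
    """Round-trip an illustrative set of UCX options through a temporary file."""
    scheduler_args = {
        # Illustrative option names only, not a confirmed schema.
        "enable_tcp_over_ucx": True,
        "enable_nvlink": True,
        "enable_infiniband": True,
        "enable_rdmacm": True,
        "UCX_MAX_RNDV_RAILS": "1",
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scheduler.json")
        write_scheduler_file(path, "ucx://10.0.0.1:8786", scheduler_args)
        return load_connection_info(path)
```

With something like this, a client or worker would only need the path to scheduler.json. Anything the user passes explicitly could still override the stored values, which keeps the rare case of intentionally different settings possible.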

Tagging @rlratzel and @pentschev for awareness and further thoughts.
