I'm trying to support the following scenario: We have multiple machines with multiple NVIDIA GPUs. On each machine, we run one worker per GPU. When a task gets scheduled on a worker, we want to make sure that any GPU routine invoked by the task runs on a GPU that is not in use by any other worker. Libraries such as TensorFlow and PyTorch schedule GPU work on the visible GPU with the lowest bus id, so by default, tasks running in parallel will all try to schedule on the same GPU if they have the same set of visible GPUs. CUDA supports masking the GPUs available to a process by setting the `CUDA_VISIBLE_DEVICES` environment variable. I want to ensure that if two tasks are running on workers on the same machine, the GPUs they can see (and hence the GPUs they can schedule on) do not overlap.
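For illustration, a minimal sketch of that masking mechanism (the device id here is arbitrary and just for the example):

```python
import os

# Masking must happen before the process initializes CUDA, i.e. before
# TensorFlow or PyTorch is imported or first touches a GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only physical GPU 1

import torch

# The process now sees a single GPU, which PyTorch reports as device 0.
print(torch.cuda.device_count())  # prints 1 on a machine with a GPU with id 1
```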
I can think of two possible approaches here:
- Assigning each worker a GPU to use at startup, by running `dask-worker` with `CUDA_VISIBLE_DEVICES` set such that each worker sees a unique GPU, and with `--resources "GPU=1"` (see the first sketch after this list).
- Adding a scheduler plugin that assigns tasks to available GPUs on the host of the worker they were scheduled on, and adding a preamble to each task's target that sets the environment variable appropriately (see the second sketch after this list).
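For the first approach, the worker launch itself happens at the shell level (one `dask-worker` process per GPU, each with a different `CUDA_VISIBLE_DEVICES`), and tasks would then be pinned to GPU-holding workers through Dask's resources mechanism. A rough sketch of the client side, assuming each worker was started with `--resources "GPU=1"` and a scheduler at a hypothetical address:

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # hypothetical scheduler address

def run_gpu_task():
    # Framework code here sees only the single GPU exposed by this
    # worker's CUDA_VISIBLE_DEVICES, so it cannot collide with other workers.
    import torch
    return torch.cuda.device_count()  # always 1 under this scheme

# resources={"GPU": 1} restricts the task to workers that advertise a GPU
# resource, so it can never land on a worker without a dedicated GPU.
future = client.submit(run_gpu_task, resources={"GPU": 1})
print(future.result())
```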
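For the second approach, the scheduler-plugin side is the part I'm least sure how to write, but the preamble could look roughly like the decorator below. `acquire_free_gpu` and `release_gpu` are hypothetical helpers that would coordinate per-host GPU ownership (e.g., via lock files), not anything that exists in Dask:

```python
import os
from functools import wraps

def gpu_preamble(func):
    """Claim an unused GPU on this host before running the wrapped task."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Hypothetical helper: reserve a GPU no other worker on this host holds.
        gpu_id = acquire_free_gpu()
        # Note: this only takes effect if the worker process has not yet
        # initialized CUDA, since CUDA_VISIBLE_DEVICES is read once at
        # context creation.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        try:
            return func(*args, **kwargs)
        finally:
            release_gpu(gpu_id)  # hypothetical: return the GPU to the pool
    return wrapper
```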
I'm new to Dask, so I'm not sure if either of these is appropriate. Is there prior art on handling situations like this?