
Supporting scheduling on specific GPU #1758

@aadamson

Description

I'm trying to support the following scenario: we have multiple machines, each with multiple NVIDIA GPUs, and on each machine we run one worker per GPU. When a task gets scheduled on a worker, we want to make sure that any GPU routine invoked by the task runs on a GPU that is not in use by any other worker. Libraries such as TensorFlow and PyTorch schedule GPU work on the visible GPU with the lowest bus ID, so by default, tasks running in parallel on the same machine will all try to schedule on the same GPU if they have the same set of visible GPUs. CUDA supports masking the GPUs available to a process via the CUDA_VISIBLE_DEVICES environment variable. I want to ensure that if two tasks are running on workers on the same machine, the GPUs they can see (and hence the GPUs they can schedule on) do not overlap.
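As a minimal illustration of the masking behavior described above (no real GPU is needed to see the mechanism), setting CUDA_VISIBLE_DEVICES before a CUDA-aware library initializes restricts which devices that process can enumerate:

```python
import os

# Mask all but physical GPU 1 for this process. Frameworks such as
# TensorFlow and PyTorch will only enumerate the devices listed here,
# and they renumber them from 0, so the unmasked GPU appears as device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Any CUDA-aware library imported *after* this point sees exactly one GPU.
```

Note that the variable must be set before the CUDA runtime initializes in the process; changing it afterwards has no effect.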

I can think of two possible approaches here:

  • Assign each worker a GPU on startup by running dask-worker with CUDA_VISIBLE_DEVICES set so that each worker sees a unique GPU, and with --resources "GPU=1"
  • Add a scheduler plugin that assigns tasks to available GPUs on the host of the worker they were scheduled on, and add a preamble to each task's target that sets the environment variables appropriately.
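For context, here is a rough sketch of what the first approach might look like as a launcher helper. This is only an illustration, not a tested deployment script: the scheduler address and thread count are placeholders, and it assumes the `--resources "GPU=1"` form of the dask-worker CLI.

```python
import shlex

def worker_commands(scheduler_addr: str, n_gpus: int) -> list:
    """Build one dask-worker command line per GPU, each pinned to a
    distinct device via CUDA_VISIBLE_DEVICES and advertising a GPU=1
    resource to the scheduler."""
    commands = []
    for gpu in range(n_gpus):
        cmd = (
            f"CUDA_VISIBLE_DEVICES={gpu} "
            f"dask-worker {scheduler_addr} "
            f"--nthreads 1 --resources {shlex.quote('GPU=1')}"
        )
        commands.append(cmd)
    return commands

# Tasks would then be submitted with resources={'GPU': 1}, so the
# scheduler places at most one GPU task per worker at a time, and each
# worker's process can only see its own device.
```

The idea is that GPU isolation happens at process startup (via the environment), while mutual exclusion happens at scheduling time (via the resource constraint).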

I'm new to Dask, so I'm not sure if either of these is appropriate. Is there prior art on handling situations like this?
