This repository was archived by the owner on Jul 6, 2020. It is now read-only.

Multiple GPU on the same server #2


Description

@surajkamal

This handy tool is a great addition to dask-distributed, especially for those who are using distributed TensorFlow. It would be much better if there were an option to specify the GPU resources used by each dask task, for situations where multiple workers reside on the same machine. When a TensorFlow server initializes, by default it grabs all the available GPU RAM of every GPU card on the machine (as mentioned here). I have managed to work around this by just adding:

import os
from queue import Queue

import tensorflow as tf

def start_and_attach_server(spec, job_name=None, task_index=None, dask_worker=None):
    # Restrict this TensorFlow server to a single GPU before it starts,
    # so it does not grab the memory of every GPU on the machine.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(task_index)
    server = tf.train.Server(spec, job_name=job_name, task_index=task_index)
    dask_worker.tensorflow_server = server
    dask_worker.tensorflow_queue = Queue()
    return 'OK'
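
For context, dask-distributed can execute such a function on a chosen worker with Client.run, which forwards extra keyword arguments to the function and injects the dask_worker keyword automatically. Below is a sketch of the per-worker wiring; the tf_spec mapping of worker addresses to (job_name, task_index) pairs and the spec object are assumptions for illustration, not dask-tensorflow's exact internals.

from dask.distributed import Client

client = Client('scheduler-host:8786')  # hypothetical scheduler address

# Start one TensorFlow server per dask worker, targeting each worker
# individually so it receives its own job name and task index.
for address, (job_name, task_index) in tf_spec.items():
    client.run(start_and_attach_server, spec,
               job_name=job_name, task_index=task_index,
               workers=[address])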

but this is rudimentary: it does not check whether the tasks are on the same node, which is essential, and there is no clean interface to specify this either. Would it be possible to add such functionality in a better way, so that many TensorFlow tasks can share the same node?
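
One possible direction, as a minimal sketch rather than anything dask-tensorflow provides: group the worker addresses by host and give each worker a GPU index that restarts at zero on every host, then set CUDA_VISIBLE_DEVICES from that host-local index instead of the global task_index. The assign_gpu_indices helper below is hypothetical, and worker addresses are assumed to look like 'tcp://host:port'.

from collections import defaultdict

def assign_gpu_indices(worker_addresses):
    # Hypothetical helper: map each dask worker address to a GPU index
    # that is local to its host, so two workers on the same machine get
    # GPUs 0 and 1 instead of both inheriting a global task_index.
    next_free = defaultdict(int)  # host -> next unused GPU index
    gpu_indices = {}
    for address in sorted(worker_addresses):
        host = address.rsplit(':', 1)[0]  # drop the port, keep the host
        gpu_indices[address] = next_free[host]
        next_free[host] += 1
    return gpu_indices

The resulting mapping could then be threaded into start_and_attach_server (say, as an extra gpu_index argument) and used for CUDA_VISIBLE_DEVICES, so that co-located servers land on separate GPUs while a worker that sits alone on a node keeps GPU 0.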
