Mixing up GPU names on slowest first GPU bus ID #18

Open
eliorc opened this issue Nov 13, 2018 · 4 comments

eliorc commented Nov 13, 2018

I understand that GPUtil infers the GPUs' attributes so that its output matches the nvidia-smi output.

The thing is that GPUtil is commonly used with TensorFlow and other GPU-utilizing frameworks, and these frameworks usually assign GPU IDs sorted by the GPUs' quality (fastest first).
For example, in TensorFlow, if you set CUDA_VISIBLE_DEVICES = '0' in your environment variables, only the fastest GPU will be exposed to the library.
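Something like this minimal sketch (TensorFlow is just the example framework here; the environment variable has to be set before the framework initializes CUDA):

```python
import os

# With CUDA's default device ordering (FASTEST_FIRST), device 0 is the GPU
# CUDA guesses to be fastest, not necessarily the one on the first PCI bus.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf  # TensorFlow will now only see that single GPU
```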

In my setup I have two different GPUs in the same machine. At runtime I use GPUtil to figure out which GPU has the most memory available, and I use that GPU's ID to designate which GPU to use. But since my slowest GPU is installed on the first bus, it shows up in GPUtil as 0 and the faster one as 1.
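Roughly, what I do looks like this (a simplified sketch; getGPUs() and the id/memoryFree attributes are part of GPUtil's GPU objects):

```python
import os
import GPUtil

# Pick the GPU with the most free memory, as reported by GPUtil / nvidia-smi.
gpus = GPUtil.getGPUs()
best_gpu = max(gpus, key=lambda gpu: gpu.memoryFree)

# Problem: best_gpu.id follows the nvidia-smi (PCI bus) order, while CUDA's
# default ordering is fastest-first, so the two numberings can disagree.
os.environ["CUDA_VISIBLE_DEVICES"] = str(best_gpu.id)
```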

I would suggest adding a parameter to GPUtil.getGPUs() that helps sort this out, so that any downstream frameworks relying on CUDA_VISIBLE_DEVICES can get the IDs right.


anderskm commented Nov 13, 2018

I am not sure what your main concern is. Is it 1) that the IDs do not match between TensorFlow (CUDA_VISIBLE_DEVICES) and GPUtil (nvidia-smi), or 2) that the GPUs returned by GPUtil (nvidia-smi) are not ordered according to their processing speed?

In case of 1), that can be solved by setting the CUDA environment variable CUDA_DEVICE_ORDER = "PCI_BUS_ID".
See the example "Occupy only 1 GPU in TensorFlow" in the GPUtil readme.
See also NVIDIA's description of the CUDA environment variables.
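Along the lines of the readme example, a sketch (both environment variables must be set before TensorFlow initializes CUDA):

```python
import os
import GPUtil

# Make CUDA enumerate GPUs in the same (PCI bus) order as nvidia-smi/GPUtil.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Pick the first available GPU according to GPUtil and restrict CUDA to it.
device_id = GPUtil.getFirstAvailable()[0]
os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)

import tensorflow as tf  # sees only the selected GPU, with matching IDs
```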

In case of 2), NVIDIA only guarantees that the first GPU is the fastest. The rest of the GPUs are returned in unspecified order.
From the CUDA environment variables:

FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic, and make that device 0, leaving the order of the rest of the devices unspecified.

As such, there is no guarantee that GPU#2 is faster than GPU#3. And if GPU#1 is already occupied, you are back to the original problem.
Unfortunately, I do not see a solution to case 2) at the moment. However, you or anyone else are very welcome to suggest a solution :-)

  • Edit: Fixed some spelling.


eliorc commented Nov 13, 2018

Thanks for the quick response. I am talking about issue 1).

Yeah, this is how I deal with it now, by setting CUDA_DEVICE_ORDER. My suggestion was that, since GPUtil is a standard choice when working with CUDA-backed frameworks, it would be helpful if there were support on the library side (GPUtil's side) for that default behavior, since using the CUDA defaults is such a common use case.
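To make the suggestion concrete, a purely hypothetical sketch; the order="fastest_first" argument does not exist in GPUtil, it only illustrates the kind of library-side support I mean:

```python
import GPUtil

# Hypothetical call: ask GPUtil to number the GPUs the way CUDA does by
# default (fastest first), so the returned IDs could be used directly in
# CUDA_VISIBLE_DEVICES without also setting CUDA_DEVICE_ORDER.
gpus = GPUtil.getGPUs(order="fastest_first")  # not a real GPUtil parameter
```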

@anderskm

@eliorc I'm sorry for not getting back to you sooner.

As far as I can tell, there is no way of sorting the GPUs from fastest to slowest in nvidia-smi. Likewise, CUDA's heuristic for ordering the GPUs by speed is proprietary, which means there is no way of replicating that order. Secondly, NVIDIA does not guarantee the order of the remaining GPUs.
In short, I do not see a reliable way for GPUtil to deal with the default behavior of CUDA's GPU ordering (fastest first).

I will keep the issue open.

@tashrifbillah

Here are my two cents: the GPUtil.get*() functions should respect the environment variable CUDA_VISIBLE_DEVICES. Let's say I have 4 GPUs but make only 2 of them visible; then these methods should return results as if only 2 GPUs exist. Currently, GPUtil looks at the nvidia-smi output and returns whatever that reports.
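A rough sketch of the behavior I mean (today this filtering has to be done on the user side; the sketch only handles plain integer IDs in CUDA_VISIBLE_DEVICES):

```python
import os
import GPUtil

gpus = GPUtil.getGPUs()

visible = os.environ.get("CUDA_VISIBLE_DEVICES")
if visible is not None:
    # Keep only the devices listed in CUDA_VISIBLE_DEVICES (comma-separated IDs).
    visible_ids = {int(i) for i in visible.split(",") if i.strip().isdigit()}
    gpus = [gpu for gpu in gpus if gpu.id in visible_ids]

print([gpu.id for gpu in gpus])  # e.g. only 2 IDs when 2 of 4 GPUs are visible
```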
