Mixing up GPU names on slowest first GPU bus ID #18
I am not sure what your main concern is. Is it 1) that the IDs do not match between TensorFlow (CUDA) and GPUtil (`nvidia-smi`), or 2) that the GPUs are not ordered by speed?

In case of 1), that can be solved by setting the CUDA environment variable `CUDA_DEVICE_ORDER=PCI_BUS_ID`, which makes CUDA enumerate the GPUs by PCI bus ID, matching the `nvidia-smi` (and GPUtil) order.

In case of 2), NVIDIA only guarantees that the first GPU is the fastest. The rest of the GPUs are returned in unspecified order. As such, there is no guarantee that GPU #2 is faster than GPU #3. And if GPU #1 is already occupied, you are back to the original problem.
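For 1), a minimal sketch in Python (the environment variables are the standard CUDA ones; they must be set before the first CUDA context is created, e.g. before importing TensorFlow):

```python
import os

# Make CUDA enumerate devices by PCI bus ID so that its indices
# match the order reported by nvidia-smi (and hence by GPUtil).
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Optionally expose a single device, referenced by its nvidia-smi index.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf  # now sees the GPUs in nvidia-smi order
```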
Thanks for the quick response. I am talking about issue 1). Yeah, this is how I deal with it now, setting that environment variable.
@eliorc I'm sorry for not getting back to you sooner. As far as I can tell, there is no way of sorting the GPUs from fastest to slowest in `nvidia-smi`. Likewise, CUDA's heuristic for ordering the GPUs by speed is proprietary, which means there is no way of replicating that order. Secondly, NVIDIA does not guarantee the order of the remaining GPUs. I will keep the issue open.
Here are my two cents:
I understand that GPUtil infers the GPUs' attributes so that it will match the `nvidia-smi` output. The thing is that GPUtil is commonly used with TensorFlow and other GPU-utilizing frameworks, and these frameworks usually use IDs that are sorted by GPU speed. For example, in TensorFlow, if you set `CUDA_VISIBLE_DEVICES='0'` in your environment variables, only the fastest GPU will be exposed to the library.

In my setup, I have two different GPUs on the same machine. During runtime I use GPUtil to figure out which GPU has the most memory available, and using that GPU's ID I designate a GPU to use. But since my slowest GPU is installed on the first bus, it shows up in GPUtil as 0 and the faster one as 1.
I would suggest adding a parameter to `GPUtil.getGPUs()` that helps sort this out, so that any downstream frameworks that rely on `CUDA_VISIBLE_DEVICES` would be able to get the IDs right.
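For now, a minimal sketch of the workflow described above, using only `GPUtil.getGPUs()` and the standard CUDA environment variables (the `id` and `memoryFree` attribute names are assumed from GPUtil's GPU objects; adjust if they differ in your version):

```python
import os
import GPUtil

# Force CUDA to enumerate devices by PCI bus ID so that the index
# chosen from GPUtil/nvidia-smi means the same thing to TensorFlow.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Pick the GPU with the most free memory, as reported by GPUtil.
gpus = GPUtil.getGPUs()
best_gpu = max(gpus, key=lambda gpu: gpu.memoryFree)

# Expose only that GPU to the framework (set before importing TensorFlow).
os.environ["CUDA_VISIBLE_DEVICES"] = str(best_gpu.id)

import tensorflow as tf  # sees exactly one GPU: the one selected above
```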