The command to pass specific gpu ids to docker run command needs gpu ids in quotes #11010
Comments
The way --gpus arguments are parsed is ridiculous. If you have a job in a batch system like SLURM with CUDA_VISIBLE_DEVICES set, how are you supposed to use it properly in a docker call? Say I have a box like the DGX A100 with 8 GPUs and get a job that uses all of them, so CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7, and I do
which is the most obvious thing to do. It silently ignores all the numbers except the final "7" and gives you just 7 GPUs to use. Because of the way the quoting works, I see no way to actually pass this properly
obviously fails, since the single quotes keep the variable from being expanded. Doing it without quotes gives you the "cannot set both Count and DeviceIDs on device request" error. I mean, what brain-dead parser cannot tell from the device= string being there that this is NOT A COUNT! So with CUDA_VISIBLE_DEVICES set to some list of GPU IDs, how is one supposed to use it with the docker --gpus argument in a script? Am I missing something obvious?
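One way to carry a batch-system CUDA_VISIBLE_DEVICES list into --gpus is to let the shell expand the variable inside outer double quotes while keeping literal inner double quotes in the value, so that Docker's comma-separated parsing treats the whole device list as one field. A minimal sketch, assuming a POSIX shell; the `gpus_arg` helper name, the `0,2` demo default, and the `ubuntu` image are illustrative and do not come from this thread:

```shell
#!/bin/sh
# Build the value for `docker run --gpus` from a comma-separated GPU ID list.
# The double quotes inside the printf format are part of the value itself,
# so Docker receives "device=0,2" (quotes included) as a single argument.
gpus_arg() {
    printf '"device=%s"' "$1"
}

# Assumption: the batch system exported CUDA_VISIBLE_DEVICES.
# The fallback value is only here so the demo runs on its own.
CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:-0,2}"

arg="$(gpus_arg "$CUDA_VISIBLE_DEVICES")"

# Dry run: print the command instead of executing it.
echo "docker run -it --rm --gpus $arg ubuntu nvidia-smi"

# To actually run it (needs Docker with NVIDIA GPU support):
# docker run -it --rm --gpus "$arg" ubuntu nvidia-smi
```

The same effect can be written inline as `--gpus "\"device=${CUDA_VISIBLE_DEVICES}\""`, without eval.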
This is what I have resorted to:
Yes, this is really annoying!
Fixed in 4f12a72
Thanks for documenting this quirk! I can only advocate fixing the actual parsing error: docker/cli#2937
I found another, simpler solution: where $g is set to the CUDA index of whichever GPU you want to target
This version without the eval and echo works:
Closed issues are locked after 30 days of inactivity. If you have found a problem that seems similar to this, please open a new issue. /lifecycle locked
File: config/containers/resource_constraints.md
Referencing this issue
The below command:
$ docker run -it --rm --gpus device=0,2 nvidia-smi
Should be:
$ docker run -it --rm --gpus '"device=0,2"' nvidia-smi
I'm using
Docker version 19.03.10, build 9424aeaee9
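To see why the extra layer of quotes matters, it can help to look at what the shell actually hands to a program for each form. The sketch below uses a hypothetical `show` helper (not part of Docker) that prints every argument in brackets; only the last form delivers the double quotes to the program.

```shell
#!/bin/sh
# Print each argument on its own line, wrapped in brackets.
show() { printf '[%s]\n' "$@"; }

show --gpus device=0,2        # second line: [device=0,2]
show --gpus "device=0,2"      # second line: [device=0,2]   (shell strips the quotes)
show --gpus '"device=0,2"'    # second line: ["device=0,2"] (inner quotes survive)
```

With the quotes stripped, the comma is seen by Docker's own parser as a field separator, which appears to be what makes the unquoted form fail.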