The command to pass specific gpu ids to docker run command needs gpu ids in quotes #11010

Closed
Shreeyak opened this issue Jun 17, 2020 · 8 comments

Comments

@Shreeyak

Shreeyak commented Jun 17, 2020

File: config/containers/resource_constraints.md

Referencing this issue

The below command:
$ docker run -it --rm --gpus device=0,2 nvidia-smi

Should be:
$ docker run -it --rm --gpus '"device=0,2"' nvidia-smi
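The two layers of quoting do different jobs. The outer single quotes are consumed by the shell, so docker receives the inner double quotes as literal characters. Presumably those inner quotes are what make docker's own parser keep `0,2` together as one device list instead of splitting it at the comma. The quick bash check below shows exactly what the shell hands to docker in each case; `show_args` is a hypothetical helper, not part of docker.

```bash
#!/usr/bin/env bash
# Print every argument the shell passes on, one per line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

show_args --gpus device=0,2       # prints [--gpus] then [device=0,2]
show_args --gpus '"device=0,2"'   # prints [--gpus] then ["device=0,2"]
```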

I'm using Docker version 19.03.10, build 9424aeaee9

@Shreeyak Shreeyak changed the title The command to pass specific devices to docker run command needs gpu ids in quotes The command to pass specific gpu ids to docker run command needs gpu ids in quotes Jun 17, 2020
@paulraines68

The way --gpus arguments are parsed is ridiculous.

If you have a job in a batch system like SLURM with CUDA_VISIBLE_DEVICES set, how are you supposed to use it properly in a docker call?

If I have a box like the DGX A100 with 8 GPUs and get a job that uses all of them, so CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7, and do

docker run --gpus $CUDA_VISIBLE_DEVICES ...

which is the most obvious thing to do, it silently ignores every number except the final "7", treats that as a GPU count, and gives you just 7 GPUs instead of 8.

Because of the way quoting works, I see no way to actually pass this properly.

docker run --gpus '"device=$CUDA_VISIBLE_DEVICES"' ...

obviously fails, since the single quotes stop the variable from being expanded. Doing it without quotes gives you the "cannot set both Count and DeviceIDs on device request" error. I mean, what brain-dead parser cannot tell from the device= string being there that this is NOT A COUNT?
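Both failures are consistent with the CLI splitting the --gpus value on commas (like a CSV record, unless the whole value is double-quoted), treating each bare number as a GPU count where the last one wins, and rejecting a count combined with device=.... The bash function below is a simplified, hypothetical model built on exactly those assumptions; `model_gpus_parse` is an illustrative name, and this is not the docker CLI's real implementation.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of how `docker run --gpus VALUE` behaves;
# not the docker CLI's real implementation.
model_gpus_parse() {
  local value=$1 count="" devices="" field
  local -a fields
  if [[ $value == \"*\" ]]; then
    # A value wrapped in double quotes is one CSV field: its commas are kept.
    fields=("${value:1:${#value}-2}")
  else
    # Otherwise the value is split on every comma.
    IFS=, read -r -a fields <<< "$value"
  fi
  for field in "${fields[@]}"; do
    case $field in
      device=*) devices=${field#device=} ;;  # device=... is a list of GPU IDs
      *=*) ;;                                # other keys are ignored in this model
      *) count=$field ;;                     # a bare number is a count; the last wins
    esac
  done
  if [[ -n $count && -n $devices ]]; then
    echo "error: cannot set both Count and DeviceIDs on device request" >&2
    return 1
  fi
  if [[ -n $devices ]]; then
    echo "devices=$devices"
  else
    echo "count=${count:-1}"
  fi
}

model_gpus_parse 0,1,2,3,4,5,6,7   # prints count=7
model_gpus_parse device=0,2        # prints the Count/DeviceIDs error on stderr
model_gpus_parse '"device=0,2"'    # prints devices=0,2
```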

So, with CUDA_VISIBLE_DEVICES set to a list of GPU IDs, how is one supposed to use it with the docker --gpus argument in a script? Am I missing something obvious?

@paulraines68

This is what I have resorted to:

eval "echo docker run --gpus "'\"device=$CUDA_VISIBLE_DEVICES\"'" -it --rm --network host tf2-py3-x11"

@moi90

moi90 commented Jan 18, 2021

Yes, this is really annoying!

@craig-osterhout
Contributor

Fixed in 4f12a72

@moi90

moi90 commented Jun 7, 2022

Thanks for documenting this quirk!

I can only advocate fixing the actual parsing error: docker/cli#2937

@BWBrook

BWBrook commented Jun 14, 2022

I found another, simpler solution:
docker run --env CUDA_VISIBLE_DEVICES=$g --gpus all ...

where $g is the CUDA index of the GPU you want to target.

@nouiz

nouiz commented Jun 21, 2022

This version without the eval and echo works:

docker run --gpus \"device=${CUDA_VISIBLE_DEVICES2}\" -it --rm --network host tf2-py3-x11
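A minimal bash sketch of the same escaping for use in a script, assuming bash and a comma-separated CUDA_VISIBLE_DEVICES. `gpus_device_arg` and `my-cuda-image` are hypothetical names, and `0,2` is only a placeholder default for when the variable is unset. Keeping the command in an array passes the quoted argument through unchanged without needing eval.

```bash
#!/usr/bin/env bash
# Wrap a comma-separated GPU id list (for example CUDA_VISIBLE_DEVICES set by a
# scheduler such as SLURM) in literal double quotes, as in '"device=0,2"'.
gpus_device_arg() {
  printf '"device=%s"' "$1"
}

ids=${CUDA_VISIBLE_DEVICES:-0,2}   # 0,2 is only a placeholder default

# Equivalent inline form: the escaped \" stay as literal characters, while the
# surrounding double quotes still let ${ids} expand.
inline_arg="\"device=${ids}\""

# Keep the command in an array so the argument reaches docker unchanged.
# my-cuda-image is a hypothetical image name.
cmd=(docker run --gpus "$(gpus_device_arg "$ids")" -it --rm my-cuda-image nvidia-smi)
printf '%s\n' "${cmd[@]}"   # dry run; replace with "${cmd[@]}" to actually run it
```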

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

/lifecycle locked

@docker docker locked and limited conversation to collaborators Mar 19, 2023

7 participants