
How to set up Docker to use multiple GPUs for inference #16

Closed
594zyc opened this issue Mar 26, 2022 · 2 comments


594zyc commented Mar 26, 2022

Hi,

I am trying to set up the evaluation environment on a multi-GPU AWS instance following the instructions in the TEACh Benchmark Challenge. However, I have run into two problems:
(1) The model uses only one GPU, even though I have set API_GPUS to multiple GPUs.
(2) When I start the inference runner with --num_processes X, it launches multiple ai2thor instances, but the processes all end up on one GPU instead of on X GPUs. Also, I have to manually specify --model_api_host_and_port with the same API port repeated (e.g. "@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT" for --num_processes 3), which seems weird (sketched below).
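
For concreteness, my current invocation looks roughly like this (a reconstruction; the runner entry point and the omitted dataset/model arguments are placeholders):

```bash
# Reconstruction of the invocation described in (2); dataset and model
# arguments are elided. Note the same host and port repeated once per
# process for --num_processes 3.
teach_inference \
    --model_api_host_and_port "@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT" \
    --num_processes 3
# (data, output, and model arguments elided)
```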

Besides, I notice that this line mentions that the model container will have access to only one GPU, while this line says the model can use all GPUs of a p3.16xlarge instance. Which is the case, and if multiple GPUs are allowed, how do I set up the Docker container correctly?

Thanks!

hangjieshi (Contributor) commented

Hi,

In the Docker setup, the inference runner uses RemoteModel, and each instance of RemoteModel is mapped to one API container, so InferenceRunner treats it as a single model. To utilize multiple GPUs in Docker testing, you'll need to start multiple API containers (each using one of the GPUs, e.g. --gpus "device=0"), pass a comma-separated list to --model_api_host_and_port, and set --num_processes to the number of API containers. For a single API container, process_index and num_processes default to 1 (code path), so it runs only one process in the container.
In our evaluation script (in EvalAI), we do use all GPUs of our p3.16xlarge EC2 instance. It follows the same setup as above: one inference container and seven API containers, each API container assigned to a unique GPU.
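
Concretely, the setup looks roughly like this (a sketch, not our exact script: the image name, port scheme, and the teach_inference arguments shown are placeholders to adapt to your containers):

```bash
# Sketch of a multi-GPU Docker setup: one API container per GPU.
# --gpus "device=N" is the standard Docker flag for pinning a container
# to a single GPU; the image name and ports here are hypothetical.

NUM_GPUS=3

# Start one API container per GPU, each published on its own host port
# (5000, 5001, 5002, ...).
for i in $(seq 0 $((NUM_GPUS - 1))); do
  docker run -d \
    --gpus "device=${i}" \
    -p "$((5000 + i)):5000" \
    --name "teach-api-${i}" \
    teach-model-api  # hypothetical image name
done

# Point the inference runner at every container with a comma-separated
# host:port list and a matching process count.
teach_inference \
    --model_api_host_and_port "localhost:5000,localhost:5001,localhost:5002" \
    --num_processes "${NUM_GPUS}"
# (data, output, and model arguments elided)
```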
Hope this answers your question.

594zyc (Author) commented Mar 31, 2022

It works. Thanks for your reply, @hangjieshi!

594zyc closed this as completed Mar 31, 2022