Hi,

I am trying to set up the evaluation environment on a multi-GPU AWS instance following the instructions in the TEACh Benchmark Challenge. However, I encounter two problems:

(1) The model uses only 1 GPU even though I have set API_GPUS to multiple GPUs.

(2) When I start the inference runner, it is able to launch multiple ai2thor instances when I specify --num_processes X, but the processes all land on one GPU instead of on X GPUs. Also, I have to manually set --model_api_host_and_port to include multiple API ports (e.g. "@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT,@TeachModelAPI:$API_PORT" for --num_processes 3), which seems odd.

Besides, I notice that one line in the docs mentions that the model container will have access to only one GPU, while another line says that the model can use all GPUs of a p3.16xlarge instance. I wonder which is the case, and if multiple GPUs are allowed, how to correctly set up the Docker container.
Thanks!
In the Docker setup, the inference runner uses RemoteModel. An instance of RemoteModel is mapped to a single API container, so it is treated as one model in InferenceRunner. To utilize multiple GPUs in Docker testing, you need to start multiple API containers (each pinned to one GPU, e.g. --gpus "device=0"), pass a comma-separated list to --model_api_host_and_port, and set --num_processes to the same value as the API container count. For a single API container, process_index and num_processes default to 1 (code path), so it runs only one process in the container.
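The per-GPU container launch described above can be sketched roughly as follows. This is a hedged example, not the official evaluation script: the image name `teach-model-api` and the choice of consecutive ports are assumptions; only the `--gpus "device=N"` Docker syntax and the `--model_api_host_and_port` / `--num_processes` flags come from the setup described here. The `docker run` commands are echoed so you can inspect them before running.

```shell
#!/bin/sh
# Sketch: one API container per GPU, then an inference runner pointed at all of them.
NUM_GPUS=3
API_PORT=5000          # base port; each container gets base + GPU index (assumption)
HOSTS=""

for i in $(seq 0 $((NUM_GPUS - 1))); do
  PORT=$((API_PORT + i))
  # Pin this API container to a single GPU via Docker's --gpus device selector.
  # "teach-model-api" is a hypothetical image name; substitute your own.
  echo docker run -d --gpus "device=$i" -p "$PORT:$PORT" \
    -e API_PORT="$PORT" teach-model-api
  # Build the comma-separated list for --model_api_host_and_port.
  HOSTS="${HOSTS:+$HOSTS,}@TeachModelAPI:$PORT"
done

echo "$HOSTS"
# The inference runner would then be started with (echoed for inspection):
echo teach_inference --model_api_host_and_port "$HOSTS" --num_processes "$NUM_GPUS"
```

With `NUM_GPUS=3` this builds `@TeachModelAPI:5000,@TeachModelAPI:5001,@TeachModelAPI:5002`, matching the manual list you constructed, and keeps the container count and `--num_processes` in sync automatically.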
In our evaluation script (in EvalAI), we do use all GPUs of our p3.16xlarge EC2 instance. It follows the same setup as above: one inference container and seven API containers, with each API container assigned to a unique GPU.
Hope this answers your question.