Description
Related to FastPitch/PyTorch
(https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch)
Describe the bug
We are using FastPitch to generate mel-spectrograms for Thai. Our inference server has two T4 GPUs, and our approach is to serve the model on both GPUs from a single Triton server, as described in the steps below. When we run only one GPU per Triton server, the TTS model generates mel-spectrograms without issue, but once we use two GPUs per Triton server we hit the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
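For context, this is the generic error PyTorch raises whenever an operation mixes tensors that live on different GPUs. A minimal standalone sketch of the same failure mode (an illustration only, not the FastPitch code) would be:

import torch

# a tensor pinned to the first GPU, e.g. created with a hard-coded device
weight = torch.randn(4, 4, device="cuda:0")
# an input placed on the second GPU, as happens for the model instance on gpus: [1]
x = torch.randn(4, 4, device="cuda:1")

y = x @ weight  # RuntimeError: Expected all tensors to be on the same device ...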
To Reproduce
Steps to reproduce the behavior:
- Follow the FastPitch Triton example to generate the Triton model and its config.
- Copy the model folder into the Triton model repository directory (for example, ~/models); the expected repository layout is sketched after this list.
- Edit the Triton model configuration config.pbtxt to use two instance groups:
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  },
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 1 ]
  }
]
- Pull the Triton server 21.05 image and run it with CUDA_VISIBLE_DEVICES=0,1 set:
docker run -it --rm --runtime=nvidia --env CUDA_VISIBLE_DEVICES=0,1 --gpus=all \
  -v ~/models:/models -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  nvcr.io/nvidia/tritonserver:21.05-py3 \
  tritonserver --model-repository=/models --strict-model-config=false
- Modify and run a simple client from the client libraries (HTTP or gRPC) to infer the model; a minimal client sketch is included under "Code that relates to the issue" below.
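For reference, the docker command above mounts ~/models into the container as the Triton model repository, so it is expected to follow the standard repository layout. The model directory name fastpitch and the TorchScript file name model.pt below are assumptions; use whatever names the export step produced:

~/models
└── fastpitch
    ├── config.pbtxt        # the edited configuration above
    └── 1
        └── model.pt        # the exported FastPitch model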
Expected behavior
The runtime error shows up in the Triton client-side log as follows:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Code that relates to the issue
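A minimal Python HTTP client sketch of the kind used in the last reproduction step. The model name fastpitch, the TorchScript-style tensor names INPUT__0/OUTPUT__0, and the input shape/dtype are assumptions; adjust them to the generated config.pbtxt.

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# dummy token IDs standing in for an encoded Thai sentence (shape/dtype assumed)
text_ids = np.random.randint(1, 100, size=(1, 128)).astype(np.int64)

inputs = [httpclient.InferInput("INPUT__0", list(text_ids.shape), "INT64")]
inputs[0].set_data_from_numpy(text_ids)
outputs = [httpclient.InferRequestedOutput("OUTPUT__0")]

# with instance groups on both GPUs, requests routed to the cuda:1 instance fail
result = client.infer(model_name="fastpitch", inputs=inputs, outputs=outputs)
mel = result.as_numpy("OUTPUT__0")
print(mel.shape)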
Environment
- Container version: nvcr.io/nvidia/tritonserver:21.05-py3 (Triton server image)
- GPUs in the system: 2x Tesla T4-16GB
- CUDA driver version: 515.65.01