I am trying to run a transformer model with parallel inference across 4 workers on a machine that has 4 GPUs. The 4 workers are able to load the model, but the issue is that they all end up using the same GPU. This is a snippet of the code used to load the model:
import torch
from mlserver import MLModel
from transformers import XLMRobertaForSequenceClassification

class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Every worker picks the same device here, so they all land on GPU 0
        self.device = "cuda:0" if torch.cuda.is_available() else "cpu"
        # dir_path is the directory containing the saved model
        model = XLMRobertaForSequenceClassification.from_pretrained(dir_path)
        self.model = model.to(self.device)
        return True
Basically, each worker would need to know which GPU it should use when loading the model, but I couldn't find a way to do that from the documentation or the source code. Looking forward to your reply :)
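For illustration, here is a minimal sketch of one possible workaround (my own guess, not something documented by MLServer): since each worker runs as a separate OS process, the device index could be derived from the process PID modulo torch.cuda.device_count(). PIDs are not guaranteed to be consecutive, so this does not guarantee an even spread across the 4 GPUs, but it avoids pinning every worker to cuda:0:

import os
import torch
from mlserver import MLModel
from transformers import XLMRobertaForSequenceClassification

class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        if torch.cuda.is_available():
            # Heuristic: spread worker processes across the visible GPUs by PID.
            # Collisions are possible because PIDs are not consecutive.
            device_index = os.getpid() % torch.cuda.device_count()
            self.device = f"cuda:{device_index}"
        else:
            self.device = "cpu"
        # dir_path is the model directory, as in the snippet above
        model = XLMRobertaForSequenceClassification.from_pretrained(dir_path)
        self.model = model.to(self.device)
        return True

Ideally, though, MLServer would expose a per-worker index (or an environment variable) that the runtime could use to pick a GPU deterministically.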