I am trying to run a transformer model with parallel inference across 4 workers on a machine that has 4 GPUs. The 4 workers are able to load the model, but the issue is that they all end up using the same GPU. This is a snippet of the code used to load the model:
import torch
from mlserver import MLModel
from transformers import XLMRobertaForSequenceClassification

class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Every worker picks the same device here, so they all land on GPU 0
        self.device = "cuda:0" if torch.cuda.is_available() else "cpu"
        # dir_path is the directory containing the saved model
        model = XLMRobertaForSequenceClassification.from_pretrained(dir_path)
        self.model = model.to(self.device)
        return True
Basically, each worker would need to know which GPU it should use when loading the model, but I couldn't find a way to do that from the documentation or the source code. Looking forward to your reply :)
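For illustration, here is a minimal sketch of one possible workaround (my own guess, not something documented by MLServer): since each worker runs as a separate OS process, the device index could be derived from the process PID modulo torch.cuda.device_count(). PIDs are not guaranteed to be consecutive, so this does not guarantee an even spread across the 4 GPUs, but it avoids pinning every worker to cuda:0:

import os
import torch
from mlserver import MLModel
from transformers import XLMRobertaForSequenceClassification

class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        if torch.cuda.is_available():
            # Heuristic: spread worker processes across the visible GPUs by PID.
            # Collisions are possible because PIDs are not consecutive.
            device_index = os.getpid() % torch.cuda.device_count()
            self.device = f"cuda:{device_index}"
        else:
            self.device = "cpu"
        # dir_path is the model directory, as in the snippet above
        model = XLMRobertaForSequenceClassification.from_pretrained(dir_path)
        self.model = model.to(self.device)
        return True

Ideally, though, MLServer would expose a per-worker index (or an environment variable) that the runtime could use to pick a GPU deterministically.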