
Select GPU to be used for each worker on parallel inference #1570

Open
teddy-ambona opened this issue Feb 15, 2024 · 0 comments
teddy-ambona commented Feb 15, 2024

I am trying to run a transformer model with parallel inference across 4 workers on a machine that has 4 GPUs. All 4 workers load the model successfully, but they all end up using the same GPU. This is a snippet of the code used to load the model:

import torch
from mlserver import MLModel
from transformers import XLMRobertaForSequenceClassification

class MyCustomRuntime(MLModel):
    async def load(self) -> bool:
        # Every worker resolves to the same device, so all 4 end up on cuda:0
        self.device = "cuda:0" if torch.cuda.is_available() else "cpu"

        model = XLMRobertaForSequenceClassification.from_pretrained(dir_path)
        self.model = model.to(self.device)

        return True

Basically, each worker would need to know which GPU it should use when loading the model, but I couldn't find a way to do that in either the documentation or the source code. Looking forward to your reply :)
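For what it's worth, since MLServer does not appear to expose a worker index to the runtime, one possible workaround is to derive a device from something that already differs between workers, such as the process PID (assuming each worker runs in its own process). This is only a sketch of that idea, not an MLServer API; `pick_device` is a hypothetical helper:

```python
import os

def pick_device(worker_pid: int, num_gpus: int) -> str:
    """Map a worker's PID onto one of the available GPUs.

    Sibling workers have distinct PIDs, so with num_workers == num_gpus
    this usually spreads the load, though consecutive PIDs are not
    guaranteed, so collisions are possible.
    """
    if num_gpus == 0:
        return "cpu"
    return f"cuda:{worker_pid % num_gpus}"

# Inside load() one could then do something like:
#   self.device = pick_device(os.getpid(), torch.cuda.device_count())
#   self.model = model.to(self.device)
```

A proper fix would presumably need MLServer itself to pass a worker identifier (or set `CUDA_VISIBLE_DEVICES` per worker) so the assignment is deterministic.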
