ClassificationModel: predict() hangs forever in uwsgi worker #761
Comments
Setting `use_multiprocessing` to `False` should fix this.
@ThilinaRajapakse Thank you for your response. As I was indicating in my first message, all tests were already run with:

```python
from simpletransformers.classification import ClassificationModel
# ...
model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
```

The issue was noticed with this code. Isn't that enough to set `use_multiprocessing` to `False`?
Sorry, I missed that you had already turned off multiprocessing. Can you try doing the prediction without going through the `predict()` method? Something like this:

```python
from simpletransformers.classification import ClassificationModel
from transformers import RobertaTokenizer
# ...
model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
tokenizer = RobertaTokenizer.from_pretrained("model")

def prediction_test(text):
    """Simple function for Flask with no bells and whistles"""
    inputs = tokenizer(text, return_tensors="pt")
    # outputs = model(**inputs)  # original line, corrected below
    outputs = model.model(**inputs)
    return outputs
```
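Note that `model.model` is the underlying Hugging Face model, so the forward pass above returns raw logits rather than labels. A minimal post-processing sketch, assuming a standard sequence-classification head (the helper name is illustrative, not part of the snippet above):

```python
import torch

def label_from_outputs(outputs):
    """Turn the raw forward-pass outputs into a predicted label index."""
    logits = outputs[0]                      # shape: (batch_size, num_labels)
    probs = torch.softmax(logits, dim=-1)    # class probabilities
    return int(torch.argmax(probs, dim=-1))  # index of the most likely label
```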
Any updates on this? I'm running into the same issue 👎
@ThilinaRajapakse There is an issue in your snippet:

```python
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
# ...
outputs = model(**inputs)
```

If I run that I get an error, because `ClassificationModel` itself is not callable. I looked at the code of `ClassificationModel` and called the underlying model instead:

```python
from simpletransformers.classification import ClassificationModel
from transformers import RobertaTokenizer
# ...
model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
tokenizer = RobertaTokenizer.from_pretrained("model")

def prediction_test(text):
    """Simple function for Flask with no bells and whistles"""
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.model(**inputs)
    return outputs
```

And it still hangs the same way, on `outputs = model.model(**inputs)`. For now, we've updated the server so it loads the model in each worker (last point of my initial message), which means that the first request handled by a worker after it's spawned is always slower. Is that the recommended approach?
Can you send me a gist of how you preloaded this model in your workers? Are you using a WSGI server like gunicorn, or an ASGI server like uvicorn?
@jmeisele I use uwsgi (WSGI). To delay the model loading into the worker you can use a singleton:

```python
# classifier.py
from simpletransformers.classification import ClassificationModel

model = None

def get_model():
    global model
    if model is None:
        model_args = {"use_multiprocessing": False}
        model = ClassificationModel('roberta', 'model/', args=model_args)
    return model

# get_model()  # If you un-comment this line, the model will be created before the
# workers are spawned. If you leave it commented, it will be created the first
# time `predict` is invoked.

def predict(text):
    cl_model = get_model()
    predictions, raw_outputs = cl_model.predict([text])
    # here goes your handling of the output
    return predictions[0]
```

```python
# Flask application module, importing the classifier above
from flask import Flask
from classifier import predict

app = Flask(__name__)

@app.route('/prediction/<text>', methods=['GET'])
def predict_get(text):
    v = predict(text)
    return str(v)
```

But I am still unsure if this is the proper way to load and use the model.
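A possible refinement of the lazy-loading approach above (a sketch, not something suggested in this thread): uwsgi exposes a `postfork` hook through its `uwsgidecorators` module, so each worker can load its own copy of the model right after it is forked, instead of on its first request. This assumes the singleton module above is named `classifier`, as implied by the import in the Flask snippet.

```python
# Runs only under uwsgi, where the uwsgidecorators module is available.
from uwsgidecorators import postfork

from classifier import get_model  # the singleton module shown above

@postfork
def warm_model_in_worker():
    # Load the model in each freshly forked worker so the first request
    # served by that worker is not delayed by model loading.
    get_model()
```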
Appreciate it, this gives me a couple of ideas I can run with. Thanks again 🤝
I'm not sure what's causing this issue, so I'm afraid I don't really have any useful advice. Could it be something to do with the PyTorch dataloaders using multithreading?
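A hedged diagnostic sketch for that hypothesis (not from the thread): force single-threaded torch and in-process data loading to rule out thread or worker interactions after the fork. `torch.set_num_threads` is a standard PyTorch call; `dataloader_num_workers` is assumed to be available as a simpletransformers model arg in recent versions.

```python
import torch
from simpletransformers.classification import ClassificationModel

torch.set_num_threads(1)  # avoid intra-op thread pools carried over from the parent process

model = ClassificationModel(
    'roberta', 'model/', use_cuda=False,
    args={"use_multiprocessing": False, "dataloader_num_workers": 0},
)
```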
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
We are facing this issue today as well, with FastAPI and Gunicorn. When we run the predict function it takes a really long time. If we run the same code within a development Flask server, it responds quickly.
We bypassed it by setting `use_multiprocessing=False`. @ThilinaRajapakse Would the predictions speed up with multiprocessing? If so, would you have any thoughts on how we could get it to work with async workers?
The predictions will only speed up with multiprocessing if you call the predict() method with a large number of sentences at once. In that case, the parallelization of the tokenization can speed up the overall prediction time. In a typical server/production scenario, you'd likely be sending a single sentence at a time to the predict function, so there would be no speedup from using multiprocessing. Overall, I would recommend keeping multiprocessing turned off when running the model on a production server.
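As a concrete illustration of this recommendation, a short sketch (the model path and argument name are taken from earlier snippets in this thread; the input sentence is a placeholder):

```python
from simpletransformers.classification import ClassificationModel

# Production serving: one sentence per request, multiprocessing off.
model = ClassificationModel(
    'roberta', 'model/', use_cuda=False,
    args={"use_multiprocessing": False},
)
predictions, raw_outputs = model.predict(["a single incoming sentence"])

# Offline batch scoring is the only case where parallel tokenization may help;
# there, the model could instead be created with {"use_multiprocessing": True}
# and predict() called with a large list of sentences at once.
```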
Hi @ThilinaRajapakse, thank you for the help! I can confirm from testing yesterday that `use_multiprocessing=True` was slower for our use case of getting predictions for one sentence at a time.
Hi, I was facing the same issue while serving a Transformer model from Flask for inference.
Did the above solution work for you? I tried implementing the fix above, but it also gets stuck at the `predict` call.
Same problem here. What is the best way to handle this?
I had the same problem and have now solved it. My args dict is like below.
@sukrubezen, this also fixed it for me! (`predict` was getting stuck on input lists that were too large.)
@sukrubezen Thanks for the solution, it worked for me as well.
Describe the bug

When `model.predict()` is invoked in a uwsgi worker, it never resolves (it hangs on the line `outputs = model(**inputs)`).

To Reproduce

Steps to reproduce the behavior:

1. Create a uwsgi + Flask server that loads the model with `{"use_multiprocessing": False}` before spawning the workers, and then runs `model.predict()` when it receives a request (I used the docker image tiangolo/uwsgi-nginx-flask as a base, and installed transformers, pytorch and simpletransformers). A sketch of this setup is shown after these steps.
2. Send a request to the server: the prediction never returns, and the worker hangs on `outputs = model(**inputs)`.
3. If `model.predict()` is called on the same server before the uwsgi workers are spawned (when the server loads, as opposed to when responding to a request), it returns normally with the expected result.
4. The only way we found to get `predict()` to return normally is to load the model inside each worker, meaning the first request handled by each worker is delayed by the loading of the model.
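A hypothetical sketch of the failing setup from step 1 (route and variable names are illustrative): the model is created at import time, i.e. in the uwsgi master process before the workers are forked, and `predict()` is called inside a request handler.

```python
from flask import Flask
from simpletransformers.classification import ClassificationModel

app = Flask(__name__)

# Loaded once at import time, before uwsgi forks its workers.
model = ClassificationModel(
    'roberta', 'model/', use_cuda=False,
    args={"use_multiprocessing": False},
)

@app.route('/prediction/<text>', methods=['GET'])
def predict_get(text):
    # With this configuration, the call below hangs when handled by a worker.
    predictions, raw_outputs = model.predict([text])
    return str(predictions[0])
```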