ClassificationModel: predict() hangs forever in uwsgi worker #761

Closed
AdrienDS opened this issue Oct 12, 2020 · 20 comments
Labels
stale (This issue has become stale)

Comments


AdrienDS commented Oct 12, 2020

Describe the bug

When model.predict() is invoked in a uwsgi worker, it never returns (it hangs on the line outputs = model(**inputs)).

To Reproduce
Steps to reproduce the behavior:

  • Train a roberta-base model with simpletransformers 0.48.9
  • Run a uwsgi + flask server that loads the model with {"use_multiprocessing": False} before spawning workers, and then runs model.predict() when it receives a request (I used the docker image tiangolo/uwsgi-nginx-flask as a base and installed transformers, pytorch and simpletransformers). A minimal sketch of this setup follows the list.
  • Send a request: it hangs on the line outputs = model(**inputs).
  • However, if model.predict() is called on the same server before the uwsgi workers are spawned (when the server loads, as opposed to when responding to a request), it returns normally with the expected result.
  • Another way for predict() to return normally is to load the model inside each worker, meaning the first request handled by each worker is delayed by the loading of the model.
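
For reference, a minimal sketch of this setup (the file name and route are illustrative, not our exact code):

# app.py -- the model is loaded at import time, i.e. in the uwsgi master process,
# before the workers are forked
from flask import Flask
from simpletransformers.classification import ClassificationModel

app = Flask(__name__)

model = ClassificationModel(
    'roberta', 'model/', use_cuda=False,
    args={"use_multiprocessing": False},
)

@app.route('/prediction/<text>', methods=['GET'])
def predict_get(text):
    # Executed inside a forked worker: this is the call that hangs
    predictions, raw_outputs = model.predict([text])
    return str(predictions[0])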

Desktop (please complete the following information):

  • Docker image with Debian Buster + python 3.8 + flask + nginx + uwsgi
  • transformers version 3.3.1
  • simpletransformers version 0.48.9
  • torch version 1.6.0
  • uwsgi: tested with versions 2.0.17, 2.0.18, 2.0.19, 2.0.19.1
@ThilinaRajapakse (Owner)

Setting use_multiprocessing=False should fix it.


AdrienDS commented Oct 12, 2020

@ThilinaRajapakse Thank you for your response. As I indicated in my first message, all tests were already run with:

from simpletransformers.classification import ClassificationModel

# ...

model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)

The issue occurred with this code. Isn't that enough to set use_multiprocessing=False, or should it be set elsewhere?


ThilinaRajapakse commented Oct 22, 2020

Sorry, I missed that you had already turned off multiprocessing. Can you try doing the prediction without going through the predict() function?

Something like this.

from simpletransformers.classification import ClassificationModel
from transformers import RobertaTokenizer

# ...

model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
tokenizer = RobertaTokenizer.from_pretrained("model")


def prediction_test(text):
    """Simple function for Flask with no bells and whistles"""

    inputs = tokenizer(text, return_tensors="pt")
    # outputs = model(**inputs)  # original suggestion (corrected below)
    outputs = model.model(**inputs)  # call the underlying Hugging Face model directly

    return outputs
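
If you want to sanity-check the raw output, something roughly like this should work (with transformers 3.x the model call should return a tuple whose first element is the logits; adjust if you use return_dict):

import torch

with torch.no_grad():
    outputs = model.model(**inputs)

logits = outputs[0]                                  # raw scores, shape (1, num_labels)
predicted_label = int(torch.argmax(logits, dim=-1))  # index of the highest-scoring label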

@jmeisele

Any updates on this? I'm running into the same issue 👎


AdrienDS commented Oct 29, 2020

@ThilinaRajapakse There is an issue in your snippet:

model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)

# ...

outputs = model(**inputs)

If I run that I get TypeError: 'ClassificationModel' object is not callable.

I looked at the code of ClassificationModel.predict and it calls self.model(**inputs), so I instead ran outputs = model.model(**inputs):

from simpletransformers.classification import ClassificationModel
from transformers import RobertaTokenizer

# ...

model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
tokenizer = RobertaTokenizer.from_pretrained("model")


def prediction_test(text):
    """Simple function for Flask with no bells and whistles"""

    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.model(**inputs)

    return outputs

And it still hangs the same way on model.model(**inputs) when the model is loaded before the workers are spawned, and prediction_test is called from a worker.


For now, we've updated the server so it loads the model in each worker (the last point of my initial message), which means that the first request handled by a worker after it's spawned is always slower. Is that the recommended approach?


jmeisele commented Nov 2, 2020

@AdrienDS

For now, we've updated the server so it loads the model in each worker (the last point of my initial message), which means that the first request handled by a worker after it's spawned is always slower. Is that the recommended approach?

Can you send me a gist of how you preloaded this model in your workers? Are you using a WSGI server like gunicorn or an ASGI server like uvicorn?


AdrienDS commented Nov 2, 2020

@jmeisele I use uwsgi (wsgi).

To delay the model loading into the worker you can use a singleton:

  • classifier.py (with a very basic lazy singleton):
from simpletransformers.classification import ClassificationModel

model = None

def get_model():
    global model
    if model is None:
        model_args = {"use_multiprocessing": False}
        model = ClassificationModel('roberta', 'model/', args=model_args)
    return model

# get_model()  # If you un-comment this line, the model will be created before the workers are spawned. If you leave it commented, it will be created the first time `predict` is invoked

def predict(text):
    cl_model = get_model()
    predictions, raw_outputs = cl_model.predict([text])
    # here goes your handling of the output, e.g. return the predicted label
    return predictions[0]
  • In my main.py file, referenced in uwsgi.ini:
from flask import Flask
from classifier import predict 

app = Flask(__name__)

@app.route('/prediction/<text>', methods=['GET'])
def predict_get(text):
    v = predict(text)
    return str(v)

But I am still unsure if this is the proper way to load and use the model.
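
An alternative that should have the same effect (untested on our side) is uwsgi's lazy-apps option, which makes each worker load the application, and therefore the model, itself instead of inheriting it from the forked master. A hypothetical uwsgi.ini excerpt:

; uwsgi.ini (illustrative excerpt, not our actual config)
[uwsgi]
module = main:app
processes = 4
lazy-apps = true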


jmeisele commented Nov 2, 2020

Appreciate it, this gives me a couple of ideas I can run with. Thanks again 🤝

@ThilinaRajapakse (Owner)

I'm not sure what's causing this issue, so I'm afraid I don't really have any useful advice. Could it be something to do with the PyTorch dataloaders using multithreading?
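
If it does turn out to be related to PyTorch's internal thread pools not surviving the fork, one thing that might be worth trying (a guess, not something I have verified for this case) is limiting the intra-op threads before the workers are forked:

import torch

# Limit PyTorch's internal thread pool; forked workers sometimes hang when
# they inherit a thread pool that was created in the parent process.
torch.set_num_threads(1)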


stale bot commented Jan 8, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label on Jan 8, 2021.
The stale bot closed this issue as completed on Jan 16, 2021.
@hirenumradia

We are facing this issue today as well with FastAPI and Gunicorn. When we run the predict function it takes a really long time. If we run the same code in a development Flask server, it responds quickly.

@hirenumradia

We bypassed it with use_multiprocessing=False. @ThilinaRajapakse Would the predictions speed up with multiprocessing? If so, do you have any thoughts on how we could get it to work with async workers?

@ThilinaRajapakse (Owner)

The predictions will only speed up with multiprocessing if you call the predict() method with a large number of sentences at once. In such a case, the parallelization of the tokenization can speed up the overall prediction time. In a typical server/production scenario, you'd likely be sending a single sentence at a time to the predict function and there will be no speedup from using multiprocessing.

Overall, I would recommend keeping multiprocessing turned off when running the model on a production server.
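
For illustration, using the model object from the earlier snippets (the sentences here are just placeholders):

# Large batch: parallel tokenization (use_multiprocessing=True) may give a speedup
texts = ["sentence 1", "sentence 2", "sentence 3"]  # imagine thousands of these
predictions, raw_outputs = model.predict(texts)

# Typical server request: one sentence at a time, where multiprocessing only adds overhead
predictions, raw_outputs = model.predict(["a single sentence"])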

@hirenumradia

Hi @ThilinaRajapakse, thank you for the help! I can confirm from testing yesterday that use_multiprocessing=True was slower for our use case of getting predictions for one sentence at a time.


anirindg commented Jun 8, 2021

@jmeisele I use uwsgi (wsgi). To delay the model loading into the worker you can use a singleton: [...]

Hi, I was facing the same issue while serving a Transformer model from Flask for inference.
I tried the solution you suggested above, but my program still gets stuck at the predict call.
I use gunicorn to run the server.
Any idea why?


anirindg commented Jun 8, 2021

Appreciate it, this gives me a couple of ideas I can run with. Thanks again

Did the above solution work for you? I tried implementing the fix, but it also gets stuck at the predict call.

@siqiniao

Same problem here. What is the best way to fix it?

@sukrubezen

I had the same problem and have now solved it.

My args dict is as below:

args={"use_multiprocessing": False, "use_multiprocessing_for_evaluation": False, "process_count": 1}


skullyhoofd commented Aug 9, 2022

@sukrubezen, this also fixed it for me! (predict was getting stuck on input lists that were too large.)
Maybe this should somehow be made the default for xlm-roberta based models?

@irdanish11

@sukrubezen Thanks for the solution it worked for me as well.
