ClassificationModel: predict() hangs forever in uwsgi worker #761

Closed
AdrienDS opened this issue Oct 12, 2020 · 20 comments
Labels
stale (This issue has become stale)

Comments


AdrienDS commented Oct 12, 2020

Describe the bug

When model.predict() is invoked in a uwsgi worker, it never returns (it hangs on the line outputs = model(**inputs)).

To Reproduce
Steps to reproduce the behavior:

  • Train a roberta-base model with simpletransformers 0.48.9
  • Run a uwsgi + flask server that loads the model with {"use_multiprocessing": False} before spawning workers, and then runs model.predict() when it receives a request (I used the docker image tiangolo/uwsgi-nginx-flask as a base and installed transformers, pytorch and simpletransformers). A minimal sketch of this setup follows the list.
  • Send a request: it hangs on the line outputs = model(**inputs).
  • However, if model.predict() is called on the same server before the uwsgi workers are spawned (when the server loads, as opposed to when responding to a request), it returns normally with the expected result.
  • Another way for predict() to return normally is to load the model inside each worker, meaning the first request handled by each worker is delayed by the loading of the model.
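
For reference, a minimal sketch of this setup (the file name and route are illustrative, not our exact code):

# app.py -- the model is loaded at import time, i.e. in the uwsgi master process,
# before the workers are forked
from flask import Flask
from simpletransformers.classification import ClassificationModel

app = Flask(__name__)

model = ClassificationModel(
    'roberta', 'model/', use_cuda=False,
    args={"use_multiprocessing": False},
)

@app.route('/prediction/<text>', methods=['GET'])
def predict_get(text):
    # Executed inside a forked worker: this is the call that hangs
    predictions, raw_outputs = model.predict([text])
    return str(predictions[0])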

Desktop (please complete the following information):

  • Docker image with Debian Buster + python 3.8 + flask + nginx + uwsgi
  • transformers version 3.3.1
  • simpletransformers version 0.48.9
  • torch version 1.6.0
  • uwsgi: tested with versions 2.0.17, 2.0.18, 2.0.19, 2.0.19.1
@ThilinaRajapakse (Owner)

Setting use_multiprocessing=False should fix it.


AdrienDS commented Oct 12, 2020

@ThilinaRajapakse Thank you for your response. As I indicated in my first message, all tests were already run with:

from simpletransformers.classification import ClassificationModel

# ...

model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)

The issue occurred with this code. Isn't that enough to set use_multiprocessing=False, or should it be set elsewhere?


ThilinaRajapakse commented Oct 22, 2020

Sorry, I missed that you had already turned off multiprocessing. Can you try doing the prediction without going through the predict() function?

Something like this.

from simpletransformers.classification import ClassificationModel
from transformers import RobertaTokenizer

# ...

model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
tokenizer = RobertaTokenizer.from_pretrained("model")


def prediction_test(text):
    """Simple function for Flask with no bells and whistles"""

    inputs = tokenizer(text, return_tensors="pt")
    # outputs = model(**inputs)  # original suggestion (corrected below)
    outputs = model.model(**inputs)  # call the underlying Hugging Face model directly

    return outputs
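
If you want to sanity-check the raw output, something roughly like this should work (with transformers 3.x the model call should return a tuple whose first element is the logits; adjust if you use return_dict):

import torch

with torch.no_grad():
    outputs = model.model(**inputs)

logits = outputs[0]                                  # raw scores, shape (1, num_labels)
predicted_label = int(torch.argmax(logits, dim=-1))  # index of the highest-scoring label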

@jmeisele

Any updates on this? I'm running into the same issue 👎


AdrienDS commented Oct 29, 2020

@ThilinaRajapakse There is an issue in your snippet:

model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)

# ...

outputs = model(**inputs)

If I run that I get TypeError: 'ClassificationModel' object is not callable.

I looked at the code of ClassificationModel.predict and it calls self.model(**inputs), so I instead ran outputs = model.model(**inputs):

from simpletransformers.classification import ClassificationModel
from transformers import RobertaTokenizer

# ...

model_args = {"use_multiprocessing": False}
model = ClassificationModel('roberta', 'model/', use_cuda=False, num_labels=n, args=model_args)
tokenizer = RobertaTokenizer.from_pretrained("model")


def prediction_test(text):
    """Simple function for Flask with no bells and whistles"""

    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.model(**inputs)

    return outputs

And it still hangs the same way on model.model(**inputs) when the model is loaded before the workers are spawned, and prediction_test is called from a worker.


For now, we've updated the server so it loads the model in each worker (the last point of my initial message), which means that the first request handled by a worker after it's spawned is always slower. Is that the recommended approach?


jmeisele commented Nov 2, 2020

@AdrienDS

For now, we've updated the server so it loads the model in each worker (the last point of my initial message), which means that the first request handled by a worker after it's spawned is always slower. Is that the recommended approach?

Can you send me a gist of how you preloaded this model in your workers? Are you using a WSGI server like gunicorn or an ASGI server like uvicorn?


AdrienDS commented Nov 2, 2020

@jmeisele I use uwsgi (wsgi).

To delay the model loading into the worker you can use a singleton:

  • classifier.py (with a very basic lazy singleton):
from simpletransformers.classification import ClassificationModel

model = None

def get_model():
    global model
    if model is None:
        model_args = {"use_multiprocessing": False}
        model = ClassificationModel('roberta', 'model/', args=model_args)
    return model

# get_model()  # If you un-comment this line, the model will be created before the workers are spawned. If you leave it commented, it will be created the first time `predict` is invoked

def predict(text):
    cl_model = get_model()
    predictions, raw_outputs = cl_model.predict([text])
    # here goes your handling of the output, e.g. return the predicted label
    return predictions[0]
  • In my main.py file, referenced in uwsgi.ini:
from flask import Flask
from classifier import predict 

app = Flask(__name__)

@app.route('/prediction/<text>', methods=['GET'])
def predict_get(text):
    v = predict(text)
    return str(v)

But I am still unsure if this is the proper way to load and use the model.
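
An alternative that should have the same effect (untested on our side) is uwsgi's lazy-apps option, which makes each worker load the application, and therefore the model, itself instead of inheriting it from the forked master. A hypothetical uwsgi.ini excerpt:

; uwsgi.ini (illustrative excerpt, not our actual config)
[uwsgi]
module = main:app
processes = 4
lazy-apps = true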


jmeisele commented Nov 2, 2020

Appreciate it, this gives me a couple of ideas I can run with. Thanks again 🤝

@ThilinaRajapakse (Owner)

I'm not sure what's causing this issue, so I'm afraid I don't really have any useful advice. Could it be something to do with the PyTorch dataloaders using multithreading?
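
If it does turn out to be related to PyTorch's internal thread pools not surviving the fork, one thing that might be worth trying (a guess, not something I have verified for this case) is limiting the intra-op threads before the workers are forked:

import torch

# Limit PyTorch's internal thread pool; forked workers sometimes hang when
# they inherit a thread pool that was created in the parent process.
torch.set_num_threads(1)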


stale bot commented Jan 8, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the stale label on Jan 8, 2021.
The stale bot closed this issue as completed on Jan 16, 2021.
@hirenumradia

We are facing this issue today as well with FastAPI and Gunicorn. When we run the predict function it takes a really long time. If we run the same code in a development Flask server, it responds quickly.

@hirenumradia

We bypassed it with use_multiprocessing=False. @ThilinaRajapakse Would the predictions speed up with multiprocessing? If so, do you have any thoughts on how we could get it to work with async workers?

@ThilinaRajapakse (Owner)

The predictions will only speed up with multiprocessing if you call the predict() method with a large number of sentences at once. In such a case, the parallelization of the tokenization can speed up the overall prediction time. In a typical server/production scenario, you'd likely be sending a single sentence at a time to the predict function and there will be no speedup from using multiprocessing.

Overall, I would recommend keeping multiprocessing turned off when running the model on a production server.
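
For illustration, using the model object from the earlier snippets (the sentences here are just placeholders):

# Large batch: parallel tokenization (use_multiprocessing=True) may give a speedup
texts = ["sentence 1", "sentence 2", "sentence 3"]  # imagine thousands of these
predictions, raw_outputs = model.predict(texts)

# Typical server request: one sentence at a time, where multiprocessing only adds overhead
predictions, raw_outputs = model.predict(["a single sentence"])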

@hirenumradia

Hi @ThilinaRajapakse, thank you for the help! I can confirm from testing yesterday that use_multiprocessing=True was slower for our use case of getting predictions for one sentence at a time.


anirindg commented Jun 8, 2021

@jmeisele I use uwsgi (wsgi). To delay the model loading into the worker you can use a singleton: [...]

Hi, I was facing the same issue while serving a Transformer model from Flask for inference.
I tried the solution you suggested above, but my program still gets stuck at the predict call.
I use gunicorn to run the server.
Any idea why?


anirindg commented Jun 8, 2021

Appreciate it, this gives me a couple of ideas I can run with. Thanks again

Did the above solution work for you? I tried implementing the fix, but it also gets stuck at the predict call.

@siqiniao

Same problem here. What is the best way to fix it?

@sukrubezen

I had the same problem and have now solved it.

My args dict is as below:

args={"use_multiprocessing": False, "use_multiprocessing_for_evaluation": False, "process_count": 1}


skullyhoofd commented Aug 9, 2022

@sukrubezen, this also fixed it for me! (predict was getting stuck on input lists that were too large.)
Maybe this should somehow be made the default for xlm-roberta based models?

@irdanish11

@sukrubezen Thanks for the solution it worked for me as well.
