Use Gunicorn with Flask to serve a Pytorch Model #2157

kalyangvs · 2019-11-09T04:38:54Z

Gunicorn is started with: gunicorn app:app --preload --workers 3
--preload is used to share resources among the workers.
OMP_NUM_THREADS is set to 2.

app.py contains the following code

from flask import Flask, jsonify
import torch  
from create_model import test  # the class must be importable so torch.load can unpickle the model
app = Flask(__name__) 
model = torch.load('model.pt')

@app.route('/predict', methods=['POST', 'GET'])
def prediction():
    constant_input = torch.randn(20, 16, 50, 100)
    prediction = model(constant_input)
    # A tensor is not JSON serializable; convert it to nested lists first.
    return jsonify(prediction.detach().tolist())

model.pt is created using create_model.py containing

import torch
import torch.nn as nn
import torch.nn.functional as F

class test(nn.Module):
    def __init__(self):
        super(test, self).__init__()
        self.conv1 = nn.Conv2d(16, 33, 3, stride=2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return x

m = test()
input = torch.randn(20, 16, 50, 100)
print(m(input))
torch.save(m, 'model.pt')

But I am not able to run inference with this setup.
If I replace the torch model with some numpy operation and just return its output, it works.

However, with gunicorn app:app --preload --workers 3 --threads 2 inference works. Can anyone please tell me why the behavior differs only when threads are used?
It also works fine with -k gthread instead of --threads.
Thanks.

benoitc · 2019-11-10T20:03:56Z

What do you mean by "But I am not able to infer it."? Does it fail?

kalyangvs · 2019-11-11T03:50:05Z

Yes it fails.
But works in the other two cases.

benoitc · 2019-11-11T06:10:48Z

Do you have a trace of the failure? What does it raise?

kalyangvs · 2019-11-11T07:37:29Z

The worker times out. I tried various high timeout values; in the working cases, such as threads=2, it works!

tilgovi · 2019-11-12T00:19:35Z

I tried with various high timeout values and with working cases as in threads=2 it works!

Does this mean that the issue can be closed?

kalyangvs · 2019-11-12T04:59:23Z

Can you please explain why this does not work without threads?
Or else can you please elaborate on why gthread works?

The worker gthread is a threaded worker. It accepts connections in the main loop; accepted connections are added to the thread pool as a connection job. On keepalive, connections are put back in the loop waiting for an event. If no event happens after the keepalive timeout, the connection is closed.
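
The pattern described in that paragraph can be illustrated with a small standard-library sketch. This is not Gunicorn's implementation, only a toy model of "main loop accepts, thread pool handles"; the names handle and serve and the fake connection ids are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def handle(conn_id):
    """Hypothetical request handler: pretend to do 50 ms of work."""
    time.sleep(0.05)
    return conn_id, threading.current_thread().name


def serve(connections, threads=2):
    """Main loop: 'accept' each connection and queue it as a job on the pool."""
    with ThreadPoolExecutor(max_workers=threads, thread_name_prefix="pool") as pool:
        futures = [pool.submit(handle, c) for c in connections]
        # Requests run on the pool threads, never on the main thread.
        return [f.result() for f in futures]


results = serve(["conn-1", "conn-2", "conn-3"], threads=2)
for conn_id, thread_name in results:
    print(conn_id, "handled on", thread_name)
```

Each connection is reported as handled on one of the pool threads, while the main thread only dispatches jobs and collects results.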

tilgovi · 2019-11-12T19:11:57Z

The default, synchronous worker will be killed if it does not generate a response within the timeout (default 30s). The threaded worker can signal liveness on a separate thread, so it generally does not time out (unless there is a bug in the interpreter, or unsafe C code that leads to a deadlock, or something like this).
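
For reference, the options discussed in this thread can also be written as a Gunicorn config file. The sketch below assumes a file named gunicorn.conf.py passed with gunicorn -c gunicorn.conf.py app:app; the values simply mirror the command lines quoted above and are not a recommendation. Note that, according to the Gunicorn documentation, asking for more than one thread with the sync worker class makes Gunicorn use the gthread worker instead, which would explain why --threads 2 and -k gthread behave the same here.

```python
# gunicorn.conf.py (assumed file name; values mirror the commands in this thread)

workers = 3               # same as --workers 3
worker_class = "gthread"  # threaded worker, same as -k gthread
threads = 2               # same as --threads 2
timeout = 30              # seconds before a silent worker is killed (the default)
preload_app = True        # same as --preload
```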

How long do your requests take?

kalyangvs · 2019-11-13T03:51:09Z

It depends on the number of words in a sentence. But for the same sentence, which otherwise takes around 250 ms to serve, the default setting with sync workers timed out.

kalyangvs · 2019-11-20T05:13:45Z

The reload option is not reloading the app. It detects that a change was made, and the debug log shows the workers being restarted, but the changes that triggered the reload are not reflected.

I even tried using on_reload in conf_ini, but that hook is never entered.

tilgovi · 2019-11-21T17:45:08Z

Reload is incompatible with preload.
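
A minimal sketch of keeping the two options mutually exclusive in a Gunicorn config file. reload and preload_app are real Gunicorn settings; the APP_ENV variable and its values are assumptions made for illustration. The likely reason for the incompatibility is that with preload the application is imported once in the master process, so restarted workers are forked from a master that still holds the old code.

```python
import os

# Hypothetical environment switch; the name APP_ENV and its values are assumptions.
DEV_MODE = os.environ.get("APP_ENV", "production") == "development"

reload = DEV_MODE            # restart workers on code changes (development only)
preload_app = not DEV_MODE   # load the app in the master before forking (production)
```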

benoitc · 2019-11-22T13:26:02Z

Closing the issue as it seems answered. Feel free to reopen it if needed.

benoitc assigned kalyangvs Nov 10, 2019

benoitc added the Feedback Requested label Nov 10, 2019

benoitc closed this as completed Nov 22, 2019

cosimo mentioned this issue Dec 3, 2020

Serving ML models with multiple workers linearly adds the RAM's load. tiangolo/fastapi#2425

Closed

This was referenced Dec 17, 2020

Gunicorn preload flag not working with PyTorch library #2478

Closed

Gunicorn preload flag not working with Stanza stanfordnlp/stanza#570

Closed

Gunicorn preload flag not working with PyTorch library pytorch/pytorch#49555

Open
