CRITICAL WORKER TIMEOUT when there is not enough memory, need better warning message #1646

Open
eromoe opened this Issue Nov 22, 2017 · 4 comments

@eromoe

eromoe commented Nov 22, 2017

Hello,

I have a Flask application which holds an article classification API (sklearn model). I found that one instance works well, but I get CRITICAL WORKER TIMEOUT when I start 5 workers.

Then I visualized the memory usage and found that one instance eats up 1.5 GB, and with 5 workers, 5 * 1.5 GB > the machine's total memory. I think this causes the CRITICAL WORKER TIMEOUT.

I think it would be better to warn the user about this instead of showing CRITICAL WORKER TIMEOUT (very unclear, which made me think my app had a memory leak).

Also, the docs at http://docs.gunicorn.org/en/stable/settings.html#worker-processes recommend 2-4 x $(NUM_CORES) workers; that section should also warn about the memory problem ...
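
Something like this sketch is what I have in mind (purely illustrative; the 1.5 GB figure is just my app's measurement, and the /proc/meminfo probe assumes a Linux host):

```python
# gunicorn.conf.py -- illustrative sketch only
import multiprocessing

# Per-worker resident memory, measured for this particular app (example value).
WORKER_RSS_GB = 1.5

def _available_memory_gb():
    """Read MemAvailable from /proc/meminfo (Linux-only assumption)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 * 1024)  # kB -> GB
    return None

_by_cpu = 2 * multiprocessing.cpu_count() + 1   # the docs' usual starting point
_mem_gb = _available_memory_gb()
_by_mem = int(_mem_gb // WORKER_RSS_GB) if _mem_gb else _by_cpu

# Never start more workers than memory can actually hold.
workers = max(1, min(_by_cpu, _by_mem))
```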

@tilgovi


Collaborator

tilgovi commented Nov 23, 2017

Thanks for your report.

I suspect that the memory use causes swapping or otherwise slows down operation such that a timeout does occur. Unless you can show a stack trace or some other evidence that a memory failure, like a memory allocation failure, causes this timeout, I think Gunicorn is doing everything correctly in this case.

The documentation says to "vary this a bit to find the best for your particular application’s work load." If you have some way you think the wording of that warning could be better, please feel free to open a PR.

@freedrone


freedrone commented Oct 23, 2018

Same here,
I spent the last 24 hours trying to figure out the cause of WORKER TIMEOUTs.

Only at the suggestion of a collaborating friend did I realize that the system was consuming ~15 GB for each worker, which easily goes over the total RAM of 32 GB with 3 workers.

The setup is similar: a Flask application serving a TensorFlow model that needs to load huge chunks of data, served with Gunicorn and Nginx on Ubuntu 16.04.

An explanation such as "System Out of Memory" would help a lot. I had seen these messages before because it was taking too long to load the models, but I solved that with threading.

Hope you can figure this out,
Thank you

@benoitc


Owner

benoitc commented Oct 23, 2018

@freedrone Gunicorn has no way to know that the system is out of memory, so returning such an error is not possible.

To prevent the workers from using too much memory, you could limit the number of requests accepted by each worker. You could also design your application so it can be distributed across machines rather than overloading one.
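
For example (a rough sketch; the numbers are placeholders, not a recommendation):

```python
# gunicorn.conf.py -- recycle workers periodically so memory growth
# in a worker cannot accumulate forever.
workers = 3
max_requests = 1000        # restart a worker after roughly this many requests
max_requests_jitter = 100  # stagger restarts so workers don't all recycle at once
timeout = 120              # give slow, memory-heavy requests more headroom
```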

@freedrone


freedrone commented Oct 23, 2018

@benoitc thanks for the information.

The ML model will consume that much RAM, and there is little I can do about it if I want it to work properly. The best I could do was to run it on a daemon thread so I would not get a TIMEOUT because of a script that takes too long. The rest is future work though; it is not ready for the public eye as is. But that is a story for a different time.
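
Roughly, the idea looks like this (a simplified sketch of my approach, not the exact code; load_big_model is a placeholder for the real loader):

```python
import threading
from flask import Flask, jsonify, request

app = Flask(__name__)
model = None  # filled in by the background loader

def _load_model():
    # Hypothetical loader for the heavy model. The point is that the slow
    # load happens off the boot path, so the worker comes up quickly and
    # is not killed by Gunicorn's worker timeout while the model loads.
    global model
    model = load_big_model()  # placeholder, not a real function

threading.Thread(target=_load_model, daemon=True).start()

@app.route("/classify", methods=["POST"])
def classify():
    if model is None:
        return jsonify(error="model still loading"), 503
    text = request.get_json()["text"]
    return jsonify(label=str(model.predict([text])[0]))
```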

At least having this issue on the web helps. It can make people who are getting started aware that this might also be one of the reasons for the TIMEOUT message (or anything of the sort).

Thanks a lot for replying!
