
gunicorn stops working every few months (light load) #2876

Open
BrainAnnex opened this issue Oct 4, 2022 · 5 comments

Comments

@BrainAnnex

I have an HTTPS site with very light load - I'm the only one using it.
Hosted on a Debian server on Google Cloud.

Runs fine for months - then one day it's completely unresponsive. This has happened SEVERAL times. Restarting gunicorn always fixes the issue... but it's clearly an unreliable situation!

No Nginx, no load balancer, nothing else: JUST gunicorn (version 20.1.0) + a Flask site (with an SSL certificate).

Here's how I start gunicorn:

gunicorn --certfile=/etc/letsencrypt/live/MY_DOMAIN.org/fullchain.pem \
    --keyfile=/etc/letsencrypt/live/MY_DOMAIN.org/privkey.pem \
    --worker-class gthread \
    --threads 3 -w 1 \
    --error-logfile gunicorn_error_log.txt \
    -b 0.0.0.0:443 main:app &> app_log.txt &
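For maintainability, the same invocation could also be kept in a config file and launched as `gunicorn -c gunicorn.conf.py main:app` (a sketch using standard gunicorn setting names; the values simply mirror the flags above):

```python
# gunicorn.conf.py -- config-file equivalent of the flags above (sketch)
certfile = "/etc/letsencrypt/live/MY_DOMAIN.org/fullchain.pem"
keyfile = "/etc/letsencrypt/live/MY_DOMAIN.org/privkey.pem"
worker_class = "gthread"   # threaded worker
threads = 3                # threads per worker process
workers = 1                # single worker process
errorlog = "gunicorn_error_log.txt"
bind = "0.0.0.0:443"
```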

There's nothing unusual in the app log. In gunicorn_error_log.txt, the last message before it became unresponsive was:

[2022-09-16 07:50:48 +0000] [32653] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 266, in handle
    req = next(conn.parser)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/parser.py", line 42, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.source_addr, self.req_count)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 180, in __init__
    super().__init__(cfg, unreader, peer_addr)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 192, in parse
    self.get_data(unreader, buf, stop=True)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 183, in get_data
    data = unreader.read()
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
  File "/usr/lib/python3.7/ssl.py", line 1037, in recv
    return self.read(buflen)
  File "/usr/lib/python3.7/ssl.py", line 913, in read
    return self._sslobj.read(len)
OSError: [Errno 0] Error

However, this message is sprinkled throughout the error log file! So, maybe not serious - until it dies! (Or maybe it dies silently from something else?)

Strangely, even though the app is 100% unresponsive, if I do:

ps -e | grep gunicorn

it shows 2 processes, as it normally does.

11354 ?        00:02:58 gunicorn
11357 ?        00:06:43 gunicorn

So, it seems that gunicorn has NOT exited; it has simply become unresponsive...
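Since the worker is still alive but stuck, one way to see *where* it is stuck (a diagnostic sketch, not something from this thread) is to register the stdlib `faulthandler` in each worker via gunicorn's `post_worker_init` server hook, then signal the hung worker:

```python
# gunicorn.conf.py -- diagnostic sketch (post_worker_init is a standard
# gunicorn server hook; everything else below is the Python stdlib)
import faulthandler
import signal

def post_worker_init(worker):
    # After this, `kill -USR1 <worker-pid>` makes the worker dump the
    # Python traceback of every thread to stderr (i.e. into the gunicorn
    # error log) WITHOUT killing it -- showing exactly where it hangs.
    faulthandler.register(signal.SIGUSR1, all_threads=True)
```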

As stated at the beginning, restarting gunicorn fixes the issue, but then a few months later - it happens again! This has happened at least 3-4 times :(

@benoitc
Owner

benoitc commented Oct 14, 2022

What does your Flask application do? Can you kill a worker manually and check whether the worker is restarted?

@BrainAnnex
Author

My Flask application serves web pages... very common usage; nothing "exotic".

I will try to kill a worker manually - and see if it restarts, thanks.

2 weeks ago, I dropped the part --worker-class gthread --threads 3 -w 1 ... and I'm waiting to see whether it can run without problems for some months. So far, so good.

@benoitc
Owner

benoitc commented Oct 15, 2022

@BrainAnnex well, hard to say, since I can't reproduce it myself. Did you try to manually kill a worker with the same configuration? Do you have non-SSL connections landing on gunicorn from time to time? In any case, the coming update has new SSL handling.

@BrainAnnex
Author

BrainAnnex commented Oct 23, 2022

I dropped the part --worker-class gthread --threads 3 -w 1
Now, I start gunicorn with:

gunicorn --certfile=/etc/letsencrypt/live/MY_DOMAIN.org/fullchain.pem \
    --keyfile=/etc/letsencrypt/live/MY_DOMAIN.org/privkey.pem \
    --error-logfile gunicorn_error_log.txt \
    -b 0.0.0.0:443 main:app &> app_log.txt &

Now, while not crashing, it produces mountains of errors in the log file; they all say:

[2022-10-22 23:09:34 +0000] [15259] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 135, in handle
    req = next(parser)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/parser.py", line 42, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.source_addr, self.req_count)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 180, in __init__
    super().__init__(cfg, unreader, peer_addr)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 192, in parse
    self.get_data(unreader, buf, stop=True)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 183, in get_data
    data = unreader.read()
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
  File "/usr/lib/python3.7/ssl.py", line 1037, in recv
    return self.read(buflen)
  File "/usr/lib/python3.7/ssl.py", line 913, in read
    return self._sslobj.read(len)
OSError: [Errno 0] Error

After the last of many such errors, it also said:

[2022-10-23 00:08:46 +0000] [4683] [CRITICAL] WORKER TIMEOUT (pid:15259)
[2022-10-23 00:08:46 +0000] [15259] [INFO] Worker exiting (pid: 15259)
[2022-10-23 00:08:46 +0000] [21995] [INFO] Booting worker with pid: 21995
[2022-10-23 00:58:59 +0000] [4683] [CRITICAL] WORKER TIMEOUT (pid:21995)
[2022-10-23 00:58:59 +0000] [21995] [INFO] Worker exiting (pid: 21995)

The worker presumably restarts: I'm getting logged out of my Flask web app about once a day... sometimes mid-use, and sometimes at times of no apparent use (I'm the only person on that web app).

As usual, I'm seeing two processes:

 4683 ?        00:02:38 gunicorn
22480 ?        00:00:01 gunicorn

Maybe the short running time on the 2nd process is related to the fact that it restarted minutes ago? (And logged me out as a result?)
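As background on those WORKER TIMEOUT lines (an aside, not from this thread): they mean the master killed the worker because it did not report back within gunicorn's timeout window, which defaults to 30 seconds; with the default sync worker, a single stalled request is enough to trip it. The window can be widened in a config file; a sketch with illustrative values:

```python
# gunicorn.conf.py -- sketch; `timeout` and `graceful_timeout` are
# standard gunicorn settings (the values below are illustrative)
timeout = 120           # seconds of worker silence before the master kills it
graceful_timeout = 30   # grace period for a worker to finish during restart
```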

Following your advice, I manually killed a process:

sudo kill -9 22480

and it restarted (logging me out of the Flask web app in the process).

Now the end of the gunicorn error log says:

[2022-10-23 01:11:07 +0000] [4683] [WARNING] Worker with pid 22480 was terminated due to signal 9
[2022-10-23 01:11:07 +0000] [22560] [INFO] Booting worker with pid: 22560

So, it seems that we have:

  1. mountains of errors
  2. about once a day, one of those errors causes a worker process to restart (logging me out of the Flask app as a result)

@sachalchandio

Same issue: starting with gunicorn app:app, it works fine for a day; then one of the actions stops responding while everything else keeps working fine.
