
gunicorn stops working every few months (light load) #2876

Open
BrainAnnex opened this issue Oct 4, 2022 · 5 comments

Comments

@BrainAnnex

I have an HTTPS site with very light load - I'm the only one using it.
Hosted on a Debian server on Google Cloud.

Runs fine for months - then one day it's completely unresponsive. This has happened SEVERAL times. Restarting gunicorn always fixes the issue... but it's clearly an unreliable situation!

No Nginx, no load balancer, nothing else: JUST gunicorn (version 20.1.0) + a Flask site (with an SSL certificate).

Here's how I start gunicorn:

gunicorn --certfile=/etc/letsencrypt/live/MY_DOMAIN.org/fullchain.pem \
    --keyfile=/etc/letsencrypt/live/MY_DOMAIN.org/privkey.pem \
    --worker-class gthread \
    --threads 3 -w 1 \
    --error-logfile gunicorn_error_log.txt \
    -b 0.0.0.0:443 main:app &> app_log.txt &
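For maintainability, the same invocation could also be kept in a config file and launched as `gunicorn -c gunicorn.conf.py main:app` (a sketch using standard gunicorn setting names; the values simply mirror the flags above):

```python
# gunicorn.conf.py -- config-file equivalent of the flags above (sketch)
certfile = "/etc/letsencrypt/live/MY_DOMAIN.org/fullchain.pem"
keyfile = "/etc/letsencrypt/live/MY_DOMAIN.org/privkey.pem"
worker_class = "gthread"   # threaded worker
threads = 3                # threads per worker process
workers = 1                # single worker process
errorlog = "gunicorn_error_log.txt"
bind = "0.0.0.0:443"
```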

There's nothing unusual in the app log. In gunicorn_error_log.txt, the last message before it became unresponsive was:

[2022-09-16 07:50:48 +0000] [32653] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/workers/gthread.py", line 266, in handle
    req = next(conn.parser)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/parser.py", line 42, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.source_addr, self.req_count)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 180, in __init__
    super().__init__(cfg, unreader, peer_addr)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 192, in parse
    self.get_data(unreader, buf, stop=True)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 183, in get_data
    data = unreader.read()
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
  File "/usr/lib/python3.7/ssl.py", line 1037, in recv
    return self.read(buflen)
  File "/usr/lib/python3.7/ssl.py", line 913, in read
    return self._sslobj.read(len)
OSError: [Errno 0] Error

However, this message is sprinkled throughout the error log file! So, maybe not serious - until it dies! (Or maybe it dies silently from something else?)

Strangely, even though the app is 100% unresponsive, if I do:

ps -e | grep gunicorn

it shows 2 processes, as it normally does.

11354 ?        00:02:58 gunicorn
11357 ?        00:06:43 gunicorn

So, it seems that gunicorn has NOT exited; it has simply become unresponsive...
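Since the worker is still alive but stuck, one way to see *where* it is stuck (a diagnostic sketch, not something from this thread) is to register the stdlib `faulthandler` in each worker via gunicorn's `post_worker_init` server hook, then signal the hung worker:

```python
# gunicorn.conf.py -- diagnostic sketch (post_worker_init is a standard
# gunicorn server hook; everything else below is the Python stdlib)
import faulthandler
import signal

def post_worker_init(worker):
    # After this, `kill -USR1 <worker-pid>` makes the worker dump the
    # Python traceback of every thread to stderr (i.e. into the gunicorn
    # error log) WITHOUT killing it -- showing exactly where it hangs.
    faulthandler.register(signal.SIGUSR1, all_threads=True)
```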

As stated at the beginning, restarting gunicorn fixes the issue, but then a few months later - it happens again! This has happened at least 3-4 times :(

@benoitc
Owner

benoitc commented Oct 14, 2022

What does your Flask application do? Can you kill a worker manually and check whether the worker is restarted?

@BrainAnnex
Author

My Flask application serves web pages... very common usage; nothing "exotic".

I will try to kill a worker manually - and see if it restarts, thanks.

2 weeks ago, I dropped the part --worker-class gthread --threads 3 -w 1 ... and I'm waiting to see whether it can run without problems for some months. So far, so good.

@benoitc
Owner

benoitc commented Oct 15, 2022

@BrainAnnex well, hard to say, since I can't reproduce it myself. Did you try to manually kill a worker with the same configuration? Do you have non-SSL connections landing on gunicorn from time to time? In any case, the coming update has new SSL handling.

@BrainAnnex
Author

BrainAnnex commented Oct 23, 2022

I dropped the part --worker-class gthread --threads 3 -w 1
Now, I start gunicorn with:

gunicorn --certfile=/etc/letsencrypt/live/MY_DOMAIN.org/fullchain.pem \
    --keyfile=/etc/letsencrypt/live/MY_DOMAIN.org/privkey.pem \
    --error-logfile gunicorn_error_log.txt \
    -b 0.0.0.0:443 main:app &> app_log.txt &

Now, while not crashing, it produces mountains of errors in the log file; they all say:

[2022-10-22 23:09:34 +0000] [15259] [ERROR] Socket error processing request.
Traceback (most recent call last):
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 135, in handle
    req = next(parser)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/parser.py", line 42, in __next__
    self.mesg = self.mesg_class(self.cfg, self.unreader, self.source_addr, self.req_count)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 180, in __init__
    super().__init__(cfg, unreader, peer_addr)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 54, in __init__
    unused = self.parse(self.unreader)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 192, in parse
    self.get_data(unreader, buf, stop=True)
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/message.py", line 183, in get_data
    data = unreader.read()
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 37, in read
    d = self.chunk()
  File "/brain_annex/venv/lib/python3.7/site-packages/gunicorn/http/unreader.py", line 64, in chunk
    return self.sock.recv(self.mxchunk)
  File "/usr/lib/python3.7/ssl.py", line 1037, in recv
    return self.read(buflen)
  File "/usr/lib/python3.7/ssl.py", line 913, in read
    return self._sslobj.read(len)
OSError: [Errno 0] Error

After the last of many such errors, it also said:

[2022-10-23 00:08:46 +0000] [4683] [CRITICAL] WORKER TIMEOUT (pid:15259)
[2022-10-23 00:08:46 +0000] [15259] [INFO] Worker exiting (pid: 15259)
[2022-10-23 00:08:46 +0000] [21995] [INFO] Booting worker with pid: 21995
[2022-10-23 00:58:59 +0000] [4683] [CRITICAL] WORKER TIMEOUT (pid:21995)
[2022-10-23 00:58:59 +0000] [21995] [INFO] Worker exiting (pid: 21995)

The worker presumably restarts: I'm getting logged out of my Flask web app about once a day... sometimes mid-use, and sometimes at times of no apparent use (I'm the only person on that web app).

As usual, I'm seeing two processes:

 4683 ?        00:02:38 gunicorn
22480 ?        00:00:01 gunicorn

Maybe the short running time on the 2nd process is related to the fact that it restarted minutes ago? (And logged me out as a result?)
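As background on those WORKER TIMEOUT lines (an aside, not from this thread): they mean the master killed the worker because it did not report back within gunicorn's timeout window, which defaults to 30 seconds; with the default sync worker, a single stalled request is enough to trip it. The window can be widened in a config file; a sketch with illustrative values:

```python
# gunicorn.conf.py -- sketch; `timeout` and `graceful_timeout` are
# standard gunicorn settings (the values below are illustrative)
timeout = 120           # seconds of worker silence before the master kills it
graceful_timeout = 30   # grace period for a worker to finish during restart
```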

Following your advice, I manually killed a process:

sudo kill -9 22480

and it restarted (logging me out of the Flask web app in the process).

Now the end of the gunicorn error log says:

[2022-10-23 01:11:07 +0000] [4683] [WARNING] Worker with pid 22480 was terminated due to signal 9
[2022-10-23 01:11:07 +0000] [22560] [INFO] Booting worker with pid: 22560

So, it seems that we have:

  1. mountains of errors
  2. about once a day, one of those errors causes a worker process to restart (logging me out of the Flask app as a result)

@sachalchandio

Same issue: starting with gunicorn app:app, it works fine for a day; then one of the actions stops responding while everything else keeps working fine.
