New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout exception after greenlet.switch with gevent worker on gunicorn newer than 0.15 #880

Closed
w0rse opened this Issue Sep 10, 2014 · 16 comments

Comments

9 participants
@w0rse

w0rse commented Sep 10, 2014

Hello, gunicorn!

We are seeing a lot of the following errors after upgrading gunicorn to the latest version:
2014-09-10 03:19:12 [3367] [ERROR] Socket error processing request.
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/gunicorn/workers/async.py", line 42, in handle
req = six.next(parser)
File "/usr/lib/python2.6/site-packages/gunicorn/http/parser.py", line 39, in next
self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
File "/usr/lib/python2.6/site-packages/gunicorn/http/message.py", line 151, in init
super(Request, self).init(cfg, unreader)
File "/usr/lib/python2.6/site-packages/gunicorn/http/message.py", line 48, in init
unused = self.parse(self.unreader)
File "/usr/lib/python2.6/site-packages/gunicorn/http/message.py", line 164, in parse
self.get_data(unreader, buf, stop=True)
File "/usr/lib/python2.6/site-packages/gunicorn/http/message.py", line 155, in get_data
data = unreader.read()
File "/usr/lib/python2.6/site-packages/gunicorn/http/unreader.py", line 37, in read
d = self.chunk()
File "/usr/lib/python2.6/site-packages/gunicorn/http/unreader.py", line 63, in chunk
return self.sock.recv(self.mxchunk)
File "/usr/lib64/python2.6/site-packages/gevent/socket.py", line 392, in recv
self._wait(self._read_event)
File "/usr/lib64/python2.6/site-packages/gevent/socket.py", line 298, in _wait
self.hub.wait(watcher)
File "/usr/lib64/python2.6/site-packages/gevent/hub.py", line 341, in wait
result = waiter.get()
File "/usr/lib64/python2.6/site-packages/gevent/hub.py", line 568, in get
return self.hub.switch()
File "/usr/lib64/python2.6/site-packages/gevent/hub.py", line 331, in switch
return greenlet.switch(self)
timeout: timed out

Further tests revealed that the same code produces that errors on any version of gunicorn newer than 0.15. Is there anything we can do to fight this behavior? We use gevent workers version 1.0.1 and gunicorn is behind a proxy with keepalive enabled.

@benoitc

This comment has been minimized.

Owner

benoitc commented Sep 22, 2014

do you have a way to reproduce it?

@w0rse

This comment has been minimized.

w0rse commented Sep 22, 2014

Thank you for your reply. I have it on our production servers. Will try to produce a local example of this error.

@benoitc

This comment has been minimized.

Owner

benoitc commented Sep 22, 2014

thanks!

@w0rse

This comment has been minimized.

w0rse commented Sep 30, 2014

Hey @benoitc
I was unable to reproduce this error locally, however I was able to fix it on production. I checked the diff between 0.15 and 0.16 and this one line fixed the problem: https://gist.github.com/w0rse/b45cacf53a9aad71f985

Please tell me is it the correct fix and why it's working :) Because there should've been some reason why this line was removed in 0.16.

@w0rse

This comment has been minimized.

w0rse commented Oct 6, 2014

Any update on this one?

@benoitc

This comment has been minimized.

Owner

benoitc commented Oct 6, 2014

I didn't have time to check it ye sorry :/ I will have a look on it later in the day

@stefanfoulis

This comment has been minimized.

stefanfoulis commented Nov 4, 2014

any update on this? I'm also getting this exception with gunicorn 18.0

@benoitc

This comment has been minimized.

Owner

benoitc commented Nov 4, 2014

@stefanfoulis i wil make sure to test that tomorrow, for the new release coming on friday.

@ochronus

This comment has been minimized.

ochronus commented Feb 6, 2015

@benoitc ping? :) Any updates?

@tilgovi

This comment has been minimized.

Collaborator

tilgovi commented Feb 13, 2015

That might be an appropriate change. It explains the issue but it's worth discussing the solution.

ggevent.py doesn't set a timeout on the socket, but puts it in blocking mode because that's how gevent is meant to act. Reading the next request off the wire takes longer than the socket timeout but less time than the keep-alive timeout, which is the timeout_ctx around this call stack. As a result, a socket timeout bubbles into async.py and it gets logged.

Since the default keep-alive timeout is two seconds and the default worker timeout is thirty seconds (or, at least, A < B) the patch submitted by @w0rse works. The gevent worker aborts the request parsing and returns None and the core of the async worker closes the request.

It's also curious why the default for the socket is not a timeout of None / blocking such that this should never occur. Maybe you have a library that is calling socket.setdefaultimeout().

The worker timeout is probably not the right timeout to set on the client socket for async. It's a worker non-responsive timeout for gunicorn to kill dead processes, not really a request timeout, though it serves this purpose on the sync worker. The patch could be changed to set a different value, but I'm not sure what it should be. Maybe we could introduce one.

@benoitc @berkerpeksag does that all sound right to you?

@berkerpeksag

This comment has been minimized.

Collaborator

berkerpeksag commented Feb 13, 2015

I'm not that familiar with the gevent worker, but SGTM.

@tilgovi

This comment has been minimized.

Collaborator

tilgovi commented Feb 13, 2015

@ochronus you could check what getdefaulttimeout returns in your app.

@pierrei

This comment has been minimized.

pierrei commented Feb 18, 2015

We have this issue in production and been having for a while. The exceptions do not show up until after a day or two, then suddenly they start increasing and after a few days we get about one exception per second.

This graph shows the number of errors we get over time (the drop at the end is a new deploy = all workers are restarted):
gunicorn_errors

@ZsoltFabok

This comment has been minimized.

ZsoltFabok commented Jun 12, 2015

I'm working with @pierrei, and setting keepalive = 2 solved our issue. Originally, it was keepalive = 0, and when we made external calls, gunicorn killed the threads before the mentioned calls had the opportunity to return.

@MrKiven

This comment has been minimized.

MrKiven commented Sep 1, 2016

I have the same problem, but my gevent version is 1.1.2, and gunicorn 19.3.0.

gevent/gevent#853

@benoitc

This comment has been minimized.

Owner

benoitc commented Nov 12, 2017

closed by #1616

@benoitc benoitc closed this Nov 12, 2017

endreymarcell added a commit to prezi/gunicorn that referenced this issue Jan 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment