Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout exception after greenlet.switch with gevent worker on gunicorn newer than 0.15 #880

Closed
w0rse opened this issue Sep 10, 2014 · 16 comments
Closed

Comments

@w0rse
Copy link

@w0rse w0rse commented Sep 10, 2014

Hello, gunicorn!

We are seeing a lot of the following errors after upgrading gunicorn to the latest version:
2014-09-10 03:19:12 [3367] [ERROR] Socket error processing request.
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/gunicorn/workers/async.py", line 42, in handle
req = six.next(parser)
File "/usr/lib/python2.6/site-packages/gunicorn/http/parser.py", line 39, in next
self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
File "/usr/lib/python2.6/site-packages/gunicorn/http/message.py", line 151, in init
super(Request, self).init(cfg, unreader)
File "/usr/lib/python2.6/site-packages/gunicorn/http/message.py", line 48, in init
unused = self.parse(self.unreader)
File "/usr/lib/python2.6/site-packages/gunicorn/http/message.py", line 164, in parse
self.get_data(unreader, buf, stop=True)
File "/usr/lib/python2.6/site-packages/gunicorn/http/message.py", line 155, in get_data
data = unreader.read()
File "/usr/lib/python2.6/site-packages/gunicorn/http/unreader.py", line 37, in read
d = self.chunk()
File "/usr/lib/python2.6/site-packages/gunicorn/http/unreader.py", line 63, in chunk
return self.sock.recv(self.mxchunk)
File "/usr/lib64/python2.6/site-packages/gevent/socket.py", line 392, in recv
self._wait(self._read_event)
File "/usr/lib64/python2.6/site-packages/gevent/socket.py", line 298, in _wait
self.hub.wait(watcher)
File "/usr/lib64/python2.6/site-packages/gevent/hub.py", line 341, in wait
result = waiter.get()
File "/usr/lib64/python2.6/site-packages/gevent/hub.py", line 568, in get
return self.hub.switch()
File "/usr/lib64/python2.6/site-packages/gevent/hub.py", line 331, in switch
return greenlet.switch(self)
timeout: timed out

Further tests revealed that the same code produces that errors on any version of gunicorn newer than 0.15. Is there anything we can do to fight this behavior? We use gevent workers version 1.0.1 and gunicorn is behind a proxy with keepalive enabled.

@benoitc
Copy link
Owner

@benoitc benoitc commented Sep 22, 2014

do you have a way to reproduce it?

@w0rse
Copy link
Author

@w0rse w0rse commented Sep 22, 2014

Thank you for your reply. I have it on our production servers. Will try to produce a local example of this error.

@benoitc
Copy link
Owner

@benoitc benoitc commented Sep 22, 2014

thanks!

@w0rse
Copy link
Author

@w0rse w0rse commented Sep 30, 2014

Hey @benoitc
I was unable to reproduce this error locally, however I was able to fix it on production. I checked the diff between 0.15 and 0.16 and this one line fixed the problem: https://gist.github.com/w0rse/b45cacf53a9aad71f985

Please tell me is it the correct fix and why it's working :) Because there should've been some reason why this line was removed in 0.16.

@w0rse
Copy link
Author

@w0rse w0rse commented Oct 6, 2014

Any update on this one?

@benoitc
Copy link
Owner

@benoitc benoitc commented Oct 6, 2014

I didn't have time to check it ye sorry :/ I will have a look on it later in the day

@stefanfoulis
Copy link

@stefanfoulis stefanfoulis commented Nov 4, 2014

any update on this? I'm also getting this exception with gunicorn 18.0

@benoitc
Copy link
Owner

@benoitc benoitc commented Nov 4, 2014

@stefanfoulis i wil make sure to test that tomorrow, for the new release coming on friday.

@ochronus
Copy link

@ochronus ochronus commented Feb 6, 2015

@benoitc ping? :) Any updates?

@tilgovi
Copy link
Collaborator

@tilgovi tilgovi commented Feb 13, 2015

That might be an appropriate change. It explains the issue but it's worth discussing the solution.

ggevent.py doesn't set a timeout on the socket, but puts it in blocking mode because that's how gevent is meant to act. Reading the next request off the wire takes longer than the socket timeout but less time than the keep-alive timeout, which is the timeout_ctx around this call stack. As a result, a socket timeout bubbles into async.py and it gets logged.

Since the default keep-alive timeout is two seconds and the default worker timeout is thirty seconds (or, at least, A < B) the patch submitted by @w0rse works. The gevent worker aborts the request parsing and returns None and the core of the async worker closes the request.

It's also curious why the default for the socket is not a timeout of None / blocking such that this should never occur. Maybe you have a library that is calling socket.setdefaultimeout().

The worker timeout is probably not the right timeout to set on the client socket for async. It's a worker non-responsive timeout for gunicorn to kill dead processes, not really a request timeout, though it serves this purpose on the sync worker. The patch could be changed to set a different value, but I'm not sure what it should be. Maybe we could introduce one.

@benoitc @berkerpeksag does that all sound right to you?

@berkerpeksag
Copy link
Collaborator

@berkerpeksag berkerpeksag commented Feb 13, 2015

I'm not that familiar with the gevent worker, but SGTM.

@tilgovi
Copy link
Collaborator

@tilgovi tilgovi commented Feb 13, 2015

@ochronus you could check what getdefaulttimeout returns in your app.

@pierrei
Copy link

@pierrei pierrei commented Feb 18, 2015

We have this issue in production and been having for a while. The exceptions do not show up until after a day or two, then suddenly they start increasing and after a few days we get about one exception per second.

This graph shows the number of errors we get over time (the drop at the end is a new deploy = all workers are restarted):
gunicorn_errors

@ZsoltFabok
Copy link

@ZsoltFabok ZsoltFabok commented Jun 12, 2015

I'm working with @pierrei, and setting keepalive = 2 solved our issue. Originally, it was keepalive = 0, and when we made external calls, gunicorn killed the threads before the mentioned calls had the opportunity to return.

@MrKiven
Copy link

@MrKiven MrKiven commented Sep 1, 2016

I have the same problem, but my gevent version is 1.1.2, and gunicorn 19.3.0.

gevent/gevent#853

@benoitc
Copy link
Owner

@benoitc benoitc commented Nov 12, 2017

closed by #1616

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Bugs
Other
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
9 participants