New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force gevent connected socket to blocking mode #1616

Merged
merged 1 commit into from Nov 12, 2017

Conversation

Projects
None yet
3 participants
@jkemp101
Contributor

jkemp101 commented Oct 6, 2017

The gevent worker assumes sockets are running in blocking mode. The current code forces the listening socket into blocking mode here. But the connected socket is not forced to blocking mode. Therefore if other code has set the default socket timeout with a call to setdefaulttimeout it will be set as the connected socket's timeout in the gevent initialization code here.

The gevent worker relies on the keepalive timeout to close idle connections. Setting the default timeout below the keepalive timeout causes the worker to run in a undefined mode. I believe this may fix the issues in #880 but it was not clear if everyone having that issue were also calling setdefaulttimeout. In our particular case we are calling setdefaulttimeout which caused sockets to close before the keepalive expired. This causes a bunch of socket timeout error messages and potentially 504 errors with AWS ELBs.

@jkemp101 jkemp101 closed this Oct 6, 2017

@jkemp101 jkemp101 reopened this Oct 9, 2017

@tilgovi

This comment has been minimized.

Collaborator

tilgovi commented Oct 10, 2017

Thank you for providing such detailed information and a patch.

The patch is simple and it sounds like it would work, but do you think that the assumption Gunicorn makes about the socket being blocking is a bad one? Should we discuss any alternative changes?

@jkemp101

This comment has been minimized.

Contributor

jkemp101 commented Oct 11, 2017

I was also concerned about that but came to the conclusion its probably ok. This gets a bit confusing since we are talking about blocking sockets that in the gevent world aren't necessarily really blocking anything.

My thoughts:

  1. The default value for the timeout puts the connected socket in blocking mode. Therefore the majority of the gevent worker users are already running with blocking sockets. I would guess most people calling setdefaulttimeout don't realize the implications to the gevent worker.
  2. The gevent.Timeout object in timeout_ctx seems to handle "idle" sockets that are waiting to receive data. The keepalive timeout seems appropriate here to close connections. This was important to me because AWS CLBs do a lot of pre-opened connections that may never send a request. I wanted to make sure if the CLB didn't close these connections after a reasonable amount of time they wouldn't stack up in gunicorn. The keepalive timer kicked in and closed these connections even when no request had been set.
  3. I believe setting the default timeout also affects sending data to clients. With blocking sockets the send would theoretically block the greenlet until the underlying TCP timers detected the socket was broken and closed it. I don't think we would want to use the keepalive timeout for this anyway so a new send timeout would probably be needed with an associated refactoring of the send response code. This seems like a pretty unlikely edge case.
@tilgovi

This comment has been minimized.

Collaborator

tilgovi commented Oct 11, 2017

Thanks for all that analysis. I agree with you. It does sound like it would fix #880, too. At least one comment there suggests that changing the socket timeout fixed the problem.

@benoitc benoitc merged commit 61431d4 into benoitc:master Nov 12, 2017

1 of 8 checks passed

couverture-io/py26 Code coverage has worsened
Details
couverture-io/py27 Code coverage has worsened
Details
couverture-io/py34 Code coverage has worsened
Details
couverture-io/py35 Code coverage has worsened
Details
couverture-io/py36 Code coverage has worsened
Details
couverture-io/py36-dev Code coverage has worsened
Details
couverture-io/py37 Code coverage has worsened
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

endreymarcell added a commit to prezi/gunicorn that referenced this pull request Jan 3, 2018

fofanov pushed a commit to fofanov/gunicorn that referenced this pull request Mar 16, 2018

Merge pull request benoitc#1616 from closeio/fix-non-blocking-connect…
…ed-socket

Force gevent connected socket to blocking mode

@jkemp101 jkemp101 deleted the closeio:fix-non-blocking-connected-socket branch Mar 16, 2018

@altaurog altaurog referenced this pull request Mar 18, 2018

Open

gunicorn 20 #1195

1 of 8 tasks complete

sokac added a commit to sokac/thrift that referenced this pull request Apr 27, 2018

Remove python socket timeout
Timeout is already set in HttpClient. The reason for removal is issue
with gunicorn and gevent as described here
benoitc/gunicorn#1616

sokac added a commit to sokac/thrift that referenced this pull request Apr 27, 2018

THRIFT-4561: Remove python socket timeout
Timeout is already set in HttpClient. The reason for removal is issue
with gunicorn and gevent as described here
benoitc/gunicorn#1616

jeking3 added a commit to apache/thrift that referenced this pull request May 7, 2018

THRIFT-4561: Remove python socket timeout
Timeout is already set in HttpClient. The reason for removal is issue
with gunicorn and gevent as described here
benoitc/gunicorn#1616
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment