Set max_accept on gevent worker-class to 1 when workers > 1 #2266
Conversation
We've had really terrible tail latencies with gevent and gunicorn under load. Inspecting our services with strace we see the following:

```
23:11:01.651529 accept4(5, {sa_family=AF_UNIX}, [110->2], SOCK_CLOEXEC) = 223 <0.000015>
..{18 successful calls to accept4}...
23:11:01.652590 accept4(5, {sa_family=AF_UNIX}, [110->2], SOCK_CLOEXEC) = 249 <0.000010>
23:11:01.652647 accept4(5, 0x7ffcd46c09d0, [110], SOCK_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) <0.000012>
23:11:01.657622 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000009>
23:11:01.657682 recvfrom(223, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000011>
..{16 calls to recvfrom}...
23:11:01.740726 recvfrom(243, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000012>
23:11:01.746074 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000013>
23:11:01.746153 recvfrom(246, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000014>
23:11:01.751540 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000010>
23:11:01.751599 recvfrom(249, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000013>
```

Notice we see a flurry of 20 `accept4`s followed by 20 calls to `recvfrom`. Each call to `recvfrom` happens 5ms after the previous, so the last `recvfrom` is called ~100ms after the call to `accept4` for that fd.

gevent suggests setting `max_accept` to a lower value when there are multiple worker processes accepting on the same listening socket: https://github.com/gevent/gevent/blob/785b7b5546fcd0a184ea954f5d358539c530d95f/src/gevent/baseserver.py#L89-L102

gevent sets `max_accept` to `1` when `wsgi.multiprocess` is True: https://github.com/gevent/gevent/blob/9d27d269ed01a7e752966caa7a6f85d773780a1a/src/gevent/pywsgi.py#L1470-L1472

gunicorn does in fact set that flag when the number of workers is > 1: https://github.com/benoitc/gunicorn/blob/e4e20f273e95f505277a8dadf390bbdd162cfff4/gunicorn/http/wsgi.py#L73

and this gets passed to `gevent.pywsgi.WSGIServer`: https://github.com/benoitc/gunicorn/blob/e4e20f273e95f505277a8dadf390bbdd162cfff4/gunicorn/workers/ggevent.py#L67-L75

However, when `worker-class` is `gevent` we directly create a `gevent.server.StreamServer`: https://github.com/benoitc/gunicorn/blob/e4e20f273e95f505277a8dadf390bbdd162cfff4/gunicorn/workers/ggevent.py#L77-L78

Fixing this dropped the p50 response time on an especially problematic benchmark from 250ms to 115ms.
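For illustration, here is a minimal, standalone sketch of the behaviour this PR changes. It is not the actual gunicorn worker code (in gunicorn the `StreamServer` is built inside `GeventWorker.run` around a socket inherited from the arbiter); the `workers` variable and the handler below are hypothetical stand-ins.

```python
# Hypothetical, standalone illustration (not gunicorn's actual worker code).
from gevent.pool import Pool
from gevent.server import StreamServer


def handle(client_sock, addr):
    # Stand-in for gunicorn's per-connection handler (self.handle in ggevent.py).
    client_sock.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    client_sock.close()


workers = 4  # stand-in for gunicorn's cfg.workers; in gunicorn the listening
             # socket is shared by all worker processes.
server = StreamServer(("127.0.0.1", 8000), handle=handle, spawn=Pool(100))

# The change discussed in this PR: with more than one worker process accepting
# on the same listening socket, take at most one connection per wakeup instead
# of draining the backlog, so the kernel can hand the next connections to other
# workers. gevent.pywsgi.WSGIServer already does this when
# environ["wsgi.multiprocess"] is True.
if workers > 1:
    server.max_accept = 1

server.serve_forever()
```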
@jamadden mind taking a look at this as well?
Makes sense to me.
Mmm, why? The server should be accepting as many connections as it wants; it's then limited by the pool, just like the thread worker. What could actually explain the latency if the socket is in a select loop?
Quoting the referenced libuv documentation:
With values above 1, a single worker may accept many connections, preventing them from being accepted by other processes. But because the worker is single-threaded and non-preemptive, it may then only service the connections in a fairly sequential manner. If they'd been accepted by many processes much more parallelism could be exploited.
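To make the quoted trade-off concrete, here is a simplified, hypothetical paraphrase of what a gevent-style accept loop does on each readiness notification (the real logic is in `gevent/baseserver.py`, linked above; the function and names here are illustrative, not gevent's API):

```python
import errno
import socket


def on_readable(listener: socket.socket, max_accept: int, start_handler) -> None:
    # Called each time the event loop reports the listening socket readable.
    # With a large max_accept, one single-threaded worker can drain the whole
    # backlog in a burst; the accepted connections then queue behind each other
    # inside that worker while other idle workers have nothing to accept.
    for _ in range(max_accept):
        try:
            client, addr = listener.accept()
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                return  # backlog drained
            raise
        start_handler(client, addr)  # in gevent this spawns a greenlet
```

With `max_accept` set to 1, each wakeup accepts a single connection and yields back to the event loop, so the kernel can wake a different worker process for the next connection instead of letting one worker hoard the burst, which is the 20-`accept4`-then-20-`recvfrom` pattern in the strace output above.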
Yes, though the system should normally take care of load balancing connections across system threads first. What's surprising is that this is only now appearing on the radar. Was there any recent change that could have introduced that? Maybe at the OS level?
Maybe. On Linux, it's highly dependent on […]. Without […]
We've had this problem for years and have generally run with really poor CPU utilization in response. We've finally tracked down part of the reason.
I'd like to merge this if there are no objections. For heavily loaded systems this is an important fix; for lightly loaded systems I think it's unlikely to have an observable effect.
It'd also be good to backport to 19.x.
We are running gunicorn v20 with […]