Set max_accept on gevent worker-class to 1 when workers > 1 #2266

Merged 1 commit into benoitc:master on Mar 3, 2020

Conversation

@rowillia (Contributor)

We've had really terrible tail latencies with gevent and gunicorn under load.

Inspecting our services with strace we see the following:

```
23:11:01.651529 accept4(5, {sa_family=AF_UNIX}, [110->2], SOCK_CLOEXEC) = 223 <0.000015>
..{18 successful calls to accept4}...
23:11:01.652590 accept4(5, {sa_family=AF_UNIX}, [110->2], SOCK_CLOEXEC) = 249 <0.000010>
23:11:01.652647 accept4(5, 0x7ffcd46c09d0, [110], SOCK_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable) <0.000012>
23:11:01.657622 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000009>
23:11:01.657682 recvfrom(223, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000011>
..{16 calls to recvfrom}...
23:11:01.740726 recvfrom(243, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000012>
23:11:01.746074 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000013>
23:11:01.746153 recvfrom(246, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000014>
23:11:01.751540 getsockname(5, {sa_family=AF_UNIX, sun_path="/run/gunicorn/gunicorn.sock"}, [110->30]) = 0 <0.000010>
23:11:01.751599 recvfrom(249, "XXX"..., 8192, 0, NULL, NULL) = 511 <0.000013>
```

Notice we see a flurry of 20 `accept4` calls followed by 20 calls to `recvfrom`. Each `recvfrom` happens about 5ms after the previous one, so the last `recvfrom` is issued roughly 100ms after the `accept4` for that fd.

gevent suggests setting `max_accept` to a lower value when there are multiple worker processes accepting on the same listening socket: https://github.com/gevent/gevent/blob/785b7b5546fcd0a184ea954f5d358539c530d95f/src/gevent/baseserver.py#L89-L102
gevent sets `max_accept` to `1` when `wsgi.multiprocess` is True: https://github.com/gevent/gevent/blob/9d27d269ed01a7e752966caa7a6f85d773780a1a/src/gevent/pywsgi.py#L1470-L1472
gunicorn does in fact set this when the number of workers is > 1 (https://github.com/benoitc/gunicorn/blob/e4e20f273e95f505277a8dadf390bbdd162cfff4/gunicorn/http/wsgi.py#L73):

```python
"wsgi.multiprocess": (cfg.workers > 1),
```

and this gets passed to `gevent.pywsgi.WSGIServer` (https://github.com/benoitc/gunicorn/blob/e4e20f273e95f505277a8dadf390bbdd162cfff4/gunicorn/workers/ggevent.py#L67-L75):

```python
environ = base_environ(self.cfg)
environ.update({
    "wsgi.multithread": True,
    "SERVER_SOFTWARE": VERSION,
})
server = self.server_class(
    s, application=self.wsgi, spawn=pool, log=self.log,
    handler_class=self.wsgi_handler, environ=environ,
    **ssl_args)
```

However, when `worker-class` is `gevent` we directly create a `gevent.server.StreamServer` (https://github.com/benoitc/gunicorn/blob/e4e20f273e95f505277a8dadf390bbdd162cfff4/gunicorn/workers/ggevent.py#L77-L78):

```python
hfun = partial(self.handle, s)
server = StreamServer(s, handle=hfun, spawn=pool, **ssl_args)
```
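For reference, a minimal sketch of the kind of change this PR makes, mirroring what gevent's `pywsgi` handler does when `wsgi.multiprocess` is truthy (the exact placement inside the worker's `run()` is illustrative, not the literal diff):

```python
hfun = partial(self.handle, s)
server = StreamServer(s, handle=hfun, spawn=pool, **ssl_args)
if self.cfg.workers > 1:
    # Accept at most one connection per loop iteration so the kernel can hand
    # the next pending connection to a different worker process instead of
    # letting this worker drain the whole burst by itself.
    server.max_accept = 1
```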

Fixing this dropped the p50 response time on an especially problematic benchmark from 250ms to 115ms.

@rowillia (Contributor, Author)

@jamadden mind taking a look at this as well?

@jamadden (Collaborator) left a comment


Makes sense to me.

@benoitc (Owner) commented Feb 13, 2020

> Makes sense to me.

mmm? why? Servers should be accepting as much as they want; it's then limited by the pool, just like the thread worker. What could actually explain the latency if the socket is in a select loop?

@benoitc self-requested a review on February 13, 2020 at 10:33
@jamadden (Collaborator)

Quoting the referenced libuv documentation:

> Having simultaneous accepts can significantly improve the rate of accepting connections (which is why it is enabled by default) but may lead to uneven load distribution in multi-process setups.

With values above 1, a single worker may accept many connections, preventing them from being accepted by other processes. But because the worker is single-threaded and non-preemptive, it may then only service the connections in a fairly sequential manner. If they'd been accepted by many processes much more parallelism could be exploited.
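A rough back-of-the-envelope illustration of that effect, using the ~5ms per-request gap visible in the strace above (the four-worker split is a hypothetical comparison, not a measurement from this issue):

```python
# Hypothetical arithmetic: head-of-line delay when one cooperative (greenlet)
# worker accepts a whole burst of connections before servicing any of them.
PER_REQUEST_MS = 5   # approximate gap between recvfrom calls in the strace above
BURST = 20           # connections accepted in one accept4 flurry

# max_accept > 1: a single worker grabs all 20, so the last connection waits
# behind the 19 accepted before it.
last_wait_one_worker = (BURST - 1) * PER_REQUEST_MS          # 95 ms

# max_accept = 1 with, say, 4 workers sharing the socket: each worker ends up
# with ~5 connections, so the worst-case wait shrinks proportionally.
last_wait_four_workers = (BURST // 4 - 1) * PER_REQUEST_MS   # 20 ms

print(last_wait_one_worker, last_wait_four_workers)
```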

@benoitc (Owner) commented Feb 13, 2020

> Quoting the referenced libuv documentation:
>
> > Having simultaneous accepts can significantly improve the rate of accepting connections (which is why it is enabled by default) but may lead to uneven load distribution in multi-process setups.
>
> With values above 1, a single worker may accept many connections, preventing them from being accepted by other processes. But because the worker is single-threaded and non-preemptive, it may then only service the connections in a fairly sequential manner. If they'd been accepted by many processes much more parallelism could be exploited.

Yes, though the system should normally take care of load-balancing connections across system threads first. What's surprising is that this is only appearing on the radar today. Was there any recent change that could have introduced this? Maybe at the OS level?

@jamadden (Collaborator)

> Yes, though the system should normally take care of load-balancing connections across system threads first.

Maybe. On Linux, it's highly dependent on `SO_REUSEPORT`, and gunicorn's default value for `reuse_port` has flip-flopped a few times. For a while it was true, but as of 19.8, the default is false.

Without `SO_REUSEPORT`, connections tend to go to the busiest worker process on the machine, whereas with it turned on, connections are routed fairly.
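If fair distribution across workers matters, opting back in is a one-line config change; a minimal `gunicorn.conf.py` sketch (the bind address, worker count, and connection limit here are placeholder values):

```python
# gunicorn.conf.py -- illustrative; the setting of interest is reuse_port.
bind = "0.0.0.0:8000"
workers = 4
worker_class = "gevent"
worker_connections = 1000

# Sets SO_REUSEPORT on the listening socket (default False since 19.8, per the
# comment above); the discussion above suggests this changes how the kernel
# distributes incoming connections across worker processes.
reuse_port = True
```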

@rowillia (Contributor, Author)

> Yes, though the system should normally take care of load-balancing connections across system threads first. What's surprising is that this is only appearing on the radar today.

We've had this problem for years and generally ran with really poor CPU utilization in response. We've only now tracked down part of the reason.

@jamadden (Collaborator)

I'd like to merge this if there are no objections. For heavily loaded systems this is an important fix; for lightly loaded systems I think it's unlikely to have an observable effect.

@jamadden (Collaborator)

It'd also be good to backport to 19.x.

@JordanP (Contributor) commented Feb 25, 2020

We are running gunicorn v20 with gevent workers on servers that handle around 250 RPS. We have adequate monitoring to assess whether this has an impact on p50 or p90. It's worth a shot.

@jamadden self-assigned this on Mar 1, 2020
@jamadden merged commit 46b2cff into benoitc:master on Mar 3, 2020
@jamadden added a commit that referenced this pull request on Mar 3, 2020