
fix thundering herd #792

Open
benoitc opened this Issue Jun 14, 2014 · 20 comments

Comments

9 participants
@benoitc
Owner

benoitc commented Jun 14, 2014

Currently all workers accept in parallel with no coordination, which sometimes makes several of them try to accept the same connection, triggering EAGAIN and increasing CPU usage for nothing.

While modern OSes have mostly fixed that, it can still happen in Gunicorn since we can listen on multiple interfaces.

Solution

The solution I see is to introduce some communication between the arbiter and the workers. The accept will still be executed directly in the calling worker when the socket accept returns OK. Otherwise the listening socket is "selected" in the arbiter using an event loop, and an input-ready callback will run the socket accept from a worker when the event is triggered.

Implementation details:

While this can change in the future by adding more methods, like sharing memory between the arbiter and the workers, we will take the simple path for now:

  • 1 pipe will be maintained between the arbiter and each worker. This pipe will be used for signaling.
  • The arbiter will put all listener sockets in an event loop. Once a read event is triggered it will notify one of the available workers to accept.
  • For the event loop it will use the selectors module from Python 3. It will be backported for Python 2.x.

Usual garbage collection will take care of closing the pipe when needed.

*Note*: possibly, the pipe will also let the workers notify the arbiter that they are alive.
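To make the flow concrete, here is a minimal sketch of the arbiter-side loop described above. The names are hypothetical: `listeners` is the list of bound listening sockets and `worker_pipes` holds the write-end file descriptors of the per-worker pipes. The round-robin choice is just a placeholder for whatever "available worker" bookkeeping ends up being used; this is not gunicorn code.

```python
import os
import selectors
from itertools import cycle

def arbiter_loop(listeners, worker_pipes):
    sel = selectors.DefaultSelector()
    for sock in listeners:
        sel.register(sock, selectors.EVENT_READ)
    workers = cycle(worker_pipes)  # naive round-robin choice of worker
    while True:
        for key, _ in sel.select():
            # A listener is readable: wake exactly one worker, which will
            # run accept() on its side instead of every worker racing.
            os.write(next(workers), b"\x01")
```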

Problems to solve

Right now each async worker accepts using its own method without much consideration for Gunicorn. For example, the gevent worker uses the gevent Server object; tornado and eventlet use similar systems. We should find a way to adapt them to use the new socket signaling system (roughly the loop sketched below).
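For illustration, a worker under this scheme would wait on the pipe instead of selecting on the listeners directly. A minimal sketch, assuming hypothetical `signal_pipe` (the read-end fd of the arbiter pipe), `listeners`, and `handle`; each worker type would need to fold something like this into its own event loop:

```python
import os
import selectors

def worker_wait_and_accept(signal_pipe, listeners, handle):
    # Wait on the arbiter's pipe, not on the listening sockets themselves.
    sel = selectors.DefaultSelector()
    sel.register(signal_pipe, selectors.EVENT_READ)
    while True:
        sel.select()              # block until the arbiter signals this worker
        os.read(signal_pipe, 1)   # consume the signal byte
        for sock in listeners:
            try:
                conn, addr = sock.accept()
            except BlockingIOError:
                continue          # nothing pending on this listener; try next
            handle(conn, addr)
            break
```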

Thoughts? Any other suggestions?

@diwu1989

diwu1989 commented Jun 15, 2014

Glad to see that there's going to be a thunder-lock in gunicorn.

@methane
Contributor

methane commented Jun 15, 2014

I think a lock-based approach is better than a signaling-based one.
The arbiter doesn't know which worker is busy or how many connections are coming to the socket.

@benoitc
Owner

benoitc commented Jun 15, 2014

@methane not sure I follow; using IPC is about adding a lock system somehow (a semaphore or the like is just that ;) ).

The arbiter will know whether a worker is busy because the worker will notify the arbiter about it (releasing the lock it put on accept).

@diwu1989

diwu1989 commented Jun 20, 2014

Asking as an outsider, is this something that is feasible to do for the next minor version release or is this a giant feature?

@davisp
Collaborator

davisp commented Jun 20, 2014

Have there been reports about this being an issue? Seems awfully complex. Reading the link from @methane I'd probably vote for the signaling approach as well, but as you point out that means we have to alter each worker so that they aren't selecting on the TCP socket and instead wait for the signal on the pipe. Seems reasonable I guess, just complicated.

@methane
Contributor

methane commented Jun 20, 2014

The following compares the flows for accepting a new connection.

Arbiter solution

  1. New connection comes in
  2. Arbiter wakes up from epoll
  3. Arbiter selects a worker and sends a signal through the pipe
  4. Worker wakes up from epoll
  5. Worker tries accept

Lock solution

  -2. Worker wakes up and takes the lock
  -1. Worker starts epoll
   0. New connection comes in
   1. Worker wakes up from epoll
   2. Worker accepts the connection and releases the lock

(Steps -2 to 0 happen before the connection arrives; only steps 1 and 2 are on the accept path. A sketch follows this comment.)

My thoughts

The lock solution has fewer context switches.

The lock solution is also better for concurrency. When massive numbers of new connections come in, the arbiter may become a bottleneck and workers can't work while many cores sit idle.

So I prefer the lock solution.
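To make the comparison concrete, here is a minimal sketch of the lock-based flow methane describes, using multiprocessing.Lock as a stand-in for whatever cross-process lock would actually be used; `listeners` and `handle` are hypothetical:

```python
import selectors

def worker_accept_loop(accept_lock, listeners, handle):
    sel = selectors.DefaultSelector()
    for sock in listeners:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)
    while True:
        with accept_lock:           # steps -2/-1: take the lock, then poll
            events = sel.select()   # steps 0/1: wake when a connection arrives
            pairs = []
            for key, _ in events:
                try:
                    pairs.append(key.fileobj.accept())
                except BlockingIOError:
                    pass            # spurious wakeup; nothing to accept
        # step 2: lock released; serve connections outside the critical section
        for conn, addr in pairs:
            handle(conn, addr)
```

Only the lock holder sits in the poll, so the kernel wakes exactly one process per connection; the contention davisp raises next is that every idle worker queues on `accept_lock`.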

@davisp
Collaborator

davisp commented Jun 20, 2014

@methane The downside of the lock is that it's a single point of contention. With the signaling approach there's room for optimizations, like running multiple accepts that don't require the synchronization under load. Not to mention the sheer complexity of attempting to write and support a cross-platform IPC locking scheme. Given the caveats in the article you linked to earlier, I'm not really keen on attempting such a thing.

Contemplating the uwsgi article that @methane linked to earlier, I'm still not convinced that this is even an issue we should be attempting to "fix", seeing as it's really not an issue for modern kernels. I'd vote to tell people that actually experience this that they just need to upgrade their deployment targets. Then again, I'm fairly averse to introducing complexity.

@tilgovi
Collaborator

tilgovi commented Jun 20, 2014

@davisp if we were simply blocking on accept() in our workers that would be one thing, but, partly because we allow multiple listening sockets, our workers generally select on them, which means the kernel will wake them all.

@davisp
Collaborator

davisp commented Jun 20, 2014

Oh right.

@pypeng

pypeng commented Jul 17, 2014

According to the uwsgi article: "Note: Apache is really smart about that, when it only needs to wait on a single file descriptor, it only calls accept(), taking advantage of modern kernels' anti-thundering herd policies."

How about we fix this common case, where we only have one listening socket?
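That special case would be small: with a single listener, each worker can block in a plain accept() and rely on the kernel's wake-one behavior. A minimal sketch of the idea, with hypothetical `sock` and `handle`:

```python
import socket

def worker_single_listener(sock: socket.socket, handle):
    # With exactly one listening socket there is no need to select():
    # blocking in accept() lets the kernel wake just one waiting worker.
    sock.setblocking(True)
    while True:
        conn, addr = sock.accept()
        handle(conn, addr)
```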

@tilgovi
Collaborator

tilgovi commented Jul 17, 2014

+1

@benoitc
Owner

benoitc commented Sep 12, 2014

@diwu1989 I forgot to answer, but this feature will appear in 20.0 in October.

@benoitc benoitc added this to the R20.0 milestone Sep 22, 2014

@benoitc benoitc removed this from the R20.0 milestone Dec 6, 2015

@RyPeck
Contributor

RyPeck commented Mar 25, 2016

@benoitc was this fixed? You may want to update the documentation here if so - http://docs.gunicorn.org/en/stable/faq.html#does-gunicorn-suffer-from-the-thundering-herd-problem

@methane
Contributor

methane commented Mar 25, 2016

@themanifold

themanifold commented Apr 13, 2016

So this was added to the R20.0 milestone, then removed. Have we decided not to work on this anymore, then?

@tilgovi
Collaborator

tilgovi commented Apr 13, 2016

I made the 20 milestone and provisionally added things without discussion or input from others. It was aspirational.

As far as I know we don't have a consensus work plan for the milestone. We should probably discuss soon :-)

@tilgovi
Collaborator

tilgovi commented Apr 13, 2016

Ah, I see Benoit added this one, then removed it. I would guess similar thoughts to mine.

@benoitc benoitc added this to Acknowledged in Mailing List Feb 26, 2017

@tilgovi
Collaborator

tilgovi commented Apr 28, 2018

Python has select.EPOLLEXCLUSIVE now. If someone wants to implement that, I would gladly review the PR.
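For reference, a minimal sketch of what an EPOLLEXCLUSIVE-based worker loop might look like (requires Linux >= 4.5 and Python >= 3.6). The names `listeners` and `handle` are hypothetical, and this is not the implementation being requested, just an outline of the approach:

```python
import select

def worker_epoll_exclusive(listeners, handle):
    ep = select.epoll()
    by_fd = {}
    for sock in listeners:
        sock.setblocking(False)
        # EPOLLEXCLUSIVE makes the kernel wake only one of the processes
        # waiting on this fd, avoiding the herd without arbiter signaling.
        ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLEXCLUSIVE)
        by_fd[sock.fileno()] = sock
    while True:
        for fd, _event in ep.poll():
            try:
                conn, addr = by_fd[fd].accept()
            except BlockingIOError:
                continue  # another worker already accepted this connection
            handle(conn, addr)
```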
