Implement Graceful Shutdown / Connection Draining #177

Open
agronick opened this issue Feb 20, 2018 · 21 comments

Comments

@agronick
Contributor

agronick commented Feb 20, 2018

This was present in the old version of Channels. The changelog says:

0.9.4 (2016-03-08)

  • Worker processes now exit gracefully (finish their current processing) when
    sent SIGTERM or SIGINT.

This is no longer the case.

With the new architecture in Channels 2, this ability will need to be moved to Daphne. Daphne cannot stop running. It will need some kind of API to reload new code while continuing to service existing connections on the old processes.

@andrewgodwin
Member

To clarify a bit more - this ticket will just be for graceful shutdown (connection draining), as restarting/reloading is much more complicated and will require us to do things with separate processes, which I am not keen to take on at the moment.

@andrewgodwin andrewgodwin changed the title Implement Graceful Restart / Connection Draining Implement Graceful Shutdown / Connection Draining Feb 20, 2018
@agronick
Contributor Author

Yeah, I'm not sure what benefit that provides though. Once you stop accepting connections you need something to take its place. The only way you can do that is with a proxy in front of Daphne. If the proxy is routing connections to another instance, connection draining would prevent something that wouldn't happen anyway.

Unless I'm missing something and there is a way to bind two processes to a port or socket or something.

@andrewgodwin
Member

Graceful shutdown is mostly so you can prevent new connections while you close out old ones, which is especially useful for WebSockets, which are more stateful than HTTP.
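
For illustration, here is a minimal sketch of what connection draining could look like in a plain asyncio server. It is not Daphne's code (Daphne is built on Twisted and, per this thread, does not do this yet); `DrainingServer`, `drain_timeout`, and the echo handler are hypothetical names chosen for the example.

```python
import asyncio
import signal


class DrainingServer:
    """Toy echo server: on shutdown, refuse new clients but let open ones finish."""

    def __init__(self, drain_timeout=10.0):
        self.drain_timeout = drain_timeout  # seconds to wait before cancelling stragglers
        self.active = set()                 # handler tasks for currently open connections
        self.server = None

    async def handle(self, reader, writer):
        task = asyncio.current_task()
        self.active.add(task)
        try:
            # Echo lines back until the client closes its side.
            while data := await reader.readline():
                writer.write(data)
                await writer.drain()
        finally:
            self.active.discard(task)
            writer.close()

    async def start(self, host="127.0.0.1", port=0):
        self.server = await asyncio.start_server(self.handle, host, port)
        return self.server.sockets[0].getsockname()[1]

    async def shutdown(self):
        # Step 1: close the listening socket so no new connections are accepted.
        self.server.close()
        # Step 2: give open connections up to drain_timeout seconds to finish.
        pending = set()
        if self.active:
            _, pending = await asyncio.wait(set(self.active), timeout=self.drain_timeout)
        # Step 3: cancel whatever is still running after the timeout.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        return len(pending)  # number of connections that had to be cut off


async def serve_until_signalled(port=8000):
    """Run until SIGTERM/SIGINT arrives, then drain (Unix-only signal handling)."""
    server = DrainingServer()
    await server.start(port=port)
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)
    await stop.wait()
    await server.shutdown()
```

Running `asyncio.run(serve_until_signalled())` and sending the process SIGTERM would exercise the drain path. The timeout matters for WebSockets in particular, since a long-lived connection may otherwise never finish on its own.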

New Linux kernels do in fact allow you to bind two processes to a port using SO_REUSEPORT (https://lwn.net/Articles/542629/) - it would probably be nice to add support for this into Daphne to get full switchover without a separate loadbalancer.
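
As a rough sketch of the SO_REUSEPORT idea (not something Daphne supports, per the rest of this thread), two independent listening sockets can be bound to the same address when both set the option before `bind()`. The helper name `reuseport_listener` and the `demo` function are made up for the example, and it assumes a platform where `socket.SO_REUSEPORT` is defined, such as a recent Linux.

```python
import socket


def reuseport_listener(host, port):
    """Create a listening TCP socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


def demo():
    # "Old" server instance: let the kernel pick a free port.
    old = reuseport_listener("127.0.0.1", 0)
    port = old.getsockname()[1]
    # "New" instance binds the very same port while the old one is still listening.
    new = reuseport_listener("127.0.0.1", port)
    same_port = new.getsockname()[1] == port
    # The old instance stops listening; the new one keeps serving the port.
    old.close()
    with socket.create_connection(("127.0.0.1", port), timeout=5):
        conn, _ = new.accept()
        conn.close()
    new.close()
    return same_port
```

In a real switchover the two listeners would live in separate processes (old and new code), with the old process draining its open connections before it exits.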

@agronick
Contributor Author

Oh that's awesome.

@andrewgodwin
Member

As discussed on #182, SO_REUSEPORT is unfortunately not going to be easy in the short term, so instead we'll have to rely on people using the --fd option with process managers.
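
To make the file-descriptor approach concrete, here is a small sketch of the mechanism a process manager relies on: the manager owns the listening socket and hands its descriptor to a worker process, which adopts it instead of binding the port itself. The child script below is a stand-in written for this example, not Daphne; with Daphne the worker command would instead be given the descriptor number via `--fd`. Assumes a POSIX system.

```python
import socket
import subprocess
import sys

# Stand-in worker: adopts the inherited descriptor and serves one connection.
CHILD = """
import socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))
conn, _ = listener.accept()
conn.sendall(b"served by child")
conn.close()
"""


def demo():
    # The "process manager" binds and listens once.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]
    fd = listener.fileno()

    # pass_fds keeps this descriptor open (same number) in the child process.
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(fd)],
        pass_fds=[fd],
    )

    # A client connects to the port; the child accepts on the shared socket.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        reply = client.recv(1024)

    child.wait(timeout=5)
    # The manager still holds the socket, so it could start a replacement worker here.
    listener.close()
    return reply
```

Because the manager never closes the socket between workers, the port stays bound while one worker is stopped and another is started.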

As for how to restart without losing connections generally, the best way right now would be to use a loadbalancer (e.g. HAProxy) or process manager that supports graceful restarts itself and swap in and out servers as you change them over. Not ideal, I know - it only really works at large scale with automation. Hopefully I'll have time for proper graceful restart soon.

@agronick
Contributor Author

agronick commented Mar 7, 2018

So will Daphne handle SIGINT by exiting after all connections terminate with the current codebase?

@andrewgodwin
Member

It won't until I implement it, which is why this ticket is still open. Right now it will just hard-exit.

@agronick
Contributor Author

agronick commented Mar 8, 2018

If it did that it seems Circus would work fine. The file descriptor feature appears to work well with Circus.

Edit: After spending some more time with this, the best solution I found was to put HAProxy after Nginx. It's heavier than I would have hoped for, but it allows me to set up multiple instances and put them into "drain" mode one by one. It has a web UI, and once an instance is drained I can load the new code and the users don't notice anything.
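
The comment above uses the HAProxy web UI. As an alternative for scripting the same step, HAProxy's runtime API accepts a `set server <backend>/<server> state drain` command over its stats socket. The sketch below assumes the HAProxy config exposes an admin-level `stats socket`; the socket path and the backend and server names are placeholders, not values from this thread.

```python
import socket


def drain_command(backend, server):
    """Build the runtime API command that puts one server into drain mode."""
    return f"set server {backend}/{server} state drain\n"


def ready_command(backend, server):
    """Build the command that returns a server to normal rotation."""
    return f"set server {backend}/{server} state ready\n"


def send_runtime_command(socket_path, command):
    """Send one command to HAProxy's stats socket and return its reply as text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode())
        chunks = []
        # In non-interactive mode HAProxy replies and then closes the connection.
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode()


# Hypothetical usage (path and names are assumptions):
# send_runtime_command("/var/run/haproxy.sock", drain_command("daphne", "web1"))
# ...wait for connections to finish, deploy new code, restart the instance...
# send_runtime_command("/var/run/haproxy.sock", ready_command("daphne", "web1"))
```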

@karolyi
Contributor

karolyi commented May 8, 2018

+1, subscribing for notifications

@acu192

acu192 commented Nov 30, 2018

To solve this problem, I started using Uvicorn & Gunicorn (those names make me laugh every time I write them...). Gunicorn can deploy your new code by spinning up new workers for you then gracefully shutting down your old workers, so that you have no down-time. See the HUP signal here. Uvicorn implements ASGI and has a plugin to Gunicorn. See here. I was able to use those as a drop-in replacement for Daphne (no changes needed to my channels code).
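
A setup along these lines might be configured with a `gunicorn.conf.py` such as the sketch below. The bind address, worker count, and timeout are illustrative values, not taken from this thread, and it assumes the installed Uvicorn version provides `uvicorn.workers.UvicornWorker`.

```python
# gunicorn.conf.py (illustrative values)

# Listen on TCP; an Nginx proxy in front would forward to this address.
bind = "127.0.0.1:8000"

# Number of worker processes; tune for the machine.
workers = 4

# Run each worker as a Uvicorn ASGI server.
worker_class = "uvicorn.workers.UvicornWorker"

# Seconds a worker gets to finish in-flight requests after being told to stop.
graceful_timeout = 30
```

With a file like this, Gunicorn would be started with `-c gunicorn.conf.py` plus the ASGI application path, and sending HUP to the Gunicorn master process triggers the reload described above.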

Turns out it has a nice side effect too... a very nice side effect... it's like 10x faster (at least for my deployment; of course your mileage may vary). By "faster" I mean my server's CPU usage is much lower now. My server used to sit at ~20% CPU when I ran my "pummel the server" script. Same script, new interface server, CPU barely hits 2%. I rolled back to Daphne just to double check it! It holds.

I'm using Nginx as a proxy in front of Gunicorn. One weird thing I ran into is that if I had Nginx proxy to Gunicorn over a unix socket, I would get a weird exception somewhere deep inside channels (at request-time). If I proxy from Nginx to Gunicorn over TCP, it all works great. So that's where I left it. I didn't look into it further -- just something to be aware of if you try it out.

@agronick
Contributor Author

@acu192 Does the HUP handling actually work for you? I've tried it myself but the HUP signal causes it to reload immediately and drop all of its websocket connections.

@acu192

acu192 commented Nov 30, 2018

Yeah it will drop websocket connections, but any "normal" HTTP connections should be drained gracefully before the old workers are shut down (I haven't tested it super-well, but it does seem to work based on some basic experimentation I've done -- I've only had this setup for a few days now). I don't know of a way to not have the websocket connections drop... since it's a long lived TCP connection, if the connection-holding process dies it will have to drop. The only solution I know of would be to let those old workers live a long time to hold open those old websocket connections (I don't want to do that). Or do something like channels 1 did where it had an entirely separate interface server (as its own process) which communicated to the workers via redis (or whatever channel layer). I was never a fan of that though -- channels 2 is way better in my opinion by having the workers be the interface servers as well.

In my case I don't mind if the websocket connections drop. They'll quickly reconnect and the user will never know. As long as the "normal" HTTP connections are all served (i.e. no one sees an error message when loading the page for the first time), then I'm happy in my case.

@agronick
Contributor Author

agronick commented Nov 30, 2018 via email

@karolyi
Contributor

karolyi commented Nov 30, 2018 via email

@agronick
Contributor Author

agronick commented Dec 1, 2018 via email

@andrewgodwin
Member

If you find uvicorn works better for you, then please use it! Daphne is a reference server but doesn't have as much active development, so it likely will never beat it.

@acu192

acu192 commented Dec 1, 2018

@andrewgodwin Thank you for working so hard to build channels! Btw, channels 2 is wonderful. All the changes are well worth breaking the interface from channels 1. It's great to see other projects (like uvicorn) adopting the ASGI standard as well. Very well done.

@Ken4scholars

Ken4scholars commented Apr 10, 2020

Seems this ticket was left out. @andrewgodwin @carltongibson any plans to fix this in the near future? Thank you

@carltongibson
Member

@Ken4scholars No immediate plans, no. The next priority is updating to be fully ready for Django 3.1, which mostly involves making Channels ASGI v3 ready and updating the documentation there.

If you would like to contribute then here is an opportunity!

@ben-xo

ben-xo commented Aug 18, 2023

This is something I'd be interested in as well.

@cbeaujoin-stellar

Any updates?

8 participants