Implement Graceful Shutdown / Connection Draining #177

agronick · 2018-02-20T19:31:01Z

This was present in the old version of Channels. The changelog says:

0.9.4 (2016-03-08)

Worker processes now exit gracefully (finish their current processing) when
sent SIGTERM or SIGINT.

This is no longer the case.

With the new architecture in Channels 2, this ability will need to move into Daphne. Daphne cannot simply stop running; it will need some kind of API to load new code while continuing to serve existing connections on the old processes.
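For readers unfamiliar with the pattern, the 0.9.4 behaviour quoted above (finish the current processing, then exit on SIGTERM or SIGINT) is usually built by having the signal handler only set a flag that the worker loop checks between jobs. This is a minimal sketch under that assumption. `GracefulWorker` is hypothetical and pulls jobs from a `queue.Queue`. It is not the actual Channels 1 worker code.

```python
import queue
import signal
import threading


class GracefulWorker:
    """Hypothetical worker that finishes its current job, then exits on SIGTERM/SIGINT."""

    def __init__(self, jobs):
        self.jobs = jobs                      # a queue.Queue of pending jobs
        self.stop_requested = threading.Event()
        self.processed = []

    def request_stop(self, signum=None, frame=None):
        # Used as the signal handler: it only records the request, so the job
        # currently running is never interrupted halfway through.
        self.stop_requested.set()

    def install_handlers(self):
        # Must be called from the main thread (a Python restriction on signal.signal).
        signal.signal(signal.SIGTERM, self.request_stop)
        signal.signal(signal.SIGINT, self.request_stop)

    def handle(self, job):
        self.processed.append(job)            # stand-in for real work

    def run(self):
        # The flag is checked only between jobs, never in the middle of one.
        while not self.stop_requested.is_set():
            try:
                job = self.jobs.get(timeout=0.1)
            except queue.Empty:
                continue
            self.handle(job)
```

In a real process you would call `install_handlers()` and then `run()`. A `kill -TERM <pid>` then lets the in-flight job complete before `run()` returns.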

andrewgodwin · 2018-02-20T23:32:19Z

To clarify a bit more - this ticket will just be for graceful shutdown (connection draining), as restarting/reloading is much more complicated and will require us to do things with separate processes, which I am not keen to take on at the moment.

agronick · 2018-02-21T03:10:10Z

Yeah, I'm not sure what benefit that provides, though. Once you stop accepting connections, you need something to take its place. The only way to do that is with a proxy in front of Daphne. If the proxy is routing connections to another instance, connection draining would prevent something that wouldn't happen anyway.

Unless I'm missing something and there is a way to bind two processes to a port or socket or something.

andrewgodwin · 2018-02-21T03:36:30Z

Graceful shutdown is mostly so you can prevent new connections while you close out old ones, which is especially useful for WebSockets, which are more stateful than HTTP.
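Andrew's description (stop accepting new connections while the existing ones close out) maps onto two steps in an asyncio server. First close the listening socket, then wait up to a deadline for the connection handlers that are still running. This is a minimal sketch using plain `asyncio` streams, not Daphne's internals, since Daphne is built on Twisted. `DrainingServer` and its line-echo protocol are hypothetical stand-ins for a WebSocket server.

```python
import asyncio


class DrainingServer:
    """Hypothetical TCP server: stop accepting, then drain open connections."""

    def __init__(self):
        self.active = set()   # handler tasks for currently open connections
        self.server = None

    async def handle(self, reader, writer):
        task = asyncio.current_task()
        self.active.add(task)
        try:
            # Echo lines until the client disconnects (a long-lived connection).
            while line := await reader.readline():
                writer.write(line)
                await writer.drain()
        finally:
            self.active.discard(task)
            writer.close()

    async def start(self, host="127.0.0.1", port=0):
        self.server = await asyncio.start_server(self.handle, host, port)
        return self.server.sockets[0].getsockname()[1]

    async def shutdown(self, timeout):
        """Stop accepting, wait up to `timeout` seconds, then cancel stragglers."""
        self.server.close()  # closes only the listening socket; open connections survive
        pending = set(self.active)
        if not pending:
            return 0, 0
        done, still_open = await asyncio.wait(pending, timeout=timeout)
        for task in still_open:
            task.cancel()  # hard-close connections that outlived the drain window
        await asyncio.gather(*still_open, return_exceptions=True)
        return len(done), len(still_open)
```

The drain window (`timeout`) is the important knob. WebSocket connections may never close on their own, so a server still needs an upper bound after which it hard-closes whatever is left.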

New Linux kernels do in fact allow you to bind two processes to a port using SO_REUSEPORT (https://lwn.net/Articles/542629/) - it would probably be nice to add support for this into Daphne to get full switchover without a separate load balancer.
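The SO_REUSEPORT mechanism from the LWN article can be shown in a few lines of Python. This is a minimal sketch, assuming Linux or a BSD where `socket.SO_REUSEPORT` exists; it is not available on Windows. `reuseport_listener` is a hypothetical helper. Daphne has no option for this, per the discussion in #182. The demo below binds two sockets in one process for brevity. An old and a new server process would each open their own.

```python
import socket


def reuseport_listener(port, host="127.0.0.1"):
    """Open a TCP listener that can share `port` with other SO_REUSEPORT sockets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Every socket in the group must set SO_REUSEPORT before bind(); on Linux
    # they must also belong to the same effective user.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


old = reuseport_listener(0)              # the "old" server picks a free port
port = old.getsockname()[1]
new = reuseport_listener(port)           # the "new" server binds the same port
```

Once both are listening, the kernel spreads incoming connections across them, so the old listener can be closed while the new one keeps accepting. One known caveat on Linux is that connections already queued in the closing socket's accept backlog can be reset.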

agronick · 2018-02-21T14:55:05Z

Oh, that's awesome.

andrewgodwin · 2018-03-07T17:35:51Z

As discussed on #182, SO_REUSEPORT is unfortunately not going to be easy in the short term, so instead we'll have to rely on people using the --fd option with process managers.
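Daphne's `--fd` option lets a process manager own the listening socket and hand the descriptor to each Daphne process it starts. This is a minimal sketch of that handoff in plain Python, standing in for what a manager such as Circus does. `open_shared_listener`, `spawn_with_fd` and the `{fd}` placeholder are hypothetical helpers. The Daphne command line in the note below assumes a project module named `myproject`.

```python
import socket
import subprocess


def open_shared_listener(host="127.0.0.1", port=0):
    """Create the listening socket in the manager process, not in Daphne."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


def spawn_with_fd(sock, argv):
    """Start a child that inherits `sock`; "{fd}" in argv becomes its descriptor number."""
    fd = sock.fileno()
    args = [arg.replace("{fd}", str(fd)) for arg in argv]
    # pass_fds keeps this one descriptor open (same number) in the child.
    return subprocess.Popen(args, pass_fds=(fd,))
```

A manager would call something like `spawn_with_fd(sock, ["daphne", "--fd", "{fd}", "myproject.asgi:application"])`. The manager keeps its own copy of the socket open, so a replacement can be spawned on the same descriptor before the old process is stopped, and the port never stops listening. Without graceful shutdown in Daphne, stopping the old process still drops the connections it holds.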

As for how to restart without losing connections in general, the best way right now would be to use a load balancer (e.g. HAProxy) or process manager that supports graceful restarts itself, and swap servers in and out as you change them over. Not ideal, I know - it only really works at large scale with automation. Hopefully I'll have time for proper graceful restart soon.

agronick · 2018-03-07T18:07:52Z

So will Daphne handle SIGINT by exiting after all connections terminate with the current codebase?

andrewgodwin · 2018-03-07T19:07:57Z

It won't until I implement it, which is why this ticket is still open. Right now it will just hard-exit.

agronick · 2018-03-08T16:11:14Z

If it did that it seems Circus would work fine. The file descriptor feature appears to work well with Circus.

Edit: After spending some more time with this, the best solution I found was to put HAProxy behind Nginx. It's heavier than I would have hoped for, but it lets me run multiple instances and put them into "drain" mode one by one. It has a web UI, and once an instance is drained I can load the new code without users noticing anything.
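The drain step done here through HAProxy's web UI can also be scripted through HAProxy's runtime API with `set server <backend>/<server> state drain`. This is a minimal sketch, assuming haproxy.cfg declares a `stats socket` with `level admin`. The socket path and the backend/server names (`daphne`, `app1`) are hypothetical.

```python
import socket

VALID_STATES = ("ready", "drain", "maint")


def haproxy_command(socket_path, command, timeout=5.0):
    """Send one command to HAProxy's runtime API over its UNIX stats socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(socket_path)
        s.sendall(command.encode("ascii") + b"\n")
        # In non-interactive mode HAProxy replies and then closes the connection.
        chunks = []
        while chunk := s.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode()


def set_server_state(socket_path, backend, server, state):
    """state="drain" keeps existing connections but sends no new load-balanced ones."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}")
    return haproxy_command(socket_path, f"set server {backend}/{server} state {state}")
```

A rolling deploy would drain `app1` and wait until its current session count reaches zero. That count is visible in the web UI or via the `show stat` command. After that you load the new code and call `set_server_state(path, "daphne", "app1", "ready")` before moving on to the next instance.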

karolyi · 2018-05-08T10:52:14Z

+1, subscribing for notifications

acu192 · 2018-11-30T00:46:06Z

To solve this problem, I started using Uvicorn & Gunicorn (those names make me laugh every time I write them...). Gunicorn can deploy your new code by spinning up new workers for you and then gracefully shutting down your old workers, so you have no downtime. See the HUP signal here. Uvicorn implements ASGI and has a plugin for Gunicorn. See here. I was able to use them as a drop-in replacement for Daphne (no changes needed to my Channels code).
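The exact setup used here isn't posted, but a Gunicorn config file is plain Python, so a minimal sketch of what it might look like is shown below. `bind`, `workers`, `worker_class`, `preload_app` and `graceful_timeout` are real Gunicorn settings. The values are illustrative assumptions, and the project name `myproject` in the note below is hypothetical.

```python
# gunicorn.conf.py: hypothetical settings for serving a Channels project with
# Uvicorn workers under Gunicorn. Values are illustrative, not recommendations.

# Nginx proxies to this address over TCP (a unix socket caused errors, per the comment below).
bind = "127.0.0.1:8000"

# Number of worker processes; each one is a full ASGI interface server.
workers = 4

# Uvicorn's Gunicorn worker class. Newer Uvicorn releases point to the separate
# uvicorn-worker package, whose class is "uvicorn_worker.UvicornWorker".
worker_class = "uvicorn.workers.UvicornWorker"

# Leave the app un-preloaded so workers started after a HUP import the new code.
preload_app = False

# After a HUP, old workers get this many seconds to finish in-flight requests
# before being killed. Long-lived WebSockets still end when their worker exits.
graceful_timeout = 30
```

It would be started with `gunicorn -c gunicorn.conf.py myproject.asgi:application`. Sending `kill -HUP <master pid>` then triggers the worker replacement described above.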

Turns out it has a nice side effect too... a very nice side effect... it's like 10x faster (at least for my deployment; of course your mileage may vary). By "faster" I mean my server's CPU usage is much lower now. My server used to sit at ~20% CPU when I ran my "pummel the server" script. Same script, new interface server, and CPU barely hits 2%. I rolled back to Daphne just to double-check it! It holds.

I'm using Nginx as a proxy in front of Gunicorn. One weird thing I ran into: if I had Nginx proxy to Gunicorn over a unix socket, I would get a weird exception somewhere deep inside Channels (at request time). If I proxy from Nginx to Gunicorn over TCP, it all works great, so that's where I left it. I didn't look into it further; just something to be aware of if you try it out.

agronick · 2018-11-30T22:57:59Z

@acu192 Does the HUP handling actually work for you? I've tried it myself, but the HUP signal causes it to reload immediately and drop all of its websocket connections.

acu192 · 2018-11-30T23:13:51Z

Yeah it will drop websocket connections, but any "normal" HTTP connections should be drained gracefully before the old workers are shut down (I haven't tested it super-well, but it does seem to work based on some basic experimentation I've done -- I've only had this setup for a few days now). I don't know of a way to not have the websocket connections drop... since it's a long lived TCP connection, if the connection-holding process dies it will have to drop. The only solution I know of would be to let those old workers live a long time to hold open those old websocket connections (I don't want to do that). Or do something like channels 1 did where it had an entirely separate interface server (as its own process) which communicated to the workers via redis (or whatever channel layer). I was never a fan of that though -- channels 2 is way better in my opinion by having the workers be the interface servers as well.

In my case I don't mind if the websocket connections drop. They'll quickly reconnect and the user will never know. As long as the "normal" HTTP connections are all served (i.e. no one sees an error message when loading the page for the first time), then I'm happy in my case.

agronick · 2018-11-30T23:20:48Z

In my case I really need the websocket connections to drain. I have some pages where it wouldn't matter, but we are doing things like web-based SSH sessions. HAProxy is the only way I've found to drain websocket connections.


karolyi · 2018-11-30T23:27:20Z

@agronick, I understand your problem with having the websockets disconnected, but as I've been told, websocket clients should be built to withstand disconnections and reconnect/resync gracefully without the user noticing (practically stateless). Most websocket clients do this. I have built several that aren't even browser-based, and every time they reconnect they either exchange synchronization information with the server or assume everything continues as before. YMMV, but this should be the case most of the time. Maybe you could add an extra connection handler to your client/server logic to handle disconnects. Cheers, László Károlyi
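The reconnect-then-resync approach described above is commonly paired with capped exponential backoff and random jitter. That way many clients don't all reconnect in lockstep when a server restarts. This is a minimal sketch. `connect` and `resync` are hypothetical callables standing in for the application's own connection and synchronization logic. `sleep` is injectable only so the function can be tested without waiting.

```python
import random
import time


def run_with_reconnect(connect, resync, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Connect with capped exponential backoff, then resync application state.

    connect() returns a session or raises ConnectionError; resync(session)
    exchanges whatever state the client needs to continue where it left off.
    """
    attempt = 0
    while True:
        try:
            session = connect()
        except ConnectionError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and let the caller surface the outage
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # "full jitter" spreads reconnects out
            continue
        resync(session)
        return session
```

A client would call this again whenever its connection drops, so a server restart looks like a short pause followed by a resync.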


agronick · 2018-12-01T01:54:31Z

We're not talking about the user disconnecting and reconnecting. We're talking about the process dying and a new one rebuilding the previous process's state in memory. Some things just can't be serialized and persisted, and others aren't worth an exponentially larger development effort when connection draining solves the problem fine. Sockets especially: I don't know if there is even a way to hand a socket off from a process that is shutting down to a new process.
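On Unix there is a mechanism for the socket-handoff question. An open descriptor, including a connected TCP socket, can be passed to another process over a UNIX-domain socket as SCM_RIGHTS ancillary data. Python exposes this as `socket.send_fds` and `socket.recv_fds` (3.9+). This is a minimal sketch. It hands over only the kernel-level connection, not the WebSocket protocol state or application state held in the old process's memory, which is the harder problem raised here. Daphne does not do this. `send_socket` and `recv_socket` are hypothetical helpers.

```python
import socket


def send_socket(channel, sock, note=b"handoff"):
    """Pass an open socket's descriptor to the peer of a UNIX-domain `channel`."""
    socket.send_fds(channel, [note], [sock.fileno()])


def recv_socket(channel):
    """Receive one descriptor and wrap it as a socket object in this process."""
    msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    if not fds:
        raise RuntimeError("no file descriptor received")
    return msg, socket.socket(fileno=fds[0])
```

In practice the channel would be a `socket.socketpair()` shared with a child process or a named UNIX socket both processes connect to. The old process sends each live connection and then closes its own copy. The client keeps the same TCP connection throughout.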

andrewgodwin · 2018-12-01T03:05:29Z

If you find uvicorn works better for you, then please use it! Daphne is a reference server but doesn't have as much active development, so it likely will never beat it.

acu192 · 2018-12-01T03:55:05Z

@andrewgodwin Thank you for working so hard to build Channels! Btw, Channels 2 is wonderful. All the changes are well worth breaking the interface from Channels 1. It's great to see other projects (like uvicorn) adopting the ASGI standard as well. Very well done.

Ken4scholars · 2020-04-10T06:38:43Z

It seems this ticket was left behind. @andrewgodwin @carltongibson, any plans to fix this in the near future? Thank you

carltongibson · 2020-04-10T06:42:07Z

@Ken4scholars No immediate plans no. Next priority is updating to be fully ready for Django 3.1 — which mostly involves making Channels ASGI v3 ready, and updating the documentation there.

If you would like to contribute then here is an opportunity!

ben-xo · 2023-08-18T22:32:39Z

This is something I'd be interested in as well.

cbeaujoin-stellar · 2024-05-24T13:28:14Z

Any updates?

andrewgodwin added enhancement exp/advanced labels Feb 20, 2018

andrewgodwin changed the title ~~Implement Graceful Restart / Connection Draining~~ Implement Graceful Shutdown / Connection Draining Feb 20, 2018

andrewgodwin mentioned this issue Mar 7, 2018 in "Doc: deploying with multiple daphne processes (v1 -> v2)" #182 (closed)
