How to perform zero downtime deploy? #42

Closed · ghost opened this issue Jan 13, 2015 · 3 comments

Comments

@ghost commented Jan 13, 2015

I'm about to use SocketCluster in production.

I'm wondering whether SocketCluster implements something like Naught internally.

If not, do you have any tool recommendations for this use case?

@jondubois (Member)

@maxime-crunding

You can restart all workers by sending a SIGUSR2 signal to the master process (the master PID is logged when you start SC), or you can use the SocketCluster instance's killWorkers() method (on the master) if you want to do it programmatically. The new workers will run the fresh code.
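
Here is a rough sketch of both options. Treat it as illustrative only: the constructor options shown (workers, port, workerController) are just typical placeholder values, the redeploy() wrapper is made up for the example, and killWorkers() is called on the master's SocketCluster instance as described above.

```js
// Master script sketch (illustrative; option values are placeholders).
var SocketCluster = require('socketcluster').SocketCluster;

var socketCluster = new SocketCluster({
  workers: 2,
  port: 8000,
  workerController: __dirname + '/worker.js'
});

// Option 1: restart all workers programmatically so they pick up the fresh code.
function redeploy() {
  socketCluster.killWorkers();
}

// Option 2 (from a shell rather than from Node): send SIGUSR2 to the master
// process, using the master PID that SC logs on startup:
//   kill -SIGUSR2 <master_pid>
```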

Workers typically take less than a second to restart, which is not long enough for HTTP requests to time out. So, aside from the fact that all active realtime connections are destroyed (and will have to reconnect), it might feel close to zero downtime. SC clients will automatically try to reconnect, so in effect clients will miss a few seconds of realtime messages between the time they lose the connection and the time the 'connect' event triggers again.

Missing a few realtime messages isn't a huge deal if you're storing the messages in a database anyway (which is the case for most apps unless you want truly ephemeral messaging). You can make your clients refetch the latest data when socket.on('connect', ...) triggers; that way they won't actually miss anything, and the problem is reduced to a slight delay.
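
As a rough sketch of that refetch-on-reconnect pattern (the fetchLatestMessages() and renderMessages() helpers are hypothetical, and the setup assumes the socketcluster-client connect() API):

```js
// Client-side sketch. fetchLatestMessages() and renderMessages() are
// hypothetical helpers; fetchLatestMessages() would re-read recent messages
// from your own database-backed endpoint.
var socketCluster = require('socketcluster-client');

var socket = socketCluster.connect({port: 8000});

socket.on('connect', function () {
  // Fires on the initial connection and again after every automatic reconnect,
  // so anything missed while workers were restarting can be backfilled here.
  fetchLatestMessages(function (err, messages) {
    if (err) {
      console.error(err);
      return;
    }
    renderMessages(messages);
  });
});
```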

True zero-downtime deploys are difficult to achieve with realtime WebSocket connections because each client is attached to a single server. We could come up with a strategy that keeps the old workers up (the ones that have active connections) and spawns new ones (running the new code) to handle all new connections, killing each old worker only once it has no clients left attached to it. But if we did several deploys in a row, we might end up with a LOT of workers running different versions of the code, and that would get confusing when errors happen. So this approach is probably not worthwhile.

@ghost (Author) commented Jan 14, 2015

Sending a SIGUSR2 to the master PID suits me perfectly.
Thanks for digging in.

ghost closed this as completed Jan 14, 2015
@jondubois (Member)

I should probably add it to the docs... It's quite important :)
