How to perform zero downtime deploy? #42

Closed · ghost opened this issue Jan 13, 2015 · 3 comments

Comments

@ghost commented Jan 13, 2015

I'm about to use SocketCluster in production.

I'm wondering whether SocketCluster implements something like Naught internally.

If not, do you have any tool recommendations for this use case?

@jondubois (Member)

@maxime-crunding

You can restart all workers by sending a SIGUSR2 signal to the master process (the master PID is logged when you start SC), or you can use the SocketCluster instance's killWorkers() method (on the master) if you want to do it programmatically. The new workers will run the fresh code.
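
Here is a rough sketch of both options. Treat it as illustrative only: the constructor options shown (workers, port, workerController) are just typical placeholder values, the redeploy() wrapper is made up for the example, and killWorkers() is called on the master's SocketCluster instance as described above.

```js
// Master script sketch (illustrative; option values are placeholders).
var SocketCluster = require('socketcluster').SocketCluster;

var socketCluster = new SocketCluster({
  workers: 2,
  port: 8000,
  workerController: __dirname + '/worker.js'
});

// Option 1: restart all workers programmatically so they pick up the fresh code.
function redeploy() {
  socketCluster.killWorkers();
}

// Option 2 (from a shell rather than from Node): send SIGUSR2 to the master
// process, using the master PID that SC logs on startup:
//   kill -SIGUSR2 <master_pid>
```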

Workers typically take less than a second to restart, which is not long enough for HTTP requests to time out. So, aside from the fact that all active realtime connections are destroyed (and will have to reconnect), it might feel close to zero downtime. SC clients will automatically try to reconnect, so in effect clients will miss a few seconds of realtime messages between the time they lose the connection and the time the 'connect' event triggers again.

Missing a few realtime messages isn't a huge deal if you're storing the messages in a database anyway (which is the case for most apps unless you want truly ephemeral messaging). You can make your clients refetch the latest data when socket.on('connect', ...) triggers; that way they won't actually miss anything, and the problem is reduced to a slight delay.
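
As a rough sketch of that refetch-on-reconnect pattern (the fetchLatestMessages() and renderMessages() helpers are hypothetical, and the setup assumes the socketcluster-client connect() API):

```js
// Client-side sketch. fetchLatestMessages() and renderMessages() are
// hypothetical helpers; fetchLatestMessages() would re-read recent messages
// from your own database-backed endpoint.
var socketCluster = require('socketcluster-client');

var socket = socketCluster.connect({port: 8000});

socket.on('connect', function () {
  // Fires on the initial connection and again after every automatic reconnect,
  // so anything missed while workers were restarting can be backfilled here.
  fetchLatestMessages(function (err, messages) {
    if (err) {
      console.error(err);
      return;
    }
    renderMessages(messages);
  });
});
```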

True zero-downtime deploys are difficult to achieve with realtime WebSocket connections because each client is attached to a single server. We could come up with a strategy that keeps the old workers up (the ones that have active connections) and spawns new ones (running the new code) to handle all new connections, killing each old worker only once it has no clients left attached to it. But if we did several deploys in a row, we might end up with a LOT of workers running different versions of the code, and that would get confusing when errors happen. So this approach is probably not worthwhile.

@ghost (Author) commented Jan 14, 2015

Sending a SIGUSR2 to the master PID suits me perfectly.
Thanks for digging in.

ghost closed this as completed Jan 14, 2015
@jondubois (Member)

I should probably add it to the docs... It's quite important :)
