Subscribers do not get notification if a scc-broker node restarts while processing messages #435
Comments
@dkruchinin This is a feature ;p SocketCluster doesn't do delivery guarantees but you can implement your own mechanism on top. Telling a client if a message failed to be delivered is tricky because there are multiple scenarios to account for. For example, if a channel has 1000 subscribers and only 999 of them receive the message successfully; should we tell the publisher client that the publish operation was a success or a failure? The publish operation currently only tells you if the message reached the front-facing server; beyond that it doesn't track the delivery to individual subscribers. If you want to track if specific subscribers have received specific messages, then you can create special receipt/acknowledgement channels which subscribers can use to inform publishers whenever they receive certain messages. It would be nice to write a client-side plugin though which could implement this guaranteed pub/sub receipt/ack feature; it shouldn't be too difficult. |
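The receipt/acknowledgement idea above could be sketched roughly as follows. This is only an illustration in TypeScript and not part of SocketCluster: `ReceiptTracker`, its method names, and the message and subscriber IDs are all hypothetical. It assumes the publisher knows which subscribers it expects receipts from and that each receipt arriving on the ack channel carries a message ID and a subscriber ID.

```typescript
// Hypothetical helper (not a SocketCluster API): tracks which subscribers
// still owe a receipt for each published message.
class ReceiptTracker {
  private pending = new Map<string, Set<string>>();

  // Register a published message and the subscribers expected to acknowledge it.
  expect(messageId: string, subscriberIds: string[]): void {
    this.pending.set(messageId, new Set(subscriberIds));
  }

  // Call when a receipt arrives on the ack channel.
  // Returns true only when this receipt completes the set for the message;
  // returns false for unknown or already completed messages.
  acknowledge(messageId: string, subscriberId: string): boolean {
    const waiting = this.pending.get(messageId);
    if (waiting === undefined) return false;
    waiting.delete(subscriberId);
    if (waiting.size === 0) {
      this.pending.delete(messageId);
      return true;
    }
    return false;
  }

  // Subscribers that have not acknowledged the message yet.
  missing(messageId: string): string[] {
    const waiting = this.pending.get(messageId);
    return waiting === undefined ? [] : Array.from(waiting);
  }
}
```

With something like this, the 999-out-of-1000 case becomes an application decision: the publisher can inspect `missing(messageId)` after a timeout and decide for itself whether that counts as success.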
Thank you for the fast reply.
I don't think the publisher has to know anything about how its messages get distributed by SocketCluster and how many subscribers receive it. Delivery guarantees are good enough as long as I can assume that the messages at least hit the server-side handler that can be modified to persist them. In your example I would care more about building a mechanism that would notify the publisher once the message it sent is persisted by the server.
Ideally, I don't want subscribers to deal with re-sending messages to publishers; I would prefer that this be resolved on the server side. It is perfectly fine if the publisher sends a bunch of messages and then goes offline. If the server managed to receive all those messages, it can take care of propagating them to subscribers and provide some delivery guarantees, such as letting the subscribers know that the stream of messages was interrupted by an scc-broker failure and that they have to reconnect and pull the messages they missed. Basically, I'm trying to understand whether SocketCluster is a good option for what I want to achieve, namely:
|
@dkruchinin Currently, when publishing, you can only verify that the message reached the server that you are directly connected to; beyond that, there is no guarantee that the message has successfully propagated through the rest of the cluster. There is some work being done right now which will offer delivery guarantees at the back-end/cluster level (e.g. it will retry failed deliveries which did not reach other nodes on the back end). That feature could potentially be a couple of months away from completion though. Ideally, when this feature is completed, you should be able to configure your SCC nodes to enable or disable delivery guarantees. |
@jondubois Thank you! Is there already a branch with a prototype I can keep an eye on? |
@dkruchinin There is a branch by @BenV which supports adding custom middleware functions to a regular SC instance's broker process; see BenV/sc-broker@43adbe4. This will allow us to do things like delay the completion of a publish operation until it has fully propagated throughout the rest of the cluster (it will allow us to retry publishing multiple times behind the scenes in case an scc-broker instance fails). The changes by @BenV are the first step. Then we'll also need to make changes to https://github.com/SocketCluster/scc-broker-client (this is the client which connects each scc-worker instance to the back-end scc-brokers) and to scc-broker https://github.com/SocketCluster/scc-broker. I guess the expected behaviour would be to retry sending a message if it doesn't reach the other instance. I think it should be configurable (can be enabled or disabled) because not all systems require delivery guarantees. |
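The retry behaviour described above, including the on/off switch, might look something like the following sketch. It is not the code from the referenced branch or from scc-broker-client: `publishWithRetry`, `RetryOptions`, and the exponential backoff policy are assumptions made purely for illustration, and `send` stands in for whatever call actually forwards the message to a back-end broker.

```typescript
// Hypothetical options; none of these names come from SocketCluster.
interface RetryOptions {
  enabled: boolean;     // delivery guarantees can be switched off entirely
  maxAttempts: number;  // total attempts, including the first one
  baseDelayMs: number;  // delay before the first retry; doubles each time
}

// Calls `send` until it succeeds or the attempts run out.
// Resolves with the number of attempts used; rejects with the last error.
async function publishWithRetry(
  send: () => Promise<void>,
  options: RetryOptions
): Promise<number> {
  const attempts = options.enabled ? Math.max(1, options.maxAttempts) : 1;
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      await send();
      return attempt;
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        const delay = options.baseDelayMs * Math.pow(2, attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

With `enabled: false` the function makes exactly one attempt, which mirrors the current fire-and-forget behaviour for systems that do not need delivery guarantees.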
I was testing different failure scenarios in SocketCluster and found that subscribers do not get notified properly when an scc-broker node is restarted while it is still streaming messages. Instead of terminating the clients' WebSocket connections related to the subscriptions handled by the restarted scc-broker instance, a worker just re-initializes the connection to the newly launched scc-broker, leaving the clients unaware of the problem. They end up getting only part of the messages sent by the publisher. Even if all the messages are persisted by the workers, the clients still don't know that something went wrong and can't ask the workers to send them the messages they've missed.
I created a simple test environment that spins up HAProxy as a load balancer, two default SocketCluster workers, two scc-brokers, an scc-state instance, and three SocketCluster clients (one publisher and two subscribers). You can find a detailed description of how to reproduce the issue here: https://github.com/dkruchinin/socketcluster-sandbox
Is it a bug or a feature? If it's the latter, is there a way to propagate the error back to the clients through a middleware somehow?
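Independently of how SocketCluster handles the broker restart, one application-level way for a subscriber to notice that it missed messages is to have every message carry a per-channel sequence number. The sketch below assumes exactly that: the `seq` values and the `GapDetector` class are hypothetical and nothing in SocketCluster adds them for you. It only detects gaps between messages that did arrive; a subscriber would still need some way to ask the workers for the missing ones.

```typescript
// Hypothetical client-side helper (not a SocketCluster API): detects holes
// in a stream of messages stamped with increasing sequence numbers.
class GapDetector {
  private lastSeq: number | null = null;

  // Returns the sequence numbers skipped between the highest one seen so far
  // and `seq`. Late or duplicate messages return an empty list.
  observe(seq: number): number[] {
    const missed: number[] = [];
    if (this.lastSeq !== null) {
      for (let s = this.lastSeq + 1; s < seq; s++) missed.push(s);
    }
    if (this.lastSeq === null || seq > this.lastSeq) this.lastSeq = seq;
    return missed;
  }
}
```

A subscriber would call `observe(message.seq)` in its channel handler and, whenever the result is non-empty, request those sequence numbers from wherever the workers persisted them.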