Subscribers do not get notification if a scc-broker node restarts while processing messages #435

dkruchinin · 2018-07-23T15:19:06Z

I was testing different failure scenarios in SocketCluster and found that subscribers are not notified properly when an scc-broker node is restarted while it's still streaming messages. Instead of terminating the clients' websocket connections related to the subscriptions handled by the restarted scc-broker instance, a worker just re-initializes its connection to the newly launched scc-broker, leaving the clients unaware of the problem. They end up getting only part of the messages sent by the publisher. Even if all the messages are persisted by the workers, clients still don't know that something went wrong and can't ask the workers to send them the messages they've missed.

I created a simple test environment that spawns an haproxy load balancer, two default SocketCluster workers, two scc-brokers, an scc-state instance and three SocketCluster clients (one publisher and two subscribers). You can find a detailed description of how to reproduce the issue here: https://github.com/dkruchinin/socketcluster-sandbox

Is it a bug or a feature? If it's the latter, is there a way to propagate the error back to the clients through a middleware somehow?

jondubois · 2018-07-23T17:58:23Z

@dkruchinin This is a feature ;p SocketCluster doesn't do delivery guarantees but you can implement your own mechanism on top.

Telling a client if a message failed to be delivered is tricky because there are multiple scenarios to account for. For example, if a channel has 1000 subscribers and only 999 of them receive the message successfully; should we tell the publisher client that the publish operation was a success or a failure?

The publish operation currently only tells you if the message reached the front-facing server; beyond that it doesn't track the delivery to individual subscribers.

If you want to track if specific subscribers have received specific messages, then you can create special receipt/acknowledgement channels which subscribers can use to inform publishers whenever they receive certain messages.

It would be nice to write a client-side plugin though which could implement this guaranteed pub/sub receipt/ack feature; it shouldn't be too difficult.
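The receipt/acknowledgement idea above could be sketched roughly as follows. This is only an illustration of the bookkeeping a publisher (or a plugin) might keep; `ReceiptTracker`, `Receipt` and the method names are hypothetical and are not part of the SocketCluster API, and the wiring to an actual acknowledgement channel is left out.

```typescript
// Hypothetical helper; none of these names come from the SocketCluster API.
type Receipt = { messageId: string; subscriberId: string };

class ReceiptTracker {
  // messageId -> subscribers that have not acknowledged it yet
  private pending = new Map<string, Set<string>>();

  // Record a published message and the subscribers expected to confirm it.
  expect(messageId: string, subscriberIds: string[]): void {
    this.pending.set(messageId, new Set(subscriberIds));
  }

  // Handle one receipt arriving on the acknowledgement channel.
  acknowledge(receipt: Receipt): void {
    const waiting = this.pending.get(receipt.messageId);
    if (waiting === undefined) return; // unknown or already fully confirmed
    waiting.delete(receipt.subscriberId);
    if (waiting.size === 0) this.pending.delete(receipt.messageId);
  }

  // Subscribers that still owe a receipt for this message.
  missing(messageId: string): string[] {
    const waiting = this.pending.get(messageId);
    return waiting === undefined ? [] : Array.from(waiting);
  }
}

const tracker = new ReceiptTracker();
tracker.expect("msg-1", ["alice", "bob"]);
tracker.acknowledge({ messageId: "msg-1", subscriberId: "alice" });
console.log(tracker.missing("msg-1")); // bob has not confirmed yet
```

Tracking the unconfirmed subscribers per message leaves the "999 out of 1000" decision to the application: it can decide for itself whether a non-empty `missing` list counts as a failure.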

dkruchinin · 2018-07-24T11:38:10Z

@jondubois

Thank you for the fast reply.

> For example, if a channel has 1000 subscribers and only 999 of them receive the message successfully; should we tell the publisher client that the publish operation was a success or a failure?

I don't think the publisher has to know anything about how its messages get distributed by SocketCluster and how many subscribers receive it. Delivery guarantees are good enough as long as I can assume that the messages at least hit the server-side handler that can be modified to persist them. In your example I would care more about building a mechanism that would notify the publisher once the message it sent is persisted by the server.
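A minimal sketch of that "acknowledge once persisted" idea, under stated assumptions: `DurableStore`, `PublishAck` and `handlePublish` are hypothetical names, the store is a stand-in for whatever persistence layer is used, and nothing here is taken from the SocketCluster API. The point is only the ordering: the publisher's acknowledgement depends on persistence, not on fan-out to subscribers.

```typescript
// Hypothetical types; the store and fan-out function stand in for real components.
type PublishAck = { ok: true; seq: number } | { ok: false; error: string };

interface DurableStore {
  // Returns the sequence number assigned to the message; throws on failure.
  save(payload: string): number;
}

function handlePublish(
  store: DurableStore,
  payload: string,
  fanOut: (seq: number, payload: string) => void
): PublishAck {
  let seq: number;
  try {
    seq = store.save(payload);
  } catch (err) {
    // Persistence failed, so the publisher gets a negative acknowledgement.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
  try {
    fanOut(seq, payload); // best-effort distribution to subscribers
  } catch {
    // A failed fan-out does not undo persistence; subscribers can pull the message later.
  }
  return { ok: true, seq };
}

// In-memory stand-in for a persistent store, for illustration only.
const saved: string[] = [];
const memoryStore: DurableStore = {
  save(payload) {
    saved.push(payload);
    return saved.length;
  },
};
console.log(handlePublish(memoryStore, "hello", () => {}));
```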

> If you want to track if specific subscribers have received specific messages, then you can create special receipt/acknowledgement channels which subscribers can use to inform publishers whenever they receive certain messages.

Ideally, I don't want subscribers to deal with re-sending messages to publishers; I would prefer this to be resolved on the server side. It is perfectly fine if the subscriber sends a bunch of messages and then goes offline. If the server managed to receive all those messages, it can take care of propagating them to subscribers and provide some delivery guarantees, like letting the subscribers know that the stream of messages was interrupted by an scc-broker failure and that they have to reconnect and pull the messages they've missed.

Basically, I'm trying to understand if SocketCluster is a good option for what I want to achieve, namely:

- the server acknowledges subscribers once their messages are written to a persistent store
- in case of a hardware failure affecting one of the core components of the cluster responsible for publishing messages to subscribers (like scc-broker or worker), all client connections directly or indirectly managed by the failed component are interrupted, so that they can reinitialize the connection and figure out what they missed.
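One way a client could "figure out what it missed" is per-channel sequence numbers plus a replay from the persistent store. The sketch below assumes the server stamps every persisted message with an increasing `seq`; `MessageLog` and `GapDetector` are hypothetical names, not SocketCluster features. Note that this is a different technique from interrupting the connection: a gap is only noticed when the next message arrives, so it complements a disconnect signal rather than replacing it.

```typescript
// Hypothetical sketch; assumes the server assigns increasing sequence numbers.
type StoredMessage = { seq: number; payload: string };

// Server side: append-only log standing in for a persistent store.
class MessageLog {
  private entries: StoredMessage[] = [];
  private nextSeq = 1;

  append(payload: string): StoredMessage {
    const stored = { seq: this.nextSeq++, payload };
    this.entries.push(stored);
    return stored;
  }

  // Every message with a sequence number greater than `afterSeq`.
  since(afterSeq: number): StoredMessage[] {
    return this.entries.filter((m) => m.seq > afterSeq);
  }
}

// Client side: remembers the last sequence number seen and reports gaps.
class GapDetector {
  private lastSeq = 0;

  // Returns the range of missed sequence numbers, or null if nothing was skipped.
  observe(seq: number): { from: number; to: number } | null {
    const expected = this.lastSeq + 1;
    if (seq < expected) return null; // duplicate or replayed message
    this.lastSeq = seq;
    return seq > expected ? { from: expected, to: seq - 1 } : null;
  }
}

const log = new MessageLog();
for (const text of ["a", "b", "c", "d", "e"]) log.append(text);

const detector = new GapDetector();
detector.observe(1);
detector.observe(2);
// Messages 3 and 4 are lost (for example during a broker restart); 5 arrives next.
const gap = detector.observe(5);
if (gap !== null) {
  const missed = log.since(gap.from - 1).filter((m) => m.seq <= gap.to);
  console.log(missed.map((m) => m.payload)); // payloads of messages 3 and 4
}
```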

jondubois · 2018-07-24T13:13:27Z

@dkruchinin Right now, when publishing, you can only verify that the message reached the server that you are directly connected to; beyond that, there is no guarantee that the message has successfully propagated through the rest of the cluster.

There is some work being done right now which will offer delivery guarantee at the back-end/cluster level (e.g. it will retry failed deliveries which did not reach other nodes on the back end) - That feature could potentially be a couple of months away from completion though.

Ideally, when this feature is completed, you should be able to configure your SCC nodes to enable or disable delivery guarantees.

dkruchinin · 2018-07-24T14:18:35Z

@jondubois Thank you! Is there already a branch with a prototype I can keep an eye on?

jondubois · 2018-07-24T18:30:50Z

@dkruchinin There is a branch by @BenV which supports adding custom middleware functions to a regular SC instance's broker process; see BenV/sc-broker@43adbe4. This will allow us to do things like delay the completion of a publish operation until it has fully propagated throughout the rest of the cluster (it will allow us to retry publishing multiple times behind the scenes in case an scc-broker instance fails).

The changes by @BenV are the first step. Then we'll need to also make changes to https://github.com/SocketCluster/scc-broker-client (this is the client which connects each scc-worker instance to back end scc-brokers) and also to scc-broker https://github.com/SocketCluster/scc-broker - I guess the expected behaviour would be to retry sending a message if it doesn't reach the other instance.

I think it should be configurable (can be enabled or disabled) because not all systems require delivery guarantees.
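The retry behaviour and the on/off switch described above might look roughly like this. It is a simplified, synchronous sketch: `deliver`, `SendFn` and `DeliveryOptions` are hypothetical names, not options of scc-broker or scc-broker-client, and a real implementation would wait for an asynchronous acknowledgement and pause between attempts.

```typescript
// Hypothetical sketch; not the scc-broker-client API.
type DeliveryOptions = { guaranteed: boolean; maxAttempts: number };

// Returns true when the remote instance confirmed that it received the payload.
type SendFn = (payload: string) => boolean;

// Returns the number of attempts used; throws if every attempt failed.
function deliver(send: SendFn, payload: string, options: DeliveryOptions): number {
  // With guarantees disabled the message is sent exactly once, as fire-and-forget.
  const attempts = options.guaranteed ? Math.max(1, options.maxAttempts) : 1;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    if (send(payload)) return attempt;
  }
  throw new Error(`delivery failed after ${attempts} attempt(s)`);
}

// A sender that fails twice (say, while a broker restarts) and then succeeds.
let calls = 0;
const flaky: SendFn = () => ++calls >= 3;
console.log(deliver(flaky, "hello", { guaranteed: true, maxAttempts: 5 }));
```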

dkruchinin changed the title ~~Subscribers do not get notification if a scc-broker node restarts while streaming messages~~ Subscribers do not get notification if a scc-broker node restarts while processing messages Jul 23, 2018

dkruchinin closed this as completed Jul 25, 2018
