Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Shared Subscriptions when instance in HA mode #28

Closed
Tracked by #2156
knolleary opened this issue Jun 1, 2023 · 5 comments · Fixed by #29
Closed
Tracked by #2156

Add support for Shared Subscriptions when instance in HA mode #28

knolleary opened this issue Jun 1, 2023 · 5 comments · Fixed by #29
Labels
task A piece of work that isn't necessarily tied to a specific Epic or Story.
Milestone

Comments

@knolleary
Copy link
Member

knolleary commented Jun 1, 2023

Description

Part of FlowFuse/flowfuse#2156

When we introduce HA mode in FlowForge, these nodes need to use shared subscriptions so that messages sent to an instance are load balanced between the replicas.

In summary, the subscription topic needs to be prefixed with $share/<instanceId>/.

The MQTT auth server will need to be updated to allow these subscriptions.

We could opt to just move all subscriptions over, without caring about whether HA mode is enabled or not. Existing FF installs are pinned to an older version of the project nodes, so I don't think we have a danger of newer project nodes being run against older FF platforms that don't allow shared subs in the broker auth scheme. Everything should just work.

There is one major issue that needs overcoming. The Call nodes use responseTopics to get a reply back. As it stands, we have no way to direct a response back to a single replica.

I think the general shape of a fix for that will be:

  • The node generates a random identifier when it is instantiated. This is included in the response topic (again, auth server needs to be updated to allow this).
  • The node's subscription to the responseTopic should not use a shared subscription.

@Steve-Mcl does that sound plausible?

Epic/Story

No response

@knolleary knolleary added the task A piece of work that isn't necessarily tied to a specific Epic or Story. label Jun 1, 2023
@knolleary knolleary added this to the 1.8 milestone Jun 1, 2023
@Steve-Mcl
Copy link
Contributor

There is one major issue that needs overcoming. The Call nodes use responseTopics to get a reply back. As it stands, we have no way to direct a response back to a single replica.

I think the general shape of a fix for that will be:

  • The node generates a random identifier when it is instantiated. This is included in the response topic (again, auth server needs to be updated to allow this).
  • The node's subscription to the responseTopic should not use a shared subscription.

Agreed. Looking at the acls, it suggests we already permit ff/v1/<team>/p/<project>/res/+/# so in theory, we could add this "random id" after res e.g. /res/<ha-instance>/#

However, if we increment the const TOPIC_VERSION to "v2" we could be more specific e.g. ff/v2/<team>/p/<project>/<ha-instance>/res/+/#

Otherwise, agree with direction.

@knolleary
Copy link
Member Author

To keep things simple, I was going to go with ff/v1/<team>/p/<project>/res-<id>/+/# - not introduce another level in the hierarchy for this - whilst also keeping support for ff/v1/<team>/p/<project>/res/+/# for existing instances.

@Steve-Mcl
Copy link
Contributor

Steve-Mcl commented Jun 5, 2023

Yeah, that works too - however...

I have just been thinking this through. For the link call -> outwards, we will need to use shared$ subs (so that it load balances) but for the response, I believe we can actually leave response topic as it is. The reason this will work is due to there being correlationData (which is essentially msg.eventId) so in theory, it should just work?

@knolleary
Copy link
Member Author

You mean send the response to all replicas and rely on the correlationData for the 'right' replica to process the message? That isn't necessarily ideal as we'll be sending messages knowing they will be ignored.

I have got the response topic approach working - it does require including the 'id' of the response topic (eg res-abcd) in the messageEvent we send so the returning node can construct the responseTopic properly.

@knolleary
Copy link
Member Author

We've just spotted a significant flaw to the shared subscription approach.

To keep things simple, this PR moves all subs to use shared subs. However, if a flow is deployed to a device, that device will now also use a shared subscription. This will change the behaviour from a messaging being received by all devices running those flows, to it being distributed between all devices.

In short, we need to modify the shared sub logic to only kick-in if we detect we're running in an HA enabled cloud instance. Devices should not use shared subs at all.

@knolleary knolleary linked a pull request Jun 7, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task A piece of work that isn't necessarily tied to a specific Epic or Story.
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants