
High Availability & Scaling #1678

Closed
joepavitt opened this issue Feb 7, 2023 · 12 comments
Labels
  • epic - A significant feature or piece of work that doesn't easily fit into a single release
  • headline - Something to highlight in the release
  • needs-triage - Needs looking at to decide what to do
  • priority:high - High Priority

Comments

@joepavitt
Contributor

joepavitt commented Feb 7, 2023

Description

A regular request from customers: offer scaling and high availability of Node-RED instances on FlowForge.

First steps would require multiple instances running the same flows, which crosses over with #1492

Significant technical challenges around state management across instances

@joepavitt joepavitt added size:XXL - 13 Sizing estimation point epic A significant feature or piece of work that doesn't easily fit into a single release priority:high High Priority labels Feb 7, 2023
@joepavitt
Contributor Author

In a call with a customer, they mentioned they had used the following for building their own Node-RED "high availability & scaling"

Although it's unlikely to be useful given our architecture, I thought it worth at least sharing.

@hardillb
Contributor

hardillb commented Mar 1, 2023

Thoughts on horizontally scaling Node-RED in FlowForge environment

  • Kubernetes supports horizontal scaling of pods (as part of a ReplicaSet managed by a Deployment).
    • Neither Docker nor LocalFS provides any native support for this type of scaling
    • This is N live pods for a given project, no way to run in live/hot spare mode
    • Round robin routing between pods for incoming requests, no session affinity
      • No easy way to run the editor against more than one replica (requests spread across all instances)
  • It does not support a way to differentiate between the replicas
  • This means that each pod runs exactly the same flow with all the same environment variables

We also need to think about logging from the different replicas.
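
As a rough sketch of what replica-aware logging might look like (assuming the usual Kubernetes behaviour of setting HOSTNAME to the pod name; the `log` helper is purely illustrative):

```javascript
// Hypothetical sketch: tag every log line with the replica/pod name so that
// aggregated logs from N identical pods can still be told apart.
const os = require("os");

// On Kubernetes the HOSTNAME env var is set to the pod name; fall back to the
// OS hostname when running outside a cluster.
const replicaId = process.env.HOSTNAME || os.hostname();

function log(message) {
  console.log(`[${replicaId}] ${message}`);
}

log("flow started");
log("handled incoming HTTP request");
```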

@hardillb
Contributor

hardillb commented Mar 6, 2023

Any flow that ends up being hosted on multiple pods will need to be carefully written to ensure that it is either totally stateless or that ALL state is stored in a backend system.

We will need to consider how this would work with persistent context, especially with the cache that was added to support synchronous access.
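
To illustrate the "all state in a backend system" option, a minimal sketch using Redis via the `redis` npm package (the REDIS_URL variable and the `orders:processed` key are just placeholders):

```javascript
// Hypothetical sketch: keep shared flow state in an external store (Redis)
// instead of Node-RED context, so any replica can handle any message.
const { createClient } = require("redis");

async function main() {
  const client = createClient({
    url: process.env.REDIS_URL || "redis://localhost:6379",
  });
  await client.connect();

  // Where a single instance might call context.set("count", n) in a Function
  // node, every replica instead increments the same shared key:
  const count = await client.incr("orders:processed");
  console.log(`orders processed across all replicas: ${count}`);

  await client.quit();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```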

@knolleary
Member

Continuing design work and planning during 1.6 with a goal to have something clearly defined for delivery in 1.7.

@MarianRaphael
Contributor

MarianRaphael commented Mar 28, 2023

From my perspective, a first version of High Availability (HA) and Scaling (let's call it HA Instances) comes with certain restrictions:

  • An HA Instance doesn't have an editor (as is currently the case for devices). The user needs to develop the flows on another instance type and then "push" the results to the HA instance (requires Multiple FlowForge Hosted Instances per Application #1689 and Staged Deployments #1492).

  • However, since HA and scaling can only be achieved if the flows (the entire snapshot) are stateless or store all state in a separate storage layer, we have to inform the user about this fact and provide them with examples of how to achieve it. It could be an information pop-up before ordering, explaining the necessity.

  • VPN (Custom VPN support for FlowForge applications #1570) is not possible with HA instances.

Later, when we see there is a certain customer need, we can work on how to resolve identified restrictions.

(In my opinion, points one and three could potentially be resolved with sufficient time and testing. However, for point two, I'm currently lacking a feasible approach, especially in terms of scaling. In this case, I believe an approach for automatic failover would be more appropriate.)

Related Feature

#1920

User Story

As a customer of FlowForge,
I would like the option of using high-availability, automatically scaling instances.
This will allow me to run business-critical processes within Node-RED and ensure that they are always available and can handle increasing workloads.

Assumption

  • Flows for business-critical processes will not be developed in the target system / instance.
  • It is assumed that we can effectively communicate the significance of stateless functions to the user and ensure that they understand the concept and use it correctly.

@MarianRaphael MarianRaphael added this to the 1.7 milestone Apr 5, 2023
@MarianRaphael MarianRaphael added the headline Something to highlight in the release label Apr 5, 2023
@MarianRaphael MarianRaphael removed this from the 1.7 milestone Apr 5, 2023
@MarianRaphael MarianRaphael added the needs-triage Needs looking at to decide what to do label Apr 5, 2023
@MarianRaphael MarianRaphael added this to the 1.7 milestone Apr 13, 2023
@MarianRaphael MarianRaphael removed the size:XXL - 13 Sizing estimation point label May 9, 2023
@MarianRaphael MarianRaphael modified the milestones: 1.7, 1.8 May 11, 2023
@MarianRaphael
Contributor

Info about: Load Balancing with Shared Subscriptions - MQTT Client at HiveMQ:
https://www.hivemq.com/blog/mqtt-client-load-balancing-with-shared-subscriptions/
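
For illustration, a minimal sketch of that pattern using the `mqtt` npm package (broker URL, group name and topic are placeholders): with a shared subscription the broker spreads messages across the replicas instead of delivering every message to every one of them.

```javascript
// Hypothetical sketch: each replica subscribes to the same shared
// subscription, so the broker load-balances messages across the group.
const mqtt = require("mqtt");

const client = mqtt.connect(process.env.MQTT_URL || "mqtt://localhost:1883", {
  protocolVersion: 5, // shared subscriptions are an MQTT 5 feature
});

client.on("connect", () => {
  // "$share/<group>/<topic>": all subscribers in group "ha-group" share the load.
  client.subscribe("$share/ha-group/sensors/#");
});

client.on("message", (topic, payload) => {
  console.log(`handled ${topic}: ${payload.toString()}`);
});
```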

@DasGermanPhysicist

Regarding scaling: as a manufacturing customer, I want to read out thousands of data tags, e.g. from one or multiple OPC-UA servers, Modbus controllers, or an MQTT broker, in order to support optimization of my entire fleet of assets or my processes.

When creating a Node-RED flow, I want to configure the logic once, but configure it for thousands of input data tags, potentially across multiple edge deployment sites, and hence across multiple deployments. E.g. having something similar to an app.yaml per deployment would help me manage my data at scale.

@MarianRaphael
Contributor

@DasGermanPhysicist thank you for your insightful commentary regarding scaling in the realm of manufacturing.

On this note, I'd like to inquire further about your views on the concept of snapshots. A snapshot, as you may know, is a point-in-time backup of a Node-RED instance, capturing:

  • The flows
  • Credentials
  • Environment variables
  • NPM packages, with locked versions
  • Runtime settings

These snapshots can also be pushed to devices connected to the instance. (More about this here: Snapshots Documentation). Is there any aspect of this functionality that you think could be enhanced to provide a more streamlined user experience?

I genuinely value your opinion and would love to get more insights on your experience with FlowForge. Please feel free to connect with me on LinkedIn to share your experiences and thoughts. Understanding your needs and requirements better would be of utmost importance for us to deliver an improved and tailored experience.

Looking forward to hearing from you.

@joepavitt
Contributor Author

Hi @DasGermanPhysicist, I'm also curious to understand whether Environment Variables could act here as your "app.yml". These are configurable for each Instance/Device of Node-RED within FlowForge. This enables a single build of a flow, then deployment out to each device, customised by their respective Environment Variables.

Documentation: https://flowforge.com/docs/user/envvar/#environment-variables
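
For example, a Function node could derive device-specific configuration from an environment variable set per Instance/Device (a minimal sketch; DEVICE_NAME is an illustrative variable, not a built-in):

```javascript
// Hypothetical Function node body: the flow is built once, and each
// instance/device customises it via its own environment variables.
const deviceName = env.get("DEVICE_NAME") || "unknown-device";

// Build the tag/topic name from the device-specific value.
msg.topic = `${deviceName}_temperature`;
msg.payload = { device: deviceName, readAt: Date.now() };

return msg;
```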

@DasGermanPhysicist

@joepavitt env vars would absolutely suffice, if they can be changed at scale. E.g. how would one change the env vars on 1000 devices according to a pattern (e.g. opc-ua tag names all follow the pattern <device_name>_piston_temp etc.)?

@DasGermanPhysicist

@MarianRaphael I have not yet played around with FlowForge snapshots. However, for me it would be important to have deployment groups, e.g. a test/dev group and a production group, and I would need to target these groups with different snapshots.

I did not know that env vars were captured on snapshot / application level. I assumed that different devices would have different env vars, as they describe the specific environment the device is in (and the flow will be in once it is deployed to the specific device).

@MarianRaphael MarianRaphael removed this from the 1.8 milestone Jun 9, 2023
@MarianRaphael MarianRaphael added this to the 1.9 milestone Jun 9, 2023
@MarianRaphael
Contributor

MVP for HA in the 1.8 release (see follow-up issues)
