
High Availability & Scaling #1678

Closed
joepavitt opened this issue Feb 7, 2023 · 12 comments
Labels
  • epic - A significant feature or piece of work that doesn't easily fit into a single release
  • headline - Something to highlight in the release
  • needs-triage - Needs looking at to decide what to do
  • priority:high - High Priority

Comments

@joepavitt
Contributor

joepavitt commented Feb 7, 2023

Description

A regular request from customers: offer scaling and high availability of Node-RED instances on FlowForge.

First steps would require multiple instances running the same flows, which crosses over with #1492

Significant technical challenges around state management across instances

@joepavitt joepavitt added size:XXL - 13 Sizing estimation point epic A significant feature or piece of work that doesn't easily fit into a single release priority:high High Priority labels Feb 7, 2023
@joepavitt
Contributor Author

In a call with a customer, they mentioned they had used the following for building their own Node-RED "high availability & scaling"

Although it's unlikely to be useful given our architecture, I thought it worth at least sharing.

@hardillb
Contributor

hardillb commented Mar 1, 2023

Thoughts on horizontally scaling Node-RED in FlowForge environment

  • Kubernetes supports horizontal scaling of pods (as part of a ReplicaSet managed by a Deployment).
    • Neither Docker nor LocalFS provides any native support for this type of scaling
    • This is N live pods for a given project, no way to run in live/hot spare mode
    • Round robin routing between pods for incoming requests, no session affinity
      • No easy way to run the editor against more than one replica (requests spread across all instances)
  • It does not support a way to differentiate between the replicas
  • This means that each pod runs exactly the same flow with all the same environment variables

We also need to think about logging from the different replicas.
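
As a rough sketch of what replica-aware logging might look like (assuming the usual Kubernetes behaviour of setting HOSTNAME to the pod name; the `log` helper is purely illustrative):

```javascript
// Hypothetical sketch: tag every log line with the replica/pod name so that
// aggregated logs from N identical pods can still be told apart.
const os = require("os");

// On Kubernetes the HOSTNAME env var is set to the pod name; fall back to the
// OS hostname when running outside a cluster.
const replicaId = process.env.HOSTNAME || os.hostname();

function log(message) {
  console.log(`[${replicaId}] ${message}`);
}

log("flow started");
log("handled incoming HTTP request");
```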

@hardillb
Contributor

hardillb commented Mar 6, 2023

Any flow that ends up being hosted on multiple pods will need to be carefully written to ensure that it is either totally stateless or that ALL state is stored in a backend system.

We will need to consider how this would work with persistent context, especially with the cache that was added to support synchronous access.
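
To illustrate the "all state in a backend system" option, a minimal sketch using Redis via the `redis` npm package (the REDIS_URL variable and the `orders:processed` key are just placeholders):

```javascript
// Hypothetical sketch: keep shared flow state in an external store (Redis)
// instead of Node-RED context, so any replica can handle any message.
const { createClient } = require("redis");

async function main() {
  const client = createClient({
    url: process.env.REDIS_URL || "redis://localhost:6379",
  });
  await client.connect();

  // Where a single instance might call context.set("count", n) in a Function
  // node, every replica instead increments the same shared key:
  const count = await client.incr("orders:processed");
  console.log(`orders processed across all replicas: ${count}`);

  await client.quit();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```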

@knolleary
Member

Continuing design work and planning during 1.6 with a goal to have something clearly defined for delivery in 1.7.

@MarianRaphael
Contributor

MarianRaphael commented Mar 28, 2023

From my perspective, a first version of High Availability (HA) and Scaling (let's call it HA Instances) comes with certain restrictions:

  • An HA Instance doesn't have an editor (as is currently the case for devices). The user needs to develop the flows on another instance type and then "push" the results to the HA instance (requires Multiple FlowForge Hosted Instances per Application #1689 and Staged Deployments #1492).

  • However, since HA and scaling can only be achieved if the flows (the entire snapshot) are stateless or store all state in a separate storage layer, we have to inform the user about this fact and provide them with examples of how to achieve it. It could be an information pop-up before ordering, explaining the necessity.

  • VPN (Custom VPN support for FlowForge applications #1570) is not possible with HA instances.

Later, when we see there is a certain customer need, we can work on how to resolve identified restrictions.

(In my opinion, points one and three could potentially be resolved with sufficient time and testing. However, for point two, I'm currently lacking a feasible approach, especially in terms of scaling. In this case, I believe an approach for automatic failover would be more appropriate.)

Related Feature

#1920

User Story

As a customer of FlowForge,
I would like the option of using high-availability, automatically scaling instances.
This will allow me to run business-critical processes within Node-RED and ensure that they are always available and can handle increasing workloads.

Assumption

  • Flows for business-critical processes will not be developed in the target system / instance.
  • It is assumed that we can effectively communicate the significance of stateless functions to the user and ensure that they understand the concept and use it correctly.

@MarianRaphael MarianRaphael added this to the 1.7 milestone Apr 5, 2023
@MarianRaphael MarianRaphael added the headline Something to highlight in the release label Apr 5, 2023
@MarianRaphael MarianRaphael removed this from the 1.7 milestone Apr 5, 2023
@MarianRaphael MarianRaphael added the needs-triage Needs looking at to decide what to do label Apr 5, 2023
@MarianRaphael MarianRaphael added this to the 1.7 milestone Apr 13, 2023
@MarianRaphael MarianRaphael removed the size:XXL - 13 Sizing estimation point label May 9, 2023
@MarianRaphael MarianRaphael modified the milestones: 1.7, 1.8 May 11, 2023
@MarianRaphael
Contributor

Info about: Load Balancing with Shared Subscriptions - MQTT Client at HiveMQ:
https://www.hivemq.com/blog/mqtt-client-load-balancing-with-shared-subscriptions/
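
For illustration, a minimal sketch of that pattern using the `mqtt` npm package (broker URL, group name and topic are placeholders): with a shared subscription the broker spreads messages across the replicas instead of delivering every message to every one of them.

```javascript
// Hypothetical sketch: each replica subscribes to the same shared
// subscription, so the broker load-balances messages across the group.
const mqtt = require("mqtt");

const client = mqtt.connect(process.env.MQTT_URL || "mqtt://localhost:1883", {
  protocolVersion: 5, // shared subscriptions are an MQTT 5 feature
});

client.on("connect", () => {
  // "$share/<group>/<topic>": all subscribers in group "ha-group" share the load.
  client.subscribe("$share/ha-group/sensors/#");
});

client.on("message", (topic, payload) => {
  console.log(`handled ${topic}: ${payload.toString()}`);
});
```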

@DasGermanPhysicist

Regarding scaling: as a manufacturing customer, I want to read out thousands of data tags, e.g. from one or multiple OPC-UA servers, Modbus controllers, or an MQTT broker, in order to support optimization of my entire fleet of assets or my processes.

When creating a Node-RED flow, I want to configure the logic once, but configure it for thousands of input data tags, potentially across multiple edge deployment sites, and hence across multiple deployments. E.g. having something similar to an app.yaml per deployment would help me manage my data at scale.

@MarianRaphael
Contributor

@DasGermanPhysicist thank you for your insightful commentary regarding scaling in the realm of manufacturing.

On this note, I'd like to inquire further about your views on the concept of snapshots. A snapshot, as you may know, is a point-in-time backup of a Node-RED instance, capturing:

  • The flows
  • Credentials
  • Environment variables
  • NPM packages, with locked versions
  • Runtime settings

These snapshots can also be pushed to devices connected to the instance. (More about this here: Snapshots Documentation). Is there any aspect of this functionality that you think could be enhanced to provide a more streamlined user experience?

I genuinely value your opinion and would love to get more insights on your experience with FlowForge. Please feel free to connect with me on LinkedIn to share your experiences and thoughts. Understanding your needs and requirements better would be of utmost importance for us to deliver an improved and tailored experience.

Looking forward to hearing from you.

@joepavitt
Contributor Author

Hi @DasGermanPhysicist, I'm also curious to understand whether Environment Variables could act here as your "app.yml". These are configurable for each Instance/Device of Node-RED within FlowForge. This enables a single build of a flow, then deployment out to each device, customised by their respective Environment Variables.

Documentation: https://flowforge.com/docs/user/envvar/#environment-variables
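
For example, a Function node could derive device-specific configuration from an environment variable set per Instance/Device (a minimal sketch; DEVICE_NAME is an illustrative variable, not a built-in):

```javascript
// Hypothetical Function node body: the flow is built once, and each
// instance/device customises it via its own environment variables.
const deviceName = env.get("DEVICE_NAME") || "unknown-device";

// Build the tag/topic name from the device-specific value.
msg.topic = `${deviceName}_temperature`;
msg.payload = { device: deviceName, readAt: Date.now() };

return msg;
```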

@DasGermanPhysicist

@joepavitt env vars would absolutely suffice, if they can be changed at scale. E.g. how would one change the env vars on 1000 devices according to a pattern (e.g. opc-ua tag names all follow the pattern <device_name>_piston_temp etc.)?

@DasGermanPhysicist

@MarianRaphael I have not yet played around with FlowForge snapshots. However, for me it would be important to have deployment groups, e.g. a test/dev group and a production group, and I would need to target these groups with different snapshots.

I did not know that env vars were captured on snapshot / application level. I assumed that different devices would have different env vars, as they describe the specific environment the device is in (and the flow will be in once it is deployed to the specific device).

@MarianRaphael MarianRaphael removed this from the 1.8 milestone Jun 9, 2023
@MarianRaphael MarianRaphael added this to the 1.9 milestone Jun 9, 2023
@MarianRaphael
Contributor

MVP for HA in the 1.8 release (see follow-up issues)
