Instance Load Balancing #2156
Comments
We need to decide how HA (in whatever form) is exposed to the user as a choice, and how that relates to billing. Whilst there are a lot of business decisions wrapped up in that which could evolve, it has a specific impact on how we choose to implement it from the start. I want to make sure our initial iteration is pointed in the right direction.
In our current model, we have three Instance Types: S, M and L. Each Instance Type has a Stripe Product/Price associated with it.
At its simplest, HA could be an on/off choice. When turned on, the user gets two replicas of the instance. That gives them double the capacity, but that might not be enough for their requirements. What if they need 3 replicas? (Save that thought for later.)
In terms of how we model this in Stripe, we have two options. Let's consider a Team with 5 small instances, 2 of which are HA enabled.
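As a rough illustration of that example team, the two ways of modelling HA in Stripe (an add-on product versus a dedicated HA instance type) could produce invoice line items along these lines. The product names and the tallying helper are hypothetical and are not taken from FlowForge or Stripe:

```typescript
// Hypothetical model: product names are illustrative only.
interface Instance {
  type: "small" | "medium" | "large";
  ha: boolean;
}

interface LineItem {
  product: string;
  quantity: number;
}

// Count how many times each product name occurs, preserving first-seen order.
function tally(products: string[]): LineItem[] {
  const counts: Record<string, number> = {};
  for (const product of products) {
    counts[product] = (counts[product] ?? 0) + 1;
  }
  return Object.keys(counts).map((product) => ({ product, quantity: counts[product] as number }));
}

// Add-on approach: every instance bills its base product, and HA adds a second item.
function addOnLineItems(instances: Instance[]): LineItem[] {
  const products: string[] = [];
  for (const instance of instances) {
    products.push(instance.type);
    if (instance.ha) {
      products.push(`${instance.type}-ha-addon`);
    }
  }
  return tally(products);
}

// InstanceType approach: an HA instance bills a distinct product instead of the base one.
function instanceTypeLineItems(instances: Instance[]): LineItem[] {
  return tally(instances.map((instance) => (instance.ha ? `${instance.type}-ha` : instance.type)));
}

// The example team: 5 small instances, 2 of which are HA enabled.
const team: Instance[] = [
  { type: "small", ha: true },
  { type: "small", ha: true },
  { type: "small", ha: false },
  { type: "small", ha: false },
  { type: "small", ha: false },
];

console.log(addOnLineItems(team));        // small x 5, small-ha-addon x 2
console.log(instanceTypeLineItems(team)); // small-ha x 2, small x 3
```

With add-ons, each new feature adds one product per instance type; with dedicated instance types, each new feature multiplies the number of products.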
One future problem with the InstanceTypes approach (opt 2) is that any further features could leave us managing M×N InstanceTypes to cover all combinations. Add-ons (opt 1) are a cleaner way to manage it in my view.
Open questions
|
As discussed in the Product Meeting:
|
Getting some implementation notes out of my head and onto virtual paper.
Feature Flag
As this feature will only be available under certain conditions, we will introduce an ha feature flag that is enabled if:
In the future, this feature will be restricted to particular team tiers. That will require team-level feature flags, something we don't have today but which will need to be introduced as part of the work to reintroduce team tiers.
UX
Creating Instance with HA
When creating a new instance, a toggle will be added below the 'select instance type' option to enable/disable HA mode. The UI should clearly show the cost associated with enabling the feature. For MVP this will be at no cost, but we need a means to have a cost associated with it that is dependent on the InstanceType selected.
Viewing Instance Details
The instance overview needs to indicate whether HA is enabled on the instance.
Enabling/Disabling HA on an existing Instance
Out of scope for MVP.
Accessing Logs
We will now have logs from multiple replicas. In the UI, we need a way to show logs from individual replicas.
Disabling the editor
If HA is enabled on an instance, we will disable the editor. The only way to update the flows will be via a pipeline deploy from another instance. The instance overview needs to help the user understand that, and explain why the editor button is unavailable.
API Changes
A new flag can be added to the create-instance API endpoint to indicate whether HA mode is to be enabled. The server will validate whether HA can be enabled for the given request. The instance view will include the flag so the UI knows the state (but only if an EE license is applied).
DB Changes
At this stage, none. The HA flag can be stored in the existing ProjectSettings blob.
Billing configuration
This is potentially out of scope if, for MVP, there's no cost associated with enabling HA. However, that will be a very short-term position, so it is worth having a sense of how billing will be applied. From previous discussions, the decision is for HA to be treated as an add-on item on the invoice, rather than using an alternative instance product.
In the current model, we have a
The most straightforward implementation will be to add HA price/product options in the ProjectType properties. This won't easily scale if we want different prices for different TeamTiers. That is a future issue we already have to tackle, so this won't make it any worse to deal with than it already is.
Persistent Context
We added a caching layer to our persistent context implementation for two reasons:
If HA is enabled, we will need to disable the caching layer.
ProjectNodes MQTT configuration
To have multiple instances connected to the broker, they will need to use a shared subscription for the project nodes, so that messages are distributed between them. |
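The shared subscription mentioned above could look roughly like this. MQTT v5 defines shared subscriptions through topic filters of the form $share/<ShareName>/<filter>, and the broker has to support them. The group name and topic below are hypothetical and are not FlowForge's actual topic layout:

```typescript
// Build an MQTT v5 shared-subscription topic filter: $share/<ShareName>/<filter>.
// The share name must be non-empty and must not contain "/", "+" or "#".
function sharedSubscription(shareName: string, topicFilter: string): string {
  if (shareName.length === 0 || /[\/+#]/.test(shareName)) {
    throw new Error(`invalid share name: "${shareName}"`);
  }
  return `$share/${shareName}/${topicFilter}`;
}

// Assumption: all replicas of one instance use the same share name, so the broker
// delivers each matching message to only one of them.
const replicaFilter = sharedSubscription("instance-abc123", "example/project/in/#");
console.log(replicaFilter); // $share/instance-abc123/example/project/in/#
```

Each replica would subscribe with the same filter; a replica that subscribes to the plain topic instead would receive every message in addition to the shared group.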
ProjectSettings
To indicate the HA state of an Instance, we will store an object in ProjectSettings under the key
|
For anyone just reading the notifications of comments... I've added task lists to the main description of this epic to track individual tasks. |
For the sake of local development/testing, I'm going to make the stub driver appear to offer HA capabilities. This removes the need for everyone to have a local k8s development environment before they can contribute to this work (although longer term, that remains something we need to enable). |
Branch for k8s driver changes: |
Regarding Persistent Context: we need to disable synchronous access if HA is enabled, as we need to ensure access is synced with the backing store. However, we run into FlowFuse/nr-persistent-context#16 when the store is async-only. This has been fixed upstream in Node-RED 3.1. We need to work out the consequences of this: do we document a restriction on persistent context with this iteration of scaling, or make Node-RED 3.1 (which is still in beta) a prerequisite? |
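To illustrate why a per-replica cache is a problem once there is more than one replica, here is a minimal sketch. BackingStore and ReplicaContext are hypothetical stand-ins and do not reflect the nr-persistent-context API:

```typescript
// Hypothetical shared store that all replicas talk to.
class BackingStore {
  private data: Record<string, string> = {};
  read(key: string): string | undefined {
    return this.data[key];
  }
  write(key: string, value: string): void {
    this.data[key] = value;
  }
}

// Hypothetical per-replica context with an optional local cache.
class ReplicaContext {
  private cache: Record<string, string> = {};
  private store: BackingStore;
  private useCache: boolean;

  constructor(store: BackingStore, useCache: boolean) {
    this.store = store;
    this.useCache = useCache;
  }

  set(key: string, value: string): void {
    if (this.useCache) {
      this.cache[key] = value;
    }
    this.store.write(key, value);
  }

  get(key: string): string | undefined {
    if (this.useCache && key in this.cache) {
      return this.cache[key]; // served locally, never checks the store again
    }
    const value = this.store.read(key);
    if (this.useCache && value !== undefined) {
      this.cache[key] = value;
    }
    return value;
  }
}

// Replica A writes twice; replica B reads before and after the second write.
function readAfterRemoteWrite(useCache: boolean): string | undefined {
  const store = new BackingStore();
  const replicaA = new ReplicaContext(store, useCache);
  const replicaB = new ReplicaContext(store, useCache);
  replicaA.set("count", "1");
  replicaB.get("count"); // with caching on, B now holds "1" locally
  replicaA.set("count", "2");
  return replicaB.get("count");
}

console.log(readAfterRemoteWrite(true));  // "1" (stale)
console.log(readAfterRemoteWrite(false)); // "2"
```

With the cache on, replica B keeps returning the value it first saw; reading through to the store on every access avoids that at the cost of making each read a store round trip.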
I have raised follow-up items for the HA tasks known at this time - all linked in the task lists above. |
Epic
#1678
Description
Allow horizontal scaling of n Node-RED instances in a FlowForge environment, and distribute incoming (HTTP) requests between those instances in a round-robin manner.
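Round-robin distribution simply cycles through the available replicas in a fixed order. A minimal sketch follows; the class and replica names are hypothetical, and in practice the balancing would more likely sit in the platform's routing layer than in application code:

```typescript
// Hypothetical round-robin selector over a fixed list of targets.
class RoundRobin<T> {
  private next = 0;
  private readonly targets: T[];

  constructor(targets: T[]) {
    if (targets.length === 0) {
      throw new Error("at least one target is required");
    }
    this.targets = targets.slice(); // copy so later changes by the caller have no effect
  }

  // Return the next target and advance, wrapping around at the end of the list.
  pick(): T {
    const target = this.targets[this.next] as T;
    this.next = (this.next + 1) % this.targets.length;
    return target;
  }
}

const balancer = new RoundRobin(["replica-0", "replica-1", "replica-2"]);
const firstFour = [balancer.pick(), balancer.pick(), balancer.pick(), balancer.pick()];
console.log(firstFour.join(", ")); // replica-0, replica-1, replica-2, replica-0
```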
Constraints
User Story
As a FlowForge customer, I want to leverage instance load balancing.
This will allow me to run business-critical processes within Node-RED and ensure that they are always available and can handle increasing workloads.
Assumption
Have you provided an initial effort estimate for this issue?
I have provided an initial effort estimate
Backend Tasks
UX Tasks
Documentation Tasks