Automatic failover for Node-RED Instances #1920
If the assumption is incorrect, the issue can be closed.
@hardillb and I have been discussing at length how we will approach this, so that we can start building an implementation plan. Our working notes are in Figma: https://www.figma.com/file/upA7oHb9seloP74kTMyegN/FlowForge-High-Availability-Design-notes?node-id=0%3A1&t=bZVnNEZIeQBpxH12-1
There is a lot of technical work required, and it isn't something that can be half done, but we are starting to identify the steps needed to work towards it. The key criterion is whether we can build a failover system that switches over faster than the time it takes Kubernetes (k8s) to restart a crashed pod.
[Diagrams: Current Architecture; HA Architecture]
Lots more detail is at the figma link above; the diagrams are copied here for reference.
Key points
Restarting Node-RED
If the platform asks the HA Controller to restart Node-RED (for example, to apply updated settings or during a staged deployment rollout), the HA Controller will tell the inactive instance to restart first. Once that instance is ready, the HA Controller will trigger a failover so the newly updated inactive instance becomes the active instance, and will then tell the newly inactive instance to restart. This minimises downtime when rolling out new flows. There are a few different scenarios like this, some of which are already documented in the figma doc.
Tasks
Two immediate tasks have been identified that can be started now:
From there, we have to build the HA Controller. There's no short-cutting that piece; a finer-grained task breakdown will follow.
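The restart sequence described above (restart the inactive instance, fail over to it, then restart the former active instance) can be sketched as follows. This is a minimal, synchronous model assuming exactly two instances; the `HAController`, `Instance`, and `restartAll` names are hypothetical and not taken from FlowForge. A real controller would restart instances asynchronously and wait for a readiness signal before failing over.

```typescript
// Hypothetical sketch: HAController, Instance and restartAll are illustrative
// names, not FlowForge's actual code. Restarts are modelled synchronously.

type Role = "active" | "inactive";

interface Instance {
  id: string;
  role: Role;
  version: number; // settings/flows version the instance is running
  ready: boolean;  // whether the instance can accept traffic
}

class HAController {
  readonly log: string[] = [];

  constructor(private readonly instances: [Instance, Instance]) {}

  private get active(): Instance {
    return this.instances.find((i) => i.role === "active")!;
  }

  private get inactive(): Instance {
    return this.instances.find((i) => i.role === "inactive")!;
  }

  // Restart one instance with the new version. In a real system this is
  // where we would wait for the instance to report that it is ready.
  private restart(instance: Instance, version: number): void {
    instance.ready = false;
    this.log.push(`restart ${instance.id}`);
    instance.version = version;
    instance.ready = true;
  }

  // Swap roles so the standby takes over; refuse if it isn't ready.
  private failover(): void {
    const oldActive = this.active;
    const newActive = this.inactive;
    if (!newActive.ready) throw new Error("standby not ready; refusing to fail over");
    newActive.role = "active";
    oldActive.role = "inactive";
    this.log.push(`failover ${oldActive.id} -> ${newActive.id}`);
  }

  // Rolling restart: at every step one ready instance is serving traffic.
  restartAll(version: number): void {
    this.restart(this.inactive, version); // 1. update the standby first
    this.failover();                      // 2. promote the updated standby
    this.restart(this.inactive, version); // 3. update the former active instance
  }
}

// Usage
const a: Instance = { id: "a", role: "active", version: 1, ready: true };
const b: Instance = { id: "b", role: "inactive", version: 1, ready: true };
const ha = new HAController([a, b]);
ha.restartAll(2);
console.log(ha.log.join(", ")); // restart b, failover a -> b, restart a
```

Note that after a rolling restart the roles are swapped: instance `b` ends up active. The sketch does not try to fail back to the original primary, since with a hot spare either instance can serve.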
Activities paused, based on the workshop discussion.
Description
Implement a robust automatic failover mechanism for Node-RED instances that focuses solely on high availability without considering scalability. This feature will monitor the active Node-RED instance and seamlessly switch to a hot-spare instance if the primary instance fails or becomes unresponsive, thus ensuring reliability without the added complexity of load balancing and scaling.
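A minimal sketch of the monitoring described above, assuming the active instance sends periodic heartbeats to a controller. `FailoverMonitor`, `heartbeatIntervalMs`, and `missedBeatsAllowed` are hypothetical names, and heartbeats are only one possible way to detect an unresponsive instance. Time is passed in explicitly so the logic can be exercised without real timers.

```typescript
// Hypothetical sketch: FailoverMonitor and its options are illustrative
// assumptions, not a FlowForge or Node-RED API.

interface MonitorOptions {
  heartbeatIntervalMs: number; // how often the active instance is expected to report
  missedBeatsAllowed: number;  // missed beats tolerated before declaring it failed
}

class FailoverMonitor {
  private lastBeat: number;
  active: string;
  spare: string;
  readonly events: string[] = [];

  constructor(primary: string, spare: string, private readonly opts: MonitorOptions, now: number) {
    this.active = primary;
    this.spare = spare;
    this.lastBeat = now;
  }

  // Record a heartbeat; only the active instance's heartbeats matter here.
  heartbeat(instanceId: string, now: number): void {
    if (instanceId === this.active) this.lastBeat = now;
  }

  // Time without a heartbeat after which the active instance is considered failed.
  get detectionWindowMs(): number {
    return this.opts.heartbeatIntervalMs * this.opts.missedBeatsAllowed;
  }

  // Called periodically (e.g. from a timer); returns true if a failover happened.
  check(now: number): boolean {
    if (now - this.lastBeat <= this.detectionWindowMs) return false;
    const failed = this.active;
    // Promote the hot spare. The failed instance becomes the spare in name
    // only; a real system would need to restart it before relying on it.
    [this.active, this.spare] = [this.spare, this.active];
    this.lastBeat = now; // give the promoted instance a full window
    this.events.push(`failover ${failed} -> ${this.active}`);
    return true;
  }
}

// Usage
const monitor = new FailoverMonitor("node-red-a", "node-red-b", { heartbeatIntervalMs: 1000, missedBeatsAllowed: 3 }, 0);
monitor.heartbeat("node-red-a", 1000);
console.log(monitor.check(3500)); // false: 2500 ms since the last beat is within the 3000 ms window
console.log(monitor.check(4500)); // true: 3500 ms without a beat triggers failover
console.log(monitor.active);      // node-red-b
```

In this model, the worst-case time to detect a failure is the detection window plus the interval between `check` calls. That total, plus the time to switch traffic, is the figure that would need to be smaller than the time Kubernetes takes to restart a crashed pod for failover to be worthwhile.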
Related Epic
#1678
Assumption
Automatic failover without scaling is assumed to be easier to implement than a complete high availability solution with scaling, as it omits the complexities associated with load balancing, state management, and other challenges tied to scaling.
Motivation
As a customer of FlowForge,
I would like to have the option to use high-availability instances.
This allows me to run business-critical processes within Node-RED and ensure that they are always available.
Key considerations