Feature request: Clustering support #2443

Closed
1 of 3 tasks
svagner opened this issue Jan 2, 2020 · 10 comments
Comments

@svagner
Contributor

svagner commented Jan 2, 2020

Short description

Currently, Bosun doesn't support any HA or load distribution. We should provide something that allows us to run Bosun as a highly available and scalable service.

How this feature will help you/your organisation

  • Automatic failover when the server running Bosun becomes unavailable
  • Avoid the split-brain problem
  • Distribute check execution across multiple servers

Possible solution or implementation details

One working implementation: #2441

I propose using the Raft clustering implementation from HashiCorp.
Possible roadmap:

  • Create a cluster to improve availability, with a simple master-slave configuration. We can use the silence/nochecks flags to put a node into standby. This step involves no snapshots etc., just a simple standby (see the sketch after this list).
  • Add support for snapshotting the cluster state, rotating snapshots, and recovering the cluster state.
  • The leader can distribute check tasks (keyed by check name) between nodes using consistent hashing. At that step we can stop using flags as the main instrument for managing nodes within the cluster.
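
To make the first step concrete, here is a rough sketch of how a standby-only cluster could be bootstrapped with hashicorp/raft. The node IDs, addresses, and the no-op FSM are illustrative assumptions, not the actual integration from #2441:

```go
// Minimal sketch: a single Raft node with a no-op FSM, used only for
// leader election (the "simple standby" step). Node IDs and addresses
// are made up for illustration.
package main

import (
	"fmt"
	"io"
	"os"
	"time"

	"github.com/hashicorp/raft"
)

// noopFSM ignores log entries; in this step the cluster is only used
// to decide which node is active and which nodes stand by.
type noopFSM struct{}

func (noopFSM) Apply(*raft.Log) interface{}         { return nil }
func (noopFSM) Snapshot() (raft.FSMSnapshot, error) { return nil, fmt.Errorf("no snapshots yet") }
func (noopFSM) Restore(io.ReadCloser) error         { return nil }

func main() {
	cfg := raft.DefaultConfig()
	cfg.LocalID = raft.ServerID("bosun-1") // hypothetical node ID

	// TCP transport on this node's cluster address (hypothetical).
	trans, err := raft.NewTCPTransport("10.0.0.1:8337", nil, 3, 10*time.Second, os.Stderr)
	if err != nil {
		panic(err)
	}

	// In-memory stores keep the sketch short; a real node would persist these.
	logs := raft.NewInmemStore()
	stable := raft.NewInmemStore()
	snaps := raft.NewInmemSnapshotStore()

	r, err := raft.NewRaft(cfg, noopFSM{}, logs, stable, snaps, trans)
	if err != nil {
		panic(err)
	}

	// Bootstrap a 3-node cluster so the loss of one node can be tolerated.
	r.BootstrapCluster(raft.Configuration{Servers: []raft.Server{
		{ID: "bosun-1", Address: "10.0.0.1:8337"},
		{ID: "bosun-2", Address: "10.0.0.2:8337"},
		{ID: "bosun-3", Address: "10.0.0.3:8337"},
	}})

	// React to leadership changes: the leader runs checks, followers stand by.
	for isLeader := range r.LeaderCh() {
		if isLeader {
			fmt.Println("became leader: enable checks")
		} else {
			fmt.Println("lost leadership: go standby (silence/nochecks)")
		}
	}
}
```

With only leader election in place, a follower would simply keep its silence/nochecks behaviour until LeaderCh reports that it has become the leader.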
@svagner
Contributor Author

svagner commented Jan 2, 2020

Related issues: #2360

@johnewing1
Contributor

Hi, thanks for this contribution 👍
What's the minimum number of nodes needed for an HA solution?
In the description you talk about a master-slave solution, but my understanding was it would require at least 3 nodes for Raft to tolerate the loss of a node and elect a new master.

@svagner
Contributor Author

svagner commented Jan 9, 2020

Yes, that's correct: Raft needs a majority quorum (floor(N/2)+1 nodes), so we need at least 3 nodes to tolerate the loss of one. This will also allow us, in the future, to spread the check load between the nodes. In that case the leader will be responsible for assigning tasks to servers (I'm thinking of consistent hashing based on the alert definition name).
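
To illustrate the distribution idea, a consistent-hash ring keyed by alert name could assign each alert to one node roughly like this (a sketch; the hash function, virtual-node count, and names are placeholders, not taken from #2441 or #2472):

```go
// Sketch of assigning alerts to cluster nodes with a consistent-hash ring.
// Hash function, virtual-node count, and names are illustrative only.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	keys  []uint32          // sorted hashes of virtual nodes
	nodes map[uint32]string // hash -> node name
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(nodes []string, vnodes int) *ring {
	r := &ring{nodes: map[uint32]string{}}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			k := hashKey(fmt.Sprintf("%s#%d", n, i))
			r.keys = append(r.keys, k)
			r.nodes[k] = n
		}
	}
	sort.Slice(r.keys, func(i, j int) bool { return r.keys[i] < r.keys[j] })
	return r
}

// owner returns the node responsible for a given alert: the first
// virtual node clockwise from the alert's hash.
func (r *ring) owner(alert string) string {
	h := hashKey(alert)
	i := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
	if i == len(r.keys) {
		i = 0
	}
	return r.nodes[r.keys[i]]
}

func main() {
	r := newRing([]string{"bosun-1", "bosun-2", "bosun-3"}, 64)
	for _, alert := range []string{"high_cpu", "disk_full", "ping_loss"} {
		fmt.Printf("%s -> %s\n", alert, r.owner(alert))
	}
}
```

Because only the keys adjacent to a joining or leaving node move, most alerts keep their owner when cluster membership changes.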

@johnewing1
Contributor

I've been thinking about this approach, and one thing that would need to be resolved is how updates to the alert definitions are handled.

Currently, editing the rule definitions via the UI or API and saving them to the local disk is supported. Without further work this would lead to the definitions becoming inconsistent in a multi-node setup.

@svagner
Contributor Author

svagner commented Feb 14, 2020

Yes, it definitely is. Currently we sync configs during deployment, but I think I will add config syncing over the cluster internally.
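
One way the internal sync could work, purely as a sketch and not necessarily what #2472 implements, is to push rule-file updates through the Raft log so every node's FSM writes the same definitions to disk:

```go
// Sketch: replicating rule-definition updates through the Raft log.
// The command format and the file path are assumptions for illustration.
package cluster

import (
	"errors"
	"io"
	"os"

	"github.com/hashicorp/raft"
)

// ruleFSM applies replicated rule-file contents to the local disk, so a
// save made on the leader ends up identical on every node.
type ruleFSM struct {
	path string // e.g. a hypothetical /etc/bosun/rules.conf
}

func (f *ruleFSM) Apply(l *raft.Log) interface{} {
	// The log entry data is the full rule file saved via the UI/API.
	return os.WriteFile(f.path, l.Data, 0o644)
}

func (f *ruleFSM) Snapshot() (raft.FSMSnapshot, error) {
	return nil, errors.New("snapshots omitted in this sketch")
}

func (f *ruleFSM) Restore(r io.ReadCloser) error {
	data, err := io.ReadAll(r)
	if err != nil {
		return err
	}
	return os.WriteFile(f.path, data, 0o644)
}

// saveRules is what the UI/API save handler could call on the leader:
// the new config is committed to the Raft log instead of written directly.
func saveRules(r *raft.Raft, newConfig []byte) error {
	return r.Apply(newConfig, 0).Error()
}
```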

@svagner svagner mentioned this issue Apr 23, 2020
@svagner
Contributor Author

svagner commented Apr 23, 2020

> I've been thinking about this approach, and one thing that would need to be resolved is how updates to the alert definitions are handled.
>
> Currently, editing the rule definitions via the UI or API and saving them to the local disk is supported. Without further work this would lead to the definitions becoming inconsistent in a multi-node setup.

I've created a new pull request with rule-sync support and some improvements. We are using the clustering from #2472 in production now and haven't noticed any issues.

@stale

stale bot commented Apr 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 18, 2021
@langerma

So no progress on this?

@stale stale bot removed the wontfix label Apr 19, 2021
@svagner
Contributor Author

svagner commented Apr 19, 2021

It looks like the community didn't want to support this cluster implementation, and I have stopped using Bosun. Perhaps someone else will take over clustering for Bosun, but it won't be me anymore. Sorry.

@stale

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 16, 2022
@stale stale bot closed this as completed May 26, 2022