Distribute leaders uniformly in the cluster in a best-effort way #7258

Closed
deepthidevaki opened this issue Jun 11, 2021 · 1 comment
Labels
  • area/performance: Marks an issue as performance related
  • kind/feature: Categorizes an issue or PR as a feature, i.e. new behavior
  • scope/broker: Marks an issue or PR to appear in the broker section of the changelog

Comments

In Zeebe, there is no way to control which node becomes the leader of which partition. Raft leader election relies on randomized timeout values, which cannot be controlled. As a result, leaders are frequently concentrated on a small number of nodes.
Because leaders typically do more work than followers, this concentration can easily become a performance bottleneck. It is also inefficient in terms of resource allocation: to get good performance, nodes must always be over-provisioned. Therefore, to improve performance and make better use of resources, leaders should be distributed uniformly among the nodes.
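
To illustrate why uncontrolled elections tend to produce skew, here is a minimal simulation. It assumes that each partition's leader is picked uniformly at random among its replicas, which is only a rough model of randomized Raft timeouts. The cluster size, partition count, replication factor, and round-robin replica placement are all hypothetical values chosen for the sketch.

```python
import random
from collections import Counter


def replicas_for(partition, nodes, replication_factor):
    """Round-robin replica placement (an assumption made for this sketch)."""
    return [(partition + i) % nodes for i in range(replication_factor)]


def random_leader_counts(nodes, partitions, replication_factor, rng):
    """Pick each partition's leader uniformly at random among its replicas."""
    # Start every node at zero so nodes that lead nothing still show up.
    counts = Counter({node: 0 for node in range(nodes)})
    for partition in range(partitions):
        leader = rng.choice(replicas_for(partition, nodes, replication_factor))
        counts[leader] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(7258)
    counts = random_leader_counts(nodes=3, partitions=6, replication_factor=3, rng=rng)
    print(dict(counts))
```

With 3 nodes and 6 partitions, an even spread would be two leaders per node. Under this model only 90 of the 729 equally likely outcomes (about 12%) are perfectly even, so most runs leave at least one node with more than its share.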

We propose to use priority-based election in Raft. The solution and alternative approaches are described in zeebe-io/enhancements#15. The goal is not a strictly uniform distribution, but a best-effort one.
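
The linked proposal holds the actual design. As a rough illustration only, one way a priority-based election can work is sketched below: each node has a priority for a partition, and a node whose election timer fires starts an election only if its priority meets the current target; otherwise it lowers the target and waits for the next timeout. The class name, fields, and decay rule are assumptions made for this sketch, not Zeebe's API.

```python
from dataclasses import dataclass


@dataclass
class PriorityElectionTimer:
    """Hypothetical per-partition election state for a single node."""

    node_priority: int    # this node's priority for the partition
    target_priority: int  # priority currently required to start an election
    decay: int = 1        # how much the target drops per missed round

    def on_election_timeout(self):
        """Return True if this node should start an election now."""
        if self.node_priority >= self.target_priority:
            return True
        # Not eligible yet: relax the requirement so that lower-priority
        # nodes eventually get a turn if the preferred node is unavailable.
        self.target_priority = max(1, self.target_priority - self.decay)
        return False

    def on_leader_heartbeat(self, initial_target):
        """A live leader exists, so restore the original requirement."""
        self.target_priority = initial_target


if __name__ == "__main__":
    low = PriorityElectionTimer(node_priority=1, target_priority=3)
    print([low.on_election_timeout() for _ in range(3)])
```

In this sketch the highest-priority node becomes a candidate on its first timeout, while lower-priority nodes only become eligible after the target has decayed. That is what makes the resulting distribution best-effort rather than strict: if the preferred node is down, another node still takes over.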

A PoC was evaluated, and the results are explained in #7223.

@deepthidevaki added the kind/feature, scope/broker, and area/performance labels Jun 11, 2021
@deepthidevaki self-assigned this Jun 11, 2021
@deepthidevaki added this to In progress in Zeebe Jun 11, 2021

deepthidevaki commented Jun 11, 2021

Steps to achieve the goal:

  • Extract election logic from the follower role so that it is easy to replace it with priority election
  • Implement priority election with default disabled
  • Add configuration parameters to enable/disable priority election
  • Add metrics to monitor leader distribution
  • Add metrics to monitor leader election latency
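
For the leader distribution metric in the list above, a minimal sketch of what could be measured is shown here: the number of partitions each node leads, plus the gap between the busiest and the idlest node. The function names and the broker identifiers are hypothetical, and the sketch does not use any real metrics library.

```python
from collections import Counter


def leaders_per_node(partition_leaders, nodes):
    """Count led partitions per node, including nodes that lead none."""
    counts = Counter({node: 0 for node in nodes})
    counts.update(partition_leaders.values())
    return dict(counts)


def leader_spread(counts):
    """Gap between the busiest and the idlest node; 0 or 1 means balanced."""
    return max(counts.values()) - min(counts.values())


if __name__ == "__main__":
    # Hypothetical snapshot: partition id -> current leader.
    snapshot = {1: "broker-0", 2: "broker-0", 3: "broker-1"}
    counts = leaders_per_node(snapshot, ["broker-0", "broker-1", "broker-2"])
    print(counts, leader_spread(counts))
```

Tracking the spread over time would show whether enabling priority election actually moves the cluster toward an even distribution.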

@deepthidevaki added this to the Leader distribution milestone Jun 15, 2021
ghost pushed a commit that referenced this issue Jun 15, 2021
7260: refactor(raft): extract election timer logic r=deepthidevaki a=deepthidevaki

## Description

This PR prepares for implementing priority election in #7258. With the election logic extracted, it will be easier to choose between default election and priority election.

## Related issues

Related #7258 



Co-authored-by: Deepthi Devaki Akkoorath <deepthidevaki@gmail.com>
Zeebe automation moved this from In progress to Done Jun 30, 2021
@KerstinHebel removed this from Done in Zeebe Mar 23, 2022