Election based on JGroups views
===============================
Author: Bela Ban
Motivation
----------
The current implementation of jgroups-raft ignores the functionality JGroups offers (failure detection, view changes)
and implements leader election more or less as described in [1].
Rewriting that part to reuse JGroups functionality has the following advantages:
* More deterministic voting process: each round is _orchestrated_ by the coordinator
* Code reduction, reuse of proven code
* Many failure detection protocols are available
* Ability to customize failure detection
* No constant traffic by heartbeats; a leader establishes itself once and then remains leader until a new view starts a
new leader election
* No elections unless triggered by view changes -> this reduces competing elections
In a nutshell
-------------
* Leader election is started only on view changes
* Contrary to [1], there are only leaders or followers, *no* candidates
* Once a leader is elected, there won't be any elections until a new view is installed that no longer contains the
leader or drops below the majority
* New members get the term and the leader's address via a message from the current leader (LeaderElected)
Voting is started by the coordinator; it increments its term and solicits the votes of all members (including itself).
The vote responses contain the term of the last log entry and the last log index for the responding member.
The coordinator picks the member with the highest last log term / last log index. If all members have the same terms
and indices, the coordinator picks the oldest member (itself) as leader.
If responses from a majority of members have not been received after a timeout, voting continues; otherwise
voting is stopped and a LeaderElected message (with the new term and leader) is sent to all members.
Implementation
--------------
State
-----
- term: the current term
- leader: the address of the leader
- voted-for: the address of the member we voted for in the current term
On reception of view change
---------------------------
- Coordinator: if the majority is reached or the leader left -> start the voting thread
- Leader: new members joined -> send a LeaderElected msg to new members
- If the majority is lost -> leader=null
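
The view-change rules above can be sketched as a pure decision function. This is a minimal sketch, not the
jgroups-raft code: the class, the Action names and the String addresses are hypothetical, the majority size is
assumed to be configured, and "the majority is reached" is read here as "the previous view was below the
majority and the new one is not". The coordinator is taken to be the first member of the view, as in JGroups.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class ViewChangeDecision {
    // Hypothetical names for the three reactions listed above
    enum Action { START_VOTING, SEND_LEADER_ELECTED_TO_JOINERS, CLEAR_LEADER }

    /**
     * Decides what the local member does when newView replaces oldView.
     * majority is the quorum size, e.g. 2 for a cluster of 3 (assumed to be configured).
     */
    static Set<Action> decide(String self, String leader,
                              List<String> oldView, List<String> newView, int majority) {
        Set<Action> actions = EnumSet.noneOf(Action.class);
        if (newView.size() < majority) {
            actions.add(Action.CLEAR_LEADER);              // majority lost -> leader=null
            return actions;
        }
        boolean coordinator = self.equals(newView.get(0)); // first member of the view
        boolean majorityReached = oldView.size() < majority;
        boolean leaderLeft = leader != null && !newView.contains(leader);
        if (coordinator && (majorityReached || leaderLeft)) {
            actions.add(Action.START_VOTING);
        }
        boolean joiners = !oldView.containsAll(newView);   // some member of newView is new
        if (self.equals(leader) && joiners) {
            actions.add(Action.SEND_LEADER_ELECTED_TO_JOINERS);
        }
        return actions;
    }
}
```

Keeping the decision free of side effects makes it easy to check each rule in isolation; the caller would then
start the voting thread, send LeaderElected to the joiners, or reset the leader accordingly.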
Voting thread
-------------
- Periodically: increment term and send a VoteRequest(term) to all members (+self)
- Terminates when LeaderElected message is received, or on shutdown
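
One way to sketch a single round of the voting thread is as a Runnable that a scheduler invokes periodically.
The class name and the sendVoteRequest hook are hypothetical; the hook stands for "send VoteRequest(term) to
all members (+self)" and is not a JGroups API.

```java
import java.util.function.LongConsumer;

final class VotingRound implements Runnable {
    private long term;                           // current term, incremented once per round
    private final LongConsumer sendVoteRequest;  // hypothetical hook: sends VoteRequest(term) to all members
    private volatile boolean stopped;

    VotingRound(long currentTerm, LongConsumer sendVoteRequest) {
        this.term = currentTerm;
        this.sendVoteRequest = sendVoteRequest;
    }

    /** One round: increment the term and solicit votes, unless voting was stopped. */
    @Override
    public synchronized void run() {
        if (stopped) {
            return;
        }
        term++;
        sendVoteRequest.accept(term);
    }

    /** Called when a LeaderElected message is received, or on shutdown. */
    void stop() {
        stopped = true;
    }

    synchronized long term() {
        return term;
    }
}
```

Such a round could be scheduled with ScheduledExecutorService.scheduleWithFixedDelay(round, 0, timeout,
TimeUnit.MILLISECONDS), where timeout is the (assumed) vote timeout; on LeaderElected the coordinator would call
stop() and cancel the returned future.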
On reception of VoteRequest(term)
---------------------------------
- If term < current_term: reject (don't send a VoteResponse)
- Else:
  - If term > current_term -> voted-for=null, current_term=term
  - If voted-for == null or voted-for == vote requester -> set voted-for=vote requester and send a VoteResponse
    (containing the current term, the last log index and the last log term) back to the sender
    (Example: current_term=26 (the number of elections held so far), last_log_index=2012, last_log_term=6)
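
The VoteRequest handling can be sketched as follows. This is an illustration under assumptions rather than the
actual handler: the class and field names are hypothetical, addresses are plain Strings, and the response is
assumed to be sent only when the vote is granted. A null return value means "no VoteResponse is sent".

```java
final class VoteRequestHandler {
    /** Hypothetical reply carrying the values listed above. */
    static final class VoteResponse {
        final long currentTerm;
        final long lastLogIndex;
        final long lastLogTerm;

        VoteResponse(long currentTerm, long lastLogIndex, long lastLogTerm) {
            this.currentTerm = currentTerm;
            this.lastLogIndex = lastLogIndex;
            this.lastLogTerm = lastLogTerm;
        }
    }

    long currentTerm;
    String votedFor;            // null: no vote cast in the current term
    final long lastLogIndex;
    final long lastLogTerm;

    VoteRequestHandler(long currentTerm, long lastLogIndex, long lastLogTerm) {
        this.currentTerm = currentTerm;
        this.lastLogIndex = lastLogIndex;
        this.lastLogTerm = lastLogTerm;
    }

    /** Returns the VoteResponse to send back to the requester, or null if none is sent. */
    VoteResponse onVoteRequest(String requester, long term) {
        if (term < currentTerm) {
            return null;                        // stale request: reject silently
        }
        if (term > currentTerm) {
            currentTerm = term;                 // newer term: forget the old vote
            votedFor = null;
        }
        if (votedFor == null || votedFor.equals(requester)) {
            votedFor = requester;
            return new VoteResponse(currentTerm, lastLogIndex, lastLogTerm);
        }
        return null;                            // already voted for someone else in this term
    }
}
```

With the example values, a member at term 25 that receives VoteRequest(26) from the coordinator answers with
current_term=26, last_log_index=2012, last_log_term=6 and ignores a request for term 26 from any other member.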
On reception of VoteResponse
----------------------------
(on coordinator)
- If enough votes
  -> Determine the leader based on highest last log term / index / rank
  -> Set term to max of all received current terms
  -> Send LeaderElected(leader,term) message to all members (+self)
  -> Stop voting thread
- Else
  -> Continue
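
The tally on the coordinator might look like the sketch below. All names are hypothetical, "enough votes" is
assumed to mean responses from a majority of members, and rank is assumed to be the position in the view
(oldest member first), so that ties on last log term and index go to the oldest member.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VoteTally {
    /** Hypothetical VoteResponse payload. */
    static final class Response {
        final long currentTerm;
        final long lastLogTerm;
        final long lastLogIndex;

        Response(long currentTerm, long lastLogTerm, long lastLogIndex) {
            this.currentTerm = currentTerm;
            this.lastLogTerm = lastLogTerm;
            this.lastLogIndex = lastLogIndex;
        }
    }

    /** What would go into the LeaderElected(leader,term) message. */
    static final class Result {
        final String leader;
        final long term;

        Result(String leader, long term) {
            this.leader = leader;
            this.term = term;
        }
    }

    private final List<String> view;    // members in view order; index = rank
    private final int majority;
    private final Map<String, Response> responses = new HashMap<>();

    VoteTally(List<String> view, int majority) {
        this.view = view;
        this.majority = majority;
    }

    /** Returns the election result once a majority has responded, else null (keep voting). */
    Result onVoteResponse(String sender, Response rsp) {
        if (!view.contains(sender)) {
            return null;                // ignore responses from non-members
        }
        responses.put(sender, rsp);
        if (responses.size() < majority) {
            return null;
        }
        String leader = null;
        Response best = null;
        long term = 0;
        for (String member : view) {    // rank order: only a strictly better log replaces the pick
            Response r = responses.get(member);
            if (r == null) {
                continue;
            }
            term = Math.max(term, r.currentTerm);
            boolean better = best == null
                    || r.lastLogTerm > best.lastLogTerm
                    || (r.lastLogTerm == best.lastLogTerm && r.lastLogIndex > best.lastLogIndex);
            if (better) {
                best = r;
                leader = member;
            }
        }
        return new Result(leader, term);
    }
}
```

Iterating in view order and replacing the pick only on a strictly better log keeps the tie-break implicit: if
all responding members report the same last log term and index, the first responder in the view wins.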
On reception of LeaderElected(leader,term) message
--------------------------------------------------
- If leader == self -> become leader
- Only on the coordinator: stop the Voting thread
- Set term and leader
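
A small sketch of the LeaderElected handling follows. The names are hypothetical, and becoming a follower when
another member is elected is an assumption; the steps above only spell out the leader case.

```java
final class ElectionMember {
    enum Role { FOLLOWER, LEADER }

    private final String self;
    long term;
    String leader;
    Role role = Role.FOLLOWER;
    boolean votingActive;       // only meaningful on the coordinator

    ElectionMember(String self) {
        this.self = self;
    }

    void onLeaderElected(String newLeader, long newTerm, boolean coordinator) {
        // Assumption: any member that is not the elected leader is (or becomes) a follower
        role = self.equals(newLeader) ? Role.LEADER : Role.FOLLOWER;
        if (coordinator) {
            votingActive = false;   // stands in for stopping the voting thread
        }
        term = newTerm;
        leader = newLeader;
    }
}
```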
Leader: on reception of any request(term)
-----------------------------------------
- If term > current_term -> step down as leader and become follower
- If term < current_term -> reject / send a negative response (depending on role and protocol (RAFT, ELECTION))
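
The term comparison above can be sketched as a small function. The names are hypothetical, and treating a
request with an equal term as acceptable is an assumption, since only the two unequal cases are listed.

```java
final class TermCheck {
    enum Outcome { STEP_DOWN, REJECT, ACCEPT }

    /** Compares the term carried by a request with the leader's current term. */
    static Outcome check(long requestTerm, long currentTerm) {
        if (requestTerm > currentTerm) {
            return Outcome.STEP_DOWN;   // a newer term exists: become follower
        }
        if (requestTerm < currentTerm) {
            return Outcome.REJECT;      // stale sender: reject / negative response
        }
        return Outcome.ACCEPT;          // assumption: equal terms are processed normally
    }
}
```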
[1] https://github.com/ongardie/dissertation