Design of the PRIMARY_PARTITION protocol
========================================
Author: Bela Ban
Version: $Id: PrimaryPartition.txt,v 1.1 2005/07/21 20:33:01 belaban Exp $
Dawid Kurzyniec wrote:
> Bela Ban wrote:
>
>> I think adding a protocol on top of (or below ?) GMS will work. However, there is the question of how you actually determine the primary and secondary partitions ?
>> For example, if we have a switch crash, and all 5 members in the group become singleton groups, then the switch is turned back on, which one
>> is the primary partition ? There is no majority. Of course, you simply need to take a deterministic decision, e.g. in this case do a lexical sort and take
>> the first (A). Is this what you are thinking of doing ? So B, C, D and E would get an EXIT event, would have to leave and possibly re-join ? This
>> would be simple to implement.
>
>
> Yes, this is pretty much what we have in mind. In a conflicting case we intend to find the greatest address among all members of the candidate groups, then pick as the surviving group the one that member belongs to (or should we take the smallest one? It would be nice to bias towards the group containing the current coordinator; does the coordinator have the smallest or the largest address in its group?)
It is a lexical sort: Address extends Comparable, so we sort the members and take the first member of the resulting merged group as the coordinator. The coordinator therefore has the smallest address in its group.
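A minimal sketch of that rule, assuming plain strings stand in for Address objects (the class and method names are hypothetical, not JGroups API): sort the merged membership and take the first entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; String stands in for a Comparable Address.
class CoordinatorPick {
    // Returns the smallest member in natural (lexical) order.
    static <T extends Comparable<? super T>> T pickCoordinator(List<T> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("empty membership");
        }
        List<T> sorted = new ArrayList<>(members); // leave the caller's list untouched
        Collections.sort(sorted);
        return sorted.get(0);
    }

    public static void main(String[] args) {
        // Five singleton groups merged back together: A becomes coordinator.
        System.out.println(pickCoordinator(Arrays.asList("C", "A", "E", "B", "D"))); // prints A
    }
}
```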
> In fact we figured we don't even need a protocol - handling this in MembershipListener should do the job, I guess?... (We are lazy and we want to deal with JGroups at the highest level possible).
I think it should be a protocol, PRIMARY_PARTITION, and can be implemented as follows:
* Place it somewhere below GMS, but above MERGE2. It probably needs lossless delivery, so it should be above UNICAST and NAKACK as well
* Handle the MERGE event in the up() method:
  o Get the subgroups, e.g. {A,B}, {C}, {D,E} and {F}
  o Consult a *merge policy*, which determines (given the list of subgroups) the primary partition
  o If we are the coordinator of the primary partition:
    + Send an exit message to all other coordinators (hmm, you probably can't do that as you are not a member of the subgroups, so we probably have to handle VIEW(MergeView) rather than the MERGE event)
    + The other coordinators forward the EXIT event to everyone else in their group, so all members leave (and possibly rejoin) the group
  o Else
    + Send the EXIT event to everyone in my group
    + Everyone shuts down and possibly rejoins later
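The steps above can be sketched roughly as follows. This is not JGroups code: strings stand in for addresses, the primary partition is passed in as if the merge policy had already picked it, and the first element of each subgroup is assumed to be its coordinator. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: who is told to exit once the primary partition is known.
class ExitPlan {
    // Coordinators of the non-primary subgroups; the primary coordinator
    // would send its exit message to these (assumes element 0 is the coordinator).
    static List<String> coordinatorsToNotify(List<List<String>> subgroups, List<String> primary) {
        List<String> coords = new ArrayList<>();
        for (List<String> subgroup : subgroups) {
            if (!subgroup.equals(primary) && !subgroup.isEmpty()) {
                coords.add(subgroup.get(0));
            }
        }
        return coords;
    }

    // Every member outside the primary partition ends up receiving EXIT.
    static List<String> membersToExit(List<List<String>> subgroups, List<String> primary) {
        List<String> exiting = new ArrayList<>();
        for (List<String> subgroup : subgroups) {
            if (!subgroup.equals(primary)) {
                exiting.addAll(subgroup);
            }
        }
        return exiting;
    }

    public static void main(String[] args) {
        List<List<String>> subgroups = Arrays.asList(
                Arrays.asList("A", "B"), Arrays.asList("C"),
                Arrays.asList("D", "E"), Arrays.asList("F"));
        List<String> primary = subgroups.get(0); // assume the policy chose {A,B}
        System.out.println(coordinatorsToNotify(subgroups, primary)); // prints [C, D, F]
        System.out.println(membersToExit(subgroups, primary));        // prints [C, D, E, F]
    }
}
```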
The MergePolicy implementation needs to be configurable, so developers can specify their own implementation. If none is specified, we would supply a default implementation.
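One possible shape for such a policy, as a sketch only: the name MergePolicy comes from the text above, but the method signature, the use of strings for addresses, and the SmallestAddressPolicy default are all assumptions made for illustration. The default shown here picks the subgroup containing the lexically smallest address, in line with the "sort and take the first" rule discussed earlier.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical interface: given the subgroups of a merge, return the primary partition.
interface MergePolicy {
    List<String> choosePrimary(List<List<String>> subgroups);
}

// Assumed default: the subgroup that contains the smallest address wins.
class SmallestAddressPolicy implements MergePolicy {
    @Override
    public List<String> choosePrimary(List<List<String>> subgroups) {
        List<String> best = null;
        String bestMin = null;
        for (List<String> subgroup : subgroups) {
            if (subgroup.isEmpty()) {
                continue; // nothing to compare in an empty subgroup
            }
            String min = Collections.min(subgroup);
            if (bestMin == null || min.compareTo(bestMin) < 0) {
                bestMin = min;
                best = subgroup;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no non-empty subgroup");
        }
        return best;
    }
}

class MergePolicyDemo {
    public static void main(String[] args) {
        MergePolicy policy = new SmallestAddressPolicy();
        List<List<String>> subgroups = Arrays.asList(
                Arrays.asList("D", "E"), Arrays.asList("C"),
                Arrays.asList("A", "B"), Arrays.asList("F"));
        System.out.println(policy.choosePrimary(subgroups)); // prints [A, B]
    }
}
```

A majority-based or coordinator-biased policy would be another implementation of the same interface; because the choice is deterministic, every coordinator evaluating the same subgroups reaches the same answer.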
Looks relatively straightforward. The only thing I don't really like is that we have to merge *first* before we can send the EXIT message to members of the *previous* subgroups.
However, this is probably necessary, as we cannot send messages to members *not* in our group.