[RAW] swim: introduce SWIM's anti-entropy component
SWIM - Scalable Weakly-consistent Infection-style Process Group Membership Protocol. It consists of 2 components: events dissemination and failure detection, and stores in memory a table of known remote hosts - members. Some SWIM implementations also have an additional component: anti-entropy - a periodic broadcast of a random subset of the members table.

Each SWIM component differs from the others in both message structure and goals, and they could even be sent in separate messages. But SWIM describes piggybacking of messages: a ping message can piggyback a dissemination one.

SWIM has a main operating cycle during which it randomly chooses members from the member table and sends them events + ping. Answers are processed asynchronously, outside the main cycle.

Random selection provides an even network load of about 1 message to each member per protocol step, regardless of the cluster size. Without randomness a member could get a network load of N messages in a single protocol step, where N is the cluster size, since all other members would choose the same member on that step.

SWIM also describes a kind of fairness: when selecting the next member to ping, the protocol prefers LRU members. In code it would be too complicated, so Tarantool's implementation is slightly different and simpler.

Tarantool splits protocol operation into rounds. At the beginning of a round all members are randomly reordered and linked into a list. At each round step a member is popped from the list head, a message is sent to it, and it waits for the next round. In such an implementation all random selection of the original SWIM is executed once per round. The round is effectively 'planned'. A list is used instead of an array since new members can be added to its tail without realloc, and dead members can be removed just as easily.

Tarantool also implements the third component - anti-entropy. Why is it needed and even vital? Consider an example: two SWIM nodes, both alive. Nothing happens, so the events list is empty, and only pings are sent periodically. Then a third node appears. It knows about one of the existing nodes. How should it learn about the other one? The cluster is stable, there are no new events, so the only chance is to wait until another server stops and an event about it is broadcast.

Anti-entropy is an extra simple component: it just piggybacks a random part of the members table on each regular ping. In the example above the new node will learn about the remaining node via anti-entropy messages of the node it already knows.

This commit introduces the first component - anti-entropy. With this component a member can discover other members, but cannot detect which of them are already dead. That is a part of the next commit.

Part of #3234
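
For illustration only, the round planning and the anti-entropy selection described in this message could be sketched roughly as below. This is not the code from this commit: struct member, round_plan(), round_step() and anti_entropy_pick() are hypothetical names, rand() stands in for whatever random source the real implementation uses, and taking a random contiguous section of the table is just one assumed way to get a "random part" of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical member record, not Tarantool's actual structure. */
struct member {
	int id;
	/* Link in the round queue. */
	struct member *next;
};

/* Fisher-Yates shuffle of the member table. */
static void
shuffle(struct member **arr, size_t n)
{
	for (size_t i = n; i > 1; i--) {
		size_t j = (size_t)rand() % i;
		struct member *tmp = arr[i - 1];
		arr[i - 1] = arr[j];
		arr[j] = tmp;
	}
}

/*
 * Plan a round: all random selection happens here, once. The
 * table is shuffled and its members are linked into a queue.
 * Returns the queue head, or NULL for an empty table.
 */
static struct member *
round_plan(struct member **table, size_t n)
{
	shuffle(table, n);
	for (size_t i = 0; i < n; i++)
		table[i]->next = i + 1 < n ? table[i + 1] : NULL;
	return n > 0 ? table[0] : NULL;
}

/*
 * One round step: pop the member at the queue head. The caller
 * would send it a message; the member then waits for the next
 * round. Returns NULL when the round is over.
 */
static struct member *
round_step(struct member **head)
{
	struct member *m = *head;
	if (m != NULL) {
		*head = m->next;
		m->next = NULL;
	}
	return m;
}

/*
 * Anti-entropy: choose up to max_count members to piggyback on
 * an outgoing message. Assumption: a random start offset plus
 * consecutive entries, wrapping around the table end. Returns
 * the number of entries written to out.
 */
static size_t
anti_entropy_pick(struct member **table, size_t n,
		  struct member **out, size_t max_count)
{
	if (n == 0)
		return 0;
	assert(table != NULL && out != NULL);
	size_t count = max_count < n ? max_count : n;
	size_t start = (size_t)rand() % n;
	for (size_t i = 0; i < count; i++)
		out[i] = table[(start + i) % n];
	return count;
}
```

A caller would invoke round_plan() when the queue becomes empty, round_step() on each protocol tick, and anti_entropy_pick() while building each outgoing ping. Within one round every member is visited exactly once, which approximates the fairness of the original protocol without tracking LRU order.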