[RAW] swim: introduce SWIM's anti-entropy component
SWIM is the Scalable Weakly-consistent Infection-style Process
Group Membership Protocol. It consists of two components, event
dissemination and failure detection, and keeps an in-memory
table of known remote hosts, called members. Some SWIM
implementations add a third component, anti-entropy: a
periodic broadcast of a random subset of the member table.

Each SWIM component differs from the others in both message
structure and goal, and their messages could even be sent
separately. However, SWIM describes piggybacking: a ping
message can carry a dissemination message along with it. SWIM
has a main operating cycle during which it randomly chooses
members from the member table and sends them events together
with a ping. Replies are processed asynchronously, outside the
main cycle.

Random selection spreads the network load evenly: each member
receives about 1 message per protocol step regardless of the
cluster size. Without randomness, all other members could
choose the same member on each step, and that member would get
a load of about N messages per protocol step, where N is the
cluster size.
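
The load figure above can be checked with a short expectation
argument. This is only a sketch: it assumes that on every protocol
step each of the N members independently picks one target uniformly
at random among the other N - 1 members, which the text above does
not state explicitly.

```latex
\mathbb{E}[\text{messages received by member } j]
  = \sum_{i \ne j} \Pr[\, i \text{ picks } j \,]
  = (N - 1) \cdot \frac{1}{N - 1}
  = 1
```

If instead all N - 1 other members pick the same target, that
target receives N - 1 messages in one step, which is the "about N"
worst case.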

SWIM also describes a kind of fairness: when selecting the next
member to ping, the protocol prefers the least recently used
(LRU) members. Implementing that literally would be too
complicated, so Tarantool's implementation is slightly
different and simpler.

Tarantool splits protocol operation into rounds. At the
beginning of a round all members are randomly reordered and
linked into a list. At each round step a member is popped from
the list head, a message is sent to it, and it then waits for
the next round. In this implementation all the random selection
of the original SWIM is done once per round, so the round is
effectively 'planned' in advance. A list is used instead of an
array because new members can be appended to its tail without
a realloc, and dead members can be unlinked just as easily.
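
The round planning described above could be sketched roughly as
follows. This is an illustration only, not the code from swim.c:
the names struct member, struct round_queue, round_plan,
round_queue_push and round_queue_pop are hypothetical, and rand()
with a modulo stands in for whatever random source the real
implementation uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical member record, not the struct from swim.c. */
struct member {
	int id;
	/* Link in the queue of the current round. */
	struct member *next_in_round;
};

/* Hypothetical round queue: pop from the head, append to the tail. */
struct round_queue {
	struct member *head;
	struct member *tail;
};

/* Append a member to the tail in O(1), with no reallocation. */
void
round_queue_push(struct round_queue *q, struct member *m)
{
	assert(q != NULL && m != NULL);
	m->next_in_round = NULL;
	if (q->tail != NULL)
		q->tail->next_in_round = m;
	else
		q->head = m;
	q->tail = m;
}

/* Pop the member to contact at this step, NULL when the round is over. */
struct member *
round_queue_pop(struct round_queue *q)
{
	assert(q != NULL);
	struct member *m = q->head;
	if (m == NULL)
		return NULL;
	q->head = m->next_in_round;
	if (q->head == NULL)
		q->tail = NULL;
	m->next_in_round = NULL;
	return m;
}

/*
 * Plan a new round: shuffle the member pointers in place
 * (Fisher-Yates) and link them into the queue in that order.
 * rand() % i is slightly biased; good enough for a sketch.
 */
void
round_plan(struct round_queue *q, struct member **members, size_t count)
{
	assert(q != NULL);
	q->head = NULL;
	q->tail = NULL;
	for (size_t i = count; i > 1; --i) {
		size_t j = (size_t)rand() % i;
		struct member *tmp = members[i - 1];
		members[i - 1] = members[j];
		members[j] = tmp;
	}
	for (size_t i = 0; i < count; ++i)
		round_queue_push(q, members[i]);
}
```

In this sketch a caller would invoke round_plan() once the queue is
empty and round_queue_pop() on every round step. A member discovered
in the middle of a round can be appended with round_queue_push()
without touching the already planned order.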

Tarantool also implements the third component, anti-entropy.
Why is it needed, and even vital? Consider an example: two SWIM
nodes, both alive. Nothing happens, so the event list is empty
and only pings are sent periodically. Then a third node
appears. It knows about only one of the existing nodes. How
should it learn about the other one? The cluster is stable and
there are no new events, so the only chance would be to wait
until another server stops and an event about it is broadcast.
Anti-entropy is an extremely simple component: it just
piggybacks a random part of the member table on each regular
ping. In the example above the new node learns about the
unknown node from the anti-entropy messages of the node it
already knows.
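
The three-node example above can be illustrated with a minimal
sketch. Everything here is hypothetical and much simpler than
swim.c: members are plain integer ids, the table is a fixed array,
and anti_entropy_encode / anti_entropy_decode are made-up names for
"put a random slice of my table into a message" and "learn unknown
members from a received slice". Starting at a random position and
wrapping around is one simple way to pick a random part of the
table; it is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { TABLE_CAPACITY = 16 };

/* Hypothetical member table: a fixed array of member ids. */
struct member_table {
	int ids[TABLE_CAPACITY];
	size_t count;
};

bool
table_has(const struct member_table *t, int id)
{
	for (size_t i = 0; i < t->count; ++i) {
		if (t->ids[i] == id)
			return true;
	}
	return false;
}

/* Add a member if it is unknown and there is room. */
bool
table_add(struct member_table *t, int id)
{
	if (table_has(t, id) || t->count == TABLE_CAPACITY)
		return false;
	t->ids[t->count++] = id;
	return true;
}

/*
 * Sender side: copy up to max ids into out, starting at a random
 * position of the table and wrapping around. Returns the number
 * of ids written.
 */
size_t
anti_entropy_encode(const struct member_table *t, int *out, size_t max)
{
	assert(t != NULL && out != NULL);
	if (t->count == 0)
		return 0;
	size_t n = t->count < max ? t->count : max;
	size_t start = (size_t)rand() % t->count;
	for (size_t i = 0; i < n; ++i)
		out[i] = t->ids[(start + i) % t->count];
	return n;
}

/*
 * Receiver side: add every member from the received section that
 * is not known yet. Returns the number of newly learned members.
 */
size_t
anti_entropy_decode(struct member_table *t, const int *ids, size_t n)
{
	assert(t != NULL && ids != NULL);
	size_t added = 0;
	for (size_t i = 0; i < n; ++i) {
		if (table_add(t, ids[i]))
			++added;
	}
	return added;
}
```

With node 1 knowing {1, 2} and the new node 3 knowing {3, 1}, a
section encoded by node 1 that includes id 2 and is decoded by node
3 makes node 3 aware of node 2, with no event ever generated.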

This commit introduces the first of these components to be
implemented: anti-entropy. With it a member can discover other
members, but cannot detect which of them are already dead.
Failure detection is part of the next commit.

Part of #3234
Gerold103 committed Oct 31, 2018
1 parent 5e7699a commit e29fa15
Showing 8 changed files with 1,167 additions and 1 deletion.
3 changes: 2 additions & 1 deletion src/CMakeLists.txt
@@ -171,6 +171,7 @@ set (server_sources
lua/crypto.c
lua/httpc.c
lua/utf8.c
+lua/swim.c
lua/info.c
${lua_sources}
${PROJECT_SOURCE_DIR}/third_party/lua-yaml/lyaml.cc
@@ -216,7 +217,7 @@ endif()

set_source_files_compile_flags(${server_sources})
add_library(server STATIC ${server_sources})
-target_link_libraries(server core bit uri uuid ${ICU_LIBRARIES})
+target_link_libraries(server core bit uri uuid swim ${ICU_LIBRARIES})

# Rule of thumb: if exporting a symbol from a static library, list the
# library here.
1 change: 1 addition & 0 deletions src/lib/CMakeLists.txt
@@ -5,6 +5,7 @@ add_subdirectory(small)
add_subdirectory(salad)
add_subdirectory(csv)
add_subdirectory(json)
+add_subdirectory(swim)
if(ENABLE_BUNDLED_MSGPUCK)
add_subdirectory(msgpuck EXCLUDE_FROM_ALL)
endif()
6 changes: 6 additions & 0 deletions src/lib/swim/CMakeLists.txt
@@ -0,0 +1,6 @@
set(lib_sources
swim.c
)

set_source_files_compile_flags(${lib_sources})
add_library(swim STATIC ${lib_sources})
