Membership discovery and (self) awareness improvements #16

caio · 2022-11-09T10:21:25Z

Going through features discussed on #15. I'll slowly be ticking the boxes as I complete each:

A way for members to learn that their messages are being ignored, so they can react to the situation where they've been declared down but haven't received that info through normal means
Periodic announce() to increase the speed to discover the whole cluster
Periodic gossip(), to speed up cluster update propagation

tracing-attributes 0.1.23 ships with a smarter `Span::record` that allows owned values [^1], which makes This means that previously invalid code such as: span.record("num_updates", num_items); Is now valid, which makes clippy throw a warning for needless_borrow Fixing the lint, however, means that foca would require a strict version check for tracing-attributes (otherwise anything pulling 0.1.22 or bellow would not compile anymore). I don't want a strict dependency on tracing: it's tangential do foca and I'd like users to benefit from the improvements that might be shipped along, so I fix the lint warning by ignoring it instead. Once we get a (at least minor) version bump on `tracing`, this can be removed and updated to skip the borrow. [^1]: tokio-rs/tracing#2212

When a member is declared down by the cluster, its messages get ignored until the cluster "forgets" it went down (i.e.: after `Config::remove_down_after` expires). This patch introduces a new setting `Config::notify_down_members` which makes foca send a lightweight message to said members when they're down.

we'll be repeating the pattern of picking random members and sending a message to each

This introduces Config::periodic_announce, which allows users to instrcut foca to periodically send a Announce message to random members so that it learns about every cluster member faster

This introduces Config::periodic_gossip which instructs foca to disseminate cluster updates to random members with a fixed frequency so that the cluster learns new information faster periodic announce paved the way, so this was mostly copy-paste :-)

caio · 2022-11-12T10:24:41Z

Good to go! Gonna leave this running for a while and inject a few failures to see if I can catch any obvious bugs before shipping (so maybe tomorrow).

Being cautious mostly because of 9b9dc7b (notifying chatty dead members that they're down). It's a very simple attempt at fixing the awareness problem.. can't help but think I'm missing something really obvious that makes this a bad idea.

jeromegn · 2022-11-12T14:01:51Z

Exciting!

What do you mean by:

Gonna leave this running for a while

Is there something fuzzing or testing this continuously? :)

caio · 2022-11-13T08:39:22Z

Haha I wish! Nothing as sophisticated as that at present. What I meant was that I put the code on this branch live in my LAN cluster to see how it behaves.

And by injecting failures I meant that my nodes have a little helper endpoint that lets me "drive" the cluster. Essentially, it takes a target address and does (maybe this is an interesting thing to do on your end, to see how the cluster behaves):

let found = foca.iter_members().find(|m| m.addr == target.addr).cloned();
if let Some(member) = found {
    let _ = foca.apply_many(
        std::iter::once(Member::new(member, 0, foca::State::Down)),
        &mut runtime,
    );
}

Technically, a gossip with no updates is still useful: it tells the receiver that the sender is active so it helps with discovery and, now, with the member gaining awareness that the cluster thinks its dead. It is, however, too chatty. During normal operations the updates buffer is mostly empty; If we start sending messages with no updates too often it won't be much different from a traditional gossip protocol with a heartbeat. So, since pings and announces already cover the case of talking to the cluster unconditionally, it makes sense to make periodic gossiping "smarter"

caio · 2022-11-13T09:57:54Z

Merging. Only had to fix the logic for periodic gossiping, making it only send if there are updates to be sent; Without that the cluster was super chatty (tiny packets, but still...) without any reason.

caio force-pushed the next branch from 9b18fae to d96570c Compare November 9, 2022 11:52

caio mentioned this pull request Nov 11, 2022

Member can lose all memberships and never recover #15

Closed

caio added 2 commits November 11, 2022 10:51

extract choose_and_send method

950aebc

we'll be repeating the pattern of picking random members and sending a message to each

Ability to periodically announce to the cluster

3d0cfa4

This introduces Config::periodic_announce, which allows users to instrcut foca to periodically send a Announce message to random members so that it learns about every cluster member faster

caio force-pushed the next branch from d96570c to 3d0cfa4 Compare November 11, 2022 10:00

Ability to periodically gossip

9ca29d0

This introduces Config::periodic_gossip which instructs foca to disseminate cluster updates to random members with a fixed frequency so that the cluster learns new information faster periodic announce paved the way, so this was mostly copy-paste :-)

caio merged commit 8be5f04 into main Nov 13, 2022

caio deleted the next branch November 13, 2022 10:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Membership discovery and (self) awareness improvements #16

Membership discovery and (self) awareness improvements #16

caio commented Nov 9, 2022 •

edited

caio commented Nov 12, 2022

jeromegn commented Nov 12, 2022

caio commented Nov 13, 2022

caio commented Nov 13, 2022

Membership discovery and (self) awareness improvements #16

Membership discovery and (self) awareness improvements #16

Conversation

caio commented Nov 9, 2022 • edited

caio commented Nov 12, 2022

jeromegn commented Nov 12, 2022

caio commented Nov 13, 2022

caio commented Nov 13, 2022

caio commented Nov 9, 2022 •

edited