New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Membership discovery and (self) awareness improvements #16
Conversation
tracing-attributes 0.1.23 ships with a smarter `Span::record` that allows owned values [^1], which makes This means that previously invalid code such as: span.record("num_updates", num_items); Is now valid, which makes clippy throw a warning for needless_borrow Fixing the lint, however, means that foca would require a strict version check for tracing-attributes (otherwise anything pulling 0.1.22 or bellow would not compile anymore). I don't want a strict dependency on tracing: it's tangential do foca and I'd like users to benefit from the improvements that might be shipped along, so I fix the lint warning by ignoring it instead. Once we get a (at least minor) version bump on `tracing`, this can be removed and updated to skip the borrow. [^1]: tokio-rs/tracing#2212
When a member is declared down by the cluster, its messages get ignored until the cluster "forgets" it went down (i.e.: after `Config::remove_down_after` expires). This patch introduces a new setting `Config::notify_down_members` which makes foca send a lightweight message to said members when they're down.
we'll be repeating the pattern of picking random members and sending a message to each
This introduces Config::periodic_announce, which allows users to instrcut foca to periodically send a Announce message to random members so that it learns about every cluster member faster
This introduces Config::periodic_gossip which instructs foca to disseminate cluster updates to random members with a fixed frequency so that the cluster learns new information faster periodic announce paved the way, so this was mostly copy-paste :-)
Good to go! Gonna leave this running for a while and inject a few failures to see if I can catch any obvious bugs before shipping (so maybe tomorrow). Being cautious mostly because of 9b9dc7b (notifying chatty dead members that they're down). It's a very simple attempt at fixing the awareness problem.. can't help but think I'm missing something really obvious that makes this a bad idea. |
Exciting! What do you mean by:
Is there something fuzzing or testing this continuously? :) |
Haha I wish! Nothing as sophisticated as that at present. What I meant was that I put the code on this branch live in my LAN cluster to see how it behaves. And by injecting failures I meant that my nodes have a little helper endpoint that lets me "drive" the cluster. Essentially, it takes a target address and does (maybe this is an interesting thing to do on your end, to see how the cluster behaves): let found = foca.iter_members().find(|m| m.addr == target.addr).cloned();
if let Some(member) = found {
let _ = foca.apply_many(
std::iter::once(Member::new(member, 0, foca::State::Down)),
&mut runtime,
);
} |
Technically, a gossip with no updates is still useful: it tells the receiver that the sender is active so it helps with discovery and, now, with the member gaining awareness that the cluster thinks its dead. It is, however, too chatty. During normal operations the updates buffer is mostly empty; If we start sending messages with no updates too often it won't be much different from a traditional gossip protocol with a heartbeat. So, since pings and announces already cover the case of talking to the cluster unconditionally, it makes sense to make periodic gossiping "smarter"
Merging. Only had to fix the logic for periodic gossiping, making it only send if there are updates to be sent; Without that the cluster was super chatty (tiny packets, but still...) without any reason. |
Going through features discussed on #15. I'll slowly be ticking the boxes as I complete each:
announce()
to increase the speed to discover the whole clustergossip()
, to speed up cluster update propagation