
Lots of CPU time spent sorting broadcast bytes #11

Closed
jeromegn opened this issue Oct 14, 2022 · 7 comments

@jeromegn

I'm not sure exactly where the time is going, but most of my project's CPU time is currently being spent sorting something related to broadcast messages.

Flamegraph: corro.svg.zip

Looks like it goes like this:

  • Foca::broadcast
  • Foca::send_message
  • Broadcasts::fill
  • core::slice::sort::recurse
  • core::slice::sort::partial_insertion_sort
  • PartialOrd::lt

I have a loop that calls Foca::broadcast every 200ms to ensure maximum gossip propagation of broadcast messages.

@caio commented Oct 17, 2022

Hah I half expected this to come up 😬

The problem is that custom broadcasts use the same mechanism used for cluster broadcasts: it's a pretty dumb sorted array.

SWIM can get by with this inefficient approach because broadcasts are bounded by the cluster size and are tiny (it's just cluster updates, with invalidation keyed by member identity). When I was hacking on it, what I wanted was a binary heap with retain capabilities, but I didn't want to jump on nightly and was too lazy to roll my own, so I ended up with what we have now heh

One way forward is making Broadcasts better. The API surface is small and the requirements are clear:

  • add_or_replace must be able to remove items based on some invalidation logic
  • fill must fill the given byte slice starting from the least-transmitted, largest broadcast (this is what made me want a heap: num_transmissions as the objective, len(broadcast) to break ties; see the sketch right after this list)
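
To make it concrete, here's roughly the shape I have in mind (just a sketch, not what foca does today: `Entry`, its `key` field, and the `fill` signature are all made up for illustration, and `BinaryHeap::retain` is the nightly-only bit I mentioned, later stabilized in Rust 1.70):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Made-up broadcast entry: `key` stands in for whatever the
// invalidation logic compares (e.g. a member identity).
struct Entry {
    key: u64,
    payload: Vec<u8>,
    num_transmissions: u32,
}

impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Entry {}
impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Entry {
    // BinaryHeap is a max-heap, so "greater" pops first: fewest
    // transmissions wins, larger payload breaks ties.
    fn cmp(&self, other: &Self) -> Ordering {
        other
            .num_transmissions
            .cmp(&self.num_transmissions)
            .then_with(|| self.payload.len().cmp(&other.payload.len()))
    }
}

#[derive(Default)]
struct Broadcasts {
    heap: BinaryHeap<Entry>,
}

impl Broadcasts {
    // Drop anything invalidated by the new entry, then queue it.
    fn add_or_replace(&mut self, entry: Entry) {
        // Nightly-only when this was written; stabilized in Rust 1.70.
        self.heap.retain(|e| e.key != entry.key);
        self.heap.push(entry);
    }

    // Append as many broadcasts as fit, least-transmitted/largest first.
    fn fill(&mut self, buf: &mut Vec<u8>, max_len: usize) {
        let mut kept = Vec::new();
        while let Some(mut entry) = self.heap.pop() {
            if buf.len() + entry.payload.len() <= max_len {
                buf.extend_from_slice(&entry.payload);
                entry.num_transmissions += 1;
            }
            kept.push(entry);
        }
        // Push everything back; the heap re-establishes its order.
        self.heap.extend(kept);
    }
}
```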

Another would be to disconnect the custom broadcasts from the main interaction (I think I mentioned this one before).

The first alternative is a lot easier, I think.

I don't think I'll be looking at this until at least the holiday season in December, but I'm happy to receive/review pull requests if you wanna take a stab at it 😁

@jeromegn

That makes sense.

I think I noticed this wasn't a problem when I was using a larger max payload size, transmitted via TCP. Would that have made a difference?

For reference: my cluster size is now ~300 nodes.

> Another would be to disconnect the custom broadcasts from the main interaction (I think I mentioned this one before).

This might be easier on our end! I could skip the Broadcast stuff from foca entirely and just randomly send messages to other nodes.

> I don't think I'll be looking at this until at least the holiday season in December, but I'm happy to receive/review pull requests if you wanna take a stab at it 😁

All good. I just might take a stab at it, yes! I'll have more questions if I do.

@jeromegn

I ended up doing my own broadcasts, handling messages with logic similar to the BroadcastHandler trait. I'm using the config's number of indirect probes to know how many members I should broadcast to, and rand's choose_multiple to pick them from all active members.

That solved my CPU issues!
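
For reference, the member-picking part is basically just this (a rough sketch with rand 0.8; `active_members` and the address type come from my own code, not foca's API):

```rust
use rand::seq::SliceRandom;
use std::net::SocketAddr;

// Pick up to `num_indirect_probes` distinct members at random to
// broadcast to; `active_members` is maintained from cluster updates.
fn pick_broadcast_targets(
    active_members: &[SocketAddr],
    num_indirect_probes: usize,
) -> Vec<SocketAddr> {
    let mut rng = rand::thread_rng();
    active_members
        .choose_multiple(&mut rng, num_indirect_probes)
        .copied()
        .collect()
}
```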

@caio commented Oct 22, 2022

Sweet! I think I over-promised a bit on the general purpose broadcasting feature 😅

I think when optimizing this sort of broadcast you might end up wanting some fancier way of prioritizing which packets to send first, maybe even some way of feeding back information about the current known global state so you can prune/reorder the queue... That's why I think the way to go is disconnecting broadcasts from foca and just exposing some functionality to enable the use case.

Off the top of my head, I could expose choose_active_members (without the unneeded &mut self) to make picking members easier; but as you've noticed, rand's choose_multiple on the active members does the job just as well... Let me know if there's more stuff foca could expose to make this easier for you.

@jeromegn

Yeah, we've ended up implementing some logic so that certain broadcast messages go out immediately instead of being buffered up with others.

Did foca do anything about re-broadcasting when a member joined after the initial buffer had been broadcast? I guess it would behave like that if it didn't have anybody to broadcast to? In my new logic, I'm not even retrying broadcasts if there are no members to send them to. I assume foca tried for a while until it could send its data to as many members as num_indirect_probes?

@caio commented Oct 27, 2022

That's right, foca doesn't try to do anything smart. It simply buffers a broadcast and tries to send it Config::max_transmissions times; it doesn't keep track of who it sent to, so if there's a single active member in the cluster, they'll keep receiving the same broadcast over and over until the counter goes to zero.
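
As a toy illustration of that counter (made-up types, mirroring what Config::max_transmissions controls):

```rust
// A queued broadcast carries a countdown seeded from the configured
// maximum number of transmissions.
struct Queued {
    payload: Vec<u8>,
    remaining_transmissions: u32,
}

// After each send, decrement; once the counter hits zero the
// broadcast is forgotten, regardless of who actually received it.
fn after_send(mut q: Queued) -> Option<Queued> {
    q.remaining_transmissions -= 1;
    (q.remaining_transmissions > 0).then_some(q)
}
```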

@caio commented Mar 7, 2024

closing stale issues that seem resolved. feel free to reopen

caio closed this as completed Mar 7, 2024