
Lots of CPU time spent sorting broadcast bytes #11

Closed
jeromegn opened this issue Oct 14, 2022 · 7 comments

@jeromegn

I'm not sure exactly where the time is going, but most of my project's CPU time is currently being spent sorting something related to broadcast messages.

Flamegraph: corro.svg.zip

Looks like it goes like this:

  • Foca::broadcast
  • Foca::send_message
  • Broadcasts::fill
  • core::slice::sort::recurse
  • core::slice::sort::partial_insertion_sort
  • PartialOrd::lt

I have a loop that calls Foca::broadcast every 200ms to ensure maximum gossip propagation of broadcast messages.

@caio commented Oct 17, 2022

Hah I half expected this to come up 😬

The problem is that custom broadcasts use the same mechanism used for cluster broadcasts: it's a pretty dumb sorted array.

SWIM can get by with this inefficient approach because broadcasts are bounded by the cluster size and are tiny (it's just cluster updates, with invalidation keyed by member identity). When I was hacking on it, what I wanted was a binary heap with retain capabilities, but I didn't want to jump on nightly and was too lazy to roll my own, so I ended up with what we have now heh

One way forward is making Broadcasts better. The API surface is small and the requirements are clear:

  • add_or_replace must be able to remove items based on some invalidation logic
  • fill must fill the given byte slice starting from the least-transmitted, largest broadcast (this is what made me want a heap: num_transmissions as the objective, len(broadcast) to break ties; see the sketch right after this list)
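
To make it concrete, here's roughly the shape I have in mind (just a sketch, not what foca does today: `Entry`, its `key` field, and the `fill` signature are all made up for illustration, and `BinaryHeap::retain` is the nightly-only bit I mentioned, later stabilized in Rust 1.70):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Made-up broadcast entry: `key` stands in for whatever the
// invalidation logic compares (e.g. a member identity).
struct Entry {
    key: u64,
    payload: Vec<u8>,
    num_transmissions: u32,
}

impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Entry {}
impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Entry {
    // BinaryHeap is a max-heap, so "greater" pops first: fewest
    // transmissions wins, larger payload breaks ties.
    fn cmp(&self, other: &Self) -> Ordering {
        other
            .num_transmissions
            .cmp(&self.num_transmissions)
            .then_with(|| self.payload.len().cmp(&other.payload.len()))
    }
}

#[derive(Default)]
struct Broadcasts {
    heap: BinaryHeap<Entry>,
}

impl Broadcasts {
    // Drop anything invalidated by the new entry, then queue it.
    fn add_or_replace(&mut self, entry: Entry) {
        // Nightly-only when this was written; stabilized in Rust 1.70.
        self.heap.retain(|e| e.key != entry.key);
        self.heap.push(entry);
    }

    // Append as many broadcasts as fit, least-transmitted/largest first.
    fn fill(&mut self, buf: &mut Vec<u8>, max_len: usize) {
        let mut kept = Vec::new();
        while let Some(mut entry) = self.heap.pop() {
            if buf.len() + entry.payload.len() <= max_len {
                buf.extend_from_slice(&entry.payload);
                entry.num_transmissions += 1;
            }
            kept.push(entry);
        }
        // Push everything back; the heap re-establishes its order.
        self.heap.extend(kept);
    }
}
```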

Another would be to disconnect the custom broadcasts from the main interaction (I think I mentioned this one before).

The first alternative is a lot easier, I think.

I don't think I'll be looking at this until at least the holiday season in December, but I'm happy to receive/review pull requests if you wanna take a stab at it 😁

@jeromegn

That makes sense.

I think I noticed this wasn't a problem when I was using a larger max payload size, transmitted via TCP. Would that have made a difference?

For reference: my cluster size is now ~300 nodes.

> Another would be to disconnect the custom broadcasts from the main interaction (I think I mentioned this one before).

This might be easier on our end! I could skip the Broadcast stuff from foca entirely and just randomly send messages to other nodes.

> I don't think I'll be looking at this until at least the holiday season in December, but I'm happy to receive/review pull requests if you wanna take a stab at it 😁

All good. I just might take a stab at it, yes! I'll have more questions if I do.

@jeromegn

I ended up doing my own broadcasts, handling messages with logic similar to the BroadcastHandler trait. I'm using the config's number of indirect probes to know how many members I should broadcast to, and rand's choose_multiple to pick them from all active members.

That solved my CPU issues!
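
For reference, the member-picking part is basically just this (a rough sketch with rand 0.8; `active_members` and the address type come from my own code, not foca's API):

```rust
use rand::seq::SliceRandom;
use std::net::SocketAddr;

// Pick up to `num_indirect_probes` distinct members at random to
// broadcast to; `active_members` is maintained from cluster updates.
fn pick_broadcast_targets(
    active_members: &[SocketAddr],
    num_indirect_probes: usize,
) -> Vec<SocketAddr> {
    let mut rng = rand::thread_rng();
    active_members
        .choose_multiple(&mut rng, num_indirect_probes)
        .copied()
        .collect()
}
```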

@caio commented Oct 22, 2022

Sweet! I think I over-promised a bit on the general purpose broadcasting feature 😅

I think when optimizing this sort of broadcast you might end up wanting some fancier way of prioritizing which packets to send first, maybe even some way of feeding back information about the current known global state so you can prune/reorder the queue... That's why I think the way to go is disconnecting broadcasts from foca and just exposing some functionality to enable the use case.

Off the top of my head, I could expose choose_active_members (without the unneeded &mut self) to make picking members easier; but as you've noticed, rand's choose_multiple on the active members does the job just as well... Let me know if there's more stuff foca could expose to make this easier for you.

@jeromegn

Yeah, we've ended up implementing some logic so that certain broadcast messages go out immediately instead of being buffered up with others.

Did foca do anything about re-broadcasting when a member joined after the initial buffer had been broadcast? I guess it would behave like that if it didn't have anybody to broadcast to? In my new logic, I'm not even retrying broadcasts if there are no members to send them to. I assume foca tried for a while until it could send its data to as many members as num_indirect_probes?

@caio commented Oct 27, 2022

That's right, foca doesn't try to do anything smart. It simply buffers a broadcast and tries to send it Config::max_transmissions times; it doesn't keep track of who it sent to, so if there's a single active member in the cluster, they'll keep receiving the same broadcast over and over until the counter goes to zero.
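
As a toy illustration of that counter (made-up types, mirroring what Config::max_transmissions controls):

```rust
// A queued broadcast carries a countdown seeded from the configured
// maximum number of transmissions.
struct Queued {
    payload: Vec<u8>,
    remaining_transmissions: u32,
}

// After each send, decrement; once the counter hits zero the
// broadcast is forgotten, regardless of who actually received it.
fn after_send(mut q: Queued) -> Option<Queued> {
    q.remaining_transmissions -= 1;
    (q.remaining_transmissions > 0).then_some(q)
}
```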

@caio commented Mar 7, 2024

closing stale issues that seem resolved. feel free to reopen

caio closed this as completed Mar 7, 2024