
[Architecture] Scaling federation throughput via message queues and workers #3230

Closed
4 tasks done
huanga opened this issue Jun 20, 2023 · 15 comments
Labels
area: federation, area: performance, enhancement

Comments

@huanga

huanga commented Jun 20, 2023

Requirements

  • Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a feature request? Do not put multiple feature requests in one issue.
  • Is this a UI / front end issue? Use the lemmy-ui repo.

(Checking that box even though this is not a front-end issue; I will not be using the lemmy-ui repo for this.)

Is your proposal related to a problem?

Larger instances are currently backed up on outbound federation. Lemmy.world has bumped the federation worker count up to 10,000, and outbound federation events are still reaching other servers outside the 10-second window. While I have no direct proof that this is the cause, the symptoms suggest that outbound federation events are backed up, either because there are simply too many messages to send (i.e. every "write" interaction = 1 event × however many federated servers there are), or worse, because previously federated servers have shut down and a federation worker must wait for a network timeout before it can proceed with sending the next federation event message.

Describe the solution you'd like.

Federation workers need to be scalable independently of the Lemmy back-end, perhaps managed via some sort of message bus/queue. Instead of immediately emitting an outgoing federation event message to the federated server, the event would be put into a message queue (for example, a separate RabbitMQ container). Separate, scalable federation worker containers, spun up across multiple servers/VMs or simply run as multiple instances, would then fetch the queued federation event messages and deliver them to the intended federated servers. This way, larger instances can run a fleet of workers that is not bound by per-server TCP "half-open" limits and can scale outbound federation delivery much better.
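As a rough illustration of the split (not a concrete design), here is a minimal in-process sketch in Rust where a Tokio channel stands in for the external broker; the type and field names are invented for the example:

```rust
// Sketch only: a Tokio channel stands in for an external broker (e.g. RabbitMQ).
// In the actual proposal the worker loop would run in separate, independently
// scalable containers pulling from that shared broker.
use std::time::Duration;
use tokio::sync::mpsc;

struct OutgoingActivity {
    target_inbox: String, // inbox URL of the federated server
    body: String,         // signed ActivityPub JSON payload
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<OutgoingActivity>(10_000);

    // Delivery worker: fetches queued events and sends them out. A per-request
    // timeout would keep one unreachable server from stalling everyone else.
    tokio::spawn(async move {
        while let Some(activity) = rx.recv().await {
            println!(
                "delivering {} bytes to {}",
                activity.body.len(),
                activity.target_inbox
            );
        }
    });

    // The Lemmy back-end itself only enqueues and returns immediately.
    tx.send(OutgoingActivity {
        target_inbox: "https://example.social/inbox".into(),
        body: "{}".into(),
    })
    .await
    .unwrap();

    // Give the worker a moment to drain the queue (demo only).
    tokio::time::sleep(Duration::from_millis(100)).await;
}
```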

Describe alternatives you've considered.

As a temporary measure, increasing the 10-second expiration window might help these larger communities scale a little further before a more advanced solution such as the one described above becomes necessary.

Additional context

No response

huanga added the enhancement label on Jun 20, 2023
@huanga
Author

huanga commented Jun 21, 2023

I see a change in activitypub-federation-rust (line 70) which may increase the signature duration to 5 minutes. That change is welcome and should help alleviate the current delay issue. I don't know whether it will suffice if the bulk of Reddit migrates to Lemmy, so I'll leave this issue open for now and we can re-evaluate once we see more growth in the network.

@RocketDerp

This comment was marked as abuse.

@huanga
Author

huanga commented Jun 21, 2023

Thank you for chiming in!

I agree, we need queues. And just so I'm clear: I'm not prescribing a specific solution such as RabbitMQ, only that we need a separate, independently scalable component. The separation is important in my mind, as it allows for independent maintenance windows and scaling adjustments.

Also, I'm glad to see I'm not the only one thinking about this! I'll try to add more thoughts on your Lemmy post instead of cluttering things up here.

@XtremeOwnageDotCom

@huanga check out my proposal too: #3245

I think the best way to help with this issue is to reduce the exponential growth of federation traffic that comes from using a full-mesh topology. It just doesn't scale, and growth is not going to slow down anytime soon.

@thegabriele97

Another solution I thought about: suppose an instance has to send an update to 10k other federated instances. It says, "OK, instance #1, instance #2, ... instance #10, this is the update you have to forward to these instances," handing each one a share of the recipient list.

This way the main instance doesn't need to make 10k HTTP requests, only a fraction of them (1k in this example), and the instances the main instance designates as "forward nodes" can do the same thing with their own federated instances or deliver everything directly, depending on a logic like a threshold (if > 1000, split; if < 1000, send). So you get a kind of "job scheduling" between instances.
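To make the threshold idea concrete, here is a hypothetical sketch of the split-or-send decision in Rust; the constants, the chunking scheme, and the function names are all invented for the example, not an existing Lemmy API:

```rust
// Hypothetical fan-out: below the threshold, deliver directly; above it, split
// the recipient list into chunks and delegate each chunk to a "forward node".
const SPLIT_THRESHOLD: usize = 1_000; // "if > 1000, split; if < 1000, send"
const CHUNK_SIZE: usize = 10;         // recipients handed to each forward node

fn fan_out(recipients: &[String]) {
    if recipients.len() < SPLIT_THRESHOLD {
        // Small list: just send everything ourselves.
        for inbox in recipients {
            deliver_directly(inbox);
        }
    } else {
        for chunk in recipients.chunks(CHUNK_SIZE) {
            // The first instance in each chunk acts as the forward node and is
            // asked to relay the update to the rest of its chunk.
            delegate_to_forward_node(&chunk[0], &chunk[1..]);
        }
    }
}

fn deliver_directly(_inbox: &str) { /* send the activity ourselves */ }
fn delegate_to_forward_node(_node: &str, _forward_to: &[String]) { /* ask the node to relay */ }

fn main() {
    let recipients: Vec<String> = (0..10_000)
        .map(|i| format!("https://instance{i}.example/inbox"))
        .collect();
    // 10k recipients -> roughly 1k delegation requests with CHUNK_SIZE = 10.
    fan_out(&recipients);
}
```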

@phiresky
Collaborator

phiresky commented Jun 23, 2023

Hey! Since no one has mentioned it I wanted to say that the outgoing federation logic has been replaced with a different, hopefully better queue here: LemmyNet/activitypub-federation-rust#42

> Instead of immediately emitting an outgoing federation event message to the federated server

With debug mode off it was already a (suboptimal) queue in 0.17, and in 0.18 it should be a pretty efficient (in-memory) queue.

I'd also say that right now there's (probably) a lot of low-hanging fruit still to be picked before moving to much more complex systems that involve horizontal scaling across multiple servers. Of course that also has to happen at some point, but it's a huge amount of effort compared to the improvements that can still be made to single-instance scaling.

If you just move to an external message queue, you're not actually going to reduce the amount of work that has to be done per unit of time. Rust with an in-memory queue (if written well) will already reach the maximum scalability you can get out of one server.

I think the problems are mostly

  • Server operators can barely tell what's happening (partially because the code doesn't report much about what it's doing).

  • The code still has low-hanging fruit that should be picked before adding more new solutions.

    Many of the things server operators are reporting can't be explained just by the server hardware and general architecture being the limit. You can easily do 100+ inserts per second into a single table in PostgreSQL; if inserts take 0.4 seconds on average, something is very wrong somewhere else.

    Stuff like the long-duration locks from the hot_rank batch updates that locked all comments at the same time, or some part being single-threaded just because someone forgot to turn off the debug flag for weeks.

  • ActivityPub in general just scales horribly. I don't think there's a real way around this.

    If you have 100 federated servers, then for a single post with 100 comments, each with 100 votes, you get around 100 * 100 * 100 = 1 million outgoing POST requests, plus 10k incoming requests (each with at least one SQL query) for every one of those servers. The main things that could be done here are:
    1. send votes as aggregates (which makes faking votes much easier)
    2. prioritize the outgoing queue so that if it grows past a certain size it starts dropping items based on priority (dropping vote updates should reduce the load by at least 10x, from what I understand); see the sketch below this list
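As an illustration of point 2, here is a minimal sketch of a bounded outgoing queue that sheds the lowest-priority activities (votes) once it grows past a limit; this is not Lemmy's actual queue, and all names are invented:

```rust
// Sketch of priority-based shedding: once the queue is over its limit, queued
// votes are evicted first so higher-priority activities still get delivered.
use std::collections::VecDeque;

#[derive(PartialEq)]
enum Priority {
    Vote,  // the first thing we are willing to drop
    Other, // comments, posts, edits, etc.
}

struct Outgoing {
    priority: Priority,
    payload: String,
}

struct ShedQueue {
    items: VecDeque<Outgoing>,
    max_len: usize,
}

impl ShedQueue {
    fn push(&mut self, item: Outgoing) {
        if self.items.len() >= self.max_len {
            if let Some(pos) = self.items.iter().position(|i| i.priority == Priority::Vote) {
                // Over the limit: evict a queued vote to make room.
                self.items.remove(pos);
            } else if item.priority == Priority::Vote {
                // Queue is full of higher-priority work; drop the incoming vote.
                return;
            } else {
                // Last resort: drop the oldest item.
                self.items.pop_front();
            }
        }
        self.items.push_back(item);
    }
}

fn main() {
    let mut queue = ShedQueue { items: VecDeque::new(), max_len: 3 };
    for i in 0..5 {
        queue.push(Outgoing { priority: Priority::Vote, payload: format!("vote {i}") });
    }
    queue.push(Outgoing { priority: Priority::Other, payload: "comment".into() });
    // The comment survives; older votes were shed to stay under the limit.
    let payloads: Vec<&String> = queue.items.iter().map(|i| &i.payload).collect();
    println!("queued: {payloads:?}");
}
```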

@phiresky
Collaborator

phiresky commented Jun 23, 2023

Just as another example of low-hanging fruit that we should pick before looking at exciting and complex new systems: in 0.17 the PEM key was loaded and parsed for every outgoing HTTP request (so basically a million times for a single popular post). The Rust code also does a ton of unnecessary SQL selects that it then doesn't even use.
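Purely for illustration, the fix for that kind of issue is essentially "parse once and cache"; `parse_private_key` and `PrivateKey` below are hypothetical stand-ins, not the actual activitypub-federation-rust code:

```rust
// Sketch: cache the parsed signing key so it is parsed once per process
// instead of once per outgoing HTTP request.
use std::sync::OnceLock;

struct PrivateKey(String); // stand-in for parsed key material

// Hypothetical stand-in for the expensive PEM parse.
fn parse_private_key(pem: &str) -> PrivateKey {
    PrivateKey(pem.trim().to_string())
}

static SIGNING_KEY: OnceLock<PrivateKey> = OnceLock::new();

// The first caller pays the parsing cost; every later request reuses the cached key.
fn signing_key(pem: &str) -> &'static PrivateKey {
    SIGNING_KEY.get_or_init(|| parse_private_key(pem))
}

fn main() {
    let pem = "-----BEGIN PRIVATE KEY----- ... -----END PRIVATE KEY-----";
    for _ in 0..1_000_000 {
        // Parsed exactly once, no matter how many requests get signed.
        let _key = signing_key(pem);
    }
    println!("cached key length: {}", signing_key(pem).0.len());
}
```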

So I'd encourage everyone to work on improving the plumbing (and tooling) before thinking about adding new complexity.

@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@XtremeOwnageDotCom

@RocketDerp I am going to assume it's being called from this code: https://github.com/LemmyNet/lemmy/blob/main/crates/db_views_actor/src/person_view.rs

And I am going to assume that when federation messages come in for new comments/posts/edits/etc., the database needs to look up the actor ID to translate it into a local person ID.

I'd say the best way to mitigate that load is to bundle Redis with Lemmy and use it to cache actor_ids and other IDs, taking that load off the database.

Especially for instances processing hundreds of thousands of federation messages in a short time: Redis scales well and could excel at that, and it's very easy to get up and running. A sketch of the idea follows.
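Something along these lines, purely as a sketch and assuming the redis crate's synchronous API; the key format, TTL, and function names are made up, and the real Lemmy lookup would of course differ:

```rust
// Sketch: cache actor_id -> local person id lookups in Redis so repeated
// inbound activities from the same actor skip the SQL round trip.
use redis::Commands;

fn resolve_person_id(
    con: &mut redis::Connection,
    actor_id: &str,
    lookup_in_db: impl Fn(&str) -> Option<i64>, // stand-in for the SQL query
) -> Option<i64> {
    let key = format!("actor:{actor_id}");

    // Fast path: check the cache first.
    if let Ok(Some(id)) = con.get::<_, Option<i64>>(&key) {
        return Some(id);
    }

    // Slow path: hit Postgres, then populate the cache with a TTL.
    let id = lookup_in_db(actor_id)?;
    let _: Result<(), _> = con.set_ex(&key, id, 600); // cache for 10 minutes
    Some(id)
}

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1/")?;
    let mut con = client.get_connection()?;
    // Fake DB lookup for the example; the real one would be the person_view query.
    let fake_db = |_actor: &str| Some(42_i64);
    let id = resolve_person_id(&mut con, "https://example.social/u/someone", fake_db);
    println!("resolved person id: {id:?}");
    Ok(())
}
```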

@huanga
Author

huanga commented Jun 27, 2023

I don't know if we'd necessarily need Redis at this point, since the query should be fairly light for now... probably six figures' worth of person ID records, which Postgres should be able to handle easily. Long term, though, Redis would be very useful as a query cache layer for all frequent queries, especially busy threads on public instances.

@RocketDerp

This comment was marked as abuse.

@phiresky
Collaborator

I think since the main devs are pretty overloaded, your best bet will be to just start with small, "inoffensive" PRs and go from there. E.g. just add a really minimal admin API endpoint with the stats you mentioned; I think they would merge that.

@RocketDerp

This comment was marked as abuse.

lionirdeadman added the area: federation and area: performance labels on Jul 24, 2023
@Nutomic
Member

Nutomic commented Sep 28, 2023

Should be resolved with #3605

Nutomic closed this as completed on Sep 28, 2023