New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve performance of community followers inbox query #3482
Conversation
This is probably THE most important thing to add in 0.18.1. My remote DB is hitting close to 1Gbs of data transfers. |
179715d
to
bd98d2d
Compare
This comment was marked as abuse.
This comment was marked as abuse.
@@ -11,19 +17,37 @@ use lemmy_db_schema::{ | |||
|
|||
type CommunityFollowerViewTuple = (Community, Person); | |||
|
|||
sql_function!(fn coalesce(x: diesel::sql_types::Nullable<diesel::sql_types::Text>, y: diesel::sql_types::Text) -> diesel::sql_types::Text); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great if this could be made generic to work for multiple types, or better yet built into diesel.
Great find! I think the apub crate should generally not use any views, so there might be more to optimize. Edit: Federation tests recently keep failing in CI, but I confirmed that its passing locally for this PR. |
Just a few charts after deploying this + #3483: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, excellent find. These are the things we could never discover without more eyes on them than just us two. Thanks a ton!
We need to see what's going on with the federation tests failing.
i found a huge perf issue. get_follower_inboxes fetches all followers of a community (including thousands of copies of the whole damn community object, person object, and community_follower object) for every single action (e.g. like). it transfers that list to rust (like 20000 elements with huge objects each, including private keys and whatever). rust then finds all the federated instances it has to send the like to from that list (like 100 strings).
so basically the query uselessly transfers huge amounts of data just to get a list of instance_count urls.
on lemmy.world we looked at pg_stats_statements (thanks jelloeater) and that community_follower query absolutely dominates the stats, eating up 8 days of query time per day. Since yesterday, it has fetched over 4 billion rows.
that query is also timing out a lot due to the statement_timeout=10s we set.
This PR improves on the situation by doing the filtering on the database side. It's not optimal since the result can't be perfectly indexed, but it seems the main issue is the huge amount of data transfer between rust and PG. after some quick testing: the new query takes around 100ms and returns only tiny amounts of data. it should basically fix a huge amount of lemmys current perf problems.
i also replaced the second query in followers which was doing a similar amount of unnecessary work (but much more seldom).
i did not extensively test this for correctness.