Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Posts List Performance + cursor-based pagination #3872

Merged
merged 25 commits into from Sep 18, 2023

Conversation

phiresky
Copy link
Collaborator

@phiresky phiresky commented Aug 12, 2023

This PR implements two things:

  1. Cursor-based pagination for the /posts/list API request. A new opaque parameter page_v2 is added that fetches the next page based on the previous one.
    The request now looks like curl 'http://domain/api/v3/post/list?page_v2=Pf54e&limit=20&sort=Hot&...
    The response now looks like this:

    {
      "posts": [...],
      "next_page": "Pa46c"
    }

    Cursor-based pagination is optimized so each page only fetches ~LIMIT number of posts from the database. This is not possible with OFFSET-based pagination. Right now this works alongside the page= parameter, but this pagination method should (imo) completely replace the existing OFFSET-based pagination (with the page= parameter being deprecated). There's many articles on the internet about why offset-based pagination is bad, notably this one referencing multiple posts from MariaDB, Slack, Shopify engineering. Right now it only supports forward-paging. It is possible to implement backwards pagination like this but I don't see a reason to right now.

  2. Performant ordering and limiting for the subscribed page (see Slow SQL queries #2877 (comment)).

    This works as follows: First, fetch a page for a very active followed community. Then, use that to add a limit to the real suscribed fetch. More detail about why this works is in a code comment. This should hopefully remove all slowness and weird query plans from the subscribed post list query.

Current issues:

  • fixed Off-by-one-error: page_after actually includes one post from the previous page
  • the estimation query currently uses wrong filters (featured_community instead of featured_local)
  • code is a bit ugly

Future work:

  • Local view: the "local" view is still slow but could be optimized in the same way. this is relevant for smaller sites (e.g. < 10% of posts are local). otherwise it doesn't matter since the view fetches mostly the same data as the All view.

@phiresky phiresky mentioned this pull request Aug 12, 2023
@phiresky
Copy link
Collaborator Author

Fixed the relevant issues. ready for review

@phiresky phiresky marked this pull request as ready for review August 12, 2023 12:28
@phiresky
Copy link
Collaborator Author

In theory, this PR should work with the legacy OFFSET based-pagination (page=X param) without breaking any functionality and while still giving significant perf improvements. I have not really tested that though.

@phiresky
Copy link
Collaborator Author

Here is a diagram showing how the ALL fetches should work:

@phiresky
Copy link
Collaborator Author

Here's a diagram showing how the Subscribed post fetch will work and why it's currently slow:

@RocketDerp

This comment was marked as abuse.

@phiresky
Copy link
Collaborator Author

But there is always going to be some fringe case where a person has essentially blocked almost everything and runs out of data

With my approach (as detailed above), this shouldn't be an issue.

@@ -185,6 +186,34 @@ fn queries<'a>() -> Queries<
};

let list = move |mut conn: DbConn<'a>, options: PostQuery<'a>| async move {
macro_rules! order_and_page_filter_desc {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i've tried to do this without a macro and without copy-pasting this code 10 times, but I couldn't figure out the types for diesel if this was a lambda or function. I'm not sure if it would even be possible, even with BoxedQuery..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd almost rather this be copy-pasted around, if even some parts of it could be extracted into functions, rather than using macros. @dullbananas might be able to help with diesel function return types.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of query should be inferrable if a closure is used.

The type of the column is tricky because columns with multiple different sql types are used. If only one sql type was needed, then BoxableExpression could be used. And closures can't be generic.

Copy link
Collaborator Author

@phiresky phiresky Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using a closure, but even then you need to be able to name the return type and some arg types and I couldn't manage to figure that out..

Also I think it has to be generic which as you said isn't possible

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function like this could probably be used

fn order_and_page_filter_desc<Q, C, T>(
  query: &mut Q,
  column: C,
  options: &PostQuery,
  getter: impl Fn(&PostAggregates) -> T,
)
where
  Q: ThenOrderDsl<dsl::Desc<C>, Output = Q> + FilterDsl<dsl::GtEq<C, T>, Output = Q> + FilterDsl<dsl::GtEq<C, T>, Output = Q>,
  C: Expression,
  C::SqlType: SingleValue + SqlType,
  T: AsExpression<C::SqlType>,

Usage:

order_and_page_filter_desc(&mut query, post_aggregates::featured_local, &options, |a| a.featured_local);

The ascending version would use dsl::Asc instead of dsl::Desc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*query = query.then_order_by(column.desc()); seems to not work because of cannot move out of *querywhich is behind a mutable reference move occurs because*queryhas typeQ, which does not implement the Copy` trait

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the &mut and make the function return the same type used for the query parameter

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll be a great addition, a few thoughts tho:

  • Should this just work based on a simple post_id number, rather than easily reverse engineerable hash functions. See comment below.
  • If cursor based pagination works as simply as where id > cursor and (more filters) LIMIT X, I don't see why specifying upper bounds are necessary.
  • Needs some tests to make sure pagination is working correctly.
  • Should just be a breaking change, rather than supporting the old method.

crates/api_common/src/post.rs Outdated Show resolved Hide resolved
crates/db_views/src/post_view.rs Show resolved Hide resolved
crates/apub/src/api/list_posts.rs Show resolved Hide resolved
crates/db_views/src/post_view.rs Outdated Show resolved Hide resolved
@@ -185,6 +186,34 @@ fn queries<'a>() -> Queries<
};

let list = move |mut conn: DbConn<'a>, options: PostQuery<'a>| async move {
macro_rules! order_and_page_filter_desc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd almost rather this be copy-pasted around, if even some parts of it could be extracted into functions, rather than using macros. @dullbananas might be able to help with diesel function return types.

crates/db_views/src/post_view.rs Outdated Show resolved Hide resolved
crates/db_views/src/post_view.rs Show resolved Hide resolved
crates/apub/src/api/list_posts.rs Outdated Show resolved Hide resolved
crates/db_views/src/post_view.rs Show resolved Hide resolved
crates/db_views/src/post_view.rs Show resolved Hide resolved
@dessalines
Copy link
Member

I'll test this with some production data now.

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its testing out fine, but the upper-bound stuff is really difficult to wrap my head around, as I'm sure it will be for anyone looking at this code.

We desperately need more DB experts to look at this to make sure that its the best way of doing this, but I won't let that hold up this PR; it'll have to come at a later date. I especially wish @johanndt could have a look, but they don't seem to be around.

I'd also really prefer that the filtering and sorting macro be removed, so @dullbananas if you could attempt that, before we merge, it'd be really helpful. Otherwise, I spose we have no choice.


CREATE INDEX idx_post_aggregates_community_hot ON post_aggregates (community_id, featured_local DESC, hot_rank DESC, published DESC);

CREATE INDEX idx_post_aggregates_community_scaled ON post_aggregates (community_id, featured_local DESC, scaled_rank DESC, published DESC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for adding these ones too.

diesel migration redo worked well also.

@phiresky
Copy link
Collaborator Author

I've managed to remove the macro with the help of what dullbananas wrote above.

Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works well with local production data. The only minor issue I see is that the deprecated page doesn't work with subscribed, but that's NBD.

@Nutomic
Copy link
Member

Nutomic commented Sep 18, 2023

Is this good to merge?

@phiresky
Copy link
Collaborator Author

It is from my side

@Nutomic Nutomic merged commit 544d30f into LemmyNet:main Sep 18, 2023
1 check passed
@dessalines
Copy link
Member

I'll start to add all these new features into lemmy-ui in the next few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants