Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Denormalize community_id into post_aggregates for a 1000x speed-up when loading posts #3653

Merged
merged 6 commits into from
Jul 20, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions crates/db_schema/src/aggregates/structs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,8 @@ pub struct PostAggregates {
pub featured_local: bool,
pub hot_rank: i32,
pub hot_rank_active: i32,
pub community_id: CommunityId,
pub creator_id: PersonId,
}

#[derive(PartialEq, Eq, Debug, Serialize, Deserialize, Clone)]
Expand Down
4 changes: 4 additions & 0 deletions crates/db_schema/src/schema.rs
Original file line number Diff line number Diff line change
Expand Up @@ -672,6 +672,8 @@ diesel::table! {
featured_local -> Bool,
hot_rank -> Int4,
hot_rank_active -> Int4,
community_id -> Int4,
creator_id -> Int4,
}
}

Expand Down Expand Up @@ -908,6 +910,8 @@ diesel::joinable!(person_post_aggregates -> post (post_id));
diesel::joinable!(post -> community (community_id));
diesel::joinable!(post -> language (language_id));
diesel::joinable!(post -> person (creator_id));
diesel::joinable!(post_aggregates -> community (community_id));
diesel::joinable!(post_aggregates -> person (creator_id));
diesel::joinable!(post_aggregates -> post (post_id));
diesel::joinable!(post_like -> person (person_id));
diesel::joinable!(post_like -> post (post_id));
Expand Down
2 changes: 2 additions & 0 deletions crates/db_views/src/post_report_view.rs
Original file line number Diff line number Diff line change
Expand Up @@ -470,6 +470,8 @@ mod tests {
featured_local: false,
hot_rank: 1728,
hot_rank_active: 1728,
community_id: inserted_post.community_id,
creator_id: inserted_post.creator_id,
},
resolver: None,
};
Expand Down
50 changes: 26 additions & 24 deletions crates/db_views/src/post_view.rs
sunaurus marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -73,56 +73,56 @@ impl PostView {

// The left join below will return None in this case
let person_id_join = my_person_id.unwrap_or(PersonId(-1));
let mut query = post::table
.find(post_id)
let mut query = post_aggregates::table
.filter(post_aggregates::post_id.eq(post_id))
.inner_join(person::table)
.inner_join(community::table)
.left_join(
community_person_ban::table.on(
post::community_id
post_aggregates::community_id
.eq(community_person_ban::community_id)
.and(community_person_ban::person_id.eq(post::creator_id)),
.and(community_person_ban::person_id.eq(post_aggregates::creator_id)),
),
)
.inner_join(post_aggregates::table)
.inner_join(post::table)
.left_join(
community_follower::table.on(
post::community_id
post_aggregates::community_id
.eq(community_follower::community_id)
.and(community_follower::person_id.eq(person_id_join)),
),
)
.left_join(
post_saved::table.on(
post::id
post_aggregates::post_id
.eq(post_saved::post_id)
.and(post_saved::person_id.eq(person_id_join)),
),
)
.left_join(
post_read::table.on(
post::id
post_aggregates::post_id
.eq(post_read::post_id)
.and(post_read::person_id.eq(person_id_join)),
),
)
.left_join(
person_block::table.on(
post::creator_id
post_aggregates::creator_id
.eq(person_block::target_id)
.and(person_block::person_id.eq(person_id_join)),
),
)
.left_join(
post_like::table.on(
post::id
post_aggregates::post_id
.eq(post_like::post_id)
.and(post_like::person_id.eq(person_id_join)),
),
)
.left_join(
person_post_aggregates::table.on(
post::id
post_aggregates::post_id
sunaurus marked this conversation as resolved.
Show resolved Hide resolved
.eq(person_post_aggregates::post_id)
.and(person_post_aggregates::person_id.eq(person_id_join)),
),
Expand Down Expand Up @@ -217,62 +217,62 @@ impl<'a> PostQuery<'a> {
let person_id_join = self.local_user.map(|l| l.person_id).unwrap_or(PersonId(-1));
let local_user_id_join = self.local_user.map(|l| l.id).unwrap_or(LocalUserId(-1));

let mut query = post::table
let mut query = post_aggregates::table
.inner_join(person::table)
.inner_join(post::table)
.inner_join(community::table)
.left_join(
community_person_ban::table.on(
sunaurus marked this conversation as resolved.
Show resolved Hide resolved
post::community_id
post_aggregates::community_id
.eq(community_person_ban::community_id)
.and(community_person_ban::person_id.eq(post::creator_id)),
.and(community_person_ban::person_id.eq(post_aggregates::creator_id)),
),
)
.inner_join(post_aggregates::table)
.left_join(
community_follower::table.on(
post::community_id
post_aggregates::community_id
.eq(community_follower::community_id)
.and(community_follower::person_id.eq(person_id_join)),
),
)
.left_join(
post_saved::table.on(
post::id
post_aggregates::post_id
.eq(post_saved::post_id)
.and(post_saved::person_id.eq(person_id_join)),
),
)
.left_join(
post_read::table.on(
post::id
post_aggregates::post_id
.eq(post_read::post_id)
.and(post_read::person_id.eq(person_id_join)),
),
)
.left_join(
person_block::table.on(
post::creator_id
post_aggregates::creator_id
.eq(person_block::target_id)
.and(person_block::person_id.eq(person_id_join)),
),
)
.left_join(
community_block::table.on(
post::community_id
post_aggregates::community_id
.eq(community_block::community_id)
.and(community_block::person_id.eq(person_id_join)),
),
)
.left_join(
post_like::table.on(
post::id
post_aggregates::post_id
.eq(post_like::post_id)
.and(post_like::person_id.eq(person_id_join)),
),
)
.left_join(
person_post_aggregates::table.on(
post::id
post_aggregates::post_id
.eq(person_post_aggregates::post_id)
.and(person_post_aggregates::person_id.eq(person_id_join)),
),
Expand Down Expand Up @@ -316,12 +316,12 @@ impl<'a> PostQuery<'a> {
query = query.then_order_by(post_aggregates::featured_local.desc());
} else if let Some(community_id) = self.community_id {
query = query
.filter(post::community_id.eq(community_id))
.filter(post_aggregates::community_id.eq(community_id))
.then_order_by(post_aggregates::featured_community.desc());
}

if let Some(creator_id) = self.creator_id {
query = query.filter(post::creator_id.eq(creator_id));
query = query.filter(post_aggregates::creator_id.eq(creator_id));
}

if let Some(listing_type) = self.listing_type {
Expand Down Expand Up @@ -1051,6 +1051,8 @@ mod tests {
featured_local: false,
hot_rank: 1728,
hot_rank_active: 1728,
community_id: inserted_post.community_id,
creator_id: inserted_post.creator_id,
},
subscribed: SubscribedType::NotSubscribed,
read: false,
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
-- This file should undo anything in `up.sql`

CREATE OR REPLACE FUNCTION post_aggregates_post()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
BEGIN
IF (TG_OP = 'INSERT') THEN
INSERT INTO post_aggregates (post_id, published, newest_comment_time, newest_comment_time_necro)
VALUES (NEW.id, NEW.published, NEW.published, NEW.published);
ELSIF (TG_OP = 'DELETE') THEN
DELETE FROM post_aggregates WHERE post_id = OLD.id;
END IF;
RETURN NULL;
END
$$;

ALTER TABLE post_aggregates DROP COLUMN community_id, DROP COLUMN creator_id;

Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
-- Your SQL goes here
ALTER TABLE post_aggregates
ADD COLUMN community_id integer REFERENCES community ON UPDATE CASCADE ON DELETE CASCADE,
ADD COLUMN creator_id integer REFERENCES person ON UPDATE CASCADE ON DELETE CASCADE;

CREATE OR REPLACE FUNCTION post_aggregates_post()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
BEGIN
IF (TG_OP = 'INSERT') THEN
INSERT INTO post_aggregates (post_id,
published,
newest_comment_time,
newest_comment_time_necro,
community_id,
creator_id)
sunaurus marked this conversation as resolved.
Show resolved Hide resolved
VALUES (NEW.id, NEW.published, NEW.published, NEW.published, NEW.community_id, NEW.creator_id);
ELSIF (TG_OP = 'DELETE') THEN
DELETE FROM post_aggregates WHERE post_id = OLD.id;
END IF;
RETURN NULL;
END
$$;

UPDATE post_aggregates
SET community_id=post.community_id,
creator_id=post.creator_id
FROM post
WHERE post.id = post_aggregates.post_id;

ALTER TABLE post_aggregates
ALTER COLUMN community_id SET NOT NULL,
ALTER COLUMN creator_id SET NOT NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this one, much like the post table has indexes on community_id and creator_id, it would probably be a good idea to add these two indexes to post_aggregates now too:

IE:

create index idx_post_aggregates_community on post_aggregates (community_id);
create index idx_post_aggregates_creator on post_aggregates (creator_id);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added these indexes as well

Copy link
Collaborator Author

@sunaurus sunaurus Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dessalines I just discovered that adding these two indexes messes up the query plan again 😃

Edit: hmm, I just got some 3-second query plans with the indexes. Then I dropped the indexes - back to a few ms. Then re-added, and it's still at a few ms. So I can't reproduce the problem at the moment, but I will investigate a bit more..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the query plans are definitely consistently worse with those indexes.
With indexes: https://explain.depesz.com/s/W7cF
Without indexes: https://explain.depesz.com/s/4qds

Copy link
Collaborator Author

@sunaurus sunaurus Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted these indexes for now, let me know if you disagree with that

Copy link
Collaborator

@phiresky phiresky Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, you didn't add any index to get the speedup? the "right" index should be post_aggregates(community_id, featured_local desc, hot_rank desc)

it should use that one when the user has only subscribed to a few communities, and the idx_featured_local_hot_rank when subscribed to many communities

Copy link
Collaborator Author

@sunaurus sunaurus Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed a bit on Matrix already, but commenting here as well for visibility:


the "right" index should be post_aggregates(community_id, featured_local desc, hot_rank desc)

This index also messes up the query plan. It indeed makes the queries super fast for users with few subscriptions, but it makes them slow again for others.

Without that index
User with 86 subscriptions: 3ms query execution
User with 0 subscriptions: 700 ms query execution
(I didn't measure exact time, but it's also very fast for logged out users without the index)

With that index
User with 86 subscrptions: 2-3 second query execution
User with 0 subscriptions: 1ms query execution

phiresky on Matrix said:

weird. do the query plans have larger misestimates?
if so try setting default statistics target to 1000 and run analyze again. if not then I don't know why PG is being dumb


I will see tomorrow if playing around with statistics will help. I know there's also CREATE STATISTICS in postgres, but I've never used it so would need to investigate it a bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limit and sort is the main culprit, but also its weird to me that postgres sometimes chooses a slower index.

We can tweak those indexes later, I'll try to tag the issues as DB