You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE statements to post_aggregates for the same single row
#3877
Closed
4 tasks done
RocketDerp opened this issue
Aug 14, 2023
· 5 comments
Did you check to see if this issue already exists?
Is this only a feature request? Do not put multiple feature requests in one issue.
Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
Is your proposal related to a problem?
Lemmy's PostgreSQL backend is having overloads and needs performance optimization. The comment table is at the heart of the site and one of the largest database tables.
Current TRIGGER function in 0.18.4 executes 3 independent UPDATE interactions (1 of which changes no rows) to the same post_aggregate row when a comment is INSERT:
CREATE FUNCTION public.post_aggregates_comment_count() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
-- Check for post existence - it may not exist anymore
IF TG_OP = 'INSERT' OR EXISTS (
SELECT
1
FROM
post p
WHERE
p.id = OLD.post_id) THEN
IF (was_restored_or_created (TG_OP, OLD, NEW)) THEN
UPDATE
post_aggregates pa
SET
comments = comments + 1
WHERE
pa.post_id = NEW.post_id;
ELSIF (was_removed_or_deleted (TG_OP, OLD, NEW)) THEN
UPDATE
post_aggregates pa
SET
comments = comments - 1
WHERE
pa.post_id = OLD.post_id;
END IF;
END IF;
IF TG_OP = 'INSERT' THEN
UPDATE
post_aggregates pa
SET
newest_comment_time = NEW.published
WHERE
pa.post_id = NEW.post_id;
-- A 2 day necro-bump limit
UPDATE
post_aggregates pa
SET
newest_comment_time_necro = NEW.published
FROM
post p
WHERE
pa.post_id = p.id
AND pa.post_id = NEW.post_id
-- Fix issue with being able to necro-bump your own post
AND NEW.creator_id != p.creator_id
AND pa.published > ('now'::timestamp - '2 days'::interval);
END IF;
RETURN NULL;
END
$$;
Describe the solution you'd like.
Consolidate this into one single UPDATE statement, cutting PostgreSQL WAL activity and dead tuples
Describe alternatives you've considered.
A more difficult logic to do aggregates in a more holistic approach.
Additional context
PostgreSQL keeps all the row versions in the table data structure. It means an UPDATE query keeps the existing row version (a.k.a. dead tuple) and creates a new version with updated data. This cruft slows down SELECT performance due to multiple updates.
The 'SELECT
1
FROM
post p
WHERE
p.id = OLD.post_id)' logic can likely be made a WHERE clause of the UPDATE.
In testing, each of these UPDATES for every comment INSERT, 13,159 comment INSERT operations in this particular data sample, pg_stat_statements output:
For the 13,159 new comment INSERT, the first column is times statement is executed, 2nd column is count of rows actually updated, third column is average mean ms of execution time by PostgreSQL, final column is the SQL statement. As you can see, on new INSERT of a comment, one of the statements always updates zero rows and still takes 0.7 ms to execute. This was on a high-performance testing system where 13,159 was all the rows en the entire Lemmy instance - real-world servers would be even slower on each of these UPDATE executions.
The text was updated successfully, but these errors were encountered:
RocketDerp
changed the title
Lemmy 0.18.4 PostgreSQL, INSERT of every new comment immediately does 2 updates to post_aggregates
Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE transactions to post_aggregates
Aug 14, 2023
Your terminology is a bit wrong, all those UPDATEs happen in the same transaction. So it's one transaction but with three UPDATE statements.
You should be able to merge that necro bumb thing into the same update statement with something lik
update ... set newest_comment_time_necro = CASE WHEN (NEW.creator_id!=p.creator_idANDpa.published> ('now'::timestamp-'2 days'::interval)
THEN NEW.published
ELSE newest_comment_time_necro
END
...
Really though, I think we should rewrite the triggers to merge them all, I'm not sure if these incremental changes get us too much
RocketDerp
changed the title
Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE transactions to post_aggregates
Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE statements to post_aggregates
Aug 16, 2023
RocketDerp
changed the title
Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE statements to post_aggregates
Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE statements to post_aggregates for the same single row
Aug 16, 2023
Requirements
Is your proposal related to a problem?
Lemmy's PostgreSQL backend is having overloads and needs performance optimization. The comment table is at the heart of the site and one of the largest database tables.
Current TRIGGER function in 0.18.4 executes 3 independent UPDATE interactions (1 of which changes no rows) to the same post_aggregate row when a comment is INSERT:
Describe the solution you'd like.
Consolidate this into one single UPDATE statement, cutting PostgreSQL WAL activity and dead tuples
Describe alternatives you've considered.
A more difficult logic to do aggregates in a more holistic approach.
Additional context
PostgreSQL keeps all the row versions in the table data structure. It means an UPDATE query keeps the existing row version (a.k.a. dead tuple) and creates a new version with updated data. This cruft slows down SELECT performance due to multiple updates.
The 'SELECT
1
FROM
post p
WHERE
p.id = OLD.post_id)' logic can likely be made a WHERE clause of the UPDATE.
In testing, each of these UPDATES for every comment INSERT, 13,159 comment INSERT operations in this particular data sample, pg_stat_statements output:
For the 13,159 new comment INSERT, the first column is times statement is executed, 2nd column is count of rows actually updated, third column is average mean ms of execution time by PostgreSQL, final column is the SQL statement. As you can see, on new INSERT of a comment, one of the statements always updates zero rows and still takes 0.7 ms to execute. This was on a high-performance testing system where 13,159 was all the rows en the entire Lemmy instance - real-world servers would be even slower on each of these UPDATE executions.
The text was updated successfully, but these errors were encountered: