Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE statements to post_aggregates for the same single row #3877

Closed
4 tasks done
RocketDerp opened this issue Aug 14, 2023 · 5 comments

Comments

@RocketDerp
Copy link
Contributor

RocketDerp commented Aug 14, 2023

Requirements

  • Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a feature request? Do not put multiple feature requests in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Is your proposal related to a problem?

Lemmy's PostgreSQL backend is having overloads and needs performance optimization. The comment table is at the heart of the site and one of the largest database tables.

Current TRIGGER function in 0.18.4 executes 3 independent UPDATE interactions (1 of which changes no rows) to the same post_aggregate row when a comment is INSERT:

CREATE FUNCTION public.post_aggregates_comment_count() RETURNS trigger
    LANGUAGE plpgsql
    AS $$
BEGIN
    -- Check for post existence - it may not exist anymore
    IF TG_OP = 'INSERT' OR EXISTS (
        SELECT
            1
        FROM
            post p
        WHERE
            p.id = OLD.post_id) THEN
        IF (was_restored_or_created (TG_OP, OLD, NEW)) THEN
            UPDATE
                post_aggregates pa
            SET
                comments = comments + 1
            WHERE
                pa.post_id = NEW.post_id;
        ELSIF (was_removed_or_deleted (TG_OP, OLD, NEW)) THEN
            UPDATE
                post_aggregates pa
            SET
                comments = comments - 1
            WHERE
                pa.post_id = OLD.post_id;
        END IF;
    END IF;
    IF TG_OP = 'INSERT' THEN
        UPDATE
            post_aggregates pa
        SET
            newest_comment_time = NEW.published
        WHERE
            pa.post_id = NEW.post_id;
        -- A 2 day necro-bump limit
        UPDATE
            post_aggregates pa
        SET
            newest_comment_time_necro = NEW.published
        FROM
            post p
        WHERE
            pa.post_id = p.id
            AND pa.post_id = NEW.post_id
            -- Fix issue with being able to necro-bump your own post
            AND NEW.creator_id != p.creator_id
            AND pa.published > ('now'::timestamp - '2 days'::interval);
    END IF;
    RETURN NULL;
END
$$;

Describe the solution you'd like.

Consolidate this into one single UPDATE statement, cutting PostgreSQL WAL activity and dead tuples

Describe alternatives you've considered.

A more difficult logic to do aggregates in a more holistic approach.

Additional context

PostgreSQL keeps all the row versions in the table data structure. It means an UPDATE query keeps the existing row version (a.k.a. dead tuple) and creates a new version with updated data. This cruft slows down SELECT performance due to multiple updates.

The 'SELECT
1
FROM
post p
WHERE
p.id = OLD.post_id)' logic can likely be made a WHERE clause of the UPDATE.

In testing, each of these UPDATES for every comment INSERT, 13,159 comment INSERT operations in this particular data sample, pg_stat_statements output:

image

For the 13,159 new comment INSERT, the first column is times statement is executed, 2nd column is count of rows actually updated, third column is average mean ms of execution time by PostgreSQL, final column is the SQL statement. As you can see, on new INSERT of a comment, one of the statements always updates zero rows and still takes 0.7 ms to execute. This was on a high-performance testing system where 13,159 was all the rows en the entire Lemmy instance - real-world servers would be even slower on each of these UPDATE executions.

@RocketDerp RocketDerp added the enhancement New feature or request label Aug 14, 2023
@RocketDerp RocketDerp changed the title Lemmy 0.18.4 PostgreSQL, INSERT of every new comment immediately does 2 updates to post_aggregates Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE transactions to post_aggregates Aug 14, 2023
@RocketDerp

This comment was marked as abuse.

@RocketDerp

This comment was marked as abuse.

@phiresky
Copy link
Collaborator

phiresky commented Aug 16, 2023

Your terminology is a bit wrong, all those UPDATEs happen in the same transaction. So it's one transaction but with three UPDATE statements.

You should be able to merge that necro bumb thing into the same update statement with something lik

update ... set newest_comment_time_necro = CASE WHEN (NEW.creator_id != p.creator_id AND pa.published > ('now'::timestamp - '2 days'::interval)
THEN NEW.published
ELSE newest_comment_time_necro
END
...

Really though, I think we should rewrite the triggers to merge them all, I'm not sure if these incremental changes get us too much

@RocketDerp RocketDerp changed the title Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE transactions to post_aggregates Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE statements to post_aggregates Aug 16, 2023
@RocketDerp

This comment was marked as abuse.

@RocketDerp RocketDerp changed the title Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE statements to post_aggregates Lemmy 0.18.4 PostgreSQL, INSERT of every new Lemmy comment immediately executes 3 separate UPDATE statements to post_aggregates for the same single row Aug 16, 2023
@RocketDerp

This comment was marked as abuse.

@dessalines dessalines closed this as not planned Won't fix, can't repro, duplicate, stale Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants