Skip to content

Improve performance of DropOldAssets Query #35590

@wezell

Description

@wezell

Description

The DropOldAssets job looks for any content that has more than 100 versions and will delete those versions. The query that we use there is really non-performant and can be improved. We saw DB usage spike on some customers when the job ran which can bring down a cluster.

Image

See this Slack link for context:

https://dotcms.slack.com/archives/C04FRV45YR5/p1778027539677989

Acceptance Criteria

Insure the query performs the same or better than what we currently have.

Priority

None

Additional Context

Here is the explain from the old query:

_db=# explain SELECT                                                                                                                                                                                                   DISTINCT inode,                                                                                                                                                                                                                c.identifier,                                                                                                                                                                                                                  mod_date                                                                                                                                                                                                                     FROM                                                                                                                                                                                                                             contentlet c,                                                                                                                                                                                                                  contentlet_version_info cvi                                                                                                                                                                                                  WHERE                                                                                                                                                                                                                            c.identifier = 'abc'                                                                                                                                                                                                           AND c.language_id = 1                                                                                                                                                                                                          AND cvi.working_inode <> c.inode                                                                                                                                                                                               AND cvi.live_inode <> c.inode                                                                                                                                                                                                  AND cvi.lang = c.language_id                                                                                                                                                                                                 ORDER BY                                                                                                                                                                                                                         mod_date DESC                                                                                                                                                                                                                OFFSET                                                                                                                                                                                                                          10;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19854.21..19854.21 rows=1 width=80)
   ->  Unique  (cost=19853.60..19854.21 rows=2 width=80)
         ->  Gather Merge  (cost=19853.60..19854.19 rows=5 width=80)
               Workers Planned: 2
               ->  Sort  (cost=18853.58..18853.59 rows=2 width=80)
                     Sort Key: c.mod_date DESC, c.inode
                     ->  HashAggregate  (cost=18853.55..18853.57 rows=2 width=80)
                           Group Key: c.mod_date, c.inode
                           ->  Nested Loop  (cost=0.42..17201.10 rows=330491 width=80)
                                 Join Filter: (((cvi.working_inode)::text <> (c.inode)::text) AND ((cvi.live_inode)::text <> (c.inode)::text))
                                 ->  Parallel Seq Scan on contentlet_version_info cvi  (cost=0.00..11413.72 rows=165246 width=82)
                                       Filter: (lang = 1)
                                 ->  Materialize  (cost=0.42..3.77 rows=2 width=88)
                                       ->  Index Scan using contentlet_ident_covering on contentlet c  (cost=0.42..3.76 rows=2 width=88)
                                             Index Cond: ((identifier)::text = 'abc'::text)
                                             Filter: (language_id = 1)
(16 rows)

Here is the explain from the proposed query:

_db=# explain   SELECT c.inode, c.identifier, c.mod_date
_db-#   FROM contentlet c
_db-#   WHERE c.identifier = 'abc'
_db-#     AND c.language_id = 1
_db-#     AND c.inode NOT IN (
_db(#       SELECT working_inode FROM contentlet_version_info
_db(#        WHERE identifier = 'abc' AND lang = 1
_db(#       UNION ALL
_db(#       SELECT live_inode  FROM contentlet_version_info
_db(#        WHERE identifier = 'abc' AND lang = 1
_db(#     )
_db-#   ORDER BY c.mod_date DESC
_db-#   OFFSET 10;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9.08..9.08 rows=1 width=80)
   ->  Sort  (cost=9.08..9.08 rows=1 width=80)
         Sort Key: c.mod_date DESC
         ->  Index Scan using contentlet_ident_covering on contentlet c  (cost=5.72..9.07 rows=1 width=80)
               Index Cond: ((identifier)::text = 'abc'::text)
               Filter: ((NOT (ANY ((inode)::text = ((hashed SubPlan 1).col1)::text))) AND (language_id = 1))
               SubPlan 1
                 ->  Append  (cost=0.42..5.29 rows=2 width=37)
                       ->  Index Scan using contentlet_version_info_pkey on contentlet_version_info  (cost=0.42..2.64 rows=1 width=37)
                             Index Cond: (((identifier)::text = 'abc'::text) AND (lang = 1))
                       ->  Index Scan using contentlet_version_info_pkey on contentlet_version_info contentlet_version_info_1  (cost=0.42..2.64 rows=1 width=37)
                             Index Cond: (((identifier)::text = 'abc'::text) AND (lang = 1))
(12 rows)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    Status

    New

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions