Description
The DropOldAssets job finds any content that has more than 100 versions and deletes the excess versions. The query it currently uses is highly non-performant and can be improved. We saw DB usage spike on some customers when the job ran, which can bring down a cluster.
See this Slack link for context:
https://dotcms.slack.com/archives/C04FRV45YR5/p1778027539677989
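Conceptually, the job prunes each content item's version history down to a cap while never touching the working or live version. A minimal sketch of that logic (names like `versions_to_drop` are illustrative, not the actual dotCMS implementation):

```python
VERSION_LIMIT = 100  # cap on retained old versions per content/language (per the description above)

def versions_to_drop(versions, working_inode, live_inode, limit=VERSION_LIMIT):
    """versions: list of (inode, mod_date) for one identifier/language.
    Returns the inodes to delete: everything past the newest `limit`
    versions, excluding the working and live inodes (they are never dropped)."""
    candidates = [v for v in versions if v[0] not in (working_inode, live_inode)]
    # Newest first, mirroring ORDER BY mod_date DESC in the queries below
    candidates.sort(key=lambda v: v[1], reverse=True)
    # Skip the newest `limit` survivors, mirroring the OFFSET clause
    return [inode for inode, _ in candidates[limit:]]
```

The SQL versions of this below do the same thing: exclude working/live inodes, order by `mod_date DESC`, and skip the newest rows via `OFFSET`.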
Acceptance Criteria
Ensure the new query performs the same as or better than the current one.
Priority
None
Additional Context
Here is the explain from the old query:
_db=# explain SELECT DISTINCT inode, c.identifier, mod_date
_db-# FROM contentlet c, contentlet_version_info cvi
_db-# WHERE c.identifier = 'abc'
_db-# AND c.language_id = 1
_db-# AND cvi.working_inode <> c.inode
_db-# AND cvi.live_inode <> c.inode
_db-# AND cvi.lang = c.language_id
_db-# ORDER BY mod_date DESC
_db-# OFFSET 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=19854.21..19854.21 rows=1 width=80)
-> Unique (cost=19853.60..19854.21 rows=2 width=80)
-> Gather Merge (cost=19853.60..19854.19 rows=5 width=80)
Workers Planned: 2
-> Sort (cost=18853.58..18853.59 rows=2 width=80)
Sort Key: c.mod_date DESC, c.inode
-> HashAggregate (cost=18853.55..18853.57 rows=2 width=80)
Group Key: c.mod_date, c.inode
-> Nested Loop (cost=0.42..17201.10 rows=330491 width=80)
Join Filter: (((cvi.working_inode)::text <> (c.inode)::text) AND ((cvi.live_inode)::text <> (c.inode)::text))
-> Parallel Seq Scan on contentlet_version_info cvi (cost=0.00..11413.72 rows=165246 width=82)
Filter: (lang = 1)
-> Materialize (cost=0.42..3.77 rows=2 width=88)
-> Index Scan using contentlet_ident_covering on contentlet c (cost=0.42..3.76 rows=2 width=88)
Index Cond: ((identifier)::text = 'abc'::text)
Filter: (language_id = 1)
(16 rows)
Here is the explain from the proposed query:
_db=# explain SELECT c.inode, c.identifier, c.mod_date
_db-# FROM contentlet c
_db-# WHERE c.identifier = 'abc'
_db-# AND c.language_id = 1
_db-# AND c.inode NOT IN (
_db(# SELECT working_inode FROM contentlet_version_info
_db(# WHERE identifier = 'abc' AND lang = 1
_db(# UNION ALL
_db(# SELECT live_inode FROM contentlet_version_info
_db(# WHERE identifier = 'abc' AND lang = 1
_db(# )
_db-# ORDER BY c.mod_date DESC
_db-# OFFSET 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=9.08..9.08 rows=1 width=80)
-> Sort (cost=9.08..9.08 rows=1 width=80)
Sort Key: c.mod_date DESC
-> Index Scan using contentlet_ident_covering on contentlet c (cost=5.72..9.07 rows=1 width=80)
Index Cond: ((identifier)::text = 'abc'::text)
Filter: ((NOT (ANY ((inode)::text = ((hashed SubPlan 1).col1)::text))) AND (language_id = 1))
SubPlan 1
-> Append (cost=0.42..5.29 rows=2 width=37)
-> Index Scan using contentlet_version_info_pkey on contentlet_version_info (cost=0.42..2.64 rows=1 width=37)
Index Cond: (((identifier)::text = 'abc'::text) AND (lang = 1))
-> Index Scan using contentlet_version_info_pkey on contentlet_version_info contentlet_version_info_1 (cost=0.42..2.64 rows=1 width=37)
Index Cond: (((identifier)::text = 'abc'::text) AND (lang = 1))
(12 rows)
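The proposed query can be sanity-checked on a toy dataset. This sketch rebuilds a minimal version of the two tables in an in-memory SQLite database (only the columns the query touches; the schema and sample data are assumptions for illustration, and SQLite needs `LIMIT -1` to use `OFFSET` alone) and runs the proposed query, confirming it skips the working/live inodes and the newest surviving version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE contentlet (inode TEXT, identifier TEXT, language_id INTEGER, mod_date INTEGER);
CREATE TABLE contentlet_version_info (identifier TEXT, lang INTEGER, working_inode TEXT, live_inode TEXT);
""")
# Five versions of one piece of content; v5 is the working inode, v4 the live inode.
cur.executemany("INSERT INTO contentlet VALUES (?,?,?,?)",
                [(f"v{i}", "abc", 1, i) for i in range(1, 6)])
cur.execute("INSERT INTO contentlet_version_info VALUES ('abc', 1, 'v5', 'v4')")

# Proposed query, with OFFSET 1 instead of 10 to fit the toy data.
rows = cur.execute("""
SELECT c.inode, c.identifier, c.mod_date
FROM contentlet c
WHERE c.identifier = 'abc'
  AND c.language_id = 1
  AND c.inode NOT IN (
      SELECT working_inode FROM contentlet_version_info
      WHERE identifier = 'abc' AND lang = 1
      UNION ALL
      SELECT live_inode FROM contentlet_version_info
      WHERE identifier = 'abc' AND lang = 1
  )
ORDER BY c.mod_date DESC
LIMIT -1 OFFSET 1
""").fetchall()
print(rows)  # the two oldest non-working, non-live versions
```

Note the plans also differ in shape: the old query only constrains `contentlet_version_info` by `lang`, forcing a parallel seq scan over every version-info row, while the proposed query filters the subselects by `identifier` and `lang`, so both sides resolve via index scans with a hashed subplan.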