Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database pinned to 100% CPU after a few hours of running #6589

Open
kurtmc opened this issue Feb 24, 2021 · 47 comments
Open

Database pinned to 100% CPU after a few hours of running #6589

kurtmc opened this issue Feb 24, 2021 · 47 comments

Comments

@kurtmc
Copy link
Member

kurtmc commented Feb 24, 2021

Summary

After upgrading to concourse 7.0.0 I have noticed that sometimes our postgres database CPU will get pinned to 100% usage after a few hours. Currently my work around is to reboot the web node which will drop the CPU usage down.

Steps to reproduce

Run 7.0.0 for a few hours with AWS RDS as the postgres database provider.

Expected results

CPU doesn't get pinned.

Actual results

Here is a screenshot of our RDS instance CPU metric over the past 24 hours.

Screenshot_2021-02-25 RDS · AWS Console

Additional context

Triaging info

  • Concourse version: 7.0.0
  • Browser (if applicable): N/A
  • Did this used to work? Yes
@kurtmc kurtmc added the bug label Feb 24, 2021
@aoldershaw
Copy link
Contributor

Do you have statistics enabled on your Postgres instance? Can you try running some of these queries to get a sense of where the bottleneck is: https://github.com/concourse/concourse/wiki/Schema-Inspection-Queries#show-in-flight-queries

@scottietremendous scottietremendous added this to Current iteration in Epic #6616 Mar 1, 2021
@wduncanfraser
Copy link

We're seeing similar, where Concourse is causing much higher CPU usage on our DB since upgrading to 7.0:

in-flight queries grouped by query, showing longest-running time
slowest_queries.txt

currently active running queries
slowest_queries_active.txt

size and number of hits for each index
index_hits.txt

@clarafu
Copy link
Contributor

clarafu commented Mar 15, 2021

@kurtmc @wduncanfraser Thanks for the schema inspection results! I think one really useful inspection query to run would be the slowest avg queries https://github.com/concourse/concourse/wiki/Schema-Inspection-Queries#show-the-top-n-slowest-queries and maybe also altering the query to order by total_time instead of avg_time. If you would be able to show us the results of those two, that would be amazing!

@chenbh
Copy link
Contributor

chenbh commented Mar 15, 2021

@kirillbilchenko
Copy link
Contributor

slowest_queries.txt

@scottietremendous scottietremendous added this to To do in Roadmap Mar 22, 2021
@robwhitby
Copy link
Contributor

We are seeing the same issue after upgrading from 6.5.1 to 7.1.0. DB CPU has gone from 20% to 75% and the UI has become very sluggish.

image

Query insights tells us this query is the culprit. The 95% query time is 12 seconds!

SELECT
  b.id,
  b.name,
  b.job_id,
  b.resource_id,
  b.resource_type_id,
  b.team_id,
  b.status,
  b.manually_triggered,
  b.created_by,
  b.scheduled,
  b.SCHEMA,
  b.private_plan,
  b.public_plan,
  b.create_time,
  b.start_time,
  b.end_time,
  b.reap_time,
  j.name,
  r.name,
  rt.name,
  b.pipeline_id,
  p.name,
  p.instance_vars,
  t.name,
  b.nonce,
  b.drained,
  b.aborted,
  b.completed,
  b.inputs_ready,
  b.rerun_of,
  rb.name,
  b.rerun_number,
  b.span_context
FROM
  builds b
LEFT OUTER JOIN
  jobs j
ON
  b.job_id = j.id
LEFT OUTER JOIN
  resources r
ON
  b.resource_id = r.id
LEFT OUTER JOIN
  resource_types rt
ON
  b.resource_type_id = rt.id
LEFT OUTER JOIN
  pipelines p
ON
  b.pipeline_id = p.id
LEFT OUTER JOIN
  teams t
ON
  b.team_id = t.id
LEFT OUTER JOIN
  builds rb
ON
  rb.id = b.rerun_of
WHERE
  j.id = $1
  AND b.id > $2
ORDER BY
  COALESCE(b.rerun_of,
    b.id) ASC,
  b.id ASC
LIMIT
  ?

A sample query plan shows the expensive index scan

scan

@aoldershaw
Copy link
Contributor

@robwhitby thanks for providing query insights. We too are seeing that query to be periodically very slow on our (fairly) large deployment. Our query insights dashboard (also GCP) tells us that most of this time is spent in IO wait, rather than CPU time - are you seeing something similar? Are there other costly queries by total time (rather than average time)?

Also, just to get a sense of the size of your deployment, how many pipelines/jobs/builds are there in total?

@simonjohansson
Copy link
Contributor

@aoldershaw

  • 1702 pipelines (203 paused)
  • 8917 jobs
  • 2119708 builds

Screenshot 2021-04-09 at 10 22 23

So no IO wait.

Screenshot 2021-04-09 at 10 24 55

@aoldershaw
Copy link
Contributor

Appreciate those details. Interesting how different our own insights look.

One thought - the bigint migrations leaves a ton of bloat in the DB. We ended up running a VACUUM FULL VERBOSE ANALYZE after upgrading, but this takes a long time and exclusively locks the tables.

@simonjohansson
Copy link
Contributor

One thought - the bigint migrations leaves a ton of bloat in the DB. We ended up running a VACUUM FULL VERBOSE ANALYZE after upgrading, but this takes a long time and exclusively locks the tables.

Ah, yeah, we forgot to do that after the upgrade. Gonna do a PG upgrade to 13 tonight, will try a vacuum aswell. 👍

@simonjohansson
Copy link
Contributor

Upgrade to PG13 failed for unrelated reasons, but a VACUUM .. brought down the avg CPU usage of our instance from ~70% to ~60%.

@aoldershaw
Copy link
Contributor

@simonjohansson thanks for checking that - an improvement, but there's probably a bigger culprit (probably that SELECT b.id, b.name, ... query still). Speaking of, have the insights on that query changed at all? Is it still consuming the majority of your CPU time after the VACUUM?

@simonjohansson
Copy link
Contributor

@aoldershaw yep that query is still the biggest consumer of resources, virtually no difference from the previous image (https://user-images.githubusercontent.com/736248/114151841-d7e2d300-991d-11eb-9db6-e50402861af5.png)

Looking at a couple of examples from the query plan samples graph thingy its still 99% index scan on order_builds_by_rerun_of_or_id_idx

If it helps this is a old installation of Concourse,

concourse=> select start_time from builds limit 1;
          start_time
-------------------------------
 2018-08-22 08:58:05.855862+00

So I dunno if there is some weird state somewhere from an old Concourse version?

@aoldershaw
Copy link
Contributor

@simonjohansson thanks for confirming. Taking a look into how we can optimize that query.

No worries if not, but would you be willing to add a temporary index to your DB to measure the impact it has on that query's performance? It's basically identical to the existing order_builds_by_rerun_of_or_id_idx, but doesn't include ephemeral builds for checks, so shouldn't accrue as much bloat:

CREATE INDEX deleteme ON builds((COALESCE(rerun_of, id)) DESC, id DESC) WHERE job_id IS NOT NULL;

On our fairly large environment, we saw the query time drop from ~25s to ~250ms on a really large job.

Note that if you do end up adding the index, make sure you DROP INDEX deleteme before upgrading, as we're likely to add this in a migration

@clarafu
Copy link
Contributor

clarafu commented Apr 12, 2021

@simonjohansson Would you also be able to show us the entire string for query in the first row in that table? Thanks!

@simonjohansson
Copy link
Contributor

@aoldershaw sure thing! Ill leave it running for a few hours and update.

@clarafu

   id   | name |  status   | scheduled |          start_time           |           end_time            | schema  | private_plan | completed | job_id |          reap_time           | team_id | manually_triggered | interceptible | nonce |                                                                                                                                 public_plan                                                                                                                                  | pipeline_id | drained |          create_time          | aborted | rerun_of | rerun_number | inputs_ready | needs_v6_migration | span_context | resource_id | resource_type_id | created_by
--------+------+-----------+-----------+-------------------------------+-------------------------------+---------+--------------+-----------+--------+------------------------------+---------+--------------------+---------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+---------+-------------------------------+---------+----------+--------------+--------------+--------------------+--------------+-------------+------------------+------------
 180059 | 88   | succeeded | t         | 2018-08-22 08:58:05.855862+00 | 2018-08-22 09:03:05.391021+00 | exec.v2 |              | t         |    740 | 2018-11-26 11:56:24.05605+00 |      33 | f                  | f             |       | {"id":"5b7c84ad","do":[{"id":"5b7c84ab","aggregate":[{"id":"5b7c84aa","get":{"type":"git","name":"git","resource":"git","version":{"ref":"f1c7775f2a52b54a10c8a2810c423460d7e2069b"}}}]},{"id":"5b7c84ac","task":{"name":"Integration Tests - Staging","privileged":true}}]} |         238 | t       | 2019-04-24 20:51:50.016106+00 | f       |          |            0 | t            | t                  |              |             |                  |

or perhaps more readable?

-[ RECORD 1 ]------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
id                 | 180059
name               | 88
status             | succeeded
scheduled          | t
start_time         | 2018-08-22 08:58:05.855862+00
end_time           | 2018-08-22 09:03:05.391021+00
schema             | exec.v2
private_plan       | 
completed          | t
job_id             | 740
reap_time          | 2018-11-26 11:56:24.05605+00
team_id            | 33
manually_triggered | f
interceptible      | f
nonce              | 
public_plan        | {"id":"5b7c84ad","do":[{"id":"5b7c84ab","aggregate":[{"id":"5b7c84aa","get":{"type":"git","name":"git","resource":"git","version":{"ref":"f1c7775f2a52b54a10c8a2810c423460d7e2069b"}}}]},{"id":"5b7c84ac","task":{"name":"Integration Tests - Staging","privileged":true}}]}
pipeline_id        | 238
drained            | t
create_time        | 2019-04-24 20:51:50.016106+00
aborted            | f
rerun_of           | 
rerun_number       | 0
inputs_ready       | t
needs_v6_migration | t
span_context       | 
resource_id        | 
resource_type_id   | 
created_by         | 

@simonjohansson
Copy link
Contributor

Hmm, actually @aoldershaw that index doesnt seem to do much.

               tablename               |                            indexname                            |  num_rows   | table_size | index_size | unique | number_of_scans |  tuples_read  | tuples_fetched 
---------------------------------------+-----------------------------------------------------------------+-------------+------------+------------+--------+-----------------+---------------+----------------
 builds                                | deleteme                                                        | 1.92774e+06 | 2156 MB    | 58 MB      | N      |               0 |             0 |              0
 builds                                | order_builds_by_rerun_of_or_id_idx                              | 1.92774e+06 | 2156 MB    | 117 MB     | N      |         2685865 | 1125915406580 |  1125716587765

@aoldershaw
Copy link
Contributor

@simonjohansson hm, that's unfortunate. @clarafu suggested a better index than the one I gave:

CREATE INDEX deleteme ON builds (job_id, COALESCE(rerun_of, id) DESC, id DESC) WHERE job_id IS NOT NULL;

This takes the original query that took ~20s down to ~20ms. Would you be willing to give that one a shot instead? Hopefully Postgres will realize it's a more optimal index to use than order_builds_by_rerun_of_or_id_idx

@simonjohansson
Copy link
Contributor

simonjohansson commented Apr 12, 2021

@aoldershaw Thanks! Applied it and number_of_scans, tuples_read and tuples_fetched are all incrementing for the update deleteme index now and order_builds_by_rerun_of_or_id_idx's values are stagnant..

Screenshot 2021-04-12 at 20 56 07

CPU usage down to 44% from ~57%, so an improvement!

Still up from ~20% before the upgrade though :)

@aoldershaw
Copy link
Contributor

@simonjohansson Good to know, thanks! Curious to see if that query is still the primary source of slowness after running for a while.

Overall, we suspect that builds for checks will contribute a ton of bloat in other places as well, so we're discussing partitioning the builds table to not affect regular operations with the churn of creating/deleting these builds constantly.

@aoldershaw aoldershaw self-assigned this Apr 12, 2021
@aoldershaw aoldershaw moved this from To do to In flight in Roadmap Apr 12, 2021
@simonjohansson
Copy link
Contributor

Hazzah,

Screenshot 2021-04-12 at 22 23 47

Now the big query is the second, first one is UTILITY COMMAND, unclear what that is but I think it's GCP SQL related.

@aoldershaw
Copy link
Contributor

first one is UTILITY COMMAND, unclear what that is but I think it's GCP SQL related

Yeah, not sure what we can do about that to be honest. Nice to see the CPU drop a fair bit, but it's odd that it's still up 20% since upgrading, especially considering no other queries look too taxing on the DB. Let us know if you see any queries that seem to be really slow, either by total or average execution time.

Also, thanks a lot for helping us debug!

@simonjohansson
Copy link
Contributor

@aoldershaw super 👍 I'll plan it in for next week.

@xeivieni
Copy link

We are facing similar issues but the VACCUM and upgrade to 7.2.0 did not help.

Our stack has ~500 pipelines. Before the update, running in 6.4.1, we had around 40% cpu usage on our db and some spikes during the day. But since the update, cpu usage is around 60% and are a bit concerned about it.

image

Here is our slowest query :

SELECT 
  b.id,
  b.name,
  b.job_id,
  b.resource_id,
  b.resource_type_id,
  b.team_id,
  b.status,
  b.manually_triggered,
  b.created_by,
  b.scheduled,
  b.schema,
  b.private_plan,
  b.public_plan,
  b.create_time,
  b.start_time,
  b.end_time,
  b.reap_time,
  j.name,
  r.name,
  rt.name,
  b.pipeline_id,
  p.name,
  p.instance_vars,
  t.name,
  b.nonce,
  b.drained,
  b.aborted,
  b.completed,
  b.inputs_ready,
  b.rerun_of,
  rb.name,
  b.rerun_number,
  b.span_context
FROM 
  builds b 
LEFT OUTER JOIN 
  jobs j 
ON 
  b.job_id = j.id 
LEFT OUTER JOIN 
  resources r 
ON 
  b.resource_id = r.id 
LEFT OUTER JOIN 
  resource_types rt 
ON 
  b.resource_type_id = rt.id 
LEFT OUTER JOIN 
  pipelines p 
ON 
  b.pipeline_id = p.id 
LEFT OUTER JOIN 
  teams t 
ON 
  b.team_id = t.id 
LEFT OUTER JOIN 
  builds rb 
ON 
  rb.id = b.rerun_of 
WHERE 
  b.completed = $1 AND b.drained = $2

We also have lots of locks since the update to v7:

2021-04-21 17:10:11 UTC:10.102.144.16(54162):concourseci@atc:[3087]:LOG:  process 3087 still waiting for AccessExclusiveLock on tuple (2,43) of relation 17610 of database 16391 after 1000.040 ms
2021-04-21 17:10:11 UTC:10.102.144.16(54162):concourseci@atc:[3087]:DETAIL:  Process holding the lock: 3066. Wait queue: 2852, 3087, 1977, 3067, 3081, 3080, 3065, 2875, 1859, 2860, 2163, 3070, 3071, 2462, 3068, 3085.
2021-04-21 17:10:11 UTC:10.102.144.16(54162):concourseci@atc:[3087]:STATEMENT:  SELECT id, last_referenced FROM resource_configs WHERE base_resource_type_id = $1 AND source_hash = $2 FOR UPDATE
2021-04-21 17:10:11 UTC:10.102.144.16(54162):concourseci@atc:[3087]:LOG:  process 3087 acquired AccessExclusiveLock on tuple (2,43) of relation 17610 of database 16391 after 1064.815 ms
2021-04-21 17:10:11 UTC:10.102.144.16(54162):concourseci@atc:[3087]:STATEMENT:  SELECT id, last_referenced FROM resource_configs WHERE base_resource_type_id = $1 AND source_hash = $2 FOR UPDATE
2021-04-21 17:10:11 UTC:10.102.144.16(54162):concourseci@atc:[3087]:LOG:  duration: 1093.413 ms  execute <unnamed>: SELECT id, last_referenced FROM resource_configs WHERE base_resource_type_id = $1 AND source_hash = $2 FOR UPDATE
2021-04-21 17:10:11 UTC:10.102.144.16(54162):concourseci@atc:[3087]:DETAIL:  parameters: $1 = '4', $2 = 'bdab2145017d459478398c8e11340172de5bd1693377b52fe8f6f39d3b38a7ae'

@clarafu
Copy link
Contributor

clarafu commented Apr 22, 2021

@xeivieni It looks like you have the syslog-drainer turned on and probably with the new checks as builds feature, it might be trying to send way too much builds. I guess I have a question for you then, do you think it would be useful to have check build logs in your syslog server? Or do you only really care about job build logs? Prior to 7.0.0, we only drained job build logs so your server would only contain logs for job builds. But after v7.0.0 that includes the change to have checks be run as "builds", we will also drain check build logs. Is that useful to you at all or is that just unnecessary noise?

Hopping onto that question, do you also think its useful to drain one-off build logs? So for example, if someone ran fly execute that would create a one-off build. Do you think those builds should also be sent to your syslog server?

Thanks!

@xeivieni
Copy link

Hi @clarafu thanks for your answer. Indeed, the volume of logs sent to our ELK is way higher since our update :
image

I'm not sure we absolutely need the check build logs to be pushed to our logging platform, but if it can provide information why not keeping it ?

Can it have an impact on our database CPU usage or is it just an unrelated question ?

@aoldershaw aoldershaw removed their assignment Apr 29, 2021
@aoldershaw aoldershaw moved this from In flight to To do in Roadmap Apr 30, 2021
@Rukenshia
Copy link

Hi,

we also tried to upgrade from 6.7.0 to 7.2.0 this weekend, but ran into the same issue.

Cluster information

  • teams: 300
  • pipelines: 2100 (200 paused)
  • builds: 1.5 million
  • running on AWS, deployed via bosh
  • web: 2 nodes (c5.4xlarge)
  • workers: 45 (c5.4xlarge)
  • database: db.r5.4xlarge (before upgrade) -> db.r5.16xlarge (after upgrade)

Notes from the upgrade related to the DB

  • database load at 6.7.0: ~30% on a db.r5.4xlarge
  • after upgrade, the db.r5.4xlarge was maxed out
  • -> db.r5.8xlarge maxed out
  • -> db.r5.12xlarge maxed out
  • -> db.r5.16xlarge maxed out (did not attempt further scaling)

Queries from Schema Inspection Queries

Show in-flight queries ...grouped by query, showing longest-running time:
slowest_queries.txt

Show in-flight queries ...currently active running queries:
slowest_queries_active.txt

Show the top N slowest queries
top_n_slowest_queries.txt

Show foreign keys without an index
foreign_keys_without_index.txt

Show the size and number of hits for each index
index_hits.txt

AWS RDS Performance Insights:

PerformanceInsightsCounterMetrics
PerformanceInsightsDatabaseLoad
PerformanceInsightsTopSQL

unrelated to this specific issue, we also found out that git-multibranch resources lost all their version history once again and triggered a lot of builds which we aborted manually for a while.

For now, we have downgraded back to 6.7 because the environment was unusable. Let me know if theres any other information you need - happy to provide more (if we stored that while 7.2.0 was running :)).

@VengefulAncient
Copy link

UTILITY COMMAND

This is the only place on the entire web I have found so far that mentions this weird "query". If anyone has the slightest idea of what it is, please, share - it's affecting us too.

@simonjohansson
Copy link
Contributor

This is the only place on the entire web I have found so far that mentions this weird "query". If anyone has the slightest idea of what it is, please, share - it's affecting us too.

Whats the breakdown for you? Currently for us
Screenshot 2021-05-28 at 16 55 44

@VengefulAncient
Copy link

VengefulAncient commented May 28, 2021

Whats the breakdown for you? Currently for us

image

This is our staging instance (unfortunately, we have Query Insights disabled in production, there's currently a bug that makes it cause segfaults) over the last 6 hours. More CPU time than our actual heavy queries, seems to be always creating a background load of 20-30% with nothing else going on. Also, there were many instances in the past where it had a lot of lock wait (not as much as yours but still at least 1/3 of total time), just not this time.

(This is a single-core PG 12 instance with 3.75 GB RAM)

@aoldershaw
Copy link
Contributor

I was under the impression that the UTILITY COMMAND comes from GCP's Query Insights itself, but I can't find any documentation to support that

@simonjohansson
Copy link
Contributor

simonjohansson commented May 31, 2021

Screenshot 2021-05-31 at 21 56 19

A new challenger appears after upgrading too 7.3.1. Running a PG 13 instance with 8 cores and 32GB Ram.

UPDATE
  resource_configs
SET
  last_referenced = now()
WHERE
  base_resource_type_id = $1
  AND source_hash = $2 RETURNING id,
  last_referenced

Screenshot 2021-05-31 at 21 55 32

@vito
Copy link
Member

vito commented Jun 2, 2021

Did some digging, that query on its own shouldn't be super expensive at least. It's a direct index hit:

Update on resource_configs  (cost=0.41..8.43 rows=1 width=103) (actual time=0.031..0.031 rows=0 loops=1)
  ->  Index Scan using resource_configs_base_resource_type_id_so_key on resource_configs  (cost=0.41..8.43 rows=1 width=103) (actual time=0.031..0.031 rows=0 loops=1)
        Index Cond: ((base_resource_type_id = 5) AND (source_hash = 'x'::text))
Planning time: 0.090 ms
Execution time: 0.056 ms

It'll be called every time a check runs but I wouldn't expect that query to be a bottleneck. 🤔

How'd you come across it? I guess just sorting that load by total time column?

Would you mind running this query and sharing the results? Could help us get an idea of the scale we're comparing to.

SELECT base_resource_type_id, count(1)
FROM resource_configs
GROUP BY base_resource_type_id;

Thanks!

@simonjohansson
Copy link
Contributor

simonjohansson commented Jun 4, 2021

How'd you come across it? I guess just sorting that load by total time column?

By default the query inspector sorts queries by Load by total time, thats how we found it. Orange colour means Lock Wait, green is CPU and CPU Wait.

Would you mind running this query and sharing the results? Could help us get an idea of the scale we're comparing to.

 base_resource_type_id | count
-----------------------+-------
                     6 |   316
                     7 |  1608
                     8 |     3
                    12 |  1340
                    13 |     2
                    15 |   414
                       |  1920

@vito
Copy link
Member

vito commented Jun 4, 2021

Orange colour means Lock Wait, green is CPU and CPU Wait.

Neat, good to know - that makes more sense then, so the query itself is doing very little (tiny green segment) but it's spending most of its time waiting around on locks, probably because the query is being run too often (combined with whatever other queries involve locking the table).

Maybe we could adjust the query to not update every single time, and instead update when it's halfway or closer to the expiry time (1 hr check timeout + 5 mins by default). That would reduce writes from one per minute (per resource) to one per half hour or so (per resource).

@simonjohansson
Copy link
Contributor

That would reduce writes from one per minute (per resource) to one per half hour or so (per resource).

Every bit helps :)

Do you know what else might have caused the spike and volatility of the load between 7.2 and 7.3.1?

Screenshot 2021-06-07 at 14 44 03

7.2 was stable around 40% CPU for us and 7.3.1 is a bit all over the place

@Rukenshia
Copy link

Are the two queries in this issue related at all? Did anyone here find a workaround for the originally posted query?

We'd like to upgrade to Concourse 7, but are still currently blocked by this. If there's any info I can give to help please let me know.

@vito
Copy link
Member

vito commented Jun 11, 2021

@Rukenshia Admittedly this issue has kind of turned into an all-purpose inefficient database query hunting ground - we could close it with the next couple of fixes.

The slowest query reported in your case (the one with the costly LEFT JOIN LATERAL) should be addressed by a refactor @clarafu has been working on. It's done, but needed some rebasing, which she's finishing up now as long as nothing else gets merged in. 😄

@simonjohansson

Do you know what else might have caused the spike and volatility of the load between 7.2 and 7.3.1?

Hmm checking the release history nothing immediately jumps out. I think we're doing the right thing in looking at the query performance. Maybe run the inspection queries again?


Thanks everyone for your patience and for providing information that lets us whittle away at these performance issues. The 7.0 release introduced significant architectural changes that simplify the internal codebase, with the tradeoff being that it makes previously warm code paths hotter (namely, more builds are scheduled). The upside is that it has surfaced longstanding issues (e.g. #7120) and led to optimizations that should improve overall build performance, so despite the initial churn we're getting to a better place overall.

@Rukenshia
Copy link

Thanks a lot for the update, I really appreciate it and look forward to the fixes!

@simonjohansson
Copy link
Contributor

Maybe run the inspection queries again?

Show foreign keys without an index

               conrelid               |                             conname                             | reltuples
--------------------------------------+-----------------------------------------------------------------+-----------
 volumes                              | volumes_worker_artifact_id_fkey                                 |        95
 volumes                              | worker_resource_certs_id_fkey                                   |        95
 volumes                              | volumes_parent_id_fkey                                          |        95
 build_image_resource_caches          | build_image_resource_caches_job_id_fkey                         |        57
 resource_config_versions             | resource_config_versions_resource_config_scope_id_fkey          |        42
 resources                            | resources_resource_config_scope_id_fkey                         |        33
 worker_resource_caches               | worker_resource_caches_worker_name_fkey                         |        30
 resource_config_scopes               | resource_config_scopes_resource_id_fkey                         |        30
 next_build_inputs                    | next_build_inputs_resource_id_fkey                              |        26
 build_resource_config_version_inputs | build_resource_config_version_inputs_resource_id_fkey           |        18
 build_resource_config_version_inputs | build_resource_config_version_inputs_build_id_fkey              |        18
 resource_config_check_sessions       | resource_config_check_sessions_worker_base_resource_type_id_fke |        11
 next_build_pipes                     | next_build_pipes_from_build_id_fkey                             |         1
 worker_resource_certs                | worker_resource_certs_worker_name_fkey                          |         0
 resource_disabled_versions           | resource_disabled_versions_resource_id_fkey                     |         0
 build_pipes                          | build_pipes_from_build_id_fkey                                  |         0
 worker_artifacts                     | worker_artifacts_build_id_fkey                                  |         0

Show the size and number of hits for each index

               tablename               |                            indexname                            | num_rows | table_size | index_size | unique | number_of_scans | tuples_read | tuples_fetched 
---------------------------------------+-----------------------------------------------------------------+----------+------------+------------+--------+-----------------+-------------+----------------
 access_tokens                         | access_tokens_pkey                                              |       35 | 32 kB      | 16 kB      | Y      |              90 |          83 |             83
 auth_code                             | auth_code_pkey                                                  |        0 | 8192 bytes | 16 kB      | Y      |               8 |           8 |              8
 auth_request                          | auth_request_pkey                                               |        0 | 8192 bytes | 16 kB      | Y      |              32 |          32 |             32
 base_resource_types                   | base_resource_types_name_key                                    |        0 | 8192 bytes | 16 kB      | Y      |          204217 |      204189 |         204189
 base_resource_types                   | base_resource_types_pkey                                        |        0 | 8192 bytes | 16 kB      | Y      |            1332 |        1332 |           1332
 build_events                          | build_events_build_id_event_id                                  |        0 | 0 bytes    | 8192 bytes | Y      |               0 |           0 |              0
 build_events                          | build_events_build_id_old_event_id                              |        0 | 0 bytes    | 8192 bytes | Y      |               0 |           0 |              0
 build_image_resource_caches           | build_image_resource_caches_build_id                            |       57 | 8192 bytes | 16 kB      | N      |           54258 |       54293 |          53746
 build_image_resource_caches           | build_image_resource_caches_job_build_idx                       |       57 | 8192 bytes | 16 kB      | N      |             497 |          71 |             36
 build_image_resource_caches           | build_image_resource_caches_resource_cache_id                   |       57 | 8192 bytes | 16 kB      | N      |              74 |          15 |              0
 build_pipes                           | build_pipes_from_build_id_to_build_id_uniq                      |        0 | 8192 bytes | 16 kB      | Y      |          113386 |          64 |              0
 build_pipes                           | build_pipes_to_build_id_idx                                     |        0 | 8192 bytes | 16 kB      | N      |          113366 |          64 |              0
 build_resource_config_version_inputs  | build_inputs_resource_versions_idx                              |       18 | 8192 bytes | 16 kB      | N      |             250 |         268 |            196
 build_resource_config_version_inputs  | build_resource_config_version_inputs_uniq                       |       18 | 8192 bytes | 16 kB      | Y      |            4611 |        3940 |           3646
 build_resource_config_version_outputs | build_resource_config_version_outputs_build_id_idx              |        1 | 8192 bytes | 16 kB      | N      |            9363 |         180 |             80
 build_resource_config_version_outputs | build_resource_config_version_outputs_resource_id_idx           |        1 | 8192 bytes | 16 kB      | N      |             506 |         113 |              9
 build_resource_config_version_outputs | build_resource_config_version_outputs_uniq                      |        1 | 8192 bytes | 16 kB      | Y      |             457 |           3 |              3
 builds                                | build_names_uniq_idx                                            |       56 | 176 kB     | 16 kB      | Y      |             277 |         576 |             72
 builds                                | builds_interceptible_completed                                  |       56 | 176 kB     | 16 kB      | N      |           47780 |     1039015 |          15249
 builds                                | builds_ordered_by_job_id_idx                                    |       56 | 176 kB     | 16 kB      | N      |         1529520 |     1530024 |        1437211
 builds                                | builds_pipeline_id                                              |       56 | 176 kB     | 16 kB      | N      |            2557 |        2798 |           2524
 builds                                | builds_pkey                                                     |       56 | 176 kB     | 16 kB      | Y      |         1478464 |     1169353 |        1033016
 builds                                | builds_resource_id_idx                                          |       56 | 176 kB     | 16 kB      | N      |         5510400 |     5519515 |        4990930
 builds                                | builds_resource_type_id_idx                                     |       56 | 176 kB     | 16 kB      | N      |         4646217 |     4699604 |        4160171
 builds                                | builds_running_builds_by_serial_groups_idx                      |       56 | 176 kB     | 16 kB      | N      |             445 |        1400 |            693
 builds                                | builds_status                                                   |       56 | 176 kB     | 16 kB      | N      |          118379 |      479765 |         319119
 builds                                | builds_team_id                                                  |       56 | 176 kB     | 16 kB      | N      |               0 |           0 |              0
 builds                                | needs_v6_migration_idx                                          |       56 | 176 kB     | 8192 bytes | N      |              98 |           0 |              0
 builds                                | next_build_id_idx                                               |       56 | 176 kB     | 16 kB      | N      |            1614 |       12319 |           4454
 builds                                | order_builds_by_rerun_of_or_id_idx                              |       56 | 176 kB     | 16 kB      | N      |             462 |         508 |            461
 builds                                | order_job_builds_by_rerun_of_or_id_idx                          |       56 | 176 kB     | 16 kB      | N      |          492033 |      165368 |         164610
 builds                                | pending_builds_idx                                              |       56 | 176 kB     | 16 kB      | N      |             398 |        1455 |            381
 builds                                | rerun_of_builds_idx                                             |       56 | 176 kB     | 16 kB      | N      |           34846 |         772 |            716
 builds                                | succeeded_builds_ordering_with_rerun_builds_idx                 |       56 | 176 kB     | 16 kB      | N      |             154 |          90 |             79
 builds                                | unencrypted_private_plans_build_idx                             |       56 | 176 kB     | 16 kB      | N      |               0 |           0 |              0
 cert_cache                            | cert_cache_pkey                                                 |        0 | 0 bytes    | 8192 bytes | Y      |               0 |           0 |              0
 check_build_events                    | check_build_events_build_id_event_id                            |     1782 | 336 kB     | 272 kB     | Y      |            2429 |       21658 |              5
 client                                | client_pkey                                                     |        3 | 8192 bytes | 16 kB      | Y      |              39 |          39 |             39
 components                            | components_name_key                                             |       16 | 8192 bytes | 16 kB      | Y      |             448 |         432 |            432
 components                            | components_pkey                                                 |       16 | 8192 bytes | 16 kB      | Y      |             191 |         191 |            191
 connector                             | connector_pkey                                                  |        2 | 8192 bytes | 16 kB      | Y      |              35 |          35 |             35
 containers                            | containers_build_id                                             |       17 | 24 kB      | 16 kB      | N      |          445838 |      468950 |         417791
 containers                            | containers_build_id_not_null                                    |       17 | 24 kB      | 16 kB      | N      |             589 |        3977 |           2762
 containers                            | containers_handle_key                                           |       17 | 24 kB      | 16 kB      | Y      |               8 |           8 |              8
 containers                            | containers_image_check_container_id                             |       17 | 24 kB      | 16 kB      | N      |             154 |           0 |              0
 containers                            | containers_image_get_container_id                               |       17 | 24 kB      | 16 kB      | N      |             157 |          22 |              3
 containers                            | containers_pkey                                                 |       17 | 24 kB      | 16 kB      | Y      |          401282 |      348140 |         332234
 containers                            | containers_plan_id                                              |       17 | 24 kB      | 16 kB      | N      |             124 |           0 |              0
 containers                            | containers_resource_config_check_session_id                     |       17 | 24 kB      | 16 kB      | N      |             267 |         267 |            267
 containers                            | containers_team_id                                              |       17 | 24 kB      | 16 kB      | N      |               0 |           0 |              0
 containers                            | containers_worker_name                                          |       17 | 24 kB      | 16 kB      | N      |            1576 |        1928 |           1844
 deleted_pipelines                     |                                                                 |        0 | 8192 bytes |            | N      |                 |             |               
 device_request                        | device_request_pkey                                             |        0 | 0 bytes    | 8192 bytes | Y      |               0 |           0 |              0
 device_token                          | device_token_pkey                                               |        0 | 0 bytes    | 8192 bytes | Y      |               0 |           0 |              0
 job_inputs                            | job_inputs_job_id_idx                                           |       27 | 8192 bytes | 16 kB      | N      |            2204 |        2344 |           1884
 job_inputs                            | job_inputs_passed_job_id_idx                                    |       27 | 8192 bytes | 16 kB      | N      |            1010 |        1485 |            874
 job_inputs                            | job_inputs_resource_id_idx                                      |       27 | 8192 bytes | 16 kB      | N      |            1236 |        1355 |           1170
 job_outputs                           | job_outputs_job_id_idx                                          |       31 | 8192 bytes | 16 kB      | N      |            2607 |        1526 |           1070
 job_outputs                           | job_outputs_resource_id_idx                                     |       31 | 8192 bytes | 16 kB      | N      |            1164 |         602 |            462
 jobs                                  | jobs_latest_completed_build_id                                  |       26 | 184 kB     | 16 kB      | N      |          147586 |      167255 |          74873
 jobs                                  | jobs_next_build_id                                              |       26 | 184 kB     | 16 kB      | N      |          140248 |         732 |            324
 jobs                                  | jobs_pipeline_id                                                |       26 | 184 kB     | 16 kB      | N      |          829410 |     2345969 |         376481
 jobs                                  | jobs_pkey                                                       |       26 | 184 kB     | 16 kB      | Y      |         4028663 |     3034004 |        2446444
 jobs                                  | jobs_transition_build_id                                        |       26 | 184 kB     | 16 kB      | N      |          140579 |        1396 |            608
 jobs                                  | jobs_unique_pipeline_id_name                                    |       26 | 184 kB     | 16 kB      | Y      |             730 |         772 |            510
 jobs_serial_groups                    | jobs_serial_groups_job_id_idx                                   |       24 | 8192 bytes | 16 kB      | N      |            2955 |        3093 |           2808
 jobs_serial_groups                    | jobs_serial_groups_pkey                                         |       24 | 8192 bytes | 16 kB      | Y      |               0 |           0 |              0
 keys                                  | keys_pkey                                                       |        0 | 8192 bytes | 16 kB      | Y      |              30 |          28 |             28
 migrations                            |                                                                 |        0 | 8192 bytes |            | N      |                 |             |               
 migrations_history                    |                                                                 |      126 | 8192 bytes |            | N      |                 |             |               
 next_build_inputs                     | next_build_inputs_job_id                                        |       26 | 8192 bytes | 16 kB      | N      |          106492 |      106793 |          91508
 next_build_inputs                     | next_build_inputs_unique_job_id_input_name                      |       26 | 8192 bytes | 16 kB      | Y      |               0 |           0 |              0
 next_build_pipes                      | next_build_pipes_from_build_id_to_job_id_uniq                   |        1 | 8192 bytes | 16 kB      | Y      |             571 |          19 |              0
 next_build_pipes                      | next_build_pipes_job_id_idx                                     |        1 | 8192 bytes | 16 kB      | N      |             213 |          52 |              0
 offline_session                       | offline_session_pkey                                            |        0 | 8192 bytes | 16 kB      | Y      |               8 |           6 |              6
 password                              | password_pkey                                                   |        2 | 8192 bytes | 16 kB      | Y      |              32 |          32 |             32
 pipeline_build_events_100             | pipeline_build_events_100_build_id_event_id                     |       87 | 96 kB      | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_100             | pipeline_build_events_100_build_id_old_event_id                 |       87 | 96 kB      | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_90              | pipeline_build_events_90_build_id_event_id                      |      485 | 552 kB     | 32 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_90              | pipeline_build_events_90_build_id_old_event_id                  |      485 | 552 kB     | 40 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_91              | pipeline_build_events_91_build_id_event_id                      |      104 | 168 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_91              | pipeline_build_events_91_build_id_old_event_id                  |      104 | 168 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_92              | pipeline_build_events_92_build_id_event_id                      |       52 | 48 kB      | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_92              | pipeline_build_events_92_build_id_old_event_id                  |       52 | 48 kB      | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_93              | pipeline_build_events_93_build_id_event_id                      |      153 | 168 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_93              | pipeline_build_events_93_build_id_old_event_id                  |      153 | 168 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_94              | pipeline_build_events_94_build_id_event_id                      |      120 | 240 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_94              | pipeline_build_events_94_build_id_old_event_id                  |      120 | 240 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_95              | pipeline_build_events_95_build_id_event_id                      |      636 | 304 kB     | 40 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_95              | pipeline_build_events_95_build_id_old_event_id                  |      636 | 304 kB     | 40 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_96              | pipeline_build_events_96_build_id_event_id                      |      226 | 264 kB     | 32 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_96              | pipeline_build_events_96_build_id_old_event_id                  |      226 | 264 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_97              | pipeline_build_events_97_build_id_event_id                      |      283 | 264 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_97              | pipeline_build_events_97_build_id_old_event_id                  |      283 | 264 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_98              | pipeline_build_events_98_build_id_event_id                      |      339 | 264 kB     | 32 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_98              | pipeline_build_events_98_build_id_old_event_id                  |      339 | 264 kB     | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_99              | pipeline_build_events_99_build_id_event_id                      |       66 | 88 kB      | 16 kB      | Y      |               0 |           0 |              0
 pipeline_build_events_99              | pipeline_build_events_99_build_id_old_event_id                  |       66 | 88 kB      | 16 kB      | Y      |               0 |           0 |              0
 pipelines                             | parent_job_id                                                   |        9 | 8192 bytes | 16 kB      | N      |              30 |           0 |              0
 pipelines                             | pipelines_name_team_id                                          |        9 | 8192 bytes | 16 kB      | Y      |             323 |         274 |            266
 pipelines                             | pipelines_name_team_id_instance_vars                            |        9 | 8192 bytes | 8192 bytes | Y      |               0 |           0 |              0
 pipelines                             | pipelines_pkey                                                  |        9 | 8192 bytes | 16 kB      | Y      |         5393060 |     5391770 |        5390766
 pipelines                             | pipelines_team_id                                               |        9 | 8192 bytes | 16 kB      | N      |              21 |         182 |              0
 refresh_token                         | refresh_token_pkey                                              |        0 | 0 bytes    | 8192 bytes | Y      |               0 |           0 |              0
 resource_cache_uses                   | resource_cache_uses_build_id                                    |       11 | 8192 bytes | 16 kB      | N      |             337 |         542 |            132
 resource_cache_uses                   | resource_cache_uses_container_id                                |       11 | 8192 bytes | 16 kB      | N      |              50 |           0 |              0
 resource_cache_uses                   | resource_cache_uses_resource_cache_id                           |       11 | 8192 bytes | 16 kB      | N      |              67 |         148 |              0
 resource_caches                       | resource_caches_pkey                                            |       20 | 48 kB      | 16 kB      | Y      |            1131 |        1134 |           1131
 resource_caches                       | resource_caches_resource_config_id                              |       20 | 48 kB      | 16 kB      | N      |               0 |           0 |              0
 resource_caches                       | resource_caches_resource_config_id_version_md5_params_hash_uniq |       20 | 48 kB      | 16 kB      | Y      |          118523 |      127981 |         127774
 resource_config_check_sessions        | resource_config_check_sessions_pkey                             |       11 | 8192 bytes | 16 kB      | Y      |              42 |          42 |             42
 resource_config_check_sessions        | resource_config_check_sessions_resource_config_id               |       11 | 8192 bytes | 16 kB      | N      |               0 |           0 |              0
 resource_config_check_sessions        | resource_config_check_sessions_uniq                             |       11 | 8192 bytes | 16 kB      | Y      |          404710 |      401856 |         395820
 resource_config_scopes                | resource_config_scopes_pkey                                     |       30 | 8192 bytes | 16 kB      | Y      |          106090 |      106065 |         102655
 resource_config_scopes                | resource_config_scopes_resource_config_id_uniq                  |       30 | 8192 bytes | 16 kB      | Y      |             207 |          21 |             21
 resource_config_scopes                | resource_config_scopes_resource_id_resource_config_id_uniq      |       30 | 8192 bytes | 8192 bytes | Y      |               0 |           0 |              0
 resource_config_versions              | resource_config_scope_id_and_version_md5_unique                 |       42 | 40 kB      | 16 kB      | Y      |          145446 |      146849 |         146313
 resource_config_versions              | resource_config_versions_check_order_idx                        |       42 | 40 kB      | 16 kB      | N      |          738600 |      738934 |         732272
 resource_config_versions              | resource_config_versions_pkey                                   |       42 | 40 kB      | 16 kB      | Y      |               0 |           0 |              0
 resource_config_versions              | resource_config_versions_version                                |       42 | 40 kB      | 16 kB      | N      |               0 |           0 |              0
 resource_configs                      | resource_configs_base_resource_type_id                          |       30 | 8192 bytes | 16 kB      | N      |               0 |           0 |              0
 resource_configs                      | resource_configs_base_resource_type_id_so_key                   |       30 | 8192 bytes | 16 kB      | Y      |             198 |         106 |             76
 resource_configs                      | resource_configs_pkey                                           |       30 | 8192 bytes | 16 kB      | Y      |          969515 |      970966 |         967388
 resource_configs                      | resource_configs_resource_cache_id                              |       30 | 8192 bytes | 16 kB      | N      |               0 |           0 |              0
 resource_configs                      | resource_configs_resource_cache_id_so_key                       |       30 | 8192 bytes | 16 kB      | Y      |              74 |           6 |              6
 resource_disabled_versions            | resource_disabled_versions_resource_id_version_md5_uniq         |        0 | 0 bytes    | 8192 bytes | Y      |            1927 |           0 |              0
 resource_pins                         | resource_pins_pkey                                              |        0 | 8192 bytes | 16 kB      | Y      |           77528 |       29903 |          29902
 resource_types                        | resource_types_pipeline_id                                      |       19 | 8192 bytes | 16 kB      | N      |          339812 |      228155 |         153837
 resource_types                        | resource_types_pipeline_id_name_key                             |       19 | 8192 bytes | 16 kB      | Y      |            1456 |         428 |            395
 resource_types                        | resource_types_pkey                                             |       19 | 8192 bytes | 16 kB      | Y      |         5502882 |     5496615 |        4234573
 resource_types                        | resource_types_resource_config_id                               |       19 | 8192 bytes | 16 kB      | N      |           15963 |         114 |             76
 resources                             | resources_build_id_idx                                          |       33 | 24 kB      | 16 kB      | N      |          222965 |      237780 |         213793
 resources                             | resources_pipeline_id                                           |       33 | 24 kB      | 16 kB      | N      |            5016 |      120090 |         119085
 resources                             | resources_pkey                                                  |       33 | 24 kB      | 16 kB      | Y      |         2847766 |     2854564 |        2751070
 resources                             | resources_resource_config_id                                    |       33 | 24 kB      | 16 kB      | N      |          238081 |      243842 |         230239
 resources                             | unique_pipeline_id_name                                         |       33 | 24 kB      | 16 kB      | Y      |             617 |         464 |            341
 successful_build_outputs              | successful_build_outputs_build_id_idx                           |       26 | 8192 bytes | 16 kB      | Y      |             712 |         126 |            126
 successful_build_outputs              | successful_build_outputs_job_id_idx                             |       26 | 8192 bytes | 16 kB      | N      |               1 |           6 |              0
 successful_build_outputs              | successful_build_outputs_ordering_with_rerun_builds_idx         |       26 | 8192 bytes | 16 kB      | N      |              31 |          46 |              0
 successful_build_outputs              | successful_build_outputs_outputs_idx                            |       26 | 8192 bytes | 16 kB      | N      |               0 |           0 |              0
 successful_build_outputs              | successful_build_outputs_rerun_of_idx                           |       26 | 8192 bytes | 16 kB      | N      |             693 |           0 |              0
 task_caches                           | task_caches_job_id                                              |        4 | 8192 bytes | 16 kB      | N      |            1180 |        1860 |           1820
 task_caches                           | task_caches_job_id_step_name_path_uniq                          |        4 | 8192 bytes | 16 kB      | Y      |             505 |          21 |             20
 task_caches                           | task_caches_pkey                                                |        4 | 8192 bytes | 16 kB      | Y      |             200 |         200 |            200
 team_build_events_1                   | team_build_events_1_build_id_event_id                           |        0 | 0 bytes    | 8192 bytes | Y      |              15 |           0 |              0
 team_build_events_2                   | team_build_events_2_build_id_event_id                           |        0 | 0 bytes    | 8192 bytes | Y      |              15 |           0 |              0
 team_build_events_3                   | team_build_events_3_build_id_event_id                           |        0 | 0 bytes    | 8192 bytes | Y      |              15 |           0 |              0
 team_build_events_4                   | team_build_events_4_build_id_event_id                           |        0 | 0 bytes    | 8192 bytes | Y      |              15 |           0 |              0
 teams                                 | index_teams_name_unique_case_insensitive                        |        4 | 8192 bytes | 16 kB      | Y      |            4048 |        4043 |           4043
 teams                                 | teams_pkey                                                      |        4 | 8192 bytes | 16 kB      | Y      |         3547954 |     3535735 |        3535735
 users                                 | user_unique                                                     |        5 | 8192 bytes | 16 kB      | Y      |              53 |          48 |             48
 users                                 | users_pkey                                                      |        5 | 8192 bytes | 16 kB      | Y      |               0 |           0 |              0
 volumes                               | missing_volumes_idx                                             |       95 | 80 kB      | 16 kB      | N      |          132649 |       50464 |             60
 volumes                               | volumes_container_id                                            |       95 | 80 kB      | 16 kB      | N      |          125781 |      232471 |         109266
 volumes                               | volumes_handle                                                  |       95 | 80 kB      | 72 kB      | Y      |           90867 |      149082 |          86768
 volumes                               | volumes_id_state_key                                            |       95 | 80 kB      | 80 kB      | Y      |           40858 |       51197 |            248
 volumes                               | volumes_parent_id                                               |       95 | 80 kB      | 32 kB      | N      |          291333 |      136618 |          88065
 volumes                               | volumes_pkey                                                    |       95 | 80 kB      | 64 kB      | Y      |         1056284 |     1462850 |        1196720
 volumes                               | volumes_team_id                                                 |       95 | 80 kB      | 16 kB      | N      |               0 |           0 |              0
 volumes                               | volumes_worker_base_resource_type_id                            |       95 | 80 kB      | 16 kB      | N      |           25792 |       28140 |          25546
 volumes                               | volumes_worker_name                                             |       95 | 80 kB      | 16 kB      | N      |             438 |        5374 |           4996
 volumes                               | volumes_worker_name_handle_key                                  |       95 | 80 kB      | 136 kB     | Y      |              26 |         819 |            480
 volumes                               | volumes_worker_resource_cache_id                                |       95 | 80 kB      | 16 kB      | N      |             105 |         113 |            105
 volumes                               | volumes_worker_resource_cache_unique                            |       95 | 80 kB      | 32 kB      | Y      |          494705 |      494166 |         405634
 volumes                               | volumes_worker_task_cache_id                                    |       95 | 80 kB      | 16 kB      | N      |            1293 |        2057 |           1102
 volumes                               | worker_resource_cert_volumes_idx                                |       95 | 80 kB      | 16 kB      | N      |           29352 |      253036 |          29766
 wall                                  |                                                                 |        0 | 0 bytes    |            | N      |                 |             |               
 worker_artifacts                      | worker_artifacts_pkey                                           |        0 | 0 bytes    | 8192 bytes | Y      |               0 |           0 |              0
 worker_base_resource_types            | worker_base_resource_types_base_resource_type_id                |       42 | 16 kB      | 16 kB      | N      |               0 |           0 |              0
 worker_base_resource_types            | worker_base_resource_types_pkey                                 |       42 | 16 kB      | 16 kB      | Y      |             113 |         113 |            113
 worker_base_resource_types            | worker_base_resource_types_worker_name                          |       42 | 16 kB      | 16 kB      | N      |             464 |        6496 |              0
 worker_base_resource_types            | worker_base_resource_types_worker_name_base_resource_type_i_key |       42 | 16 kB      | 16 kB      | Y      |             337 |          99 |             85
 worker_resource_caches                | worker_resource_caches_pkey                                     |       30 | 8192 bytes | 16 kB      | Y      |          673762 |      601379 |         596388
 worker_resource_caches                | worker_resource_caches_resource_cache_id                        |       30 | 8192 bytes | 16 kB      | N      |             382 |         442 |            431
 worker_resource_caches                | worker_resource_caches_uniq                                     |       30 | 8192 bytes | 16 kB      | Y      |             792 |         409 |            408
 worker_resource_caches                | worker_resource_caches_worker_base_resource_type_id             |       30 | 8192 bytes | 16 kB      | N      |             112 |          17 |              0
 worker_resource_certs                 | worker_resource_certs_pkey                                      |        0 | 8192 bytes | 16 kB      | Y      |              11 |          11 |             11
 worker_task_caches                    | worker_task_caches_pkey                                         |        4 | 8192 bytes | 16 kB      | Y      |             610 |         610 |            610
 worker_task_caches                    | worker_task_caches_task_cache_id                                |        4 | 8192 bytes | 16 kB      | N      |              70 |          88 |             80
 worker_task_caches                    | worker_task_caches_worker_name                                  |        4 | 8192 bytes | 16 kB      | N      |              81 |         362 |            362
 worker_task_caches                    | worker_task_caches_worker_name_task_cache_id_uniq               |        4 | 8192 bytes | 16 kB      | Y      |            1406 |         854 |            828
 workers                               | constraint_workers_name_unique                                  |        2 | 8192 bytes | 16 kB      | Y      |            1070 |        1897 |           1058
 workers                               | workers_addr_key                                                |        2 | 8192 bytes | 16 kB      | Y      |               0 |           0 |              0
 workers                               | workers_pkey                                                    |        2 | 8192 bytes | 16 kB      | Y      |         1418662 |     1419633 |        1418314
 workers                               | workers_team_id                                                 |        2 | 8192 bytes | 16 kB      | N      |               0 |           0 |              0

@vito vito self-assigned this Jun 14, 2021
@vito vito moved this from To do to In flight in Roadmap Jun 14, 2021
@scottietremendous scottietremendous moved this from Current iteration to Done in Epic #6616 Jun 21, 2021
@mhindery
Copy link

@simonjohansson @aoldershaw @VengefulAncient

We were also wondering what the UTILITY COMMAND was, and found nothing about it elsewhere except this thread. I contacted Google support for this, and they got back to me with this explanation:

The `UTILITY COMMAND` is a classification for all queries not defined as `SELECT`, `UPDATE INSERT`, or `DELETE`, in Query Insights. 

This `UTILITY COMMAND` also includes maintenance operations like `VACUUM` and table commands like `CREATE`, `DROP`, and `COPY`.

So this is not an issue to be avoided, this is a group of different queries instead of the ones shown above and maintenance operations, all in just one place. 

@aoldershaw
Copy link
Contributor

Ah thanks for looking into that @mhindery. In that case, I have a suspicion as to what that might be from, and I think it is something we can possibly improve in Concourse.

My guess is that most of the UTILITY COMMAND calls are from how we run builds currently. Specifically, when we create a build, we CREATE a SEQUENCE for keeping track of the current event_id for build events (this event_id is used to number things like build logs). We create a sequence for every build (including all checks). When the build finishes, we DROP the SEQUENCE.

I suggested an alternative approach in #6820 (comment) - basically, rather than running CREATE SEQUENCE for each build, we create a separate table build_event_counters that contains the columns (build_id, event_id). To get the next event_id for a build, we could increment the event_id and return the new value. When the build finishes, we can DELETE the row.

I'm not totally sure if this would improve performance over the current approach of creating a sequence for each build, since this would be a really heavily run query - but it's an idea to optimize the UTILITY COMMAND.

@aoldershaw
Copy link
Contributor

My guess is that most of the UTILITY COMMAND calls are from how we run builds currently

Actually, I think that's part of it, but given that there are way more UTILITY COMMANDs than checks being created, it could be the NOTIFY build_events_<buildid> commands (which we invoke every time a build event is emitted to notify any listeners that there may be new e.g. log messages). Given that the vast majority of the time nobody is listening on the other end, this is extraordinarily wasteful.

@scottietremendous scottietremendous moved this from In flight to Done in Roadmap Jul 23, 2021
@antonysouthworth-halter
Copy link

We are experiencing similar behaviour on 7.5. In our case it's purely CPU bound (usage pinned to 100% at the higher region, down to around 50% in the second stage).

Screen Shot 2022-05-04 at 3 20 54 PM

Screen Shot 2022-05-04 at 3 21 19 PM

For this case it looks like most of the CPU is this query:

SELECT event_id, type, version, payload FROM team_build_events_4 WHERE (build_id = $1 OR build_id_old = $2) AND event_id > $3 ORDER BY event_id ASC LIMIT ?

@etiennechabert
Copy link

etiennechabert commented Jul 15, 2022

Regarding the query just above or similar (on our side the table is check_build_events)

SELECT event_id, type, version, payload FROM check_build_events WHERE (build_id = $1 OR build_id_old = $2) AND event_id > $3 ORDER BY event_id ASC LIMIT ?

The table check_build_events is having only one index:

CREATE UNIQUE INDEX check_build_events_build_id_event_id ON public.check_build_events USING btree (build_id, event_id)

This index can cover very well the following query:

SELECT event_id, type, version, payload FROM check_build_events WHERE build_id = $1 AND event_id > $3 ORDER BY event_id ASC LIMIT ?

⚠️ But the query posted above contains an OR: ...(build_id = $1 OR build_id_old = $2)... ⚠️

So, what will end-up happening on run-time is actually two queries running in parallel, the result of each will be merged to give the result, so:

  • First query, with build_id = $1 → Well covered by an index
  • Second query, with build_id_old = $2 → Full table scan, burning the CPU

Solution

Now the solution is very obvious and trivial, create the missing index:

CREATE UNIQUE INDEX check_build_events_build_id_old_event_id ON public.check_build_events USING btree (build_id_old, event_id)

As you can see below, this index is totally removing the painful query from performance insight

Before

Screenshot 2022-07-15 at 17 23 13

After

Screenshot 2022-07-15 at 17 23 28

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Epic #6616
  
Done
Roadmap
  
Done
Development

No branches or pull requests