Concourse database hits 100% CPU usage #3948
We have some badly behaved custom resources that generate a new UUID version every time a resource check runs. We obviously need to fix this at some point, but so far we haven't found a clean solution.
Once the number of resource versions reaches a certain threshold, database CPU usage climbs to 100%.
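To illustrate why a UUID-based version is a problem, here is a minimal Python sketch. It assumes, as a simplification, that Concourse stores one entry per distinct version a check returns; the in-memory set below stands in for that storage and is not Concourse's actual schema. `uuid_check`, `content_check` and `simulate` are hypothetical names used only for this illustration.

```python
import hashlib
import json
import uuid


def uuid_check():
    """Misbehaving check: returns a brand-new UUID version on every call."""
    return [{"id": str(uuid.uuid4())}]


def content_check(content):
    """Content-derived check: the version only changes when the content does."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return [{"sha256": digest}]


def simulate(check, runs):
    """Run `check` repeatedly and count how many distinct versions pile up.

    The set is a hypothetical stand-in for the versions table: one entry per
    distinct version returned by the check.
    """
    saved = set()
    for _ in range(runs):
        for version in check():
            # Serialize with sorted keys so equal versions map to the same entry.
            saved.add(json.dumps(version, sort_keys=True))
    return len(saved)


if __name__ == "__main__":
    print(simulate(uuid_check, 1000))                          # grows with every run
    print(simulate(lambda: content_check("unchanged"), 1000))  # stays at 1
```

Under this assumption the UUID-based check adds one new version per run, so stored versions grow without bound with the number of checks, while a content-derived version stays constant until the underlying content changes.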
In previous versions of Concourse we could work around this in a fairly hacky way by deleting
Since upgrading to version 5.1.0 we are seeing similar problems, but they are caused by a different set of queries related to retrieving resource versions.
My first question is whether there is a way to safely delete resource versions in the new database schema, the way we did previously.
My second question is whether these queries could be optimized to handle the volume of resource versions we are throwing at them. I can foresee someone else having a legitimate use case that produces volumes like these.
We also tried adding indices so that the PostgreSQL planner would use parallel index scans instead of the sequential scans it was choosing, but that did not alleviate the problem.
We also ran VACUUM, ANALYZE and REINDEX to make sure the optimal query plan was chosen. Running ANALYZE fixed the issue, but only until the row count passed a certain threshold.
The queries that cause the CPU spikes in Concourse 5.1.0 are the following:
The row counts of the relevant tables are as follows:
Steps to Reproduce
Write a custom resource that generates a new UUID-based version on every check, and run those checks many times across many pipelines.
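A minimal Python sketch of the check script this step describes. It follows the Concourse resource-type convention of a `check` executable (conventionally `/opt/resource/check`) that reads a JSON request containing `source` and `version` from stdin and prints a JSON array of versions to stdout. The `uuid` version key is a hypothetical choice; any key reproduces the problem, because every returned value is new.

```python
#!/usr/bin/env python3
"""Hypothetical check script that reproduces the issue: a new version every check."""
import json
import sys
import uuid


def check(request):
    # request carries "source" (resource config) and "version" (last known
    # version, or null). Both are deliberately ignored: a fresh UUID is
    # returned every time, so each check yields a version never seen before.
    return [{"uuid": str(uuid.uuid4())}]


def main():
    # Concourse pipes the JSON request to stdin. Falling back to an empty
    # request lets the script also be run by hand without input.
    raw = "" if sys.stdin.isatty() else sys.stdin.read()
    request = json.loads(raw) if raw.strip() else {}
    json.dump(check(request), sys.stdout)


if __name__ == "__main__":
    main()
```

Packaged into a resource-type image and attached to resources in many pipelines, each check cycle should add one more version per resource, which is the growth pattern described above.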
Expected Results
CPU usage stays within normal bounds.
Actual Results
CPU usage on the PostgreSQL database hovers around 100% and everything slows to a crawl.