
First request to .../builds much slower than others for job with huge build history #3754

Open
xtremerui opened this issue Apr 18, 2019 · 6 comments


@xtremerui
Contributor

xtremerui commented Apr 18, 2019

As shown in the gif (the first 15 seconds are cut), I go to https://ci.concourse-ci.org/teams/examples/pipelines/time-triggered/jobs/job/builds/200845

The first request to the .../builds endpoint took about 17s; compared with the subsequent paginated calls to the same endpoint, it can be more than 5x slower (17s vs 2s vs 3s).

[gif: Apr-18-2019 12-46-22]

For a job that has fewer builds (e.g. a couple hundred), the first call only takes tens to hundreds of milliseconds, and the paginated calls show no big difference.

So there are two questions here:

  1. Why is the call to ../builds so slow for a job that has a huge build history? It fetches the first 100 builds when no range is specified, which should technically be the same amount of work as a request against a job with fewer builds (see the sketch after this list).

  2. Why are the paginated requests to ../builds so much faster (relatively) than the first one?
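
To make question 1 concrete, here is a rough sketch of the difference between the unbounded first request and a bounded paginated one, assuming keyset pagination on build id; these are illustrative queries only, not the ones Concourse actually issues:

-- Sketch only: hypothetical queries against the builds table.
-- First request: no range specified, so just the newest 100 builds of the job.
SELECT b.*
FROM builds b
WHERE b.job_id = $1
ORDER BY b.id DESC
LIMIT 100;

-- Paginated request: same query, bounded by the previous page.
SELECT b.*
FROM builds b
WHERE b.job_id = $1
  AND b.id < $2    -- $2 = smallest build id returned by the previous page
ORDER BY b.id DESC
LIMIT 100;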

@stale

stale bot commented Jul 16, 2019

Beep boop! This issue has been idle for long enough that it's time to check in and see if it's still important.

If it is, what is blocking it? Would anyone be interested in submitting a PR or continuing the discussion to help move things forward?

If no activity is observed within the next week, this issue will be closed, in accordance with our stale issue process.

@stale stale bot added the wontfix label Jul 16, 2019
@stale stale bot closed this as completed Jul 23, 2019
@cirocosta
Member

cirocosta commented Sep 30, 2019

Hey,

I'm reopening this one as this is still a thing.

Thanks!

@cirocosta cirocosta reopened this Sep 30, 2019
@stale stale bot removed the wontfix label Sep 30, 2019
@stale

stale bot commented Nov 29, 2019

Beep boop! This issue has been idle for long enough that it's time to check in and see if it's still important.

If it is, what is blocking it? Would anyone be interested in submitting a PR or continuing the discussion to help move things forward?

If no activity is observed within the next week, this issue will be closed, in accordance with our stale issue process.

@stale stale bot added the wontfix label Nov 29, 2019
@stale stale bot closed this as completed Dec 6, 2019
@marco-m-pix4d
Contributor

Reopening as this is still happening, at least with Concourse 7.9.1.
We will report whether we still see this problem after upgrading to a more recent Concourse.

@marco-m-pix4d marco-m-pix4d reopened this Jan 23, 2024
@xtremerui
Contributor Author

@marco-m-pix4d how big is your builds table? Have you tried to manually clean up unneeded rows in there?

@odormond
Contributor

> @marco-m-pix4d how big is your builds table? Have you tried to manually clean up unneeded rows in there?

We have this many rows in our DB:

teams  pipelines  jobs  resources  builds  build_comments
    5        365  7806       8398  758906              47
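
(For reference, one way to get approximate per-table row counts like these out of PostgreSQL, assuming direct access to the Concourse database, is to query its statistics views:)

-- Approximate live row counts from PostgreSQL's statistics.
SELECT relname, n_live_tup
FROM pg_stat_user_tables
WHERE relname IN ('teams', 'pipelines', 'jobs', 'resources', 'builds', 'build_comments')
ORDER BY relname;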

We run concourse with:

  • --default-build-logs-to-retain=50
  • --max-build-logs-to-retain=1000

I'm not aware of pipelines overriding the default retention setting but I will not rule it out. :-D

I was able to pinpoint the query (as seen from postgres) to:

SELECT b.id, b.name,
       b.job_id,
       b.resource_id,
       b.resource_type_id,
       b.team_id,
       b.status,
       b.manually_triggered,
       b.created_by,
       b.scheduled,
       b.schema,
       b.private_plan,
       b.public_plan,
       b.create_time,
       b.start_time,
       b.end_time,
       b.reap_time,
       j.name,
       r.name,
       b.pipeline_id,
       p.name,
       p.instance_vars,
       t.name,
       b.nonce,
       b.drained,
       b.aborted,
       b.completed,
       b.inputs_ready,
       b.rerun_of,
       rb.name,
       b.rerun_number,
       b.span_context,
       COALESCE(bc.comment, '')
FROM builds b
LEFT OUTER JOIN jobs j ON b.job_id = j.id
LEFT OUTER JOIN resources r ON b.resource_id = r.id
LEFT OUTER JOIN pipelines p ON b.pipeline_id = p.id
LEFT OUTER JOIN teams t ON b.team_id = t.id
LEFT OUTER JOIN builds rb ON rb.id = b.rerun_of
LEFT OUTER JOIN build_comments bc ON b.id = bc.build_id
WHERE b.pipeline_id = $1 AND b.id < $2
ORDER BY COALESCE(b.rerun_of, b.id) DESC, b.id DESC
LIMIT 1;

For the first build id of the pipeline, the b.id < $2 condition forces the query to scan the cross product of the tables, and because the builds table appears twice (as b and rb) it has to scan $O(N_{builds}^2)$ rows. That blows up the runtime: the query takes ~1.5 seconds and sometimes up to 4 seconds.
For the second build, it stops at the first match thanks to the LIMIT 1. In that case, the query only takes ~10ms.
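
One way to confirm this directly is to run the query under EXPLAIN ANALYZE for both cases; a sketch, assuming psql access to the Concourse database, with the placeholders replaced by literal values and the joins that don't affect the cost trimmed for brevity (42 and 100845 are hypothetical ids, not real ones):

-- Sketch: compare the plan and timings for the pipeline's oldest build id
-- versus a recent one as the $2 bound.
EXPLAIN (ANALYZE, BUFFERS)
SELECT b.id, rb.name
FROM builds b
LEFT OUTER JOIN builds rb ON rb.id = b.rerun_of
WHERE b.pipeline_id = 42 AND b.id < 100845
ORDER BY COALESCE(b.rerun_of, b.id) DESC, b.id DESC
LIMIT 1;
-- With the oldest build's id as the bound, the actual row counts and runtime
-- should reflect the full scan described above; with a later build's id the
-- LIMIT 1 lets the scan stop almost immediately.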
