Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize check #418

Merged
merged 2 commits into from
Mar 16, 2024
Merged

Optimize check #418

merged 2 commits into from
Mar 16, 2024

Conversation

rmg
Copy link
Contributor

@rmg rmg commented Nov 21, 2023

YMMV, but with a large monorepo we saw:

  • 99% reduction in network traffic between the concourse workers and GHE server
  • 90% reduction in CPU usage on the server during clone/fetch

Details

check: use bare clone

The check container doesn't make any use of an actual checkout, but on
large repositories the time to checkout after fetching can be 90% or
more of the overall time to clone.

Because the --bare option prevents the --branch option from being
setting the refspec in the config we have to do it as a follow-up step.

Instead of relying solely on the config having the refspec set
correctly, we can also explicitly specify the branch when fetching if
one is set.

check: use blobless partial clones

The check operation doesn't need any blobs. The most detail is ever
needs about commits is the paths they touched, which only requires
commit and tree objects, not the blobs.

On large repositories, especially with lots of history, this can
dramatically reduce the time to clone and fetch while also dramatically
reducing the server-side resources used to serve up those large git
repositories.

The check container doesn't make any use of an actual checkout, but on
large repositories the time to checkout after fetching can be 90% or
more of the overall time to clone.

Because the --bare option prevents the --branch option from being
setting the refspec in the config we have to do it as a follow-up step.

Instead of relying solely on the config having the refspec set
correctly, we can also explicitly specify the branch when fetching if
one is set.

Signed-off-by: Ryann Graham <r.m.graham@gmail.com>
The check operation doesn't need any blobs. The most detail is ever
needs about commits is the paths they touched, which only requires
commit and tree objects, not the blobs.

On large repositories, especially with lots of history, this can
dramatically reduce the time to clone and fetch while also dramatically
reducing the server-side resources used to serve up those large git
repositories.

Signed-off-by: Ryann Graham <r.m.graham@gmail.com>
Copy link
Member

@taylorsilva taylorsilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Thanks for the PR and this AMAZING improvement!

I just merged another PR so I rebased your PR locally on master and ran the tests again to be sure everything still worked as expected. Everything so passed ✅

@taylorsilva taylorsilva merged commit 9c16470 into concourse:master Mar 16, 2024
2 checks passed
@marco-m-pix4d
Copy link

@rmg we went through a similar pain with big repos. HUGE thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants