Shared repositories (git mirrors) between checkouts #936
bootstrap/bootstrap.go
// if we have a reference, add the submodule to it
if reference != "" {
	name := fmt.Sprintf("submodule%d", idx+1)
This might require some more thought when submodules change.
This change looks awesome!
We have some large, slow-to-clone repos in some of our pipelines, and we would love to have this. To the point where I was about two days away from building it myself before we spoke about this PR. :)
@@ -78,6 +84,7 @@ func NewBootstrapTester() (*BootstrapTester, error) {
		"HOME=" + homeDir,
		"BUILDKITE_BIN_PATH=" + pathDir,
		"BUILDKITE_BUILD_PATH=" + buildDir,
		"BUILDKITE_REPOS_PATH=" + reposDir,
Same comment as above about BUILDKITE_REPOS_PATH vs BUILDKITE_REPO_MIRROR_PATH.
Main NIT: I found the variable naming somewhat ambiguous. Should it be Repo vs Repos, repository vs repositories?
LGreatTM.
How does this interact with running multiple agent processes per host via ? I would prefer that there be a single reference of each repo per host as opposed to per BK agent instance, to save disk space.
Another quick thing comes to mind - since it's probably not safe to assume that you can guarantee that the mirrored repos never get any unexpected mutations, then it's probably safer to also include the
Also, though this doesn't necessarily reflect on everyone's experience (since this particular repo is very large), I did some git benchmarking of clones of our worst-performing repo using various different sets of
Again, this is not intended to be a scientific study of the impacts of these flags, but in our case, this is the scale of change that we see, and highlights why we want this change so much. :)
That data is super helpful @DoomGerbil!
@DoomGerbil I think I agree that
@petemounce, yup that's the design!
After some discussion @keithpitt and I ended up on
@harrietgrace I might need some assistance from you with updating the docs for this at some point!
Is it an opt-in config option at the moment? If so, why don’t we remove it from experiments?
@toolmantim we've done that in the past. I think the big downside is that we miss out on the really clear "this is super experimental". This feature will likely be flaky for a month or two whilst we work through the edge cases, so I feel like having to strongly opt in to the experimental nature helps manage expectations. The alternative is releasing a beta agent, but I'd much prefer to keep iterating quickly on stable with new things behind experiment flags.
Okay! Probably shouldn’t be changelogged or added to docs yet then?
@toolmantim any reason we can't mention an experimental feature in a changelog? It's something heaps of people are interested in and would be great to get feedback on. We mention other beta features there?
Oh, it sounded like this was “get out a sneaky build to test if this actually works as designed” and I was assuming you had enough people to test it. And then after we’re confident it works, turn off the experimental flag in a new point release and changelog. I thought a changelog announcement would be a good push for making sure we move it out of experimental. But perhaps we just do a changelog for every new agent release? And we can list the experimental changes in there, alongside other fixes.
Sorry, I should have made my question clearer!
I was wondering why it couldn’t be removed from experiments in a point release once we’ve tested it, and then enabled by default in v4? Related q for making it the default: would we have a default calculated value for
Sure, quite possibly! I guess it depends how it goes?
Nope, I don't think we would calculate a default value. We don't for
@DoomGerbil I'm going to remove
https://github.blog/2015-02-06-git-2-3-has-been-released/ says it's been there for four years now, but the point about LFS is well taken. We don't support LFS with the rest of our infrastructure, so we would probably add the flag to our configuration, since as you said, it's not hard to tack it on ourselves.
The bit I'm still very blurry on is how the submodule support will work in practice.
We've decided to drop reference clone support for submodules for the first pass.
🥇
Just an FYI, we've been doing this for a little while with a custom
@silvamerica yeah, I suspect we might end up with it being a default, but it brings with it a quite recent git dependency, so going to see how it goes. It's only really an issue if you plan on deleting the git bare repositories, isn't it?
This is another take on #917 that should hopefully be less controversial. Rather than introducing a new phase, this creates a shared repository mirror that is then used for reference cloning the subsequent checkout. This should radically reduce network usage for cloning big repositories. Otherwise the semantics are the same as without a reference clone, no new checkout dirs or other complexities.
This introduces an experiment, git-mirrors, and several new configuration options:
- git-mirrors-path: a directory that contains the git mirrors shared by agents on a given machine, with one mirror per repository address.
- git-clone-mirrors-flags: the parameters passed to git clone for the mirroring portion of the checkout. Defaults to -v --mirror.
- git-mirrors-lock-timeout: the number of seconds to wait for a lock to expire on a checkout that has crashed. Needs to be greater than the maximum time for a git clone on the agent.
It also introduces the idea of running the integration tests with an experiment enabled. This could eventually lead to an explosion of permutations, but for now it allows us to test checkouts with and without the git-mirrors experiment.
To test this out, you can run your agent with:
My plan is to merge this and release it in 3.10.0, with the intention of making it the default and removing the experiment wrapper in 4.0.0.