Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared repositories (git mirrors) between checkouts #936

Merged
merged 22 commits into from
Mar 12, 2019
Merged

Conversation

lox
Copy link
Contributor

@lox lox commented Mar 3, 2019

This is another take on #917 that should hopefully be less controversial. Rather than introducing a new phase, this creates a shared repository mirror that is then used for reference cloning the subsequent checkout. This should radically reduce network usage for cloning big repositories. Otherwise the semantics are the same as without a reference clone, no new checkout dirs or other complexities.

This introduces an experiment, git-mirrors, and several new configuration options:

  • git-mirrors-path: a directory that contains the git mirrors shared by agents on a give machine, unique to a given repository address
  • git-clone-mirrors-flags: the parameters passed to git clone for the mirroring portion of the checkout. Defaults to -v --mirror.
  • git-mirrors-lock-timeout: the number of seconds to wait for a lock to expire on a checkout that has crashed. Needs to be greater than the maximum time for a git clone on the agent.

It also introduces the idea of running the integration tests with an experiment enabled, which whilst potentially could end up with an explosion of permutations, for the minute it allows us to test checkouts with and without the git-mirrors experiment.

To test this out, you can run your agent with:

$  buildkite-agent start --experiment git-mirrors --git-mirrors-path /tmp/buildkite-git-mirrors

My plan is to merge this and release it in 3.10.0, with the intention of making it the default and removing the experiment wrapper in 4.0.0.

@lox lox requested a review from a team March 3, 2019 23:14

// if we have a reference, add the submodule to it
if reference != "" {
name := fmt.Sprintf("submodule%d", idx+1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might require some more thought when submodules change.

Copy link
Contributor

@DoomGerbil DoomGerbil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change look awesome!

We have some large, slow-to-clone repos in some of our pipelines, and we would love to have this. To the point where I was about two days away from building it myself before we spoke about this PR. :)

agent/job_runner.go Outdated Show resolved Hide resolved
bootstrap/bootstrap.go Outdated Show resolved Hide resolved
bootstrap/bootstrap.go Outdated Show resolved Hide resolved
@@ -78,6 +84,7 @@ func NewBootstrapTester() (*BootstrapTester, error) {
"HOME=" + homeDir,
"BUILDKITE_BIN_PATH=" + pathDir,
"BUILDKITE_BUILD_PATH=" + buildDir,
"BUILDKITE_REPOS_PATH=" + reposDir,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above about BUILDKITE_REPOS_PATH vs BUILDKITE_REPO_MIRROR_PATH.

clicommand/agent_start.go Outdated Show resolved Hide resolved
Copy link
Contributor

@gavinelder gavinelder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main NIT: I found variable naming somewhat ambiguous.

Should it be Repo vs Repos, repository vs repositories ?

Copy link
Contributor

@petemounce petemounce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGreatTM.

agent/agent_configuration.go Outdated Show resolved Hide resolved
agent/job_runner.go Outdated Show resolved Hide resolved
bootstrap/bootstrap.go Outdated Show resolved Hide resolved
bootstrap/bootstrap.go Outdated Show resolved Hide resolved
bootstrap/config.go Outdated Show resolved Hide resolved
clicommand/bootstrap.go Outdated Show resolved Hide resolved
@petemounce
Copy link
Contributor

How does this interact with running multiple agent processes per host via

  • separate processes (e.g. systemd template-unit based vs unit based)
  • --spawn n

?

I would prefer that there be a single reference of each repo per host as opposed to per BK agent instance, to save disk space.

@DoomGerbil
Copy link
Contributor

Another quick thing comes to mind - since it's probably not safe to assume that you can guarantee that the mirrored repos never get any unexpected mutations, then it's probably safer to also include the --dissociate flag to git clone as well as --reference. Otherwise, the reference clone will mostly just hard-link to the mirror clone, which can cause issues if any mutations happen to the mirror clone.

@DoomGerbil
Copy link
Contributor

DoomGerbil commented Mar 8, 2019

Also, though this doesn't necessarily reflect on everyone's experience (since this particular repo is very large), I did some git benchmarking of clones of our worst-performing repo using various different sets of git clone flags.

# Totally normal clone
time git clone git@github.com:improbableio/UnrealEngine.git \
std-clone
	Time: 9m21.282s
	Size: 3.3GB

# Mirror clone
time git clone git@github.com:improbableio/UnrealEngine.git \
--mirror \
std-clone
	Time: 9m1.757s
	Size: 2.1GB

# Standard clone with a local reference
time git clone git@github.com:improbableio/UnrealEngine.git \
--reference std-clone \
ref-clone
	Time: 0m30.103s
	Size: 1.2GB

# Shallow clone from remote, no reference
time git clone git@github.com:improbableio/UnrealEngine.git \
--depth 1 \
shallow-clone
	Time: 1m16.511s
	Size: 1.4GB

# Standard clone with a local reference, dissociated
time git clone git@github.com:improbableio/UnrealEngine.git \
--reference std-clone \
--dissociate \
ref-dis-clone
	Time: 1m1.195s
	Size: 3.3GB

# Clone with a local reference, dissociated, no post-clone checkout
time git clone git@github.com:improbableio/UnrealEngine.git \
--reference std-clone \
--dissociate \
--no-checkout \
ref-dis-nocheckout-clone
	Time: 0m26.485s
	Size: 2.1GB

# Shallow clone with a local reference, dissociated, no post-clone checkout
time git clone git@github.com:improbableio/UnrealEngine.git \
--reference std-clone \
--dissociate \
--no-checkout \
--depth 1 \
ref-dis-nocheckout-shallow-clone
	Time: 0m19.503s
	Size: 1.5GB

# Shallow clone with a local reference, dissociated
time git clone git@github.com:improbableio/UnrealEngine.git \
--reference std-clone \
--dissociate \
--depth 1 \
ref-dis-shallow-clone
	Time: 0m35.618s
	Size: 2.6GB

# Shallow clone with a local reference, no post-clone checkout
time git clone git@github.com:improbableio/UnrealEngine.git \
--reference std-clone \
--no-checkout \
--depth 1 \
ref-nocheckout-shallow-clone
	Time: 0m2.188s
	Size: 696K

Again, this is not intended to be a scientific study of the impacts of these flags, but in our case, this is the scale of change that we see, and highlights why we want this change so much. :)

@lox
Copy link
Contributor Author

lox commented Mar 9, 2019

That data is super helpful @DoomGerbil!

@lox
Copy link
Contributor Author

lox commented Mar 9, 2019

@DoomGerbil I think I agree that --dissociate is a sensible default.

@lox
Copy link
Contributor Author

lox commented Mar 9, 2019

I would prefer that there be a single reference of each repo per host as opposed to per BK agent instance, to save disk space.

@petemounce, yup that's the design!

@lox
Copy link
Contributor Author

lox commented Mar 10, 2019

After some discussion @keithpitt and I ended up on BUILDKITE_GIT_MIRRORS_PATH.

@lox lox added the wip label Mar 10, 2019
@lox lox changed the title Shared repositories between checkouts Shared repositories (git mirrors) between checkouts Mar 10, 2019
@lox
Copy link
Contributor Author

lox commented Mar 10, 2019

@harrietgrace I might need some assistance from you with updating the docs for this at some point!

@toolmantim
Copy link
Contributor

Is it an opt-in config option at the moment? If so, why don’t we remove it from experiments?

@lox
Copy link
Contributor Author

lox commented Mar 10, 2019

@toolmantim we've done that in that past. I think the big downside is that we miss out on the really clear "this is super experimental" . This feature will likely be flakey for a month or two whilst we work through the edge cases, so I feel like having to strongly opt-in to the experimental nature helps manage expectations.

The alternative is releasing a beta agent, but I'd much prefer to keep iterating quickly on stable with new things behind experiment flags.

@toolmantim
Copy link
Contributor

Okay! Probably shouldn’t be changelogged or added to docs yet then?

@lox
Copy link
Contributor Author

lox commented Mar 10, 2019

@toolmantim any reason we can't mention an experimental feature in a changelog? It's something heaps of people are interested in and would be great to get feedback on. We mention other beta features there?

@toolmantim
Copy link
Contributor

Oh, it sounded this was “get out a sneaky build to test if this actually works as designed” and I was assuming you had enough people to test it. And then after we’re confident it works, turn off the experimental flag in a new point release and changelog. I thought a changelog announcement would be a good push for making sure we move it out of experimental.

But perhaps we just do a changelog for every new agent release? And we can list the experimental changes in there, alongside other fixes.

@toolmantim
Copy link
Contributor

toolmantim commented Mar 10, 2019

Sorry, I should have made my question clearer!

My plan is to merge this and release it in 3.10.0, with the intention of making it the default and removing the experiment wrapper in 4.0.0.

I was wondering why it couldn’t be removed from experiments in a point release once we’ve tested it, and then enabled by default in v4?

Related q for making it the default: would we have a default calculated value for --git-mirrors-path, or it’d error if people didn’t update their config, or it’s just be disabled if people didn’t update their config?

@lox
Copy link
Contributor Author

lox commented Mar 10, 2019

I was wondering why it couldn’t be removed from experiments in a point release once we’ve tested it, and then enabled by default in v4?

Sure, quite possibly! I guess it depends how it goes?

Related q: would we have a default calculated value for --git-mirrors-path, or it’d error if people didn’t update their config, or it’s just be disabled if people didn’t update their config?

Nope, I don't think we would calculate a default value. We don't for plugins-path for instance. In v4 we'll change the packaging to set a default in the configs and during installation.

@lox
Copy link
Contributor Author

lox commented Mar 10, 2019

@DoomGerbil I'm going to remove --dissociate from the defaults (it's presently accidentally on the mirror clone anyway). It's pretty recently added to git and it also causes problems for git LFS. It's easy to add it to your git-clone-flags if you want it!

@DoomGerbil
Copy link
Contributor

@DoomGerbil I'm going to remove --dissociate from the defaults (it's presently accidentally on the mirror clone anyway). It's pretty recently added to git and it also causes problems for git LFS. It's easy to add it to your git-clone-flags if you want it!

https://github.blog/2015-02-06-git-2-3-has-been-released/ says it's been there for four years now, but the point about LFS is well taken. We don't support LFS with the rest of our infrastructure, so we would probably add the flag to our configuration, since as you said, it's not hard to tack it on ourselves.

@lox
Copy link
Contributor Author

lox commented Mar 11, 2019

The bit I'm still very blurry on is how the submodule support will work in practice.

@lox
Copy link
Contributor Author

lox commented Mar 12, 2019

We've decided to drop reference clone support for submodules for the first pass.

Copy link
Member

@keithpitt keithpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

@lox lox merged commit 7d24aec into master Mar 12, 2019
@lox lox deleted the shared-repositories branch March 12, 2019 05:59
@silvamerica
Copy link

Just an FYI, we've been doing this for a little while with a custom elastic_bootstrap script to create the mirror and git clone flags to --reference-if-able and we had immediate issues with the cloned repo missing refs before adding --disassociate.

@lox
Copy link
Contributor Author

lox commented Apr 11, 2019

@silvamerica yeah, I suspect we might end up with it being a default, but it brings with it a quite recent git dependency, so going to see how it goes. It's only really an issue if you plan on deleting the git bare repositories, isn't it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants