Improve git repo caching by cloning with `--reference` to a local `--mirror` clone #1323

Closed
patcon opened this Issue Nov 13, 2012 · 13 comments
@patcon
patcon commented Nov 13, 2012

As per #915, @Seldaek and @chEbba were interested in the idea of using `git clone --reference` together with a local `--mirror` clone to cache git repositories and decrease build time.

More information is available on the git-clone man page:
http://git-scm.com/docs/git-clone

To be fair, the idea isn't my own and comes from how Drupal's drush CLI tool handles caching for Drupal modules pulled in from git repos:
https://github.com/drush-ops/drush/blob/master/commands/pm/package_handler/git_drupalorg.inc#L56-89

This appears to have been implemented by @msonnabaum, so perhaps he has input.
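
As a rough illustration of the idea, here is a minimal sketch that runs entirely offline. It is not taken from Composer or drush: all paths (`upstream`, `cache`, `target`) are hypothetical, and a throwaway local repository stands in for the remote package.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; every path below is a hypothetical stand-in.
work="$(mktemp -d)"
upstream="$work/upstream"      # plays the role of the remote package repo
cache="$work/cache/pkg.git"    # central mirror kept between builds
target="$work/vendor/pkg"      # where the working copy should end up

# Create a tiny "remote" so the sketch needs no network access.
git init -q "$upstream"
echo "hello" > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

mkdir -p "$work/cache" "$work/vendor"

# 1. Populate the cache once with a bare mirror of the remote.
git clone -q --mirror "file://$upstream" "$cache"

# 2. Clone from the real URL, borrowing objects already present in the cache.
git clone -q --reference "$cache" "file://$upstream" "$target"

# The clone records the cache's object directory as an alternate.
cat "$target/.git/objects/info/alternates"
```

The second clone only has to transfer objects that the mirror lacks, which is where the time saving would come from. The clone keeps depending on the mirror for as long as the `alternates` entry exists.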

@patcon
patcon commented Nov 13, 2012

From the other issue:

@Seldaek said:

@patcon I don't think this would help much, but that's interesting though, maybe we should do all clones to a central location and then reference the repo like that if it already exists. Only problem would be to do reference counting somehow to avoid using space for repos that aren't used anywhere on disk anymore. If you'd like to create a new issue for this I'd be happy to discuss it further. It would mean much faster clones which would be pretty cool.

@chEbba said:

@patcon I think this feature can be implemented at the VcsDownloader level. We used such a feature in one of our deploy systems, where we had local storage with git repositories (we didn't use alternates, we just cloned from the local one, but I think alternates is the better solution). It really reduces build time.

@RobLoach

Would speed up build times in environments where source checkouts are made.

@cognifloyd

Right now, I cringe when I use composer. I know that the open source projects I'm cloning from have to pay for the bandwidth of my cloning over and over again, even though I'm just installing some software on multiple machines, or a package is a dependency of several different apps I'm installing on one machine. I look forward to the local cache of git repos.

@msonnabaum

The reference stuff is a bit complex and there are a few gotchas if the referenced repository ever gets deleted.

My preference would be to clone everything in a central location and rsync the repos into place, similar to how the remote cache option works in capistrano:

https://github.com/capistrano/capistrano/blob/master/lib/capistrano/recipes/deploy/strategy/remote_cache.rb#L40

This would also solve an issue I've had with composer in that it leaves working copies in vendor, which cause problems if you need to commit your vendors to a deployment repo.
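
A minimal sketch of that central-clone-plus-rsync approach, under stated assumptions: the paths are hypothetical, a local throwaway repository stands in for the remote, and the `tar` fallback is only there so the sketch still runs on machines without `rsync`. It is not how capistrano or Composer implement it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; every path below is a hypothetical stand-in.
work="$(mktemp -d)"
upstream="$work/upstream"   # plays the role of the remote package repo
cache="$work/cache/pkg"     # central, non-bare clone kept between builds
target="$work/vendor/pkg"   # final location, without git metadata

# Create a tiny "remote" so the sketch needs no network access.
git init -q "$upstream"
echo "hello" > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

mkdir -p "$work/cache" "$target"

# 1. Clone once into the central location.
git clone -q "file://$upstream" "$cache"

# 2. Copy the checked-out files into place, leaving .git behind.
if command -v rsync >/dev/null 2>&1; then
  rsync -a --delete --exclude='.git' "$cache/" "$target/"
else
  # Fallback for systems without rsync (no --delete semantics).
  (cd "$cache" && tar -cf - --exclude='.git' .) | (cd "$target" && tar -xf -)
fi

ls -A "$target"
```

Because the copy in `vendor` carries no `.git` directory, it can be committed to a deployment repository like any other set of files, and it does not break if the central clone is later removed.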

@RobLoach

Sticking in "preferred-install": "dist" makes Composer download packages rather than doing git checkouts when available, which speeds up build times. It also caches the git repositories and packages it downloads in ~/.composer/cache. Would be neat to get --reference or --mirror working though.

From the sounds of it, it may be more work than expected though, taking into account the gotchas that Mark pointed out. In the meantime, I do find build times have become much faster since the cleaned-up package cache landed.

@fazy
fazy commented May 2, 2013

I'm using a few dev-master packages, currently can't install my application because github.com/sonata-project/SonataBlockBundle is offline (see status.github.com).

Will the "preferred-install": "dist" setting above fix this, or does it only work for tagged releases? (Note: I can't test it until GitHub fixes their storage server...)

[Edit: maybe I can work around this by forking every repo I'm interested in, and making my own tags in the fork...]
[Edit2: back online, quick test suggests "preferred-install": "dist" doesn't help]

@stof
stof commented May 2, 2013

@fazy as dists also come from GitHub, there is a chance that the archives of the repo are also affected by the storage outage.
But dists also work for dev versions, as GitHub builds archives for every commit.

@weitzman

Yeah, I'll second what @msonnabaum said, rsync is a better choice here than git's `--reference`, which breaks if the source ever disappears.

@glensc
glensc commented Jun 17, 2014

If the source disappears, you can `git repack`, according to this post (read the comments):

http://randyfay.com/content/git-clone-reference-considered-harmful
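
A minimal sketch of the repack step, assuming hypothetical paths and a local throwaway repository in place of the remote. One caveat worth stating: the repack can only copy objects that are still readable, so in this sketch it runs while the referenced mirror is still present, and the mirror is deleted afterwards.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway sandbox; every path below is a hypothetical stand-in.
work="$(mktemp -d)"
upstream="$work/upstream"
cache="$work/cache/pkg.git"
target="$work/vendor/pkg"

# Create a tiny "remote" so the sketch needs no network access.
git init -q "$upstream"
echo "hello" > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

mkdir -p "$work/cache" "$work/vendor"
git clone -q --mirror "file://$upstream" "$cache"
git clone -q --reference "$cache" "file://$upstream" "$target"

# Copy every borrowed object into the clone's own pack...
git -C "$target" repack -a -d -q
# ...then drop the link to the mirror.
rm -f "$target/.git/objects/info/alternates"

# The mirror can now go away without corrupting the clone.
rm -rf "$cache"
git -C "$target" fsck --full
git -C "$target" log --oneline
```

Newer versions of git also provide `git clone --dissociate`, which performs the same repack at clone time.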

@jamesj2
jamesj2 commented Sep 16, 2014

I would like to either be able to keep a copy of a git repo if I'm using a specific release, or have the ability to set preferred-install per package. There is a Windows path length limitation that prevents me from using "preferred-install": "dist" with zendframework1. I'm forced to use "preferred-install": "source", and with such a big repo it really slows down the build process. Not only that, the other packages have to use the source method too.

@alexislefebvre

I had the same idea after seeing that Travis CI clones git repositories for each build, which takes a lot of time. The Travis CI cache could be used to store these git mirrors, and it would greatly speed up Composer.

Will this idea be implemented in the future? Thanks.
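
As a sketch of how a cached directory of mirrors might be kept fresh between builds: the helper name `ensure_mirror` and all paths are hypothetical, and nothing here is Composer or Travis CI functionality. The first call clones the mirror; later calls only fetch what changed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create the mirror if missing, otherwise refresh it.
ensure_mirror() {
  local url="$1" dir="$2"
  if [ -d "$dir/objects" ]; then
    git -C "$dir" remote update --prune >/dev/null 2>&1
  else
    mkdir -p "$(dirname "$dir")"
    git clone -q --mirror "$url" "$dir"
  fi
}

# Throwaway sandbox with a local stand-in for the remote repository.
work="$(mktemp -d)"
upstream="$work/upstream"
mirror="$work/build-cache/pkg.git"   # would live in the CI cache directory

git init -q "$upstream"
echo "one" > "$upstream/README"
git -C "$upstream" add README
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com commit -q -m "first"

# First build: the mirror does not exist yet, so it is cloned.
ensure_mirror "file://$upstream" "$mirror"

# Upstream moves on between builds.
echo "two" >> "$upstream/README"
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com commit -q -am "second"

# Second build: the cached mirror is only updated.
ensure_mirror "file://$upstream" "$mirror"

git -C "$mirror" log --oneline
```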

@barryvdh

In that case it would be better to download archives, without using the api as discussed in #4737 and cache that.

@Seldaek
Composer member
Seldaek commented Apr 15, 2016

Closing in favor of #3722

@Seldaek Seldaek closed this Apr 15, 2016