Shallow git clone to target location (and utilize cache as full repo source) #10074

Closed
mvorisek opened this issue Aug 23, 2021 · 10 comments

Comments

@mvorisek
Contributor

mvorisek commented Aug 23, 2021

This is a feature request to improve git performance and save disk space with git repos.

Currently, all git cloning is done through the cache. This is perfect, and it seems, as discussed in #3449 (and some other related issues), that one full git clone is always needed. However, the cached repo is mirrored 1:1 to the target install location, which requires twice the space and is time-consuming for repos with a long history or with large files that were later removed.

Examples of such repos are:

  • git@github.com:PrestaShop/PrestaShop.git
  • git@github.com:phpstan/phpstan.git

This is a feature request to always resolve the exact SHA to clone against the single local cached copy, and then mirror/fetch exactly that SHA, and nothing else, to the install target location.

Updating the target shallowly (e.g. with git fetch) has been possible since git 2.5 (2015); see https://stackoverflow.com/questions/14872486/retrieve-specific-commit-from-a-remote-git-repository/30701724#30701724
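
A minimal sketch of the fetch described above, written as a Python helper that builds the git commands. The function names, the `file://` cache URL, and the choice to check out `FETCH_HEAD` detached are assumptions for illustration, not Composer's implementation. Depending on the git version and protocol, fetching a SHA that is not a ref tip may also require `uploadpack.allowReachableSHA1InWant` to be enabled on the cache repository.

```python
import subprocess
from pathlib import Path


def shallow_fetch_commands(cache_dir: str, target_dir: str, sha: str) -> list[list[str]]:
    """Build the git commands that copy a single commit from the cache to the target."""
    # A file:// URL is used so that the fetch goes through git's regular
    # transport and the --depth option applies to the local cache as well.
    cache_url = Path(cache_dir).resolve().as_uri()
    return [
        # Create an empty repository at the install location.
        ["git", "init", "--quiet", target_dir],
        # Fetch only the requested commit, without its history.
        ["git", "-C", target_dir, "fetch", "--quiet", "--depth=1", cache_url, sha],
        # Check out the fetched commit without creating a branch.
        ["git", "-C", target_dir, "checkout", "--quiet", "--detach", "FETCH_HEAD"],
    ]


def shallow_install(cache_dir: str, target_dir: str, sha: str) -> None:
    """Run the commands; raises CalledProcessError if any git call fails."""
    for command in shallow_fetch_commands(cache_dir, target_dir, sha):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes the sequence easy to inspect or log before anything touches the disk.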

@Seldaek
Member

Seldaek commented Aug 29, 2021

Since 2.1 we don't ever clone by default anymore, even for dev versions, so I don't think this is super urgent or beneficial, as the number of clones happening is pretty low. I'd even argue that if you do request a clone, you most likely want the full history of the repo to work with, so it might even be counterproductive to optimize this away.

@AngryUbuntuNerd

AngryUbuntuNerd commented Dec 15, 2021

just ran into this problem, using composer 2.1.14

for example, installing phpunit/php-file-iterator version 1.4.5 takes 18M of disk space, of which 16K (!) is source code; the remaining 99.x% is .git

causing issues for our CI (and wastes space locally) as it sums up to 100s of megabytes

edit: turns out a composer update was enough in this case to fix it, all .git folders have disappeared
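
The split between source files and .git reported above can be reproduced with a small script. This is only a sketch: the function names are made up for illustration, and it sums apparent file sizes, so the numbers can differ somewhat from what `du` reports in disk blocks.

```python
import os


def directory_sizes(package_dir: str) -> tuple[int, int]:
    """Return (bytes under .git, bytes everywhere else) for an installed package."""
    git_root = os.path.join(package_dir, ".git")
    git_bytes = 0
    source_bytes = 0
    for root, _dirs, files in os.walk(package_dir):
        inside_git = root == git_root or root.startswith(git_root + os.sep)
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # do not follow or count symlinks
            size = os.path.getsize(path)
            if inside_git:
                git_bytes += size
            else:
                source_bytes += size
    return git_bytes, source_bytes


def git_share(package_dir: str) -> float:
    """Fraction of the package directory that is taken up by .git (0.0 to 1.0)."""
    git_bytes, source_bytes = directory_sizes(package_dir)
    total = git_bytes + source_bytes
    return git_bytes / total if total else 0.0
```

Running `git_share` over each directory in `vendor/` would show which dependencies were installed from source and how much of their footprint is history.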

@Seldaek
Member

Seldaek commented Jun 7, 2022

Closing as I don't see the point as per my comment above.

@Seldaek closed this as not planned Jun 7, 2022
@computator

I think this should be added to be enabled with an optional parameter. Many packages use dists yes, but for ones that only have a repository as a source it still uses the full git history in both the cache and the target. If using it with a dev environment where you want to edit the packages that's great, but for CI, building containers, or other things like that it's counterproductive to have the entire repository in the target directory.

If composer update or another command somehow removes the .git directories from the target (as mentioned above) that would work for some situations, however I have been unable to duplicate this.

@mvorisek
Contributor Author

@Seldaek would it be ok with you to reopen this issue? For CI purposes, we need shallow clones: we have large dependencies and can access them through git only. Currently, the whole repo needs to be cloned, which is several orders of magnitude slower than a shallow clone.

@Seldaek
Member

Seldaek commented Sep 14, 2022

Sorry but I am not so inclined to bear the cost of maintenance for this edge case, and add even more complexity to GitDownloader (which is already a horrible mess of mostly untested edge cases).

I would recommend using Private Packagist or similar to host the private packages in a way that they can be installed as regular packages. It tends to make everything faster and smoother anyway, and should help more than just your CI.

@NickSdot

NickSdot commented Oct 21, 2023

@Seldaek as others mentioned, this is not only about performance.

I'd like to add another example. cebe/php-openapi is using apis-guru/openapi-directory to get testing data. The actual data is ~105 MB, and .git/objects is ~740 MB. This version was cloned in August.


Before sending this comment I did a fresh clone to check how much it has grown since then. Now, only about two months later, the project's .git/objects folder is 16 MB larger, ~760 MB.


This is one single repo. The size adds up when you work on and with a lot of stuff.
I would also appreciate it if you would consider reopening this issue. ❤️

@Seldaek
Member

Seldaek commented Dec 18, 2023

@NickSdot and the reason there is a clone is that the packages are defined inline as having only a git source to install. If a dist was defined with the zipball URL from github then this dependency would be downloaded as a zip and all would be well. Again, I don't see the need to bear the cost of maintaining this because others have unreasonable expectations.
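
For illustration, an inline package definition that carries a dist next to its git source could look roughly like the output of the sketch below. The helper name, the version `1.0.0`, and the reference `main` are placeholders rather than values taken from the project in question; the URL follows GitHub's zipball endpoint pattern.

```python
import json


def inline_package(name: str, version: str, owner: str, repo: str, reference: str) -> dict:
    """Build a Composer "package" repository entry with both a dist and a source."""
    return {
        "type": "package",
        "package": {
            "name": name,
            "version": version,
            # With a dist present, Composer can download an archive
            # instead of cloning the repository.
            "dist": {
                "type": "zip",
                "url": f"https://api.github.com/repos/{owner}/{repo}/zipball/{reference}",
                "reference": reference,
            },
            # The git source stays available for installs that prefer source.
            "source": {
                "type": "git",
                "url": f"https://github.com/{owner}/{repo}.git",
                "reference": reference,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder version and reference, for illustration only.
    entry = inline_package(
        "apis-guru/openapi-directory", "1.0.0", "APIs-guru", "openapi-directory", "main"
    )
    print(json.dumps({"repositories": [entry]}, indent=4))
```

The printed `repositories` block is what would go into the consuming project's composer.json.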

@mvorisek
Contributor Author

@Seldaek I am the author of this issue, and in some scenarios/repos the size/time difference can be as much as 100 times (realistically; theoretically it is nearly unlimited). In our use case, it would reduce a download of about 4 GB to 40 MB. The improvement would be huge and would also spare our SSD cells: we currently, sadly, need to download/store the whole 4 GB in CI many times a day.

I would be happy if this issue can be at least kept open, as it should be possible when the dep constraints are like dev-xxx (branch) or dev-xxx#hash as 1.0 (specific commit).

@mvorisek
Contributor Author

@Seldaek can you please keep this issue open? I actually think this is solvable by using git sparse checkout, which would still allow cloning the composer.json versions for all tags. This would massively lower the amount of data that needs to be downloaded for dependency solving. Once an installable version/commit is resolved, only that commit needs to be fully downloaded.
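
One way to approximate this idea, sketched under assumptions: the commands below use a blobless partial clone (`--filter=blob:none`), which is a different git feature from sparse checkout, and they rely on a git version and server that support partial clone. The function name and the bare cache layout are illustrative only.

```python
def metadata_commands(repo_url: str, cache_dir: str, tag: str) -> list[list[str]]:
    """Build git commands that read composer.json for a tag without a full clone."""
    return [
        # Bare clone with commits and trees only; file contents (blobs)
        # are fetched from the remote on demand.
        ["git", "clone", "--bare", "--filter=blob:none", repo_url, cache_dir],
        # List the tags whose composer.json would feed the dependency solver.
        ["git", "-C", cache_dir, "tag", "--list"],
        # Print composer.json as of the given tag; with a partial clone this
        # downloads just that one blob if it is not present yet.
        ["git", "-C", cache_dir, "show", f"{tag}:composer.json"],
    ]
```

Each command list can be passed to `subprocess.run(command, check=True)`; the last one would be repeated once per tag.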
