
Refactor: speed up all clone operations by loading all commits compressed into 1 #2348

Closed
nikelborm wants to merge 1 commit

Conversation


@nikelborm nikelborm commented Jan 24, 2024

I added --depth 1 to all git clone commands that had neither --single-branch nor --depth, so that only the latest state of the repo is downloaded, without all of its commit history.

--single-branch is implied when using --depth, according to the git manual:

--depth <depth>
Create a shallow clone with a history truncated to the specified number of commits. Implies --single-branch unless --no-single-branch is given to fetch the histories near the tips of all branches. If you want to clone submodules shallowly, also pass --shallow-submodules.

That's why --depth 1 replaced --single-branch instead of keeping both flags.
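
To make the two variants concrete, here is an illustrative comparison (using the AMF repo from the benchmark below; any URL works):

$ # full history of a single branch:
$ git clone --single-branch git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git
$ # one commit of a single branch (--single-branch is implied):
$ git clone --depth 1 git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git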

My big problem with yay was that it either clones the entire repo (commands with neither --depth 1 nor --single-branch) or clones the entire main branch, including changes that have no effect on the current state of the repo being cloned (with --single-branch). If a function/file/class was created once and then deleted, it may no longer be present in the working tree, but it will still be present in the commit history and will be transferred to the user over the internet during the clone. --depth 1 solves this.

--depth 1 fixes the problem: it loads only the default branch of the repo and only its latest state. It effectively squashes all commits into one.
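
This is easy to verify: in a depth-1 clone, git reports exactly one commit and records the cut-off point in .git/shallow (illustrative session, output abbreviated):

$ git clone --depth 1 git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git
$ cd AMF
$ git rev-list --count HEAD
1
$ cat .git/shallow   # hash of the single commit at the shallow boundary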

I don't know Go, although I do know grep 😅, so don't judge too harshly if I did something wrong.

There are loads and loads of packages that take a long time to download. An example of such a package in the AUR is amf-headers-git. It references this AMF repo.

Cloning the AMF repo, even over a stable SSH connection, takes SIGNIFICANTLY more time and disk space than doing so with --depth 1.

Here is a benchmark (the clone speed in git's output reflects only the transfer rate over the last few seconds):

$ time git clone git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git

Cloning into 'AMF'...
remote: Enumerating objects: 7632, done.
remote: Counting objects: 100% (2425/2425), done.
remote: Compressing objects: 100% (1138/1138), done.
remote: Total 7632 (delta 1260), reused 2390 (delta 1247), pack-reused 5207
Receiving objects: 100% (7632/7632), 848.69 MiB | 1.59 MiB/s, done.
Resolving deltas: 100% (4242/4242), done.
Updating files: 100% (5085/5085), done.

real	8m58.996s
user	0m36.857s
sys	0m16.024s


$ du -s AMF
1481388	AMF

$ time git clone --depth 1 git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git amf_faster_smaller

Cloning into 'amf_faster_smaller'...
remote: Enumerating objects: 1923, done.
remote: Counting objects: 100% (1923/1923), done.
remote: Compressing objects: 100% (1504/1504), done.
remote: Total 1923 (delta 675), reused 1260 (delta 384), pack-reused 0
Receiving objects: 100% (1923/1923), 142.80 MiB | 1.37 MiB/s, done.
Resolving deltas: 100% (675/675), done.
Updating files: 100% (5085/5085), done.

real	1m49.330s
user	0m7.058s
sys	0m3.369s


$ du -s amf_faster_smaller
758384	amf_faster_smaller

Cloning the repo without --depth 1 takes 8*60 + 58 = 538 seconds.
Cloning the repo with --depth 1 takes 1*60 + 49 = 109 seconds.
The repo cloned without --depth 1 occupies 1481388 KiB on disk (du -s reports 1 KiB blocks, not bytes).
The repo cloned with --depth 1 occupies 758384 KiB.

We get a 100 - (758384 / 1481388)*100 = 48.8% decrease in space on disk.
We get a 100 - (109 / 538)*100 = 79.7% decrease in cloning time.
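
The arithmetic can be reproduced with a quick one-liner (awk assumed available; figures taken from the time/du output above):

$ awk 'BEGIN { printf "space saved: %.1f%%\n", 100 - 758384/1481388*100 }'
space saved: 48.8%
$ awk 'BEGIN { printf "time saved: %.1f%%\n", 100 - 109/538*100 }'
time saved: 79.7%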

@Jguer (Owner) commented Jan 25, 2024

@nikelborm this does not apply shallow cloning to a repo like git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git, but rather to the extremely small PKGBUILD repos.

The git clone of, for example, git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git is executed directly by makepkg.

@Jguer closed this Jan 25, 2024
@nikelborm (Author) commented Jan 25, 2024

If anybody wants to see a follow-up, here it is:
I opened a pull request in makepkg: https://gitlab.archlinux.org/pacman/pacman/-/merge_requests/115
Thank you, @Jguer, for pointing me in the right direction.

@nikelborm (Author)

Although I still don't understand why you rejected it, since it adds almost no extra complexity and may make a difference on internet connections slower than mine.

@Jguer (Owner) commented Jan 25, 2024

#972 (comment)

From testing, this does not make cloning faster, as the repos are very small, and it may actually make other operations such as diffing either not work or run slower due to just-in-time fetching.
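
One concrete way a shallow clone hurts later operations: history-walking commands stop at the shallow boundary, and recovering the full history costs a second fetch. A rough sketch (<url> is a placeholder):

$ git clone --depth 1 <url> repo && cd repo
$ git log --oneline | wc -l   # only the single fetched commit is visible
1
$ git fetch --unshallow       # downloads the remaining history after the fact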

@HaleTom commented Feb 11, 2024

@nikelborm Thanks for raising the Arch PR. I can't comment there as I'm not a user, but it seems you're getting a very strong hint there from Levente to:

s/--depth 1/--filter=blob:none/g

This will download the metadata for all commits / tags / refs (which is small and in a single pack file anyway), but only download the data for the current commit. It's the best of both worlds, allowing previous commits to be checked out (in the unlikely but highlighted cases where that's required).
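
A hedged sketch of the blobless-clone behaviour (the tag name is a placeholder; blobs for older commits are fetched on demand at checkout):

$ git clone --filter=blob:none git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git
$ cd AMF
$ git log --oneline | wc -l       # full commit history is available locally
$ git checkout <some-older-tag>   # missing blobs are fetched just in time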

@anthraxx

@HaleTom Yes, that is the key takeaway.
First and foremost it's about not breaking expected functionality, as using --depth in this context is harmful. We treat certain functionality as non-optional, like git describe, which is very frequently used in pkgver() functions, as well as git cherry-pick, which is often used in prepare() functions to backport patches on top of a release.
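
For illustration, this is roughly what breaks in a depth-1 clone (the exact message depends on whether HEAD happens to be tagged; sketch only):

$ git clone --depth 1 git@github.com:GPUOpen-LibrariesAndSDKs/AMF.git
$ cd AMF
$ git describe
fatal: No names found, cannot describe anything.

With --filter=blob:none the full commit and tag metadata is still present, so git describe and git cherry-pick keep working.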

@HaleTom commented Feb 26, 2024

The Arch Wiki now gives an example of how to use GITFLAGS="--filter=tree:0" makepkg.
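
For completeness, the invocation the wiki describes looks like this; --filter=tree:0 defers trees as well as blobs, so it transfers even less than blob:none up front:

$ GITFLAGS="--filter=tree:0" makepkg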
