Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for partial clones #264

Merged
merged 4 commits into from
Jan 9, 2024

Conversation

mathomp4
Copy link
Member

@mathomp4 mathomp4 commented Jan 9, 2024

This PR adds support for partial clones with mepo clone. A new option --partial has two settings:

  • blobless: performs a blobless clone via git clone --filter=blob:none
  • treeless: performs a treeless clone via git clone --filter=tree:0

The motivation for this is that it has been noticed that clones of GEOS (mainly MAPL) are often extremely slow. For example:

❯ time git clone git@github.com:GEOS-ESM/MAPL.git MAPL-normal
Cloning into 'MAPL-normal'...
remote: Enumerating objects: 704212, done.
remote: Counting objects: 100% (271489/271489), done.
remote: Compressing objects: 100% (6882/6882), done.
remote: Total 704212 (delta 269709), reused 266119 (delta 264584), pack-reused 432723
Receiving objects: 100% (704212/704212), 997.08 MiB | 2.59 MiB/s, done.
Resolving deltas: 100% (694344/694344), done.
noglob git clone git@github.com:GEOS-ESM/MAPL.git MAPL-normal  95.46s user 13.88s system 24% cpu 7:17.37 total

This took over 7 minutes to clone!

However, git supports partial clones as detailed in this GitHub Blog post. Now, of the two, blobless clones are fairly safe and give you faster initial clone speed at the cost of slower operations after that. As a test:

❯ time git clone --filter=blob:none git@github.com:GEOS-ESM/MAPL.git MAPL-blobless-from-git
Cloning into 'MAPL-blobless-from-git'...
remote: Enumerating objects: 21299, done.
remote: Counting objects: 100% (3171/3171), done.
remote: Compressing objects: 100% (1324/1324), done.
remote: Total 21299 (delta 1972), reused 2989 (delta 1829), pack-reused 18128
Receiving objects: 100% (21299/21299), 20.80 MiB | 2.99 MiB/s, done.
Resolving deltas: 100% (13343/13343), done.
remote: Enumerating objects: 942, done.
remote: Counting objects: 100% (552/552), done.
remote: Compressing objects: 100% (500/500), done.
remote: Total 942 (delta 112), reused 126 (delta 50), pack-reused 390
Receiving objects: 100% (942/942), 1.85 MiB | 136.00 KiB/s, done.
Resolving deltas: 100% (249/249), done.
Updating files: 100% (1064/1064), done.
noglob git clone --filter=blob:none git@github.com:GEOS-ESM/MAPL.git   1.38s user 0.54s system 6% cpu 27.609 total

28 seconds!

Treeless clones are usually faster than blobless as you aren't just filtering out blobs but whole trees. But per the blog:

We strongly recommend that developers do not use treeless clones for their daily work. Treeless clones are really only helpful for automated builds when you want to quickly clone, compile a project, then throw away the repository. In environments like GitHub Actions using public runners, you want to minimize your clone time so you can spend your machine time actually building your software! Treeless clones might be an excellent option for those environments.

As there are possible scenarios with CI that this could be useful for, the option is added. As for speed:

❯ time git clone --filter=tree:0 git@github.com:GEOS-ESM/MAPL.git MAPL-treeless
Cloning into 'MAPL-treeless'...
remote: Enumerating objects: 6875, done.
remote: Counting objects: 100% (843/843), done.
remote: Compressing objects: 100% (730/730), done.
remote: Total 6875 (delta 124), reused 805 (delta 113), pack-reused 6032
Receiving objects: 100% (6875/6875), 2.24 MiB | 277.00 KiB/s, done.
Resolving deltas: 100% (757/757), done.
remote: Enumerating objects: 106, done.
remote: Counting objects: 100% (70/70), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 106 (delta 1), reused 19 (delta 0), pack-reused 36
Receiving objects: 100% (106/106), 37.34 KiB | 538.00 KiB/s, done.
Resolving deltas: 100% (3/3), done.
remote: Enumerating objects: 942, done.
remote: Counting objects: 100% (552/552), done.
remote: Compressing objects: 100% (500/500), done.
remote: Total 942 (delta 112), reused 126 (delta 50), pack-reused 390
Receiving objects: 100% (942/942), 1.85 MiB | 2.09 MiB/s, done.
Resolving deltas: 100% (249/249), done.
Updating files: 100% (1064/1064), done.
noglob git clone --filter=tree:0 git@github.com:GEOS-ESM/MAPL.git   0.45s user 0.25s system 4% cpu 15.950 total

16 seconds!

Along with this option, we also add a new .mepoconfig setting where one can add:

[clone]
partial = blobless

and blobless clones will be the default.

@mathomp4 mathomp4 added the enhancement New feature or request label Jan 9, 2024
@mathomp4 mathomp4 requested a review from a team as a code owner January 9, 2024 20:55
@mathomp4 mathomp4 merged commit 7507e12 into develop Jan 9, 2024
13 checks passed
@mathomp4 mathomp4 deleted the feature/mathomp4/add-partial-clones branch January 9, 2024 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant