
git: Add support for sparse checkout #685

Open
krish2718 opened this issue Sep 20, 2023 · 5 comments
Labels
enhancement (New feature or request), performance (How long things take)

Comments

@krish2718

When west is used to pull in dependent files, we currently need to do a full repo clone; this is unnecessary and slow. If we add support for git's sparse checkout feature, we can extend west.yml to take a list of files/folders (with regex) and only check out those, saving time and bandwidth.

E.g., manifest project A wants a specific shared header file from project B and nothing else.

@krish2718
Author

krish2718 commented Sep 20, 2023

I haven't tried it, but I looked at the docs: path support to filter has been removed:

Note that the form --filter=sparse:path= that wants to read from an arbitrary path on the filesystem has been dropped for security reasons.

@marc-hb
Collaborator

marc-hb commented Sep 20, 2023

path support to filter has been removed

Bummer.

I was curious, so I took a quick look at what this would take. This would be tricky because the sparse-checkout configuration is git repo specific and should happen between git fetch and git checkout inside west update. So the sparse-checkout configuration would have to be preset in some configuration file, maybe in some new west config section?
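
To make that ordering concrete, here is a minimal Python sketch of the per-project git command sequence, with the sparse-checkout step placed between fetch and checkout. This is not west code: the function name `sparse_update_commands`, the example URL and revision, and the `patterns` list (standing in for whatever a new west config section might provide) are all assumptions for illustration. Note that git's sparse-checkout patterns are gitignore-style rather than regular expressions, and that `--no-cone` needs a reasonably recent git.

```python
from typing import List


def sparse_update_commands(url: str, rev: str,
                           patterns: List[str]) -> List[List[str]]:
    """Return the git argv lists for one project, in the order they would run.

    Hypothetical helper, not part of west. It only builds the commands;
    running them is left to the caller.
    """
    if not patterns:
        raise ValueError("at least one sparse-checkout pattern is required")
    return [
        # 1. Fetch the requested revision; the working tree is not touched.
        ["git", "fetch", "--", url, rev],
        # 2. Tell git which paths should be materialized (gitignore-style).
        ["git", "sparse-checkout", "set", "--no-cone", *patterns],
        # 3. Only now populate the working tree from what was fetched.
        ["git", "checkout", "--detach", "FETCH_HEAD"],
    ]


# Example: project B contributes a single (made-up) header file.
for argv in sparse_update_commands("https://example.com/project-b.git",
                                   "main", ["/include/shared.h"]):
    print(" ".join(argv))
```

The point of the sketch is only the ordering: the path list has to be known before the checkout step runs, which is why it would need to live in some configuration that west reads ahead of time.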

west does not use git clone for... other optimization reasons (!) so git clone --no-checkout is not an option. git update-ref HEAD is more or less equivalent to git checkout --no-checkout (couldn't resist) so there's that.

this is unnecessary and slow. [...] saving time and bandwidth.

How much time would be saved? As with every other performance problem, you don't know until you have measured it. And then you only know for the particular cases you measured. According to https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/ sparse checkout saves significant time only for large "monorepos". Is that your case?

#319 has some measurements. Please take a look at it and see how much the other optimizations that are already available can help.
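
For anyone who wants to measure, a minimal timing sketch in Python follows. It uses the same `perf_counter` pattern that appears in west's `project.py`, but the helper name `time_commands` and the labels are assumptions, and the placeholder command should be replaced with the real git or west invocations being compared.

```python
import subprocess
import sys
from time import perf_counter
from typing import Dict, List


def time_commands(commands: Dict[str, List[str]],
                  cwd: str = ".") -> Dict[str, float]:
    """Run each labelled command once and return wall-clock seconds per label.

    Hypothetical helper for ad-hoc measurements; a failing command raises
    subprocess.CalledProcessError.
    """
    stats: Dict[str, float] = {}
    for label, argv in commands.items():
        start = perf_counter()
        subprocess.run(argv, cwd=cwd, check=True)
        stats[label] = perf_counter() - start
    return stats


# Placeholder command so the sketch runs anywhere; swap in e.g.
# ["git", "checkout", "--detach", "<sha>"] inside a real project directory.
stats = time_commands({"noop": [sys.executable, "-c", "pass"]})
for label, seconds in stats.items():
    print(f"{label}: {seconds:.3f}s")
```

A single run is noisy (disk cache, network), so repeating each measurement a few times on the repositories that actually matter gives a more trustworthy picture.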

BTW: https://git-scm.com/docs/git-sparse-checkout

THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE.

For measuring/testing/prototyping here's a starting point:

--- a/src/west/app/project.py
+++ b/src/west/app/project.py
@@ -1228,7 +1228,7 @@ class Update(_ProjectCommand):
             # out the new detached HEAD, then print some helpful context.
             if take_stats:
                 start = perf_counter()
-            project.git(['checkout', '--detach', sha])
+            # project.git(['checkout', '--detach', sha])
             if take_stats:
                 stats['checkout new manifest-rev'] = perf_counter() - start
             self.post_checkout_help(project, current_branch,
@@ -1505,7 +1505,7 @@ class Update(_ProjectCommand):
             # it avoids a spammy detached HEAD warning from Git.
             if take_stats:
                 start = perf_counter()
-            project.git('checkout --detach ' + QUAL_MANIFEST_REV)
+            project.git('update-ref HEAD ' + QUAL_MANIFEST_REV)
             if take_stats:
                 stats['checkout new manifest-rev'] = perf_counter() - start

This (tested) HACK stops west update from checking out any code. After that you can use west forall -h to "manually" perform sparse checkouts.
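
A rough, untested sketch of that manual step, in Python: it only builds `west forall -c` command lines that would set sparse-checkout paths and then check out `manifest-rev` in the selected projects. The helper name `forall_sparse_commands` and the example paths and project name are assumptions, and whether `git checkout` populates the tree as expected after the hack above has not been verified here.

```python
import shlex
from typing import List


def forall_sparse_commands(paths: List[str],
                           projects: List[str]) -> List[List[str]]:
    """Build west argv lists that apply a sparse checkout to some projects.

    Hypothetical helper, not part of west. Every selected project gets the
    same path list, which is a simplification.
    """
    if not paths:
        raise ValueError("at least one path is required")
    # west forall takes the command as one shell string, so quote each path.
    set_cmd = "git sparse-checkout set " + " ".join(
        shlex.quote(p) for p in paths)
    # Assumption: the manifest-rev branch maintained by west is the target.
    checkout_cmd = "git checkout --detach manifest-rev"
    return [["west", "forall", "-c", cmd, *projects]
            for cmd in (set_cmd, checkout_cmd)]


# Example with made-up paths and a made-up project name.
for argv in forall_sparse_commands(["include", "dts/bindings"],
                                   ["some_project"]):
    print(" ".join(shlex.quote(a) for a in argv))
```

Passing the resulting lists to `subprocess.run(argv, check=True)` from inside the workspace would execute them; with an empty project list, `west forall` runs the command in every project.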

@marc-hb added the enhancement (New feature or request) and performance (How long things take) labels Sep 20, 2023
@akauppi

akauppi commented Dec 3, 2023

Newcomer comment:

I started going through the "Getting started" guide today (Windows 10 + WSL2), and this step takes ages. I expected that only the latest state of the Zephyr repo would be required, but it looks like a deep clone.

$ west init ~/zephyrproject/
=== Initializing in /home/akauppi/zephyrproject
--- Cloning manifest repository from https://github.com/zephyrproject-rtos/zephyr
Cloning into '/home/akauppi/zephyrproject/.west/manifest-tmp'...
remote: Enumerating objects: 958200, done.
remote: Counting objects: 100% (24324/24324), done.
remote: Compressing objects: 100% (1216/1216), done.
Receiving objects:   4% (43854/958200), 13.62 MiB | 176.00 KiB/s
Receiving objects:   4% (44464/958200), 13.73 MiB | 178.00 KiB/s
Receiving objects:  45% (437575/958200), 369.92 MiB | 210.00 KiB/s
...

There are likely two separate concerns: getting people onboarded fast (not happening to me, at least; the clone is still ongoing...) and the west update. Is this the right place for the comment?

Edit: There was something wrong with my WLAN. Was able to raise the 176 KiB/s to ~2 Mbps but that's missing the point. Still time-and-space taking, to get started.. 1h gone

@marc-hb
Collaborator

marc-hb commented Dec 4, 2023

With some simple optimizations, the SOF CI clones the zephyr repo and a few others from scratch for every SOF PR in 40s:

https://github.com/thesofproject/sof/actions/workflows/daily-tests.yml
https://github.com/thesofproject/sof/actions/runs/7080442491/job/19268340897

Is this the right place for the comment?

There are many different optimizations possible and many are not mutually exclusive. Here's a tentative list:
https://github.com/zephyrproject-rtos/west/issues?q=+label%3Aperformance+

None is perfect, which is why none is enabled by default.

Was able to raise the 176 KiB/s to ~2 Mbps but that's missing the point. Still time-and-space taking, to get started.. 1h gone

2Mb/s does not qualify as "broadband". Without any optimization, cloning everything from scratch typically takes 5 minutes max for me.
