Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate if the performance of submodule update could be improved #686

Open
karhama opened this issue Sep 22, 2023 · 7 comments
Open
Labels
performance How long things take

Comments

@karhama
Copy link
Contributor

karhama commented Sep 22, 2023

We have west projects which are configured to use submodules. We have noticed that if access to remote repository is slow or submodule repository size is large submodule update can become the slowest part of the west update process.

West has already multiple flags to speedup update process (e.g. --narrow, --fetch-opt --path-cache) but none of these optimization are used in submodule update phase. For submodules there is option --submodule-init-config but it seems tricky (if not impossible?) to configure e.g. depth 1 for submodule update this way.

Please consider adding submodule update flag similar to --fetch-opt. In addition --path-cache option could perhaps be extended to support submodules (git submodule update --reference )

@karhama karhama changed the title Investigate if the performance of submodule update could be improved (west update) Investigate if the performance of submodule update could be improved Sep 22, 2023
@marc-hb
Copy link
Collaborator

marc-hb commented Nov 6, 2023

Please consider adding submodule update flag similar to --fetch-opt.

Which submodule options have you already tried to pass to --fetch-opt=?

The only thing west does is invoking git with various options.

@karhama
Copy link
Contributor Author

karhama commented Nov 7, 2023

I have not tried to pass any submodule-related options to fetch-opt as I didn't find any useful flags in the git fetch documentation (I might have missed something!).

So, I tried using --depth=1 as fetch-opts, but with this flag, submodules are updated with the full history.

@marc-hb marc-hb added the performance How long things take label Nov 7, 2023
@marc-hb
Copy link
Collaborator

marc-hb commented Nov 7, 2023

west is mostly just a wrapper on top of git and it will rarely ever go beyond what git itself can do - especially not for a git feature that duplicates what west does!

So if you can't find a way to optimize submodules with git alone, then we'll have to close this issue as "wontfix".

If you're really desperate for cloning performance (e.g.: in CI) then you can have some git repos registered as BOTH a git submodule AND in a west manifest. Then you can use west --fetch-opt=... to initialize git submodules faster "behind their back". This duplication has some minor inconveniences but it's routine and very well tested in at least this project:
thesofproject/sof@8fd351ea9aa6c3117

To keep such an "submodule override" optimization private you can use a submanifests directory:
https://docs.zephyrproject.org/latest/develop/west/manifest.html#manifest-imports

Working example here: thesofproject/sof@fca84b8773ee4a8e6308a0

For more performance discussions take a look at this label: https://github.com/zephyrproject-rtos/west/issues?q=+label%3Aperformance+

What would probably make the most performance difference is (as often) parallelization:

@karhama
Copy link
Contributor Author

karhama commented Nov 8, 2023

Git itself supports additional flags for submodule update (e.g. depth and reference flags mentioned here: https://git-scm.com/docs/git-submodule).

However, inside west, submodule update seems quite fixed. I mean, there isn't an equivalent for e.g. fetch-opts around here: https://github.com/zephyrproject-rtos/west/blob/main/src/west/app/project.py#L1168
Would it make sense to have e.g. submodule-update-opts flag here?

You are correct that this is in CI. We already use west update --path-cache and it works great. However path-cache won't utilize cache for submodules. Maybe path-cache could be extended to support submodules?

Having repos both as git submodules and in the west manifest does not sound very elegant solution. It probably is ok for couple of repos which rarely move but what if submodules look something like this: https://github.com/project-chip/connectedhomeip/blob/master/.gitmodules

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 8, 2023

Would it make sense to have e.g. submodule-update-opts flag here?

Maybe. Or simpler: pass --fetch-opt arguments to git submodule update too. Could you take a stab at it?

Maybe path-cache could be extended to support submodules?

I don't know that part of the code but it sounds like it would be more complicated.

Having repos both as git submodules and in the west manifest does not sound very elegant solution.

Agreed, I wrote "If you're really desperate for cloning performance". Then this can give CI a boost TODAY without other users noticing (so it's not that ugly).

It probably is ok for couple of repos which rarely move

Agreed, that's what I had in mind. Not that:

but what if submodules look something like this: https://github.com/project-chip/connectedhomeip/blob/master/.gitmodules

If submodules look like this, you have a bigger problem than just performance. Managing multiple git repos is complex and bordering on "package management"
zephyrproject-rtos/zephyr#54276 (comment)

Managing a large number of git repos using TWO different solutions at the same time (west + submodules) seems very problematic long before even looking at performance. You should explore alternative ways not to do that; such a complex combination of project manifests seems like a recipe for various disasters.

Circling back: like it or not west is "competing" with git submodules with extremely limited development resources. So it's not going to bend over backwards to help manage git submodules. If simple git submodule optimizations can be achieved then great! But if other git submodule work requires a lot of development effort and longer term maintenance then I would strongly advise against it.

@mbolivar-ampere without looking at the technical details can you please confirm, correct or nuance the last, non-technical point?

@karhama
Copy link
Contributor Author

karhama commented Nov 15, 2023

Maybe. Or simpler: pass --fetch-opt arguments to git submodule update too. Could you take a stab at it?

I'm slightly worried that this could cause backward compatibility issues in case someone is using the current fetch-opt for purposes other than, for example, depth.

Anyway I conducted some tests locally with --depth 1 for one of the projects that has submodules and compared cloned sizes. The gains with --depth 1 were not substantial IMO (only ~10% smaller size on disk), but this obviously depends on the repos. west update time was fluctuating much more than 10% for me so I would need a lot of samples to see how much time could be saved.

I also tried extending path-cache support for submodules using --reference flag, and it appears more promising to me, at least from a performance standpoint.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 15, 2023

The gains with --depth 1 were not substantial IMO (only ~10% smaller size on disk), but this obviously depends on the repos.

--depth 1 saves time and disk only for repos with large histories. In smaller repos, the entire history takes less space than one checkout (because it's compressed)

Try BOTH --depth 1 and --filter=tree:0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance How long things take
Projects
None yet
Development

No branches or pull requests

2 participants