-
Notifications
You must be signed in to change notification settings - Fork 180
Properly performs shallow clone when depth
is used
#565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looking at certain blocks of code to determine why they each take 30s.
Note, that repo checking out could be made much faster with the omission of |
Testing results on my system. Measurements were made with Using mbed-cli v1.2.2, |
mbed/mbed.py
Outdated
@@ -1724,11 +1744,12 @@ def new(name, scm='git', program=False, library=False, mbedlib=False, create_onl | |||
p.path = cwd_root | |||
p.set_root() | |||
if not create_only and not p.get_os_dir() and not p.get_mbedlib_dir(): | |||
url = mbed_lib_url if mbedlib else mbed_os_url+'#latest' | |||
url = mbed_lib_url if mbedlib else mbed_os_url+"#latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this modification required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Reverted.
mbed/mbed.py
Outdated
d = 'mbed' if mbedlib else 'mbed-os' | ||
try: | ||
with cd(d_path): | ||
add(url, depth=depth, protocol=protocol, top=False) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we drop the formatting changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted.
mbed/mbed.py
Outdated
@@ -1800,7 +1821,7 @@ def import_(url, path=None, ignore=False, depth=None, protocol=None, top=True): | |||
warning(err) | |||
else: | |||
error(err, 1) | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't add superfluous white space.
mbed/mbed.py
Outdated
if top: | ||
Program(repo.path).post_action() | ||
#if top: | ||
# Program(repo.path).post_action() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to check in this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. Reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution. Please see comments below. When implementing these otherwise great improvements to --depth, please consider the following design patterns:
- As much as possible let Git/HG handle the filesystem/checkout/directory behavior. These tools will throw all the appropriate errors/warnings/tips, and a developer will be familiar with these anyway.
- When adding a condition, the behavior regarding path naming, url formating, etc shouldn't deviate from the established behavior
- SCM methods should be fairly consistent so the parent Repo() class can directly call them, instead of if scm.name == 'git'...
Also could you please test this against cache turned on and off? We'd like to enable cache by default in mbed CLI 1.3. Caching provides significant speedup, which makes --depth somewhat obsolete. See below
$ time mbed new test1 -q
real 0m38.353s
user 0m13.222s
sys 0m6.563s
$ time mbed new test2 -q
real 0m39.369s
user 0m11.199s
sys 0m6.306s
$ mbed config -G cache on
[mbed] on now set as global cache
$ time mbed new test1-cache -q # cache is being during this run
real 0m38.158s
user 0m12.148s
sys 0m6.814s
$ time mbed new test2-cache -q # this effectively uses cache
real 0m10.223s
user 0m3.259s
sys 0m3.641s
~ 3.7 times improvement
That doesn't mean that we shouldn't try to combine both - cache and shallow clone, and your implementation enables that. Unfortunately in it's current state it's very intrusive and breaks behavior of Git.clone()
result = pquery([git_cmd, "ls-remote", url, (rev if rev else "HEAD")]) | ||
|
||
if result and rev: | ||
repo_name = url.split('/')[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the behavior of clone() regarding param path. As an example if path="/tmp/sometempdir", the first part of this "if" will ignore the path value and create it's own dir based on the url. The expected behavior (as show in the else) is that path is passed to git and therefore the cloned repository will be stored in path, not in a folder generated from the url name.
E.g. imagine that there's no correlation between the library folder name and the repo url, e.g. https://github.com/ARMmbed/spiflash-stm32f429-driver is referenced from spiflash-driver.lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's a good point. It didn't occur to me that the paths could also point to local repos.
if '.git' in repo_name: | ||
repo_name = repo_name[:-4] | ||
|
||
os.mkdir(repo_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if destination directory is readonly or there's filesystem error, a try statement here will help prevent ugly traceback or silent behavior (mbed CLI by default supresses tracebacks)
os.mkdir(repo_name) | ||
|
||
with cd(repo_name): | ||
Git.init() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not try to find an alternative approach instead of hacking it? E.g.
- Clone using the original depth command
- Add an if statement if --depth is used to call fetch with revision/tag parameter.
Worst case scenario this would fetch the latest from master + latest tag, vs 2 calls - git ls-remote
+ git fetch
+ headache around naming and filesystem handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually is the alternative approach. The entire goal of the feature was to download as little information as possible, to speed up the command.
To use git clone
first would defeat the purpose of the enhancement, since downloading the entire repo history is excessive if the user only wants the files from a single release. I didn't find a flag for git clone
that would allow for cloning a repo at a specific ref, hence the split.
info("Fetching revisions from remote repository to \"%s\"" % os.path.basename(os.getcwd())) | ||
popen([git_cmd, 'fetch', '--all', '--tags'] + (['-v'] if very_verbose else ([] if verbose else ['-q']))) | ||
if url: | ||
popen([git_cmd, 'fetch', '--tags'] + ([url] if url else []) + ([rev] if rev else []) + (["--depth", depth] if depth else []) + (['-v'] if very_verbose else ([] if verbose else ['-q']))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URL should be passed to format(url, protocol) for behavior consistency. See line 627 popen([git_cmd, 'clone', formaturl(url, protocol), path] + (['-v'] if very_verbose else ([] if verbose else ['-q'])))
@screamerbg, thank you for listing out the suggested design patterns. Testing local git repos did pass my checks, but isn't this the kind of thing that pytest should have caught? To your third point, I was under the impression that HG already didn't support the I'll definitely look at making sure it remains compatible with local repos, and check its behavior when using the cache feature. is there an eta on v1.3? This is the first that I've heard of a roadmap for mbed-cli. |
@cmonr, pytest currently doesn't cover the caching feature, nor the circle CI tests. But contributions are (always) welcome :) I haven't suggested that HG supports shallow clones - it doesn't, not in the sense of a standard HG feature. You can emulate it with a non-standard additional plugin, but it's a messy business. Regarding the roadmap, mbed CLI is part of the Mbed OS core tools (despite being a separate repository) and as such it's part of the mbed OS roadmap. As for it's specific technical features, feel free to contact @sg- or myself if you have questions. Caching has been around as an optional (experimental) feature for about a year now. We're looking for ways to introduce this as a standard default-on feature, but stability is a concern and we've been continuously improving how caching works. The next big step for caching will be to introduce a standard interface that allows a user to manipulate their cache, e.g.
|
When
depth
flag is used,clone
will attempt to pull only a single commit from repos.If a SHA in a .lib file does not match a repo ref, the
clone
will revert to the default behavior of pulling the entire repo and checking out the specific SHA.References:
#560
#561