Properly performs shallow clone when `depth` is used #565

cmonr · 2017-11-22T19:49:25Z

When the `--depth` flag is used, clone will attempt to pull only a single commit from each repo.

If a SHA in a .lib file does not match a repo ref, the clone falls back to the default behavior of pulling the entire repo and checking out the specific SHA.
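The decision described above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper names (`parse_ls_remote`, `rev_matches_ref`, `plan_clone`) are hypothetical, and the functions only build git command lines (argv lists) so the logic can be read and tested without touching the network.

```python
def parse_ls_remote(output):
    """Parse `git ls-remote` output (one "<sha> TAB <refname>" per line) into {refname: sha}."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, ref = line.split("\t", 1)
            refs[ref] = sha
    return refs


def rev_matches_ref(rev, refs):
    """True if rev names a branch or tag that the remote advertises."""
    return any(name in refs for name in (rev, "refs/heads/" + rev, "refs/tags/" + rev))


def plan_clone(url, path, rev=None, depth=None, refs=None):
    """Return the git commands (argv lists) needed to get `rev` of `url` into `path`."""
    refs = refs or {}
    if depth and rev and rev_matches_ref(rev, refs):
        # Shallow path: empty repo at `path`, then fetch only the requested ref.
        return [
            ["git", "init", path],
            ["git", "-C", path, "fetch", "--depth", str(depth), url, rev],
            ["git", "-C", path, "checkout", "FETCH_HEAD"],
        ]
    # Fallback (e.g. rev is a bare SHA): full clone, then check out rev if one was given.
    cmds = [["git", "clone", url, path]]
    if rev:
        cmds.append(["git", "-C", path, "checkout", rev])
    return cmds
```

In practice `refs` would come from running `git ls-remote <url>` and passing its stdout to `parse_ls_remote`. Note that the sketch passes `path` straight to git rather than deriving a directory from the URL.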

References:
#560
#561

Looking at certain blocks of code to determine why they each take 30s.

cmonr · 2017-11-22T19:51:15Z

Note that checking out repos could be made much faster by omitting the .post_action() and .sync() calls (see the first commit), but determining whether those are safe to modify is out of scope for this PR.

cmonr · 2017-11-22T19:55:28Z

Test results on my system, measured with the `time` command in an Ubuntu 16.04.3 LTS VM:

Using mbed-cli v1.2.2, mbed new ./ took 4m40.304s.
Using modified mbed-cli, mbed new ./ took 2m38.188s.
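The speedup between the two runs can be computed directly from the `time` output. A small helper, assuming bash's `XmY.ZZZs` format for the `real` field (the function names are illustrative):

```python
import re


def parse_time(s):
    """Convert a bash `time` duration such as '4m40.304s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", s.strip())
    if m is None:
        raise ValueError("unrecognized duration: %r" % s)
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))


def speedup(before, after):
    """Ratio of the old duration to the new one."""
    return parse_time(before) / parse_time(after)


print("%.2fx" % speedup("4m40.304s", "2m38.188s"))  # prints 1.77x
```

So the modified `mbed new ./` run is roughly 1.77 times faster than v1.2.2 on this machine.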

theotherjimmy · 2017-12-01T15:51:04Z

mbed/mbed.py

@@ -1724,11 +1744,12 @@ def new(name, scm='git', program=False, library=False, mbedlib=False, create_onl
        p.path = cwd_root
        p.set_root()
        if not create_only and not p.get_os_dir() and not p.get_mbedlib_dir():
-            url = mbed_lib_url if mbedlib else mbed_os_url+'#latest'
+            url = mbed_lib_url if mbedlib else mbed_os_url+"#latest"


Is this modification required?

Nope. Reverted.

theotherjimmy · 2017-12-01T15:51:15Z

mbed/mbed.py

            d = 'mbed' if mbedlib else 'mbed-os'
            try:
                with cd(d_path):
                    add(url, depth=depth, protocol=protocol, top=False)
+


Can we drop the formatting changes?

theotherjimmy · 2017-12-01T15:51:34Z

mbed/mbed.py

@@ -1800,7 +1821,7 @@ def import_(url, path=None, ignore=False, depth=None, protocol=None, top=True):
            warning(err)
        else:
            error(err, 1)
-
+   


Please don't add superfluous white space.

theotherjimmy · 2017-12-01T15:51:55Z

mbed/mbed.py

-    if top:
-        Program(repo.path).post_action()
+    #if top:
+    #    Program(repo.path).post_action()


Did you mean to check in this change?

Nope. Reverted.

screamerbg

Thanks for the contribution. Please see comments below. When implementing these otherwise great improvements to --depth, please consider the following design patterns:

1. As much as possible, let Git/HG handle the filesystem/checkout/directory behavior. These tools will throw all the appropriate errors/warnings/tips, and developers will be familiar with them anyway.
2. When adding a condition, the behavior regarding path naming, URL formatting, etc. shouldn't deviate from the established behavior.
3. SCM methods should be fairly consistent, so the parent Repo() class can call them directly instead of branching on `if scm.name == 'git'...`.
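The third pattern above (consistent SCM methods) can be illustrated with a minimal sketch. The class names `GitCommands`, `HgCommands`, and `RepoSketch` are hypothetical simplifications, not mbed CLI's actual classes, and the methods only build argv lists:

```python
class GitCommands(object):
    name = "git"

    @staticmethod
    def clone_cmd(url, path, depth=None):
        depth_args = ["--depth", str(depth)] if depth else []
        return ["git", "clone"] + depth_args + [url, path]


class HgCommands(object):
    name = "hg"

    @staticmethod
    def clone_cmd(url, path, depth=None):
        # Mercurial has no standard shallow clone, so depth is accepted but ignored.
        return ["hg", "clone", url, path]


class RepoSketch(object):
    def __init__(self, scm):
        self.scm = scm

    def clone_cmd(self, url, path, depth=None):
        # No `if self.scm.name == 'git'` branching: every SCM exposes the same signature.
        return self.scm.clone_cmd(url, path, depth=depth)
```

Because both SCM classes accept the same parameters, adding `depth` support to one of them never forces a special case in the caller.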

Also, could you please test this with the cache turned on and off? We'd like to enable the cache by default in mbed CLI 1.3. Caching provides a significant speedup, which makes --depth somewhat obsolete. See below:

$ time mbed new test1 -q
real	0m38.353s
user	0m13.222s
sys	0m6.563s
$ time mbed new test2 -q
real	0m39.369s
user	0m11.199s
sys	0m6.306s
$ mbed config -G cache on
[mbed] on now set as global cache
$ time mbed new test1-cache -q # cache is being populated during this run
real	0m38.158s
user	0m12.148s
sys	0m6.814s
$ time mbed new test2-cache -q # this effectively uses cache
real	0m10.223s
user	0m3.259s
sys	0m3.641s

~3.7 times improvement once the cache is populated

That doesn't mean we shouldn't try to combine both (cache and shallow clone), and your implementation enables that. Unfortunately, in its current state it's very intrusive and breaks the behavior of Git.clone().

screamerbg · 2017-12-02T14:09:03Z

mbed/mbed.py

+        result = pquery([git_cmd, "ls-remote", url, (rev if rev else "HEAD")])
+
+        if result and rev:
+            repo_name = url.split('/')[-1]


This breaks the behavior of clone() regarding the path parameter. For example, if path="/tmp/sometempdir", the first branch of this "if" will ignore the path value and create its own directory based on the URL. The expected behavior (as shown in the else branch) is that path is passed to git, so the cloned repository is stored in path, not in a folder named after the URL.
Imagine, for instance, that there's no correlation between the library folder name and the repo URL: https://github.com/ARMmbed/spiflash-stm32f429-driver is referenced from spiflash-driver.lib.

Ah, that's a good point. It didn't occur to me that the paths could also point to local repos.
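The expected behavior can be sketched as a small helper: the caller's `path` always wins, and a name is derived from the URL only when no path is given. `clone_destination` is a hypothetical name; in mbed CLI the simplest way to honor this is to pass `path` straight through to git. The sketch also strips only a *trailing* `.git`, since a substring test like `'.git' in name` followed by `name[:-4]` would mangle a name such as `foo.github-lib`.

```python
import posixpath


def clone_destination(url, path=None):
    """Return the directory a clone should land in."""
    if path:
        # The caller's path wins, e.g. "spiflash-driver" for spiflash-stm32f429-driver.
        return path
    name = posixpath.basename(url.rstrip("/"))
    # Strip only a trailing ".git" suffix, not any occurrence of ".git".
    if name.endswith(".git"):
        name = name[:-len(".git")]
    return name
```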

screamerbg · 2017-12-02T14:09:54Z

mbed/mbed.py

+            if '.git' in repo_name:
+                repo_name = repo_name[:-4]
+
+            os.mkdir(repo_name)


If the destination directory is read-only or there's a filesystem error, a try statement here will help prevent an ugly traceback or silent failure (mbed CLI suppresses tracebacks by default).
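A minimal sketch of the suggested try statement. `CloneDirError` and `make_clone_dir` are hypothetical; inside mbed.py the message would presumably be reported through the existing `error()` helper seen in the `import_` hunk above.

```python
import os


class CloneDirError(Exception):
    """Raised with a readable message when the clone directory can't be created."""


def make_clone_dir(path):
    try:
        os.mkdir(path)
    except OSError as e:
        # e.g. EACCES on a read-only parent, or EEXIST if the directory already exists
        raise CloneDirError('Unable to create "%s": %s' % (path, e.strerror or e))
```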

screamerbg · 2017-12-02T14:12:21Z

mbed/mbed.py

+            os.mkdir(repo_name)
+
+            with cd(repo_name):
+                Git.init()


Why not try to find an alternative approach instead of hacking it? E.g.

1. Clone using the original depth command.
2. If --depth is used, add an if statement that calls fetch with the revision/tag parameter.

In the worst case, this would fetch the latest from master plus the latest tag, vs. two calls (git ls-remote + git fetch) plus the headache around naming and filesystem handling.

This actually is the alternative approach. The entire goal of the feature was to download as little information as possible, to speed up the command.

Using git clone first would defeat the purpose of the enhancement, since downloading the entire repo history is excessive if the user only wants the files from a single release. I didn't find a flag for git clone that would allow cloning a repo at a specific ref, hence the split.
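For comparison, the reviewer's suggestion (a shallow clone, then a shallow fetch of the requested revision) can be sketched as a command plan. `plan_depth_clone` is a hypothetical name and the function only builds argv lists:

```python
def plan_depth_clone(url, path, rev=None, depth=None):
    """Shallow clone first, then fetch and check out the requested ref."""
    depth_args = ["--depth", str(depth)] if depth else []
    cmds = [["git", "clone"] + depth_args + [url, path]]
    if depth and rev:
        # Step 2: pull in just the requested branch/tag at the same depth.
        cmds.append(["git", "-C", path, "fetch"] + depth_args + ["origin", rev])
        cmds.append(["git", "-C", path, "checkout", "FETCH_HEAD"])
    elif rev:
        cmds.append(["git", "-C", path, "checkout", rev])
    return cmds
```

For branch and tag names, git clone's own `--branch` option (which also accepts tags) combined with `--depth 1` fetches a single ref in one step. It does not accept an arbitrary commit SHA, so SHA-pinned .lib files would still need a fallback.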

screamerbg · 2017-12-02T14:13:34Z

mbed/mbed.py

        info("Fetching revisions from remote repository to \"%s\"" % os.path.basename(os.getcwd()))
-        popen([git_cmd, 'fetch', '--all', '--tags'] + (['-v'] if very_verbose else ([] if verbose else ['-q'])))
+        if url:
+            popen([git_cmd, 'fetch', '--tags'] + ([url] if url else []) + ([rev] if rev else []) + (["--depth", depth] if depth else []) + (['-v'] if very_verbose else ([] if verbose else ['-q'])))


The URL should be passed through formaturl(url, protocol) for consistent behavior. See line 627: `popen([git_cmd, 'clone', formaturl(url, protocol), path] + (['-v'] if very_verbose else ([] if verbose else ['-q'])))`
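A sketch of the fetch call with that change applied. `fetch_cmd` is a hypothetical helper, and `formaturl` is stubbed here because its real implementation (which presumably rewrites the URL for the requested protocol) isn't shown. The sketch also drops the redundant `([url] if url else [])` inside `if url:` and converts `depth` to a string, in case it arrives as an int, since subprocess would fail on a non-string argv entry.

```python
def formaturl(url, protocol=None):
    # Stand-in only: mbed CLI has its own formaturl(); this stub returns url unchanged.
    return url


def fetch_cmd(url, rev=None, depth=None, protocol=None, verbose=False, very_verbose=False):
    """Build the `git fetch` argv for fetching one ref, optionally shallow."""
    cmd = ["git", "fetch", "--tags"]
    if depth:
        cmd += ["--depth", str(depth)]
    cmd += ["-v"] if very_verbose else ([] if verbose else ["-q"])
    cmd.append(formaturl(url, protocol))
    if rev:
        cmd.append(rev)
    return cmd
```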

cmonr · 2017-12-03T01:44:08Z

@screamerbg, thank you for listing out the suggested design patterns. Testing local git repos did pass my checks, but isn't this the kind of thing that pytest should have caught?

To your third point, I was under the impression that HG already didn't support the --depth parameter, which is why I thought the flag didn't follow the design pattern to begin with. My experience with HG is minimal, but this was my reference on that question when I was developing the feature: https://stackoverflow.com/questions/7934031/what-is-the-status-of-the-mercurial-shallow-clone-extension

I'll definitely make sure it remains compatible with local repos, and check its behavior with the cache feature. Is there an ETA for v1.3? This is the first I've heard of a roadmap for mbed-cli.

screamerbg · 2017-12-03T18:21:03Z

@cmonr, pytest currently doesn't cover the caching feature, nor the circle CI tests. But contributions are (always) welcome :)

I haven't suggested that HG supports shallow clones - it doesn't, not in the sense of a standard HG feature. You can emulate it with a non-standard additional plugin, but it's a messy business.

Regarding the roadmap, mbed CLI is part of the Mbed OS core tools (despite being a separate repository) and as such is part of the Mbed OS roadmap. As for its specific technical features, feel free to contact @sg- or myself if you have questions. Caching has been around as an optional (experimental) feature for about a year now. We're looking for ways to make it a standard, default-on feature, but stability is a concern and we've been continuously improving how caching works. The next big step for caching will be to introduce a standard interface that allows users to manipulate their cache, e.g.:

mbed cache [default|on|off|none|/path]
mbed cache list
mbed cache size
mbed cache purge|clean
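One way the proposed interface could map arguments to actions is sketched below. This is purely illustrative: `parse_cache_args` is hypothetical, and it assumes (not stated in the proposal) that a bare `mbed cache` shows the current setting and that `purge` and `clean` are aliases.

```python
import os

SETTINGS = ("default", "on", "off", "none")


def parse_cache_args(args):
    """Map `mbed cache ...` arguments to an (action, value) pair."""
    if not args:
        return ("show", None)  # assumption: no argument shows the current setting
    if len(args) != 1:
        raise ValueError("usage: mbed cache [default|on|off|none|/path] | list | size | purge|clean")
    arg = args[0]
    if arg in SETTINGS:
        return ("set", arg)
    if arg in ("list", "size"):
        return (arg, None)
    if arg in ("purge", "clean"):
        return ("purge", None)  # assumption: the two names are aliases
    # Anything else is treated as a cache directory path.
    return ("set", os.path.abspath(arg))
```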

cmonr added 3 commits November 21, 2017 15:49

Sped up mbed new from 5m to 2m.

35f1444

Looking at certain blocks of code to determine why they each take 30s.

Modified git clone function to perform shallow clones if requested

7cd5a65

Removed extra print statements

e1c69e8

cmonr added 2 commits November 22, 2017 14:10

Added rev parameter to hg clone function.

67eb3ab

Passes local tests.

8a3510a

cmonr mentioned this pull request Nov 28, 2017

Replace SHAs with git refs in .lib files #568

Closed

theotherjimmy suggested changes Dec 1, 2017


Cleaned up formatting

c7711b3

theotherjimmy approved these changes Dec 1, 2017


theotherjimmy merged commit 001c68b into ARMmbed:master Dec 1, 2017

cmonr deleted the issue_560 branch December 1, 2017 17:03

screamerbg reviewed Dec 2, 2017


screamerbg mentioned this pull request Dec 2, 2017

Revert "Properly performs shallow clone when depth is used" #573

Merged
