NF: Support cloning of specific repository versions (fixes gh-2109) #4036

Merged: 4 commits merged into datalad:master on Jan 18, 2020

@mih mih commented Jan 17, 2020

GitRepo.clone() now accepts arbitrary additional options
that are passed on to git-clone. We use the --branch option
(which, despite its name, can handle any relevant version identifier)
to let git-clone do all the work.
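
A minimal sketch of the pass-through idea, assuming options arrive as a plain dict (as in the `clone_options={'branch': 'impossible'}` call further down in this thread). The helper name `to_git_options` is hypothetical and is not the function DataLad uses; it only illustrates how such a dict could be turned into `git clone` arguments.

```python
def to_git_options(clone_options):
    """Translate a dict into git command line arguments (hypothetical helper)."""
    args = []
    for key, value in clone_options.items():
        if value is None or value is False:
            continue  # option was not requested
        name = key.replace("_", "-")
        if len(name) == 1:
            # short option: "-b value" or a bare "-q"
            args.append(f"-{name}")
            if value is not True:
                args.append(str(value))
        elif value is True:
            args.append(f"--{name}")  # long flag without a value
        else:
            args.append(f"--{name}={value}")  # long option with a value
    return args


# Assemble (but do not run) a clone command for a requested version.
cmd = ["git", "clone"] + to_git_options({"branch": "impossible"})
cmd += ["http://example.com/repo", "/tmp/repo"]
print(cmd)
```

With this shape, supporting another `git clone` option needs no change on the caller's side beyond adding a key to the dict.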

datalad-clone is now set up to acknowledge a request for a particular
version, for any kind of source URL where decode_source_spec()
yields a non-None version property.

At the moment this is only the case for ria+http|ssh:// URLs, but
future additions only need to alter/enhance decode_source_spec()
to extend this functionality to other types.
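
To illustrate how a version could travel from a source URL to the clone call, here is a rough sketch. `split_ria_source` is a hypothetical stand-in, not the real `decode_source_spec()`; the `#<dataset-id>@<version>` form and the `<first 3 characters>/<rest>` store layout are assumptions inferred from the URLs shown elsewhere in this thread.

```python
def split_ria_source(spec):
    """Split a 'ria+<url>#<id>@<version>' string (hypothetical helper)."""
    if not spec.startswith("ria+"):
        # any other kind of source: no version information available
        return {"giturl": spec, "version": None}
    base, _, fragment = spec[len("ria+"):].partition("#")
    dsid, _, version = fragment.partition("@")
    # assumed store layout: first three characters of the ID form a directory
    giturl = f"{base.rstrip('/')}/{dsid[:3]}/{dsid[3:]}"
    return {"giturl": giturl, "version": version or None}


props = split_ria_source(
    "ria+http://127.0.0.1:41207/#33c2e000-38fa-11ea-aa8f-f0d5bf7b5561@impossible"
)
# Only request a particular version when the source spec carried one.
clone_options = {"branch": props["version"]} if props["version"] else {}
print(props["giturl"], clone_options)
```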

This change also sets the stage for gh-4035.

The user experience of this new feature is negatively impacted by #4038.

@mih mih added the merge-if-ok label Jan 17, 2020
mih added 2 commits Jan 17, 2020
But:

Here is what it looks like when requesting a version that doesn't exist:

```
% datalad clone "ria+http://127.0.0.1:41207/#33c2e000-38fa-11ea-aa8f-f0d5bf7b5561@impossible"
[ERROR  ] Failed to clone from all attempted sources: ['http://127.0.0.1:41207/33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561', 'http://127.0.0.1:41207/33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561/.git'] [install(/tmp/33c2e000-38fa-11ea-aa8f-f0d5bf7b5561)]
install(error): /tmp/33c2e000-38fa-11ea-aa8f-f0d5bf7b5561 (dataset) [Failed to clone from all attempted sources: ['http://127.0.0.1:41207/33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561', 'http://127.0.0.1:41207/33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561/.git']]
```

If we ignore the pointless double-reporting, we see that the error is wrong. The clone worked just fine, but the checkout failed. I would expect `git-clone` to be more clever, and just say it like it is.

Turns out it is:
```
% git clone --branch impossible http://127.0.0.1:41207/33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561
Cloning into '2e000-38fa-11ea-aa8f-f0d5bf7b5561'...
fatal: Remote branch impossible not found in upstream origin
```

```
(Pdb) GitRepo.clone(url='http://127.0.0.1:41207/33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561', path='/tmp/broken', create=True, clone_options={'branch': 'impossible'})
[DEBUG  ] Git clone from http://127.0.0.1:41207/33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561 to /tmp/broken
[DEBUG  ] HTTP: "GET /33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561/info/refs?service=git-upload-pack HTTP/1.1" 200 -
[DEBUG  ] HTTP: "GET /33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561/HEAD HTTP/1.1" 200 -
*** git.exc.GitCommandError: Cmd('/usr/bin/git') failed due to: exit code(128)
  cmdline: /usr/bin/git clone --progress -v --branch=impossible http://127.0.0.1:41207/33c/2e000-38fa-11ea-aa8f-f0d5bf7b5561 /tmp/broken
```

This also reveals the problem, but the exception does not seem to include the critical information (git's `fatal:` message).

```
(Pdb) gitpy.Repo.clone_from('http://127.0.0.1:32771/2bd/222e6-38fc-11ea-a876-f0d5bf7b5561', '/tmp/broken', multi_options=['--branch=impossible'])
[DEBUG  ] HTTP: "GET /2bd/222e6-38fc-11ea-a876-f0d5bf7b5561/info/refs?service=git-upload-pack HTTP/1.1" 200 -
[DEBUG  ] HTTP: "GET /2bd/222e6-38fc-11ea-a876-f0d5bf7b5561/HEAD HTTP/1.1" 200 -
*** git.exc.GitCommandError: Cmd('/usr/bin/git') failed due to: exit code(128)
  cmdline: /usr/bin/git clone -v --branch=impossible http://127.0.0.1:32771/2bd/222e6-38fc-11ea-a876-f0d5bf7b5561 /tmp/broken
  stderr: 'Cloning into '/tmp/broken'...
fatal: Remote branch impossible not found in upstream origin
...

-> e_str = exc_str(e)
(Pdb) e.stderr
''
(Pdb) e.stdout
''
(Pdb) e.status
128
(Pdb) str(e)
"Cmd('/usr/bin/git') failed due to: exit code(128)\n  cmdline: /usr/bin/git clone --progress -v http://127.0.0.1:38771/a23/42448-38fc-11ea-a812-f0d5bf7b5561/subdir/subds /tmp/datalad_temp_tree_test_ria_httph0n4roy6/clone/subdir/subds"
```

Possibly the progress reporting makes the output vanish:

```
(Pdb) git_progress = GitPythonProgressBar("Cloning")
(Pdb) gitpy.Repo.clone_from('http://127.0.0.1:38771/a23/42448-38fc-11ea-a812-f0d5bf7b5561', '/tmp/broken', multi_options=['--branch=impossible'], progress=git_progress)
[DEBUG  ] HTTP: "GET /a23/42448-38fc-11ea-a812-f0d5bf7b5561/info/refs?service=git-upload-pack HTTP/1.1" 200 -
[DEBUG  ] HTTP: "GET /a23/42448-38fc-11ea-a812-f0d5bf7b5561/HEAD HTTP/1.1" 200 -
*** git.exc.GitCommandError: Cmd('/usr/bin/git') failed due to: exit code(128)
  cmdline: /usr/bin/git clone --progress -v --branch=impossible http://127.0.0.1:38771/a23/42448-38fc-11ea-a812-f0d5bf7b5561 /tmp/broken
```

Seems to be the case :(
But it is not git's fault:

```
% /usr/bin/git clone --progress -v --branch=impossible http://127.0.0.1:38771/a23/42448-38fc-11ea-a812-f0d5bf7b5561 /tmp/broken

Cloning into '/tmp/broken'...
fatal: Remote branch impossible not found in upstream origin
```
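
Since git itself reports the reason on stderr, one way to keep it is to capture stderr and put the `fatal:` line into the raised error. The following is only a sketch using the standard library's `subprocess`; it bypasses GitPython and the progress bar entirely, and `fatal_lines` and `clone_or_explain` are hypothetical names, not DataLad functions.

```python
import subprocess


def fatal_lines(stderr_text):
    """Return the 'fatal:' lines found in git's stderr output."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if line.strip().startswith("fatal:")
    ]


def clone_or_explain(url, path, branch=None):
    """Run git clone; on failure raise an error that carries git's own reason."""
    cmd = ["git", "clone"]
    if branch is not None:
        cmd.append(f"--branch={branch}")
    cmd += [url, path]
    res = subprocess.run(cmd, capture_output=True, text=True)
    if res.returncode != 0:
        # fall back to the complete stderr if no 'fatal:' line is present
        reasons = fatal_lines(res.stderr) or [res.stderr.strip()]
        raise RuntimeError(
            f"git clone failed (exit {res.returncode}): " + "; ".join(reasons)
        )
    return path
```

Applied to the stderr text shown above, `fatal_lines` would return only the `fatal: Remote branch impossible not found in upstream origin` line, which is the piece of information missing from the error report.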

@bpoldrack points out that this line may give a hint on where we are discarding this information:
https://github.com/datalad/datalad/blame/master/datalad/support/gitrepo.py#L480
https://github.com/datalad/datalad/commit/0610dda16d4181d2525af0f5dd90c4da370ed840

codecov bot commented Jan 17, 2020

Codecov Report

Merging #4036 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4036      +/-   ##
==========================================
- Coverage   89.78%   89.77%   -0.02%     
==========================================
  Files         272      272              
  Lines       36536    36555      +19     
==========================================
+ Hits        32804    32817      +13     
- Misses       3732     3738       +6

| Impacted Files | Coverage Δ |
|---|---|
| datalad/support/gitrepo.py | `89.74% <100%> (ø)` ⬆️ |
| datalad/core/distributed/tests/test_clone.py | `90.93% <100%> (+0.46%)` ⬆️ |
| datalad/core/distributed/clone.py | `92.88% <100%> (+0.09%)` ⬆️ |
| datalad/downloaders/tests/test_http.py | `60.58% <0%> (-1.22%)` ⬇️ |
| datalad/downloaders/http.py | `74.9% <0%> (-0.4%)` ⬇️ |
| datalad/downloaders/base.py | `75.18% <0%> (-0.38%)` ⬇️ |
| datalad/utils.py | `87.37% <0%> (+0.1%)` ⬆️ |

Legend: `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
Last update 47980eb...6299500.

yarikoptic commented Jan 18, 2020

At least according to `man git-clone`, there is no promise that `git clone` can check out a specific hexsha:

--branch
Instead of pointing the newly created HEAD to the branch pointed to by the cloned repository’s HEAD, point to branch instead. In a non-bare repository, this is the branch that will be checked out. --branch can also take tags and detaches the HEAD at that commit in the resulting repository.

But I guess underneath it just uses checkout, so it could consume anything that checkout consumes.
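
If `--branch` turns out not to accept a hexsha, one conceivable fallback is a plain clone followed by a separate checkout. The sketch below only plans the commands and does not run them; `plan_clone` is a hypothetical helper, treating exactly 40 hex characters as a commit ID is a simplifying assumption, and none of this is what the PR implements.

```python
import re

# assumption: only a full 40-character hex string is treated as a commit ID
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def plan_clone(url, path, version=None):
    """Return the git commands to run for a requested version (hypothetical)."""
    if version is None:
        return [["git", "clone", url, path]]
    if FULL_SHA.fullmatch(version):
        # documented --branch usage covers branches and tags only,
        # so check the commit out in a second step
        return [
            ["git", "clone", url, path],
            ["git", "-C", path, "checkout", "--detach", version],
        ]
    # branch or tag: let git-clone do all the work
    return [["git", "clone", "--branch", version, url, path]]


for cmd in plan_clone("http://example.com/repo", "/tmp/repo", "v1.0"):
    print(" ".join(cmd))
```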

Either way - great stuff, thanks!

@yarikoptic yarikoptic merged commit fca00c0 into datalad:master Jan 18, 2020
17 checks passed
@mih mih deleted the enh-cloneversion branch Jan 20, 2020

mih commented Jan 20, 2020

FTR: We test this approach with a tag (also not promised to work), as a likely non-branch approach to versioning.
