Cannot get data from clone of a clone #3926

Closed
dorianps opened this issue Dec 15, 2019 · 3 comments
Labels: enhancement, fix-implemented

@dorianps dorianps commented Dec 15, 2019

I searched the existing issues and Google, but this does not seem to have been reported before.

What is the problem?

When working on a local server (i.e., only local paths are involved), cloning dataset A into B and then cloning B into C makes C look only at its source (B) when getting data. As a result, if the data is only in A, C fails to retrieve it with datalad get. This is somewhat counterintuitive, because a new install should in theory know about other copies of the dataset and be able to retrieve the data from one of those locations. It also forces users who need to be parsimonious with storage to clone from A only, in whatever state A happens to be.

What steps will reproduce the problem?

I can provide some code tomorrow if the problem is not easy to reproduce.

What version of DataLad are you using (run datalad --version)? On what operating system (consider running datalad wtf)?

0.12.rc6

Is there anything else that would be useful to know in this context?

I am not sure how this can be resolved: maybe by adding more git-annex locations for each file, or perhaps by adding remote A during the install from B to C. In principle, though, new installs should always be able to locate the data, even when a clone is cloned.

Have you had any success using DataLad before? (to assess your expertise/prior luck. We would welcome your testimonial additions to https://github.com/datalad/datalad/wiki/Testimonials as well)

NA

@mih mih commented Dec 15, 2019

Here is code that reproduces the problem:

# setup
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad create src
[INFO   ] Creating a new annex repo at /tmp/cloneclone/src 
create(ok): /tmp/cloneclone/src (dataset)                                                                                    
(datalad3-dev) mih@meiner /tmp/cloneclone % echo 123 > src/file1
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad save -d src
add(ok): file1 (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

# 1st-level clone
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad clone src clone1
[INFO   ] Cloning src into '/tmp/cloneclone/clone1' 
install(ok): /tmp/cloneclone/clone1 (dataset)
# 2nd-level clone
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad clone clone1 clone2
[INFO   ] Cloning clone1 into '/tmp/cloneclone/clone2' 
install(ok): /tmp/cloneclone/clone2 (dataset)
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad get -d clone2 file1
get(error): /tmp/cloneclone/file1 [path not associated with dataset <Dataset path=/tmp/cloneclone/clone2>]

# failure described by OP
(datalad3-dev) 1 mih@meiner /tmp/cloneclone % datalad get -d clone2 clone2/file1
[WARNING] Running get resulted in stderr output: git-annex: get: 1 failed
 
get(error): file1 (file) [not available; Try making some of these repositories available:;      892982dc-ecaa-4cf7-b1c4-708ec32d133f -- mih@meiner:/tmp/cloneclone/src]

# potential fix
(datalad3-dev) mih@meiner /tmp/cloneclone % git -C clone2 remote add src $(git -C clone1 config remote.origin.url)

# proof of principle
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad update -d clone2
[INFO   ] Fetching updates for <Dataset path=/tmp/cloneclone/clone2> 
update(ok): . (dataset)
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad get -d clone2 clone2/file1
get(ok): file1 (file) [from src...]

If we start inheriting any configured remote (or just origin?), we need to come up with a reliable naming scheme. If we inherit more than just origin, we need to come up with a way to deal with build-up of cruft (chains of clones will quickly turn into a long list of remotes that will slow some operations). And we will also have to anticipate and prevent recursive self-references.
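To make these concerns concrete, here is a minimal sketch of what inheriting only origin could look like, using plain git. It is an illustration under stated assumptions, not DataLad's implementation: the helper name `inherit_origin` and the default remote name `origin-2` are hypothetical, and the sketch only handles the case where the parent clone is reachable as a local path. It skips the addition when the URL is already configured (to limit cruft) and when the URL points back at the clone itself (to avoid a self-reference).

```shell
#!/bin/sh
# Hypothetical helper: register the origin of a clone's own origin as an
# extra remote. Usage: inherit_origin <clone-path> [remote-name]
# Assumptions: the parent clone is a local path; "origin-2" is a made-up
# default name, not a DataLad naming scheme.
inherit_origin() {
    io_clone=$1
    io_name=${2:-origin-2}

    # Where was this clone cloned from?
    io_parent=$(git -C "$io_clone" config --get remote.origin.url) || return 1
    # Where was the parent cloned from? This fails if the parent is not a
    # local repository or has no origin of its own.
    io_url=$(git -C "$io_parent" config --get remote.origin.url) || return 1

    # Limit cruft: do nothing if some remote already uses this URL.
    for io_r in $(git -C "$io_clone" remote); do
        if [ "$(git -C "$io_clone" remote get-url "$io_r")" = "$io_url" ]; then
            return 0
        fi
    done

    # Prevent self-reference: skip a URL that resolves to the clone itself.
    if [ -d "$io_url" ]; then
        io_self=$(cd "$io_clone" && pwd -P)
        io_target=$(cd "$io_url" && pwd -P)
        if [ "$io_self" = "$io_target" ]; then
            return 0
        fi
    fi

    git -C "$io_clone" remote add "$io_name" "$io_url"
}
```

In the transcript above, `inherit_origin clone2 src` would be equivalent to the manual `git remote add` step. Following more than one level of the chain, or parents that are not local paths, would need a different way to query the parent's configuration.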

Also, I don't see why we should limit such a feature to local clones.

@mih mih commented Dec 28, 2019

I will look into this feature addition as part of #3966.

@mih mih commented Dec 29, 2019

An implementation of this feature is now available in #3966. Please have a look.

@mih mih added the fix-implemented label Dec 29, 2019