Cannot get data from clone of a clone #3926

dorianps opened this issue Dec 15, 2019 · 3 comments

enhancement fix-implemented


@dorianps dorianps commented Dec 15, 2019

I searched the existing issues and Google, but this does not seem to have been reported before.

What is the problem?

When working on a local server (i.e., only local paths are involved), cloning dataset A into B and then cloning B into C makes C look only at its source (that is, B) when getting data. As a result, if the data exists only in A, C fails to retrieve it with datalad get. This is somewhat counterintuitive: in theory, a new install should know about other copies of the dataset and be able to retrieve the data from any of those locations. It also forces users who need to be parsimonious with storage to clone from A only, in whatever state A happens to be.

What steps will reproduce the problem?

I can provide some code tomorrow if the problem is not easy to reproduce.

What version of DataLad are you using (run datalad --version)? On what operating system (consider running datalad wtf)?


Is there anything else that would be useful to know in this context?

I am not sure how this can be resolved. Maybe by adding more git-annex locations for each file, or by adding A as a remote during the install from B to C. In principle, though, a new install should always be able to locate the data, even when it is a clone of a clone.

Have you had any success using DataLad before? (to assess your expertise/prior luck. We would welcome your testimonial additions to as well)


@mih mih commented Dec 15, 2019

Here is the code

# setup
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad create src
[INFO   ] Creating a new annex repo at /tmp/cloneclone/src 
create(ok): /tmp/cloneclone/src (dataset)                                                                                    
(datalad3-dev) mih@meiner /tmp/cloneclone % echo 123 > src/file1
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad save -d src
add(ok): file1 (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

# 1st-level clone
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad clone src clone1
[INFO   ] Cloning src into '/tmp/cloneclone/clone1' 
install(ok): /tmp/cloneclone/clone1 (dataset)
# 2nd-level clone
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad clone clone1 clone2
[INFO   ] Cloning clone1 into '/tmp/cloneclone/clone2' 
install(ok): /tmp/cloneclone/clone2 (dataset)
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad get -d clone2 file1
get(error): /tmp/cloneclone/file1 [path not associated with dataset <Dataset path=/tmp/cloneclone/clone2>]

# failure described by OP
(datalad3-dev) 1 mih@meiner /tmp/cloneclone % datalad get -d clone2 clone2/file1
[WARNING] Running get resulted in stderr output: git-annex: get: 1 failed
get(error): file1 (file) [not available; Try making some of these repositories available:;      892982dc-ecaa-4cf7-b1c4-708ec32d133f -- mih@meiner:/tmp/cloneclone/src]

# potential fix
(datalad3-dev) mih@meiner /tmp/cloneclone % git -C clone2 remote add src $(git -C clone1 config remote.origin.url)

# proof of principle
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad update -d clone2
[INFO   ] Fetching updates for <Dataset path=/tmp/cloneclone/clone2> 
update(ok): . (dataset)
(datalad3-dev) mih@meiner /tmp/cloneclone % datalad get -d clone2 clone2/file1
get(ok): file1 (file) [from src...]

If we start inheriting any configured remote (or just origin?), we need to come up with a reliable naming scheme. If we inherit more than just origin, we also need a way to deal with the build-up of cruft: chains of clones will quickly turn into a long list of remotes that slows down some operations. And we will have to anticipate and prevent recursive self-references.

Also, I don't see why we should limit such a feature to local clones.
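
To make the manual fix above easier to try out, here is a minimal sketch that uses plain git only. It is not how DataLad implements this. The function name `inherit_origin_remote` and the default remote name `origin-2` are made up for illustration, and the sketch only handles the case where the clone's origin is a local directory. It inherits just the origin of the origin, refuses a self-reference, and does nothing if the URL is already configured under some remote name.

```shell
#!/bin/sh
# Hypothetical helper (not part of DataLad or git): add the origin of a
# clone's origin as an extra remote of the clone.
# Usage: inherit_origin_remote <clone-path> [remote-name]
inherit_origin_remote() {
    clone=$1
    name=${2:-origin-2}   # illustrative naming scheme, an assumption

    # URL this clone was cloned from (clone1 in the session above)
    parent=$(git -C "$clone" config --get remote.origin.url) || {
        echo "no origin configured in $clone" >&2; return 1; }

    # Limitation of this sketch: the parent must be a local directory
    [ -d "$parent" ] || {
        echo "origin is not a local path: $parent" >&2; return 1; }

    # URL the parent itself was cloned from (src in the session above)
    grandparent=$(git -C "$parent" config --get remote.origin.url) || {
        echo "origin of $clone has no origin of its own" >&2; return 1; }

    # Refuse a self-reference: the candidate must not be the clone itself
    if [ -d "$grandparent" ] &&
       [ "$(cd "$grandparent" && pwd -P)" = "$(cd "$clone" && pwd -P)" ]; then
        echo "refusing to add $clone as a remote of itself" >&2
        return 1
    fi

    # Avoid cruft: do nothing if this URL is already a configured remote
    for r in $(git -C "$clone" remote); do
        if [ "$(git -C "$clone" config --get "remote.$r.url")" = "$grandparent" ]; then
            return 0
        fi
    done

    git -C "$clone" remote add "$name" "$grandparent"
}
```

With the layout from the session above, `inherit_origin_remote clone2 src` would be equivalent to the `git -C clone2 remote add src ...` call. As in the session, `datalad update -d clone2` was still run before `datalad get` succeeded.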

@mih mih commented Dec 28, 2019

I will look into adding this feature as part of #3966.

@mih mih commented Dec 29, 2019

An implementation of this feature is now available in #3966. Please have a look.

@mih mih added the fix-implemented label Dec 29, 2019