datalad get (-n) of subdataset in adjusted mode yield improper initialization #5257

Closed
mih opened this issue Dec 16, 2020 · 3 comments · Fixed by #5241
Labels
adjusted-branches: Issues caused by or affected adjusted branch operation
severity-critical: makes it unusable, causes data loss, introduces security vulnerability

Comments

@mih
Member

mih commented Dec 16, 2020

Using the test code from #5241, I built a superdataset with a subdataset that has a single annexed file. I pushed both into a RIA store and cloned the superdataset from it.

Running the following snippet in the superdataset, I can more or less consistently generate an improperly initialized subdataset:

# clean prev attempt
datalad uninstall --nocheck sub
# cause the breakage
datalad get -n sub
# test for the breakage: HEAD of the managed (adjusted) branch should differ from
# the corresponding branch, because adjusting creates a pointer file
[ "x$(git -C sub diff dl-test-branch..HEAD)" = "x" ] && echo BROKEN

In this broken state, the adjusted mode branch has no commit from adjusting. A subsequent git annex get will leave the dataset dirty (staged adjusted file).
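The check above hardcodes dl-test-branch. A minimal sketch of the same test for any adjusted branch follows. It assumes the git-annex naming convention adjusted/<branch>(<mode>), for example adjusted/master(unlocked), and that the corresponding branch exists locally. The function names corresponding_branch and adjusted_commit_missing are hypothetical helpers, not part of datalad or git-annex.

```shell
# Hypothetical helper: print the corresponding branch of an adjusted branch.
# Assumes names of the form "adjusted/<branch>(<mode>)"; any other name is
# printed unchanged.
corresponding_branch() {
    branch=$1
    case "$branch" in
        adjusted/*"("*")")
            base=${branch#adjusted/}   # drop the "adjusted/" prefix
            base=${base%\(*}           # drop the trailing "(<mode>)"
            printf '%s\n' "$base"
            ;;
        *)
            printf '%s\n' "$branch"
            ;;
    esac
}

# Hypothetical helper: succeed (exit 0) if the repository at $1 is on an
# adjusted branch whose tree is identical to its corresponding branch,
# i.e. the state reported as BROKEN above.
adjusted_commit_missing() {
    repo=$1
    current=$(git -C "$repo" symbolic-ref --short HEAD) || return 2
    base=$(corresponding_branch "$current")
    [ "$current" != "$base" ] || return 1   # not on an adjusted branch
    # "git diff --quiet" exits 0 when there is no difference
    git -C "$repo" diff --quiet "$base" HEAD
}
```

With these definitions, `adjusted_commit_missing sub && echo BROKEN` would stand in for the one-liner above. As in the original test, an empty diff only signals breakage if the subdataset contains at least one annexed file that adjusting would change.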

I am not able to replicate this situation outside of datalad. Here is my attempt at performing the internal steps manually without datalad:

% datalad uninstall --nocheck sub
% git submodule update sub

% git -C sub annex init
% [ "x$(git -C sub diff dl-test-branch..$(git -C sub log --pretty=%h -1))" = "x" ] && echo BROKEN
@bpoldrack
Member

FTR: Spontaneously smells like an issue with clone.postclone_check_head() to me.

@mih
Member Author

mih commented Dec 17, 2020

A debug session with @bpoldrack revealed that it is indeed datalad that destroys a perfectly initialized subdataset in an attempt to check out the correct submodule commit -- in complete ignorance of adjusted mode.

The path to enlightenment seems to be to delete all this code in get and use the capabilities of clone_dataset() itself to check out a particular version.

Well, not that simple. clone_dataset() cannot do what is needed yet; it would first have to be equipped with that capability. Importantly, the checkout of the right commit needs to happen prior to git annex init.
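To make the ordering constraint concrete, here is a rough sketch using plain git and git-annex commands. It is not datalad's implementation: install_sub_ordered and run are hypothetical helpers, and the DRY_RUN switch exists only so the sequence can be inspected without touching a repository. Checking out a bare commit can leave a detached HEAD, which may need extra handling that is not shown here.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: install_sub_ordered URL DEST COMMIT
# Clone, check out the recorded commit, and only then initialize the annex,
# so that any adjusting done by "git annex init" starts from the right commit.
install_sub_ordered() {
    url=$1 dest=$2 commit=$3
    run git clone "$url" "$dest" &&
    run git -C "$dest" checkout "$commit" &&
    run git -C "$dest" annex init
}

# Example call from the superdataset (assumes the submodule name equals its
# path "sub" and that the URL in .gitmodules is usable as-is):
#   install_sub_ordered "$(git config -f .gitmodules submodule.sub.url)" \
#       sub "$(git rev-parse HEAD:sub)"
```

Here `git rev-parse HEAD:sub` reads the commit recorded for the submodule in the superdataset. The point of the sketch is only that the checkout precedes annex init, instead of resetting the branch after the annex has already been initialized and adjusted.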

@mih mih added adjusted-branches Issues caused by or affected adjusted branch operation severity-critical makes it unusable, causes data loss, introduces security vulnerability labels Dec 17, 2020
@mih
Member Author

mih commented Dec 17, 2020

I have a fix. I think.

mih added a commit to mih/datalad that referenced this issue Dec 17, 2020
...and before an `annex init`. Previously this was done in the context
of `get` and the installation of submodules.

This RF enables the proper adjusting of freshly installed subdatasets.
In the previous approach, we would miss and reset an adjusted branch to
the recorded state and essentially break the repository setup, until
the first `save` would rectify it.

This is more or less just shifting code from `get` to `clone`. Plus
a bit of renaming and slightly adjusted error handling.

Fixes dataladgh-5257
@mih mih closed this as completed in #5241 Dec 22, 2020