Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update/clone/create-sibling: Fixes related to a commit-less HEAD #4370

Merged
merged 6 commits into from Apr 3, 2020

Conversation

kyleam
Copy link
Contributor

@kyleam kyleam commented Apr 1, 2020

The core issue fixed by this series is our handling of a cloned dataset where HEAD points to an unborn branch. It seems the most likely way to get into this situation is to create a sibling from a repo where the current branch isn't master and where master won't be published to the sibling.

This series prevents this situation from happening by teaching (1) create-sibling to point HEAD at the local repo's current branch and (2) clone to try to switch to another branch if they land on an unborn branch. In addition, update and GitRepo.update_submodule now give more informative errors in this situation.

Re: gh-4347, gh-4349, gh-4355

kyleam added 5 commits Apr 1, 2020
If update_submodule() is called with a submodule that has an unborn
branch checked out, the get_hexsha() call fails with a ValueError.  If
we were to catch that, and proceed with subbranch{,_hexsha} as None,
the underlying 'git submodule update' would fail in this situation.
So catch the get_hexsha() value error and raise an exception with a
more appropriate message.

Note that in this case there is probably another ref that could be
checked out, but later commits in this series will take care of that
in higher level code.
If update() is called while on an unborn branch, trying to merge in
changes will fail with a confusing "not something we can merge" error
from git, relayed by git-annex (see dataladgh-4347).  This isn't a situation
a caller is likely to be in knowingly, but it can happen easily if a
dataset was cloned from a repo where HEAD points to an unborn branch.
That, in turn, can happen easily because create_sibling(), when called
with a branch other than master checked out, will leave HEAD pointing
to master, which may not exist and, even it does, may never be
published to the sibling (dataladgh-4349).

Note that this is a maint-specific fix.  On master update() fails with
a reasonable "cannot determine merge target" (*), a change that came
with 2342cb4 (Merge pull request datalad#4167 from kyleam/update-follow,
2020-02-21).

Fixes dataladgh-4347, in the sense that a more understandable error is given.
The situation will be further improved in the rest of this series by
(1) teaching create_sibling() to point the sibling's HEAD to the ref
that is checked out in the local repo and (2) teaching clone() and
friends to try to check out another ref if the one pointed at by HEAD
doesn't exist.

(*) Unsurprisingly, this commit has non-trivial conflicts with master.
The last commit made update() fail with a clearer error message when
called on a branch with no commits.  As mentioned there, a likely
reason for the caller to get into this situation is by cloning a
remote whose HEAD points to a non-existent ref.

Let's try to avoid this scenario by looking for another ref to check
out if we land on an unborn branch.  Select the ref with the most
recent commit, ignoring the git-annex branch.  As mentioned in dataladgh-4349
(*), we could do a smarter check in the case of submodules, but that
would require a more substantial rework that might not be worth the
effort given that, as described in the code comment, the worst case
scenario is a detached HEAD.  (And even if we want to go that
direction eventually, ending up on a detached HEAD is certainly an
improvement over crashing.)

Re: datalad#4349
(*): datalad#4349 (comment)
The upstream code already binds ds.path to refds_path.  Use it.
The .repo property is more expensive than typical attribute lookup.
Assign it to a variable.
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Apr 2, 2020

At least the failures I looked at are due to GitRepo not having a get_corresponding_branch method on maint.

@kyleam kyleam added the WIP label Apr 2, 2020
When a caller creates a sibling with a branch other than master
checked out, the sibling's HEAD ref still points to refs/heads/master.
Depending on what is pushed to the repo, this may stay an unborn
branch.  With a plain 'git clone', there will be nothing to check out,
so the cloned repo will end up an unborn master.

Earlier commits in this series improved datalad's handling of such a
sibling: `datalad update` fails with a more informative error (as it
already did on master), GitRepo.update_submodule() fails with an
informative error, and `datalad clone` and friends try to check out
another ref when the repo lands on an unborn branch.

While create-sibling's behavior here is less problematic after the
other changes in this series, it can still lead to issues when
datasets are consumed with a plain git clone (or with older versions
of datalad).  Mitigate these remaining issues by making create-sibling
set the sibling's HEAD to the current local branch, which probably
matches the expectations of most callers anyway.

Note: The get_corresponding_branch() call is written in an unusual way
for maint; it's done for compatibility with the master branch, where
get_corresponding_branch() now returns None rather than the main
branch when not on an adjusted branch.

Supersedes and closes datalad#4355.
Closes datalad#4349.
@codecov
Copy link

@codecov codecov bot commented Apr 2, 2020

Codecov Report

Merging #4370 into maint will increase coverage by 0.02%.
The diff coverage is 99.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #4370      +/-   ##
==========================================
+ Coverage   89.59%   89.62%   +0.02%     
==========================================
  Files         275      275              
  Lines       36933    37037     +104     
==========================================
+ Hits        33091    33195     +104     
  Misses       3842     3842              
Impacted Files Coverage Δ
datalad/support/gitrepo.py 89.77% <83.33%> (+0.04%) ⬆️
datalad/core/distributed/clone.py 93.44% <100.00%> (+0.36%) ⬆️
datalad/core/distributed/tests/test_clone.py 92.55% <100.00%> (+0.62%) ⬆️
datalad/distribution/create_sibling.py 85.49% <100.00%> (+0.32%) ⬆️
datalad/distribution/tests/test_create_sibling.py 98.81% <100.00%> (+0.08%) ⬆️
datalad/distribution/tests/test_update.py 100.00% <100.00%> (ø)
datalad/distribution/update.py 96.93% <100.00%> (+0.09%) ⬆️
datalad/support/tests/test_gitrepo.py 99.78% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6e2cac5...8452426. Read the comment docs.

@kyleam kyleam added do not merge and removed WIP do not merge labels Apr 2, 2020
@yarikoptic yarikoptic added this to the 0.12.6 milestone Apr 3, 2020
Copy link
Member

@yarikoptic yarikoptic left a comment

I have not spotted any oddities. Was not 100% sure about some now explicit "errorring out", but seemed to be legit ;) Given that came with nice tests -- we have better assurance that it would correctly ;)

@kyleam kyleam merged commit 8755469 into datalad:maint Apr 3, 2020
11 checks passed
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Apr 3, 2020

@yarikoptic Thanks for taking a look!

@kyleam kyleam deleted the unborn-ref-fixes branch Apr 3, 2020
kyleam added a commit to kyleam/datalad that referenced this issue Apr 16, 2020
Two of the tests added in 8f5cc45 (ENH: clone: Try to switch away
from an unborn branch, 2020-04-01) fail on the crippled file system
run [0].  The easy part to fix is to write assertions about the
current branch using get_corresponding_branch (written a bit oddly for
compatibility with the changed return value in master).

The more involved part is changing the setup in test_clone_unborn_head
to account for the fact that, while setting up, we may be on an
adjusted branch.  This means we can't use empty commits, because
git-annex diffs to decide whether things need to be propagated back.
And we need to sprinkle in calls to adjust() followed by save() to
sync state.

[0]: Example:
     https://github.com/datalad/datalad/pull/4405/checks?check_run_id=592761391

Re: datalad#4370
kyleam added a commit to kyleam/datalad that referenced this issue Apr 17, 2020
Two of the tests added in 8f5cc45 (ENH: clone: Try to switch away
from an unborn branch, 2020-04-01) fail on the crippled file system
run [0].  The easy part to fix is to write assertions about the
current branch using get_corresponding_branch (written a bit oddly for
compatibility with the changed return value in master).

The more involved part is changing the setup in test_clone_unborn_head
to account for the fact that, while setting up, we may be on an
adjusted branch.  This means we can't use empty commits, because
git-annex diffs to decide whether things need to be propagated back.
And we need to sprinkle in calls to adjust() followed by save() to
sync state.

[0]: Example:
     https://github.com/datalad/datalad/pull/4405/checks?check_run_id=592761391

Re: datalad#4370
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants