New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update/clone/create-sibling: Fixes related to a commit-less HEAD #4370
Conversation
If update_submodule() is called with a submodule that has an unborn branch checked out, the get_hexsha() call fails with a ValueError. If we were to catch that, and proceed with subbranch{,_hexsha} as None, the underlying 'git submodule update' would fail in this situation. So catch the get_hexsha() value error and raise an exception with a more appropriate message. Note that in this case there is probably another ref that could be checked out, but later commits in this series will take care of that in higher level code.
If update() is called while on an unborn branch, trying to merge in changes will fail with a confusing "not something we can merge" error from git, relayed by git-annex (see dataladgh-4347). This isn't a situation a caller is likely to be in knowingly, but it can happen easily if a dataset was cloned from a repo where HEAD points to an unborn branch. That, in turn, can happen easily because create_sibling(), when called with a branch other than master checked out, will leave HEAD pointing to master, which may not exist and, even it does, may never be published to the sibling (dataladgh-4349). Note that this is a maint-specific fix. On master update() fails with a reasonable "cannot determine merge target" (*), a change that came with 2342cb4 (Merge pull request datalad#4167 from kyleam/update-follow, 2020-02-21). Fixes dataladgh-4347, in the sense that a more understandable error is given. The situation will be further improved in the rest of this series by (1) teaching create_sibling() to point the sibling's HEAD to the ref that is checked out in the local repo and (2) teaching clone() and friends to try to check out another ref if the one pointed at by HEAD doesn't exist. (*) Unsurprisingly, this commit has non-trivial conflicts with master.
The last commit made update() fail with a clearer error message when called on a branch with no commits. As mentioned there, a likely reason for the caller to get into this situation is by cloning a remote whose HEAD points to a non-existent ref. Let's try to avoid this scenario by looking for another ref to check out if we land on an unborn branch. Select the ref with the most recent commit, ignoring the git-annex branch. As mentioned in dataladgh-4349 (*), we could do a smarter check in the case of submodules, but that would require a more substantial rework that might not be worth the effort given that, as described in the code comment, the worst case scenario is a detached HEAD. (And even if we want to go that direction eventually, ending up on a detached HEAD is certainly an improvement over crashing.) Re: datalad#4349 (*): datalad#4349 (comment)
The upstream code already binds ds.path to refds_path. Use it.
The .repo property is more expensive than typical attribute lookup. Assign it to a variable.
At least the failures I looked at are due to |
When a caller creates a sibling with a branch other than master checked out, the sibling's HEAD ref still points to refs/heads/master. Depending on what is pushed to the repo, this may stay an unborn branch. With a plain 'git clone', there will be nothing to check out, so the cloned repo will end up an unborn master. Earlier commits in this series improved datalad's handling of such a sibling: `datalad update` fails with a more informative error (as it already did on master), GitRepo.update_submodule() fails with an informative error, and `datalad clone` and friends try to check out another ref when the repo lands on an unborn branch. While create-sibling's behavior here is less problematic after the other changes in this series, it can still lead to issues when datasets are consumed with a plain git clone (or with older versions of datalad). Mitigate these remaining issues by making create-sibling set the sibling's HEAD to the current local branch, which probably matches the expectations of most callers anyway. Note: The get_corresponding_branch() call is written in an unusual way for maint; it's done for compatibility with the master branch, where get_corresponding_branch() now returns None rather than the main branch when not on an adjusted branch. Supersedes and closes datalad#4355. Closes datalad#4349.
Codecov Report
@@ Coverage Diff @@
## maint #4370 +/- ##
==========================================
+ Coverage 89.59% 89.62% +0.02%
==========================================
Files 275 275
Lines 36933 37037 +104
==========================================
+ Hits 33091 33195 +104
Misses 3842 3842
Continue to review full report at Codecov.
|
I have not spotted any oddities. Was not 100% sure about some now explicit "errorring out", but seemed to be legit ;) Given that came with nice tests -- we have better assurance that it would correctly ;)
@yarikoptic Thanks for taking a look! |
Two of the tests added in 8f5cc45 (ENH: clone: Try to switch away from an unborn branch, 2020-04-01) fail on the crippled file system run [0]. The easy part to fix is to write assertions about the current branch using get_corresponding_branch (written a bit oddly for compatibility with the changed return value in master). The more involved part is changing the setup in test_clone_unborn_head to account for the fact that, while setting up, we may be on an adjusted branch. This means we can't use empty commits, because git-annex diffs to decide whether things need to be propagated back. And we need to sprinkle in calls to adjust() followed by save() to sync state. [0]: Example: https://github.com/datalad/datalad/pull/4405/checks?check_run_id=592761391 Re: datalad#4370
Two of the tests added in 8f5cc45 (ENH: clone: Try to switch away from an unborn branch, 2020-04-01) fail on the crippled file system run [0]. The easy part to fix is to write assertions about the current branch using get_corresponding_branch (written a bit oddly for compatibility with the changed return value in master). The more involved part is changing the setup in test_clone_unborn_head to account for the fact that, while setting up, we may be on an adjusted branch. This means we can't use empty commits, because git-annex diffs to decide whether things need to be propagated back. And we need to sprinkle in calls to adjust() followed by save() to sync state. [0]: Example: https://github.com/datalad/datalad/pull/4405/checks?check_run_id=592761391 Re: datalad#4370
The core issue fixed by this series is our handling of a cloned dataset where HEAD points to an unborn branch. It seems the most likely way to get into this situation is to create a sibling from a repo where the current branch isn't master and where master won't be published to the sibling.
This series prevents this situation from happening by teaching (1)
create-sibling
to point HEAD at the local repo's current branch and (2)clone
to try to switch to another branch if they land on an unborn branch. In addition,update
andGitRepo.update_submodule
now give more informative errors in this situation.Re: gh-4347, gh-4349, gh-4355