New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Record the active branch of a subdataset in the parent #3817
Conversation
Windows test failure is somewhat related to #3818 |
Codecov Report
@@ Coverage Diff @@
## master #3817 +/- ##
==========================================
+ Coverage 70.88% 89.7% +18.81%
==========================================
Files 274 271 -3
Lines 36305 36349 +44
==========================================
+ Hits 25736 32607 +6871
+ Misses 10569 3742 -6827
Continue to review full report at Codecov.
|
Rebased on master and gepushed. |
This brings us closer to dataladgh-1424
Remaining test failures are due to the adjusted branch carrying an additional commit after an Here is what a dataset like this feels like: Unsure about the best way to proceed. Given that the branch info would be useful to have, we could
Affected tests:
(possibly more, but these are all the github 2019 tests identify) |
I agree that the |
As of a few commits back, saving an annex repo that is on an adjusted branch calls `git annex sync` to propagate changes back to the main line. This causes the latest commit to be the sync commit and confuses some checks the expect the history to be in a certain state. The easiest confusions to address are (1) "I expect the last commit to be this" and (2) "diffing these last commits gives ...". Make these work on the adjusted branch by using "master" (the main line branch) instead of "HEAD". Note that there are still remaining failures in listed at dataladgh-3817. These are more complicated because they involve subdatasets (see dataladgh-3818) or because the actual functionality of the commands (in particular aggregate_metadata and rerun) was designed to work on a non-adjusted branch. Beyond that, only a subset of tests are executed on Windows, those builds are the only ones where we test with adjusted branches, so there are likely to be many more issues. Re: datalad#3817 (comment)
I've pushed a fix for the easier cases. (I'll update the check list once I verify that they pass.) I think the remaining ones start to touch on deeper issues related to our code base not being designed to work with adjusted branches. Moreover, I view the explicit sync call in |
Thanks @kyleam for your assessment! I'll try implementing this missing pieces. |
@@ -84,7 +84,7 @@ def test_repo_diff(path, norepo): | |||
create_tree(ds.path, {'new': 'empty'}) | |||
ds.save(to_git=True) | |||
assert_repo_status(ds.path) | |||
eq_(ds.repo.diff(fr=fr_base + '~1', to=to), | |||
eq_(ds.repo.diff(fr=fr_base + '~1', to=fr_base), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that. Thanks for fixing my mistake.
I am hopeful that this could be it! |
THIS IS IT!! |
As of v0.12.0, specifically 2fda83e (ENH: Record the active branch of a subdataset in the parent, 2019-10-20, dataladgh-3817), we record the current branch in the _parent_ repository as the value for `submodule.<name>.branch` in .gitmodules when saving a new submodule. There are a few problems with this: * The current branch in the parent is recorded. It seems unlikely that that was the intent because there is no reason to assume a branch with that name exists in the submodule repository or, if it does, to assume that the parent and submodule branches are necessarily coupled. 2fda83e mentions dataladgh-1424 (`Datalad.recall_state() -> `load` command) as the motivation, which makes it seem likely the current branch in the submodule, not the parent, was supposed be recorded. This was mentioned in a currently open PR: <datalad#4275 (comment)> * Discussion of recording the branch at dataladgh-1424 suggests that the idea is to record this information with every save, but 2fda83e records it only when the submodule is initially added. 2fda83e doesn't say explicitly that its intent was to do it with every save (just that the change moves in the direction of dataladgh-1424), but still it doesn't seem a particularly useful incremental step to have a one-shot record of the current branch at the time the submodule is added. * It's not clear that `submodule.<name>.branch` is a good spot to record the branch information required by dataladgh-1424. `submodule.<name>.branch` is about which _remote_ branch is used when `--remote` is passed to `submodule update`. The goal in dataladgh-1424 seems to be tracking what the current _local_ branch was at the time of a save. Given these different purposes, it seems like it'd be a good idea to track this information in a different way (perhaps an entry in .gitmodules with a different key). Given these issues, let's revert the changes from 2fda83e, along with the changes from the follow-up commit ee50107 (BF: Do not record adjust branch in submodule config, 2019-10-21). Fixes datalad#4373.
As of v0.12.0, specifically 2fda83e (ENH: Record the active branch of a subdataset in the parent, 2019-10-20, dataladgh-3817), we record the current branch in the _parent_ repository as the value for `submodule.<name>.branch` in .gitmodules when saving a new submodule. There are a few problems with this: * The current branch in the parent is recorded. It seems unlikely that that was the intent because there is no reason to assume a branch with that name exists in the submodule repository or, if it does, to assume that the parent and submodule branches are necessarily coupled. 2fda83e mentions dataladgh-1424 (`Datalad.recall_state() -> `load` command) as the motivation, which makes it seem likely the current branch in the submodule, not the parent, was supposed be recorded. This was mentioned in a currently open PR: <datalad#4275 (comment)> * Discussion of recording the branch at dataladgh-1424 suggests that the idea is to record this information with every save, but 2fda83e records it only when the submodule is initially added. 2fda83e doesn't say explicitly that its intent was to do it with every save (just that the change moves in the direction of dataladgh-1424), but still it doesn't seem a particularly useful incremental step to have a one-shot record of the current branch at the time the submodule is added. * It's not clear that `submodule.<name>.branch` is a good spot to record the branch information required by dataladgh-1424. `submodule.<name>.branch` is about which _remote_ branch is used when `--remote` is passed to `submodule update`. The goal in dataladgh-1424 seems to be tracking what the current _local_ branch was at the time of a save. Given these different purposes, it seems like it'd be a good idea to track this information in a different way (perhaps an entry in .gitmodules with a different key). Given these issues, let's revert the changes from 2fda83e, along with the changes from the follow-up commit ee50107 (BF: Do not record adjust branch in submodule config, 2019-10-21). Fixes datalad#4373.
This brings us closer to gh-1424
git annex sync
blows up on windows (was #3824 and is fixed)