Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Record the active branch of a subdataset in the parent #3817

Merged
merged 14 commits into from Dec 19, 2019

Conversation

mih
Copy link
Member

@mih mih commented Oct 20, 2019

This brings us closer to gh-1424

@mih
Copy link
Member Author

@mih mih commented Oct 21, 2019

Windows test failure is somewhat related to #3818

@codecov
Copy link

@codecov codecov bot commented Oct 21, 2019

Codecov Report

Merging #3817 into master will increase coverage by 18.81%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #3817       +/-   ##
==========================================
+ Coverage   70.88%   89.7%   +18.81%     
==========================================
  Files         274     271        -3     
  Lines       36305   36349       +44     
==========================================
+ Hits        25736   32607     +6871     
+ Misses      10569    3742     -6827
Impacted Files Coverage Δ
datalad/distribution/install.py 98.87% <ø> (+65.16%) ⬆️
datalad/interface/tests/test_ls_webui.py 100% <100%> (ø) ⬆️
datalad/support/gitrepo.py 89.46% <100%> (+38.75%) ⬆️
datalad/core/local/tests/test_save.py 99.19% <100%> (+1.35%) ⬆️
datalad/interface/tests/test_rerun.py 100% <100%> (+0.4%) ⬆️
datalad/consts.py 100% <100%> (+3.44%) ⬆️
datalad/metadata/tests/test_aggregation.py 99.05% <100%> (ø) ⬆️
datalad/support/annexrepo.py 86.13% <100%> (+48.41%) ⬆️
datalad/core/local/tests/test_diff.py 99.45% <100%> (+0.04%) ⬆️
datalad/core/local/tests/test_run.py 99.55% <100%> (ø) ⬆️
... and 108 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9aa12b6...ec8ee22. Read the comment docs.

@mih
Copy link
Member Author

@mih mih commented Nov 29, 2019

Rebased on master and gepushed.

@mih
Copy link
Member Author

@mih mih commented Dec 15, 2019

Remaining test failures are due to the adjusted branch carrying an additional commit after an annex sync despite --no-commit. This is a problem for any test that verifies the nature of "the last commit". In particular diff and run tests are affected.

Here is what a dataset like this feels like:

image

Unsure about the best way to proceed. Given that the branch info would be useful to have, we could

  • adjust all tests to make them aware of this situation (preferred)
  • not run these tests, with adjusted branches

Affected tests:

(possibly more, but these are all the github 2019 tests identify)

@kyleam
Copy link
Contributor

@kyleam kyleam commented Dec 16, 2019

Given that the branch info would be useful to have, we could

  • adjust all tests to make them aware of this situation (preferred)
  • not run these tests, with adjusted branches

I agree that the latter former is the way to go. I'll look at adjusting at least some of those tests today.

As of a few commits back, saving an annex repo that is on an adjusted
branch calls `git annex sync` to propagate changes back to the main
line.  This causes the latest commit to be the sync commit and
confuses some checks the expect the history to be in a certain state.

The easiest confusions to address are (1) "I expect the last commit to
be this" and (2) "diffing these last commits gives ...".  Make these
work on the adjusted branch by using "master" (the main line branch)
instead of "HEAD".

Note that there are still remaining failures in listed at dataladgh-3817.
These are more complicated because they involve subdatasets (see
dataladgh-3818) or because the actual functionality of the commands (in
particular aggregate_metadata and rerun) was designed to work on a
non-adjusted branch.  Beyond that, only a subset of tests are executed
on Windows, those builds are the only ones where we test with adjusted
branches, so there are likely to be many more issues.

Re: datalad#3817 (comment)
@kyleam
Copy link
Contributor

@kyleam kyleam commented Dec 17, 2019

I've pushed a fix for the easier cases. (I'll update the check list once I verify that they pass.) I think the remaining ones start to touch on deeper issues related to our code base not being designed to work with adjusted branches. Moreover, I view the explicit sync call in save as just exposing existing issues with our handling, rather than creating new issues. It probably makes sense to mark these as known failures and deal with them when we try to approach these larger issues (e.g., gh-3818).

@mih
Copy link
Member Author

@mih mih commented Dec 17, 2019

Thanks @kyleam for your assessment! I'll try implementing this missing pieces.

@@ -84,7 +84,7 @@ def test_repo_diff(path, norepo):
create_tree(ds.path, {'new': 'empty'})
ds.save(to_git=True)
assert_repo_status(ds.path)
eq_(ds.repo.diff(fr=fr_base + '~1', to=to),
eq_(ds.repo.diff(fr=fr_base + '~1', to=fr_base),
Copy link
Contributor

@kyleam kyleam Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. Thanks for fixing my mistake.

@mih
Copy link
Member Author

@mih mih commented Dec 18, 2019

I am hopeful that this could be it!

@mih
Copy link
Member Author

@mih mih commented Dec 19, 2019

THIS IS IT!!

@mih mih merged commit d1e33bc into datalad:master Dec 19, 2019
17 checks passed
@mih mih deleted the enh-subdataset branch Dec 19, 2019
kyleam added a commit to kyleam/datalad that referenced this issue Apr 2, 2020
As of v0.12.0, specifically 2fda83e (ENH: Record the active branch
of a subdataset in the parent, 2019-10-20, dataladgh-3817), we record the
current branch in the _parent_ repository as the value for
`submodule.<name>.branch` in .gitmodules when saving a new submodule.

There are a few problems with this:

  * The current branch in the parent is recorded.  It seems unlikely
    that that was the intent because there is no reason to assume a
    branch with that name exists in the submodule repository or, if it
    does, to assume that the parent and submodule branches are
    necessarily coupled.  2fda83e mentions dataladgh-1424
    (`Datalad.recall_state() -> `load` command) as the motivation,
    which makes it seem likely the current branch in the submodule,
    not the parent, was supposed be recorded.

    This was mentioned in a currently open PR:
    <datalad#4275 (comment)>

  * Discussion of recording the branch at dataladgh-1424 suggests that the
    idea is to record this information with every save, but 2fda83e
    records it only when the submodule is initially added.  2fda83e
    doesn't say explicitly that its intent was to do it with every
    save (just that the change moves in the direction of dataladgh-1424), but
    still it doesn't seem a particularly useful incremental step to
    have a one-shot record of the current branch at the time the
    submodule is added.

  * It's not clear that `submodule.<name>.branch` is a good spot to
    record the branch information required by dataladgh-1424.
    `submodule.<name>.branch` is about which _remote_ branch is used
    when `--remote` is passed to `submodule update`.  The goal in
    dataladgh-1424 seems to be tracking what the current _local_ branch was
    at the time of a save.  Given these different purposes, it seems
    like it'd be a good idea to track this information in a different
    way (perhaps an entry in .gitmodules with a different key).

Given these issues, let's revert the changes from 2fda83e, along
with the changes from the follow-up commit ee50107 (BF: Do not
record adjust branch in submodule config, 2019-10-21).

Fixes datalad#4373.
kyleam added a commit to kyleam/datalad that referenced this issue Apr 2, 2020
As of v0.12.0, specifically 2fda83e (ENH: Record the active branch
of a subdataset in the parent, 2019-10-20, dataladgh-3817), we record the
current branch in the _parent_ repository as the value for
`submodule.<name>.branch` in .gitmodules when saving a new submodule.

There are a few problems with this:

  * The current branch in the parent is recorded.  It seems unlikely
    that that was the intent because there is no reason to assume a
    branch with that name exists in the submodule repository or, if it
    does, to assume that the parent and submodule branches are
    necessarily coupled.  2fda83e mentions dataladgh-1424
    (`Datalad.recall_state() -> `load` command) as the motivation,
    which makes it seem likely the current branch in the submodule,
    not the parent, was supposed be recorded.

    This was mentioned in a currently open PR:
    <datalad#4275 (comment)>

  * Discussion of recording the branch at dataladgh-1424 suggests that the
    idea is to record this information with every save, but 2fda83e
    records it only when the submodule is initially added.  2fda83e
    doesn't say explicitly that its intent was to do it with every
    save (just that the change moves in the direction of dataladgh-1424), but
    still it doesn't seem a particularly useful incremental step to
    have a one-shot record of the current branch at the time the
    submodule is added.

  * It's not clear that `submodule.<name>.branch` is a good spot to
    record the branch information required by dataladgh-1424.
    `submodule.<name>.branch` is about which _remote_ branch is used
    when `--remote` is passed to `submodule update`.  The goal in
    dataladgh-1424 seems to be tracking what the current _local_ branch was
    at the time of a save.  Given these different purposes, it seems
    like it'd be a good idea to track this information in a different
    way (perhaps an entry in .gitmodules with a different key).

Given these issues, let's revert the changes from 2fda83e, along
with the changes from the follow-up commit ee50107 (BF: Do not
record adjust branch in submodule config, 2019-10-21).

Fixes datalad#4373.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants