ENH: Record state of a submodule's corresponding branch in the parent #4275
lgr.debug(
    'Sync corresponding branch %s at %s prior repository '
    'state evaluation', subm_cbranch, f)
subrepo.localsync(managed_only=True)
Can we be cheaper than this, especially if we just had sync'ed before? Should we make an attempt to count the commits on the adjusted branch and, if there is only one, not go for a sync? This would only miss cases where someone amends commits on the adjusted branch, but that would be problematic behavior anyway, so maybe that is good enough. OTOH, I dunno how clever git-annex is in figuring out that nothing needs to be done....
I'd vote for sticking with the simpler `sync` call for now.

> OTOH, I dunno how clever git-annex is in figuring out that nothing needs to be done....

Quickly testing suggests that annex will not sync back the changes if you amend the initial adjusted branch commit and have no other commits.
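The commit-counting idea from this thread could be sketched roughly as follows. This is only an illustration, not DataLad code: `count_commits`, `needs_sync`, and the `run_git` callable are hypothetical names, and the sketch assumes that an adjusted branch carries exactly one adjusting commit on top of its corresponding branch when nothing new has been committed. As noted above, it would not catch an amended adjusting commit.

```python
import subprocess


def count_commits(base, tip, cwd=".", run_git=None):
    """Return the number of commits reachable from `tip` but not from `base`."""
    if run_git is None:
        # Default runner: call the real git executable in `cwd`.
        def run_git(args):
            return subprocess.run(
                ["git"] + args, cwd=cwd, check=True,
                capture_output=True, text=True).stdout
    return int(run_git(["rev-list", "--count", f"{base}..{tip}"]).strip())


def needs_sync(corresponding, adjusted, cwd=".", run_git=None):
    """Guess whether the adjusted branch has commits worth syncing back."""
    # One commit is the adjustment itself; anything beyond that is new work.
    return count_commits(corresponding, adjusted, cwd, run_git) > 1
```

Passing a custom `run_git` keeps the check testable without a repository; whether the saved `rev-list` call is actually cheaper than letting git-annex decide is exactly the open question in this thread.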
Conceptually, in a world that goes through `datalad status` and friends, this seems like it could work. I really have no sense how this will play out in the wild, but, given the current state of adjusted branches and submodules, I'm for any improvement that has a minimum impact on non-adjusted branches.
In addition to my inline comments:
- It'd be nice to see dedicated tests in addition to the modification to `test_subdataset_save`.
- I think `.dirty` will need to be updated to go through `.status` when on an adjusted branch.
if subm is None:
    # in case it did not happen above
    subm = repo_from_path(self.pathobj / path)
Why not just replace `subm = None` with this (and remove the assignment from under `if url is None`)?
# MIH: this looks strange, why would we want to check
# out a branch that matches the name of a branch
# in the parentds, we don't even know if there is
# such a branch...
You'll have to ask yourself that: 2fda83e (ENH: Record the active branch of a subdataset in the parent, 2019-10-20). Based on that subject and the fact that `self.get_active_branch()` is evaluated for each `cand_sm`, I'd guess you were intending to specify the currently checked out branch in `cand_sm`, not the parent.
@@ -3975,6 +4037,10 @@ def save_(self, message=None, paths=None, _status=None, **kwargs):
self,
to_stage_submodules,
git_opts=None):
# _save_add doesn't know better, but we know that
# we only passed submodule records
r['type'] = 'dataset'
I see you're taking the public API exposed via result hooks seriously :]
@@ -4011,6 +4077,19 @@ def save_(self, message=None, paths=None, _status=None, **kwargs):
else tuple())}):
yield r

# and lastly fixup subdataset states
for sub in [f for f, props in status.items()
Oy, so I think we're looping over all tracked items in the repo again. Can this be guarded by an "on adjusted branch" condition? Also, conceptually this seems a better fit for `AnnexRepo._save_post`.
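A guard of the kind asked for here might look like the sketch below. It rests on several assumptions: `subdatasets_to_fix` is a hypothetical helper, `status` is taken to map paths to property dicts, and subdataset records are assumed to carry `type == 'dataset'` as in the hunk above. The filter in the actual comprehension is cut off in the quoted diff, so the condition used here is a guess.

```python
def subdatasets_to_fix(status, on_adjusted_branch):
    """Return paths of subdataset records, or nothing when no fix-up applies."""
    if not on_adjusted_branch:
        # Skip the extra pass over all tracked items entirely.
        return []
    # Assumption: subdataset records are marked with type 'dataset'.
    return [path for path, props in status.items()
            if props.get("type") == "dataset"]
```

The boolean is supplied by the caller, so the sketch takes no position on which repository's branch state should drive the decision.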
datalad/core/local/tests/test_run.py
Outdated
@@ -252,7 +252,7 @@ def test_run_from_subds_gh3551(path):
 assert_repo_status(ds.path)
 subds = Dataset(op.join(ds.path, subds_path))
 ok_exists(op.join(subds.path, "f"))
-if not on_windows: # FIXME
+if ds.repo.is_managed_branch(): # FIXME
Haven't tested, but shouldn't the `not` stay?
datalad/support/gitrepo.py
Outdated
# have a changed state, we cannot rely on this judgement
# with subdatasets that are potentially in adjusted mode.
# Setting the `state` to None will queue this record for
# further processing below.
In general I find this and related functions hard to grok, so your comments are very much appreciated.
'update-index', '--add', '--replace', '--cacheinfo',
'160000',
subm.get_hexsha(subm_cbranch),
path])
Wow, whoever came up with that invocation is a very clever person :]
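For readers unfamiliar with the invocation in this hunk: it stages the submodule entry at an explicitly chosen commit (here the tip of the corresponding branch) rather than at whatever is checked out in the submodule. A minimal sketch of the argument list, with `gitlink_update_args` as a hypothetical helper name that is not part of DataLad:

```python
def gitlink_update_args(hexsha, path):
    """Build `git update-index` arguments that stage `path` as a gitlink."""
    return [
        "update-index", "--add", "--replace",
        # 160000 is the file mode git uses for submodule (gitlink) entries.
        "--cacheinfo", "160000", hexsha, path,
    ]
```

The resulting list would be handed to whatever git runner the caller uses; the sketch only assembles the arguments and does not touch a repository.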
b95bff1 (TST: Robustify tests for non-windows crippled FS, 2020-03-10) changed this condition from `not on_windows` to `ds.repo.is_managed_branch()`. This was merged in with gh-4276, but as mentioned in review of gh-4275 [0], it seems very likely that a `not` was unintentionally dropped, because it'd be strange to switch from "not windows" to "ok managed branch, including windows" and because the test now fails on the CrippledFS and Win2019 builds [1].

[0]: #4275 (comment)
[1]: https://github.com/datalad/datalad/runs/498868341#step:8:217
     https://github.com/datalad/datalad/runs/500836364?check_suite_focus=true#step:8:304
Now the corresponding/real branch will be checked out, even if the clone happened from a repository in adjusted mode
For push and pull. Deals with one half of dataladgh-4227
Clone of adjusted dataset is not itself adjusted by default, if filesystem permits this.
Otherwise, that state is likely to be gone shortly.
Otherwise any subdataset in adjusted mode will have a state recorded that belongs to a branch that is continuously rebased. Also adjust DataLad's own diff-functionality to detect this intentional 'modified' state and report it as 'clean'. This is a major step toward a first serious attempt to support nested datasets in adjusted mode.
Codecov Report
@@ Coverage Diff @@
## master #4275 +/- ##
===========================================
- Coverage 88.8% 44.64% -44.16%
===========================================
Files 285 283 -2
Lines 37419 36969 -450
===========================================
- Hits 33230 16505 -16725
- Misses 4189 20464 +16275
As of v0.12.0, specifically 2fda83e (ENH: Record the active branch of a subdataset in the parent, 2019-10-20, dataladgh-3817), we record the current branch in the _parent_ repository as the value for `submodule.<name>.branch` in .gitmodules when saving a new submodule. There are a few problems with this:

* The current branch in the parent is recorded. It seems unlikely that that was the intent because there is no reason to assume a branch with that name exists in the submodule repository or, if it does, to assume that the parent and submodule branches are necessarily coupled. 2fda83e mentions dataladgh-1424 (`Datalad.recall_state() -> `load` command) as the motivation, which makes it seem likely the current branch in the submodule, not the parent, was supposed to be recorded. This was mentioned in a currently open PR: <datalad#4275 (comment)>

* Discussion of recording the branch at dataladgh-1424 suggests that the idea is to record this information with every save, but 2fda83e records it only when the submodule is initially added. 2fda83e doesn't say explicitly that its intent was to do it with every save (just that the change moves in the direction of dataladgh-1424), but still it doesn't seem a particularly useful incremental step to have a one-shot record of the current branch at the time the submodule is added.

* It's not clear that `submodule.<name>.branch` is a good spot to record the branch information required by dataladgh-1424. `submodule.<name>.branch` is about which _remote_ branch is used when `--remote` is passed to `submodule update`. The goal in dataladgh-1424 seems to be tracking what the current _local_ branch was at the time of a save. Given these different purposes, it seems like it'd be a good idea to track this information in a different way (perhaps an entry in .gitmodules with a different key).

Given these issues, let's revert the changes from 2fda83e, along with the changes from the follow-up commit ee50107 (BF: Do not record adjust branch in submodule config, 2019-10-21). Fixes datalad#4373.
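To make the "different key" idea from the last bullet concrete, here is a minimal sketch of the `git config` arguments that would write an entry into .gitmodules. Both `gitmodules_config_args` and the example key name `datalad-local-branch` are hypothetical and purely illustrative; no such key is defined by git or by the commits discussed here.

```python
def gitmodules_config_args(name, key, value):
    """Build `git config` arguments that set submodule.<name>.<key> in .gitmodules."""
    # --file makes git config operate on .gitmodules instead of .git/config.
    return ["config", "--file", ".gitmodules", f"submodule.{name}.{key}", value]


# Hypothetical key name, chosen only for illustration.
example = gitmodules_config_args("sub", "datalad-local-branch", "master")
```

Using a separate key would leave `submodule.<name>.branch` free for its documented role with `submodule update --remote`.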
It seems the premise of this PR has changed. I don't have the resources to work on this at the moment. I will close it and reassess the state of things when I get to it next.
Otherwise any subdataset in adjusted mode will have a state recorded
that belongs to a branch that is continuously rebased.
Also adjust DataLad's own diff-functionality to detect this intentional
'modified' state and report it as 'clean'.
This is a major step toward a first serious attempt to support nested
datasets in adjusted mode.
Sitting on top of #4273 #4274 #4252 (which fixes #4227)
Conceptually this is done. A number of additional core tests run on crippled FS. Please let me know what you think.
TODO:
Fixes #3818 and fixes #3969