Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust special remote handling for sameas remotes #3856

merged 7 commits into from Dec 9, 2019


Copy link

@kyleam kyleam commented Nov 6, 2019

This series fixes gh-3816 (cloning of annex repos with special remotes that were set up with --sameas).

Some notes:

  • When merging to master, there will be a merge conflict because _handle_possible_annex_dataset was moved to in 8f9ae55 (RF: Move clone-internal helper to, 2019-10-14).

  • This handles the clone case, but other places in DataLad (in particular publish) may need an update for git-annex's sameas functionality.

  • If the user is running a git-annex version that doesn't support sameas special remotes, we're informing them about a remote that they can't actually enable. If others feel this is worth handling, I can rework the code to check the annex version.

Copy link

@codecov codecov bot commented Nov 6, 2019

Codecov Report

Merging #3856 into 0.11.x will decrease coverage by 4.08%.
The diff coverage is 58.06%.

Impacted file tree graph

@@            Coverage Diff             @@
##           0.11.x    #3856      +/-   ##
- Coverage   81.17%   77.08%   -4.09%     
  Files         256      256              
  Lines       33976    34051      +75     
- Hits        27579    26249    -1330     
- Misses       6397     7802    +1405
Impacted Files Coverage Δ
datalad/support/ 50.72% <0%> (-6.87%) ⬇️
datalad/distribution/ 60% <0%> (-25.57%) ⬇️
datalad/distribution/tests/ 94.87% <100%> (-4.67%) ⬇️
datalad/support/tests/ 94.01% <28.57%> (-0.32%) ⬇️
datalad/distribution/tests/ 90.59% <38.88%> (-9.41%) ⬇️
datalad/tests/ 89.29% <92.3%> (+0.1%) ⬆️
datalad/support/ 35.55% <0%> (-48.89%) ⬇️
datalad/distribution/ 36.06% <0%> (-47.55%) ⬇️
datalad/ui/ 50% <0%> (-45.46%) ⬇️
datalad/interface/ 34.56% <0%> (-43.21%) ⬇️
... and 49 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bcf62fb...d1cc383. Read the comment docs.

@kyleam kyleam added the do not merge label Nov 6, 2019
Copy link
Contributor Author

@kyleam kyleam commented Nov 6, 2019

The tests for the git-annex devel run (7.20191017+git2-g7b13db551-1~ndall+1) pass. However the 3rdparty_analysis_workflow run in that build is failing. That same failure occurred in the git annex-devel run of the cron build. I'll open an issue for that.

So aside from the unrelated 3rdparty_analysis_workflow failure, the devel build looks fine. The new test also passes for me locally with the latest git-annex. I'll drop the tip commit.

@kyleam kyleam removed the do not merge label Nov 6, 2019
Copy link

@bpoldrack bpoldrack left a comment

Seems fine.

I do wonder though whether we could come up with a solution within get_special_remotes, that would allow to not fix it on a per-command basis. If that function smartly "resolves" sameas-remotes we might end up with a cleaner way to deal with non-standard setups. Needs some digging, though and I'm fine with merging for now.

Copy link
Contributor Author

@kyleam kyleam commented Nov 15, 2019

@bpoldrack, thanks for taking a look.

I do wonder though whether we could come up with a solution within get_special_remotes, that would allow to not fix it on a per-command basis. If that function smartly "resolves" sameas-remotes we might end up with a cleaner way to deal with non-standard setups.

Right now core has one other get_special_remotes caller aside from the spot this PR touches, and a quick glance suggests it'd probably raise a KeyError if it encountered as sameas remote. So if we'd want to avoid updating these two call sites, we could teach get_special_remotes to re-key "sameas-name" values under "name". For the purpose of avoiding two .get('name') or .get('sameas-name') occurrences, I'm not sure it's worth the potential confusion of deviating from a faithful "git-annex:remote.log -> dictionary" conversion.

If we were to do the "sameas-name -> name" re-keying, that still doesn't cover the .git/config handling in _handle_possible_annex_dataset (the second hunk of the diff). get_special_remotes is currently only concerned with git-annex:remote.log, so making it responsible for also inspecting .git/config to somehow mark up its return value doesn't seem like the way to go.

Needs some digging, though and I'm fine with merging for now.

Yeah, looking at this series again, my thought was "okay, this change fixes the particular issue, and it's hard to imagine that it will make any additional sameas fixes harder, but why not take some more time to do a more thorough check of the other spots sameas could cause trouble?". So, I'll plan to do that. Perhaps in doing so I'll see a good way for callers to not have to be aware of sameas details.

kyleam added 7 commits Nov 15, 2019
It's confusing to say the value is a dictionary with arguments for
'git annex enableremote' because

  * this is just a conversion of git-annex:remote.log's key-value
    pairs. In all cases it includes a timestamp that is not directly
    connected to the {init,enable}remote parameters

  * the values stored are mostly what git-annex stores from the
    initremote call, but they aren't needed for the enableremote call.
    The enableremote might need an additional parameter, such as
    `directory=` for the directory special remote, but in this case a
    value is _not_ stored in remote.log
After cloning an annex repo, we go through the special remotes and
inform users about ones that are not autoenabled.  When accessing the
dict with information from git-annex:remote.log, we allow for the
special remote name to be missing.  But the downstream code can't
actually handle the None value that is used in this case, leading to
the failure seen in dataladgh-3816.

There is nothing useful we can do without the remote name, so just
warn and move on.  As far as I'm aware, a special remote in remote.log
must have a name, with the exception of the recently added support for
"sameas" remotes.  The next commit will handle "sameas" remotes.
The next commits will test our handling of git-annex's new sameas
functionality across a few different test modules.  Add a decorator to
avoid repeating the same setup in each test.
git-annex 7.20191017 added support for having special remotes that
point to the same data store [0].  In this case, the data contained in
git-annex:remote.log as returned by get_special_remotes() looks
something like this:

    {'64e9...35e3': {'sameas-name': 'sa',
                     'sameas-uuid': 'cac3...0161',
     'cac3...0161': {'name': 'main',

The sameas entry has a "sameas-name" key rather than a "name" key, and
it has a "sameas-uuid" value that maps it to another remote.

Put the "sameas-name" under "name" so callers don't have to check both
places.  Depending on what the caller is doing, they may be able to
treat a remote the same regardless of whether it is a sameas remote.


Re: datalad#3856 (comment)
siblings() assumes each record returned by get_special_remotes() has a
"name" key, so it would have failed with the sameas remote before the
last commit.
The last commit addressed the failure reported in dataladgh-3816 that
happened when cloning a dataset with a sameas special remote.  However
the check is still incorrect for sameas remotes because, at the level
of .git/config, the section for a sameas remote has the sameas-uuid
value for annex-uuid, and its configuration UUID (the first field in
git-annex:remote.log) is under annex-config-uuid.  In other words,
different remote sections may share the same annex-uuid value.

Give the annex-config-uuid value precedence over annex-uuid when we
check for a remote in .git/config.  This is necessary because
otherwise we would consider a special remote enabled if _any_ of the
remotes that have the UUID are enabled.

With this change, it's likely that handling of sameas remotes across
the codebase is sufficient.  There are only two callers of
get_special_remotes(), and they both have dedicated tests for sameas
remotes.  That leaves callers of get_remotes() and
is_special_annex_remote().  Those cases rely on the remote's
.git/config section and should work transparently with sameas remotes.
There may be issues with other spots that explicitly retrieve
remote.NAME.annex-uuid, but quickly grepping through these instances,
it seems either correct or adequate to use annex-uuid rather than
annex-config-uuid for sameas remotes.

Closes datalad#3816.
@kyleam kyleam changed the title clone: Handle sameas special remotes in autoenable check Adjust special remote handling for sameas remotes Nov 15, 2019
Copy link
Contributor Author

@kyleam kyleam commented Nov 15, 2019

I've updated this series, having a closer look at other spots that could have an issue. There's now an additional fix for siblings enable, along with a test. I ended up doing the "sameas-name -> name" adjustment I mentioned in my previous post so that callers of get_special_remotes do not need to explicitly consider sameas remotes. As mentioned in d1cc383, I think that in terms of sameas handling we should be in a pretty good place now.

Copy link
Contributor Author

@kyleam kyleam commented Nov 16, 2019

The tests passed in the devel-annex run passed. The failure in that build is due to the known https/git-annex-standalone issue. I'll drop the temporary tip commit.

Copy link

@mih mih commented Dec 9, 2019

I approve this too, but I don't want to meddle with the management of 0.11, so I will not click the button. Feel free though.

@kyleam kyleam merged commit d1cc383 into datalad:0.11.x Dec 9, 2019
2 of 5 checks passed
Copy link
Contributor Author

@kyleam kyleam commented Dec 9, 2019

Thanks for taking a look. I've merge to 0.11.x now and will merge to master, dealing with the conflict, later today.

@kyleam kyleam deleted the clone-annex-sameas branch Dec 9, 2019
kyleam added a commit that referenced this issue Dec 10, 2019
kyleam added a commit that referenced this issue Dec 10, 2019
d3c2550 (Merge pull request #3856 from kyleam/clone-annex-sameas,
2019-12-09) updated _handle_possible_annex_dataset() in
distributions/  On master's side this helper was moved to in 8f9ae55.  Move the changes from d3c2550 there.
yarikoptic added a commit to yarikoptic/datalad that referenced this issue Jan 4, 2020

* origin/0.11.x:
  TST: create-sibling: Loosen check's assumption about Git's behavior
  TST: create-sibling: Remove stale comment
  TST: annexrepo: Ignore racy "is clean?" failure
  patched unicode issue for python2 Second batch for 0.11.9
  BF: do not define pushurl for subdataset if there is no pushurl in parent
  BF: siblings: Check for remote before inheriting annex settings
  TST: siblings: Add basic test for inherit=True
  BF: lgr - use .setLevel() instead of .level =
  CLN: Drop unused imports from d1cc383 Add entry for dataladgh-3856
  BF: auto - if empty list is returned by get, do not try to treat it as a dict record
  ENH: clone: Handle sameas special remotes in autoenable check
  TST: siblings: Add check for enabling a sameas remote
  ENH: annrepo.get_special_remotes: Copy "sameas-name" to "name"
  TST: utils: Add decorator that provides dataset with sameas remote
  BF: clone: Skip special remotes with no name in autoenable check
  DOC: annexrepo: Clarify get_special_remotes() docstring
  DOC: utils: Grammar fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
None yet

Successfully merging this pull request may close these issues.

None yet

3 participants