Adjust special remote handling for sameas remotes #3856
Codecov Report
Coverage Diff

| | 0.11.x | #3856 | +/- |
|---|---|---|---|
| Coverage | 81.17% | 77.08% | -4.09% |
| Files | 256 | 256 | |
| Lines | 33976 | 34051 | +75 |
| Hits | 27579 | 26249 | -1330 |
| Misses | 6397 | 7802 | +1405 |
The tests for the git-annex devel run (7.20191017+git2-g7b13db551-1~ndall+1) pass. However, the 3rdparty_analysis_workflow run in that build is failing. That same failure occurred in the git annex-devel run of the cron build. I'll open an issue for that.

So, aside from the unrelated 3rdparty_analysis_workflow failure, the devel build looks fine. The new test also passes for me locally with the latest git-annex. I'll drop the tip commit.
Seems fine.
I do wonder, though, whether we could come up with a solution within `get_special_remotes` that would let us avoid fixing this on a per-command basis. If that function smartly "resolves" sameas remotes, we might end up with a cleaner way to deal with non-standard setups. It needs some digging, though, and I'm fine with merging for now.
@bpoldrack, thanks for taking a look.
Right now core has one other If we were to do the "sameas-name -> name" re-keying, that still doesn't cover the .git/config handling in
Yeah, looking at this series again, my thought was "okay, this change fixes the particular issue, and it's hard to imagine that it will make any additional sameas fixes harder, but why not take some more time to do a more thorough check of the other spots sameas could cause trouble?". So, I'll plan to do that. Perhaps in doing so I'll see a good way for callers to not have to be aware of sameas details.
It's confusing to say the value is a dictionary with arguments for 'git annex enableremote' because

* this is just a conversion of git-annex:remote.log's key-value pairs. In all cases it includes a timestamp that is not directly connected to the {init,enable}remote parameters

* the values stored are mostly what git-annex stores from the initremote call, but they aren't needed for the enableremote call. The enableremote might need an additional parameter, such as `directory=` for the directory special remote, but in this case a value is _not_ stored in remote.log
After cloning an annex repo, we go through the special remotes and inform users about ones that are not autoenabled. When accessing the dict with information from git-annex:remote.log, we allow for the special remote name to be missing. But the downstream code can't actually handle the None value that is used in this case, leading to the failure seen in dataladgh-3816. There is nothing useful we can do without the remote name, so just warn and move on. As far as I'm aware, a special remote in remote.log must have a name, with the exception of the recently added support for "sameas" remotes. The next commit will handle "sameas" remotes.
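The skip-and-warn behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not DataLad's actual code: the function name `names_not_autoenabled` is hypothetical, and the record layout (a `name` key plus an `autoenable` key holding the string `"true"`) is an assumption about how the remote.log data is represented.

```python
import logging

lgr = logging.getLogger("sketch.clone")


def names_not_autoenabled(special_remotes):
    """Return names of special remotes that are not autoenabled.

    `special_remotes` is assumed to map a UUID to a dict of remote.log
    key-value pairs (hypothetical layout for this sketch).
    """
    names = []
    for uuid, record in special_remotes.items():
        name = record.get("name")
        if name is None:
            # Nothing useful can be done without a name: warn and move on.
            lgr.warning("Ignoring special remote %s: record has no name", uuid)
            continue
        if record.get("autoenable") != "true":
            names.append(name)
    return names
```

The point of the early `continue` is that nothing downstream ever sees a `None` name.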
The next commits will test our handling of git-annex's new sameas functionality across a few different test modules. Add a decorator to avoid repeating the same setup in each test.
git-annex 7.20191017 added support for having special remotes that point to the same data store [0]. In this case, the data contained in git-annex:remote.log as returned by get_special_remotes() looks something like this:

    {'64e9...35e3': {'sameas-name': 'sa',
                     'sameas-uuid': 'cac3...0161',
                     ...},
     'cac3...0161': {'name': 'main', ...}}

The sameas entry has a "sameas-name" key rather than a "name" key, and it has a "sameas-uuid" value that maps it to another remote.

Put the "sameas-name" under "name" so callers don't have to check both places. Depending on what the caller is doing, they may be able to treat a remote the same regardless of whether it is a sameas remote.

[0]: https://git-annex.branchable.com/tips/multiple_remotes_accessing_the_same_data_store/

Re: datalad#3856 (comment)
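As a rough sketch of the "sameas-name" to "name" copy described above (not the actual get_special_remotes() implementation; the helper name `copy_sameas_names` is hypothetical, and the UUID strings are the elided placeholders from the commit message):

```python
def copy_sameas_names(special_remotes):
    """Return a copy of the records where sameas entries also carry "name".

    The original "sameas-name" and "sameas-uuid" keys are left in place so
    callers that care about the distinction can still detect it.
    """
    result = {}
    for uuid, record in special_remotes.items():
        record = dict(record)  # do not mutate the caller's data
        if "name" not in record and "sameas-name" in record:
            record["name"] = record["sameas-name"]
        result[uuid] = record
    return result


remotes = {
    "64e9...35e3": {"sameas-name": "sa", "sameas-uuid": "cac3...0161"},
    "cac3...0161": {"name": "main"},
}
normalized = copy_sameas_names(remotes)
```

After the copy, a caller that only reads `record["name"]` works for both kinds of entry.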
siblings() assumes each record returned by get_special_remotes() has a "name" key, so it would have failed with the sameas remote before the last commit.
The last commit addressed the failure reported in dataladgh-3816 that happened when cloning a dataset with a sameas special remote. However, the check is still incorrect for sameas remotes because, at the level of .git/config, the section for a sameas remote has the sameas-uuid value for annex-uuid, and its configuration UUID (the first field in git-annex:remote.log) is under annex-config-uuid. In other words, different remote sections may share the same annex-uuid value.

Give the annex-config-uuid value precedence over annex-uuid when we check for a remote in .git/config. This is necessary because otherwise we would consider a special remote enabled if _any_ of the remotes that have the UUID are enabled.

With this change, it's likely that handling of sameas remotes across the codebase is sufficient. There are only two callers of get_special_remotes(), and they both have dedicated tests for sameas remotes. That leaves callers of get_remotes() and is_special_annex_remote(). Those cases rely on the remote's .git/config section and should work transparently with sameas remotes. There may be issues with other spots that explicitly retrieve remote.NAME.annex-uuid, but quickly grepping through these instances, it seems either correct or adequate to use annex-uuid rather than annex-config-uuid for sameas remotes.

Closes datalad#3816.
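The precedence rule in this commit message can be illustrated with a small sketch. It assumes, purely for illustration, that each remote's .git/config section has already been read into a plain dict keyed by option name; the helpers `section_config_uuid` and `is_enabled` are hypothetical names, not DataLad functions.

```python
def section_config_uuid(section):
    """UUID that identifies this remote's record in remote.log.

    annex-config-uuid takes precedence; plain remotes only have annex-uuid.
    """
    return section.get("annex-config-uuid") or section.get("annex-uuid")


def is_enabled(remote_log_uuid, remote_sections):
    """Check whether some .git/config remote corresponds to this record."""
    return any(
        section_config_uuid(section) == remote_log_uuid
        for section in remote_sections.values()
    )


# Only the sameas remote "sa" is enabled; it shares annex-uuid with "main".
sections = {
    "sa": {"annex-uuid": "cac3...0161", "annex-config-uuid": "64e9...35e3"},
}
```

With only `sa` enabled, a check that looked at annex-uuid alone would wrongly report `main` (`cac3...0161`) as enabled, while the precedence rule does not.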
I've updated this series, having a closer look at other spots that could have an issue. There's now an additional fix for
The tests in the devel-annex run passed. The failure in that build is due to the known https/git-annex-standalone issue. I'll drop the temporary tip commit.
I approve this too, but I don't want to meddle with the management of 0.11, so I will not click the button. Feel free, though.
Thanks for taking a look. I've merged to 0.11.x now and will merge to master, dealing with the conflict, later today.
…ex-0.11.x

* origin/0.11.x:
  TST: create-sibling: Loosen check's assumption about Git's behavior
  TST: create-sibling: Remove stale comment
  TST: annexrepo: Ignore racy "is clean?" failure
  patched unicode issue for python2
  CHANGELOG.md: Second batch for 0.11.9
  BF: do not define pushurl for subdataset if there is no pushurl in parent
  BF: siblings: Check for remote before inheriting annex settings
  TST: siblings: Add basic test for inherit=True
  BF: lgr - use .setLevel() instead of .level =
  CLN: Drop unused imports from d1cc383
  CHANGELOG.md: Add entry for dataladgh-3856
  BF: auto - if empty list is returned by get, do not try to treat it as a dict record
  ENH: clone: Handle sameas special remotes in autoenable check
  TST: siblings: Add check for enabling a sameas remote
  ENH: annrepo.get_special_remotes: Copy "sameas-name" to "name"
  TST: utils: Add decorator that provides dataset with sameas remote
  BF: clone: Skip special remotes with no name in autoenable check
  DOC: annexrepo: Clarify get_special_remotes() docstring
  DOC: utils: Grammar fix
This series fixes gh-3816 (cloning of annex repos with special remotes that were set up with `--sameas`).

Some notes:

* When merging to master, there will be a merge conflict because `_handle_possible_annex_dataset` was moved to clone.py in 8f9ae55 (RF: Move clone-internal helper to clone.py, 2019-10-14).

* This handles the clone case, but other places in DataLad (in particular `publish`) may need an update for git-annex's sameas functionality.

* If the user is running a git-annex version that doesn't support sameas special remotes, we're informing them about a remote that they can't actually enable. If others feel this is worth handling, I can rework the code to check the annex version.