New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BF: check special remotes of type git by uuid to (not) inform if they are not enabled #3547
Conversation
… are not enabled Matching by the name is not sufficient, since when we just clone such repository it would get an arbitrary (by default "origin") name which would not match the one git-annex calls it by. Checking by annex UUID seems to be the best way to proceed
Codecov Report
@@ Coverage Diff @@
## 0.11.x #3547 +/- ##
==========================================
+ Coverage 81.29% 81.31% +0.01%
==========================================
Files 255 255
Lines 33588 33639 +51
==========================================
+ Hits 27307 27355 +48
- Misses 6281 6284 +3
Continue to review full report at Codecov.
|
Subject: [PATCH] BF: check special remotes of type git by uuid to (not) inform
if they are not enabledMatching by the name is not sufficient, since when we just clone such
repository it would get an arbitrary (by default "origin") name which
would not match the one git-annex calls it by.
Makes sense. So when we are cloning from a type=git
special remote, we see an unwarranted message:
$ datalad clone ///repronim/containers
[INFO ] Cloning http://datasets.datalad.org/repronim/containers [1 other candidates] into '/tmp/dl-Kv9PJVz/containers'
[INFO ] warning: Not setting branch master as its own upstream.
[INFO ] access to dataset sibling "datasets.datalad.org" not auto-enabled, enable with:
| datalad siblings -d "/tmp/dl-Kv9PJVz/containers" enable -s datasets.datalad.org
Checking by annex UUID seems to be the best way to proceed
Doing so introduces an unwarranted message when cloning from other sources:
$ datalad clone http://github.com/ReproNim/containers
[INFO ] Cloning http://github.com/ReproNim/containers [1 other candidates] into '/tmp/dl-96hiJI2/containers'
[INFO ] warning: Not setting branch master as its own upstream.
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] access to dataset sibling "datasets.datalad.org" not auto-enabled, enable with:
| datalad siblings -d "/tmp/dl-96hiJI2/containers" enable -s datasets.datalad.org
install(ok): /tmp/dl-96hiJI2/containers (dataset)
The problem comes from assuming that the git annex init
call (triggered by _handle_possible_annex_dataset
) configures the UUID when enabling the remote. That isn't (at least always) the case:
# in the last cloned repo from above
$ cat .git/config | tail -3
[remote "datasets.datalad.org"]
url = http://datasets.datalad.org/repronim/containers/.git
fetch = +refs/heads/*:refs/remotes/datasets.datalad.org/*
It isn't until later, say getting a file, that the UUID shows up:
$ datalad get scripts/tests/arg-test.simg
get(ok): /tmp/dl-96hiJI2/containers/scripts/tests/arg-test.simg (file) [from datasets.datalad.org...]
$ cat .git/config | tail -5
[remote "datasets.datalad.org"]
url = http://datasets.datalad.org/repronim/containers/.git
fetch = +refs/heads/*:refs/remotes/datasets.datalad.org/*
annex-bare = false
annex-uuid = 71c620b5-997f-4849-bb30-c42dbb48a51
Why don't we just check the value of the "autoenable" key? That might miss cases where git annex init
failed to enable the remote, but presumably the datalad sibling
snippet we give for enabling will fail for the same reason, so dropping the instructions isn't really hurting the situation.
THANK YOU @kyleam for the review and catching the regression. I should have thought about it ;)
But to check it for what remote exactly, i.e. how (if not using uuid) we can associate any special remote with a known to git remote? may be upon clone and seeing an annex we should run /tmp > datalad clone http://github.com/ReproNim/containers && cd containers
[INFO ] Cloning http://github.com/ReproNim/containers [1 other candidates] into '/tmp/containers'
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
[INFO ] access to dataset sibling "datasets.datalad.org" not auto-enabled, enable with:
| datalad siblings -d "/tmp/containers" enable -s datasets.datalad.org
install(ok): /tmp/containers (dataset)
/tmp/containers > git config remote.datasets.datalad.org.annex-uuid
/tmp/containers > git annex info | grep datasets.datalad.org
71c620b5-997f-4849-bb30-c42dbb48a51e -- [datasets.datalad.org]
/tmp/containers > git config remote.datasets.datalad.org.annex-uuid
71c620b5-997f-4849-bb30-c42dbb48a51e ? |
> Why don't we just check the value of the "autoenable" key?
But to check it for what remote exactly, i.e. how (if not using uuid) we can associate any special remote with a known to git remote?
Hmm, I might be getting confused, but I'll try to elaborate in case you're missing my suggestion.
The current behavior is "for each remote in git-annex:remote.log, try to find the remote in .git/config based on $CRITERIA and, if it isn't found, give the user instructions for enabling it". Your PR is changing $CRITERIA for `type=git`, but I'm asking why not get rid of the .git/config inspection altogether, warning based on the autoenable key for each item in get_special_remotes().
|
ah, got it -- but would not that assume (without checking) that the remote was enabled? |
Yes, but as I said above, I don't see how the check really helps:
|
observation: our "install" is not really causing git-annex to autoenable any of those special remotes. They do get added to git configuration as remotes, but annex doesn't even try to reach them to establish their annex UUID. annex UUID fetching and "autoenable" happens later (or with the initial version of the this PR here due to `annex info`) by e.g. invoking either `git annex info` or `git annex get` (we had some discussion on this topic recently)./tmp > datalad install https://github.com/ReproNim/containers
[INFO ] Cloning https://github.com/ReproNim/containers [1 other candidates] into '/tmp/containers'
[INFO ] Remote origin not usable by git-annex; setting annex-ignore
install(ok): /tmp/containers (dataset)
/tmp > git -C containers config remote.datasets.datalad.org.annex-uuid
/tmp > git -C containers annex get images/bids/bids-validator--1.2.3.sing
get images/bids/bids-validator--1.2.3.sing (from datasets.datalad.org...)
4% 1.3 MiB 1018 KiB/s 35s^C
/tmp > git -C containers config remote.datasets.datalad.org.annex-uuid
71c620b5-997f-4849-bb30-c42dbb48a51e It somewhat also relates to our discussion on "either should we enable special remotes on a remote while publishing?". I think to stay more consistent with git-annex and also avoid spitting out some warnings and failures whenever it is not yet relevant (may be I don't need to But we might need to improve reporting of failures of autoneabling remotes from git-annex (see #3564) later in the game ( |
hm, while coding for it realized that the situation is more fun. git special remote which has no autoenable might also be the one we just cloned! ;) so I will need to do some check by UUID (without calling |
+ provide debug level msg about autoenable"ed remotes + collide messages on how to enable special remotes into a single message (no need to flood the screen) + do not report anything about remotes UUIDs for which already known
It's not accurate to say AFAICT all the remaining information you provide about a UUID not showing up in the config is restating what I said in my initial comment:
|
if srs[True]: | ||
lgr.debug( | ||
"configuration for %s %s added because of autoenable," | ||
" but no UUIDs for them yet known for dataset %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see much value in this because it seems to be the norm, but it's of course not a big deal given that it's logged at the debug level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All Debug messages have only hypothetical value of helping to troubleshoot when something goes wrong. Given the tricky nature of all this, might come handy IMHO
0.11.6 (Jul 30, 2019) -- am I the last of 0.11.x? Primarily bug fixes to achieve more robust performance Fixes - Our tests needed various adjustments to keep up with upstream changes in Travis and Git. ([#3479][]) ([#3492][]) ([#3493][]) - `AnnexRepo.is_special_annex_remote` was too selective in what it considered to be a special remote. ([#3499][]) - We now provide information about unexpected output when git-annex is called with `--json`. ([#3516][]) - Exception logging in the `__del__` method of `GitRepo` and `AnnexRepo` no longer fails if the names it needs are no longer bound. ([#3527][]) - [addurls][] botched the construction of subdataset paths that were more than two levels deep and failed to create datasets in a reliable, breadth-first order. ([#3561][]) - Cloning a `type=git` special remote showed a spurious warning about the remote not being enabled. ([#3547][]) Enhancements and new features - For calls to git and git-annex, we disable automatic garbage collection due to past issues with GitPython's state becoming stale, but doing so results in a larger .git/objects/ directory that isn't cleaned up until garbage collection is triggered outside of DataLad. Tests with the latest GitPython didn't reveal any state issues, so we've re-enabled automatic garbage collection. ([#3458][]) - [rerun][] learned an `--explicit` flag, which it relays to its calls to [run][[]]. This makes it possible to call `rerun` in a dirty working tree ([#3498][]). - The [metadata][] command aborts earlier if a metadata extractor is unavailable. ([#3525][]) * tag '0.11.6': (56 commits) [DATALAD RUNCMD] make update-changelog finalize CHANGELOG.md entry and boost version BF(DOC): close [create] with [] to not cause WARNING by md-strict pandoc CHANGELOG.md: Link entry from b3e8adb CHANGELOG.md: Add entry for gh-3547 CHANGELOG.md: Add entry for gh-3561 CHANGELOG.md: Add link for addurls RF: inform about special remotes based on autoenable config CHANGELOG.md: Second batch for 0.11.6 BF: addurls: Process datasets in a stable, breadth-first order BF: addurls: Fix construction of nested subpaths TST: addurls: Don't hard-code path separator BF(TST): skip test_v7_detached_get in direct mode - fails to annex upgrade TST: benchmark-travis-pr: Swap 'pip install' and 'git show' TST: benchmark-travis-pr: Move repeated logic to run_asv() TST: benchmark-travis-pr: Support other bases TST: benchmark-travis-pr: Tweak message about current HEAD TST: benchmark-travis-pr: Simplify two git commands into one TST: benchmark-travis-pr: Reorder and break up lines TST: benchmark-travis-pr: Move command for running asv into function ...
Matching by the name is not sufficient, since when we just clone such
repository it would get an arbitrary (by default "origin") name which
would not match the one git-annex calls it by. Checking by annex UUID
seems to be the best way to proceed
FOI: test takes 1.7sec on my laptop (on network close to the datasets.datalad.org)