-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BF: account for annex now "scanning for annexed" instead of "unlocked" files #5705
Conversation
…git-annex Closes datalad#5702 Recently, somewhere around [8.20210428-186-g428c91606](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=428c91606b434512d1986622e751c795edf4df44) git-annex changed its logic for doing initial scanning for unlocked (and now locked?) files. Now I do not observe that message to be printed out and our tests started to fail in daily testing of datalad/git-annex https://github.com/datalad/git-annex/actions/runs/877176725 might be the first one and version reported is 8.20210428+git228-g13a6bfff4 when previous green one was 8.20210428+git202-g9a5981a15 It seems that `annex init` might have got notably slower :-/ (attn @joeyh), e.g. our test tuned here here python -m nose -s -v datalad/support/tests/test_annexrepo.py:test_init_scanning_message runs for 1.3s with 8.20210223-1~ndall+1 and then 3.238s with 8.20210429-g66355b99a although a proper analysis should be done on closer builds. The same goes for `datalad install ///openneuro/ds000001` which went from 2.1sec (with a message about "Scanning for unlocked") to 6 sec without any notification on "Scanning..."
hm, not observing exactly that detrimental effect with plain git-annex invocations when trying out locally built git-annex versions for master and before that change. using this script to time#!/bin/bash
export PS4='> '
set -eu
cd "$(mktemp -d ${TMPDIR:-/tmp}/ga-XXXXXXX)"
git annex version 2>&1 | head -n 1
set -x
mkdir repo; cd repo; git init &>/dev/null
time git annex init
cd ..
git clone http://datasets.datalad.org/openneuro/ds000001/.git/ &>/dev/null
cd ds000001
time git annex init
overall there is a significant improvement since older 8.20210223-1~ndall+1 build but then recent change indeed does make it a bit slower for scanning on an existing repo than before that change(git)lena:~datalad/trash[master]git
$> bash git-annex-timeinit.sh
git-annex version: 8.20210223-1~ndall+1
> mkdir repo
> cd repo
> git init
> git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
real 0m0.202s
user 0m0.147s
sys 0m0.070s
> cd ..
> git clone http://datasets.datalad.org/openneuro/ds000001/.git/
> cd ds000001
> git annex init
init (merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
(Auto enabling special remote s3-PUBLIC...)
ok
(recording state in git...)
real 0m4.518s
user 0m3.451s
sys 0m1.512s
$> bash git-annex-timeinit.sh
git-annex version: 8.20210428+git202-g9a5981a15-1~ndall+1
> mkdir repo
> cd repo
> git init
> git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
real 0m0.238s
user 0m0.155s
sys 0m0.102s
> cd ..
> git clone http://datasets.datalad.org/openneuro/ds000001/.git/
> cd ds000001
> git annex init
init (scanning for unlocked files...)
(Auto enabling special remote s3-PUBLIC...)
ok
(recording state in git...)
real 0m0.574s
user 0m0.363s
sys 0m0.171s
$> bash git-annex-timeinit.sh
git-annex version: 8.20210428+git241-g66355b99a-1~ndall+1
> mkdir repo
> cd repo
> git init
> git annex init
init (scanning for annexed files...)
ok
(recording state in git...)
real 0m0.187s
user 0m0.129s
sys 0m0.069s
> cd ..
> git clone http://datasets.datalad.org/openneuro/ds000001/.git/
> cd ds000001
> git annex init
init (scanning for annexed files...)
(Auto enabling special remote s3-PUBLIC...)
ok
(recording state in git...)
real 0m0.870s
user 0m0.597s
sys 0m0.210s
|
Codecov Report
@@ Coverage Diff @@
## maint #5705 +/- ##
===========================================
- Coverage 90.28% 29.51% -60.77%
===========================================
Files 299 296 -3
Lines 42348 42313 -35
===========================================
- Hits 38233 12490 -25743
- Misses 4115 29823 +25708
Continue to review full report at Codecov.
|
Now it would be "scanning for annexed", before was "scanning for unlocked". So decided to just channel any "scannning for ... files" message and test flexibly that we do pass it through. Also decided to not bother "casing" "Scanning" (from reported by annex "scanning") since we do not do similar action for other messages we pass, e.g. now it would become a more consistent IMHO $> rm -rf ds000001 ; time datalad install ///openneuro/ds000001 [INFO ] scanning for annexed files (this may take some time) [INFO ] access to 1 dataset sibling s3-PRIVATE not auto-enabled, enable with: | datalad siblings -d "/tmp/ds000001" enable -s s3-PRIVATE install(ok): /tmp/ds000001 (dataset)
thank you @joeyh - I adjusted our "parsing" out of the message and the test -- we will not skip it but allow for any "scanning for ... files" message |
hm, windows fail is pertinent but I am not sure why check fails:
may be needs a multiline flag, will add and see |
that is odd -- the latest reported appveyor run is failing but on the old code:
so there is no |
not sure if we would finalize |
Closes #5702
Recently, somewhere around 8.20210428-186-g428c91606
git-annex changed its logic for doing initial scanning for
unlocked (and now locked?) files. Now I do not observe that message
to be printed out and our tests started to fail in daily testing of
datalad/git-annex
https://github.com/datalad/git-annex/actions/runs/877176725 might be
the first one and version reported is 8.20210428+git228-g13a6bfff4
when previous green one was 8.20210428+git202-g9a5981a15
It seems that
annex init
might have got notably slower :-/ (attn@joeyh), e.g. our test tuned here here
runs for 1.3s with 8.20210223-1~ndall+1 and then 3.238s with 8.20210429-g66355b99a
although a proper analysis should be done on closer builds.
The same goes for
datalad install ///openneuro/ds000001
which wentfrom 2.1sec (with a message about "Scanning for unlocked") to 6 sec
without any notification on "Scanning..."
So, overall, I feel that there is a need for a better timing comparison and possibly a report to @joeyh on substantial slow down :-/
Not sure if we should merge this workaround or see it resolved in git-annex somehow before its release (ATM it is still at 8.20210428-241-g66355b99a)