Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BF: account for annex now "scanning for annexed" instead of "unlocked" files #5705

Merged
merged 3 commits into from
Jun 2, 2021

Conversation

yarikoptic
Copy link
Member

Closes #5702

Recently, somewhere around 8.20210428-186-g428c91606
git-annex changed its logic for doing initial scanning for
unlocked (and now locked?) files. Now I do not observe that message
to be printed out and our tests started to fail in daily testing of
datalad/git-annex
https://github.com/datalad/git-annex/actions/runs/877176725 might be
the first one and version reported is 8.20210428+git228-g13a6bfff4
when previous green one was 8.20210428+git202-g9a5981a15

It seems that annex init might have got notably slower :-/ (attn
@joeyh), e.g. our test tuned here here

python -m nose -s -v datalad/support/tests/test_annexrepo.py:test_init_scanning_message

runs for 1.3s with 8.20210223-1~ndall+1 and then 3.238s with 8.20210429-g66355b99a
although a proper analysis should be done on closer builds.

The same goes for datalad install ///openneuro/ds000001 which went
from 2.1sec (with a message about "Scanning for unlocked") to 6 sec
without any notification on "Scanning..."

So, overall, I feel that there is a need for a better timing comparison and possibly a report to @joeyh on substantial slow down :-/

Not sure if we should merge this workaround or see it resolved in git-annex somehow before its release (ATM it is still at 8.20210428-241-g66355b99a)

…git-annex

Closes datalad#5702

Recently, somewhere around [8.20210428-186-g428c91606](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=428c91606b434512d1986622e751c795edf4df44)
git-annex changed its logic for doing initial scanning for
unlocked (and now locked?) files.  Now I do not observe that message
to be printed out and our tests started to fail in daily testing of
datalad/git-annex
https://github.com/datalad/git-annex/actions/runs/877176725 might be
the first one and version reported is 8.20210428+git228-g13a6bfff4
when previous green one was 8.20210428+git202-g9a5981a15

It seems that `annex init` might have got notably slower :-/ (attn
@joeyh), e.g.  our test tuned here here

    python -m nose -s -v datalad/support/tests/test_annexrepo.py:test_init_scanning_message

runs for 1.3s with 8.20210223-1~ndall+1 and then 3.238s with 8.20210429-g66355b99a
although a proper analysis should be done on closer builds.

The same goes for `datalad install ///openneuro/ds000001` which went
from 2.1sec (with a message about "Scanning for unlocked") to 6 sec
without any notification on "Scanning..."
@yarikoptic
Copy link
Member Author

hm, not observing exactly that detrimental effect with plain git-annex invocations when trying out locally built git-annex versions for master and before that change.

using this script to time
#!/bin/bash

export PS4='> '

set -eu

cd "$(mktemp -d ${TMPDIR:-/tmp}/ga-XXXXXXX)"
git annex version 2>&1 | head -n 1

set -x

mkdir repo; cd repo; git init &>/dev/null

time git annex init

cd ..
git clone http://datasets.datalad.org/openneuro/ds000001/.git/ &>/dev/null
cd ds000001
time git annex init
overall there is a significant improvement since older 8.20210223-1~ndall+1 build but then recent change indeed does make it a bit slower for scanning on an existing repo than before that change
(git)lena:~datalad/trash[master]git
$> bash git-annex-timeinit.sh
git-annex version: 8.20210223-1~ndall+1
> mkdir repo
> cd repo
> git init
> git annex init
init  (scanning for unlocked files...)
ok
(recording state in git...)

real	0m0.202s
user	0m0.147s
sys	0m0.070s
> cd ..
> git clone http://datasets.datalad.org/openneuro/ds000001/.git/
> cd ds000001
> git annex init
init  (merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
(Auto enabling special remote s3-PUBLIC...)
ok
(recording state in git...)

real	0m4.518s
user	0m3.451s
sys	0m1.512s

$> bash git-annex-timeinit.sh
git-annex version: 8.20210428+git202-g9a5981a15-1~ndall+1
> mkdir repo
> cd repo
> git init
> git annex init
init  (scanning for unlocked files...)
ok
(recording state in git...)

real	0m0.238s
user	0m0.155s
sys	0m0.102s
> cd ..
> git clone http://datasets.datalad.org/openneuro/ds000001/.git/
> cd ds000001
> git annex init
init  (scanning for unlocked files...)
(Auto enabling special remote s3-PUBLIC...)
ok
(recording state in git...)

real	0m0.574s
user	0m0.363s
sys	0m0.171s

$> bash git-annex-timeinit.sh
git-annex version: 8.20210428+git241-g66355b99a-1~ndall+1
> mkdir repo
> cd repo
> git init
> git annex init
init  (scanning for annexed files...)
ok
(recording state in git...)

real	0m0.187s
user	0m0.129s
sys	0m0.069s
> cd ..
> git clone http://datasets.datalad.org/openneuro/ds000001/.git/
> cd ds000001
> git annex init
init  (scanning for annexed files...)
(Auto enabling special remote s3-PUBLIC...)
ok
(recording state in git...)

real	0m0.870s
user	0m0.597s
sys	0m0.210s

@codecov
Copy link

codecov bot commented May 31, 2021

Codecov Report

Merging #5705 (e58ba5c) into maint (69bb6b1) will decrease coverage by 60.76%.
The diff coverage is 75.00%.

❗ Current head e58ba5c differs from pull request most recent head 1fc79bb. Consider uploading reports for the commit 1fc79bb to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##            maint    #5705       +/-   ##
===========================================
- Coverage   90.28%   29.51%   -60.77%     
===========================================
  Files         299      296        -3     
  Lines       42348    42313       -35     
===========================================
- Hits        38233    12490    -25743     
- Misses       4115    29823    +25708     
Impacted Files Coverage Δ
datalad/support/tests/test_annexrepo.py 0.00% <0.00%> (-97.49%) ⬇️
datalad/support/annexrepo.py 61.66% <100.00%> (-29.76%) ⬇️
datalad/tests/test_api.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/support/digests.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/tests/test_base.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/tests/test_config.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/ui/tests/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/tests/test__main__.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/tests/test_strings.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/ui/tests/test_base.py 0.00% <0.00%> (-100.00%) ⬇️
... and 248 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69bb6b1...1fc79bb. Read the comment docs.

Now it would be "scanning for annexed", before was "scanning for unlocked".
So decided to just channel any "scannning for ... files" message and test
flexibly that we do pass it through.

Also decided to not bother "casing" "Scanning" (from reported by annex
"scanning") since we do not do similar action for other messages we pass, e.g.
now it would become a more consistent IMHO

	$> rm -rf ds000001 ; time datalad install ///openneuro/ds000001
	[INFO   ] scanning for annexed files (this may take some time)
	[INFO   ] access to 1 dataset sibling s3-PRIVATE not auto-enabled, enable with:
	| 		datalad siblings -d "/tmp/ds000001" enable -s s3-PRIVATE
	install(ok): /tmp/ds000001 (dataset)
@yarikoptic
Copy link
Member Author

thank you @joeyh - I adjusted our "parsing" out of the message and the test -- we will not skip it but allow for any "scanning for ... files" message

@yarikoptic yarikoptic changed the title BF(TST): skip testing for having "scanning for unlocked" with recent git-annex BF: account for annex now "scanning for annexed" instead of "unlocked" files May 31, 2021
@yarikoptic
Copy link
Member Author

hm, windows fail is pertinent but I am not sure why check fails:

FAIL: datalad.support.tests.test_annexrepo.test_init_scanning_message
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python39-x64\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\Python39-x64\lib\site-packages\datalad\tests\utils.py", line 764, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "C:\Python39-x64\lib\site-packages\datalad\support\tests\test_annexrepo.py", line 1312, in test_init_scanning_message
    assert_re_in(".*scanning for .* files", cml.out, flags=re.IGNORECASE)
  File "C:\Python39-x64\lib\site-packages\datalad\tests\utils.py", line 1286, in assert_re_in
    raise AssertionError(
AssertionError: Not a single entry matched '.*scanning for .* files' in ["[INFO] Detected a filesystem without fifo support.\n[INFO] Disabling ssh connection caching.\n[INFO] Detected a crippled filesystem.\n[INFO] scanning for unlocked files (this may take some time)\n[INFO] Entering an adjusted branch where files are unlocked as this filesystem does not support locked files.\n[INFO] Switched to branch 'adjusted/master(unlocked)'\n"]

may be needs a multiline flag, will add and see

@yarikoptic
Copy link
Member Author

that is odd -- the latest reported appveyor run is failing but on the old code:

 File "C:\Python39-x64\lib\site-packages\datalad\support\tests\test_annexrepo.py", line 1312, in test_init_scanning_message
    assert_re_in(".*scanning for .* files", cml.out, flags=re.IGNORECASE)

so there is no match=False as added in the last commit 1fc79bb ... odd ... but seems to be just some issue with reporting back since https://ci.appveyor.com/project/mih/datalad/builds/39412405 corresponding to that commit is all green, so should be good to go

@yarikoptic yarikoptic mentioned this pull request Jun 1, 2021
8 tasks
@yarikoptic yarikoptic added semver-patch Increment the patch version when merged annex Git-annex related issue enhancement labels Jun 1, 2021
@yarikoptic
Copy link
Member Author

not sure if we would finalize auto-ification of releases quickly, so I will proceed with merge for now so we could see datalad/git-annex turn green

@yarikoptic yarikoptic merged commit a1a2106 into datalad:maint Jun 2, 2021
@yarikoptic yarikoptic deleted the bf-test-unlocked branch August 5, 2021 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
annex Git-annex related issue enhancement semver-patch Increment the patch version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants