test_local_url_with_fetch: fails on crippled fs #4644
reproduces locally with python 3.6 conda environment:
DataLad 0.13.7.dev614 WTF (configuration, credentials, datalad, dataset, dependencies, environment, extensions, git-annex, location, metadata_extractors, metadata_indexers, python, system)
configuration <SENSITIVE, report disabled by configuration>
reproduced also in conda env with python 3.7.9:
datalad.core.distributed.tests.test_clone.test_local_url_with_fetch ... ERROR
Versions: annexremote=1.4.3 appdirs=1.4.4 boto=2.49.0 cmd:7z=16.02 cmd:annex=8.20201127+git54-ga1b227171-1~ndall+1 cmd:bundled-git=2.24.0 cmd:git=2.24.0 cmd:system-git=2.29.2 cmd:system-ssh=8.4p1 exifread=2.3.1 humanize=3.0.1 iso8601=0.1.13 keyring=21.4.0 keyrings.alt=4.0.0 msgpack=1.0.0 mutagen=1.45.1 requests=2.24.0 wrapt=1.12.1
Obscure filename: str=b";&%b5{}'\xce\x94\xd0\x99\xd7\xa7\xd9\x85\xe0\xb9\x97\xe3\x81\x82.datc" repr=";&%b5{}'ΔЙקم๗あ.datc"
Encodings: default='utf-8' filesystem='utf-8' locale.prefered='UTF-8'
Environment: LANG='en_US.UTF-8' GIT_PAGER='less --no-init --quit-if-one-screen' PATH='/home/yoh/anaconda-5.2.0-2.7/envs/datalad-py3.7.9/bin:/home/yoh/anaconda-5.2.0-2.7/condabin:/home/yoh/gocode/bin:/home/yoh/gocode/bin:/home/yoh/bin:/home/yoh/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/sbin:/usr/sbin:/usr/local/sbin' GIT_CONFIG_PARAMETERS="'init.defaultBranch=master'"
======================================================================
ERROR: datalad.core.distributed.tests.test_clone.test_local_url_with_fetch
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/yoh/anaconda-5.2.0-2.7/envs/datalad-py3.7.9/lib/python3.7/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 578, in _wrap_with_tree
return t(*(arg + (d,)), **kw)
File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 757, in _wrap_with_tempfile
return t(*(arg + (filename,)), **kw)
File "/home/yoh/proj/datalad/datalad-master/datalad/core/distributed/tests/test_clone.py", line 629, in test_local_url_with_fetch
ds_cloned.repo.fetch()
File "/home/yoh/proj/datalad/datalad-master/datalad/support/gitrepo.py", line 2394, in fetch
git_options=git_options,
File "/home/yoh/proj/datalad/datalad-master/datalad/support/gitrepo.py", line 2411, in fetch_
git_options=git_options)
File "/home/yoh/proj/datalad/datalad-master/datalad/support/gitrepo.py", line 2552, in _fetch_push_helper
"from nor a tracking branch is set up.".format(action))
ValueError: Neither a remote is specified to fetch from nor a tracking branch is set up.
----------------------------------------------------------------------
Ran 1 test in 4.534s
FAILED (errors=1)
and finally (just needed to wait a bit longer) in local environment:
datalad.core.distributed.tests.test_clone.test_local_url_with_fetch ... ERROR
Versions: annexremote=1.4.3 appdirs=1.4.4 boto=2.49.0 cmd:7z=16.02 cmd:annex=8.20201127+git54-ga1b227171-1~ndall+1 cmd:bundled-git=2.24.0 cmd:git=2.24.0 cmd:system-git=2.29.2 cmd:system-ssh=8.4p1 exifread=2.3.2 git=3.1.11 gitdb=4.0.5 humanize=0.0.0 iso8601=0.1.13 keyring=21.5.0 keyrings.alt=4.0.1 msgpack=1.0.0 mutagen=1.45.1 requests=2.24.0 scrapy=2.4.1 wrapt=1.12.1
Obscure filename: str=b";&%b5{}'\xce\x94\xd0\x99\xd7\xa7\xd9\x85\xe0\xb9\x97\xe3\x81\x82.datc" repr=";&%b5{}'ΔЙקم๗あ.datc"
Encodings: default='utf-8' filesystem='utf-8' locale.prefered='UTF-8'
Environment: LANG='en_US.UTF-8' GIT_PAGER='less --no-init --quit-if-one-screen' PATH='/home/yoh/proj/datalad/datalad-master/venvs/dev3/bin:/home/yoh/anaconda-5.2.0-2.7/condabin:/home/yoh/gocode/bin:/home/yoh/gocode/bin:/home/yoh/bin:/home/yoh/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/sbin:/usr/sbin:/usr/local/sbin' GIT_CONFIG_PARAMETERS="'init.defaultBranch=master'"
======================================================================
ERROR: datalad.core.distributed.tests.test_clone.test_local_url_with_fetch
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 578, in _wrap_with_tree
return t(*(arg + (d,)), **kw)
File "/home/yoh/proj/datalad/datalad-master/datalad/tests/utils.py", line 757, in _wrap_with_tempfile
return t(*(arg + (filename,)), **kw)
File "/home/yoh/proj/datalad/datalad-master/datalad/core/distributed/tests/test_clone.py", line 629, in test_local_url_with_fetch
ds_cloned.repo.fetch()
File "/home/yoh/proj/datalad/datalad-master/datalad/support/gitrepo.py", line 2389, in fetch
return list(
File "/home/yoh/proj/datalad/datalad-master/datalad/support/gitrepo.py", line 2400, in fetch_
yield from self._fetch_push_helper(
File "/home/yoh/proj/datalad/datalad-master/datalad/support/gitrepo.py", line 2550, in _fetch_push_helper
raise ValueError(
ValueError: Neither a remote is specified to fetch from nor a tracking branch is set up.
----------------------------------------------------------------------
Ran 1 test in 5.650s
FAILED (errors=1)
so it is just a flaky issue, nothing about "github workflows esoteric nature",
and it boils down again to "config caching":
so I guess it is just a manifestation of the still open #4363, with the performance-harmful candidate fix #4364 neither merged nor closed (it seems to resolve this issue for me locally). I guess someone should really look into the config caching logic -- some time stamping assumptions might need tuning to avoid the forceful reload proposed by #4364. But the point is that the issue is still there.
FWIW: I will furnish a quick PR introducing an alternative approach I had mentioned in #4364 (comment): reload config whenever a flyweight instance is reused
nope: no PR, since that change (it is in my branch: yarikoptic@84f2981) is not 100% sufficient at the Repo level (did not bother with the dataset level), even if I
It helps to mitigate but does not 100% resolve the datalad#4644 -- it still might fail once in a rare while
I jinxed it!!! Right before I was about to interrupt (so it lasted many more cycles than without the fix), a test run failed :-/ It does help quite a bit, so I sent out #5275
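For readers unfamiliar with the idea, "reload config whenever a flyweight instance is reused" could look roughly like the sketch below. It is only an illustration and not the code from the branch or from #5275: `Config`, `ReloadOnReuseFlyweight` and `Repo` are hypothetical stand-ins, and the real flyweight metaclass in DataLad may work differently.

```python
class Config:
    """Hypothetical stand-in for a cached config manager."""

    def __init__(self):
        self.reload_count = 0

    def reload(self):
        # A real implementation would re-read the config files here.
        self.reload_count += 1


class ReloadOnReuseFlyweight(type):
    """Metaclass keeping one instance per path (hypothetical name).

    A cached instance gets its config reloaded every time it is handed
    out again, so stale cached values do not survive across "reuse".
    """

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instances = {}

    def __call__(cls, path):
        inst = cls._instances.get(path)
        if inst is None:
            # First request for this path: construct and cache.
            inst = super().__call__(path)
            cls._instances[path] = inst
        else:
            # Instance is being reused: refresh its config.
            inst.cfg.reload()
        return inst


class Repo(metaclass=ReloadOnReuseFlyweight):
    def __init__(self, path):
        self.path = path
        self.cfg = Config()


first = Repo("/tmp/example")
second = Repo("/tmp/example")  # same object, config reloaded once
```

As the comments above note, this kind of reload only mitigates the problem: code that keeps holding on to an already obtained instance never passes through the reuse path.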
f.cking A: I see the rabbit but don't know how it came to live like this. With the following changes, which reload upon a non-readonly command and also print useful information while reloading,
diff --git a/datalad/config.py b/datalad/config.py
index d56ee2dff..f138d8dc2 100644
--- a/datalad/config.py
+++ b/datalad/config.py
@@ -347,8 +347,14 @@ class ConfigManager(object):
                # if mtime age is less than worst resolution assume modified
                (current_time - curmtimes[c]) > 2.0
                for c in curmtimes):
-            return False
-        return True
+            reload = False
+        else:
+            reload = True
+        print(f"IN NEED RELOAD with runner cwd {self._runner.cwd} WITH {store['mtimes']} for {store['files']} at"
+              f" {current_time} "
+              f"decided to reload"
+              f"={reload}" )
+        return reload

     def _reload(self, run_args):
         # query git-config
diff --git a/datalad/support/gitrepo.py b/datalad/support/gitrepo.py
index 94cd1ebfc..c6b7f78bc 100644
--- a/datalad/support/gitrepo.py
+++ b/datalad/support/gitrepo.py
@@ -2187,6 +2187,9 @@ class GitRepo(RepoInterface, metaclass=PathBasedFlyweight):
         finally:
             if not read_only:
                 self._write_lock.release()
+                # ensure that we react to possible changes to config
+                print("RELOADING!")
+                self._cfg.reload()

         out = res['stdout']
         err = res['stderr']
I have got a failed test with
while running in a loop and keeping temp directories etc:
$> DATALAD_TESTS_TEMP_FSSIZE=100 tools/eval_under_testloopfs 'for s in {1..60}; do DATALAD_TESTS_TEMP_KEEP=1 DATALAD_LOG_LEVEL=debug python -m nose -s -v --pdb datalad/core/distributed/tests/test_clone.py:test_local_url_with_fetch; sudo rm -rf $TMPDIR/datalad-fs-*/*; done'
note: ~~strange thing is that rm -rf seems to be not effective for me for some reason -- they keep piling up... not sure if relates, but might~~ nah -- we just overload TMPDIR so I need to remove everything there ;)
and what makes little sense is that difference between
30 seconds!!! The whole test run, even with heavy output, seems to take about 7 seconds.... so I am ready to blame everything including fuse and vfat and python itself ... ;-) so I give up for now and welcome your attempts as well!
grr.. just to solidify: I added printing to the runner of where it is actually running ... so below you can see that we do invoke a number of `git config --local --add` and even `git annex init` etc calls under `/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a` BUT st_mtime does not change, and we decide not to reload since by then over 2 sec have passed...
IN NEED RELOAD with runner cwd /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a WITH stored {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.datalad/config'): 1609870704.0} for {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.datalad/config')} at 1609870705.6237636 with {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.datalad/config'): 1609870704.0}decided to reload=True
IN NEED RELOAD with runner cwd /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a WITH stored {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tnmsq1jk/.gitconfig'): 1609870702.0, PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.git/config'): 1609870704.0, PosixPath('/etc/gitconfig'): 1608746390.7433054} for {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tnmsq1jk/.gitconfig'), PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.git/config'), PosixPath('/etc/gitconfig')} at 1609870705.623854 with {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tnmsq1jk/.gitconfig'): 1609870702.0, PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.git/config'): 1609870704.0, PosixPath('/etc/gitconfig'): 1608746390.7433054}decided to reload=True
[DEBUG ] Async run ['git', 'config', '-z', '-l', '--show-origin'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'config', '-z', '-l', '--show-origin']
[DEBUG ] Process 2608963 started
[DEBUG ] Waiting for process 2608963 to complete
[DEBUG ] Process 2608963 exited with return code 0
[DEBUG ] Async run ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.datalad/config'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'config', '-z', '-l', '--show-origin', '--file', '/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.datalad/config']
[DEBUG ] Process 2608979 started
[DEBUG ] Waiting for process 2608979 to complete
[DEBUG ] Process 2608979 exited with return code 0
[DEBUG ] Async run ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes']
[DEBUG ] Process 2608995 started
[DEBUG ] Waiting for process 2608995 to complete
[DEBUG ] Process 2608995 exited with return code 0
[DEBUG ] Async run ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes']
[DEBUG ] Process 2609011 started
[DEBUG ] Waiting for process 2609011 to complete
[DEBUG ] Process 2609011 exited with return code 0
[DEBUG ] Async run ['git', 'rev-parse', '--verify', 'HEAD^{commit}'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'rev-parse', '--verify', 'HEAD^{commit}']
[DEBUG ] Process 2609027 started
[DEBUG ] Waiting for process 2609027 to complete
[DEBUG ] Process 2609027 exited with return code 0
[DEBUG ] Async run ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes']
[DEBUG ] Process 2609043 started
[DEBUG ] Waiting for process 2609043 to complete
[DEBUG ] Process 2609043 exited with return code 0
[DEBUG ] Initializing annex repo at /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Async run ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads', 'refs/remotes']
[DEBUG ] Process 2609059 started
[DEBUG ] Waiting for process 2609059 to complete
[DEBUG ] Process 2609059 exited with return code 0
[DEBUG ] Async run ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'for-each-ref', '--format=%(refname:strip=2)', 'refs/heads']
[DEBUG ] Process 2609075 started
[DEBUG ] Waiting for process 2609075 to complete
[DEBUG ] Process 2609075 exited with return code 0
[DEBUG ] Async run ['git', 'config', '--local', '--add', 'branch.master.remote', 'origin'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'config', '--local', '--add', 'branch.master.remote', 'origin']
[DEBUG ] Process 2609091 started
[DEBUG ] Waiting for process 2609091 to complete
[DEBUG ] Process 2609091 exited with return code 0
[DEBUG ] Async run ['git', 'config', '--local', '--add', 'branch.master.merge', 'refs/heads/master'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'config', '--local', '--add', 'branch.master.merge', 'refs/heads/master']
[DEBUG ] Process 2609107 started
[DEBUG ] Waiting for process 2609107 to complete
[DEBUG ] Process 2609107 exited with return code 0
[DEBUG ] Async run ['git', 'annex', 'init', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3'] under /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a
[DEBUG ] Launching process ['git', 'annex', 'init', '-c', 'annex.dotfiles=true', '-c', 'annex.retry=3']
[DEBUG ] Process 2609123 started
[DEBUG ] Waiting for process 2609123 to complete
[INFO ] Detected a filesystem without fifo support.
| Disabling ssh connection caching.
[INFO ] Detected a crippled filesystem.
[INFO ] Scanning for unlocked files (this may take some time)
[DEBUG ] Process 2609123 exited with return code 0
IN NEED RELOAD with runner cwd /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a WITH stored {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.datalad/config'): 1609870704.0} for {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.datalad/config')} at 1609870706.002961 with {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.datalad/config'): 1609870704.0}decided to reload=False
IN NEED RELOAD with runner cwd /home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a WITH stored {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tnmsq1jk/.gitconfig'): 1609870702.0, PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.git/config'): 1609870704.0, PosixPath('/etc/gitconfig'): 1608746390.7433054} for {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tnmsq1jk/.gitconfig'), PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.git/config'), PosixPath('/etc/gitconfig')} at 1609870706.003189 with {PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tnmsq1jk/.gitconfig'): 1609870702.0, PosixPath('/home/yoh/.tmp/datalad-fs-vPjCx/datalad_temp_tree_test_local_url_with_fetchuy4d2y9s/subdir/a/.git/config'): 1609870704.0, PosixPath('/etc/gitconfig'): 1608746390.7433054}decided to reload=False
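To make the two decisions in the log above easier to follow, here is a minimal sketch of the mtime-based check, fed with the `.git/config` values from the log. It is a reconstruction under assumptions: only the "mtime age > 2.0" part is visible in the diff, the equality test against the stored mtime is assumed, and `need_reload` is a hypothetical name rather than the actual method.

```python
def need_reload(stored_mtimes, current_mtimes, now, resolution=2.0):
    """Return True if cached config should be re-read (sketch).

    A file counts as "safely unchanged" only if its mtime equals the stored
    one AND that mtime is older than the assumed worst timestamp resolution
    (2 seconds here, as in the diff above).
    """
    safely_unchanged = all(
        current_mtimes[path] == stored_mtimes.get(path)  # assumed comparison
        and (now - current_mtimes[path]) > resolution
        for path in current_mtimes
    )
    return not safely_unchanged


cfg = ".git/config"
stored = {cfg: 1609870704.0}
# mtime as reported by stat after the `git config --local --add` calls:
# it did not move, although the file content changed.
current = {cfg: 1609870704.0}

# First check: mtime is only ~1.6 s old, so the check plays safe and reloads.
early = need_reload(stored, current, now=1609870705.623854)
# Second check: mtime is now just over 2 s old and still equal, so no reload.
late = need_reload(stored, current, now=1609870706.003189)
print(early, late)
```

With these inputs the first call asks for a reload and the second does not, matching `decided to reload=True` and `decided to reload=False` in the log. Once the reported mtime stops moving, no choice of buffer makes this check notice the change.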
there is an additional little issue with that time checking code: if a config file is removed, it considers it to be "ok" and not changed (worth a PR on its own), but it is unrelated. Given that I saw a 30 second difference above, tuning the "2 seconds buffer" upward would not really help us either. I think the only logical way out would be to consider a more "complete" fingerprinting of the files, so reacting also to atime changes (who knows -- might be no longer readable!!) and size changes (definitely a change). From quick benchmarking, performance is likely to be the same, since any additional hit (I see larger variance) would be tiny relative to other operations
Weird and not yet explained behavior on crippled fs does happen, where mtime gets "stuck" in the past (I saw even 30 seconds) when stating .git/config, see datalad#4644 (comment). For the current test failures at hand, it probably would have been sufficient to just add collection of file sizes and react to those changes as well. But in some cases I see value in reacting even to an atime change (e.g. file was made non-readable), so why not compare that as well? To not bother subselecting which stat to worry about, I have decided to collect/compare all stats. I do not think the performance impact would be noticeable:
In [9]: mtime = p.stat().st_mtime
In [10]: stat = p.stat()
In [11]: %timeit p.stat().st_mtime == mtime
1.57 µs ± 5.92 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [12]: %timeit p.stat() == stat
1.45 µs ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
but we might react to more possibly valid cases of when to reload a config. Also, part of the change is to go through/compare not only "existing" paths, but all known ones. If a path disappeared and did exist before (we had a non-None stat), we must reload as well.
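A rough sketch of the fingerprinting idea described above, assuming nothing about the actual patch: `stat_fingerprint`, `collect_fingerprints` and `need_reload` are hypothetical helpers, and the choice of fields (mtime in nanoseconds, size, mode) is an assumption made for this illustration, whereas the text above proposes comparing all stats. A missing file is recorded as None, so a path that disappears also triggers a reload.

```python
import os
import tempfile
from pathlib import Path


def stat_fingerprint(path):
    """Return a tuple describing the file, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    # Field selection is an assumption of this sketch.
    return (st.st_mtime_ns, st.st_size, st.st_mode)


def collect_fingerprints(paths):
    """Fingerprint all known paths, existing or not."""
    return {Path(p): stat_fingerprint(p) for p in paths}


def need_reload(stored, paths):
    """Reload whenever any fingerprint differs, including appearing or vanishing files."""
    return collect_fingerprints(paths) != stored


with tempfile.TemporaryDirectory() as tmpdir:
    cfg = Path(tmpdir) / "config"
    cfg.write_text("[core]\n")
    stored = collect_fingerprints([cfg])
    before = os.stat(cfg)

    # Change the content, then force the old timestamps back to imitate
    # an mtime that is "stuck" in the past.
    cfg.write_text('[core]\n[branch "master"]\n\tremote = origin\n')
    os.utime(cfg, ns=(before.st_atime_ns, before.st_mtime_ns))
    size_change_detected = need_reload(stored, [cfg])

    # A config file that disappears must trigger a reload as well.
    stored = collect_fingerprints([cfg])
    cfg.unlink()
    removal_detected = need_reload(stored, [cfg])
```

In this run the changed size is what gives the modification away even though mtime was reset, and the removed file is caught because its stored fingerprint was not None.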
As present in #4620 https://github.com/datalad/datalad/pull/4620/checks?check_run_id=788686809 and possibly in other PRs etc (didn't check).