-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NF: implement a non asyncio-based runner #5667
NF: implement a non asyncio-based runner #5667
Conversation
This patch implements a thread-based runner, that uses the asyncio protocol interface to communicate process output to protocol instances
Codecov Report
@@ Coverage Diff @@
## master #5667 +/- ##
==========================================
- Coverage 90.48% 88.60% -1.88%
==========================================
Files 305 305
Lines 41814 41999 +185
==========================================
- Hits 37834 37212 -622
- Misses 3980 4787 +807
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a first look. Cool!
From my POV this needs some real-world exposure. I will try to use this branch in some of the more tricky tasks, and report back.
THX big time!
There are a number of places where datalad tests examine debug output. Logging the output of sub processes throws those tests off.
Woooohooo! I see the light! I am going to it!!! thank you @christian-monch ! only 2 failing tests out of >1k in one of the travis runs======================================================================
FAIL: datalad.dataset.tests.test_gitrepo.test_gitrepo_call_git_methods
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/dl-miniconda-q7xiqj3h/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/tmp/dl-miniconda-q7xiqj3h/lib/python3.8/site-packages/datalad/tests/utils.py", line 585, in _wrap_with_tree
return t(*(arg + (d,)), **kw)
File "/tmp/dl-miniconda-q7xiqj3h/lib/python3.8/site-packages/datalad/dataset/tests/test_gitrepo.py", line 242, in test_gitrepo_call_git_methods
check("fatal: bad source", cml.out)
AssertionError: 'fatal: bad source' unexpectedly found in "[DEBUG] Async run:\n cwd=/tmp/datalad_temp_tree_test_gitrepo_call_git_methodsfoymy59e\n cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'mv', '--', 'notthere', 'dest']\n[DEBUG] Process 25881 started\n[DEBUG] ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) started\n[DEBUG] ReaderThread(<_io.FileIO name=10 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) started\n[DEBUG] ReaderThread(<_io.FileIO name=10 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) exiting (stream end)\n[DEBUG] ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) got data: b'fatal: bad source, source=notthere, destination=dest\\n'\n[DEBUG] ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) exiting (stream end)\n[DEBUG] Process 25881 exited with return code 128\n"
-------------------- >> begin captured logging << --------------------
datalad.cmd: DEBUG: Async run:
cwd=/tmp/datalad_temp_tree_test_gitrepo_call_git_methodsfoymy59e
cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'mv', '--', 'notthere', 'dest']
datalad.cmd: DEBUG: Process 25878 started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=25 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ef9550>) started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=23 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ef9550>) started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=23 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ef9550>) exiting (stream end)
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=25 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ef9550>) got data: b'fatal: bad source, source=notthere, destination=dest\n'
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=25 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ef9550>) exiting (stream end)
datalad.cmd: DEBUG: Process 25878 exited with return code 128
datalad.dataset.gitrepo: Level 11: CommandError: 'git -c diff.ignoreSubmodules=none mv -- notthere dest' failed with exitcode 128 under /tmp/datalad_temp_tree_test_gitrepo_call_git_methodsfoymy59e [err: 'fatal: bad source, source=notthere, destination=dest']
datalad.cmd: DEBUG: Async run:
cwd=/tmp/datalad_temp_tree_test_gitrepo_call_git_methodsfoymy59e
cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'mv', '--', 'notthere', 'dest']
datalad.cmd: DEBUG: Process 25881 started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=10 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=10 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) exiting (stream end)
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) got data: b'fatal: bad source, source=notthere, destination=dest\n'
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f52ff5bb0>) exiting (stream end)
datalad.cmd: DEBUG: Process 25881 exited with return code 128
--------------------- >> end captured logging << ---------------------
======================================================================
FAIL: datalad.support.tests.test_gitrepo.test_gitrepo_call_git_methods
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/dl-miniconda-q7xiqj3h/lib/python3.8/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/tmp/dl-miniconda-q7xiqj3h/lib/python3.8/site-packages/datalad/tests/utils.py", line 585, in _wrap_with_tree
return t(*(arg + (d,)), **kw)
File "/tmp/dl-miniconda-q7xiqj3h/lib/python3.8/site-packages/datalad/support/tests/test_gitrepo.py", line 1632, in test_gitrepo_call_git_methods
check("fatal: bad source", cml.out)
AssertionError: 'fatal: bad source' unexpectedly found in "[DEBUG] Async run:\n cwd=/tmp/datalad_temp_tree_test_gitrepo_call_git_methodsxtvk6i7c\n cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'mv', '--', 'notthere', 'dest']\n[DEBUG] Process 4319 started\n[DEBUG] ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) started\n[DEBUG] ReaderThread(<_io.FileIO name=10 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) started\n[DEBUG] ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) got data: b'fatal: bad source, source=notthere, destination=dest\\n'\n[DEBUG] ReaderThread(<_io.FileIO name=10 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) exiting (stream end)\n[DEBUG] ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) exiting (stream end)\n[DEBUG] Process 4319 exited with return code 128\n"
-------------------- >> begin captured logging << --------------------
datalad.cmd: DEBUG: Async run:
cwd=/tmp/datalad_temp_tree_test_gitrepo_call_git_methodsxtvk6i7c
cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'mv', '--', 'notthere', 'dest']
datalad.cmd: DEBUG: Process 4316 started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=125 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f4291e280>) started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=123 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f4291e280>) started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=125 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f4291e280>) got data: b'fatal: bad source, source=notthere, destination=dest\n'
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=125 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f4291e280>) exiting (stream end)
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=123 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f4291e280>) exiting (stream end)
datalad.cmd: DEBUG: Process 4316 exited with return code 128
datalad.dataset.gitrepo: Level 11: CommandError: 'git -c diff.ignoreSubmodules=none mv -- notthere dest' failed with exitcode 128 under /tmp/datalad_temp_tree_test_gitrepo_call_git_methodsxtvk6i7c [err: 'fatal: bad source, source=notthere, destination=dest']
datalad.cmd: DEBUG: Async run:
cwd=/tmp/datalad_temp_tree_test_gitrepo_call_git_methodsxtvk6i7c
cmd=['git', '-c', 'diff.ignoreSubmodules=none', 'mv', '--', 'notthere', 'dest']
datalad.cmd: DEBUG: Process 4319 started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=10 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) started
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) got data: b'fatal: bad source, source=notthere, destination=dest\n'
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=10 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) exiting (stream end)
datalad.runner: DEBUG: ReaderThread(<_io.FileIO name=14 mode='rb' closefd=True>, <queue.Queue object at 0x7f4f42da0430>) exiting (stream end)
datalad.cmd: DEBUG: Process 4319 exited with return code 128 Left a few tiny individual comments. Would be nice if tried within jupyter and also absorbed the test |
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Those should be fixed with commit d64e61d (the reader threads debug-logged the data coming from the sub-process, which confuses some tests that capture all output). |
FWIW, tried on windows VM conda install jupyter lab -- although I have not exhibited all the issues others saw, for me it just worked well with this PR. COOL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is just a draft PR, but I am too excited ;-) Left a few minor comments.
I have adjusted description with issues this PR would close and TODOs in my view to accomplish (easier to keep track of them when present in the top description) - they could always be addressed or argued against and crossed out ;) Re
|
This is great to see. Thanks, @christian-monch. Obviously this addressing outstanding problems is more important, but it's worth noting that the benchmarks are looking good too: benchmarks
https://github.com/datalad/datalad/runs/2599712075 |
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, that's what I wanted to see.
What I'd love to see in addition, is a WriterThread
as well. ;-)
BrokenPipeError is only raised on writing to a pipe, we do not have to check for it here.
The done_future-argument is no longer used in WitlessRunner instances. Deprecate it.
@yarikoptic: I checked the ability of pycrunch-engine to execute test_ria_basic with this PR in a python 3.6 environment. Seems fine (#5100 (comment)). If I understand task 3 correctly, you would like a test that runs uses an async context and executed the runner successfully within this context? I added such a test, so I consider task 3 done ;-) |
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
sorry that my suggestion for bd7bb59 was not entire kosher . it is what broke CI and running tests locally. let me know if you need a hand to mitigate |
oh no!!! the plague of stalling is upon us? please NO! looking at red appveyor run:
|
a456f2b
to
7f18e9f
Compare
FTR: benchmarks are still happy and suggestive of improvement for quickly running commands:
|
@yarikoptic: from my point of view this PR is ready. It seems that the tests that are executed with DATALAD_LOG_LEVEL=2 do not contribute to the coverage database. @kyleam Maybe you know why? |
Great job @christian-monch ! This PR is nearly perfectly done, and could only be made a bit more kosher (left 2 comments on that) so we do not need to deal with indigestion later on ;-) |
Both dealt with :-) |
FWIW, note on benchmarks -- short commands get consistently faster but surprisingly the beast of a studyforrest one is consistently slower a bit: (tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/logs/2021/05$ grep 'usecases.study_forrest.time_make_studyforrest_mockup' */pr/*5667/*/*success/*/8*txt
19/pr/5667/d1888cf/github-Benchmarks-3085-success/vs-master/8_Compare.txt:2021-05-19T20:42:29.0192372Z 45.5±0.1s 48.7±0.04s 1.07 usecases.study_forrest.time_make_studyforrest_mockup
20/pr/5667/1424902/github-Benchmarks-3108-success/vs-master/8_Compare.txt:2021-05-20T19:50:02.2174723Z 40.4±0.1s 43.3±0.2s 1.07 usecases.study_forrest.time_make_studyforrest_mockup
20/pr/5667/144f8f9/github-Benchmarks-3099-success/vs-master/8_Compare.txt:2021-05-20T11:25:36.2577054Z 43.0±0.1s 47.0±0.1s 1.09 usecases.study_forrest.time_make_studyforrest_mockup
20/pr/5667/7f18e9f/github-Benchmarks-3092-success/vs-master/8_Compare.txt:2021-05-20T08:15:15.2595110Z 44.2±0.2s 47.0±0.5s 1.06 usecases.study_forrest.time_make_studyforrest_mockup
20/pr/5667/7f6cc85/github-Benchmarks-3094-success/vs-master/8_Compare.txt:2021-05-20T10:19:06.5116745Z 45.2±0.6s 51.2±0.5s ~1.13 usecases.study_forrest.time_make_studyforrest_mockup
20/pr/5667/e8e43ff/github-Benchmarks-3100-success/vs-master/8_Compare.txt:2021-05-20T12:59:23.9873648Z 42.9±0.03s 45.9±0.02s 1.07 usecases.study_forrest.time_make_studyforrest_mockup
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/logs/2021/05$ grep 'core.witlessrunner.time_echo_gitrunner' */pr/*5667/*/*success/*/8*txt
19/pr/5667/d1888cf/github-Benchmarks-3085-success/vs-master/8_Compare.txt:2021-05-19T20:42:29.0181216Z - 3.32±0.08ms 2.80±0.1ms 0.84 core.witlessrunner.time_echo_gitrunner
19/pr/5667/d1888cf/github-Benchmarks-3085-success/vs-master/8_Compare.txt:2021-05-19T20:42:29.0182450Z 3.51±0.03ms 3.27±0.05ms 0.93 core.witlessrunner.time_echo_gitrunner_fullcapture
20/pr/5667/1424902/github-Benchmarks-3108-success/vs-master/8_Compare.txt:2021-05-20T19:50:02.2164221Z 2.79±0.05ms 2.45±0.04ms ~0.88 core.witlessrunner.time_echo_gitrunner
20/pr/5667/1424902/github-Benchmarks-3108-success/vs-master/8_Compare.txt:2021-05-20T19:50:02.2165249Z 3.06±0.1ms 2.77±0.05ms ~0.90 core.witlessrunner.time_echo_gitrunner_fullcapture
20/pr/5667/144f8f9/github-Benchmarks-3099-success/vs-master/8_Compare.txt:2021-05-20T11:25:36.2564968Z 2.98±0.05ms 2.62±0.1ms ~0.88 core.witlessrunner.time_echo_gitrunner
20/pr/5667/144f8f9/github-Benchmarks-3099-success/vs-master/8_Compare.txt:2021-05-20T11:25:36.2566224Z 3.26±0.04ms 3.10±0.09ms 0.95 core.witlessrunner.time_echo_gitrunner_fullcapture
20/pr/5667/7f18e9f/github-Benchmarks-3092-success/vs-master/8_Compare.txt:2021-05-20T08:15:15.2581174Z - 3.09±0.09ms 2.76±0.04ms 0.89 core.witlessrunner.time_echo_gitrunner
20/pr/5667/7f18e9f/github-Benchmarks-3092-success/vs-master/8_Compare.txt:2021-05-20T08:15:15.2582573Z 3.37±0.2ms 3.11±0.09ms 0.92 core.witlessrunner.time_echo_gitrunner_fullcapture
20/pr/5667/7f6cc85/github-Benchmarks-3094-success/vs-master/8_Compare.txt:2021-05-20T10:19:06.5105150Z 3.17±0.06ms 2.94±0.04ms 0.93 core.witlessrunner.time_echo_gitrunner
20/pr/5667/7f6cc85/github-Benchmarks-3094-success/vs-master/8_Compare.txt:2021-05-20T10:19:06.5106280Z 3.31±0.06ms 3.18±0.2ms 0.96 core.witlessrunner.time_echo_gitrunner_fullcapture
20/pr/5667/e8e43ff/github-Benchmarks-3100-success/vs-master/8_Compare.txt:2021-05-20T12:59:23.9861584Z 3.08±0.1ms 2.65±0.04ms ~0.86 core.witlessrunner.time_echo_gitrunner
20/pr/5667/e8e43ff/github-Benchmarks-3100-success/vs-master/8_Compare.txt:2021-05-20T12:59:23.9862804Z 3.21±0.06ms 2.98±0.2ms 0.93 core.witlessrunner.time_echo_gitrunner_fullcapture |
ok, @mih supported my motion, so totals to 2 approvals, let's proceed and fix if there is anything to fix after |
PR for an non asyncio-based runner.
This patch implements a thread-based runner, that uses the asyncio protocol interface to communicate
process output to protocol instances.
This change was done to support running datalad commands from processes within a concurrent.futures.ProcessPoolExecutor. This did not work reliably if the datalad command invoked subprocesses, and if asyncio-based subprocess were used. In the asyncio-based implementation processes would randomly wait forever on some synchronization primitive.
Motivation: I am creating a pipeline/orchestration command for metalad that allows a user to specify an information source element, e.g. a dataset-traverser, and a number of processing elements, e.g. a metadata-extractor and a metadata-adder, which would be run on the output of the previous element. So the pipeline would look something like this:
[dataset-traverser->metadata-extractor->metadata-adder]
And I would like to use concurrent.futures (or some other process-based parallelization) to distribute pipeline element processing to multiple cores.
This PR is marked as draft-pull request, since it changes a central piece of datalad, and since I did not yet test it on Windows or MacOS
Closes #5422, #5409, #5100
TODOs:
test_popen_invocation
from https://github.com/datalad/datalad/pull/5369/files#diff-4aa1a9688db72f7a475890e0ee6a343ea98af70ccc3bfa66f17da82c819dfa8dR230