RF: Use WitlessRunner for config manager #4699
Conversation
Do you expect any positive or negative effect on run time? Config reading is at the core of all operations, that is why I wonder. According to benchmarks run on GitHub Actions -- pretty much no effect either way. |
I have no strong expectations either way. I have two motivations: 1. It would be nice to have all code use a single runner. 2. I want to find the suspected bug reflected in #4698 by exercising this runner extremely often, on something very simple. |
I have long suspected it, but never saw it that clearly. Here is the result of running the test with this patch applied:

```diff
diff --git a/datalad/tests/test_config.py b/datalad/tests/test_config.py
index 7a078cef0..1d9fefe1d 100644
--- a/datalad/tests/test_config.py
+++ b/datalad/tests/test_config.py
@@ -520,6 +520,10 @@ def test_bare(path):
     # can we handle a bare repo?
     gr = GitRepo(path, create=True, bare=True)
     # any sensible (and also our CI) test environment(s) should have this
+    print('CFG', gr.config._store)
+    print('RUNNERCWD', gr.config._runner.cwd)
+    print('REPOPATH', gr.path)
+    print('CFGFILES', gr.config._cfgfiles)
     assert_in('user.name', gr.config)
     # not set something that wasn't there
     obscure_key = 'sec.reallyobscurename!@@.key'
```

Standalone output (test passes):
Moduletest output (test fails):
What we are seeing here is the state of a ConfigManager instance that doesn't even exist in the context of that specific test. It has residuals from various other tests. I believe that this has at least partly to do with the runner. Update: it reads different config files; I updated the output above. |
I suspect it has nothing to do with the runner itself, but with our flyweights. Just a theory for now, but it seems more likely to me ATM. |
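To illustrate the theory, here is a minimal, hypothetical sketch (the names `Flyweight` and `ConfigThing` are made up and are not DataLad's actual classes) of how a flyweight that hands out the same instance for the same key can carry state from one test into another:

```python
# Hypothetical sketch only -- not DataLad's flyweight implementation.
class Flyweight(type):
    """Metaclass that returns the same instance for the same key."""
    _instances = {}

    def __call__(cls, key, *args, **kwargs):
        # Reuse an existing instance for this key instead of creating a new one
        if (cls, key) not in cls._instances:
            cls._instances[(cls, key)] = super().__call__(key, *args, **kwargs)
        return cls._instances[(cls, key)]


class ConfigThing(metaclass=Flyweight):
    def __init__(self, path):
        self.path = path
        self.store = {}  # survives across "re-creations" for the same path


a = ConfigThing("/tmp/repo")
a.store["user.name"] = "set by an earlier test"
b = ConfigThing("/tmp/repo")  # same object, not a fresh one
assert b is a
assert b.store["user.name"] == "set by an earlier test"
```

If tests share a key (e.g. the same repository path), anything cached on that shared instance would leak between them, which matches the residual state seen above.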
Pushed the fix from #4703 -- works locally. |
I'm not convinced. It's plausible that this fix mitigates the issue, of course (and that fix might be desirable either way). But why does it have an effect in the case of, say, …? Do you want to try whether I'm right, or should I dig into it independently? |
@bpoldrack Independent is better. I do not debate that there could be more issues (in all likelihood there are), but #4703 definitely fixes one of them. |
Ok. Tested, and it looks like I'm wrong. Ran the entire test battery w/ a patch to record the paths. However, it's still not clear why it's the same instance, so I need to keep digging. |
I'm trying to reproduce the issues you pointed to in the previous post; the point about how … appears to be an incorrect assessment, I think. Picking up those configs from … |
Sounds reasonable ... and orthogonal to this PR. Do you want to open a separate one? |
run_async_cmd() waits for a Future that is marked as done in WitlessProtocol.process_exited() and then collects the output---which was received via WitlessProtocol.pipe_data_received()---and returns it. However, if the command exits quickly, process_exited() may be called before pipe_data_received(), in which case run_async_cmd() returns empty output. Some of our calls to git seem fast enough to trigger this issue (dataladgh-4773).

This appears to be a bug in asyncio [1] and can be triggered using a Protocol class that is identical to one in the Python documentation [2] aside from using a faster command. Even if this is resolved upstream, though, we need a workaround for the Python versions that we support.

Attempt to reliably capture the output by waiting for `transport._wait()`. While there may be more proper ways to solve this, it's the only workaround I've been able to come up with. The following suggests that it might do the trick:

* create_subprocess_{exec,shell} don't seem to suffer from this problem. The Process object they return has a .wait method that waits for `transport._wait`. However, those functions use a very different protocol than WitlessProtocol, so it's possible that that wait isn't the key difference. Still, it suggests it's a safe thing to do.
* An open PR, dataladgh-4699, switches the ConfigManager over to using the WitlessRunner and is triggering a spread of failures that is plausibly due to losing output of the quick 'git config' calls. Merging the change here into that PR resulted in a passing Travis build [3].
* With the script posted at the Python bug report, I've been unable to trigger any dropped output on two runs with 10000 iterations if I add `await asyncio.ensure_future(transport._wait())`. Without that line, I've consistently been able to trigger it with many fewer iterations.
* In a reproducer script (included in this PR) that involves Datalad and `git commit -- non-existing-file` (inspired by dataladgh-4773), I've been unable to trigger the error in hundreds of iterations with the changes in this series. Before these changes, I've been able to consistently trigger it using fewer than 100 iterations.

This reverts 2955ba1 (TST: Skip known failure in test_AnnexRepo_commit, 2020-08-04), as it should resolve the source of that flaky test failure.

[1] Filed at https://bugs.python.org/issue41594
[2] https://docs.python.org/3/library/asyncio-protocol.html#loop-subprocess-exec-and-subprocessprotocol
[3] https://travis-ci.org/github/datalad/datalad/builds/721449200

Closes datalad#4773
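To make the race and the workaround more concrete, here is a hedged, minimal sketch. It is not DataLad's actual run_async_cmd()/WitlessProtocol code; the names `QuickProtocol` and `run_cmd` are illustrative. It mirrors the SubprocessProtocol pattern from the Python docs and adds the `transport._wait()` call described in the commit message; note that `_wait()` is a private asyncio API.

```python
import asyncio
import sys


class QuickProtocol(asyncio.SubprocessProtocol):
    """Hypothetical minimal protocol; not DataLad's WitlessProtocol."""

    def __init__(self, done_future):
        self.done = done_future
        self.output = bytearray()

    def pipe_data_received(self, fd, data):
        # For very fast commands this may fire *after* process_exited()
        self.output.extend(data)

    def process_exited(self):
        self.done.set_result(True)


async def run_cmd(cmd):
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    transport, protocol = await loop.subprocess_exec(
        lambda: QuickProtocol(done), *cmd)
    try:
        await done
        # Workaround described above: also wait for the transport's internal
        # _wait() so that pending pipe_data_received() calls are delivered.
        # NOTE: _wait() is a private asyncio API, so this is a hedge rather
        # than a guaranteed-stable solution.
        await asyncio.ensure_future(transport._wait())
    finally:
        transport.close()
    return bytes(protocol.output)


if __name__ == "__main__":
    out = asyncio.run(run_cmd([sys.executable, "-c", "print('hi')"]))
    print(out)  # b'hi\n' -- without the _wait() this can come back empty
```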
Rebased after merge of #4835 |
All good on Linux now! Wonderful! Windows test failures point to the fact that … |
… default
This makes it possible to force UTF-8 encoding for a WitlessProtocol, like the old Runner does for its processing.
…bscure
On Windows the default would be some codepage, but the content of a git-config setting can easily use UTF-8 encoding (and it is somewhat safe to assume that UTF-8 is the correct encoding for git-config output).
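A small hedged sketch of the decoding concern (the helper name `decode_output` is made up, not DataLad's API): on Windows the locale-preferred encoding is typically a codepage such as cp1252, while `git config` values are commonly UTF-8, so forcing 'utf-8' avoids mojibake or decode errors for non-ASCII values.

```python
import locale

# Bytes as `git config` would emit them for a non-ASCII value (assumed UTF-8).
raw = "user.name=Jürgen\n".encode("utf-8")


def decode_output(data, encoding=None):
    """Decode subprocess output, forcing an encoding if one is given."""
    return data.decode(encoding or locale.getpreferredencoding(False))


print(decode_output(raw, encoding="utf-8"))   # -> user.name=Jürgen
# On a cp1252 system, decode_output(raw) would instead yield 'user.name=JÃ¼rgen'.
```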
Note: benchmarks seem to suggest up to a 20% penalty |
I have learned to largely ignore the benchmarks. However, I also see a 5% slowdown (stddev=3%) in the config manager test runtime. Please keep in mind that the focus of this PR is … |
Codecov Report
@@ Coverage Diff @@
## master #4699 +/- ##
==========================================
- Coverage 89.69% 89.68% -0.02%
==========================================
Files 289 289
Lines 40480 40481 +1
==========================================
- Hits 36310 36305 -5
- Misses 4170 4176 +6
|
Conflicts: datalad/config.py
Merged the current master to incorporate recent ConfigManager optimizations. |
As they will likely be of interest, here are the benchmarks:
Looks "same enough" to me. |
Can't see anything wrong with this one.
@mih:
> I have two motivations: 1. It would be nice to have all code use a single runner. 2. I want to find the suspected bug reflected in #4698 by exercising this runner extremely often, on something very simple.
I very much agree with one, and two is no longer relevant, so I'm in favor of merging this.
Wonderful, thx for the feedback! |
Includes 109ac04 from #4703 that exposed a critical flaw in `GitWitlessRunner` and a limitation in `WitlessRunner`.