BF: in a new process, test first if we can run any command, and if not - old ways #5367
Codecov Report
@@ Coverage Diff @@
## master #5367 +/- ##
==========================================
- Coverage 89.86% 89.83% -0.03%
==========================================
Files 300 298 -2
Lines 42104 42013 -91
==========================================
- Hits 37838 37744 -94
- Misses 4266 4269 +3
works great :) Thank you!
Ftr: no notable effect on benchmarks run
It got a bit complicated to keep the child from continuing to run the other tests. We could not just sys.exit(0) from the child, because then the parent process would not know to mark it as a failure. So we really needed to kill the child. I really hope that the OS/Python/my terminology will not trigger the interest of any child abuse services/initiatives, who would then demand that Python rename "kill" to "letgo" or something alike so as not to impose violent behavior in our code.
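For readers unfamiliar with the pattern, here is a minimal, POSIX-only sketch of running a check in a forked child so that the parent can tell success from failure. It is not the test added in this PR; `run_in_forked_child` and `check` are hypothetical names used only for illustration.

```python
import os
import signal


def run_in_forked_child(check):
    """Run `check()` in a forked child; return True only if it exited cleanly.

    Hypothetical helper for illustration, not code from this PR. POSIX only.
    """
    pid = os.fork()
    if pid == 0:
        # Child: must never return into the caller's code (e.g. remaining tests).
        try:
            check()
        except BaseException:
            # Terminate abnormally so the parent can detect the failure.
            os.kill(os.getpid(), signal.SIGKILL)
        # Success: leave immediately without running cleanup handlers.
        os._exit(0)
    # Parent: wait for the child and inspect how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

The child leaves via `os._exit` or a signal rather than by returning, so it cannot keep running the rest of the test suite, and the parent reads the outcome from the wait status.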
dedicated test added in 3e403fb
…lways fork see https://docs.python.org/3/library/os.html#os.fork on possible scenarios where it might puke.
Sleeping more helped, but mac consistently fails one parallel test, seemingly unrelated (#5309). Will need to handle it here as well.
apparently on CI needs even longer sleep. Hopefully this one would be enough
Force-pushed from e4a5dcc to 1202200.
As it also seems to resolve #5362, I marked it for 0.14.0.
TBH, I'm quite confused about the design of this. If we look at It says it's about figuring out whether or not we are in a new process. But then it raises an exception that is not about something like "I couldn't figure whether to respond with So - not sure. Weird to me and I see my future self massively struggling with making meaningful changes to this if it turns out to be needed. |
I left comments re the nature of the exceptions raised, but otherwise I view this as "if it works, why not?" My own explorations of this topic have not been particularly informative or even effective ;-)
        cls._loop_pid = pid
        cls._loop_need_new = None
    elif cls._loop_need_new:
        raise RuntimeError("we know we need a new loop")
Not sure when this would happen, but maybe the error needs to be specific about which loop it is talking about and why this knowledge must be expressed as a RuntimeError.
It is because of the outside try/except RuntimeError. This message will not be seen anywhere in outside code, so the notion of a loop is clear to me in this limited context.
Well, that's the design I struggle with (although it is of course "valid if it works"). The function implements assumptions about how exactly and for what purpose it is called from within what context. Why not just return True/False and check for _loop_need_new outside of it?
    # exhibits in https://github.com/ReproNim/testkraken/issues/95
    lgr.debug("It seems we need a new loop when running our commands: %s", exc_str(e))
    cls._loop_need_new = True
    raise RuntimeError("the loop is not reusable")
Same spirit as the comment above: Why is this a runtime error? Why not a new loop?
see above, and it will be a new loop since we are raising an exception here.
it is just coding paradigm (like |
Just for clarity, since I think the diff makes it harder to grasp, here is the code that decides whether or not to start a new loop:

    try:
        is_new_proc = self._check_if_new_proc()
        event_loop = asyncio.get_event_loop()
        if is_new_proc:
            self._check_if_loop_usable(event_loop, stdin)
        if event_loop.is_closed():
            raise RuntimeError("the loop was closed - use our own")
        new_loop = False
    except RuntimeError:
        event_loop = self._get_new_event_loop()
        new_loop = True

Unrolling it into a sequence of

    new_loop = False
    is_new_proc = self._check_if_new_proc()
    try:
        event_loop = asyncio.get_event_loop()
        if is_new_proc:
            if not self._check_if_loop_usable(event_loop, stdin):
                new_loop = True
        if event_loop.is_closed():
            new_loop = True
    except RuntimeError:
        new_loop = True
    if new_loop:
        event_loop = self._get_new_event_loop()

and I am not sure that it would have made it more readable, etc.
Exactly. It's actually about returning three different values, but we use the return value for two of them and an exception for the third one. Why not return 1 of three values then instead of a boolean? Or a tuple with its second element being optional? something like that? It's not the same as |
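One possible reading of the "return one of three values" suggestion, sketched with an enum. This is only an illustration and not the code in this PR: `LoopState`, `classify_loop`, `pick_loop`, and the `is_usable` callback are hypothetical names, and the three states are an assumption about which outcomes are meant.

```python
import asyncio
import enum


class LoopState(enum.Enum):
    """Hypothetical outcomes of inspecting an existing event loop."""
    REUSABLE = "reusable"
    UNUSABLE = "unusable"
    CLOSED = "closed"


def classify_loop(loop, is_usable):
    """Report the loop's state as a value instead of raising an exception."""
    if loop.is_closed():
        return LoopState.CLOSED
    if not is_usable(loop):
        return LoopState.UNUSABLE
    return LoopState.REUSABLE


def pick_loop(loop, is_usable):
    """Return (event_loop, new_loop) based on the classified state."""
    state = classify_loop(loop, is_usable)
    if state is LoopState.REUSABLE:
        return loop, False
    # Closed or unusable: fall back to a fresh loop.
    return asyncio.new_event_loop(), True
```

The caller then branches on an explicit value, so the helper no longer needs to know that it is being called from inside a `try`/`except RuntimeError` block.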
Those functions' usage of exceptions is anticipating the particular logic they are called from within. Don't use them elsewhere w/o refactoring.
Verdict on the call was general approval and that we'd wait on Ben's fixups. Those have come, and I think are unlikely to be a point of contention... merging.
To overcome a problem with unruly super processes (ref: ReproNim/testkraken#95), which may occur only on OSX.
We track whether we are trying to run in a new PID. If the PID is new, we first test whether we can run a command in the default loop; if not, we revert to the "old ways" for that PID and start a new loop for each process.
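A rough sketch of the per-PID bookkeeping described above, assuming a caller-supplied `probe` callable that tries to run a command in the default loop. The attribute names `_loop_pid` and `_loop_need_new` come from the diff; the class `LoopTracker`, the method `need_new_loop`, and `probe` are hypothetical and simplified relative to the real implementation.

```python
import os


class LoopTracker:
    """Hypothetical, simplified illustration of tracking the verdict per PID."""
    _loop_pid = None        # PID for which the verdict below was established
    _loop_need_new = None   # None: unknown, False: default loop works, True: it does not

    @classmethod
    def need_new_loop(cls, probe):
        pid = os.getpid()
        if cls._loop_pid != pid:
            # We are in a new process: discard the old verdict and probe once.
            cls._loop_pid = pid
            try:
                probe()
                cls._loop_need_new = False
            except Exception:
                # Probe failed: use the "old ways" for the rest of this PID.
                cls._loop_need_new = True
        return cls._loop_need_new
```

The probe runs at most once per process; later calls in the same PID reuse the stored verdict.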
@djarecka please check if it works for you. Tried on OSX with 3.7 on testkraken -- seems to be "good as old" ;)
Also adds a known failure for parallel on osx, closes #5309
Checked on a local VM, it also seems to Closes #5362 (although I am not yet 100% sure why/how since we are not catching NotImplementedError ;))