
BF: in a new process, test first if we can run any command, and if not - old ways #5367

Merged
merged 8 commits into datalad:master from bf-workaround-forked on Jan 26, 2021

Conversation

@yarikoptic (Member) commented Jan 23, 2021

To overcome a problem with unruly super (parent) processes (ref: ReproNim/testkraken#95), which may occur just on OSX.

We will track whether we are running under a new PID; if the PID is new, we first test whether we can run a command in the default event loop, and if not, we revert to the "old ways" for that PID and start a new loop for each process.
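
For illustration, the "first test if we can run a command" probe could look roughly like this (a sketch with my own naming -- loop_can_run_commands and the use of the `true` command are illustrative only; the PR's actual code is quoted later in the thread):

    import asyncio

    def loop_can_run_commands(loop):
        # probe: try to run a trivial subprocess through the given loop; on
        # the affected setups this fails (e.g. with a NotImplementedError
        # from asyncio's child watcher) instead of completing
        async def _probe():
            proc = await asyncio.create_subprocess_exec(
                "true", stdout=asyncio.subprocess.DEVNULL)
            await proc.wait()
        try:
            loop.run_until_complete(_probe())
            return True
        except (RuntimeError, NotImplementedError):
            return False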

@djarecka please check if it works for you. Tried on OSX with Python 3.7 on testkraken -- seems to be "as good as old" ;)

  • ideally needs a test

Also adds a known-failure marker for parallel on OSX; closes #5309

Checked on a local VM, and it also seems to close #5362 (although I am not yet 100% sure why/how, since we are not catching NotImplementedError ;))

@codecov codecov bot commented Jan 23, 2021

Codecov Report

Merging #5367 (3a107ed) into master (30412cd) will decrease coverage by 0.02%.
The diff coverage is 71.73%.


@@            Coverage Diff             @@
##           master    #5367      +/-   ##
==========================================
- Coverage   89.86%   89.83%   -0.03%     
==========================================
  Files         300      298       -2     
  Lines       42104    42013      -91     
==========================================
- Hits        37838    37744      -94     
- Misses       4266     4269       +3     
Impacted Files                                 Coverage Δ
datalad/tests/utils.py                         86.70% <30.00%> (-0.60%) ⬇️
datalad/cmd.py                                 92.60% <81.25%> (-0.63%) ⬇️
datalad/support/tests/test_parallel.py         95.00% <100.00%> (+0.03%) ⬆️
datalad/tests/test_witless_runner.py           92.72% <100.00%> (-7.28%) ⬇️
datalad/distributed/tests/test_ria_basics.py   97.95% <0.00%> (-0.37%) ⬇️
datalad/distributed/ora_remote.py              32.01% <0.00%> (-0.12%) ⬇️
datalad/interface/__init__.py                  100.00% <0.00%> (ø)
datalad/local/configuration.py
datalad/local/tests/test_configuration.py
... and 5 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@djarecka

works great :) Thank you!

@yarikoptic (Member Author)

FTR: no notable effect on the benchmarks run

It got a bit complicated to keep the child from running the other tests, and we could not
just sys.exit(0) from the child -- then the parent process would not know to mark it as a failure.
So we really needed to kill the child.

I really hope that OS/Python/my terminology will not trigger interest of any
child abuse services/initiatives, and demand Python to rename "kill" to "letgo"
or something alike to not impose some violent behavior in our code.
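
For the record, a minimal sketch of that fork-and-kill pattern (not the actual test from 3e403fb; run_command() is a hypothetical stand-in for the runner invocation):

    import os
    import signal

    def run_command():
        # hypothetical stand-in for the runner invocation under test
        pass

    def test_runner_after_fork():
        pid = os.fork()
        if pid == 0:
            # in the child: exercise the runner, remember whether it worked
            try:
                run_command()
                ok = True
            except Exception:
                ok = False
            # a plain sys.exit(0) would make the child look successful to the
            # parent regardless of the outcome (and raising SystemExit could
            # be intercepted, letting the child keep running other tests), so
            # the child is killed with a signal that encodes the verdict
            os.kill(os.getpid(), signal.SIGTERM if ok else signal.SIGKILL)
        else:
            # in the parent: the terminating signal reports the child's verdict
            _, status = os.waitpid(pid, 0)
            assert os.WIFSIGNALED(status)
            assert os.WTERMSIG(status) == signal.SIGTERM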
@yarikoptic (Member Author)

dedicated test added in 3e403fb

@yarikoptic (Member Author) commented Jan 24, 2021

Sleeping more helped, but the Mac consistently fails one parallel test, seemingly unrelated (#5309). Will need to handle that here as well.

apparently on CI it needs an even longer sleep.  Hopefully this one will be enough
@yarikoptic added the merge-if-ok label (OP considers this work done, and requests review/merge) on Jan 25, 2021
@yarikoptic added the severity-important label (major effect on the usability of a package, without rendering it completely unusable to everyone) on Jan 25, 2021
@yarikoptic yarikoptic added this to the 0.14.0 milestone Jan 25, 2021
@yarikoptic (Member Author)

As it seems to also resolve #5362, I marked it for 0.14.0.

@bpoldrack (Member)

TBH, I'm quite confused about the design of this. If we look at _check_if_new_proc, for example:

It says it is about figuring out whether or not we are in a new process. But then it raises an exception that does not mean something like "I couldn't figure out whether to respond with True or False"; instead it is there to communicate that we found something other than the answer to the question, and that this other thing is also interesting to the intended caller of the function. So the answer would validly be False, but screw that -- we raise. ;-)

So -- not sure. It is weird to me, and I see my future self massively struggling to make meaningful changes to this if that turns out to be needed.

@mih (Member) left a comment

I left comments re the nature of the exceptions raised, but otherwise I view this as "if it works, why not?" My own explorations of this topic have not been particularly informative or even effective ;-)

            cls._loop_pid = pid
            cls._loop_need_new = None
        elif cls._loop_need_new:
            raise RuntimeError("we know we need a new loop")
@mih (Member)

Not sure when this would happen, but maybe the error needs to be specific about which loop it is talking about, and about why this knowledge must be expressed as a RuntimeError.

@yarikoptic (Member Author)

It is because of the outer try/except RuntimeError. This message will not be seen anywhere in outside code, so the notion of a loop is, to me, clear in this limited context.

@mih (Member)

Well, that's the design I struggle with (although it is of course "valid if it works"). The function bakes in assumptions about how exactly, for what purpose, and from within what context it is called. Why not just return True/False and check for _loop_need_new outside of it?

        # exhibits in https://github.com/ReproNim/testkraken/issues/95
        lgr.debug("It seems we need a new loop when running our commands: %s",
                  exc_str(e))
        cls._loop_need_new = True
        raise RuntimeError("the loop is not reusable")
@mih (Member)

Same spirit as the comment above: why is this a RuntimeError? Why not a new loop?

@yarikoptic (Member Author)

See above -- and it will be a new loop, since we are raising an exception here.

@yarikoptic (Member Author)

> but screw that - we raise. ;-)

It is just a coding paradigm (like StopIteration for loops, which could come from a break or from simply finishing iterating): the raise there is just to fit the try/except paradigm above, which also catches exceptions that might be raised from get_event_loop. So it is pretty much a "False with a message", so that while debugging you can quickly assess in the except clause why we started to mint our own loops (instead of stepping line by line until one of them evaluates to False).

@yarikoptic (Member Author)

Just for clarity, since I think the diff makes it harder to grasp, here is the code that decides whether or not to start a new loop:

        try:
            is_new_proc = self._check_if_new_proc()
            event_loop = asyncio.get_event_loop()
            if is_new_proc:
                self._check_if_loop_usable(event_loop, stdin)
            if event_loop.is_closed():
                raise RuntimeError("the loop was closed - use our own")
            new_loop = False
        except RuntimeError:
            event_loop = self._get_new_event_loop()
            new_loop = True

Unrolling it into a sequence of ifs and try/excepts (if I refactored those _check* helpers to return bool instead of raising) could have looked like:

        new_loop = False
        is_new_proc = self._check_if_new_proc()
        try:
            event_loop = asyncio.get_event_loop()
            if is_new_proc:
                if not self._check_if_loop_usable(event_loop, stdin):
                    new_loop = True
            if event_loop.is_closed():
                new_loop = True
        except RuntimeError:
            new_loop = True
        if new_loop:
            event_loop = self._get_new_event_loop()

and I am not sure that it would have made it more readable, etc.

@bpoldrack (Member) commented Jan 26, 2021

> So it is pretty much a "False with a message"

Exactly. It's actually about returning three different values, but we use the return value for two of them and an exception for the third. Why not return one of three values then, instead of a boolean? Or a tuple with an optional second element? Something like that?

It's not the same as StopIteration. That would be raised if you can't validly iterate anymore; that's what an exception is for: "can't return a regular response for reason X". In contrast, here we can still validly tell whether the "answer" is True or False, but we raise nevertheless.
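
For illustration only, that "one of three values" alternative could be spelled as a small enum (a sketch; LoopVerdict and its members are names I made up, not anything that landed in the PR):

    from enum import Enum, auto

    class LoopVerdict(Enum):
        USABLE = auto()    # the default loop can run commands: reuse it
        NEED_NEW = auto()  # known broken for this PID: mint our own loop
        CLOSED = auto()    # the loop exists but was closed: also mint our own

    def pick_event_loop(verdict):
        # the caller branches on one value instead of catching RuntimeError
        if verdict is LoopVerdict.USABLE:
            return "reuse the existing loop"
        return "create a new loop"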

Those functions' usage of exceptions is anticipating the particular
logic they are called from within. Don't use them elsewhere w/o
refactoring.
@kyleam (Contributor) commented Jan 26, 2021

Verdict on the call was general approval, and that we'd wait on Ben's fixups. Those have come, and I think they are unlikely to be a point of contention... merging.

@kyleam kyleam merged commit 7429311 into datalad:master Jan 26, 2021
@kyleam kyleam mentioned this pull request Feb 4, 2021
@yarikoptic yarikoptic deleted the bf-workaround-forked branch April 29, 2021 21:55
Labels
merge-if-ok (OP considers this work done, and requests review/merge)
severity-important (major effect on the usability of a package, without rendering it completely unusable to everyone)