How do I terminate Pools cleanly? #72
Try doing this and see what happens::
|
The first Ctrl+C appears ignored, and a second one gives:
This is always with the patch for the AttributeError, otherwise it fails there. |
Hmmm. Will have to investigate further. One thing is critically important though--quitting without calling kernel.run(shutdown=True) is likely to cause a lot of chaos. This is supposed to issue a cancellation to all remaining tasks which will cause them to cleanly exit. I'm not exactly sure what might be going on in the above code that would require you to have to issue a second KeyboardInterrupt. Big picture: It should be possible to do this with curio if you are careful about how it happens though. It might require some experimentation and patches. |
I think I have a better idea of what is happening here and possibly how to fix it. Short version: The keyboard interrupt is being caught in a strange spot and the kernel is not able to cleanly cancel things. Need to spend some time fiddling with it. |
Wow, okay. Control-C is tremendously evil and tricky. Let's discuss. First, in order to have an orderly shutdown of coroutines, it's not sufficient to simply catch KeyboardInterrupt and exit. You'll probably get a ton of errors about GeneratorExit, non-awaited coroutines, and similar things. So, that final step of shutting down the kernel is going to be a critical step. Even if you do the shutdown, the arrival of KeyboardInterrupt itself is crazy if you consider where it might arrive:
It seems that one can restore a lot of sanity to the situation by trying to force all Control-C handling into a coroutine like this::
If you do this, Control-C will be routed to a known and predictable location in the program (always to the kb_interrupt coroutine). When it happens, the kernel will stop. After that, you can perform the cleanup step and terminate. Yikes. I've pushed a few minor fixes to make this work correctly. Open to all ideas on how to make this more sane. |
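The idea of routing Control-C to one known place can be sketched with the standard library's asyncio instead of curio (the curio snippet from this comment is not reproduced here). Everything below is an assumption for illustration: the names `worker` and `main`, the `self_interrupt` flag that simulates a Ctrl+C, and the use of `loop.add_signal_handler`, which is only available on Unix and only in the main thread.

```python
import asyncio
import os
import signal


async def worker(log):
    # Stand-in for real work; records that it was cancelled cleanly.
    try:
        while True:
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        log.append("worker cancelled")
        raise


async def main(self_interrupt=False):
    log = []
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    # Route SIGINT to one predictable place instead of letting
    # KeyboardInterrupt land at an arbitrary point in the program.
    loop.add_signal_handler(signal.SIGINT, stop.set)
    task = asyncio.create_task(worker(log))
    if self_interrupt:
        # Demonstration only: simulate the user pressing Ctrl+C.
        loop.call_later(0.05, os.kill, os.getpid(), signal.SIGINT)
    await stop.wait()
    log.append("interrupt received")
    # Orderly cleanup: cancel the remaining task and wait for it to finish.
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    loop.remove_signal_handler(signal.SIGINT)
    log.append("clean exit")
    return log
```

Calling `asyncio.run(main())` and pressing Ctrl+C once should take the same path: the interrupt only sets the event, and all cleanup happens in `main`.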
I got around to playing with this and got some AttributeErrors in Kernel.__del__. If I change it to:
Then everything seems to work fine. Could there be an option in |
Reviewing some of the code, it seems that there is some kind of duplication between __del__() and _shutdown_resources(). In the big picture, I think that destroying a Kernel without some kind of controlled shutdown should be considered an error. So, I've modified the _shutdown_resources() method to clean up pools in addition to its usual cleanup. This should now happen automatically for the curio.run() method or if you explicitly call Kernel.run(shutdown=True). I've modified __del__() to raise an exception if it's invoked without some kind of cleanup taking place first (note: maybe a warning message is better?). I've also made the Kernel class work as a context manager to give better control over the shutdown process. |
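A rough sketch of that design in plain Python, not curio's actual implementation: `PoolOwner` is a hypothetical class that owns a pool, shuts it down in one place, works as a context manager, and complains from `__del__` if it was never shut down. It uses a `ThreadPoolExecutor` as a stand-in for curio's worker pools and a warning rather than an exception; both choices are assumptions.

```python
import warnings
from concurrent.futures import ThreadPoolExecutor


class PoolOwner:
    """Hypothetical stand-in for a kernel that owns worker pools."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._closed = False

    def submit(self, fn, *args):
        return self._pool.submit(fn, *args)

    def shutdown(self):
        # Single place where resources are released; safe to call twice.
        if not self._closed:
            self._pool.shutdown(wait=True)
            self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.shutdown()
        return False  # never swallow the exception that ended the block

    def __del__(self):
        # getattr guards against AttributeError on a half-built object.
        if not getattr(self, "_closed", True):
            warnings.warn("PoolOwner deleted without shutdown()", ResourceWarning)
            self._pool.shutdown(wait=False)
```

With this shape, `with PoolOwner() as owner: ...` guarantees the shutdown step runs even if the body raises, including on KeyboardInterrupt.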
A problem I am seeing now is the following: If I have a very deeply nested set of tasks running in pools (e.g. a generalized DAG dependency resolver, as I am trying in the code above), I find no reasonable way of shutting down everything once I get an error in one of the tasks. After trying many combinations of try/except, the tasks scheduled with spawn seem to try to run in the pool before there is an opportunity to cancel them, creating a very long cascade of errors and unwanted work. The workaround I have found is:
which I believe works thanks to the special way in which KeyboardInterrupt is treated now. Then this can be handled like:
Could there be something like |
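The behaviour being asked for, cancelling all remaining work as soon as one task fails, can be sketched with asyncio rather than curio. This is an illustration under assumptions, not the workaround from this comment: `ShutdownRequested`, `step`, and `run_all` are hypothetical names, and the sleeps stand in for work submitted to a pool.

```python
import asyncio


class ShutdownRequested(Exception):
    """Hypothetical signal meaning 'stop all remaining work'."""


async def step(name, delay, log, fail=False):
    try:
        await asyncio.sleep(delay)  # stand-in for work run in a pool
        if fail:
            raise ShutdownRequested(name)
        log.append(f"{name} done")
    except asyncio.CancelledError:
        log.append(f"{name} cancelled")
        raise


async def run_all(log):
    tasks = [
        asyncio.create_task(step("a", 0.01, log, fail=True)),
        asyncio.create_task(step("b", 1.0, log)),
        asyncio.create_task(step("c", 1.0, log)),
    ]
    try:
        await asyncio.gather(*tasks)
    except ShutdownRequested as exc:
        # gather() does not cancel the siblings when one task fails,
        # so cancel them explicitly and wait until they have unwound.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        log.append(f"shutdown requested by {exc}")
    return log
```

The point of the design is that the failing task only raises; the decision to cancel everything else is made in one place, by whoever spawned the tasks.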
I'm a bit curious. Instead of raising KeyboardInterrupt, does raising SystemExit also work? |
Yes, it works in the same way. |
Been thinking about this a bit more. Are you thinking that there ought to be a special Shutdown exception one could raise in a coroutine and have that shut down the entire kernel and all running tasks? |
Yes. I think that's a useful feature that there is currently no easy way to get (other than with the above hacks, which would be equivalent). Also I'm wondering if that should be the default behaviour in the face of an unhandled exception that propagates down to the point where you call |
I could imagine a scenario where the same kernel is used to run multiple coroutines submitted to it. So, for that case, I'm not sure it makes sense to shutdown all of the resources upon completion of each run() call. I could also envision a situation where there are some background (daemon) tasks sitting there waiting to do things on each run() invocation--and you wouldn't want to have them shutdown. Let me play around with it. In thinking about this problem, I realize that there is a very subtle bug in the whole shutdown process that could cause the kernel to spin indefinitely if SystemExit or KeyboardInterrupt is raised as described in this bug. So, I'd like to fix that too. |
I have added a new exception KernelExit that can make the kernel exit. It's still going to function more-or-less like SystemExit though. So, you'd need to do something like this::
Or alternatively:
I'm still kicking around the concept of whether or not there should be a more general kind of abort() or terminate() functionality. I don't know. It seems that raising an exception and kicking it out to the top level can probably work as well as anything. All things equal, I'd like to have "one way to do it" as opposed to many different ways of terminating though. Leaving the issue open as I'm still open to ideas on how to improve all of this. |
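One reason an exit exception that behaves like SystemExit reaches the top level reliably is that SystemExit derives from BaseException, so ordinary `except Exception` handlers in intermediate code do not swallow it. The sketch below shows this in plain Python. `ExitRequested` is a hypothetical name, and whether KernelExit is defined this way is an assumption, not something stated in this thread.

```python
class ExitRequested(BaseException):
    """Hypothetical exit signal; derives from BaseException like SystemExit."""


def careless_worker():
    try:
        raise ExitRequested("stop everything")
    except Exception:
        # A broad handler like this catches ordinary errors only,
        # so the exit signal passes straight through it.
        return "swallowed"


def top_level():
    try:
        return careless_worker()
    except ExitRequested as exc:
        # The one place where shutdown and cleanup would happen.
        return f"exiting: {exc}"
```

Here `top_level()` handles the exit request even though `careless_worker()` has a catch-all `except Exception` around the code that raised it.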
Closing for now. Think shutdown is working correctly in the current version, but if not this should be reopened. |
Looking at #70, this looks like a difficult problem. I would like a way to cleanly shut down a Kernel using pools when the user cancels the program (e.g. by KeyboardInterrupting it). The most obvious problem is in curio/workers.py, here https://github.com/dabeaz/curio/blob/master/curio/workers.py#L284: By the time the finally is executed, shutdown is already called, setting the workers to None, and thus resulting in an attribute error. Once that is worked around, with e.g.
There are still some issues. If I have a simple program like:
and I run it and then press Ctrl+C, it almost works, except for some noisy multiprocessing delete action:
If I change f() to explicitly ignore the interrupt signal, it shuts down quietly:
However, if I have more complex code (specifically this https://github.com/NNPDF/reportengine/blob/25f7b0c4680a91b692653e24b1a0b5f8c222dc99/src/reportengine/resourcebuilder.py#L135) with more tasks being run and waited for, I get the dreaded GeneratorExit thing 4 times, once for each concurrent run_in_process. Not really sure why it behaves differently from the simple case:
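The signal-ignoring variant of f() mentioned above can be sketched with the standard library's multiprocessing module in place of curio's run_in_process. This is not the code from the original post: `ignore_sigint`, the body of `f`, and `main` are hypothetical. The idea is that worker processes ignore SIGINT, so only the parent process sees Ctrl+C and decides how to stop the workers.

```python
import signal
import time
from multiprocessing import Pool


def ignore_sigint():
    # Runs once in each worker process: Ctrl+C is then delivered
    # only to the parent, which stays in charge of the shutdown.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def f(seconds):
    # Stand-in for the real work done in a worker process.
    time.sleep(seconds)
    return seconds


def main():
    with Pool(2, initializer=ignore_sigint) as pool:
        result = pool.map_async(f, [5, 5])
        try:
            print(result.get())
        except KeyboardInterrupt:
            print("interrupted, terminating workers")
            pool.terminate()
            pool.join()
```

To try it, call `main()` under an `if __name__ == "__main__":` guard and press Ctrl+C while the workers are sleeping. Whether this also silences the GeneratorExit messages in the more complex case is not something this sketch establishes.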