Promote {defer,allow}_cancellation as first class constructs #149
Yes, this needs better documentation. On it. |
A more general question. Would it make sense for `defer_cancellation` to be usable directly on a coroutine:

```python
result = await defer_cancellation(coro(args))
```

Or as a context manager:

```python
async with defer_cancellation():
    statements
```

Just thinking out loud... |
I think both are relevant, like timeout_after.
|
I've never wanted the function-call version (as you can probably guess
:-)), and like I mentioned in the other thread I have a vague feeling that
this is a case where TOOWTDI should apply, but no strong opinion beyond
that.
|
The nesting of (defer|allow)_cancellation seems particularly tricky to me. Consider this example:
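```python
async with defer_cancellation:
    async with allow_cancellation:
        await coro()   # coro() is a placeholder; can this still be cancelled?
```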
Does the inner allow_cancellation override the outer defer_cancellation? As currently implemented, it seems that the above code could be cancelled because the internal check for cancellation only looks at the top-most level of nesting. I'm now wondering if it should consider all of the levels. That is, cancellation is only allowed if ALL of the levels allow it. Thoughts? |
Also, can the |
Bigger question: Do we even need allow_cancellation? |
Wouldn't it be logical that the innermost *_cancellation wins?

```python
async def cancel_me():
    async with defer_cancellation:
        # Here deferred <-->
        async with allow_cancellation:
            # Here allowed <---> Cancellation can only happen here
            async with defer_cancellation:
                # Here deferred <-->
                ...

task = await spawn(cancel_me())
await task.cancel()
```

The same thing applies to the call stack. |
Each context enter/exit can update the "cancellable" status of the current task. |
A single "cancellable" flag will not handle below example. async def cancel_me():
async with defer_cancellation:
# Here defered <-->
async with defer_cancellation:
# Here defered
task = await spawn(cancel_me())
await task.cancel()
|
Right now, the behavior is implemented such that the inner-most block determines what happens. However, I don't like it. I think it makes any use of `defer_cancellation` unreliable. Big picture: I don't think a single inner-most `allow_cancellation` should be able to override every outer `defer_cancellation`. |
Yeah, I'm eagerly flip-flopping :) |
On 12/29/16, David Beazley ***@***.***> wrote:

> If some outer `defer_cancellation` is in effect, it was done for
> a reason--it shouldn't be something that can be arbitrarily overridden by
> accident within the innards of a library function that happens to use
> `allow_cancellation` someplace.
I disagree. If a function down the call stack allows/defers cancellation, we should assume it knows what it is doing and is prepared to handle the outcome. That's its territory. The code below should not change behaviour simply because some function ten frames up the stack changes something. That would make everything untraceable.

The code below seems much more natural to me. I think its implementation would also be much simpler, unless someone disproves that "restoring the previous state on *_cancellation context exit" simply handles it.
```python
async def cancel_me():
    async with defer_cancellation:
        # Here deferred <-->
        async with allow_cancellation:
            # Here allowed <---> Cancellation can only happen here
            async with defer_cancellation:
                # Here deferred <-->
                ...
```
|
I'm sorry, but this makes no sense whatsoever. If I write code like this:
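```python
async with defer_cancellation:
    statements
```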
My expectation is that this code is guaranteed to not receive a cancellation in that block of statements. No matter what.

As Curio is currently implemented, ANY use of `allow_cancellation` further down the call stack breaks that guarantee. The bottom line is that any use of `allow_cancellation` makes `defer_cancellation` unreliable.

The whole situation is further complicated by the fact that timeouts are intermingled with cancellation as a concept. Unlike a true cancellation, a timeout can be caught and acted upon. So, that's its own problem on top of everything else.

So, with all of this, I'm not exactly sure how I'd want to proceed. I just know that I don't want to be fielding a bunch of bug reports regarding tasks being mysteriously cancelled despite the fact that they're in the middle of a `defer_cancellation` block. |
My gut feeling on all of this is to remove `allow_cancellation`. |
Your assumptions are consistent only if "allow_cancellation is non-existent". Let's think of "allow_cancellation" as an advanced option that can open a window for cancellation inside an otherwise deferred block. |
This is agreeable. But what's the current status if I catch a cancellation exception and try to continue on? Can I disregard a cancellation in a task? In theory I guess I can. |
So history: I started out implementing just `defer_cancellation`, and then found I couldn't write my task supervisor loop without `allow_cancellation`:

```python
async def _supervise_loop(self):
    assert self._started
    chained_exc = None
    async with curio.defer_cancellation:
        while self._tasks or not self._shutting_down:
            try:
                # This is the only line where we want to allow
                # cancellation / timeouts:
                async with curio.allow_cancellation:
                    task = await self._q.get()
                await self._process_finished_task(task)
            except BaseException as exc:
                if not self._shutting_down:
                    await self.start_shutdown()
                if chained_exc is not None:
                    exc.__context__ = chained_exc
                chained_exc = exc
    # All done!
    await self._finished.set()
    if chained_exc is not None:
        raise chained_exc
```

You're absolutely right that the current semantics are weird. Basically my rationale was just: AFAICT all possible semantics are kinda weird, it only makes sense to use these in tight coordination with each other, so I just went with the simplest weird semantics. It would be nice if there were something better, I'm just not sure what that is.

I'm also very struck by your point about timeouts. I have some not-quite-formed thoughts about how a more-reified notion of cancellation like this one might help here, but I'll stop here for now rather than try to write a dissertation at half-past-midnight or delay the rest of this post indefinitely :-) |
Here is an unkillable task! Any sufficiently determined one can do this. And it may be better to review task management code with this in mind.

```python
from curio import spawn, sleep, run

async def cancel_if_you_can():
    while True:
        try:
            print('Alive!')
            await sleep(1)
        except:
            pass   # swallows the cancellation exception

async def main():
    inmortal = await spawn(cancel_if_you_can())
    while True:
        await sleep(1)
        await inmortal.cancel()

run(main())
```
|
Yes, you can make a task unkillable. However, if you do, a lot of other stuff is going to break. For instance, `task.cancel()` blocks until the task actually terminates--so anything trying to cancel it will hang. |
I've been doing a lot of thinking about this feature and simply don't see how I can make it work as formulated. I believe it is inherently ill-defined as a context-manager. Specifically, if I see code like this:
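```python
async with defer_cancellation:
    statements
    print('Done')
```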
I fully expect that the "Done" message will appear no matter what happens with respect to cancellation. I put it in the same category as code that might appear in a `finally` block. There are all sorts of problems with it though.
I just don't see any way out of this entanglement. I think you could fix it if you reformulated `defer_cancellation` to operate on a coroutine instead:
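```python
result = await defer_cancellation(coro(args))
```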
This is a lot simpler. It also allows the shielded coroutine to use timeouts and other features. Yes, it requires you to put the uninterruptible code into its own coroutine--so you'd lose the context-manager aspect of it. However, I think the restored sanity makes up for it. |
Another possibility is to disallow nesting entirely. Curio could be made to raise a runtime error if you did it. |
As a documentation note, let's warn users that

```python
try:
    await timeout_after(10, co)
except:
    pass
```

will eat up your cancellations/timeouts... Each try/except block in such a call tree "with some awaits in it" is a potential place for a cancellation to be silently swallowed. This is something we don't have to think about for sync code. |
I don't see how to safely write my task supervisor loop using this... am I missing something?
This would be OK, I guess, treating these as pretty low-level, finicky constructs that most people shouldn't interact with. But it would still mean that "whether this function uses a timeout internally" would become an external part of its API (because callers would need to know whether it is safe to nest).

Here's a sketch of an alternative way, based on that cancellation token idea I linked up-thread.

First, we reify the idea of a "cancellation token" as a first-class object. I'm not sure how much public API we want, but conceptually it's something like an event with an attached exception.

Each task has associated with it an "active set" of cancellation tokens. If one of these objects becomes "set", then the associated exception is raised at the next cancellation point. (I guess if multiple are set, then we can raise a chained exception?)

When we create a new task, we also create an associated cancellation token, which we stash somewhere in the task object. Initially, the task's "active set" consists of this token alone.

```python
class Task:
    def __init__(self, ...):
        self.cancel_token = CancelToken()
        self.cancel_set = {self.cancel_token}

    def cancel(self):
        self.cancel_token.set(curio.TaskCancelledError())

    ...
```

There are some primitives that let us manipulate the cancel set, in particular adding and removing tokens. (When a token that is already "set" is added, it immediately raises an error.)
So `defer_cancellation` and `allow_cancellation` might look like:

```python
async with defer_cancellation as cancel_handle:
    ...
    async with allow_cancellation(cancel_handle):
        ...
    ...
```
Interesting example number 1:

```python
async with timeout_after(10):
    ...
    async with defer_cancellation:
        # code in here is protected from Task.cancel() and from the timeout_after(10)
        ...
        async with timeout_after(5):
            # code in here is protected from those, but is still subject to the timeout_after(5)
            ...
    # once we leave the defer_cancellation block, the timeout_after(10) is back in force
    ...
```

Interesting example number 2: create a global cancellation token representing "the server is being shut down". Whenever we spawn a new task, use some kwarg to spawn to add this token to the task's cancel set. Then when we want to cancel all tasks and shut down, we just set this one token.
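To make this concrete, the token object itself might look something like this (a rough sketch; none of these names are real Curio API):

```python
class CancelToken:
    def __init__(self):
        self.exc = None                  # exception to deliver once set

    def set(self, exc):
        self.exc = exc                   # marks the token as "set"

    def is_set(self):
        return self.exc is not None

# At each cancellation point, the kernel would check the task's active set:
def check_cancel(task):
    for token in task.cancel_set:
        if token.is_set():
            raise token.exc
```
|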
```python
async with ignore_after(10) as token:
    ...

if token.is_set():
    # the timeout fired
    ...
```

(This is related to something I've been meaning to propose for a bit, actually. There are basically three things people might want to do after a timeout has fired: (a) ignore, (b) catch and handle, (c) throw an exception like "I give up (btw it was because of a timeout)". Right now we use `timeout_after` for all three.) |
I've been doing a lot of thinking about this whole subject over the last few days and I'm afraid that {allow|defer}_cancellation is going on the chopping block. The semantics are too complicated and the current implementation is logically unsound in the presence of nesting. I don't know if adding tokens saves it or not, but I do know that "if the implementation is hard to explain, it's a bad idea" is part of the Zen of Python ;-). I also know that my head is starting to explode.

I think a better solution to this whole problem is going to be cancellation polling. As an option, it's easy to make it so that tasks NEVER receive a CancelledError exception. Instead, a flag can be set that must be checked via explicit polling:
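For example, something along these lines (`check_cancellation()` and `do_work()` are hypothetical names here):

```python
while True:
    # Cancellation no longer raises; it just sets a flag that we poll.
    if await check_cancellation():
        break              # unwind on our own terms
    await do_work()        # never interrupted by a CancelledError
```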
When launching the supervisor, you'd supply some kind of option to `spawn()` to turn this mode on.
Timeouts, although related to cancellation, are a different kind of animal. Timeouts can be handled and ignored. The semantics of nesting them are more well-defined. I'm not inclined to change how they currently work. If the above supervisor is concerned about getting a timeout from outside, then don't apply a timeout to it. Or wrap it:
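```python
# One possible wrapping (sketch; supervisor() stands in for the function
# being protected): handle any outside timeout at the call boundary.
try:
    await supervisor()
except TaskTimeout:
    pass
```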
|
An explicit flag (for polling) could also be used to control things more finely.
I suppose that could be wrapped in some kind of context manager to bring it back to something similar to what's there now. The main difference is that the implementation would need to explicitly check the flag and deliver any pending cancellation itself.

As it stands now though, the implementation is far too complicated. Dedicated kernel traps, internal stacks, and numerous support functions. It doesn't need to be this complicated.
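For instance, a sketch along these lines, assuming the task object grows `allow_cancel` and `cancel_pending` attributes (hypothetical names):

```python
from contextlib import asynccontextmanager
from curio import current_task

@asynccontextmanager
async def no_cancel():
    task = await current_task()
    task.allow_cancel = False          # hypothetical: kernel stops delivering
    try:
        yield
    finally:
        task.allow_cancel = True
        if task.cancel_pending:        # hypothetical: pending cancellation flag
            exc = task.cancel_pending
            task.cancel_pending = None
            raise exc
```
|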
Polling also doesn't work! The loop I posted blocks in a `queue.get()`
until either it is cancelled or the queue has something for it. How do I
write that without these tools?
|
Use a timeout! Partly joking... maybe.

I am currently sorting through a reworking of the machinery involved with this. Right now, the implementation of {defer|allow}_cancellation is too complicated and the semantics are too weird to explain to mortals. I think there is an alternative way to approach this that will allow you to do what's needed and which can be explained more sanely. Working on it now. |
Oh hmm, somehow I didn't see this comment until just now. Sure, one could replace the kernel traps with direct manipulation of the task struct, and the stack could be replaced with a single scalar. Most of the implementation complexity IMO comes from how timeouts work in the kernel (not sure this is avoidable), and most of the semantic complexity comes from the issues discussed in this thread around nesting and interaction with timeouts. But I'll be interested to see what you come up with :-) |
This whole issue is quite hard and interesting. I love thinking about this stuff ;-). |
I have implemented a major reworking of cancellation control and related features. Rather than describe it here, please read this: http://curio.readthedocs.io/en/latest/devel.html#cancellation-control This change will break all existing use of {defer|allow}_cancellation (sorry). There are comparable features with {disable|enable}_cancellation() functions though. It should not break other Curio code so far as I know. The basic gist is this: cancellation can still be controlled, but there are some refined rules about when things happen, handling of exceptions, and nesting.
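In outline, the new primitives look like this (based on the linked docs; `coro1()` etc. are placeholders):

```python
from curio import disable_cancellation, enable_cancellation

async with disable_cancellation():
    await coro1()                    # protected from cancellation
    async with enable_cancellation():
        await coro2()                # cancellation may be delivered here
    await coro3()                    # protected again
```
|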
It really seems fine. Thanks. |
Right now, it does not allow any combination of nesting. So something like disable->enable->enable is illegal and results in a RuntimeError. disable->enable->disable->enable is okay though.

Here is my general thought on this... cancellation control is a pretty advanced thing to be introducing into your code. It makes everything more difficult to reason about and requires a LOT of attention to fine details. Because of that, its use really requires a fair amount of forethought. My general fear is that people who don't quite understand what they're doing would start using these features haphazardly. For example, writing a bunch of library functions like this:
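```python
# Illustrative sketch (helper/some_operation are made-up names): a library
# function that haphazardly re-enables cancellation regardless of its caller.
async def helper():
    async with enable_cancellation():
        await some_operation()
```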
IMO, just having a bare `enable_cancellation()` sitting in a library function like that is asking for trouble. From an implementation standpoint, the restriction is somewhat artificial. There's nothing about the implementation that nesting breaks. The risk of breakage is more with user-written code that happens to use these features. |
One thing I'm not entirely sure about is the actual delivery of cancellation exceptions. In the current implementation, it only happens on the next blocking operation after a `disable_cancellation()` block is exited.
Raising a cancellation exception in the |
> disable->enable->enable is illegal and results in a RuntimeError.
>
> From an implementation standpoint, the restriction is somewhat artificial.
> There's nothing about the implementation that nesting breaks. The risk of
> breakage is more with user-written code that happens to use these features.

I understand that it is for user discipline, not a result of an underlying technical requirement. Is the RuntimeError on a singleton enable_cancellation for the same reason?

If I were the author, I would remove the constraint and encourage well-formedness in the documentation. It is always better to free the one who knows what he's doing than to restrict the one who doesn't. That degree of freedom can be employed in interesting library combinations even if it is not well-formed.

Anyway, that's just me. Apart from differences in personal preference for such a hypothetical borderline case, I guess I have finished my comments on this issue, since I see all my previous comments are satisfied. Maybe a summary of these last comments could also be added to the documentation.

Impressed with your determination and speed on the subject. Thanks.
|
I think I'm going to play it "safe" to start with this. This whole cancellation handling issue is fascinating and hairy at the same time. Putting some constraints on it at the start might be a prudent move ;-). |
Closing for now. enable/disable cancellation are implemented and available. Revisit in a new issue if further ideas/improvements arise. |
That's the correct answer. Imagine implementing a mutex.

About the whole supervisor-loop discussion, with the new primitives it could look something like this:

```python
async def _supervise_loop(self):
    assert self._started
    chained_exc = None
    b1 = disable_cancellation()
    async with b1:
        while self._tasks or not self._shutting_down:
            try:
                # This is the only line where we want to allow
                # cancellation / timeouts:
                async with restore_cancellation(b1):
                    task = await self._q.get()
                await self._process_finished_task(task)
            except BaseException as exc:
                if not self._shutting_down:
                    await self.start_shutdown()
                if chained_exc is not None:
                    exc.__context__ = chained_exc
                chained_exc = exc
    # All done!
    await self._finished.set()
    if chained_exc is not None:
        raise chained_exc
```

Also, I happen to like this topic. I might necrobump a few more threads while I follow the past discussions. |
Currently they are documented as a single line in the "Low-level Kernel System Calls" section. I think they could be documented in the "Tasks" section with some examples. They have direct everyday usage: protecting a block of code from being cancelled in the middle, or having cancellation occur at a favorable point. One example would be resetting an HTTP/2 stream only at frame boundaries.