
API Request : Interrupt and terminate a task #6283

Open
amitmurthy opened this Issue Mar 27, 2014 · 18 comments

Comments

@simonster

Member

simonster commented Mar 27, 2014

Related: #4037

@JeffBezanson

Member

JeffBezanson commented Mar 27, 2014

Also related: #1700

We can already do this using schedule. The only problem is that the task can catch the exception and retry; there is no way to be sure the task ends. Maybe the interface can just be t.state = :done. We just need to update schedule to drop finished tasks.
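A minimal sketch of this schedule-based approach, and of the loophole described above, assuming the keyword form schedule(t, exc; error=true) available in current Julia (the sleep calls are a timing assumption to let the task block first):

```julia
# Deliver an exception to a blocked task via schedule, and show the
# loophole: the task catches the exception and simply retries.
t = @async begin
    n = 0
    while n < 2
        try
            wait()                       # block until rescheduled
        catch
            n += 1                       # "killed", but the task just retries
        end
    end
    n                                    # exits normally after two interrupts
end
sleep(0.1)                               # let the task reach wait()
schedule(t, InterruptException(); error=true)
sleep(0.1)
schedule(t, InterruptException(); error=true)
fetch(t)                                 # the task survived both interrupts
```

Since the task ends normally with n == 2, neither delivery actually terminated it, which is exactly why a forced t.state = :done was floated as an alternative.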

@amitmurthy

Member

amitmurthy commented Mar 28, 2014

We should probably still have terminate(t::Task) = (t.state = :done; :ok) defined. Just seems odd that in this particular case, we expect the user to access a member field directly, while in other cases, a Task object is effectively used as an opaque handle.

@JeffBezanson

Member

JeffBezanson commented Mar 28, 2014

Nah, that won't be sufficient anyway. We should remove the task from whatever wait queue it is in, so it can be GC'd.

@kshyatt

Contributor

kshyatt commented Sep 15, 2016

Bumping. Do we have this functionality yet? Do we want it?

@amitmurthy

Member

amitmurthy commented Sep 15, 2016

Don't have it yet. We should. Killing a task should also release whatever resource it is waiting on - file fd, socket, remote reference, etc.

@vtjnash

Member

vtjnash commented Sep 15, 2016

I don't think we should add this, since "releasing whatever resource" is generally impractical and buggy. I don't know of any API that offers this sort of feature without also warning you not to use it, due to the infeasibility of cleaning up state afterwards:
http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717%28v=vs.85%29.aspx
https://internals.rust-lang.org/t/thread-cancel-support/3056 (rust doesn't have it, this is a discussion on why not)

@amitmurthy

Member

amitmurthy commented Sep 15, 2016

After reading those links, I think an appropriate solution would be to define an interrupt(t::Task).

Currently it should just throw an InterruptException in the target task if it is waiting on I/O, remote reference, Condition, etc. For compute bound tasks, if and when we have tasks scheduled on different threads, it could send an interrupt signal to the specific thread (if possible).
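A sketch of the proposed waiting-task case, built on top of schedule's existing error delivery; note that interrupt here is hypothetical, not a Base function, and the sleep is a timing assumption to let the task block:

```julia
# Hypothetical interrupt(t::Task): deliver an InterruptException to a
# task that is blocked waiting (the I/O / Condition case described above).
function interrupt(t::Task)
    istaskdone(t) && return t            # nothing to do for finished tasks
    schedule(t, InterruptException(); error=true)
end

t = @async try
    wait()                               # park until interrupted
    :finished
catch err
    err                                  # the task observes the interruption
end
sleep(0.1)                               # let the task reach wait()
interrupt(t)
fetch(t)                                 # yields the caught InterruptException
```

The compute-bound, cross-thread case mentioned above is not covered by this sketch; it would need actual signal delivery to the thread running the task.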

@rofinn

Contributor

rofinn commented Nov 22, 2016

+1 for interrupt(t::Task). @amitmurthy Do you have any idea how to throw the InterruptException on the task? I feel like this solution would also help me figure out how to implement a stacktrace(t::Task) method (i.e., run stacktrace() in the task for debugging).

@yuyichao

Member

yuyichao commented Nov 22, 2016

stacktrace(::Task) is much easier than interrupt(::Task)

@kshyatt kshyatt added the I/O label Nov 24, 2016

@vtjnash

Member

vtjnash commented Jan 24, 2017

Another reference on this topic is https://news.ycombinator.com/item?id=13470452

Note that the current preferred mechanism for aborting another Task is to close the shared resource and let the runtime clean it up synchronously. This has the benefit of being reliable, easy to code, and already available. It is also less racy, since close is stateful (once closed, the resource remains closed) rather than an edge-driven event, and it is typically an expected condition (so it doesn't require any extra effort to handle).
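The close-based pattern described here, sketched with a Channel as the shared resource; the worker is never killed, it just observes the closed channel and exits:

```julia
# Cancellation by closing the shared resource: the worker's channel
# iteration ends cleanly once `jobs` is closed and drained.
jobs = Channel{Int}(8)
worker = @async begin
    n = 0
    for job in jobs                      # blocks for work; ends on close
        n += 1
    end
    n                                    # normal exit, no exception needed
end
put!(jobs, 1)
put!(jobs, 2)
close(jobs)                              # stateful: the channel stays closed
fetch(worker)                            # worker drained both jobs, then exited
```

Because close is sticky, there is no race between "cancel" and "start waiting": a task that begins a take! after the close sees the closed state immediately.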

@amitmurthy

Member

amitmurthy commented Jan 25, 2017

That works only for libuv resources implementing close, and for Channels. For tasks waiting on a remotecall, or on a Future/RemoteChannel, users have no access to the Condition variables the task is waiting on. And implementing a close(::Condition) that invalidates all current and future calls on a Condition object does not seem correct to me. If we did that, we might as well have interrupt(::Task) call close on the waiting condition, which brings us back to the issue of proper cleanup in the libuv case. Right?

@vtjnash

Member

vtjnash commented Jan 25, 2017

No, it would still be different because it would no longer be specific to intended interruption. For example, you might end up aborting a call to close or showerror instead of the intended job.

I think the remotecall functions generally have async versions which return a handle to the Channel? I think in most other cases, the resource is passed in as an argument, which gives the caller some leverage. In the worst case for remotecall, since the resource argument is the worker pid, you could rmprocs(p) to kill / close the connection to that remote worker.

@amitmurthy

Member

amitmurthy commented Jan 25, 2017

I think the remotecall functions generally have async versions which return a handle to the Channel

The calls return a Future, and a wait on a Future results in a remote task that waits on the backing channel for data. On the caller side we are waiting on a Condition which will be triggered by a response from the remote wait.

In a statement like @async remotecall_fetch(....) we only have access to a Task object. rmprocs(p) seems like overkill, but it is probably the correct way to do it currently, as we don't have a means to interrupt the specific remote task.

@vtjnash

Member

vtjnash commented Jan 25, 2017

Right, but I thought that's why remotecall is available. And terminating that Task wouldn't actually notify the remote worker to stop, but might confuse / corrupt it when it tries to report its results. I realize that ensuring cancel-ability may require thinking about how it'll work and threading out handles to the objects that can be used to stop the work. But I don't see how it could be done any other way. There's no guarantee that the @async example there isn't actually implemented as @async wait(@async remotecall_fetch()) (hopefully not intentionally...), so all that killing the Task directly accomplishes is destroying the monitoring process.

@s2maki

Contributor

s2maki commented Apr 27, 2018

Someone has written an article that draws an equivalence between @schedule-like behavior and the evils of goto.

https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful

There is a Julia-specific thread to this conversation at https://discourse.julialang.org/t/schedule-considered-harmful/10540

As someone who learned programming on old-fashioned BASICs like Applesoft BASIC on the Apple II+, and then had to move to procedural programming in C, I initially had the same gut response to this article as I did back then. I'd call it "instinctive repulsion". But of course we do all now recognize goto as evil, so I reset my thinking and gave it an honest review.

In doing so, I have come to the conclusion that the article makes some great points. There is no good way to universally handle uncaught exceptions at the top of a @schedule task other than to drop them on the floor. There isn't really a global (or even local) registry of outstanding Tasks created by @schedule, so you can't even know what's running, or what may have been left out there by a black-box function call you have made.

But the reason I mention this article in this particular discussion is that I think having to support such unbounded @schedule calls may be one of the reasons that the ability to cancel a task is so hard. I haven't reviewed the Trio Python library itself, or delved into the details of the Nursery concept, beyond recognizing it as similar to wrapping @async in @sync. But it does occur to me that there may be some concepts in there relating to "checkpoints" that begin to enable task cancellation (https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints). Perhaps, if @schedule itself were dropped and the only way to schedule a task were to wrap an @async inside a @sync, then dealing with the resulting fallout of cleaning up resources a task is holding on to might become easier.

Anyhow, just some food for thought here. I'm not necessarily proposing anything, but rather hoping to move the idea of task cancellation back into active discussion.

@StefanKarpinski

Member

StefanKarpinski commented May 8, 2018

I think we should look closely at the Trio approach to I/O in the future—i.e. post 1.0 (so this may have to be optional in 1.x or it may have to wait until 2.0). It has some really nice properties, including:

  1. Every task spawned within a function finishes before that function returns unless you spawn the task within an explicit "task nursery" that outlives the spawning function.

  2. There's a natural parent task to handle every child task failure—no more task failures disappearing into the void. This has been a frequent point of contention between @JeffBezanson and myself; the Trio approach provides a nice clean solution that could make both of us happy.

  3. There's a clear structure for cancellation of tasks and subtasks: if you kill a task, that also kills all subtasks. In particular, this means that instead of having timeout arguments on every possible blocking operation, you can do external cancellation of blocking tasks correctly—this composes better and means you don't have to wait until a potentially indefinite chain of timers expires.

The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together some time, since Nathaniel is a much more effective explainer of and advocate for the Trio model than I am. (Hope you don't mind the ping, Nathaniel!)
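Property (1) above can already be approximated in Julia with @sync, a rough analogue of a Trio nursery: no @async child spawned inside the block can outlive it, and (touching property 2) a failure in any child is rethrown to the parent rather than vanishing.

```julia
# Nursery-like structure with @sync: the function cannot return until
# every @async child spawned inside the block has finished.
function process_all(items)
    results = Vector{Int}(undef, length(items))
    @sync for (i, x) in enumerate(items)
        @async results[i] = x^2          # child task tracked by @sync
    end
    return results                       # all children are guaranteed done here
end
process_all([1, 2, 3])
```

What @sync does not provide is property (3), structured cancellation: there is no way to tear down the whole block and its subtasks from outside, which is the gap this issue is about.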

@njsmith

njsmith commented May 11, 2018

The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together

Sounds like a fun time to me :-)
