WISH: Add method for interrupting / terminating a future #93
Comments
Thanks for the detailed answer!
@HenrikBengtsson Are there any platform-specific workarounds for this until it gets implemented? I develop things on Windows, but hosting is done on Linux servers, so I guess even just a way to do this on Linux would be useful. Is it as simple as just getting the process PID and killing it? On Windows, if I force-close the background processes ("R for Windows front end"), then R gets very unhappy - I get an error.

P.S. I can't actually find how to get the PID from a future in the documentation. You have lots of examples which have …
Not really, unless you're using forked R processes, i.e. …
Just to clarify: on Windows, multiprocess -> multisession, and on *nix/macOS, multiprocess -> multicore. So, to reproduce the Windows behavior everywhere, one can use …
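The exact call is cut off above; my assumption is that it refers to selecting the multisession backend explicitly, which would look something like this sketch:

```r
library("future")

## Use background R sessions (the Windows-style backend) on all platforms,
## instead of letting 'multiprocess' pick forked processes on *nix/macOS.
plan(multisession)

f <- future(Sys.getpid())
value(f)  # the PID of the background worker session
```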
So, multisession futures use workers that are part of a local "cluster", which is basically the same as what you get when you set up a cluster using … This is in contrast to the …
It's not possible. However, you could achieve it by setting up a cluster manually, e.g.

```r
library("future")
cl <- makeClusterPSOCK(availableCores())
```

and grab the PIDs with the following hack:

```r
for (kk in seq_along(cl)) {
  parallel:::sendCall(cl[[kk]], fun = Sys.getpid, args = list())
  pid <- parallel:::recvResult(cl[[kk]])
  attr(cl[[kk]]$host, "pid") <- pid
}
```

That will annotate each worker in `cl` with its PID as well. (I've thought about adding this as an automatic feature that could be added to …) To use this cluster with futures, you do:

```r
plan(cluster, workers = cl)
```

With this, you can grab the worker information for a particular future like this: …
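The snippet for grabbing a future's worker information is cut off above. A sketch of what it might look like, assuming a ClusterFuture exposes its workers and assigned node index as `f$workers` and `f$node` (internal fields that are an assumption here and may differ between versions of the future package):

```r
f <- future(Sys.sleep(120))

## Look up which cluster worker this future was assigned to and read
## back the "pid" attribute set by the hack above.
worker <- f$workers[[f$node]]
pid <- attr(worker$host, "pid")
print(pid)
```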
That gives you the PID of the process for a particular future. Note, however, that … Then you can send various system signals (e.g. …). However, to get this working properly, you probably also need to make your future expression / future code interrupt-aware using …

As you can see from the above, there are lots of things that need to be put in place in order to be able to interrupt a future - and the above is just when you run on your local machine. Doing this remotely is even harder. Having said this, it should be possible to add bits and pieces to the future package that move this in the right direction.
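One portable way to send such a signal from within R, without shelling out, is `tools::pskill()`. A minimal sketch, using a throwaway forked process (Unix/macOS only) as the signal target:

```r
library(tools)

## Launch a throwaway background R process so we have a PID to signal.
p <- parallel::mcparallel(Sys.sleep(120))

## Send an interrupt, as if the user pressed Ctrl-C in that process.
pskill(p$pid, signal = SIGINT)

## SIGTERM / SIGKILL could be used to terminate rather than interrupt:
## pskill(p$pid, signal = SIGTERM)
```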
Forgot to say: if possible, you could of course always write the future code to peek, once in a while, at a shared file for instructions from the master. For instance, if a file …
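A minimal sketch of that file-based approach (the file name and polling interval are made up for illustration):

```r
library("future")
plan(multisession)

cancel_file <- "cancel.txt"   # hypothetical shared file

f <- future({
  for (step in 1:1000) {
    ## Periodically check whether the master asked us to stop.
    if (file.exists(cancel_file)) return("cancelled")
    Sys.sleep(0.1)  # stand-in for a chunk of real work
  }
  "done"
})

## Master side: request cancellation by creating the file.
file.create(cancel_file)
value(f)  # "cancelled", once the worker notices the file
```

This only works when you control the code running inside the future, since the check has to be written into the loop itself.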
@HenrikBengtsson Thanks for the very detailed response. Your additional comment is not helpful to me, as I'm running a function from an external package in the future, so I can't really make it check a file. The rest makes sense, and I can see now that it really is a lot of work. I'll maybe have a play on a Linux machine if I have time, but this probably won't happen anytime soon.

The reason I'm asking is that I have a "Stop" button in an R Shiny app which currently makes the main R process just entirely forget about the future'd function. However, locally on Windows, if I try to get more than 3 or 4 (I have 4 cores) things going at a time, everything locks up, as there are no more cores. Potentially, the external function could take hours if the user puts bad inputs in, so it'd be useful to properly interrupt the process. The "Stop" button doesn't stop the function (which is running in a future), but lets it continue and just forgets about it. Thus, if a single user tries running something which will take ages, everything is pretty much dead until one of their jobs finishes, if that makes sense.

I'll happily help contribute towards a solution to this in any way possible, but wouldn't really know how to start, I'm afraid... If a solution to this just isn't going to happen for a long time, then fair enough. I understand that this is currently a limitation of how things are done. Thanks for your time!
I do feel like this is an important issue that shouldn't be ignored, though! It would be super nice to have some kind of … - that would be ideal, and I feel like it's applicable to a wide range of situations.
It sounds like the hardest case out of the ones you mentioned is the "process on another machine" case. To me, it seems like the simplest solution to that problem is to start and maintain a control process on each remote machine, which just sits idle until instructed to kill something by the master process. That should solve the running-out-of-resources problem, since the control process gets to claim those resources before any workers do.
Thanks for spending time thinking about this. Yes, for SOCK-like clusters, having a janitor process on each machine seems to be the one option. This can be done by launching one janitor per machine. However, this would require a new backend or a heavily modified version of parallel's …

An alternative approach would be to launch, for each worker (future), a main monitor/janitor process which then launches the actual working process. This can be done with the existing SOCK framework of parallel.

A third alternative is to have the main R session simply log in to the machine when needed and kill the given set of workers.

However, these types of features may be better suited for a separate backend package (think parallel, snow, processx, sys, batchtools, ..., new_backend_pkg).
Another option is to make cancellation support optional for future backends, and have the cancellation function return a logical vector indicating which futures were actually cancelled (and probably issue a warning if it can't cancel something). The worst case for a future that can't be cancelled is that it finishes running and then the result is discarded, which is fairly benign in many cases. Presumably, you'd also want a function to assert that the current backend supports cancellation, so you can check that before starting a future that you can't cancel.
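A rough sketch of what such an optional-cancellation API could look like (every function name here is hypothetical and not part of the future package):

```r
## Hypothetical generic: backends that support cancellation supply a method.
cancelFuture <- function(f) UseMethod("cancelFuture")

## Default method: cancellation unsupported - warn and report FALSE, so the
## future simply runs to completion and its result is discarded.
cancelFuture.default <- function(f) {
  warning("This backend cannot cancel futures; the result will be discarded")
  FALSE
}

## Cancel a list of futures, returning the proposed logical vector of
## which futures were actually cancelled.
cancelAll <- function(futures) {
  vapply(futures, cancelFuture, logical(1L))
}
```

A capability-check function (e.g. one that inspects the current `plan()`) could then be used to refuse to start uncancellable work up front.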
I was using this function: …

Here is an example (derived from the original examples): …
Just an FYI, in relation to this: I just filed bug report PR17395 with a patch for …
UPDATE: The …
Hello,

**External kill**

A tad hacky, and obviously needs some logic before …

```r
promise <- future(myFunctionWhichDoesAsyncStuff())
async_pid <- promise$job$pid
system(paste("kill -9", async_pid))
```

Submit and retrieve with the …
I am in a similar situation, trying to stop an async process. The promise from `future()` is piped to the `file_rows()` reactive output value. Based on the tip by @raphaelvannson, I tried the below way to stop the future.

However, it returns the error:

```
Warning: Error in observeEventHandler: object 'fut' not found
```

When the promise is assigned as `fut <<- future()` in the above code, it instead returns the below error:

```
sh: 1: kill: Usage: kill [-s sigspec | -signum | -sigspec] [pid | job]... or
```

Could someone hint at what is wrong here?
The usage message is telling you that you are calling …
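One guess at the root cause: that usage message is what `kill` prints when it receives no PID at all, which would happen if the PID is `NULL`. A defensive sketch, assuming the PID is exposed as `fut$job$pid` as in the earlier comment:

```r
pid <- fut$job$pid

## Only shell out if we actually have a single numeric PID; otherwise the
## command becomes just "kill -9 " and the shell prints its usage message.
if (is.numeric(pid) && length(pid) == 1L) {
  system(sprintf("kill -9 %d", pid))
} else {
  warning("No PID available for this future; cannot kill it")
}
```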
This is not doing anything, and the log file is also empty. It does not look like the issue is with …
The log file has the below message: …
It is terminating the PID running "/opt/shiny-server/R/SockJSAdapter.R", but not terminating the PID running the …

It looks like the problem could be in fetching the PID of `future()`, or in the syntax of using future/promises in the above code.
Oh, you're right - I forgot that `paste` defaults to putting a space in between. Still, the error message you got from …
It only prints the PID given by …
In the meantime, the above question from @Mehar-git was also posted on Stack Overflow; please see the Edit part of my answer there.
I guess this is the case because the future isn't interrupt-aware, as mentioned by @HenrikBengtsson earlier in this thread. Maybe someone can point me in the right direction for the implementation of using …
I don't have a solution, and very little time to look into this, but note that when the promises package is involved, things get much more complicated, because you are now also dealing with a background, asynchronous event loop (from the later package). That becomes rather complicated to troubleshoot, and you are likely to run into issues that are hard to replicate (e.g. "works a few times but then fails").
Correct, there is no support for terminating/interrupting futures in the core Future API. Also, the different future backends haven't really been implemented to be resilient to the R workers failing/terminating - but some try to do post-mortem analysis, involving trying to detect when a background R process is gone.
Thanks! I found that using

```r
stopFuture(ft)
try(value(ft))
```

cleans up:

```r
> library(parallel)
> parallel:::children()
list()
```
Background / Question
@gdevailly, sorry I didn't see your June 16 question on Twitter until now;
The simple answer is that this is not possible.
CLARIFICATION 2018-03-15: but I hope we can add something that can be used (e.g. although not part of the official Future API, there could still be functions that the end user can use to manually kill workers)
Ideas / Thoughts
Technically, it should be possible to interrupt and even terminate background R processes that evaluate a future - at least if they run on the local machine.
For instance, for multicore futures we already have an internal handle for the process ID (pid) of the forked process. For multisession futures, we could retrieve the pid for each cluster worker before launching the actual future. With this pid, we should be able to send an interrupt signal, which should signal an `interrupt` condition within R for that process. What complicates this is that we need to come up with a platform-independent method for terminating/signalling processes. We most likely need to reach out to a `system()` call for this. This should be doable, but increases the risk of not working everywhere.

Signalling a process running on another machine is a bit more complicated. It would basically require being able to launch a separate signalling process on that same machine. Not impossible, but also not guaranteed to work - e.g. maybe the future process already running occupies the last available port / socket on that machine.
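A sketch of that remote case, assuming password-less (key-based) SSH access to the worker host; the host name and PID below are placeholders:

```r
## Hypothetical helper: interrupt a worker process on a remote machine by
## launching a short-lived 'kill' over ssh. Assumes key-based SSH auth and
## that the remote account may signal the worker process.
interrupt_remote <- function(host, pid) {
  system2("ssh", c(host, "kill", "-s", "INT", pid))
}

## Usage (placeholders):
## interrupt_remote("n1.remote.org", 12345)
```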
For futures running on a cluster via a job scheduler (as created by the future.BatchJobs package), it should also be possible to terminate such futures / jobs using the `killJob()` functionality already provided by the BatchJobs package.

That's just some of my thoughts for now.