Remove multicore worker if process crashed #677

Closed
HenrikBengtsson opened this issue Apr 18, 2023 · 1 comment

@HenrikBengtsson
Owner

Issue

If a multicore future terminates the underlying forked R process, the
crashed future keeps occupying one of the worker slots, even after
value() has reported the failure.

Example:

library(future)

plan(multicore, workers = 4)
stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 4)

f <- future({ Sys.sleep(2) })
stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 3)

v <- value(f)
stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 4)

f <- future({ tools::pskill(Sys.getpid()) })
stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 3)

res <- tryCatch({
  v <- value(f)
}, error = identity)
stopifnot(inherits(res, "FutureError"))
conditionMessage(res)

## [1] "Failed to retrieve the result of MulticoreFuture (<none>) 
## from the forked worker (on localhost; PID 1632517). Post-mortem
## diagnostic: No process exists with this PID, i.e. the forked 
## localhost worker is no longer alive"

stopifnot(nbrOfWorkers() == 4)
stopifnot(nbrOfFreeWorkers() == 4)  ## FAIL; here we're stuck at 3

Suggestion

Detect when the forked process has terminated (cf. the post-mortem analysis), and remove the corresponding MulticoreFuture from the internal FutureRegistry to free up the slot.

This should be safe to do for multicore futures, because they're transient R processes.
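
The idea can be sketched with a toy registry. This is only an illustration, not the package's internal FutureRegistry: new_registry(), register(), free_slots(), pid_is_alive() and release_crashed() are hypothetical names, and the liveness check assumes a Unix-like system where sending signal 0 via tools::pskill() only tests whether the PID exists. The usage part injects a fake liveness check so that it does not depend on real processes.

```r
## Toy stand-in for a worker registry: one entry per busy slot, keyed by a
## future label and holding the PID of the forked worker.
## (Hypothetical; NOT the internal FutureRegistry of the future package.)
new_registry <- function(workers) {
  reg <- new.env(parent = emptyenv())
  reg$workers <- workers
  reg$busy <- list()
  reg
}

register <- function(reg, label, pid) {
  stopifnot(length(reg$busy) < reg$workers)
  reg$busy[[label]] <- pid
  invisible(reg)
}

free_slots <- function(reg) reg$workers - length(reg$busy)

## Assumption: Unix-like system, where signal 0 only probes for existence.
## Do not use this check on Windows.
pid_is_alive <- function(pid) isTRUE(tools::pskill(pid, signal = 0L))

## Drop every entry whose worker process no longer exists and return the
## labels of the released entries.
release_crashed <- function(reg, is_alive = pid_is_alive) {
  alive <- vapply(reg$busy, is_alive, logical(1))
  crashed <- names(reg$busy)[!alive]
  reg$busy <- reg$busy[alive]
  invisible(crashed)
}

## Usage with a fake liveness check: pretend that PID 222 has crashed.
reg <- new_registry(workers = 4)
register(reg, "f1", 111L)
register(reg, "f2", 222L)
slots_before <- free_slots(reg)
crashed <- release_crashed(reg, is_alive = function(pid) pid != 222L)
slots_after <- free_slots(reg)
```

Because forked workers are transient, dropping the entry is all the cleanup this sketch needs; there is no persistent worker to restart.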

HenrikBengtsson added this to the Next release milestone Apr 18, 2023
HenrikBengtsson added a commit that referenced this issue Apr 19, 2023
…crashed, it did not release the corresponding parallel-worker slot [#677]
@HenrikBengtsson
Owner Author

Implemented; a "crashed" multicore future is now fully released, making its "slot" available again.

HenrikBengtsson added a commit to HenrikBengtsson/future.callr that referenced this issue Apr 19, 2023
…rashed,

the corresponding parallel-worker slot was never released.
Related to HenrikBengtsson/future#677