
What is future.lapply(..., future.lazy = TRUE) supposed to do? #1

Closed
HenrikBengtsson opened this issue Dec 6, 2017 · 12 comments

@HenrikBengtsson
Owner

This issue is based on the question/discussion in HenrikBengtsson/future#179.

When first implementing future_lapply() I added future.nnn arguments to expose the corresponding nnn arguments of the future() function, e.g. future.globals. As part of this process, I also added future.lazy to control future(..., lazy = future.lazy). However, given that future_lapply() returns values (not futures), it is not obvious/clear what purpose this argument serves. In other words, is there a difference between the default:

y <- future_lapply(x, fun, future.lazy = FALSE)

and

y <- future_lapply(x, fun, future.lazy = TRUE)

Are there use cases where it matters/is needed? Can/should the future.lazy argument be dropped?

PS. The overhead of having this argument is zero.
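
For concreteness, here is a minimal sketch (assuming a multisession plan and a version of the package where the future.lazy argument still exists; it is removed at the end of this thread) showing that, from the caller's point of view, the two calls behave the same:

library(future)
library(future.apply)
plan(multisession)

x <- 1:4
y1 <- future_lapply(x, sqrt, future.lazy = FALSE)  # default: futures are launched eagerly
y2 <- future_lapply(x, sqrt, future.lazy = TRUE)   # futures are created lazily, but values are still collected
identical(y1, y2)  # TRUE - both calls block and return the same values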

@yonicd

yonicd commented Jun 29, 2018

I started to use this option now (on SGE), expecting it to run the jobs and free up the console for other things. Is there a way to do that in future.*apply? Couldn't the whole list be the promise in this case?

@HenrikBengtsson
Owner Author

You're looking for a feature making future_*apply() functions non-blocking. Unfortunately, that's a different thing than lazy evaluation. Lazy evaluation of a future is about starting the evaluation of the future (= tasks/jobs) only when you explicitly request its value. This is sometimes handy, but more commonly used for individual futures. This issue is asking the question: what difference does setting future.lazy = TRUE make, given that, in the end, future_*apply() will still collect the values and thereby block.
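
As a minimal illustration of lazy evaluation for a single future (a sketch, assuming a sequential plan):

library(future)
plan(sequential)

f <- future({ message("evaluating now"); 42 }, lazy = TRUE)  # nothing is evaluated yet
v <- value(f)  # evaluation starts here; "evaluating now" is emitted only at this point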

To make a non-blocking future_*apply() call, you can do:

plan(list(multiprocess, batchtools_sge))
y %<-% future_lapply(X, FUN = my_fun)

This will cause the first layer of futures (y %<-% { ... }) to be processed in the background on your local machine, and the second layer (in future_lapply()) in the background via batchtools/SGE.

As soon as you "touch" (e.g. print) y, it will block until future_lapply() is complete and its value has been collected.
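
If you want to check on it without blocking, here is a minimal sketch (using a local multisession plan as a stand-in for the SGE backend; futureOf() and resolved() are part of the future package):

library(future)
library(future.apply)
plan(list(multisession, sequential))  # local stand-in for list(multiprocess, batchtools_sge)

X <- 1:8
my_fun <- function(x) { Sys.sleep(1); sqrt(x) }

y %<-% future_lapply(X, FUN = my_fun)  # returns immediately; work runs in a background R session

f <- futureOf(y)  # grab the future behind the promise 'y' without forcing it
resolved(f)       # FALSE while future_lapply() is still running; this check does not block
y                 # touching 'y' blocks until the result is available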

@yonicd

yonicd commented Jun 29, 2018

Thank you for the quick response.

Some clarification for me (sorry if this is basic stuff):

The jobs won't get sent until I touch the object.

Then will the console be blocked on the master while they are running? Or do you mean that if I touch y while the process is still running, I will need to wait?

@yonicd

yonicd commented Jun 29, 2018

It also looks like the template gets lost in the mix:

> future::plan(list(multiprocess, future.batchtools::batchtools_sge),
+              template = 'batchtools.sge-new.tmpl')
> Y1 %<-% future_lapply(rep(30, 10),
+                    FUN = function(nr){solve( matrix(rnorm(nr^2), nrow=nr, ncol=nr))},
+                    future.scheduling = 3)
> x <- Y1
Error in Y1 %<-% future_lapply(rep(30, 10), FUN = function(nr) { : 
  Assertion on 'template' failed: May not be NA.

@HenrikBengtsson
Owner Author

The jobs won't get sent until I touch the object.
Then will the console be blocked on the master while they are running? Or do you mean that if I touch y while the process is still running, I will need to wait?

No, all futures (both layers) will use "eager" evaluation by default (in contrast to "lazy"). This means that they will start processing immediately. On the other hand, if you'd ask the first layer to be resolved lazily, as in:

plan(list(multiprocess, batchtools_sge))
y %<-% { future_lapply(X, FUN = my_fun) } %lazy% TRUE

then the first layer of futures - the one that evaluates future_lapply(X, FUN = my_fun) - would not be started until you "touch"/"look at" y. As soon as you'd touch y, it would try to get the value of that future. I just used %<-% in my example because it's more convenient here; the above would be equivalent to:

plan(list(multiprocess, batchtools_sge))
fy <- future({ future_lapply(X, FUN = my_fun) }, lazy = TRUE)

and here it's clearer that it's basically just creating a future fy that sits there and waits to get started. It starts only when you do:

y <- value(fy)

Hope this clarifies it.

@yonicd

yonicd commented Jun 29, 2018

I'll keep plugging away with the example you gave.

Thank you!

@HenrikBengtsson
Owner Author

It also looks like the template gets lost in the mix:

future::plan(list(multiprocess, future.batchtools::batchtools_sge),
+              template = 'batchtools.sge-new.tmpl')

You want to use tweak() here:

library(future)
plan(list(
  multiprocess,
  tweak(future.batchtools::batchtools_sge, template = 'batchtools.sge-new.tmpl')
))

That works as if you'd created your own custom future plan. You can also write the above as:

my_sge <- tweak(future.batchtools::batchtools_sge, template = 'batchtools.sge-new.tmpl')
plan(list(multiprocess, my_sge))
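
Putting the pieces together, here is a hedged sketch of a non-blocking future_lapply() on SGE with the custom template (this assumes an SGE cluster, the future.batchtools package, and that 'batchtools.sge-new.tmpl' can be found by batchtools):

library(future)
library(future.apply)

my_sge <- tweak(future.batchtools::batchtools_sge,
                template = "batchtools.sge-new.tmpl")
plan(list(multiprocess, my_sge))

Y1 %<-% future_lapply(rep(30, 10),
                      FUN = function(nr) solve(matrix(rnorm(nr^2), nrow = nr, ncol = nr)),
                      future.scheduling = 3)
# Non-blocking: the SGE jobs are submitted from a background process;
# 'Y1' blocks only when it is first touched.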

@yonicd

yonicd commented Jun 29, 2018

Works! Thanks :).

Small last questions that are eluding me... Can I route the output from /.future to an exposed location, and how do I pass job.name into <%= job.name %> from the R-side API?

Thanks again!

@HenrikBengtsson
Owner Author

Nothing yet, but hopefully soon:

Can I route the output from /.future to an exposed location?

Issue HenrikBengtsson/future#232

How do I pass job.name into <%= job.name %> from the R-side API?

Issue #15

@yonicd

yonicd commented Jun 29, 2018

Thank you!

@metabiota-vikram

metabiota-vikram commented Aug 25, 2020

@HenrikBengtsson

Hi Henrik - This is an old discussion, but I have a quick question along the same lines. As you indicated, like the older implementation future::future_lapply, future.apply::future_lapply is also blocking.

But I have noticed the following with a cluster of servers controlled by a main server using the future::plan strategy cluster (earlySignal seems to be FALSE by default). When a job spread across the individual servers in the cluster is initiated with a future.apply::future_lapply call, it starts out blocking as expected. But when one of the servers in the cluster is terminated (called away by the cloud provider), and hence that worker dies unexpectedly, future.apply::future_lapply returns. The individual jobs on the other servers are still running, but because the function has returned, the downstream script starts processing when it should wait until the entire job spanning the cluster of servers is complete.
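
For reference, a minimal sketch of the kind of setup described above (the worker hostnames are hypothetical placeholders):

library(future)
library(future.apply)

workers <- c("worker1", "worker2", "worker3")  # hypothetical cluster nodes
plan(cluster, workers = workers)

res <- future_lapply(1:30, function(i) { Sys.sleep(60); i^2 })
# Expected: this call blocks until all chunks have finished on all workers.
# Observed: it returns early if one of the worker nodes is terminated mid-run.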

base-r - 4.0.1 (main server and all workers)
future.apply - 1.6.0
future - 1.17.0

Is this expected?

Many thanks,

@HenrikBengtsson
Owner Author

I've decided to remove this argument, cf. #94. Closing this one.
