
What is future.lapply(..., future.lazy = TRUE) supposed to do? #1

Closed
HenrikBengtsson opened this issue Dec 6, 2017 · 12 comments

@HenrikBengtsson
Owner

This issue is based on the question/discussion in HenrikBengtsson/future#179.

When first implementing future_lapply() I added future.nnn arguments to expose the corresponding nnn arguments of the future() function, e.g. future.globals. As part of this process, I also added future.lazy to control future(..., lazy = future.lazy). However, given that future_lapply() returns values (not futures), it is not obvious/clear what purpose this argument serves. In other words, is there a difference between the default:

y <- future_lapply(x, fun, future.lazy = FALSE)

and

y <- future_lapply(x, fun, future.lazy = TRUE)

Are there use cases where it matters/is needed? Can/should the future.lazy argument be dropped?

PS. The overhead of having this argument is zero.
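
For concreteness, here is a minimal sketch (assuming a multisession plan and a version of the package where the future.lazy argument still exists; it is removed at the end of this thread) showing that, from the caller's point of view, the two calls behave the same:

library(future)
library(future.apply)
plan(multisession)

x <- 1:4
y1 <- future_lapply(x, sqrt, future.lazy = FALSE)  # default: futures are launched eagerly
y2 <- future_lapply(x, sqrt, future.lazy = TRUE)   # futures are created lazily, but values are still collected
identical(y1, y2)  # TRUE - both calls block and return the same values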

@yonicd

yonicd commented Jun 29, 2018

I started to use this option now (on SGE), expecting it to run the jobs and free up the console for other things. Is there a way to do that in future.*apply? Couldn't the whole list be the promise in this case?

@HenrikBengtsson
Owner Author

You're looking for a feature making future_*apply() functions non-blocking. Unfortunately, that's a different thing than lazy evaluation. Lazy evaluation of a future is about starting the evaluation of the future (= tasks/jobs) only when you explicitly request its value. This is sometimes handy, but more commonly used for individual futures. This issue is asking the question: what difference does setting future.lazy = TRUE make, given that, in the end, future_*apply() will still collect the values and thereby block.
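
As a minimal illustration of lazy evaluation for a single future (a sketch, assuming a sequential plan):

library(future)
plan(sequential)

f <- future({ message("evaluating now"); 42 }, lazy = TRUE)  # nothing is evaluated yet
v <- value(f)  # evaluation starts here; "evaluating now" is emitted only at this point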

To make a non-blocking future_*apply() call, you can do:

plan(list(multiprocess, batchtools_sge))
y %<-% future_lapply(X, FUN = my_fun)

This will cause the first layer of futures (y %<-% { ... }) to be processed in the background on your local machine, and the second layer (in future_lapply()) in the background via batchtools/SGE.

As soon as you "touch" (e.g. print) y, it will block until future_lapply() is complete and its value has been collected.
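
If you want to check on it without blocking, here is a minimal sketch (using a local multisession plan as a stand-in for the SGE backend; futureOf() and resolved() are part of the future package):

library(future)
library(future.apply)
plan(list(multisession, sequential))  # local stand-in for list(multiprocess, batchtools_sge)

X <- 1:8
my_fun <- function(x) { Sys.sleep(1); sqrt(x) }

y %<-% future_lapply(X, FUN = my_fun)  # returns immediately; work runs in a background R session

f <- futureOf(y)  # grab the future behind the promise 'y' without forcing it
resolved(f)       # FALSE while future_lapply() is still running; this check does not block
y                 # touching 'y' blocks until the result is available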

@yonicd

yonicd commented Jun 29, 2018

Thank you for the quick response.

Some clarification for me (sorry if this is basic stuff):

The jobs won't get sent until I touch the object.

Then will the console be blocked on the master while they are running? Or do you mean that if I touch y while the process is still running, I will need to wait?

@yonicd

yonicd commented Jun 29, 2018

It also looks like the template gets lost in the mix:

> future::plan(list(multiprocess, future.batchtools::batchtools_sge),
+              template = 'batchtools.sge-new.tmpl')
> Y1 %<-% future_lapply(rep(30, 10),
+                    FUN = function(nr){solve( matrix(rnorm(nr^2), nrow=nr, ncol=nr))},
+                    future.scheduling = 3)
> x <- Y1
Error in Y1 %<-% future_lapply(rep(30, 10), FUN = function(nr) { : 
  Assertion on 'template' failed: May not be NA.

@HenrikBengtsson
Owner Author

The jobs won't get sent until I touch the object.
Then will the console be blocked on the master while they are running? Or do you mean that if I touch y while the process is still running, I will need to wait?

No, all futures (both layers) will use "eager" evaluation by default (in contrast to "lazy"). This means that they will start processing immediately. On the other hand, if you'd ask the first layer to be resolved lazily, as in:

plan(list(multiprocess, batchtools_sge))
y %<-% { future_lapply(X, FUN = my_fun) } %lazy% TRUE

then the first layer of futures - the one that evaluates future_lapply(X, FUN = my_fun) - would not be started until you "touch"/"look at" y. As soon as you'd touch y, it would try to get the value of that future. I just used %<-% in my example because it's more convenient here; the above would be equivalent to:

plan(list(multiprocess, batchtools_sge))
fy <- future({ future_lapply(X, FUN = my_fun) }, lazy = TRUE)

and here it's clearer that it's basically just creating a future fy that sits there and waits to get started. It starts only when you do:

y <- value(fy)

Hope this clarifies it.

@yonicd

yonicd commented Jun 29, 2018

I'll keep plugging away with the example you gave.

Thank you!

@HenrikBengtsson
Owner Author

It also looks like the template gets lost in the mix:

future::plan(list(multiprocess, future.batchtools::batchtools_sge),
+              template = 'batchtools.sge-new.tmpl')

You want to use tweak() here:

library(future)
plan(list(
  multiprocess,
  tweak(future.batchtools::batchtools_sge, template = 'batchtools.sge-new.tmpl')
))

That works as if you'd created your own custom future plan. You can also write the above as:

my_sge <- tweak(future.batchtools::batchtools_sge, template = 'batchtools.sge-new.tmpl')
plan(list(multiprocess, my_sge))
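
Putting the pieces together, here is a hedged sketch of a non-blocking future_lapply() on SGE with the custom template (this assumes an SGE cluster, the future.batchtools package, and that 'batchtools.sge-new.tmpl' can be found by batchtools):

library(future)
library(future.apply)

my_sge <- tweak(future.batchtools::batchtools_sge,
                template = "batchtools.sge-new.tmpl")
plan(list(multiprocess, my_sge))

Y1 %<-% future_lapply(rep(30, 10),
                      FUN = function(nr) solve(matrix(rnorm(nr^2), nrow = nr, ncol = nr)),
                      future.scheduling = 3)
# Non-blocking: the SGE jobs are submitted from a background process;
# 'Y1' blocks only when it is first touched.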

@yonicd

yonicd commented Jun 29, 2018

Works! Thanks :).

Small last questions that are eluding me... Can I route the output from /.future to an exposed location, and how do I pass job.name into <%= job.name %> from the R-side API?

Thanks again!

@HenrikBengtsson
Owner Author

Nothing yet, but hopefully soon:

Can I route the output from /.future to an exposed location?

Issue HenrikBengtsson/future#232

How do I pass job.name into <%= job.name %> from the R-side API?

Issue #15

@yonicd

yonicd commented Jun 29, 2018

Thank you!

@metabiota-vikram

metabiota-vikram commented Aug 25, 2020

@HenrikBengtsson

Hi Henrik - This is an old discussion, but I have a quick question along the same lines. As you indicated, like the older implementation future::future_lapply, future.apply::future_lapply is also blocking.

But I have noticed the following with a cluster of servers controlled by a main server using the future::plan strategy cluster (earlySignal seems to be FALSE by default). When a job spread across the individual servers in the cluster is initiated with a future.apply::future_lapply call, it starts out blocking as expected. But when one of the servers in the cluster is terminated (called away by the cloud provider), and hence that worker dies unexpectedly, future.apply::future_lapply returns. The individual jobs on the other servers are still running, but because the function has returned, the downstream script starts processing when it should wait until the entire job spanning the cluster of servers is complete.
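
For reference, a minimal sketch of the kind of setup described above (the worker hostnames are hypothetical placeholders):

library(future)
library(future.apply)

workers <- c("worker1", "worker2", "worker3")  # hypothetical cluster nodes
plan(cluster, workers = workers)

res <- future_lapply(1:30, function(i) { Sys.sleep(60); i^2 })
# Expected: this call blocks until all chunks have finished on all workers.
# Observed: it returns early if one of the worker nodes is terminated mid-run.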

base-r - 4.0.1 (main server and all workers)
future.apply - 1.6.0
future - 1.17.0

Is this expected?

Many thanks,

@HenrikBengtsson
Owner Author

I've decided to remove this argument, cf. #94. Closing this one.
