#' Use the Foreach `%dopar%` Adapter with Futures
#'
#' The `registerDoFuture()` function makes the
#' \code{\link[foreach:\%dopar\%]{\%dopar\%}} operator of the
#' \pkg{foreach} package process foreach iterations via any of
#' the future backends supported by the \pkg{future} package, which
#' include various parallel and distributed backends.
#' In other words, _if a computational backend is supported via
#' the Future API, it'll be automatically available for all functions
#' and packages making use of the \pkg{foreach} framework._
#' Neither the developer nor the end user has to change any code.
#'
#' @section Parallel backends:
#' To use futures with the \pkg{foreach} package and its
#' \code{\link[foreach:\%dopar\%]{\%dopar\%}} operator, use
#' `doFuture::registerDoFuture()` to register \pkg{doFuture} to be
#' used as a `%dopar%` adapter. After this, `%dopar%` will
#' parallelize with whatever \pkg{future} backend is set by
#' [future::plan()].
#'
#' The built-in \pkg{future} backends are always available, e.g.
#' \link[future:sequential]{sequential} (sequential processing),
#' \link[future:multicore]{multicore} (forked processes),
#' \link[future:multisession]{multisession} (background R sessions),
#' and \link[future:cluster]{cluster} (background R sessions on
#' local and remote machines).
#' For example, `plan(multisession)` will make `%dopar%`
#' parallelize via R processes running in the background on the
#' local machine, and
#' `plan(cluster, workers = c("n1", "n2", "n2", "n3"))` will
#' parallelize via R processes running on external machines.
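#'
#' As a minimal sketch of the above, the following registers the adapter,
#' selects a backend, and runs a loop in parallel. The worker count and the
#' toy computation are illustrative assumptions, not recommendations:
#'
#' ```r
#' library(foreach)
#'
#' ## Make %dopar% use whatever backend is set by future::plan()
#' doFuture::registerDoFuture()
#' future::plan(future::multisession, workers = 2)
#'
#' y <- foreach(x = 1:4, .combine = c) %dopar% {
#'   sqrt(x)
#' }
#'
#' ## Switch back to sequential processing, which also shuts down
#' ## the background R sessions
#' future::plan(future::sequential)
#' ```
#'
#' Switching to another backend only requires a different `plan()` call;
#' the `foreach()` statement itself stays unchanged.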
#'
#' Additional backends are provided by other future-compliant
#' packages. For example, the \pkg{future.batchtools} package
#' provides support for high-performance compute (HPC) cluster
#' schedulers such as SGE, Slurm, and TORQUE / PBS.
#' As an illustration, `plan(batchtools_slurm)` will parallelize
#' by submitting the foreach iterations as tasks to the Slurm
#' scheduler, which in turn will distribute the tasks to one
#' or more compute nodes.
#'
#' @section Global variables and packages:
#' Unless running locally in the global environment (= at the \R prompt),
#' the \pkg{foreach} package requires you to specify what global variables
#' and packages need to be available and attached in order for the
#' "foreach" expression to be evaluated properly. It is not uncommon to
#' get errors on one or more missing variables when moving from running a
#' \code{res <- foreach() \%dopar\% { ... }} statement on the local machine
#' to, say, another machine on the same network. The solution to the
#' problem is to explicitly export those variables by specifying them in
#' the `.export` argument to [foreach::foreach()],
#' e.g. `foreach(..., .export = c("mu", "sigma"))`. Likewise, if the
#' expression needs specific packages to be attached, they can be listed
#' in argument `.packages` of `foreach()`.
#'
#' When using `registerDoFuture()`, the above becomes less
#' critical, because by default the Future API identifies all globals and
#' all packages automatically (via static code inspection). This is done
#' exactly the same way regardless of future backend.
#' This automatic identification of globals and packages is illustrated
#' by the below example, which does _not_ specify
#' `.export = c("my_stat")`. This works because the future framework
#' detects that function `my_stat()` is needed and makes sure it is
#' exported. If you would use, say, `cl <- parallel::makeCluster(2)`
#' and `doParallel::registerDoParallel(cl)`, you would get a run-time
#' error on \code{Error in \{ : task 1 failed - \"could not find function "my_stat" ...}.
#'
#' Having said this, note that, in order for your "foreach" code to work
#' everywhere and with other types of foreach adapters as well, you may
#' want to make sure that you always specify arguments `.export`
#' and `.packages`.
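#'
#' The two styles can be compared in a short sketch. The helper
#' `my_stat()` below is a hypothetical stand-in for a user-defined function,
#' and the two-worker `multisession` plan is an assumption made only for
#' illustration:
#'
#' ```r
#' library(foreach)
#' doFuture::registerDoFuture()
#' future::plan(future::multisession, workers = 2)
#'
#' ## Hypothetical user-defined function needed by each iteration
#' my_stat <- function(x) median(x)
#'
#' ## Relies on the automatic identification of globals and packages
#' y1 <- foreach(i = 1:3, .combine = c) %dopar% {
#'   my_stat(c(i, 2 * i, 10))
#' }
#'
#' ## Portable variant that should also work with other foreach adapters
#' y2 <- foreach(i = 1:3, .combine = c,
#'               .export = "my_stat", .packages = "stats") %dopar% {
#'   my_stat(c(i, 2 * i, 10))
#' }
#'
#' future::plan(future::sequential)
#' ```
#'
#' Both calls are expected to give the same result under \pkg{doFuture};
#' only the second one states its dependencies explicitly.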
#'
#' @section Load balancing ("chunking"):
#' Whether load balancing ("chunking") should take place or not can be
#' controlled by specifying either argument
#' `.options.future = list(scheduling = <ratio>)` or
#' `.options.future = list(chunk.size = <count>)` to `foreach()`.
#'
#' The value `chunk.size` specifies the average number of elements
#' processed per future ("chunks").
#' If `+Inf`, then all elements are processed in a single future (one worker).
#' If `NULL`, then the `scheduling` value is used.
#'
#' The value `scheduling` specifies the average number of futures
#' ("chunks") that each worker processes.
#' If `0.0`, then a single future is used to process all iterations;
#' none of the other workers are used.
#' If `1.0` or `TRUE`, then one future per worker is used.
#' If `2.0`, then each worker will process two futures (if there are
#' enough iterations).
#' If `+Inf` or `FALSE`, then one future per iteration is used.
#' The default value is `scheduling = 1.0`.
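#'
#' The following sketch shows both options in use. The iteration count,
#' the chunk size, and the two-worker `multisession` plan are arbitrary
#' choices made for illustration:
#'
#' ```r
#' library(foreach)
#' doFuture::registerDoFuture()
#' future::plan(future::multisession, workers = 2)
#'
#' ## On average two iterations per future ("chunk")
#' y <- foreach(i = 1:10, .combine = c,
#'              .options.future = list(chunk.size = 2)) %dopar% {
#'   i^2
#' }
#'
#' ## One future per iteration, i.e. no chunking
#' z <- foreach(i = 1:10, .combine = c,
#'              .options.future = list(scheduling = +Inf)) %dopar% {
#'   i^2
#' }
#'
#' future::plan(future::sequential)
#' ```
#'
#' Chunking only affects how iterations are distributed across futures;
#' the combined result should be the same in both cases.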
#'
#' The name of `foreach()` argument `.options.future` follows the naming
#' conventions of the \pkg{doMC}, \pkg{doSNOW}, and \pkg{doParallel} packages,
#' i.e. `.options.multicore` and `.options.snow`.
#' _This argument should not be mistaken for the \R
#' \link[future:future.options]{options of the future package}_.
#'
#' For backward-compatibility reasons with existing foreach code, one may
#' also use arguments `.options.multicore = list(preschedule = <logical>)` and
#' `.options.snow = list(preschedule = <logical>)` when using \pkg{doFuture}.
#' These correspond to the following `.options.future` settings:
#' `.options.multicore = list(preschedule = TRUE)` is equivalent to
#' `.options.future = list(scheduling = 1.0)` and
#' `.options.multicore = list(preschedule = FALSE)` is equivalent to
#' `.options.future = list(scheduling = +Inf)`,
#' and analogously for `.options.snow`.
#' When it comes to chunking, argument `.options.future` takes precedence
#' over argument `.options.multicore`, which in turn takes precedence over
#' argument `.options.snow`.
#'
#' @section Random Number Generation (RNG):
#' The doFuture adapter registered by `registerDoFuture()` does _not_ itself
#' provide a framework for generating proper random numbers in parallel.
#' This is a deliberate design choice based on how the foreach ecosystem is
#' set up and to align it with other foreach adapters, e.g. **doParallel**.
#' To generate statistically sound parallel RNG, it is recommended to use
#' the \pkg{doRNG} package, where the \code{\link[doRNG:\%dorng\%]{\%dorng\%}}
#' operator is used in place of \code{\link[foreach:\%dopar\%]{\%dopar\%}}.
#' For example,
#'
#' ```r
#' y <- foreach(i = 1:3) %dorng% {
#'   rnorm(1)
#' }
#' ```
#'
#' This works because \pkg{doRNG} is designed to work with any type of foreach
#' `%dopar%` adapter including the one provided by \pkg{doFuture}.
#'
#' If you forget to use `%dorng%` instead of `%dopar%` when the foreach
#' iteration generates random numbers, \pkg{doFuture} will detect the
#' mistake and produce an informative warning.
#'
#' @section For package developers:
#' Please refrain from modifying the foreach backend inside your package or
#' functions, i.e. do not call any `registerNnn()` in your code. Instead,
#' leave the control on what backend to use to the end user. This idea is
#' part of the core philosophy of the \pkg{foreach} framework.
#'
#' However, if you think it necessary to register the \pkg{doFuture} backend
#' in a function, please make sure to undo your changes when exiting the
#' function. This can be done using:
#'
#' \preformatted{
#'   oldDoPar <- registerDoFuture()
#'   on.exit(with(oldDoPar, foreach::setDoPar(fun=fun, data=data, info=info)), add = TRUE)
#'   [...]
#' }
#'
#' This is important, because the end-user might have already registered a
#' foreach backend elsewhere for other purposes and will most likely not know
#' that calling your function will break their setup.
#' _Remember, your package and its functions might be used in a greater
#' context where multiple packages and functions are involved and those might
#' also rely on the foreach framework, so it is important to avoid stepping on
#' others' toes._
#'
#' @section Reporting on progress:
#' How to report on progress is a frequently asked question, especially
#' in long-running tasks and parallel processing. The **foreach**
#' framework does _not_ have a built-in mechanism for progress
#' reporting(*).
#'
#' When using **doFuture**, and the Futureverse in general, for
#' processing, the **progressr** package can be used to signal progress
#' updates in a near-live fashion. There is no special argument related to
#' `foreach()` or **doFuture** to achieve this. Instead, one calls
#' a so-called "progressor" function within each iteration. See
#' the [**progressr**](https://cran.r-project.org/package=progressr)
#' package and its `vignette(package = "progressr")` for examples.
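#'
#' A minimal sketch of this pattern, assuming the **progressr** package is
#' installed, could look as follows. The input vector `xs` and the
#' two-worker `multisession` plan are illustrative assumptions:
#'
#' ```r
#' library(foreach)
#' doFuture::registerDoFuture()
#' future::plan(future::multisession, workers = 2)
#'
#' xs <- 1:5
#' progressr::with_progress({
#'   ## One progress step per element of 'xs'
#'   p <- progressr::progressor(along = xs)
#'   y <- foreach(x = xs, .combine = c) %dopar% {
#'     p()  # signal that one more iteration has been processed
#'     sqrt(x)
#'   }
#' })
#'
#' future::plan(future::sequential)
#' ```
#'
#' How progress is displayed is left to the end user, who controls it via
#' the **progressr** handlers; the code above only signals the updates.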
#'
#' (*) The legacy **doSNOW** package uses a special `foreach()` argument
#' `.options.snow$progress` that can be used to make a progress update
#' each time results from a parallel worker are returned. This approach
#' is limited by how chunking works, requires the developer to set that
#' argument, and makes the code incompatible with foreach adapters
#' registered by other **doNnn** packages.
#'
#' @return
#' `registerDoFuture()` returns, invisibly, the previously registered
#' foreach `%dopar%` backend.
#'
#' @example incl/doFuture.R
#'
#' @importFrom future nbrOfWorkers
#' @importFrom foreach setDoPar
#' @importFrom utils packageVersion
#' @export
#' @keywords utilities
registerDoFuture <- function() {  #nolint
  info <- function(data, item) {
    switch(item,
      name = "doFuture",
      version = packageVersion("doFuture"),
      workers = nbrOfWorkers(),
    )
  }

  ## Tell doRNG (>= 1.8.2) to not check the RNG type
  value <- getOption("doRNG.rng_change_warning_skip")
  if (!isTRUE(value)) {
    if (isFALSE(value)) {
      warning("doRNG option 'doRNG.rng_change_warning_skip' was set to FALSE, which was overridden by doFuture::registerDoFuture()")
      value <- NULL
    }
    ## Append to existing character vector, if any
    value <- unique(c(value, "doFuture"))
    options(doRNG.rng_change_warning_skip = value)
  }

  ## WORKAROUND:
  ## Until https://github.com/RevolutionAnalytics/foreach/issues/19
  ## is supported. /HB 2020-12-28
  oldDoPar <- .getDoPar()

  setDoPar(doFuture, info = info)

  invisible(oldDoPar)
}

.getDoPar <- function() {
  ns <- getNamespace("foreach")
  .foreachGlobals <- get(".foreachGlobals", envir = ns)
  if (exists("fun", envir = .foreachGlobals, inherits = FALSE)) {
    structure(list(
      fun = .foreachGlobals$fun,
      data = .foreachGlobals$data,
      info = .foreachGlobals$info
    ), class = "DoPar")
  } else {
    structure(list(
      fun = get("doSEQ", mode = "function", envir = ns),
      data = NULL,
      info = NULL
    ), class = c("DoPar", "DoSeq"))
  }
}