BENCHMARKING: Record timing and memory stats for the various steps in futures #59

Open
HenrikBengtsson opened this issue Mar 14, 2016 · 5 comments

Comments

@HenrikBengtsson (Owner)

Record timing and memory stats for the various steps in futures, e.g. creation, identification of global variables, exporting globals, launching the future, time for the future to complete, collection of values/exceptions, etc. This should be done (optionally?) for all types of futures.
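A minimal sketch, assuming a multisession plan, of how such timings can be collected manually today; only the coarse steps visible from the outside can be measured without API support:

```r
## Rough manual timing - not the proposed built-in benchmarking API.
library(future)
plan(multisession, workers = 2)

t0 <- proc.time()
f <- future({ sum(rnorm(1e7)) })   # create + find/export globals + launch
t1 <- proc.time()
v <- value(f)                      # wait for completion + collect value
t2 <- proc.time()

cat("create + launch:   ", (t1 - t0)[["elapsed"]], "s\n")
cat("resolve + collect: ", (t2 - t1)[["elapsed"]], "s\n")

plan(sequential)
```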

@HenrikBengtsson HenrikBengtsson changed the title BENCHMARKING: Record time stats for the various steps in futures BENCHMARKING: Record timing stats for the various steps in futures Mar 14, 2016
@HenrikBengtsson HenrikBengtsson changed the title BENCHMARKING: Record timing stats for the various steps in futures BENCHMARKING: Record timing and memory stats for the various steps in futures Mar 27, 2016
@HenrikBengtsson (Owner, Author)

For memory profiling, we have utils::Rprofmem(), which logs all memory allocations done by allocateVector() (part of the native R API). Note that the information logged by Rprofmem() could be improved, but that would require updates to core R; cf. HenrikBengtsson/Wishlist-for-R#25
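A minimal sketch of using utils::Rprofmem() this way (it requires R built with memory-profiling support, which is the case for the CRAN binaries):

```r
## Log allocations made while a sequential future is created and resolved.
library(future)
plan(sequential)

utils::Rprofmem("future-allocs.out")
f <- future({ x <- rnorm(1e6); sum(x) })
v <- value(f)
utils::Rprofmem(NULL)                  # stop logging

## Each line records an allocation size (in bytes) plus the call stack.
head(readLines("future-allocs.out"))
```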

@HenrikBengtsson (Owner, Author)

See also snow::snow.time().
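For reference, a minimal sketch of what snow::snow.time() reports for a SOCK cluster (send, receive, and per-node compute times):

```r
library(snow)
cl <- makeSOCKcluster(2)
tm <- snow.time(clusterApply(cl, 1:4, function(i) Sys.sleep(0.2)))
print(tm)        # elapsed, send/receive, and per-node compute times
plot(tm)         # Gantt-style plot of master/worker activity
stopCluster(cl)
```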

@HenrikBengtsson (Owner, Author) commented Dec 13, 2017

"Note to self": This will most likely require an update to the Future API. More precisely, a backend needs to return not only the value but also other information. Currently, backends return the value (or errors) "as is". In order to return other information, this needs to be updated, e.g. list(value = value, benchmarks = benchmarks).

UPDATE 2018-12-27: This part was resolved in future 1.8.0 (2018-04-08) where FutureResult was introduced.
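A conceptual sketch only; the field names below are illustrative and are not claimed to match the actual FutureResult class. The point is that a backend returns a structured object instead of the bare value, leaving room for benchmark data alongside it:

```r
## Hypothetical structured return value from a backend (illustrative fields).
make_result <- function(value, started, finished, bytes_exported = NA_real_) {
  list(
    value      = value,                  # value of the future expression
    benchmarks = list(                   # hypothetical timing/memory fields
      started        = started,
      finished       = finished,
      bytes_exported = bytes_exported
    )
  )
}

started  <- Sys.time()
value    <- sum(rnorm(1e5))
finished <- Sys.time()
str(make_result(value, started, finished, bytes_exported = 8e5))
```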

@sethberry

Hi Henrik - Curious if you have done any benchmarking on what the memory usage is when multiple R sessions are generated (i.e., multisession) using a future? Do the new R sessions get new RAM allocations in Windows, or are they constrained by the initial R session's RAM allocation? I have used tcltk::tclTaskSchedule in the past to do similar non-blocking parallelization and ran into this RAM issue. Just curious if you've run across this at all? Any insight you might have would be greatly appreciated. Your package looks like it might be better than going the task scheduler route. Thx, Seth

@HenrikBengtsson (Owner, Author)

Hi. When using multisession, a new R session is launched in the background (basically as if you'd started another R session manually). If you launch a vanilla R session, you can use the Windows Task Manager to see how much memory it consumes. That'll be your lower-bound memory consumption per background worker. Then, if your parallel code uses functions in packages, those packages will be loaded/attached as well, which consumes additional memory (look at Task Manager) - again, per worker.

On top of that, you'll find that the "input data" (arguments and global variables) that the future expression needs will be exported to the workers, which adds to the memory usage. Because multisession workers live over multiple futures, that is, they don't shut down immediately after a future is resolved, any packages loaded will stay loaded in those workers. However, input data and other created objects will be erased and garbage collected as soon as each worker is done with a future - that helps keep the memory usage down.
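A minimal sketch (not from the original discussion) of one way to peek at a worker's own memory use; the code inside the future runs in the background R session, so gc() there reports that worker's footprint rather than the master's:

```r
library(future)
plan(multisession, workers = 2)

f <- future({
  x <- rnorm(1e6)   # allocate some data in the worker
  gc()              # gc() output reflects the worker's R session
})
value(f)            # memory summary of that background session

plan(sequential)    # shut down the background workers again
```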

As a rule of thumb, there is no magic parallelization method/framework in R that is more memory efficient than the others. I often assume they all use roughly the same amount. Making sure to rm() large objects that are no longer needed and avoiding coercions is often the best way to keep memory usage down - regardless of whether you process sequentially or in parallel.

Also, some people argue that forked processes (used by multicore futures, mclapply(), ... - so not on Windows) may consume less memory because of the "shared memory" property of process forking. However, it has been shown/mentioned several times that R's garbage collector can really mess this up - if the garbage collector starts running in one of the forked child processes, or in the master process, then the originally shared memory can no longer be shared and the operating system starts copying memory blocks into each child process. Since the garbage collector runs whenever it wants to, there is no simple way to avoid this.

Also, just in case your tcltk::tclTaskSchedule approach relied on it: in R (< 3.3.2) there was an inefficiency in the parallel package (used by multisession futures) causing the workers to hold on to results from previous calls. For large results, that meant each worker consumed twice the amount of memory really needed. We have since fixed that (HenrikBengtsson/Wishlist-for-R#27).

Hope this helps.
