furrr functions within functions #26

Closed
Fritzimponis opened this issue Jul 29, 2018 · 6 comments

@Fritzimponis

Hi Davis,

Apologies for the question; I am clearly missing something here, and I would appreciate your help in understanding how furrr functions behave when called within other functions.

The code below fails, and I understand that this is the expected behavior, since y is not listed in globals:

  x <- c(1,2)
  y <- 2
  future_map(.x = x, .f = ~ .x + y, .options = future_options(globals = "x"))

However, the example below works fine, and I don't understand why all the objects defined within a function are essentially deemed to be "globals".

test_fn <- function() {
  x <- c(1,2)
  y <- 2
  future_map(.x = x, .f = ~ .x + y, .options = future_options(globals = "x"))
}

test_fn()

The problem I am facing is that I may have sizable objects in my function, but I don't want them exported to every worker, as that materially degrades performance.

@DavisVaughan
Owner

First off, are you on Windows or Mac or Linux?

What plan() are you setting? multicore or multiprocess or what?

@Fritzimponis
Author

Apologies, I should have specified those already.

I am on Windows and use plan(multisession)

R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] furrr_0.1.0 future_1.8.1

loaded via a namespace (and not attached):
[1] compiler_3.4.4 magrittr_1.5 parallel_3.4.4 tools_3.4.4 listenv_0.7.0 yaml_2.1.18 codetools_0.2-15
[8] digest_0.6.15 globals_0.11.0 rlang_0.2.0 purrr_0.2.4

@DavisVaughan
Owner

@HenrikBengtsson, I thought you might be interested in this too as it affects future.apply.

library(future.apply)
plan(multisession)

test_fn <- function() {
  x <- c(1,2)
  y <- 2
  future_lapply(
    X = x, 
    FUN = function(.x) .x + y,  # <- this y should not be exported?
    future.globals = "x"
  )
}

test_fn()
#> [[1]]
#> [1] 3
#> 
#> [[2]]
#> [1] 4


# Where is that y?

test_fn_envs <- function() {
  x <- c(1,2)
  y <- 2
  
  # return the envs of those processes
  future_lapply(
    X = x, 
    FUN = function(.x) {
      .x + y
      env <- environment()
      parent_env <- parent.env(env)
      return(list(this_env = env, parent_env = parent_env))
    }, 
    future.globals = "x"
  )
}

ret <- test_fn_envs()

names(ret[[1]]$this_env)
#> [1] "parent_env" "env"        ".x"
names(ret[[1]]$parent_env) # <- here is the y, function scoping rules find it in FUN
#> [1] "y" "x"

Created on 2018-09-01 by the reprex package (v0.2.0).

@DavisVaughan
Owner

My thoughts on this are that the anonymous FUN's environment is the enclosing environment that contains the y variable. Presumably FUN holds onto it as it gets exported. I don't think this is altered at any time in future.apply or future, so it gets passed through.
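
A minimal base-R sketch of this behavior, assuming that sending a function to a worker involves serializing it (the thread does not spell out the exact mechanism). The names make_fun and f_copy are hypothetical and used only for illustration. It shows that a closure created inside a function carries that function's whole environment through a serialize/unserialize round trip, including objects the closure never uses.

```r
# Hypothetical illustration: a closure created inside a function keeps
# that function's environment as its enclosing environment.
make_fun <- function() {
  y <- 2
  really_large_thing <- rnorm(1e6)  # never used by the closure below
  function(.x) .x + y
}

f <- make_fun()

# Everything defined in make_fun() is reachable from the closure
ls(environment(f))

# Round trip through serialization, as an approximation of what a
# separate R session would receive
f_copy <- unserialize(serialize(f, connection = NULL))
ls(environment(f_copy))
f_copy(1)

# The serialized payload includes the unused large object
payload_bytes <- length(serialize(f, connection = NULL))
payload_bytes
```

The payload size is dominated by really_large_thing even though the closure only reads y, which is consistent with the slowdown described in this issue.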

@DavisVaughan
Owner

DavisVaughan commented Sep 1, 2018

This is definitely bad because @vrontosc's concerns are valid: any object in that function will be exported, whether it is used in FUN or not.

library(future.apply)
plan(multisession)

test_fn_envs <- function() {
  x <- c(1,2)
  y <- 2
  really_large_thing <- 1
  
  # return the envs of those processes
  future_lapply(
    X = x, 
    FUN = function(.x) {
      .x + y
      env <- environment()
      parent_env <- parent.env(env)
      return(list(this_env = env, parent_env = parent_env))
    }, 
    future.globals = "x"
  )
}

ret <- test_fn_envs()

names(ret[[1]]$parent_env) # <- really_large_thing came through
#> [1] "really_large_thing" "y"                  "x"

Created on 2018-09-01 by the reprex package (v0.2.0).

@gacolitti

gacolitti commented Dec 18, 2019

I am also trying to use furrr::future_map() inside another function on Windows with plan(multiprocess). I thought I was doing something wrong because it was much slower than purrr::map(), but after reading this post I think it might be working as intended, especially since running future_map() outside of the function was much faster.

Are there any updates on this issue? Can you suggest a workaround in the meantime for those of us that want to run things in parallel inside our own function with furrr?
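
One possible workaround, sketched in base R and not confirmed by the maintainers in this thread: define the worker function at top level and pass the values it needs as explicit arguments, so that its enclosing environment is not the calling function's frame. The helper names add_y and make_closure are hypothetical, and lapply() stands in for the parallel map.

```r
# Hypothetical helper defined at top level: its enclosing environment is
# the top-level environment, not the frame of the function that calls it.
add_y <- function(.x, y) .x + y

test_fn <- function() {
  x <- c(1, 2)
  y <- 2
  really_large_thing <- rnorm(1e6)  # stays local; add_y cannot see it

  # lapply() stands in for a parallel map; y is passed explicitly
  lapply(x, add_y, y = y)
}

res <- test_fn()

# For comparison: the same logic as a closure created inside a function
make_closure <- function() {
  y <- 2
  really_large_thing <- rnorm(1e6)
  function(.x) .x + y
}

# Serialized size of each function, as a rough proxy for export cost
size_top_level <- length(serialize(add_y, connection = NULL))
size_closure   <- length(serialize(make_closure(), connection = NULL))
c(top_level = size_top_level, closure = size_closure)
```

With furrr, the analogous call would presumably be future_map(x, add_y, y = y), relying on future_map() forwarding extra arguments to .f; that variant is untested here, so it is worth measuring before relying on it.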
