Multiple core execution doesn't update environments #107

Closed
markromanmiller opened this issue Apr 22, 2022 · 7 comments

Comments

@markromanmiller
Contributor

It appears that when code is run across multiple cores, the environments stored in the `.results` column (one per universe) aren't updated properly. I would expect both of these methods to give equivalent results:

library(tidyverse)
library(multiverse)
#> Loading required package: knitr
#> 
#> Attaching package: 'multiverse'
#> The following object is masked from 'package:tidyr':
#> 
#>     expand

M_cores_1 <- multiverse()

inside(M_cores_1, {
  variable_inside_env <- branch(
    var_num,
    "var1" ~ 1,
    "var2" ~ 2,
    "var3" ~ 3
  )
})

execute_multiverse(M_cores_1, cores = 1)

multiverse_results_1 <- expand(M_cores_1) %>%
  mutate(
    environment_variables = map_dbl(.results, ~length(ls(envir = .x))),
    environment_names = map_chr(.results, ~paste0(ls(envir = .x), collapse = ", "))
  ) %>%
  select(-.parameter_assignment, -.code)

print(multiverse_results_1)
#> # A tibble: 3 × 5
#>   .universe var_num .results environment_variables environment_names  
#>       <int> <chr>   <list>                   <dbl> <chr>              
#> 1         1 var1    <env>                        1 variable_inside_env
#> 2         2 var2    <env>                        1 variable_inside_env
#> 3         3 var3    <env>                        1 variable_inside_env

# Multiple cores

M_cores_2 <- multiverse()

inside(M_cores_2, {
  variable_inside_env <- branch(
    var_num,
    "var1" ~ 1,
    "var2" ~ 2,
    "var3" ~ 3
  )
})

execute_multiverse(M_cores_2, cores = 2)

multiverse_results_2 <- expand(M_cores_2) %>%
  mutate(
    environment_variables = map_dbl(.results, ~length(ls(envir = .x))),
    environment_names = map_chr(.results, ~paste0(ls(envir = .x), collapse = ", "))
  ) %>%
  select(-.parameter_assignment, -.code)

print(multiverse_results_2)
#> # A tibble: 3 × 5
#>   .universe var_num .results environment_variables environment_names    
#>       <int> <chr>   <list>                   <dbl> <chr>                
#> 1         1 var1    <env>                        1 "variable_inside_env"
#> 2         2 var2    <env>                        0 ""                   
#> 3         3 var3    <env>                        0 ""

Created on 2022-04-22 by the reprex package (v2.0.1)

I'm running R 4.1.2 with RStudio 2021.09.2 on Ubuntu 20.04 LTS with 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz × 8. I'm not sure if I'm missing any software libraries to enable this capability.

@abhsarma
Collaborator

Thanks for pointing this out! Yes, they should have the same output. It seems like the multi-core instance is either not executing all the universes or not executing them in the correct environment. I'll take a look.

@abhsarma
Collaborator

abhsarma commented Apr 22, 2022

This seems to be an issue with how we use environments and parallel::mcmapply, since the code works fine with both mapply and future.apply::future_mapply.
I'm trying to figure out an alternative solution that supports multiple cores, perhaps using the future package, which has been a long-standing discussion (#54, #89). However, this likely means that we can't use pbmcapply either, so I'll look for an alternative implementation of progress bars.

@abhsarma
Collaborator

Actually, this problem seems to exist for any multicore / multisession library. The problem probably lies somewhere in the use of environments in parallel, but I can't seem to figure out what it is...

@markromanmiller
Contributor Author

I'm going to hazard a guess that mc*apply functions are designed to return a value, not to carry over the side effects of running code; after all, how could one tell what those side effects are?

One approach could be to require the user to be specific about which objects they want returned: if mc*apply functions return one object per function call, perhaps that object could be the environment? I don't know, I'm spitballing here. I do currently expect to use cluster computing with multiverse in the next month or two, so I have some time to put into this feature if the need arises.
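A minimal sketch of the "be specific about what to return" idea, assuming a hypothetical helper `run_and_keep()` and a user-supplied `keep` vector of object names (neither is part of multiverse): each worker builds a fresh environment, runs one universe's code in it, and returns only the requested objects as a named list. Plain lists survive the trip back from a forked worker, so nothing depends on side effects.

```r
library(parallel)

# Hypothetical helper (not part of multiverse): evaluate one universe's code
# in a fresh environment and return only the objects named in `keep`.
run_and_keep <- function(code, keep) {
  env <- new.env(parent = globalenv())
  eval(code, envir = env)
  mget(keep, envir = env)
}

code_list <- list(
  quote({ x <- 1; y <- x * 10 }),
  quote({ x <- 2; y <- x * 10 }),
  quote({ x <- 3; y <- x * 10 })
)

# mclapply() forks on Unix-alikes; Windows only supports mc.cores = 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# `keep` is forwarded to run_and_keep() through mclapply()'s `...`.
kept <- mclapply(code_list, run_and_keep, keep = "y", mc.cores = n_cores)

sapply(kept, `[[`, "y")  # 10 20 30
```

The trade-off is that intermediate objects (here `x`) are discarded, which keeps the returned payload small but means users must list everything they later want to inspect.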

@abhsarma
Collaborator

abhsarma commented Apr 23, 2022

tl;dr: your approach of rewriting the environments makes sense. Below I describe *what I think* is going wrong, but I'll see if @mjskay has any alternative suggestions.


Interesting, so it seems like mc*apply functions do something unexpected with environments:

library(rlang)
library(purrr)
library(parallel) # provides mcmapply()

env_list = list(new.env(), new.env(), new.env(), new.env()) # creates four new environments, with the global env as the parent
code_list = list(expr({a = 111}), expr({b = 112}), expr({c = 113}), expr({d = 114})) # example code, one expression per environment

res = mapply(eval, expr = code_list, envir = env_list) # executes the code in each environment

map(env_list, env_names) # returns the names of the variables defined in each environment

env_list_2 = list(new.env(), new.env(), new.env(), new.env())

# mc.cores defaults to getOption("mc.cores", 2L); values above 1 require a Unix-alike OS
res = mcmapply(eval, expr = code_list, envir = env_list_2)

map(env_list_2, env_names) # returns `character(0)` for every environment

On further inspection (based on the approach you described), it seems like mc*apply functions do not return the same environments that were initially used, but rather return entirely new environments:

eval_in_env = function(c, e) {
  eval(expr = c, envir = e)
  e
}

env_mapply = mapply(eval_in_env, code_list, env_list)
map2(env_mapply, env_list, identical) # returns TRUE for all


env_mcmapply = mcmapply(eval_in_env, code_list, env_list)
map2(env_mcmapply, env_list, identical) # returns FALSE for all

This second issue is why the output differs: the environments in which mc*apply actually executes the code are not stored anywhere. This makes me wonder if we should just use mc*apply instead of mapply (even for single-core operations) and change how we deal with environments, rather than having two separate pathways...
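A minimal sketch of "rewriting the environments": keep calling mcmapply(), but overwrite the stored list with the environments the workers hand back, so the parent holds the copies the code actually ran in. `eval_in_env()` mirrors the helper from the snippet above (arguments renamed for readability); the three example expressions are illustrative only.

```r
library(parallel)

# Evaluate `code` in `env`, then return the environment itself.
eval_in_env <- function(code, env) {
  eval(expr = code, envir = env)
  env
}

code_list <- list(quote({a <- 111}), quote({b <- 112}), quote({c <- 113}))
env_list <- replicate(3, new.env(parent = globalenv()), simplify = FALSE)

# mcmapply() forks on Unix-alikes; Windows only supports mc.cores = 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Replace the original environments with the (copied) ones the workers return.
env_list <- mcmapply(eval_in_env, code_list, env_list,
                     SIMPLIFY = FALSE, mc.cores = n_cores)

lapply(env_list, ls)  # "a", "b", "c"
```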

@mjskay
Contributor

mjskay commented Apr 23, 2022

Yeah, I don't know exactly how R environments work with threads or multiple processes, but I would guess that they can't be shared across them. If so, the parallel versions of apply presumably copy environment contents into a new environment on a separate thread or process and then copy the results back upon completion, which means they cannot directly modify environments in the original thread.
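One concrete piece of this can be checked directly: values sent back from parallel's forked workers (and socket clusters) are serialized, and serializing an environment produces a new environment with the same contents rather than the original object. This is consistent with the `identical()` results above.

```r
e <- new.env(parent = globalenv())
assign("a", 111, envir = e)

# Round-trip through serialization, as results from parallel workers are.
e_copy <- unserialize(serialize(e, connection = NULL))

identical(e, e_copy)            # FALSE: a different environment object
get("a", envir = e_copy)        # 111: same contents
assign("a", 0, envir = e_copy)  # modifying the copy...
get("a", envir = e)             # ...leaves the original at 111
```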

> This second issue is why the output differs, because the actual environments in which mc*apply is executing the code are not stored anywhere. This makes me wonder if we should just use mc*apply instead of mapply (even for single core operations) and change how we deal with environments instead of having two separate pathways...

Having a single pathway makes sense. Though, did we end up implementing the crazy tree-of-environments approach or not? Would that need to change for a multithreaded approach?

If we are going to change this around, I would suggest moving to {future} at the same time as this should make it easier for users doing this on a cluster with custom setups.

@abhsarma
Collaborator

abhsarma commented Apr 23, 2022

We actually have the tree-of-environments implemented (and I do remember the parallel apply functions working at some point), but I don't think it should be an issue.
I think simply creating the environments at execution time, instead of creating them a priori, makes sense here.
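A minimal sketch of creating environments at execution time with a single code path for one or many cores. `run_universe()` and `execute_universes()` are hypothetical names for illustration, not multiverse functions; the environment is created inside whichever process runs the code and then returned.

```r
library(parallel)

# Hypothetical: create the universe's environment inside the worker,
# run the code there, and return the environment.
run_universe <- function(code) {
  env <- new.env(parent = globalenv())
  eval(code, envir = env)
  env
}

# Hypothetical single pathway: the same worker function is used for
# sequential and forked execution (forking is unavailable on Windows).
execute_universes <- function(code_list, cores = 1L) {
  if (cores > 1L && .Platform$OS.type != "windows") {
    mclapply(code_list, run_universe, mc.cores = cores)
  } else {
    lapply(code_list, run_universe)
  }
}

code_list <- list(quote({a <- 111}), quote({b <- 112}), quote({c <- 113}))
res_serial   <- execute_universes(code_list, cores = 1L)
res_parallel <- execute_universes(code_list, cores = 2L)

lapply(res_parallel, ls)  # "a", "b", "c", same as lapply(res_serial, ls)
```

This works cleanly because the global environment is serialized by reference, so the returned environments still have the global environment as their parent. With a tree of environments, a non-global parent would be serialized along with each result, so the returned environments would no longer point at the original parent environments in the main process.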

I'll write some tests for checking parallel execution.

3 participants