Support parallel execution of universes #54

Closed
mjskay opened this issue Mar 4, 2020 · 7 comments

@mjskay
Contributor

mjskay commented Mar 4, 2020

probably using future? https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html

@abhsarma abhsarma added this to the pre-CHI beta release milestone Jul 1, 2020
@abhsarma
Collaborator

abhsarma commented Jul 3, 2020

  • compare the execution speeds of the two different approaches on a multiverse with thousands of parameters
  • allow users to set the number of cores via options (detect_cores)
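
A rough sketch of how a user-settable core count could be resolved. The option name "multiverse.cores" and the helper resolve_cores() are hypothetical, made up for illustration; only parallel::detectCores() and getOption() are real base R calls.

```r
# Hypothetical helper: read the worker count from an option,
# falling back to the number of detected cores.
resolve_cores <- function() {
  detected <- parallel::detectCores()
  if (is.na(detected)) detected <- 1L   # detectCores() can return NA

  # "multiverse.cores" is an assumed option name, not part of any package
  cores <- as.integer(getOption("multiverse.cores", default = detected))
  stopifnot(length(cores) == 1L, !is.na(cores), cores >= 1L)

  # Never ask for more workers than the machine reports
  min(cores, detected)
}

# Usage: options(multiverse.cores = 2); resolve_cores()
```

Capping at the detected count is one possible design choice; a package could just as well trust the user's value.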

@abhsarma
Collaborator

abhsarma commented Aug 12, 2020

So I've played around with future for a while, and it was incredibly frustrating. It cannot find functions from loaded packages, which can be worked around (not great, but whatever).
It also cannot seem to find variables declared in the global environment, which is incredibly annoying, and I could not figure out a clean way to address that. The workaround I found is to load the dataset directly inside the multiverse code.

Say we want to execute all the universes for this:

library(dplyr)
library(multiverse)

M = multiverse()

data("durante")

inside(M, {
  df <- durante  %>%
    mutate( ComputedCycleLength = StartDateofLastPeriod - StartDateofPeriodBeforeLast )  %>%
    dplyr::filter( branch(cycle_length,
        "cl_option1" ~ TRUE,
        "cl_option2" ~ ComputedCycleLength > 25 & ComputedCycleLength < 35,
        "cl_option3" ~ ReportedCycleLength > 25 & ReportedCycleLength < 35
    )) %>%
    mutate(NextMenstrualOnset = branch(menstrual_calculation,
        "mc_option1" %when% (cycle_length != "cl_option3") ~ StartDateofLastPeriod + ComputedCycleLength,
        "mc_option2" %when% (cycle_length != "cl_option2") ~ StartDateofLastPeriod + ReportedCycleLength,
        "mc_option3" ~ StartDateNext)
    )
})

The following implementation fails because the workers cannot find the dataset durante. (Interestingly enough, if I change to plan(multicore), it works just fine.)

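library(future)
library(furrr)

# Pull the per-universe code and environments out of the multiverse object
m_diction = attr(M, "multiverse")$multiverse_diction

.to_exec = seq_len(m_diction$size())
.m_list <- m_diction$as_list()[[1]]

.code_list = lapply(.m_list, `[[`, "code")
.env_list = lapply(.m_list, `[[`, "env")

# Run each universe on a parallel worker; dplyr is attached on the workers
plan(multiprocess)
future_map2(.code_list, .env_list, execute_code_from_universe, .options = future_options(packages = "dplyr"))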

The solution to this problem appears not to be passing the variable name, as in future_options(globals = structure(TRUE, add = "durante")), which is described here: https://cran.r-project.org/web/packages/future/vignettes/future-4-issues.html
but rather something like this:

M = multiverse()

inside(M, {
  data("durante")

  df <- durante ...
})

With that change, the call to future_map2 as written out above works.

@mjskay
Contributor Author

mjskay commented Aug 12, 2020

Yeah, that's because "multicore" uses forking instead of a completely new process: a forked worker inherits a copy of the parent session's memory, including its global environment, while a fresh process starts empty. For any parallelization approach except forking (clusters, multiple processes, etc.), I assume we will eventually have to address this problem whether we use future or not. It's just a fundamental limitation of the underlying mechanism of parallelization.
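
A minimal sketch of that limitation, using base R's parallel package instead of future (a swapped-in illustration, not how multiverse executes universes). The names my_data and code are hypothetical stand-ins for a dataset and for stored universe code.

```r
library(parallel)

# Lives only in the main session
my_data <- data.frame(x = 1:10)

# Code kept as an unevaluated expression, similar in spirit to universe code
code <- quote(sum(my_data$x))

# PSOCK cluster: fresh R processes that share nothing with this session
cl <- makeCluster(2)

run_code <- function(i, code) eval(code, envir = globalenv())

# Without exporting, the workers have no object called my_data
res_missing <- tryCatch(
  parLapply(cl, 1:2, run_code, code = code),
  error = function(e) conditionMessage(e)
)

# Copy the object into each worker's global environment, then retry
clusterExport(cl, "my_data", envir = environment())
res_ok <- parLapply(cl, 1:2, run_code, code = code)

stopCluster(cl)
```

With forking (for example parallel::mclapply on Unix-alikes), the export step is unnecessary because the child process starts as a copy of the parent.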

I think we could make this a caveat for now and later try to pull stuff in automatically by analyzing the multiverse code with the globals package, the way future examines other code.
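
A rough sketch of what "pulling stuff in automatically" could look like. This uses base R's all.vars() as a simple stand-in for the globals package, so it is only a heuristic; find_candidate_globals(), demo_env, and durante_demo are hypothetical names for illustration.

```r
# Heuristic: symbols used as variables in `code` that are bound in `envir`.
# Column names that collide with objects in `envir` would be false positives.
find_candidate_globals <- function(code, envir = globalenv()) {
  vars <- all.vars(code)  # variable-like symbols; function names are excluded
  Filter(function(v) exists(v, envir = envir, inherits = FALSE), vars)
}

# Stand-in for the user's workspace
demo_env <- new.env()
assign("durante_demo", data.frame(cycle = c(24, 28, 36)), envir = demo_env)

# Stand-in for the code of one universe
universe_code <- quote({
  df <- durante_demo
  keep <- df$cycle > 25 & df$cycle < 35
  n_kept <- sum(keep)
})

candidates <- find_candidate_globals(universe_code, demo_env)
```

The resulting names could then be handed to whatever export mechanism the parallel backend provides.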

@abhsarma
Collaborator

abhsarma commented Aug 12, 2020

EDIT: this call would also work apparently: future_map2(.code_list, .env_list, execute_code_from_universe, .options = future_options(packages = "dplyr", globals = "durante"))

However, identifying global variables that are being used in the code is much more complicated? Or do we just chuck everything in?
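
For reference, a small sketch of the explicit-globals route using future directly rather than furrr. It assumes the future package is installed; my_data and code are hypothetical stand-ins, and the globals = structure(TRUE, add = ...) form is the one from the future vignette linked earlier in this thread.

```r
library(future)

plan(multisession, workers = 2)  # separate R processes, no forking

my_data <- data.frame(x = 1:10)
code <- quote(sum(my_data$x))    # my_data is hidden inside a stored expression

# Automatic detection sees `code` but is not expected to look inside it
f_missing <- future(eval(code))
res_missing <- tryCatch(value(f_missing), error = function(e) conditionMessage(e))

# Keep automatic detection (TRUE) and additionally export my_data by name
f_ok <- future(eval(code), globals = structure(TRUE, add = "my_data"))
res_ok <- value(f_ok)

plan(sequential)                 # shut the workers down again
```

Listing names by hand is cheap for one dataset; "chucking everything in" would mean exporting the whole workspace to every worker, which costs serialization time and memory.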

@mjskay
Contributor Author

mjskay commented Aug 12, 2020

I'd say don't worry about it for now - quicker to just write an example explaining the issue in the docs and leave an issue for this post-CHI. Other things are probably higher priority.

@abhsarma
Collaborator

abhsarma commented Aug 12, 2020

Got it, so leave it to the user to write the multiverse code so that they don't run into this problem, and document it properly?

@mjskay
Contributor Author

mjskay commented Aug 12, 2020

Yeah unless you feel like futzing with the globals package and whatnot. But it seems like we have other priorities?
