Support parallel execution of universes #54
So I've played around with future for a while, and it was incredibly frustrating. It cannot find functions from loaded packages, although that can be worked around (not great, but whatever). Say we want to execute all the universes for this:
M = multiverse()
data("durante")
inside(M, {
df <- durante %>%
mutate( ComputedCycleLength = StartDateofLastPeriod - StartDateofPeriodBeforeLast ) %>%
dplyr::filter( branch(cycle_length,
"cl_option1" ~ TRUE,
"cl_option2" ~ ComputedCycleLength > 25 & ComputedCycleLength < 35,
"cl_option3" ~ ReportedCycleLength > 25 & ReportedCycleLength < 35
)) %>%
mutate(NextMenstrualOnset = branch(menstrual_calculation,
"mc_option1" %when% (cycle_length != "cl_option3") ~ StartDateofLastPeriod + ComputedCycleLength,
"mc_option2" %when% (cycle_length != "cl_option2") ~ StartDateofLastPeriod + ReportedCycleLength,
"mc_option3" ~ StartDateNext)
)
})
The following implementation fails because it cannot find the dataset:
m_diction = attr(M, "multiverse")$multiverse_diction
.to_exec = seq_len(m_diction$size())
.m_list <- m_diction$as_list()[[1]]
.code_list = lapply(.m_list, `[[`, "code")
.env_list = lapply(.m_list, `[[`, "env")
plan(multiprocess)
future_map2(.code_list, .env_list, execute_code_from_universe, .options = future_options(packages = "dplyr"))
The solution to this problem appears not to be to pass the variable name like
M = multiverse()
inside(M, {
data("durante")
df <- durante ...
})
and then the call to
Yeah, that's because "multicore" uses forking instead of a completely new process. For any parallelization approach except forking (clusters, multiple processes, etc.) I assume we will eventually have to address this problem whether we use future or not. It's just a fundamental limitation of the underlying mechanism of parallelization. I think we could make this a caveat for now and later try to pull stuff in automatically the way future does, by analyzing the multiverse code with the globals package the same way future examines other code.
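To make the fork-versus-fresh-process distinction concrete, here is a minimal sketch using base R's parallel package rather than future (a swap made only to keep the example dependency-free). `durante_stub` is a hypothetical stand-in for the dataset. A PSOCK worker is a brand-new R session, so it does not see objects from the main session until they are exported.

```r
library(parallel)

# Hypothetical stand-in for the `durante` dataset (illustration only).
durante_stub <- data.frame(cycle = c(27, 31, 40))

# A PSOCK cluster starts a fresh R process; nothing is inherited from this session.
cl <- makeCluster(1)

# Before exporting: the worker cannot find the object, so the call errors.
before <- tryCatch(
  clusterEvalQ(cl, nrow(durante_stub))[[1]],
  error = function(e) conditionMessage(e)
)

# Copy the object into the worker's global environment, then retry.
clusterExport(cl, "durante_stub", envir = environment())
after <- clusterEvalQ(cl, nrow(durante_stub))[[1]]

stopCluster(cl)

print(before)  # an error message reporting that the object was not found
print(after)   # the row count computed on the worker
```

With a forked backend the child process shares the parent's memory, which is why the same code finds the dataset there without any export step.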
EDIT: this call would also work apparently:
However, identifying the global variables that are being used in the code is much more complicated? Or do we just chuck everything in?
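One crude way to approach that question, sketched in base R: collect every symbol the universe's code refers to and keep only those actually defined in the environment the multiverse was built in. This is only an approximation of the static analysis the globals package performs. `find_exportable` and `durante_stub` are hypothetical names for illustration, not part of multiverse or future.

```r
# Return the names used in `code` that are defined directly in `env`.
find_exportable <- function(code, env) {
  # Symbols used as variables (function names are not included by default).
  candidates <- all.vars(code)
  # inherits = FALSE: only look in `env` itself, not in attached packages.
  defined <- vapply(candidates, exists, logical(1), envir = env, inherits = FALSE)
  candidates[defined]
}

# Hypothetical environment holding the data the universe code depends on.
env <- new.env()
assign("durante_stub", data.frame(a = 1:3, b = 4:6), envir = env)

# Hypothetical universe code: column names (a, b, diff) look just like globals.
code <- quote({
  df <- transform(durante_stub, diff = b - a)
  subset(df, diff > 2)
})

find_exportable(code, env)
```

The column names `a`, `b` and `diff` show up as candidate symbols too, and are dropped only because nothing by those names is defined in `env`. The `inherits = FALSE` argument also matters: with inheritance enabled, `df` would match the `df` function from the stats package. Both points hint at why doing this properly is more complicated than it first looks.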
I'd say don't worry about it for now. It's quicker to just write an example explaining the issue in the docs and leave an issue for this post-CHI. Other things are probably higher priority.
Got it, so leave it to the user to write the multiverse code so that they don't run into this problem, and document it properly?
Yeah, unless you feel like futzing with the globals package and whatnot. But it seems like we have other priorities?
Probably using future? https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html