Array job support... #256

Open
jgrn307 opened this issue Oct 18, 2018 · 1 comment

Comments

@jgrn307

jgrn307 commented Oct 18, 2018

I thought I'd cross-post a bit from future.batchtools to here (see HenrikBengtsson/future.batchtools#23 for the original request) -- we use SLURM which, like many batch systems, supports "array jobs" -- basically two-level job hierarchies. These are implemented, in part, so batch systems don't get overwhelmed by people submitting thousands of jobs.

My understanding is that future isn't quite there yet when it comes to, e.g., hierarchical looping. To move forward on this, there seems to be an easy fix: from future's point of view, the job x array structure is simply flattened. For example, if you have an array with a maximum of 100 jobs and 250 things to loop through, you just distribute the iterations across 3 jobs (loop ids 1:100, 101:200, 201:250). I would think this "embarrassingly easy" way of handling array jobs could keep the current API and just require backend settings such as array.jobs=TRUE and max.array.jobs=100.
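A minimal sketch of that flattening in base R, to make the arithmetic concrete. `chunk_indices()` is a hypothetical helper (not part of future or future.batchtools), and `max_array_jobs` only mirrors the `max.array.jobs` setting suggested above.

```r
# Hypothetical helper: split n loop iterations into consecutive chunks,
# each holding at most max_array_jobs iterations.
chunk_indices <- function(n, max_array_jobs) {
  idx <- seq_len(n)
  # ceiling(idx / max_array_jobs) assigns each iteration its chunk number
  unname(split(idx, ceiling(idx / max_array_jobs)))
}

chunks <- chunk_indices(250, max_array_jobs = 100)
length(chunks)         # number of chunks (jobs) needed
lapply(chunks, range)  # first and last loop id handled by each chunk
```

Each element of `chunks` could then be handed to one job, with the iterations inside it running as that job's array tasks.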

A more complicated solution would be to begin supporting truly hierarchical looping structures, almost like a nested for loop (outer loop = job, inner loop = array). Of course this is more complicated conceptually.
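To illustrate the nested view, here is a rough base R sketch in which the outer loop stands for the job and the inner loop for the array task within it. The variable names are made up for illustration, and nothing here talks to a scheduler; it only shows how the two levels map back to the flat loop ids.

```r
n <- 250               # total number of iterations
max_array_jobs <- 100  # assumed per-job array limit
n_jobs <- ceiling(n / max_array_jobs)

visited <- integer(0)
for (job in seq_len(n_jobs)) {            # outer loop = job
  # the last job may hold fewer tasks than the limit
  n_tasks <- min(max_array_jobs, n - (job - 1) * max_array_jobs)
  for (task in seq_len(n_tasks)) {        # inner loop = array task
    i <- (job - 1) * max_array_jobs + task  # flat loop id
    visited <- c(visited, i)
  }
}
```

After the loops finish, `visited` contains every flat loop id exactly once, in order.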

In the meantime, we'd love to use future for a problem we're trying to solve (batch processing 1000s of satellite images) but we can't because our HPC limits the number of jobs we can run at once.

@HenrikBengtsson
Owner

Thanks for starting this discussion. This is a complicated design issue, but I definitely have it on my radar (although quite far out).

The concept of array "jobs" will likely be part of APIs outside of the Core Future API (as implemented in the future package). However, we could probably make some changes to the core of futures to make that easier to implement. One thought I have is to support merging/combining multiple Future objects into a single one, e.g.

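library(future)

a <- 42
b <- 3.14
c <- 2.71
# Lazy futures: created here, but not resolved yet
f1 <- future({ 2 * a + b }, lazy = TRUE)
f2 <- future({ c * a }, lazy = TRUE)
# Proposed, not existing: merge() would combine both futures into one
f <- merge(f1, f2)
# The merged value would hold one element per original future
v <- value(f)
v1 <- v[[1]]
v2 <- v[[2]]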

Such merging might be a fundamental construct that would help support array jobs/workers (a concept that belongs to how and where futures are resolved, not what they are).
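As a rough illustration of what such merging could mean, here is a base R sketch that does not use the future package at all. `lazy_task()`, `merge_tasks()` and `resolve()` are hypothetical stand-ins: each task stores an unevaluated expression together with its own snapshot of globals, and a merged task resolves all of them in one go.

```r
# Hypothetical stand-in for a lazy future: an unevaluated expression
# plus the globals it needs, captured when the task is created.
lazy_task <- function(expr, globals) {
  list(expr = substitute(expr), globals = globals)
}

# Hypothetical merge: bundle several tasks into a single unit of work.
merge_tasks <- function(...) {
  list(tasks = list(...))
}

# Resolve a merged task: evaluate each expression against its own globals
# and return one value per original task.
resolve <- function(merged) {
  lapply(merged$tasks, function(task) {
    env <- list2env(task$globals, parent = baseenv())
    eval(task$expr, envir = env)
  })
}

t1 <- lazy_task({ 2 * a + b }, globals = list(a = 42, b = 3.14))
t2 <- lazy_task({ c * a }, globals = list(c = 2.71, a = 42))
v <- resolve(merge_tasks(t1, t2))
v1 <- v[[1]]
v2 <- v[[2]]
```

Because every task keeps its own copy of the globals, this sketch sidesteps the question of what a merged object should do when the copies differ; it simply never shares them.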

One thing that obviously complicates the merging above is how to handle cases where globals (and other state-related dependencies) do not agree across futures. For instance, how should the following be handled:

a <- 42
b <- 3.14
c <- 2.71
f1 <- future({ 2 * a + b }, lazy = TRUE)
# 'a' changes between the two futures, so f1 and f3 disagree on its value
a <- 13
f3 <- future({ c * a }, lazy = TRUE)
f <- merge(f1, f3)
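One way to at least detect such a disagreement is to compare the globals recorded for each future by name. The base R sketch below assumes the globals are available as plain named lists; `conflicting_globals()` is a hypothetical helper, not a function in the future package, and it only reports the conflict rather than resolving it.

```r
# Hypothetical helper: names of globals that both lists define
# but with different values.
conflicting_globals <- function(g1, g2) {
  shared <- intersect(names(g1), names(g2))
  differs <- vapply(
    shared,
    function(nm) !identical(g1[[nm]], g2[[nm]]),
    logical(1)
  )
  shared[differs]
}

g1 <- list(a = 42, b = 3.14)  # globals as seen by f1
g3 <- list(c = 2.71, a = 13)  # globals as seen by f3
conflicts <- conflicting_globals(g1, g3)
```

A merge could then refuse to combine the futures, or keep separate copies of the conflicting globals, whenever `conflicts` is non-empty.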

Just throwing this out there to share that I'm thinking of it but at the same time don't have a solution for it.
