
bpslots to query for availability of nested parallel cores #36

Closed
mtmorgan opened this issue Feb 14, 2014 · 9 comments

Comments

@mtmorgan
Collaborator

From Michael Lang:

Consider the task to align N sequence files (each containing millions of
reads) to a reference genome. We want to perform this in R using
parallelization on a parallel backend with C CPU cores.

If N is large compared to the number of CPU cores C, efficient
parallelization can be achieved by distributing the N sequence files to
the C workers, which will each perform single-threaded alignments.

The use case of interest arises when N is smaller than C. I further assume that
multiple cores are available on a single physical compute node, which I
think is typical these days. Because the alignment algorithm is able to
run multiple parallel threads, efficient parallelization could be
achieved by engaging N workers, each using P parallel threads.
Optimally, P is such that P*N == C, so P could be close to C/N. In
practice, P is limited by the (minimal) number of cores available on a
physical node. The use case could be thought of as "nested parallelization"
(across workers and across threads).

It is a very typical case at our institute and for our collaborators at
the University of Basel: N is often below 20 (the number of samples in a
sequencing experiment, e.g. 3 genotypes times 3 replicates), and C is
well over 100 (and can be as high as 8000 on the University cluster,
with many 64-core nodes). We think that splitting one sequence file into
smaller chunks (thereby increasing N) would not be a feasible solution
(due to slow I/O performance).

I was hoping that there would be an abstraction allowing me to write
parallel code that would be independent of the parallel backend. I am
very excited about BiocParallel and bpworkers(), which is a great way
(for QuasR) to learn about the "first level" of the nested
parallelization, in a manner that is independent of the parallel
backend. What I am missing is a standardized way to also query
BiocParallel for the second level of the nested parallelization (the
number of parallel threads that can run on one worker), e.g. using a
function such as bpslots().

When using a BatchJobs backend, I imagine bpslots() would return a value
that is comparable to the value in the NSLOTS environment variable on an
SGE cluster node. The value would be set by the user when creating the
instance of BatchJobsParam(), similarly to how it is now done with "workers":

prm <- BatchJobsParam(workers=n, slots=s, ...)

On other backends, some convention would have to be agreed on, e.g.
that bpslots() returns (a sketch of such a generic follows below):
- 1L for a SerialParam backend
- a user-set value (default: NA) for SnowParam, DoparParam and MulticoreParam backends
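
A minimal sketch of what such a generic could look like, under the conventions just listed; the bpslots() generic and its methods are hypothetical, not part of BiocParallel:

library(BiocParallel)

## Hypothetical bpslots() generic (illustration only).
setGeneric("bpslots", function(x, ...) standardGeneric("bpslots"))

## SerialParam: a single slot by convention.
setMethod("bpslots", "SerialParam", function(x, ...) 1L)

## Other backends: fall back on a user-set value (modeled here as a
## global option, purely for illustration), defaulting to NA.
setMethod("bpslots", "BiocParallelParam", function(x, ...)
    getOption("bpslots", NA_integer_))

## On an SGE node the user-set value would mirror $NSLOTS, e.g.
## options(bpslots = as.integer(Sys.getenv("NSLOTS", NA)))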

Currently, QuasR tries to guess bpworkers() and bpslots() by querying
the parallel::SOCKcluster object provided by the user, similar to the
following example (xenon2 and xenon3 are local machine names):

> library(parallel)
> cl <- makeCluster(rep(c("xenon2","xenon3"),each=3))
> cl
socket cluster with 6 nodes on hosts 'xenon2', 'xenon3'
> tnn <- table(unlist(clusterEvalQ(cl, Sys.info()['nodename'])))
> tnn
xenon2.fmi.ch xenon3.fmi.ch
            3             3
> length(tnn) # bpworkers()
[1] 2
> min(tnn) # bpslots()
[1] 3

This only works for this particular type of parallel backend. If we want
to support multiple parallel backends, we will have to write
backend-specific code into QuasR. Alternatively, if this is abstracted
by BiocParallel with bpslots(), QuasR could parallelize across
bpworkers() nodes, using up to bpslots() parallel threads on each of them.

@DarwinAwardWinner

To me, the most straightforward way to implement this would be a sort of "stack" of BiocParallelParam objects, where each nested level of parallelism would pop a param off the stack and use it. So for the described use case of N workers, each using P parallel threads, it would look something like this:

library(BiocParallel)

## Example values
N <- 20
P <- 8

input.files <- sprintf("input%d.fastq", seq_len(N))

bp1 <- BatchJobsParam(workers=N)
bp2 <- MulticoreParam(workers=P)
## Making up a class name here
bppstack <- ParamStack(bp1, bp2)

## This bplapply call would parallelize using the topmost param,
## i.e. the BatchJobsParam
result <- bplapply(input.files, function(seqfile) {
  ## readFastq() comes from the ShortRead package
  seqs <- readFastq(seqfile)
  ## This call would automatically parallelize using the second
  ## param, i.e. the MulticoreParam
  bpvec(seqs, function.that.does.alignment)
  ## Alternatively, we could just use the inner param to get the
  ## number of workers. This call would automatically query the
  ## second param on the stack.
  p <- bpworkers()
  system2("programThatDoesAlignment", "--input", seqfile, "--ncores", p)
}, BPPARAM=bppstack)
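
For concreteness, a minimal sketch of the push/pop behaviour such a ParamStack could have, written as a plain R closure; the class name and its semantics are made up here, not part of BiocParallel:

## Hypothetical ParamStack: a stack of BiocParallelParam objects.
ParamStack <- function(...) {
    stack <- list(...)
    list(
        ## The param currently in effect: the topmost element.
        top  = function() stack[[1L]],
        ## Remove the topmost param (e.g. on entering FUN).
        pop  = function() { p <- stack[[1L]]; stack <<- stack[-1L]; p },
        ## Restore a param (e.g. after FUN returns).
        push = function(p) stack <<- c(list(p), stack)
    )
}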

Does this make sense?

@lawremi

lawremi commented Feb 14, 2014

I think it makes sense. There would always be the option of simply being explicit and having the library function take e.g. two different BPPARAM parameters, one for each level. That might be more self-documenting than having one parameter that should be a stack. However, there are benefits to encapsulating the strategy as a single object.

Also, it would seem bpworkers() would need to dispatch on both parameter objects: one to know we're looking for cores, the other to know how to query the cluster for the number of allocated cores.

@DarwinAwardWinner

Well, the idea is that functions like bpworkers would always use the top param on the stack, and every level of parallelization (bplapply, bpvec) would use the top param for its own parallelization, then pop it off during the execution of FUN, and then push it back on afterward.

@lawremi

lawremi commented Feb 14, 2014

Yea, I understand that, and the approach is valid. I was just pointing out the alternative.


@mllg
Collaborator

mllg commented Feb 19, 2014

The possibility to give bp[lm]apply an ID would also be useful in many cases. Imagine you have a 3x nested call to bplapply, and let's say the two innermost calls are in a Bioconductor or CRAN package, so that you cannot conveniently alter the code.

In some scenarios, depending on your architecture, you might want to parallelize only the second call, but in others you might want to parallelize the innermost and outermost ones. This is actually still fine to write down using stacked BPPARAM objects. But you probably also want to always parallelize the innermost call using multicore, regardless of how deeply it is nested (and you don't want to dig through package code to count nesting levels if unsure).

So what I really would want to write down is something like this:

register(PackageName.readSeqFile = MulticoreParam(16))
register(myResampleLoop = BatchJobsParam())

bplapply(1:100, BPID = "myResampleLoop", function(i) { readSeqFile(...) })

The default would be to look up the named register and fall back on the (unnamed) stack of BPPARAM objects (or SerialParam() if it is empty or undefined).
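
A minimal sketch of that lookup rule, using a hypothetical named registry kept in a plain environment; none of these helpers exist in BiocParallel:

library(BiocParallel)

## Hypothetical named registry of params, keyed by BPID.
.bpid.registry <- new.env(parent = emptyenv())

registerByID <- function(id, param)
    assign(id, param, envir = .bpid.registry)

## Resolve a BPID: consult the named registry first, then a supplied
## default (standing in for the unnamed BPPARAM stack), then SerialParam().
lookupBPID <- function(id, fallback = NULL) {
    if (!is.null(id) && exists(id, envir = .bpid.registry))
        get(id, envir = .bpid.registry)
    else if (!is.null(fallback))
        fallback
    else
        SerialParam()
}

## e.g. registerByID("myResampleLoop", SnowParam(4))
##      lookupBPID("myResampleLoop")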

@DarwinAwardWinner

That brings up an important issue: for packages using BiocParallel, what interface are they expected to expose for parallelization? Should any package function that internally uses a BiocParallel function accept and use an arbitrary BPPARAM argument? Certainly, if we implement the ID system, functions would be required to document the BPID they are using.

@lawremi

lawremi commented Feb 19, 2014

I've been using the BPPARAM approach so far, but it only works for the simple cases.

We should require registration of the ID (is "key" a better word?) itself, and the user could introspect that at runtime. To mitigate side effects, we might also consider setting up a temporary ID context like:

bpwith(fun(x), PackageName.readSeqFile=MulticoreParam(16))
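
A minimal sketch of how such a temporary ID context could work on top of the hypothetical registry sketched above; bpwith() is made up here, and R's lazy evaluation keeps the expression unevaluated until the bindings are in place:

## Hypothetical bpwith(): evaluate an expression with temporary
## BPID-to-param bindings, restoring the previous state on exit.
bpwith <- function(expr, ...) {
    new <- list(...)
    old <- mget(names(new), envir = .bpid.registry,
                ifnotfound = list(NULL))
    for (id in names(new))
        assign(id, new[[id]], envir = .bpid.registry)
    on.exit(for (id in names(old)) {
        if (is.null(old[[id]]))
            rm(list = id, envir = .bpid.registry)
        else
            assign(id, old[[id]], envir = .bpid.registry)
    })
    expr  # forced only now, with the temporary bindings active
}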

Also, what about just accepting a character value for BPPARAM instead of a separate BPID argument? BPID is clearer, but there is some redundancy.

@vobencha
Contributor

vobencha commented Nov 6, 2015

I believe these were addressed with the implementation of the list of BiocParallelParam objects and register()/registered().
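
For illustration, a brief example of that interface, assuming the register()/registered() API as implemented in BiocParallel (details may have evolved since):

library(BiocParallel)

## Put a param at the front of the registry; it becomes the default.
register(MulticoreParam(workers = 4))

## Inspect the full list of registered params.
registered()

## bpparam() returns the first (default) registered param; it is what
## bplapply() and friends use when no BPPARAM argument is supplied.
bpparam()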

@vobencha vobencha closed this as completed Nov 6, 2015
@lawremi

lawremi commented Nov 6, 2015

Sort of. This thread has drifted somewhat from its initial motivation. At least one aspect of this was that we need an abstract way to obtain the resources available to a worker. For example, let's say we launch a job via BatchJobs, giving it a resource specification for the number of cores per worker. The backend will somehow configure the environment to inform the worker of the available resources. For example, LSF sets an environment variable for the number of cores. It would be great if BiocParallel could automatically use that as the default (maximum?) number of cores for the multicore backend. I realize that we can set the number of cores on the MulticoreParam in the BiocParallelParam list, but that is not communicated to the scheduler.
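
A minimal sketch of the kind of default being asked for, read from the scheduler's environment on the worker; NSLOTS is SGE's variable (mentioned earlier in this thread), while the LSF variable name used here is an assumption:

## Sketch: derive a worker's core allowance from scheduler-set
## environment variables, falling back to all detected cores.
schedulerCores <- function() {
    for (var in c("NSLOTS", "LSB_DJOB_NUMPROC")) {
        n <- suppressWarnings(as.integer(Sys.getenv(var, NA)))
        if (!is.na(n)) return(n)
    }
    parallel::detectCores()
}

## e.g. MulticoreParam(workers = schedulerCores())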

All of this is somewhat moot for us, because BatchJobsParam has never worked reliably for us on our cluster, and with the move to SnowParam, multicore does not work either :( We're back in the world of ad hoc solutions.
