Parallel segmentation #19
Comments
I was just about to suggest a similar thing. I highly recommend that you consider using the future package as the backend, as PSCBS does. That will allow users to use whatever parallel backend they want with a single change in settings, e.g.

future::plan("multicore")

On Windows, where R does not support multicore processing, you can use multiple background R sessions instead:

future::plan("multisession")

It also allows you to run things on a cluster, etc. I'm finally going all in everywhere using futures, cf. www.aroma-project.org/howtos/parallel_processing/. If you want to roll your own, I still highly recommend that you look at the future package. Really.
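To illustrate the "single change in settings" point, here is a minimal sketch. It assumes the future package is installed; `slow_square()` and `run()` are hypothetical stand-ins and not part of QDNAseq. The same code runs under two backends, and only the `plan()` call differs.

```r
library(future)

# Hypothetical stand-in for a slow per-sample computation.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Create one future per input, then collect the results.
run <- function() {
  fs <- lapply(1:4, function(i) future(slow_square(i)))
  vapply(fs, value, numeric(1))
}

plan(sequential)                 # everything in the current R session
res_seq <- run()

plan(multisession, workers = 2)  # background R sessions, works on all OSes
res_par <- run()

plan(sequential)                 # reset the backend when done
```

Both runs should give the same values; the backend only decides where the work is evaluated.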
Ah... I see from above commit notes that you've already used

flapply <- function(x, FUN, ...) {
  res <- list()
  for (ii in seq_along(x)) res[[ii]] <- future(FUN(x[[ii]], ...))
  names(res) <- names(x)
  values(res)
}

If a user uses This way you don't have to hard code to only use multicore processing (cf. Windows users or cluster users). You can also get rid of explicit
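A usage sketch for a helper of this shape, assuming the future package is installed. Two things here are my assumptions: results are collected with `lapply(res, value)` so that the sketch does not depend on which release of future is in use, and the input list and the function applied to it are made up for illustration.

```r
library(future)

# lapply()-like helper: launch one future per element, then collect values.
flapply <- function(x, FUN, ...) {
  res <- vector("list", length(x))
  for (ii in seq_along(x)) res[[ii]] <- future(FUN(x[[ii]], ...))
  names(res) <- names(x)
  lapply(res, value)  # blocks until each future is resolved
}

# The caller, not the helper, decides how futures are evaluated.
plan(multisession, workers = 2)

# Hypothetical per-chromosome inputs.
chrom_values <- list(chr1 = 1:10, chr2 = 1:20, chr3 = 1:5)
out <- flapply(chrom_values, function(v, scale) sum(v) * scale, scale = 2)

plan(sequential)  # reset the backend when done
```

Because the helper never mentions a backend, the same call works unchanged under `plan(sequential)`, multisession, or a cluster.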
FYI, in the next release of future, instructions only need to mention:

plan(multiprocess)

which will use multicore processing if supported, otherwise multisession.
You should've been here earlier! I just looked at Anyways, like you saw I implemented it via
Have a look at ilarischeinin#1 so you see what it takes.
@daoud-sie, can you take a look at the discussion over there. Since you're the package maintainer, I think it's your call which option to take:
If you say 2, I'll merge Henrik's PR, which will then automatically include the changes in my PR.
Just to reiterate: The future package is really lightweight by design, easy and quick to install everywhere, and so it will remain. Although I'm biased, by using the future package the code will be cleaner and easier to maintain, with much less if-this-then-that-otherwise-this coding. I'd also like to point out that the future package will also support full control of how nested futures are evaluated. For instance, in the QDNAseq case you can imagine processing each sample on a separate machine and then each chromosome in a separate process. The syntax for controlling this would be something like
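For readers who want to try the nested idea, here is a minimal sketch of how released versions of future express it. This is not necessarily the syntax the comment above had in mind. The sample and chromosome names are made up, and both levels are set to `sequential` so that the sketch runs anywhere.

```r
library(future)

# One strategy per nesting level: the first applies to the per-sample futures,
# the second to the per-chromosome futures created inside them. With real
# resources this could be e.g. a cluster strategy followed by multisession.
plan(list(sequential, sequential))

samples <- c("s1", "s2")
chromosomes <- c("chr1", "chr2", "chr3")

sample_futures <- lapply(samples, function(s) {
  future({
    chr_futures <- lapply(chromosomes, function(chr) {
      future(paste(s, chr, sep = ":"))  # stand-in for segmenting one chromosome
    })
    unlist(lapply(chr_futures, value))
  })
})

res <- lapply(sample_futures, value)
names(res) <- samples

plan(sequential)  # reset to a single-level plan
```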
FYI, future 0.12.0 is now on CRAN. Regardless of OS, everyone can now use
Thanks!
Another update: If you have access to a cluster or similar, you can use the future.BatchJobs package (now public on GitHub) to automatically do the segmentation on the cluster:

library("future.BatchJobs")
plan(batchjobs)

This requires regular BatchJobs configuration, which is quite straightforward. If you have an ad-hoc cluster (ssh w/ key-pair login but no fancy Slurm/PBS scheduler) you can use what's already available in the future package, e.g.

library("future")
cl <- parallel::makeCluster(c("machine2", "machine5", "machine6", "machine6", "machine9"))
plan(cluster, cluster=cl)
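The ad-hoc cluster setup can be tried without remote machines. The sketch below is an assumption-laden local variant: it starts two workers on the local host instead of the hostnames above, and it assumes a current release of future, where the cluster is passed via the `workers` argument.

```r
library(future)

# Two local workers stand in for the remote machines of an ad-hoc cluster.
cl <- parallel::makeCluster(2)
plan(cluster, workers = cl)

# Each future reports the process ID of the worker that evaluated it.
fs <- lapply(1:3, function(i) future(Sys.getpid()))
pids <- vapply(fs, value, integer(1))

plan(sequential)           # stop using the cluster
parallel::stopCluster(cl)  # shut the workers down
```

The reported process IDs differ from that of the main session, which shows that the futures were evaluated on the workers.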
* master: (351 commits)
  Fix noisePlot() for paired end data
  Bump R version number dependency (to what IRanges already requires)
  Add option to specify random seeds
  Bump development version number to 1.7.3
  Make package future optional
  Update vignette to use BiocStyle
  Add base package imports to fix Travis NOTEs
  Fix travis package installs
  Update NEWS, fix #18
  Update NEWS, close #20
  Move calculation of expected variance to its own function
  Smarter handling of user-provided cutoff values
  Grammar
  Deprecating argument 'ncpus' [#19]
  Fix newline in verbose messages
  Using futures for parallel processing [#19]
  Update NEWS. Close #19
  Add parallel loess correction estimation
  Add homozygous deletions and amplifications to cutoff calling
  Implement parallel segmentation also when using smoothing
  ...

From: Daoud Sie <daoud@Daouds-MacBook-Air.local>

git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/QDNAseq@113827 bc3139a8-67e5-0310-9ffc-ced21a209358
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/QDNAseq@113827 bc3139a8-67e5-0310-9ffc-ced21a209358
segmentBins() currently uses serial computing and can be very slow. I'm working on a parallel implementation (with package parallel) that should give a nice speedup. It's in branch "parallel-segmentation" of my fork.
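As a rough illustration of the idea, not the code in that branch, here is a sketch of per-sample parallelism with the parallel package. `segment_sample()` is a hypothetical stand-in for the real segmentation step, and the sample data is made up.

```r
library(parallel)

# Hypothetical stand-in for segmenting one sample: replaces every bin with
# the sample median. The real step would call the segmentation algorithm.
segment_sample <- function(values) {
  rep(median(values), length(values))
}

# Made-up per-sample bin values.
samples <- list(s1 = c(1, 2, 10), s2 = c(5, 5, 6), s3 = c(0, 3, 3))

# Socket cluster with two local workers; unlike mclapply(), this also
# runs in parallel on Windows.
cl <- makeCluster(2)
segmented <- parLapply(cl, samples, segment_sample)
stopCluster(cl)
```

Samples are independent of each other, so the parallel result should match a plain `lapply()` over the same list.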