
add parallelization to the map_reads/sort_reads/filter_reads #72

Open

hxin opened this issue Apr 10, 2018 · 4 comments
hxin commented Apr 10, 2018

Currently the function runs STAR on one sample at a time using ${NUM_CORE} cores.
Test whether running more samples in parallel, with fewer cores per sample, is faster.
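The idea can be sketched as follows. This is not the actual Sargasso code: `run_star` is a hypothetical placeholder for launching one STAR process, and the 16-core budget is just the test machine's setup:

```python
# Sketch of the proposed scheme: split a fixed core budget across several
# samples and run their STAR jobs concurrently. `run_star` is a hypothetical
# stand-in, not the real Sargasso implementation.
from concurrent.futures import ThreadPoolExecutor

TOTAL_CORES = 16

def run_star(sample, cores):
    # In reality this would launch `STAR --runThreadN <cores> ...` for one sample.
    return f"{sample}: mapped with {cores} cores"

def map_reads_parallel(samples, jobs):
    cores_per_job = TOTAL_CORES // jobs  # fewer cores per sample...
    with ThreadPoolExecutor(max_workers=jobs) as pool:  # ...more samples at once
        return list(pool.map(lambda s: run_star(s, cores_per_job), samples))

results = map_reads_parallel(["C1", "C2", "C3", "C4"], jobs=4)
```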

hxin commented Apr 12, 2018

I tested the performance of parallel runs with samples from the glucose project.

Running map_reads one sample at a time with 16 cores takes around 17 hours to map all 48 samples to mouse/rat. That is 96 STAR runs, i.e. roughly 10 minutes per run, so 4 samples would take roughly 40 minutes, and 8 samples 80 minutes.
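The back-of-envelope arithmetic works out as:

```python
# Sanity check of the sequential baseline: 48 samples x 2 species = 96 STAR
# runs, completed in ~17 hours of one-sample-at-a-time mapping.
runs = 48 * 2                        # 96 STAR runs
minutes_per_run = 17 * 60 / runs     # ~10.6 minutes per run
est_4_samples = 4 * minutes_per_run  # ~42 minutes, i.e. "roughly 40"
est_8_samples = 8 * minutes_per_run  # ~85 minutes, i.e. "roughly 80"
```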

Comparing those numbers to the table below, there is a big increase in terms of speed when multiple samples are run in parallel, each with fewer cores. Thus, I think this is worth discussing further. @lweasel

| Samples | Core(s)/sample | Time (minutes) |
| --- | --- | --- |
| 4 (C1 C2 C3 C4) | 1 | 54.4166 |
| 4 (C1 C2 C3 C4) | 2 | 29.4833 |
| 4 (C1 C2 C3 C4) | 4 | 18.5833 |
| 4 (C1 C2 C3 C4) | 8 | 15.0000 |
| 4 (C1 C2 C3 C4) | 16 | 15.5166 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 1 | 61.9333 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 2 | 34.9000 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 4 | 19.1833 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 8 | 18.5666 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 16 | 19.8500 |

Sample sizes:

| Sample | Size |
| --- | --- |
| C1 | 3.4G |
| C2 | 3.4G |
| D3 | 3.9G |
| B1 | 3.3G |
| B2 | 3.7G |
| B4 | 3.4G |
| D4 | 3.9G |
| D1 | 4.0G |
| C3 | 3.4G |
| C4 | 3.4G |
| B3 | 3.4G |
| D2 | 4.3G |
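For reference, the speed-ups implied by the 4-sample rows of the timing table, relative to the 1-core-per-sample run:

```python
# Wall times (minutes) for the 4-sample batch (C1 C2 C3 C4) at different
# cores/sample, taken from the timing table above.
times_4 = {1: 54.4166, 2: 29.4833, 4: 18.5833, 8: 15.0000, 16: 15.5166}

# Speed-up relative to 1 core/sample; the gains flatten beyond 8 cores.
speedup = {c: round(times_4[1] / t, 2) for c, t in times_4.items()}
```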

hxin commented Apr 12, 2018

Also, something interesting: the time needed does not change much from 8 cores to 16 cores, so in terms of STAR runs, with the current implementation, using 8 cores or 16 cores makes little difference.
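One possible explanation for the plateau (purely illustrative, not measured): if part of each STAR run is serial, Amdahl's law predicts diminishing returns from extra threads. The parallel fraction `p` below is an assumed number, not derived from the data:

```python
# Amdahl's law: with parallel fraction p, the speed-up on n cores is
# 1 / ((1 - p) + p / n). The serial part (e.g. genome index loading, I/O)
# caps the achievable gain, however many cores are added.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# With an assumed p = 0.85, doubling 8 -> 16 cores gains only ~26%:
s8 = amdahl_speedup(0.85, 8)    # ~3.90x
s16 = amdahl_speedup(0.85, 16)  # ~4.92x
```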

hxin commented May 6, 2018

Further tests have been done on the performance of the implementation.

The parallelized code was added to sort_reads/map_reads/filter_reads, and I used the same 6 samples/2 species to run Sargasso with two different configurations:

1. non-parallel version of Sargasso
2. parallel version of Sargasso
| Samples | Core(s)/sample | Max cores | run1 (minutes) | run2 (minutes) |
| --- | --- | --- | --- | --- |
| C1 C2 C5 C6 C9 C10 | 16 | 16 | 251 | 286 |
| C1 C2 C5 C6 C9 C10 | 8 | 16 | 204 | 264 |

It seems the improvement is not as good as in the individual tests on sort_reads/map_reads/filter_reads, where, in general, the parallel version took about half the time with an 8/16 setup.

This may be because performance is limited by IO, rather than processing speed, when these jobs run together.

Overall, there is an improvement from parallelizing the sort_reads/map_reads/filter_reads processes.

hxin commented May 6, 2018

| row_number | job | core | total_core | run | time | time_per_run | sample_per_run | total_sample |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | filtered | 10 | 16 | 7.5 | 108.2833 | 13.535412 | 1.600000 | 12 |
| 2 | filtered | 12 | 16 | 9.0 | 57.1666 | 6.351844 | 1.333333 | 12 |
| 3 | filtered | 12 | 32 | 4.5 | 56.4833 | 11.296660 | 2.666667 | 12 |
| 4 | filtered | 16 | 16 | 12.0 | 110.0833 | 9.173608 | 1.000000 | 12 |
| 5 | filtered | 16 | 32 | 6.0 | 55.6333 | 9.272217 | 2.000000 | 12 |
| 6 | filtered | 2 | 16 | 1.5 | 75.4500 | 37.725000 | 8.000000 | 12 |
| 7 | filtered | 4 | 16 | 3.0 | 70.2000 | 23.400000 | 4.000000 | 12 |
| 8 | filtered | 4 | 32 | 1.5 | 65.5333 | 32.766650 | 8.000000 | 12 |
| 9 | filtered | 6 | 16 | 4.5 | 74.4166 | 14.883320 | 2.666667 | 12 |
| 10 | filtered | 8 | 16 | 6.0 | 70.2500 | 11.708333 | 2.000000 | 12 |
| 11 | filtered | 8 | 32 | 3.0 | 71.5333 | 23.844433 | 4.000000 | 12 |
| 12 | mapped | 12 | 16 | 9.0 | 75.2833 | 8.364811 | 1.333333 | 12 |
| 13 | mapped | 12 | 32 | 4.5 | 72.7000 | 14.540000 | 2.666667 | 12 |
| 14 | mapped | 16 | 16 | 12.0 | 148.5666 | 12.380550 | 1.000000 | 12 |
| 15 | mapped | 16 | 32 | 6.0 | 77.3833 | 12.897217 | 2.000000 | 12 |
| 16 | mapped | 2 | 16 | 1.5 | 158.8500 | 79.425000 | 8.000000 | 12 |
| 17 | mapped | 4 | 16 | 3.0 | 97.6166 | 32.538867 | 4.000000 | 12 |
| 18 | mapped | 4 | 32 | 1.5 | 110.5833 | 55.291650 | 8.000000 | 12 |
| 19 | mapped | 8 | 16 | 6.0 | 80.5500 | 13.425000 | 2.000000 | 12 |
| 20 | mapped | 8 | 32 | 3.0 | 77.0166 | 25.672200 | 4.000000 | 12 |
| 21 | sorted | 12 | 16 | 9.0 | 34.6000 | 3.844444 | 1.333333 | 12 |
| 22 | sorted | 12 | 32 | 4.5 | 33.5333 | 6.706660 | 2.666667 | 12 |
| 23 | sorted | 16 | 16 | 12.0 | 49.7666 | 4.147217 | 1.000000 | 12 |
| 24 | sorted | 16 | 32 | 6.0 | 27.4166 | 4.569433 | 2.000000 | 12 |
| 25 | sorted | 2 | 16 | 1.5 | 76.8000 | 38.400000 | 8.000000 | 12 |
| 26 | sorted | 4 | 16 | 3.0 | 45.8666 | 15.288867 | 4.000000 | 12 |
| 27 | sorted | 4 | 32 | 1.5 | 48.1666 | 24.083300 | 8.000000 | 12 |
| 28 | sorted | 8 | 16 | 6.0 | 41.9833 | 6.997217 | 2.000000 | 12 |
| 29 | sorted | 8 | 32 | 3.0 | 39.2000 | 13.066667 | 4.000000 | 12 |
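Reading the fastest configuration per stage out of the table (only the fastest few rows are hard-coded below, for illustration):

```python
# (job, cores/job, total cores) -> wall time in minutes, copied from the
# fastest rows of the benchmark table above.
times = {
    ("filtered", 16, 32): 55.6333, ("filtered", 12, 32): 56.4833,
    ("filtered", 12, 16): 57.1666,
    ("mapped", 12, 32): 72.7000, ("mapped", 12, 16): 75.2833,
    ("mapped", 8, 32): 77.0166,
    ("sorted", 16, 32): 27.4166, ("sorted", 12, 32): 33.5333,
    ("sorted", 12, 16): 34.6000,
}

# Pick the minimum-time configuration for each stage.
best = {}
for (job, cores, total), t in times.items():
    if job not in best or t < best[job][2]:
        best[job] = (cores, total, t)
```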

```r
# Timings gathered from the results directory with:
#   find ${HOME}/tmp/test_speed/results/mouse/ -name '*' | grep -P 'run|time' | sort
a <- "filtered/10/16/run/7.5
filtered/10/16/time/108.2833
filtered/12/16/run/9.0
filtered/12/16/time/57.1666
filtered/12/32/run/4.5
filtered/12/32/time/56.4833
filtered/16/16/run/12.0
filtered/16/16/time/110.0833
filtered/16/32/run/6.0
filtered/16/32/time/55.6333
filtered/2/16/run/1.5
filtered/2/16/time/75.4500
filtered/4/16/run/3.0
filtered/4/16/time/70.2000
filtered/4/32/run/1.5
filtered/4/32/time/65.5333
filtered/6/16/run/4.5
filtered/6/16/time/74.4166
filtered/8/16/run/6.0
filtered/8/16/time/70.2500
filtered/8/32/run/3.0
filtered/8/32/time/71.5333
mapped/12/16/run/9.0
mapped/12/16/time/75.2833
mapped/12/32/run/4.5
mapped/12/32/time/72.7000
mapped/16/16/run/12.0
mapped/16/16/time/148.5666
mapped/16/32/run/6.0
mapped/16/32/time/77.3833
mapped/2/16/run/1.5
mapped/2/16/time/158.8500
mapped/4/16/run/3.0
mapped/4/16/time/97.6166
mapped/4/32/run/1.5
mapped/4/32/time/110.5833
mapped/8/16/run/6.0
mapped/8/16/time/80.5500
mapped/8/32/run/3.0
mapped/8/32/time/77.0166
sorted/12/16/run/9.0
sorted/12/16/time/34.6000
sorted/12/32/run/4.5
sorted/12/32/time/33.5333
sorted/16/16/run/12.0
sorted/16/16/time/49.7666
sorted/16/32/run/6.0
sorted/16/32/time/27.4166
sorted/2/16/run/1.5
sorted/2/16/time/76.8000
sorted/4/16/run/3.0
sorted/4/16/time/45.8666
sorted/4/32/run/1.5
sorted/4/32/time/48.1666
sorted/8/16/run/6.0
sorted/8/16/time/41.9833
sorted/8/32/run/3.0
sorted/8/32/time/39.2000"

library(tidyr)
library(dplyr)
library(ggplot2)

# Parse the job/cores/total_cores/type/value paths, spread run and time into
# columns, and plot wall time against cores per job for each stage.
read.table(text = a, col.names = c('raw')) %>%
  tidyr::separate(raw, into = c('job', 'core', 'total_core', 'type', 'value'), sep = "/") %>%
  reshape2::dcast(job + core + total_core ~ type, value.var = "value") %>%
  dplyr::mutate_at(vars(-job), funs(as.numeric)) %>%
  dplyr::mutate(time_per_run = time / ceiling(run),
                sample_per_run = total_core / core,
                total_sample = 12) %>%
  ggplot() +
  geom_point(mapping = aes(x = core, y = time, size = time_per_run, color = total_core)) +
  facet_wrap(~ job)
```

[image: scatter plot of wall time vs cores per job, point size = time per run, colour = total cores, faceted by job]

@hxin changed the title from "test map_reads to parallelized the star mapping process" to "add parallelization to the map_reads/sort_reads/filter_reads" on May 9, 2018
hxin added a commit that referenced this issue May 9, 2018