
add parallelization to the map_reads/sort_reads/filter_reads #72

Open

hxin opened this issue Apr 10, 2018 · 4 comments
hxin commented Apr 10, 2018

Currently the function runs STAR on one sample at a time using ${NUM_CORE} cores.
Test whether running more samples in parallel, with fewer cores per sample, is faster.
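The idea can be sketched as follows. This is not the actual Sargasso code: `run_star` is a hypothetical placeholder for launching one STAR process, and the 16-core budget is just the test machine's setup:

```python
# Sketch of the proposed scheme: split a fixed core budget across several
# samples and run their STAR jobs concurrently. `run_star` is a hypothetical
# stand-in, not the real Sargasso implementation.
from concurrent.futures import ThreadPoolExecutor

TOTAL_CORES = 16

def run_star(sample, cores):
    # In reality this would launch `STAR --runThreadN <cores> ...` for one sample.
    return f"{sample}: mapped with {cores} cores"

def map_reads_parallel(samples, jobs):
    cores_per_job = TOTAL_CORES // jobs  # fewer cores per sample...
    with ThreadPoolExecutor(max_workers=jobs) as pool:  # ...more samples at once
        return list(pool.map(lambda s: run_star(s, cores_per_job), samples))

results = map_reads_parallel(["C1", "C2", "C3", "C4"], jobs=4)
```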

hxin commented Apr 12, 2018

I tested the performance of parallel runs with samples from the glucose project.

Running map_reads one sample at a time with 16 cores takes around 17 hours to map all 48 samples to mouse/rat. That is 96 STAR runs, i.e. roughly 10 minutes per run, so 4 samples would take roughly 40 minutes, and 8 samples 80 minutes.
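The back-of-envelope arithmetic works out as:

```python
# Sanity check of the sequential baseline: 48 samples x 2 species = 96 STAR
# runs, completed in ~17 hours of one-sample-at-a-time mapping.
runs = 48 * 2                        # 96 STAR runs
minutes_per_run = 17 * 60 / runs     # ~10.6 minutes per run
est_4_samples = 4 * minutes_per_run  # ~42 minutes, i.e. "roughly 40"
est_8_samples = 8 * minutes_per_run  # ~85 minutes, i.e. "roughly 80"
```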

Comparing those numbers to the table below, there is a big increase in terms of speed when multiple samples are run in parallel, each with fewer cores. Thus, I think this is worth discussing further. @lweasel

| Samples | Core(s)/sample | Time (minutes) |
| --- | --- | --- |
| 4 (C1 C2 C3 C4) | 1 | 54.4166 |
| 4 (C1 C2 C3 C4) | 2 | 29.4833 |
| 4 (C1 C2 C3 C4) | 4 | 18.5833 |
| 4 (C1 C2 C3 C4) | 8 | 15.0000 |
| 4 (C1 C2 C3 C4) | 16 | 15.5166 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 1 | 61.9333 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 2 | 34.9000 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 4 | 19.1833 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 8 | 18.5666 |
| 8 (C1 C2 D3 B1 B2 B4 D4 D1 C3 C4 B3 D2) | 16 | 19.8500 |

Sample sizes:

| Sample | Size |
| --- | --- |
| C1 | 3.4G |
| C2 | 3.4G |
| D3 | 3.9G |
| B1 | 3.3G |
| B2 | 3.7G |
| B4 | 3.4G |
| D4 | 3.9G |
| D1 | 4.0G |
| C3 | 3.4G |
| C4 | 3.4G |
| B3 | 3.4G |
| D2 | 4.3G |
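For reference, the speed-ups implied by the 4-sample rows of the timing table, relative to the 1-core-per-sample run:

```python
# Wall times (minutes) for the 4-sample batch (C1 C2 C3 C4) at different
# cores/sample, taken from the timing table above.
times_4 = {1: 54.4166, 2: 29.4833, 4: 18.5833, 8: 15.0000, 16: 15.5166}

# Speed-up relative to 1 core/sample; the gains flatten beyond 8 cores.
speedup = {c: round(times_4[1] / t, 2) for c, t in times_4.items()}
```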

hxin commented Apr 12, 2018

Also, something interesting: the time needed does not change much from 8 cores to 16 cores, so in terms of STAR runs, with the current implementation, using 8 cores or 16 cores makes little difference.
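One possible explanation for the plateau (purely illustrative, not measured): if part of each STAR run is serial, Amdahl's law predicts diminishing returns from extra threads. The parallel fraction `p` below is an assumed number, not derived from the data:

```python
# Amdahl's law: with parallel fraction p, the speed-up on n cores is
# 1 / ((1 - p) + p / n). The serial part (e.g. genome index loading, I/O)
# caps the achievable gain, however many cores are added.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# With an assumed p = 0.85, doubling 8 -> 16 cores gains only ~26%:
s8 = amdahl_speedup(0.85, 8)    # ~3.90x
s16 = amdahl_speedup(0.85, 16)  # ~4.92x
```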

hxin commented May 6, 2018

Further tests have been done on the performance of the implementation.

The parallelized code was added to sort_reads/map_reads/filter_reads, and I used the same 6 samples/2 species to run Sargasso with two different configurations:

1. non-parallel version of Sargasso
2. parallel version of Sargasso
| Samples | Core(s)/sample | Max cores | run1 (minutes) | run2 (minutes) |
| --- | --- | --- | --- | --- |
| C1 C2 C5 C6 C9 C10 | 16 | 16 | 251 | 286 |
| C1 C2 C5 C6 C9 C10 | 8 | 16 | 204 | 264 |

It seems the improvement is not as good as in the individual tests on sort_reads/map_reads/filter_reads, where, in general, the parallel version took about half the time with an 8/16 setup.

This may be because performance is limited by IO, rather than processing speed, when these jobs run together.

Overall, there is an improvement from parallelizing the sort_reads/map_reads/filter_reads processes.

hxin commented May 6, 2018

| row_number | job | core | total_core | run | time | time_per_run | sample_per_run | total_sample |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | filtered | 10 | 16 | 7.5 | 108.2833 | 13.535412 | 1.600000 | 12 |
| 2 | filtered | 12 | 16 | 9.0 | 57.1666 | 6.351844 | 1.333333 | 12 |
| 3 | filtered | 12 | 32 | 4.5 | 56.4833 | 11.296660 | 2.666667 | 12 |
| 4 | filtered | 16 | 16 | 12.0 | 110.0833 | 9.173608 | 1.000000 | 12 |
| 5 | filtered | 16 | 32 | 6.0 | 55.6333 | 9.272217 | 2.000000 | 12 |
| 6 | filtered | 2 | 16 | 1.5 | 75.4500 | 37.725000 | 8.000000 | 12 |
| 7 | filtered | 4 | 16 | 3.0 | 70.2000 | 23.400000 | 4.000000 | 12 |
| 8 | filtered | 4 | 32 | 1.5 | 65.5333 | 32.766650 | 8.000000 | 12 |
| 9 | filtered | 6 | 16 | 4.5 | 74.4166 | 14.883320 | 2.666667 | 12 |
| 10 | filtered | 8 | 16 | 6.0 | 70.2500 | 11.708333 | 2.000000 | 12 |
| 11 | filtered | 8 | 32 | 3.0 | 71.5333 | 23.844433 | 4.000000 | 12 |
| 12 | mapped | 12 | 16 | 9.0 | 75.2833 | 8.364811 | 1.333333 | 12 |
| 13 | mapped | 12 | 32 | 4.5 | 72.7000 | 14.540000 | 2.666667 | 12 |
| 14 | mapped | 16 | 16 | 12.0 | 148.5666 | 12.380550 | 1.000000 | 12 |
| 15 | mapped | 16 | 32 | 6.0 | 77.3833 | 12.897217 | 2.000000 | 12 |
| 16 | mapped | 2 | 16 | 1.5 | 158.8500 | 79.425000 | 8.000000 | 12 |
| 17 | mapped | 4 | 16 | 3.0 | 97.6166 | 32.538867 | 4.000000 | 12 |
| 18 | mapped | 4 | 32 | 1.5 | 110.5833 | 55.291650 | 8.000000 | 12 |
| 19 | mapped | 8 | 16 | 6.0 | 80.5500 | 13.425000 | 2.000000 | 12 |
| 20 | mapped | 8 | 32 | 3.0 | 77.0166 | 25.672200 | 4.000000 | 12 |
| 21 | sorted | 12 | 16 | 9.0 | 34.6000 | 3.844444 | 1.333333 | 12 |
| 22 | sorted | 12 | 32 | 4.5 | 33.5333 | 6.706660 | 2.666667 | 12 |
| 23 | sorted | 16 | 16 | 12.0 | 49.7666 | 4.147217 | 1.000000 | 12 |
| 24 | sorted | 16 | 32 | 6.0 | 27.4166 | 4.569433 | 2.000000 | 12 |
| 25 | sorted | 2 | 16 | 1.5 | 76.8000 | 38.400000 | 8.000000 | 12 |
| 26 | sorted | 4 | 16 | 3.0 | 45.8666 | 15.288867 | 4.000000 | 12 |
| 27 | sorted | 4 | 32 | 1.5 | 48.1666 | 24.083300 | 8.000000 | 12 |
| 28 | sorted | 8 | 16 | 6.0 | 41.9833 | 6.997217 | 2.000000 | 12 |
| 29 | sorted | 8 | 32 | 3.0 | 39.2000 | 13.066667 | 4.000000 | 12 |
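Reading the fastest configuration per stage out of the table (only the fastest few rows are hard-coded below, for illustration):

```python
# (job, cores/job, total cores) -> wall time in minutes, copied from the
# fastest rows of the benchmark table above.
times = {
    ("filtered", 16, 32): 55.6333, ("filtered", 12, 32): 56.4833,
    ("filtered", 12, 16): 57.1666,
    ("mapped", 12, 32): 72.7000, ("mapped", 12, 16): 75.2833,
    ("mapped", 8, 32): 77.0166,
    ("sorted", 16, 32): 27.4166, ("sorted", 12, 32): 33.5333,
    ("sorted", 12, 16): 34.6000,
}

# Pick the minimum-time configuration for each stage.
best = {}
for (job, cores, total), t in times.items():
    if job not in best or t < best[job][2]:
        best[job] = (cores, total, t)
```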

```r
# Timings gathered from the results directory with:
#   find ${HOME}/tmp/test_speed/results/mouse/ -name '*' | grep -P 'run|time' | sort
a <- "filtered/10/16/run/7.5
filtered/10/16/time/108.2833
filtered/12/16/run/9.0
filtered/12/16/time/57.1666
filtered/12/32/run/4.5
filtered/12/32/time/56.4833
filtered/16/16/run/12.0
filtered/16/16/time/110.0833
filtered/16/32/run/6.0
filtered/16/32/time/55.6333
filtered/2/16/run/1.5
filtered/2/16/time/75.4500
filtered/4/16/run/3.0
filtered/4/16/time/70.2000
filtered/4/32/run/1.5
filtered/4/32/time/65.5333
filtered/6/16/run/4.5
filtered/6/16/time/74.4166
filtered/8/16/run/6.0
filtered/8/16/time/70.2500
filtered/8/32/run/3.0
filtered/8/32/time/71.5333
mapped/12/16/run/9.0
mapped/12/16/time/75.2833
mapped/12/32/run/4.5
mapped/12/32/time/72.7000
mapped/16/16/run/12.0
mapped/16/16/time/148.5666
mapped/16/32/run/6.0
mapped/16/32/time/77.3833
mapped/2/16/run/1.5
mapped/2/16/time/158.8500
mapped/4/16/run/3.0
mapped/4/16/time/97.6166
mapped/4/32/run/1.5
mapped/4/32/time/110.5833
mapped/8/16/run/6.0
mapped/8/16/time/80.5500
mapped/8/32/run/3.0
mapped/8/32/time/77.0166
sorted/12/16/run/9.0
sorted/12/16/time/34.6000
sorted/12/32/run/4.5
sorted/12/32/time/33.5333
sorted/16/16/run/12.0
sorted/16/16/time/49.7666
sorted/16/32/run/6.0
sorted/16/32/time/27.4166
sorted/2/16/run/1.5
sorted/2/16/time/76.8000
sorted/4/16/run/3.0
sorted/4/16/time/45.8666
sorted/4/32/run/1.5
sorted/4/32/time/48.1666
sorted/8/16/run/6.0
sorted/8/16/time/41.9833
sorted/8/32/run/3.0
sorted/8/32/time/39.2000"

library(tidyr)
library(dplyr)
library(ggplot2)

# Parse the job/cores/total_cores/type/value paths, spread run and time into
# columns, and plot wall time against cores per job for each stage.
read.table(text = a, col.names = c('raw')) %>%
  tidyr::separate(raw, into = c('job', 'core', 'total_core', 'type', 'value'), sep = "/") %>%
  reshape2::dcast(job + core + total_core ~ type, value.var = "value") %>%
  dplyr::mutate_at(vars(-job), funs(as.numeric)) %>%
  dplyr::mutate(time_per_run = time / ceiling(run),
                sample_per_run = total_core / core,
                total_sample = 12) %>%
  ggplot() +
  geom_point(mapping = aes(x = core, y = time, size = time_per_run, color = total_core)) +
  facet_wrap(~ job)
```

[image: scatter plot of wall time vs cores per job, point size = time per run, colour = total cores, faceted by job]

@hxin changed the title from "test map_reads to parallelized the star mapping process" to "add parallelization to the map_reads/sort_reads/filter_reads" on May 9, 2018
hxin added a commit that referenced this issue May 9, 2018