Days spent on chromId_006_chr7_0013 #168

Open
annahoge opened this issue Jan 24, 2020 · 4 comments

Comments

@annahoge

Hello, and thank you for your tool!

When I run the Strelka2 somatic workflow with 24+ cores and 32+ GB of memory on 30x WGS data (with hg38 masked as recommended in the User Guide), about half of the samples get stuck on

Longest ongoing running task name: 'CallGenome+callGenomeSegment_chromId_006_chr7_0013'

for multiple hours, while the other jobs finish within a couple of hours. More than a third of the samples (out of ~70) keep running for over a week, until I shut them down. Restarting the job from where it left off does not fix the problem. Is this a bug you could please address? I can't share the data, but I don't find any regions of abnormally high depth on chr7.
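
For reference, a quick way to scan a chromosome for abnormally high-depth regions (a sketch only; the 500x cutoff and the BAM path are illustrative placeholders, not values from the run above):

# report positions on chr7 with coverage above 500x
samtools depth -a -r chr7 /path/to/tumor.bam \
    | awk '$3 > 500 {print $1"\t"$2"\t"$3}' > chr7_highdepth.txt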

Configuration example:
/path/to/configureStrelkaSomaticWorkflow.py \
    --normalBam /path/to/normal.bam \
    --tumorBam /path/to/tumor.bam \
    --referenceFasta /path/to/hg38.fa \
    --runDir /path/to/dir \
    --callRegions /path/to/strelka2-provided-call-regions-for-hg38.bed.gz

Run example (on HPC cluster node):
/path/to/runWorkflow.py -m local -j 24

Log example:
...[156412_1] [WorkflowRunner] [StatusUpdate] Workflow specification is complete?: True
...[156412_1] [WorkflowRunner] [StatusUpdate] Task status (waiting/queued/running/complete/error): 8/0/1/572/0
...[156412_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task time (hrs): 0.0000
...[156412_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task name: ''
...[156412_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task time (hrs): 17.6356
...[156412_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task name: 'CallGenome+callGenomeSegment_chromId_006_chr7_0013'

@SebastianHollizeck

Hey,
I have a similar issue, but with different segments.
The regions that Strelka2 seems to stall on are centromeric regions in my case.

The last entry in the VCF from that region is:

chr4	49101739	.	T	C	.	LowEVS	SOMATIC;QSS=16;TQSS=2;NT=ref;QSS_NT=16;TQSS_NT=2;SGT=TT->CT;DP=5386;MQ=22.21;MQ0=3174;ReadPosRankSum=-0.01;SNVSB=36.87;SomaticEVS=0.12	DP:FDP:SDP:SUBDP:AU:CU:GU:TU	960:72:0:0:1,18:7,24:0,1:880,3051	900:65:1:0:0,12:17

And you can see that the DP value is already through the roof (for WGS).

I tried to supply a BED file with the centromeres excluded:

tabix /data/reference/dawson_labs/bed_indexed/GRCh38/GRCh38WithoutCentromeres.bed.gz chr4
chr4	0	49712061
chr4	51743951	190214555
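
(For reference, a sketch of one way to build such a centromere-excluding BED; centromeres.bed and hg38.genome are placeholder file names, e.g. a UCSC centromere track and a chrom-sizes file:)

# sort the centromere intervals consistently with the genome file, take the
# complement, then bgzip/tabix so the result can be used with --callRegions
bedtools sort -g hg38.genome -i centromeres.bed > centromeres.sorted.bed
bedtools complement -i centromeres.sorted.bed -g hg38.genome > GRCh38WithoutCentromeres.bed
bgzip GRCh38WithoutCentromeres.bed
tabix -p bed GRCh38WithoutCentromeres.bed.gz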

But it still takes much longer in these segments than in all the others.
These are the last running jobs (Task status (waiting/queued/running/complete/error): 12/0/6/1150/0):

CallGenome		running	0	2020-08-03T11:47:35.632059Z
callGenomeSegment_chromId_003_chr4_0004	CallGenome	running	0	2020-08-03T13:52:41.519421Z
callGenomeSegment_chromId_004_chr5_0004	CallGenome	running	0	2020-08-03T12:05:20.552800Z
callGenomeSegment_chromId_009_chr10_0003	CallGenome	running	0	2020-08-03T14:29:52.935066Z
callGenomeSegment_chromId_019_chr20_0002	CallGenome	running	0	2020-08-03T14:29:17.755184Z
callGenomeSegment_chromId_020_chr21_0000	CallGenome	running	0	2020-08-03T13:52:01.090603Z
callGenomeSegment_chromId_055_chr17_KI270729v1_random_0000	CallGenome	running	0	2020-08-03T12:50:51.351779Z

And all of those regions contain the centromere of their respective chromosome.

I suspect that the bailout that presumably happens for the other chromosomes just doesn't happen here. It would be great if there were a fix for this.

@lkhilton

Leaving a comment to vote for a solution to this issue. We've run into this problem frequently. It seems somewhat arbitrary which samples are affected, but the bin that contains the centromere on chr4 consistently takes the longest to run.

@SebastianHollizeck

I actually found a solution for myself by adding

extraVariantCallerArguments = --max-input-depth 1000

to the strelka.ini.
This removes the problem of excessive mapping depth around the centromeres.

Obviously you need to adjust the depth depending on your input.
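
For anyone else trying this, a minimal sketch of what that override can look like (the [StrelkaSomatic] section name is what the default configureStrelkaSomaticWorkflow.py.ini uses in my copy, and --config is the configure-script option for supplying a custom ini; double-check both against your own installation):

# strelka.ini (sketch; section name and --config flag assumed from the default ini)
[StrelkaSomatic]
extraVariantCallerArguments = --max-input-depth 1000

# then point the configure script at it
/path/to/configureStrelkaSomaticWorkflow.py --config /path/to/strelka.ini \
    --normalBam /path/to/normal.bam --tumorBam /path/to/tumor.bam \
    --referenceFasta /path/to/hg38.fa --runDir /path/to/dir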

@lkhilton

Thanks, I'll give that a try.
