Days spent on chromId_006_chr7_0013 #168

Open
annahoge opened this issue Jan 24, 2020 · 4 comments

Comments

@annahoge

Hello, and thank you for your tool!

When I run the Strelka2 somatic workflow with 24+ cores and 32+ GB of memory on 30x WGS data (with hg38 masked as recommended in the User Guide), about half of the samples get stuck on

Longest ongoing running task name: 'CallGenome+callGenomeSegment_chromId_006_chr7_0013'

for multiple hours, while the other jobs finish within a couple of hours. More than a third of the samples (out of ~70) keep running for over a week, until I shut them down. Restarting the job from where it left off does not fix the problem. Is this a bug you could please address? I can't share the data, but I don't find any regions of abnormally high depth on chr7.
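
For reference, a quick way to scan a chromosome for abnormally high-depth regions (a sketch only; the 500x cutoff and the BAM path are illustrative placeholders, not values from the run above):

# report positions on chr7 with coverage above 500x
samtools depth -a -r chr7 /path/to/tumor.bam \
    | awk '$3 > 500 {print $1"\t"$2"\t"$3}' > chr7_highdepth.txt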

Configuration example:
/path/to/configureStrelkaSomaticWorkflow.py \
    --normalBam /path/to/normal.bam \
    --tumorBam /path/to/tumor.bam \
    --referenceFasta /path/to/hg38.fa \
    --runDir /path/to/dir \
    --callRegions /path/to/strelka2-provided-call-regions-for-hg38.bed.gz

Run example (on HPC cluster node):
/path/to/runWorkflow.py -m local -j 24

Log example:
...[156412_1] [WorkflowRunner] [StatusUpdate] Workflow specification is complete?: True
...[156412_1] [WorkflowRunner] [StatusUpdate] Task status (waiting/queued/running/complete/error): 8/0/1/572/0
...[156412_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task time (hrs): 0.0000
...[156412_1] [WorkflowRunner] [StatusUpdate] Longest ongoing queued task name: ''
...[156412_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task time (hrs): 17.6356
...[156412_1] [WorkflowRunner] [StatusUpdate] Longest ongoing running task name: 'CallGenome+callGenomeSegment_chromId_006_chr7_0013'

@SebastianHollizeck

Hey,
I have a similar issue, but with different segments.
The regions that Strelka2 seems to stall on are centromeric regions in my case.

The last entry in the VCF from that region is:

chr4	49101739	.	T	C	.	LowEVS	SOMATIC;QSS=16;TQSS=2;NT=ref;QSS_NT=16;TQSS_NT=2;SGT=TT->CT;DP=5386;MQ=22.21;MQ0=3174;ReadPosRankSum=-0.01;SNVSB=36.87;SomaticEVS=0.12	DP:FDP:SDP:SUBDP:AU:CU:GU:TU	960:72:0:0:1,18:7,24:0,1:880,3051	900:65:1:0:0,12:17

And you can see that the DP value is already through the roof (for WGS).

I tried to supply a BED file with the centromeres excluded:

tabix /data/reference/dawson_labs/bed_indexed/GRCh38/GRCh38WithoutCentromeres.bed.gz chr4
chr4	0	49712061
chr4	51743951	190214555
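
(For reference, a sketch of one way to build such a centromere-excluding BED; centromeres.bed and hg38.genome are placeholder file names, e.g. a UCSC centromere track and a chrom-sizes file:)

# sort the centromere intervals consistently with the genome file, take the
# complement, then bgzip/tabix so the result can be used with --callRegions
bedtools sort -g hg38.genome -i centromeres.bed > centromeres.sorted.bed
bedtools complement -i centromeres.sorted.bed -g hg38.genome > GRCh38WithoutCentromeres.bed
bgzip GRCh38WithoutCentromeres.bed
tabix -p bed GRCh38WithoutCentromeres.bed.gz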

But it still takes much longer in these segments than in all the others.
These are the last running jobs (Task status (waiting/queued/running/complete/error): 12/0/6/1150/0):

CallGenome		running	0	2020-08-03T11:47:35.632059Z
callGenomeSegment_chromId_003_chr4_0004	CallGenome	running	0	2020-08-03T13:52:41.519421Z
callGenomeSegment_chromId_004_chr5_0004	CallGenome	running	0	2020-08-03T12:05:20.552800Z
callGenomeSegment_chromId_009_chr10_0003	CallGenome	running	0	2020-08-03T14:29:52.935066Z
callGenomeSegment_chromId_019_chr20_0002	CallGenome	running	0	2020-08-03T14:29:17.755184Z
callGenomeSegment_chromId_020_chr21_0000	CallGenome	running	0	2020-08-03T13:52:01.090603Z
callGenomeSegment_chromId_055_chr17_KI270729v1_random_0000	CallGenome	running	0	2020-08-03T12:50:51.351779Z

And all of those regions contain the centromere of their respective chromosome.

I suspect that the bailout that presumably happens for the other chromosomes just doesn't happen here. It would be great if there were a fix for this.

@lkhilton

Leaving a comment to vote for a solution to this issue. We've run into this problem frequently. It seems somewhat arbitrary which samples are affected, but the bin that contains the centromere on chr4 consistently takes the longest to run.

@SebastianHollizeck

I actually found a solution for myself by adding

extraVariantCallerArguments = --max-input-depth 1000

to the strelka.ini.
This removes the problem of excessive mapping depth around the centromeres.

Obviously you need to adjust the depth depending on your input.
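
For anyone else trying this, a minimal sketch of what that override can look like (the [StrelkaSomatic] section name is what the default configureStrelkaSomaticWorkflow.py.ini uses in my copy, and --config is the configure-script option for supplying a custom ini; double-check both against your own installation):

# strelka.ini (sketch; section name and --config flag assumed from the default ini)
[StrelkaSomatic]
extraVariantCallerArguments = --max-input-depth 1000

# then point the configure script at it
/path/to/configureStrelkaSomaticWorkflow.py --config /path/to/strelka.ini \
    --normalBam /path/to/normal.bam --tumorBam /path/to/tumor.bam \
    --referenceFasta /path/to/hg38.fa --runDir /path/to/dir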

@lkhilton

Thanks, I'll give that a try.
