Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent calling results after rerun with the same set of bam files and parameters #621

Open
gseryogin opened this issue Mar 13, 2023 · 6 comments

Comments

@gseryogin
Copy link

Hi!
Having one normal and two tumor bam files I got different calling results for a clinically important gene fusion KIF5B-RET.

These are records of two breakpoints, called first time from the specified set of bam files:

chr10	32016769	gridss170bb_38o	C	[chr10:43111683[C	697.73	PASS	ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=2X100M;ASQ=162.56;ASRP=1;ASSR=22;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-5972;BEIDH=0,101;BEIDL=80,0;BQ=0;BSC=0;BSCQ=0;BUM=0;BUMQ=0;BVF=0;CAS=0;CASQ=0;CIPOS=0,1;CIRPOS=-1,0;CQ=697.73;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=C;IC=0;IHOMPOS=0,1;IQ=0;MATEID=gridss170bb_38h;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=372.61;REF=22;REFPAIR=0;RP=1;RPQ=14.65;SB=0.5;SC=2X100M;SR=6;SRQ=147.91;SVTYPE=BND;VF=11;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913	GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF	.:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:5:0:0:0:0:0:0	.:0.5:0:0:0:0:162.56:1:16:0:0:0:0:0:0:0:0:0:0:0:0:566.12:241.01:6:0:1:14.65:6:147.91:6	.:0.313:0:0:0:0:0:0:6:0:0:0:0:0:0:0:0:0:0:0:0:131.6:131.6:11:0:0:0:0:0:5
chr10	43111683	gridss170bb_38h	C	[chr10:32016769[C	697.73	PASS	ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=1X80M;ASQ=372.61;ASRP=1;ASSR=22;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-5972;BEIDH=80,0;BEIDL=0,101;BMQ=60;BMQN=60;BMQX=60;BQ=527.28;BSC=15;BSCQ=347.28;BUM=3;BUMQ=180;BVF=6;CAS=0;CASQ=0;CIPOS=-1,0;CIRPOS=0,1;CQ=697.73;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=G;IC=0;IHOMPOS=-1,0;IQ=0;MATEID=gridss170bb_38o;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=162.56;REF=20017;REFPAIR=0;RP=1;RPQ=14.65;SB=0.604651;SC=1X80M;SR=6;SRQ=147.91;SVTYPE=BND;VF=11;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913	GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF	.:0:0:0:0:0:0:0:0:0:0:0:60:0:0:1:60:1:0:0:0:0:0:7082:0:0:0:0:0:0	.:0.0009476:0:0:0:0:241.01:1:16:0:0:0:204.38:6:144.38:1:60:1:0:0:0:566.12:162.56:6326:0:1:14.65:6:147.91:6	.:0.000756:0:0:0:0:131.6:0:6:0:0:0:262.89:9:202.89:1:60:4:0:0:0:131.6:0:6609:0:0:0:0:0:5

And these are records of the same two breakpoints, called second time from the same set of bam files:

chr10	32016769	gridss170bb_38o	C	[chr10:43111683[C	518.14	PASS	ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=2X87M;ASQ=162.56;ASRP=1;ASSR=14;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6016;BEIDH=0,88;BEIDL=80,0;BQ=0;BSC=0;BSCQ=0;BUM=0;BUMQ=0;BVF=0;CAS=0;CASQ=0;CIPOS=0,1;CIRPOS=-1,0;CQ=518.14;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=C;IC=0;IHOMPOS=0,1;IQ=0;MATEID=gridss170bb_38h;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=193.02;REF=22;REFPAIR=0;RP=1;RPQ=14.65;SB=0.45;SC=2X100M;SR=6;SRQ=147.91;SVTYPE=BND;VF=8;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913	GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF	.:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:5:0:0:0:0:0:0	.:0.5:0:0:0:0:162.56:1:12:0:0:0:0:0:0:0:0:0:0:0:0:474.99:149.87:6:0:1:14.65:6:147.91:6	.:0.154:0:0:0:0:0:0:2:0:0:0:0:0:0:0:0:0:0:0:0:43.15:43.15:11:0:0:0:0:0:2
chr10	43111683	gridss170bb_38h	C	[chr10:32016769[C	518.14	PASS	ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=1X80M;ASQ=193.02;ASRP=1;ASSR=14;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6016;BEIDH=80,0;BEIDL=0,88;BMQ=60;BMQN=60;BMQX=60;BQ=527.28;BSC=15;BSCQ=347.28;BUM=3;BUMQ=180;BVF=9;CAS=0;CASQ=0;CIPOS=-1,0;CIRPOS=0,1;CQ=518.14;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=G;IC=0;IHOMPOS=-1,0;IQ=0;MATEID=gridss170bb_38o;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=162.56;REF=20017;REFPAIR=0;RP=1;RPQ=14.65;SB=0.6;SC=1X80M;SR=6;SRQ=147.91;SVTYPE=BND;VF=8;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913	GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF	.:0:0:0:0:0:0:0:0:0:0:0:60:0:0:1:60:1:0:0:0:0:0:7082:0:0:0:0:0:0	.:0.0009476:0:0:0:0:149.87:1:12:0:0:0:204.38:6:144.38:1:60:1:0:0:0:474.99:162.56:6326:0:1:14.65:6:147.91:6	.:0.0003025:0:0:0:0:43.15:0:2:0:0:0:262.89:9:202.89:1:60:7:0:0:0:43.15:0:6609:0:0:0:0:0:2

QUAL, several INFO and FORMAT fields got different values (more noticable for the second tumor sample), after reruning gridss with exactly the same set of input bams and parameters. Especially concerning is the difference in read counts supporting varaint allele (VF), which are different and became considerably lower after second rerun (was 11 became 8).

Could you, please, let me know, is there any chance to avoide that sort of stochasticity, or, if this behaviour is an inevitable, give a piece of advice on how to cope with that?
This is an inportant isssue, since it may affect somatic filters and and increase the number of false negative variants.

@stsergbg
Copy link

Does that mean that gridss can produce different results on a sample rerun and has stochastic properties?

@RobinVanSchendel
Copy link

It appears that there are different IDs in the output. Look at this:

BEID=asm170-4188,asm172-5972 (first run)

vs

BEID=asm170-4188,asm172-6016 (second run)

seems to suggest that the same samples were not run

@stsergbg
Copy link

I have reproduced this behaviour, running the same files 4 times gives different qualities for same variants. Also variant coordinates can change by 1-2bp.

@RobinVanSchendel
Copy link

I have reproduced this behaviour, running the same files 4 times gives different qualities for same variants. Also variant coordinates can change by 1-2bp.

thanks for testing, that sounds very worrying indeed!

@d-cameron
Copy link
Member

d-cameron commented Apr 2, 2023

It looks like there's a difference of 6 split read that were assembled in the first but not second run. This seems to indicate that #450 did not fully fix the issue of multi-threaded determinism. Can you confirm:
a) what version(s) of GRIDSS were run
b) what version of dependent tools (e..g bwa) were run (the start GRIDSS log file will contain this)
c) whether the input.bam.gridss.working/input.bam.sv.bam files are identical between the two runs
d) whether there were any command-line differences (e.g. changing the order of the input file can change the output because the reads will be processed in a different order)

It appears that there are different IDs in the output. Look at this:
BEID=asm170-4188,asm172-5972 (first run)
vs
BEID=asm170-4188,asm172-6016 (second run)
seems to suggest that the same samples were not run

This indicates that the assembly sub-process (asm172) processing the RHS had output a different number of assemblies by the time it got to chr10:43111683. That is, the root cause could be any difference in the 10Mb leading up the chr10:43111683. That said, if the inputs are identical then the output should be identical which clearly is not the case here.

Note that 'identical' inputs from the assembly sub-process is not necessarily the same as identical inputs to gridss overall. The GRIDSS pre-processing does a (bwa) realignment of soft-clipped reads (required to support for bowtie2 inputs, optional for bwa-aligned inputs). The input split reads/discordant read pairs to the asm/variant calling steps are the reads in the input.bam.gridss.working/input.bam.sv.bam files, not the input files given on the command line. Any non-determinism here will propagate downstream to asm & calling.

@AlexanderKotyurgin
Copy link

I have the same problem. Here are the results of two runs on the same files:

chr10	32016769	gridss170bb_38o	C	[chr10:43111683[C	512.74	PASS	ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=2X87M;ASQ=162.56;ASRP=1;ASSR=14;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6279;BEIDH=0,88;BEIDL=80,0;BQ=0;BSC=0;BSCQ=0;BUM=0;BUMQ=0;BVF=0;CAS=0;CASQ=0;CIPOS=0,1;CIRPOS=-1,0;CQ=512.74;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=C;IC=0;IHOMPOS=0,1;IQ=0;MATEID=gridss170bb_38h;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=187.62;REF=22;REFPAIR=0;RP=1;RPQ=14.65;SB=0.45;SC=2X100M;SR=6;SRQ=147.91;SVTYPE=BND;VF=8;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913	GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF	.:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:5:0:0:0:0:0:0	.:0.455:0:0:0:0:162.56:1:10:0:0:0:0:0:0:0:0:0:0:0:0:423.85:98.73:6:0:1:14.65:6:147.91:5	.:0.214:0:0:0:0:0:0:4:0:0:0:0:0:0:0:0:0:0:0:0:88.89:88.89:11:0:0:0:0:0:3
chr10	43111683	gridss170bb_38h	C	[chr10:32016769[C	512.74	PASS	ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=1X80M;ASQ=187.62;ASRP=1;ASSR=14;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6279;BEIDH=80,0;BEIDL=0,88;BMQ=60;BMQN=60;BMQX=60;BQ=527.28;BSC=15;BSCQ=347.28;BUM=3;BUMQ=180;BVF=9;CAS=0;CASQ=0;CIPOS=-1,0;CIRPOS=0,1;CQ=512.74;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=G;IC=0;IHOMPOS=-1,0;IQ=0;MATEID=gridss170bb_38o;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=162.56;REF=20017;REFPAIR=0;RP=1;RPQ=14.65;SB=0.6;SC=1X80M;SR=6;SRQ=147.91;SVTYPE=BND;VF=8;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913	GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF	.:0:0:0:0:0:0:0:0:0:0:0:60:0:0:1:60:1:0:0:0:0:0:7082:0:0:0:0:0:0	.:0.0007898:0:0:0:0:98.73:1:10:0:0:0:204.38:6:144.38:1:60:2:0:0:0:423.85:162.56:6326:0:1:14.65:6:147.91:5	.:0.0004537:0:0:0:0:88.89:0:4:0:0:0:262.89:9:202.89:1:60:6:0:0:0:88.89:0:6609:0:0:0:0:0:3
chr10	32016769	gridss170bb_38o	C	[chr10:43111683[C	494.4	LOW_QUAL	ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=2X100M;ASQ=162.56;ASRP=1;ASSR=13;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6059;BEIDH=0,102;BEIDL=80,0;BQ=0;BSC=0;BSCQ=0;BUM=0;BUMQ=0;BVF=0;CAS=0;CASQ=0;CIPOS=0,1;CIRPOS=-1,0;CQ=494.4;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=C;IC=0;IHOMPOS=0,1;IQ=0;MATEID=gridss170bb_38h;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=169.28;REF=22;REFPAIR=0;RP=1;RPQ=14.65;SB=0.421053;SC=2X100M;SR=6;SRQ=147.91;SVTYPE=BND;VF=5;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913	GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF	.:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:5:0:0:0:0:0:0	.:0.455:0:0:0:0:162.56:1:13:0:0:0:0:0:0:0:0:0:0:0:0:494.4:169.28:6:0:1:14.65:6:147.91:5	.:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:11:0:0:0:0:0:0
chr10	43111683	gridss170bb_38h	C	[chr10:32016769[C	494.4	LOW_QUAL	ANRP=0;ANRPQ=0;ANSR=0;ANSRQ=0;AS=1;ASC=1X80M;ASQ=169.28;ASRP=1;ASSR=13;BA=0;BAQ=0;BASRP=0;BASSR=0;BEID=asm170-4188,asm172-6059;BEIDH=80,0;BEIDL=0,102;BMQ=60;BMQN=60;BMQX=60;BQ=527.28;BSC=15;BSCQ=347.28;BUM=3;BUMQ=180;BVF=12;CAS=0;CASQ=0;CIPOS=-1,0;CIRPOS=0,1;CQ=494.4;EVENT=gridss170bb_38;HOMLEN=1;HOMSEQ=G;IC=0;IHOMPOS=-1,0;IQ=0;MATEID=gridss170bb_38o;MQ=55.89;MQN=22;MQX=60;RAS=1;RASQ=162.56;REF=20017;REFPAIR=0;RP=1;RPQ=14.65;SB=0.588235;SC=1X80M;SR=6;SRQ=147.91;SVTYPE=BND;VF=5;SIMPLE_SVTYPE=INV;SIMPLE_SVLEN=11094913	GT:AF:ANRP:ANRPQ:ANSR:ANSRQ:ASQ:ASRP:ASSR:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF	.:0:0:0:0:0:0:0:0:0:0:0:60:0:0:1:60:1:0:0:0:0:0:7082:0:0:0:0:0:0	.:0.0007898:0:0:0:0:169.28:1:13:0:0:0:204.38:6:144.38:1:60:2:0:0:0:494.4:162.56:6326:0:1:14.65:6:147.91:5	.:0:0:0:0:0:0:0:0:0:0:0:262.89:9:202.89:1:60:9:0:0:0:0:0:6609:0:0:0:0:0:0

Assemblies that differ between runs:
asm_1.txt
asm_2.txt
All input.bam.sv.bam files are identical, the only differences are in sv-assembly files. Both runs have been launched with the same commands.
Versions of software:

  1. gridss: 2.13.2
  2. bwa: 0.7.17-r1188
  3. java: openjdk 11.0.13
  4. RepeatMasker: 4.1.0
  5. BLAST: 2.11.0+
  6. kraken2: 2.1.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants