Hi @ACEnglish,
I'm finding that a number of variants called by dysgu end up labelled as false-positives, but they look to me like they should be true-positives. If I change to -p 0 they are labelled as true, which is a bit confusing, as they are symbolic SVs without the reference sequence filled out. For example, using the HG002 GIAB dataset, the dysgu call is:
1 36733251 29421 A <DEL> . PASS SVMETHOD=DYSGUv1.6.0;SVTYPE=DEL;END=36733476;CHR2=1;GRP=29421;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=225;CONTIGA=CTGGTATTACAGGCGTGAGCCACCGCGCCCTGCCTATAATAAGAATCTTAAATAATTTCTTCTGCAATTAATTTCAGAGTCACAGTGTATGTGTATGTGTGGGTGAAGGTGATTCTCCTTTAAAAGCAAAGTTTCTTTACCACCTAATTCCATTTTTTAGGACATGGCTGAAGTAGTAAAAAAGGAATGTTCCCCTCTATTCGTGTATAATTTTTAAGTTTTTTTGTCAGCAATACAGCTGTTCCGAATCATCATTTTACTGATGAAAAAATAGAGGTAAAAACACATGCACCAAATAAGAGTTTGTGGTTTATTCAGTAGGAGCCTTAGTTTTGGTAAATTTCTCTCATTTAGGGACCCTGAACACTTGTTTGGCAGCGGTTATGTTTTTCAGCTTCTTACCTACTAGTGTAGTGAGATCAGTTCCACCCAATTCCAGGGGTATTGATACTTTGTGGGGAGAAGAAAGAGGGAAGAAAGCAATTAGTAATAGTCAAACAAAAATTAAAAAACTAACTCGGCTGGGCGCGGTGGCTCATACCTGTAATCCCAGCACTTTCAGAGGCCGAGGCAGGTGGATCACCTGAGGCCAGGAGTTTGAGGCCAGCCTGGCCAATATGGTGAAACCCCATCTCTACTGACAATACAAGAATTAGTTGGGTGTGGTAGCACGCAGCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCAAGAGGTGAAGATTGTAGTGAGCTGAGATCATGTGCCACTGTACTC;KIND=extra-regional;GC=42.49;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=32;OL=0;SU=42;WR=21;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0/1:198:60.0:42:21:0:0:0:0:52.11:1:8:13:0:0:0:0.615:0.571:0.929:0.951
This is labelled as FP. If I collect the reference sequence between POS and END and put it in the REF column of the VCF, the variant is labelled as a true-positive. Also, there are no nearby calls from dysgu or other reference SVs.
Here is another example. The dysgu call was:
1 106969624 67156 A <DEL> . PASS SVMETHOD=DYSGUv1.6.0;SVTYPE=DEL;END=106969786;CHR2=1;GRP=67156;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=162;CONTIGA=AAGCTGGAAACCATCATTCTCAGCAAACTATCGCAAGGACAAAAAACCAAACACCGCATGTTATCACTCATAAGTGGGAATTGAATAATGAGAACACTTGGACACAGGAAGGGAACATCACACACCGGGGCCTGTCATGGGGTGGGGGGAGTGGAGAGGGATAGCATTAGGAGAAATACCGAATGTAAATGACAAGTTAATCAGTGCAGCACACCAACATGACGCGTGTATACATATGTAACAAACCTGCACGTTGTGCACATGTACCCTAGAACTTAAAGTATAATAATAAAAAAATATATAAAATTAGCTGGGCATGGTGGCACATGTCTGTAATCCCAGCCACTCAGGAGGCTGAGGCAAGAGAATCGCTTGAACCCAGGAGGTGAAGGTTGTAGAGAGCTGAGATCACGCCATTGCACTCCAGATATATAGATAGATAGATAGATATACGATATCTATATATATATGGATATATACATACGTATCTATATATATAGACATATATATAGATACGTATCTATATATAGATATGTATAGATATGAGGGGAGCTATATTCTACAAAGCCACAGGTGCAGAGCTGCCCAAGGCTGTAGGAGCCCACCTCTTGCATTGGCATGACTTGGATGTGAGACATGGAGTCAAAAGAGATCATTTTGGAACTTCCAAGTTTAATGACTGCCCTGTTGAATTTTGGACGTTCATGGGGCCTGTAGCCCCTTTGTTTTGGCCAATTTCTCCCATTTGGAATGCGTATATTTACCCAATGGCTGTACCCCCATTGTATCAGGGAAGTAATTAACTTGCTTTTGATTTTACAAGCTCAGAGGCAGAAGA;KIND=extra-regional;GC=41.53;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=126;OL=0;SU=82;WR=41;PE=0;SR=0;SC=0;BND=0;LPREC=1;RT=pacbio GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 1/1:51:60.0:82:41:0:0:0:0:50.17:5:26:15:0:0:0:0.0:0.0:0.759:0.938
This is also labelled as FP.
Most other symbolic SVs are labelled as expected though. Do you know why these examples are not labelled as true? Is it anything to do with the reference SV having both REF and ALT seqs listed? Thanks!
Another user had a similar issue recently (#163). Truvari 4.1 (#164) has a change that will now warn users about using --pctsim != 0 with non-sequence-resolved calls (code). That post also has a script to help users fill in deletions, but I think you've already figured out that part. Needing to ignore sequence or alter the input VCFs isn't an ideal solution. However, callers (particularly those using long reads) are getting better at fully resolving SVs, Truvari is moving towards super accurate comparison (details), and the VCF format specs are beginning to change the use of symbolic alts (details), so the 'wild west' of SV representations seems to be shrinking.
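For anyone landing here later, the "fill in deletions" workaround can be sketched roughly as below. This is not the script from the linked post, only a minimal illustration of the idea described in the question: take the reference bases from POS through END, write them to REF, and use the anchor base as ALT. The function name `fill_deletion_ref` and the `fetch` callable are hypothetical; `fetch` is assumed to return reference bases for a 0-based, half-open interval (the convention `pysam.FastaFile.fetch` uses), and the toy reference string is made up for the example.

```python
def fill_deletion_ref(vcf_line, fetch):
    """Resolve a symbolic <DEL> record into explicit REF/ALT sequences.

    fetch(chrom, start, end) is an assumed helper returning the reference
    bases for the 0-based, half-open interval [start, end).
    Records that are not symbolic deletions are returned unchanged.
    """
    line = vcf_line.rstrip("\n")
    fields = line.split("\t")
    if len(fields) < 8 or fields[4] != "<DEL>":
        return line
    chrom, pos = fields[0], int(fields[1])
    # Parse key=value INFO entries; flag-style entries are skipped.
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    if "END" not in info:
        return line
    end = int(info["END"])
    # POS is the 1-based anchor base before the deleted sequence and END is
    # the 1-based last deleted base, so REF spans POS..END inclusive.
    ref = fetch(chrom, pos - 1, end).upper()
    fields[3] = ref       # anchor base + deleted bases
    fields[4] = ref[0]    # anchor base only
    return "\t".join(fields)


# Toy reference (made up) standing in for a real FASTA lookup.
toy_genome = {"1": "ACGTACGTAC"}

def toy_fetch(chrom, start, end):
    return toy_genome[chrom][start:end]

record = "1\t3\tid\tG\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=6;SVLEN=3"
resolved = fill_deletion_ref(record, toy_fetch)
print(resolved)
```

With a real genome, `fetch` could be the `fetch` method of an opened `pysam.FastaFile`. Since sequence similarity can only be computed when both records carry sequence, resolving the call this way is what lets the comparison run without setting -p 0.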