You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a sample that has a slightly complicated event in it that is getting miscalled. The easiest way to see it is probably with an IGV screenshot:
BWA aligns the reads with a 7bp deletion followed by 2 mismatches, though am inclined to think of it as a 9bp deletion coupled with a 2bp insertion (or a swap of 9bp of reference for 2bp of novel sequence). The original alignments are in the top half of the IGV view. The bottom is the assembly BAM from running the HaplotypeCaller. From what I see the assembly is getting it right.
But the problem is that the event extraction/genotyping goes wrong. I've run it two ways. If I run to generate a called VCF directly using:
Then I get only a single variant reported in the region (the 9bp deletion):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT test-sample
chr1 68896832 . CTTTAGTTTT C 1597.60 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.000;DP=122;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=14.52;ReadPosRankSum=1.341;SOR=0.350 GT:AD:DP:GQ:PL 0/1:67,43:110:99:1605,0,2683
If i run to generate a gvcf then things get more interesting:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT test-sample
chr1 68896800 . G <NON_REF> . . END=68896831 GT:DP:GQ:MIN_DP:PL 0/0:118:99:107:0,120,1800
chr1 68896832 . CTTTAGTTTT C,<NON_REF> 1597.60 . BaseQRankSum... GT:AD:DP:GQ:PL:SB 0/1:67,43,0:110:99:1605,0,2683,1807,2813,4620:67,0,43,0
chr1 68896841 . T *,TCC,<NON_REF> 344.02 . DP=110;Exces... GT:GQ:PL ./.:99:0,0,0,0,0,0,0,0,0,0
chr1 68896842 . G <NON_REF> . . END=68896899 GT:DP:GQ:MIN_DP:PL 0/0:71:99:41:0,99,1485
I.e. a record is emitted for the insertion but the genotype is ./. with a quality of 99, 0s for all the PLs and the other per-sample annotations we'd expect on a variant record missing.
I suspect the problem has something to do with the fact that the deletion and insertion are in cis and the insertion's anchor base is within the deletion. I'm not even sure how one would represent this as a pair of variants. I think ideally this would be emitted as a single variant with REF=CTTTAGTTTT and ALT=CCC.
The resulting VCF (and gVCF) should emit one or more variants with called genotypes that indicate a 9bp deletion followed by a 2bp insertion in cis.
Actual behavior
The 9bp deletion is emitted and a) in the gVCF the 2bp insertion is emitted without a genotype or other information that would allow genotyping, b) if a VCF is called directly the 2bp insertion is absent entirely.
The text was updated successfully, but these errors were encountered:
Bug Report
Affected tool(s) or class(es)
HaplotypeCaller
Affected version(s)
Description
I have a sample that has a slightly complicated event in it that is getting miscalled. The easiest way to see it is probably with an IGV screenshot:
BWA aligns the reads with a 7bp deletion followed by 2 mismatches, though am inclined to think of it as a 9bp deletion coupled with a 2bp insertion (or a swap of 9bp of reference for 2bp of novel sequence). The original alignments are in the top half of the IGV view. The bottom is the assembly BAM from running the HaplotypeCaller. From what I see the assembly is getting it right.
But the problem is that the event extraction/genotyping goes wrong. I've run it two ways. If I run to generate a called VCF directly using:
Then I get only a single variant reported in the region (the 9bp deletion):
If i run to generate a gvcf then things get more interesting:
yields:
I.e. a record is emitted for the insertion but the genotype is
./.
with a quality of 99, 0s for all the PLs and the other per-sample annotations we'd expect on a variant record missing.I suspect the problem has something to do with the fact that the deletion and insertion are in cis and the insertion's anchor base is within the deletion. I'm not even sure how one would represent this as a pair of variants. I think ideally this would be emitted as a single variant with
REF=CTTTAGTTTT
andALT=CCC
.Steps to reproduce
Use the command lines above with the files attached in
deletion-then-insertion-files.zip
Expected behavior
The resulting VCF (and gVCF) should emit one or more variants with called genotypes that indicate a 9bp deletion followed by a 2bp insertion in cis.
Actual behavior
The 9bp deletion is emitted and a) in the gVCF the 2bp insertion is emitted without a genotype or other information that would allow genotyping, b) if a VCF is called directly the 2bp insertion is absent entirely.
The text was updated successfully, but these errors were encountered: