most severe consequence polypyrimidine stretch #2031

Open
dnil opened this issue Apr 24, 2023 · 8 comments

@dnil

dnil commented Apr 24, 2023

Describe the bug

We are seeing more issues with the most severe consequence. We recently fixed the clinical filter in scout (Clinical-Genomics/scout#3858) to allow showing these nice little fellows, but that falls a bit short when the variant is not loaded. Currently, they also get a penalty from the rank model for being less clinically important, like this causative one from pureamoeba:

1       216210952       SV_221_1:tiddit|CNVnator_del_142:cnvnator|MantaDEL:10115:0:1:0:0:0:manta        N       <DEL>   .       PASS    Annotation=USH2A;CIEND=0,2;CIPOS=0,2;CSQ=deletion|splice_polypyrimidine_tract_variant&coding_sequence_variant&intron_variant&feature_truncation|LOW|USH2A|7399|Transcript|NM_206933.4|protein_coding|22-32/72|21-32/71||||||||||-1||EntrezGene|12601|YES||||NP_996816.3||||||RefSeq|||||||;CTG=CTCTTAGTAGCCATTGCTTTGATTGTTCTTTGTTTTCAGGATTGTACTGGTAAAGCCATATTTCAACTCCTATTACAATTATTTGATAAAATGTTTCAGGATATTGATTCCACCTGTTTAAAATTTCCATTGAAATTTGTGAGCCTAAAAATCAATGAGTGAAGAGTAAGTTTCAAAGGATTCAAATTAAATGCTGTTGTAGCAAAACTGTCAATTATCCCGCATTCTTTTATTAATTTTGCTTTTATTTGTAGAATGGCCTTTCCT;END=216301040;FOUNDBY=3;GeneticModels=pureamoeba:AD;HOMLEN=2;HOMSEQ=AA;IMPRECISE;LFA=16,11;LFB=19,11;LTE=14,11;REGIONA=216210823,216210952;REGIONB=216301040,216301211;RankResult=4|-12|5|0|3|0|3|0;RankScore=pureamoeba:3;SUPP_VEC=111;SVLEN=90088;SVTYPE=DEL;VARID=CNVnator_del_142:cnvnator|MantaDEL:10115:0:1:0:0:0:manta;clingen_cgh_benign=1;clinical_genomics_loqusFrq=0.00042;clinical_genomics_loqusObs=3;cnvnator_CHROM=CNVnator_del_142|1;cnvnator_FILTERS=CNVnator_del_142|PASS;cnvnator_INFO=CNVnator_del_142|END:216301000|SVTYPE:DEL|SVLEN:-90000|IMPRECISE|natorRD:0.498228|natorP1:1.77081e-12|natorP2:0|natorP3:1.81105e-12|natorP4:0|natorQ0:0.0024426;cnvnator_POS=CNVnator_del_142|216211001;cnvnator_QUAL=CNVnator_del_142|.;cnvnator_SAMPLE=CNVnator_del_142;manta_CHROM=MantaDEL_10115_0_1_0_0_0|1;manta_FILTERS=MantaDEL_10115_0_1_0_0_0|PASS;manta_INFO=MantaDEL_10115_0_1_0_0_0|END:216301039|SVTYPE:DEL|SVLEN:-90090|CIPOS:0:2|CIEND:0:2|HOMLEN:2|HOMSEQ:AA;manta_POS=MantaDEL_10115_0_1_0_0_0|216210949;manta_QUAL=MantaDEL_10115_0_1_0_0_0|819;manta_SAMPLE=MantaDEL_10115_0_1_0_0_0;most_severe_consequence=12601:deletion|splice_polypyrimidine_tract_variant;natorP1=1.77081e-12;natorP2=0;natorP3=1.81105e-12;natorP4=0;natorQ0=0.0024426;natorRD=0.498228;pureamoeba_tiddit_ACC11683A12_CHROM=SV_221_1|1;pureamoeba_tiddit
_ACC11683A12_FILTERS=SV_221_1|PASS;pureamoeba_tiddit_ACC11683A12_INFO=SV_221_1|SVTYPE:DEL|SVLEN:90088|END:216301040|REGIONA:216210823:216210952|REGIONB:216301040:216301211|LFA:16:11|LFB:19:11|LTE:14:11|CTG:CTCTTAGTAGCCATTGCTTTGATTGTTCTTTGTTTTCAGGATTGTACTGGTAAAGCCATATTTCAACTCCTATTACAATTATTTGATAAAATGTTTCAGGATATTGATTCCACCTGTTTAAAATTTCCATTGAAATTTGTGAGCCTAAAAATCAATGAGTGAAGAGTAAGTTTCAAAGGATTCAAATTAAATGCTGTTGTAGCAAAACTGTCAATTATCCCGCATTCTTTTATTAATTTTGCTTTTATTTGTAGAATGGCCTTTCCT;pureamoeba_tiddit_ACC11683A12_POS=SV_221_1|216210952;pureamoeba_tiddit_ACC11683A12_QUAL=SV_221_1|.;pureamoeba_tiddit_ACC11683A12_SAMPLE=SV_221_1|ACC11683A12|GT:0/1|CN:1|COV:37:16.750621529891426:36|DV:14|RV:11|LQ:0.0:0.0|RR:22:20|DR:24:30;set=Intersection;svdb_origin=tiddit|cnvnator|manta;tiddit_CHROM=SV_221_1|1;tiddit_FILTERS=SV_221_1|PASS;tiddit_INFO=SV_221_1|SVTYPE:DEL|SVLEN:90088|END:216301040|REGIONA:216210823:216210952|REGIONB:216301040:216301211|LFA:16:11|LFB:19:11|LTE:14:11|CTG:CTCTTAGTAGCCATTGCTTTGATTGTTCTTTGTTTTCAGGATTGTACTGGTAAAGCCATATTTCAACTCCTATTACAATTATTTGATAAAATGTTTCAGGATATTGATTCCACCTGTTTAAAATTTCCATTGAAATTTGTGAGCCTAAAAATCAATGAGTGAAGAGTAAGTTTCAAAGGATTCAAATTAAATGCTGTTGTAGCAAAACTGTCAATTATCCCGCATTCTTTTATTAATTTTGCTTTTATTTGTAGAATGGCCTTTCCT|SUPP_VEC:1;tiddit_POS=SV_221_1|216210952;tiddit_QUAL=SV_221_1|.;tiddit_SAMPLE=SV_221_1 GT:CN:COV:DV:RV:LQ:RR:DR        0/1:1:37,16.7506,36:14:11:0,0:22,20:24,30

It's a clear het deletion of several exons, called by all callers, but the most severe consequence becomes splice_polypyrimidine_tract_variant.
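To make the exon overlap visible, here is a minimal sketch that splits the leading part of the CSQ entry from the record above. The field names are an assumption based on VEP's usual default order; the authoritative order is whatever the CSQ description in the VCF header declares, which is not shown in this issue. The helper names `parse_csq_prefix` and `exon_span` are hypothetical.

```python
# First ten CSQ sub-fields in VEP's usual default order (assumption: the
# real order is declared in the VCF header, which is not part of this issue).
CSQ_FIELDS = [
    "Allele", "Consequence", "IMPACT", "SYMBOL", "Gene",
    "Feature_type", "Feature", "BIOTYPE", "EXON", "INTRON",
]

# Leading part of the CSQ entry, copied from the record above.
csq_entry = (
    "deletion|splice_polypyrimidine_tract_variant&coding_sequence_variant"
    "&intron_variant&feature_truncation|LOW|USH2A|7399|Transcript"
    "|NM_206933.4|protein_coding|22-32/72|21-32/71"
)


def parse_csq_prefix(entry, fields=CSQ_FIELDS):
    """Map the first sub-fields of one CSQ entry to names."""
    record = dict(zip(fields, entry.split("|")))
    # Multiple consequences for one transcript are joined with '&'.
    record["Consequence"] = record["Consequence"].split("&")
    return record


def exon_span(exon_field):
    """Turn 'first-last/total' (or 'n/total') into (first, last, total)."""
    span, total = exon_field.split("/")
    first, _, last = span.partition("-")
    last = last or first
    return int(first), int(last), int(total)


record = parse_csq_prefix(csq_entry)
first, last, total = exon_span(record["EXON"])
print(record["SYMBOL"], record["IMPACT"], record["Consequence"])
print(f"exons {first}-{last} of {total} overlap the deletion")
```

Under that field-order assumption, the entry reports exons 22-32 of 72 in USH2A as overlapping the deletion, while the IMPACT column still reads LOW.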

We can update the SV rank score model to also include this as a (potentially) severe consequence for now perhaps?

Any chances of getting VEP adapt, eg the SO scale? I still feel from the textual definition, these warrant a "transcript_ablation" though the entire transcript is not gone.

Another option would be to revert to having coding_sequence_variant higher in the severity tree. It seems to be a fallback that is usually present on this kind of variant.

Software version (please complete the following information):

  • MIP: 11.1.3


@jemten
Collaborator

jemten commented Apr 24, 2023

Thanks for reporting @dnil,
I believe what is mainly driving this variant down in the ranking is that it is annotated as benign according to clingen_cgh, so it gets a penalty of 12 points. I need to investigate further how that annotation ended up there. From a consequence perspective, it gets 5 points for being a splice_polypyrimidine_tract_variant.

Can you clarify a little what you mean by: Any chances of getting VEP adapt, eg the SO scale?
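As a quick check of the numbers, the sketch below sums the `RankResult` components from the record in the report and compares them with `RankScore`. Only the two INFO keys copied from the record are used; the thread does not list the names of the individual rank categories, so they are left unnamed here. The helper `parse_info` is hypothetical.

```python
# Fragment of the INFO column from the reported record.
info_fragment = "RankResult=4|-12|5|0|3|0|3|0;RankScore=pureamoeba:3"


def parse_info(fragment):
    """Split 'key=value;key=value' into a dict (flags get an empty value)."""
    parsed = {}
    for item in fragment.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


info = parse_info(info_fragment)
components = [int(part) for part in info["RankResult"].split("|")]
total = sum(components)

# RankScore is stored as '<family id>:<score>'.
reported = int(info["RankScore"].split(":")[1])

# The -12 component is the benign clingen_cgh penalty discussed above.
BENIGN_PENALTY = -12
without_penalty = total - BENIGN_PENALTY

print(f"sum of components: {total}, reported RankScore: {reported}")
print(f"score without the benign penalty: {without_penalty}")
```

The components add up to the reported score of 3, and dropping the single -12 entry would leave 15.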

@dnil
Author

dnil commented Apr 24, 2023

Ooh, nice catch - I didn't check what field had the -12! It is a recessive condition with some carriers, so not completely strange to find it among variants found benign on array at some point. I can talk to the array ppl about that part!

As for the VEP part, I mean even if this would not have got the 10 I feel it should have for transcript ablation, it would at the very least get its 7 for coding_sequence_variant if that was higher prio for most_severe_consequence than the lower scoring (5) splice_polypyrimidine_tract_variant.

We could patch that on our side by raising the latter to score 7, which would arguably be consistent.

Or by swapping the order for most severe consequence prio, but then it would feel better to also discuss a bit with the VEP/SO folks about it.
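The two patch options can be compared with a small sketch. It uses only the three scores quoted in this thread (10, 7 and 5). The priority list is an assumption that reproduces the behaviour seen in the report, not the actual list used by MIP or VEP, and `consequence_score` is a hypothetical helper.

```python
# Scores quoted in this thread; other SO terms are left out on purpose.
SCORES = {
    "transcript_ablation": 10,
    "coding_sequence_variant": 7,
    "splice_polypyrimidine_tract_variant": 5,
}

# Assumed priority (most severe first) matching what the report shows.
PRIORITY = [
    "transcript_ablation",
    "splice_polypyrimidine_tract_variant",
    "coding_sequence_variant",
]

# Consequence terms from the CSQ entry in the report.
terms = (
    "splice_polypyrimidine_tract_variant&coding_sequence_variant"
    "&intron_variant&feature_truncation"
).split("&")


def consequence_score(terms, priority, scores):
    """Return the first term in priority order that is present, with its score."""
    for term in priority:
        if term in terms:
            return term, scores[term]
    # Assumption for this sketch: unlisted terms contribute nothing.
    return None, 0


current = consequence_score(terms, PRIORITY, SCORES)

# Option 1: keep the order, raise the polypyrimidine score to 7.
raised_scores = {**SCORES, "splice_polypyrimidine_tract_variant": 7}
option_raise = consequence_score(terms, PRIORITY, raised_scores)

# Option 2: keep the scores, put coding_sequence_variant first.
swapped_priority = [
    "transcript_ablation",
    "coding_sequence_variant",
    "splice_polypyrimidine_tract_variant",
]
option_swap = consequence_score(terms, swapped_priority, SCORES)

print(current, option_raise, option_swap)
```

Both options give this variant 7 points; they differ only in which term ends up reported as the most severe consequence.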

@dnil
Author

dnil commented Apr 25, 2023

I've contacted the aCGH folks, but if we can't find a simple solution for the db-export to properly cover the zygosity of calls, just ditching the benign aCGH track seems like a good option. We were about to do that for the hg38 migration anyway.

It would have helped, but fiddling with a two point difference in score for functional annotation seems moot when we have that huge malus to axe.

Since we have a fairly big normal db anyway, we should make do just fine without the aCGH-benign track. If a verification is called for anyway, we could possibly increase the penalty for being common locally a bit, to compensate for the missing benigns.

@jemten
Collaborator

jemten commented Apr 25, 2023

As for the VEP part, I mean even if this would not have got the 10 I feel it should have for transcript ablation, it would at the very least get its 7 for coding_sequence_variant if that was higher prio for most_severe_consequence than the lower scoring (5) splice_polypyrimidine_tract_variant.

Hmm, for the structural variants we do rank coding_sequence_variant as a more severe consequence than the splice_polypyrimidine_tract one, so it should have gotten a 7. I need to look into what's going on here.

Let me know if you think we should remove the benign cgh database. But yes this would mean a new validation and thus we would probably do it when we update tiddit and update our loqusDB dump.

@dnil
Author

dnil commented Apr 25, 2023

I am leaning that way: this would still be an issue with several recessive disorders with somewhat recurrent het carriers. Agne had some idea about another recurrent CNV file that he and Jesper made - waiting for a confirmation on whether these are still in there.

@dnil
Author

dnil commented Apr 26, 2023

Had another discussion with Agne: it was the CNV cluster track they used for collapsing the local aCGH db a bit, and weeded to not include pathogenic calls from said db. The same issue kind of remains, and this is a much smaller set than say loqus or svdb. I think we are still better off with the local data for count/frequency info, and then the pathogenic calls are great to complement the lack of SVs matching from ClinVar.

@jemten
Collaborator

jemten commented Apr 28, 2023

Right, so to summarise: you would like to not include the benign variants from the cgh array in the rank score. However, we will still annotate called SVs using that file. Did I get this right?

@dnil
Author

dnil commented Apr 28, 2023

Yes!

On a lower prio, there is still something a little funny about the sort order for SO term severity and the rank model scores not lining up. I can't help but think that can come back and bite, but it would feel best to correct it at the source.
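One way to catch this kind of mismatch early would be a consistency check that walks the severity order and flags any term scoring lower than the term ranked below it. This is a sketch only: the order is an assumed one that matches the behaviour in the report, the scores are the three quoted in this thread, and `find_inversions` is a hypothetical helper.

```python
# Assumed severity order (most severe first) and the scores from this thread.
severity_order = [
    "transcript_ablation",
    "splice_polypyrimidine_tract_variant",
    "coding_sequence_variant",
]
rank_scores = {
    "transcript_ablation": 10,
    "coding_sequence_variant": 7,
    "splice_polypyrimidine_tract_variant": 5,
}


def find_inversions(order, scores):
    """List adjacent pairs where the more severe term has the lower score."""
    inversions = []
    for more_severe, less_severe in zip(order, order[1:]):
        if scores[more_severe] < scores[less_severe]:
            inversions.append((more_severe, less_severe))
    return inversions


for more_severe, less_severe in find_inversions(severity_order, rank_scores):
    print(f"{more_severe} outranks {less_severe} but scores lower")
```

With these inputs the check reports a single inversion, the polypyrimidine tract term sitting above coding_sequence_variant while scoring two points less.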
