Variant disappears with clinical filter - why? #3858

Closed
KickiLagerstedt opened this issue Mar 17, 2023 · 11 comments

Comments

@KickiLagerstedt

https://scout.scilifelab.se/cust002/F0051063/e7b97e909eedb79078bdf38fb7b18903

Gene annotation: splicing
Function annotation: splice_polypyrimidine_tract_variant

@KickiLagerstedt
Author

https://scout.scilifelab.se/cust002/F0051801/sv/variants/29235c154511427ab50bcf6eca43cf94

Gene annotation: splicing
Function annotation: splice_polypyrimidine_tract_variant

@KickiLagerstedt KickiLagerstedt changed the title Variant dissappers with clinical filter - why? Variant disappers with clinical filter - why? Mar 17, 2023
@KickiLagerstedt KickiLagerstedt changed the title Variant disappers with clinical filter - why? Variant disappears with clinical filter - why? Mar 17, 2023
@dnil
Collaborator

dnil commented Mar 17, 2023

The most direct answer is that splice_polypyrimidine_tract_variant is not included in the clinical filter on its own. It was added to the SO_TERMS fairly recently (we added it on our side this autumn, VEP a bit earlier, and MIP a little later), and we do not have it as a SEVERE_SO_TERM. Arguably, it should not be: a polypyrimidine tract variant needs more careful splice model inspection, or other molecular experimental data, to determine whether it is really relevant on its own. But we are always happy to discuss.
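
A minimal sketch of that behaviour, assuming the clinical filter simply keeps a variant when its annotated consequence is in a severe-term set. The names `SEVERE_SO_TERMS_EXAMPLE` and `passes_clinical_filter`, and all listed terms except coding_sequence_variant, are illustrative assumptions and not Scout's actual constants or code.

```python
# Hypothetical, abbreviated severe-term set; NOT Scout's actual SEVERE_SO_TERMS.
SEVERE_SO_TERMS_EXAMPLE = frozenset(
    {
        "transcript_ablation",
        "splice_acceptor_variant",
        "splice_donor_variant",
        "missense_variant",
        "coding_sequence_variant",
    }
)


def passes_clinical_filter(functional_annotation: str) -> bool:
    """Return True if the annotated SO term is in the (example) severe set."""
    return functional_annotation in SEVERE_SO_TERMS_EXAMPLE


if __name__ == "__main__":
    for term in ("splice_polypyrimidine_tract_variant", "coding_sequence_variant"):
        print(term, passes_clinical_filter(term))
```

With a set like this, a variant whose only reported consequence is splice_polypyrimidine_tract_variant is dropped regardless of its splice predictions.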

Then there are some complications for both these variants.
1, the deletion is fairly large. While it encompasses only UTR exons on the transcript VEP selects, it does touch coding exons in alternative transcripts, and I would need to spend much longer to figure out whether it is potentially disease relevant. Unless you have a strong opinion, I would suggest we don't do anything about this one right now, but we are aware that consequence annotation of SVs can be tricky. Keeping the set of annotation gene models up to date is important, and some progress on that is being made currently (see Schug, https://github.com/Clinical-Genomics/schug).

2, the SNV is a -13 splice change with a very high SpliceAI Acceptor Gain delta.
It is specifically for this kind of intronic variant that we really depend on ClinVar to lift variants on the lists.
It has a Conflicting ClinVar annotation, but its revision status is not at the "TRUSTED" level. I think this is unfortunate as it really only has one old VUS annotation, and then several LP and P annotations later. A community driven (re)evaluation seems due. Seeing this, we could also consider actually just letting all the ClinVar annotations through the clinical filter, but we already get many silly Conflicting LB/B showing and would get even more. A better solution would be to dig deeper into ClinVar at annotation time. We have discussed this a few times, but not really started a project on it. It would not necessarily be a quick one, as we don't think this level of info is easily available from secondary sources, but would have to be parsed out from the ClinVar db data dumps first.

I would lean towards the latter solution here. But, even in the meantime, we could also try skipping the TRUSTED level filter for a while and see how it feels?
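
To make the trade-off concrete, here is a rough sketch of a revision-status gate and of what skipping it would mean. The label sets, the `ClinVarAnnotation` class and `clinvar_lifts_variant` are hypothetical stand-ins for illustration; the actual TRUSTED levels and significance handling are not spelled out in this thread.

```python
from dataclasses import dataclass

# Hypothetical label sets, for illustration only; the values the clinical
# filter actually uses are not given in this thread.
TRUSTED_REVSTATS_EXAMPLE = frozenset({"reviewed_by_expert_panel", "practice_guideline"})
CLINSIG_OF_INTEREST_EXAMPLE = frozenset({"pathogenic", "likely_pathogenic", "conflicting"})


@dataclass(frozen=True)
class ClinVarAnnotation:
    significance: str  # aggregate clinical significance label
    revstat: str  # review (revision) status label


def clinvar_lifts_variant(annotation: ClinVarAnnotation, require_trusted: bool = True) -> bool:
    """Return True if the ClinVar annotation alone should keep the variant."""
    if annotation.significance not in CLINSIG_OF_INTEREST_EXAMPLE:
        return False
    if require_trusted:
        return annotation.revstat in TRUSTED_REVSTATS_EXAMPLE
    return True


if __name__ == "__main__":
    # Shaped like the SNV above: Conflicting, with a non-trusted review status.
    snv = ClinVarAnnotation("conflicting", "criteria_provided_conflicting_interpretations")
    print(clinvar_lifts_variant(snv))
    print(clinvar_lifts_variant(snv, require_trusted=False))
```

Turning off `require_trusted` rescues this SNV, but it equally lets through every other Conflicting entry, which is the extra noise mentioned above.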

@dnil
Collaborator

dnil commented Mar 20, 2023

Ok, we had a local discussion, and think that
1, this is largely on VEP/ENSEMBL and the hg19 transcript set. We do know this is not really maintained, and they wisely spend their effort on later genome builds. This adds one more example where it would have been great to have hg38: probably would just have worked, and if not, we would have a good case to update ENSEMBL.

2, deep diving into ClinVar seems like a good future solution, but presumably happens with nf-core/rd (ping @jemten?).
For now we agree that removing the TRUSTED requirement may bring more noise than we are ready for at the moment.

@jemten

jemten commented Mar 20, 2023

Currently we are trying to limit the development of MIP in favour of the new pipeline. Happy to discuss projects to refine the rankmodel/annotation for the new pipeline though.

@KickiLagerstedt
Author

One more: founder mutation - duplication of exon 13 in BRCA1

https://scout.scilifelab.se/cust002/F0052362/sv/variants/3b4ecd35753a948e7e197a17abaf784c

@dnil
Collaborator

dnil commented Mar 22, 2023

Thank you for the persistence! This one I cannot explain at a glance. I suspect #3534, which gives splice_polypyrimidine_tract_variant a slightly higher score than coding_sequence_variant (which is in the clinical filter), in combination with filtering on most_severe_consequence rather than on all transcript consequences. We'll check tomorrow!
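
A small sketch of the suspected interaction. The ordering and filter set below are abbreviated placeholders: only the placement of splice_polypyrimidine_tract_variant above coding_sequence_variant, and the latter being in the clinical filter, come from this discussion. The function names are made up for the example.

```python
# Abbreviated, illustrative severity order (most severe first). Only the
# relative placement of splice_polypyrimidine_tract_variant above
# coding_sequence_variant is taken from the discussion.
SO_ORDER_EXAMPLE = [
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_polypyrimidine_tract_variant",
    "coding_sequence_variant",
    "intron_variant",
]
RANK = {term: index for index, term in enumerate(SO_ORDER_EXAMPLE)}

# Hypothetical filter set: has coding_sequence_variant, lacks the polypyrimidine term.
FILTER_TERMS_EXAMPLE = frozenset(
    {"transcript_ablation", "splice_acceptor_variant", "coding_sequence_variant"}
)


def most_severe(consequences):
    """Pick the consequence with the lowest rank index (most severe)."""
    return min(consequences, key=RANK.__getitem__)


def passes_on_most_severe(consequences):
    """Filter using only the single most severe consequence."""
    return most_severe(consequences) in FILTER_TERMS_EXAMPLE


def passes_on_any(consequences):
    """Filter using every transcript consequence."""
    return any(term in FILTER_TERMS_EXAMPLE for term in consequences)


if __name__ == "__main__":
    sv = ["coding_sequence_variant", "splice_polypyrimidine_tract_variant", "intron_variant"]
    print(most_severe(sv), passes_on_most_severe(sv), passes_on_any(sv))
```

Under these assumptions the variant is hidden when only the most severe term is tested, even though one of its consequences is in the filter.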

@dnil
Collaborator

dnil commented Mar 30, 2023

Here is another one from LK:
https://scout.scilifelab.se/cust003/23112/sv/variants/9578e6c3eb244c4a28cf1e3614800571

Again, the reason seems to be that the variant does not get terms as severe as it ought to have, presumably because VEP cannot easily tell how much is lost. In practice it is at least on par with an actual splice site loss.

Let's go through the SO-term order again. I think we follow the VEP order (https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html) fairly well, but there is also a step in MIP where the most_severe_consequence is selected, and that can be checked. It is annoying that polypyrimidine tract region modifiers get a higher impact than 'coding_sequence_variant'. I can see how they occasionally do, but the bulk of them don't.

If we still can't find any errors, one option is to evaluate including splice_polypyrimidine_tract_variant in the clinical filter, while making sure we don't end up with too many hits.

@jemten

jemten commented Mar 31, 2023

I believe there is a need to have the "most severe consequence" synchronised across the board. Right now MIP uses the list from VEP to decide which features are the most severe ones, and I think Scout is using the same order. For MIP this means that we will rank and annotate the variant using splice_polypyrimidine_tract_variant rather than coding_sequence_variant. If we in the clinical filter don't believe this to be true, we should change it either in MIP/Scout or in the clinical filter.

@dnil
Collaborator

dnil commented Mar 31, 2023

A lazy and rather practical approach would be to simply add splice_polypyrimidine_tract_variant, and possibly a couple of the other consequences SO-ranked above coding_sequence_variant, to the clinical filter. It's hard to really fault VEP or SO for this; a polypyrimidine tract change can occasionally be very detrimental. Just not often. And a coding sequence change may not be so important - a lot of missense variation is not. What is missing here is a class in between coding_sequence_variant and transcript_ablation, such as a feature_truncation of one or more coding exons, say exon_ablation.
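
Roughly, that approach could look like the sketch below, which lists the terms ranked above coding_sequence_variant that an example filter is missing. The ordering and the current filter set are abbreviated placeholders, and `terms_ranked_above`, `extended_filter` and `newly_added` are hypothetical helper names, not existing Scout or MIP functions.

```python
# Abbreviated, illustrative severity order (most severe first); a placeholder,
# not the full VEP list.
SO_ORDER_EXAMPLE = [
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_region_variant",
    "splice_polypyrimidine_tract_variant",
    "coding_sequence_variant",
    "intron_variant",
]

# Hypothetical current content of the clinical filter.
CURRENT_FILTER_EXAMPLE = frozenset(
    {"transcript_ablation", "splice_acceptor_variant", "coding_sequence_variant"}
)


def terms_ranked_above(order, anchor):
    """Return the terms that come before `anchor` in the severity order."""
    return order[: order.index(anchor)]


def extended_filter(current, order, anchor):
    """Union of the current filter and every term ranked above `anchor`."""
    return frozenset(current) | frozenset(terms_ranked_above(order, anchor))


def newly_added(current, order, anchor):
    """Terms the extension would add, sorted for stable review."""
    return sorted(extended_filter(current, order, anchor) - frozenset(current))


if __name__ == "__main__":
    print(newly_added(CURRENT_FILTER_EXAMPLE, SO_ORDER_EXAMPLE, "coding_sequence_variant"))
```

Listing the additions first makes it easier to pick only a couple of them and to check how many extra hits each one brings before changing the filter.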

@dnil
Collaborator

dnil commented Mar 31, 2023

Or, actually we can fault VEP a little for not being precise. The docs description of transcript_ablation is "A feature ablation whereby the deleted region includes a transcript feature". The way I (and VEP I believe) use this is for a deletion of a whole transcript, but it could have been just part of a transcript feature instead, which would have solved the first and last of these four examples. And same with transcript_amplification for example 3.

@dnil
Collaborator

dnil commented Apr 6, 2023
