One single RefSeq and different outputs depending on selected Max Distance #8

Closed
amartins1 opened this issue Jun 2, 2021 · 9 comments


@amartins1

Hi,
I have another question, using NM_000249.4(MLH1):c.116G>A as an example
Why does one get:
(a) one single output/set of results when using D=50?
(b) several different outputs/sets of results when using D=500?
In the second case, which one should I choose? Which one corresponds to NM_000249.4?
ENST00000231790.8_4
ENST00000456676.6_4
ENST00000536378.5_1
ENST00000673673.1_3
ENST00000673715.1_1
ENST00000673899.1_1
ENST00000442249.6_2
ENST00000673713.1_1
ENST00000432299.6_2
ENST00000454028.5_1
ENST00000457004.5_1
ENST00000673897.1_1
ENST00000673947.1_1
ENST00000673972.1_1
ENST00000674111.1_1
I thank you in advance for your help.
Alexandra

@bw2
Collaborator

bw2 commented Jun 2, 2021

Hi Alexandra, when a user requests the default distance (D=50) for a SNP or small InDel, the server looks up the score in precomputed tables. These tables only contain the score for the canonical transcript. If you change D to anything other than 50 (or just change "Use Illumina's pre-computed scores:" to No), then the server will run the SpliceAI model, which takes longer to return a result but computes scores for all transcripts that overlap your variant.

(I'm open to suggestions for how to make this behavior less confusing/surprising).
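For readers following along, the behavior described above can be sketched roughly as below. This is only an illustration, not the server's actual code: `lookup_scores`, `precomputed`, and `run_model` are hypothetical names, and falling back to the model when a variant is missing from the table is an assumption.

```python
from typing import Callable, Dict, List

DEFAULT_DISTANCE = 50  # the default D mentioned in the comment above

# Hypothetical record type: one dict of scores per transcript.
Score = Dict[str, object]


def lookup_scores(
    variant: str,
    max_distance: int,
    use_precomputed: bool,
    precomputed: Dict[str, Score],
    run_model: Callable[[str, int], List[Score]],
) -> List[Score]:
    """Return scores from the precomputed table when possible, else from the model."""
    if (
        use_precomputed
        and max_distance == DEFAULT_DISTANCE
        and variant in precomputed  # assumption: fall back to the model if absent
    ):
        # Fast path: a single record, canonical transcript only.
        return [precomputed[variant]]
    # Slow path: one record per transcript that overlaps the variant.
    return run_model(variant, max_distance)
```

Under this sketch, the same variant yields one record at D=50 and several at D=500, which matches the difference reported in this issue.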

@amartins1
Author

Hi,
I do prefer to use D=500 (not D=50)

It would be great if the user could choose between retrieving data

  • for a specific RefSeq or
  • for all RefSeq transcripts of a given gene

In my example, I would choose the first option in order to get results for NM_000249.4(MLH1) only, the RefSeq I've specifically entered in the query window.

As it stands (after entering a specific NM_ ID and choosing D=500), I do not know which transcript to choose from the SpliceAI_Lookup output.
The interface does not display the NM_ ID entered in the query window; instead it shows multiple ENST0000... identifiers that are unfamiliar to me.

How can I find which ENST0000... ID corresponds to NM_000249.4?
It would be great if SpliceAI_Lookup results could indicate transcript IDs not only as ENST0000... but as NM_... as well.

Thanks,
Alexandra

@bw2
Collaborator

bw2 commented Jun 2, 2021

I see. That makes sense. I should be able to add NM transcript ids.

@bw2
Collaborator

bw2 commented Jun 2, 2021

Unfortunately, returning just the one transcript as you describe is relatively difficult to implement, except for D=50.

@amartins1
Author

OK, got it. Anyway, if you manage to add the NM_ transcript IDs in the output, it will solve the problem, thanks!
Will you be able to implement this change any time soon?
By the way, if you have a mailing list for sending "alerts" on SpliceAI_Lookup improvements or new features I would be happy to be on it!

@bw2
Collaborator

bw2 commented Jun 2, 2021

The RefSeq ids should be live later today.

As for alerts, the best way is to click "Watch" in the top right of the GitHub page, then select "Custom" and click "Releases". This way you'll get emails when new features go live. If you also want notifications about partial progress toward implementing a feature, you can select "All Activity" instead of "Custom".

@amartins1
Author

Great!
Thanks ++
Alexandra

@amartins1
Author

I just tried to analyse NM_000249.4(MLH1):c.116G>A again (hg19 and hg38, both with D=500).
Indeed, the output now shows NM_ transcripts but, unfortunately, I do not find NM_000249.4 (to my knowledge, the RefSeq most commonly used by molecular diagnostic labs offering genetic testing for Lynch syndrome). Is there a way to circumvent this limitation for MLH1 or any other gene of interest?

@bw2
Collaborator

bw2 commented Jun 3, 2021

Thanks for helping troubleshoot this. This example keeps bringing up unexpected details : )
It turns out many of the ENST ids map to more than one NM id. I just updated the interface to show all NM ids, so NM_000249.4 is now shown as well.
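As a rough illustration of why the mapping has to be one-to-many, here is a minimal sketch of finding every ENST id linked to a given NM id. The function name `enst_ids_for_refseq` and the placeholder ids are hypothetical; this is not how the interface stores or queries its mapping.

```python
from typing import Dict, List


def enst_ids_for_refseq(enst_to_nm: Dict[str, List[str]], nm_id: str) -> List[str]:
    """Return every ENST id whose list of NM ids contains nm_id."""
    return [enst for enst, nm_ids in enst_to_nm.items() if nm_id in nm_ids]


# Placeholder ids only; these are not real transcript mappings.
example_mapping = {
    "ENST_A": ["NM_X.1", "NM_Y.2"],  # one ENST id linked to two NM ids
    "ENST_B": ["NM_Z.1"],
    "ENST_C": [],                    # no RefSeq equivalent
}

matches = enst_ids_for_refseq(example_mapping, "NM_Y.2")
```

If only the first NM id per ENST id were kept, a query for `NM_Y.2` in this toy mapping would find nothing, which is the kind of gap reported above for NM_000249.4.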

@bw2 bw2 closed this as completed Jun 3, 2021