Transcript not seen in results #6

Closed
JohmaChroc opened this issue Apr 9, 2021 · 3 comments

@JohmaChroc
Hi,
Results for my transcript of interest (ENST00000450313) are not displayed, whereas results for other transcripts of the same gene are. Is there a limit on the number of transcripts displayed in the results output? If so, is there a way to select a specific transcript for analysis? The impacted exon is present in the majority of transcripts, so I would expect to see broadly similar results (the exon in my transcript is concordant with the gnomAD canonical transcript, ENST00000372098). Thank you in advance.

[Attached image: MUTYH_SpliceAI]

@bw2
Collaborator

bw2 commented Apr 9, 2021

Hi @JohmaChroc , thanks for reporting this issue.
To reduce the number of transcripts in the results, I excluded most non-protein-coding transcripts that overlapped protein-coding ones. ENST00000450313 falls in this category as its transcript type is "nonsense_mediated_decay" rather than "protein_coding" in gencode 36.

My current set of GRCh38 annotations includes 176,560 transcripts and excludes the following transcripts, counted by type:

     37121: lncRNA
      4085: processed_pseudogene
      3469: processed_transcript
      2077: retained_intron
      1415: miRNA
      1340: nonsense_mediated_decay
      1332: misc_RNA
      1068: snRNA
       827: transcribed_unprocessed_pseudogene
       658: snoRNA
       611: TEC
       573: unprocessed_pseudogene
       418: transcribed_processed_pseudogene
       283: rRNA_pseudogene
       128: transcribed_unitary_pseudogene
        34: scaRNA
        28: pseudogene
        26: unitary_pseudogene
        13: IG_V_pseudogene
        10: rRNA
         9: Mt_tRNA
         6: polymorphic_pseudogene
         6: IG_V_gene
         5: non_stop_decay
         3: ribozyme
         3: IG_C_pseudogene
         2: sRNA
         2: Mt_rRNA
         1: translated_processed_pseudogene
         1: scRNA
         1: IG_C_gene
         1: TR_V_gene
         1: TR_V_pseudogene

Since you (and probably others) want to see some of these, I'll try adding all transcripts.
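
For anyone who wants to reproduce a tally like the one above from a GENCODE GTF, here is a minimal sketch. It is not the SpliceAI-lookup code: the `KEEP_TYPES` set and the function names are assumptions made for illustration, and the two example records are abbreviated versions of the ENST00000450313 records from the v37lift37 and v19 GTFs.

```python
import re
from collections import Counter

# Hypothetical keep-list: the thread only says that most
# non-protein-coding transcript types were excluded.
KEEP_TYPES = {"protein_coding"}

_TYPE_RE = re.compile(r'transcript_type "([^"]+)"')


def transcript_type(gtf_line):
    """Return the transcript_type of a GTF 'transcript' record, else None."""
    if gtf_line.startswith("#"):
        return None
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != "transcript":
        return None
    match = _TYPE_RE.search(fields[8])
    return match.group(1) if match else None


def tally_excluded(gtf_lines, keep_types=KEEP_TYPES):
    """Count transcript records whose type is not in keep_types."""
    counts = Counter()
    for line in gtf_lines:
        ttype = transcript_type(line)
        if ttype is not None and ttype not in keep_types:
            counts[ttype] += 1
    return counts


# Abbreviated records (most attributes dropped for brevity).
example = [
    'chr1\tHAVANA\ttranscript\t45797868\t45806071\t.\t-\t.\t'
    'gene_id "ENSG00000132781.19_12"; transcript_id "ENST00000450313.6_6"; '
    'gene_type "protein_coding"; transcript_type "nonsense_mediated_decay";',
    'chr1\tENSEMBL\ttranscript\t45794915\t45806142\t.\t-\t.\t'
    'gene_id "ENSG00000132781.13"; transcript_id "ENST00000450313.1"; '
    'gene_type "protein_coding"; transcript_type "protein_coding";',
]

# Print in the same "count: type" layout as the listing above.
for ttype, n in tally_excluded(example).most_common():
    print(f"{n:>10}: {ttype}")
```

To run it on a real annotation file, pass an open file handle (for example from `gzip.open(path, "rt")`) instead of `example`.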

@JohmaChroc
Author

A follow-up question: is there any difference in which transcripts are included for GRCh37/hg19?

@bw2
Collaborator

bw2 commented Apr 11, 2021

There's no difference in how SpliceAI-lookup treats GRCh37 and GRCh38 transcripts.
In Gencode though, there are currently 236230 GRCh37 transcripts vs. 234486 GRCh38.
(GRCh37 transcripts are downloaded from https://www.gencodegenes.org/human/release_37lift37.html
and GRCh38 transcripts from https://www.gencodegenes.org/human/)

Also, even after showing all non-coding transcripts, ENST00000450313 still doesn't appear because the variant 1-45797036-GCCTGTGGATATAGCCTCAAAAGCCAACATC-G lies ~800 bp to the left of the transcript's annotated start coordinate (45797868). Here's the relevant line from gencode.v37lift37.annotation.gtf:

chr1	HAVANA	transcript	45797868	45806071	.	-	.	gene_id "ENSG00000132781.19_12"; transcript_id "ENST00000450313.6_6"; gene_type "protein_coding"; gene_name "MUTYH"; transcript_type "nonsense_mediated_decay"; transcript_name "MUTYH-210"; level 2; protein_id "ENSP00000408176.2"; transcript_support_level 5; hgnc_id "HGNC:7527"; tag "mRNA_end_NF"; tag "RNA_Seq_supported_only"; havana_gene "OTTHUMG00000007682.11_12"; remap_num_mappings 1; remap_status "full_contig"; remap_target_status "overlap";

This transcript was quite different in the earlier gencode.v19.annotation.gtf:

chr1	ENSEMBL	transcript	45794915	45806142	.	-	.	gene_id "ENSG00000132781.13"; transcript_id "ENST00000450313.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "MUTYH"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "MUTYH-202"; level 3; protein_id "ENSP00000408176.1"; tag "basic"; havana_gene "OTTHUMG00000007682.5";
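
The comparison between the two annotations can be checked with a few lines of Python. This is only a sketch under stated assumptions: `gap_to_transcript` is a hypothetical helper, not part of SpliceAI-lookup, and it treats the variant as covering the reference allele from `pos` to `pos + len(ref) - 1`. The coordinates are copied from the two GTF lines quoted above.

```python
def gap_to_transcript(pos, ref, tx_start, tx_end):
    """Distance in bp between a variant and a transcript span.

    Coordinates are 1-based and inclusive, as in VCF and GTF.
    Returns 0 when the reference allele overlaps the transcript.
    """
    var_start = pos
    var_end = pos + len(ref) - 1
    if var_end < tx_start:      # variant lies entirely to the left
        return tx_start - var_end
    if var_start > tx_end:      # variant lies entirely to the right
        return var_start - tx_end
    return 0


# Variant from this issue: 1-45797036-GCCTGTGGATATAGCCTCAAAAGCCAACATC-G
POS = 45797036
REF = "GCCTGTGGATATAGCCTCAAAAGCCAACATC"

# ENST00000450313 start/end columns from the two GTF lines above.
SPANS = {
    "gencode.v37lift37": (45797868, 45806071),
    "gencode.v19": (45794915, 45806142),
}

for release, (tx_start, tx_end) in SPANS.items():
    gap = gap_to_transcript(POS, REF, tx_start, tx_end)
    status = "overlaps" if gap == 0 else f"{gap} bp away"
    print(f"{release}: {status}")
```

With these coordinates the variant falls outside the v37lift37 transcript model by roughly 800 bp but inside the longer v19 model, which is consistent with the transcript having changed between releases.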

@bw2 bw2 closed this as completed Apr 12, 2021