variant not scored #137

Open
carolinehey opened this issue Aug 30, 2023 · 5 comments

Comments

@carolinehey

Hello,
I understand your explanation of why some variants are not scored, but none of the listed cases seems to explain why my variant is not scored. Do you have any suggestions?
NM_000455.5:c.597+14delA

@kishorejaganathan
Contributor

Could you give me the variant in VCF format?

@rodrigodealexandre

Dear SpliceAI Staff,

I am having trouble with one variant in my database. I ran SpliceAI locally on it with several values of -D, but no score was produced. Strangely, the same variant does get a result on the SpliceAI website.

I am working on a script that builds a database annotating only new variants against HG38. To do this, I created a VCF file with an HG38 header, using UCSC's chromosome size information for GRCh38/HG38 from https://genome.ucsc.edu/cgi-bin/hgTracks?chromInfoPage=.

My VCF file looks like this:

##fileformat=VCFv4.2
##fileDate=20231010
##reference=GRCh38/hg38
##contig=<ID=chr1,length=248956422>
##contig=<ID=chr2,length=242193529>
##contig=<ID=chr3,length=198295559>
##contig=<ID=chr4,length=190214555>
##contig=<ID=chr5,length=181538259>
##contig=<ID=chr6,length=170805979>
##contig=<ID=chr7,length=159345973>
##contig=<ID=chr8,length=145138636>
##contig=<ID=chr9,length=138394717>
##contig=<ID=chr10,length=133797422>
##contig=<ID=chr11,length=135086622>
##contig=<ID=chr12,length=133275309>
##contig=<ID=chr13,length=114364328>
##contig=<ID=chr14,length=107043718>
##contig=<ID=chr15,length=101991189>
##contig=<ID=chr16,length=90338345>
##contig=<ID=chr17,length=83257441>
##contig=<ID=chr18,length=80373285>
##contig=<ID=chr19,length=58617616>
##contig=<ID=chr20,length=64444167>
##contig=<ID=chr21,length=46709983>
##contig=<ID=chr22,length=50818468>
##contig=<ID=chrX,length=156040895>
##contig=<ID=chrY,length=57227415>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr17	43125251	.	C	A	.	.	.
chr17	43125257	.	C	A	.	.	.
chr17	43125749	.	C	A	.	.	.

The original file contains more entries; the variant that did not yield a result is the third record. I used the following command to run SpliceAI: spliceai -I new_calls.vcf -O teste.vcf -R /mnt/d/1-bioinfotools/HG38/hg38.fa -A grch38 -D 1000. In this command, new_calls.vcf is the VCF file shown above, and my HG38 fasta file was downloaded from UCSC. I tried different -D values and also ran it without the -D option.

The resulting output in teste.vcf was as follows:

##INFO=<ID=SpliceAI,Number=.,Type=String,Description="SpliceAIv1.3.1 variant annotation. These include delta scores (DS) and delta positions (DP) for acceptor gain (AG), acceptor loss (AL), donor gain (DG), and donor loss (DL). Format: ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr17	43125251	.	C	A	.	.	SpliceAI=A|BRCA1|0.00|0.00|0.11|0.10|-801|-343|-69|26
chr17	43125257	.	C	A	.	.	SpliceAI=A|BRCA1|0.00|0.00|0.04|0.08|-813|-349|-75|20
chr17	43125749	.	C	A	.	.	.
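To spot records like the third one in a larger output file, a minimal Python sketch could list every record whose INFO column has no SpliceAI entry. This is not part of SpliceAI; the function name `unscored_variants` is hypothetical, and the sketch assumes the plain tab-separated layout shown above.

```python
def unscored_variants(vcf_lines):
    """Return (chrom, pos, ref, alt) for records that carry no SpliceAI annotation."""
    missing = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        info = fields[7] if len(fields) > 7 else "."
        # INFO entries are separated by ';' and a missing INFO is written as '.'
        has_score = any(entry.startswith("SpliceAI=") for entry in info.split(";"))
        if not has_score:
            missing.append((chrom, int(pos), ref, alt))
    return missing


# The three records from the output above.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr17\t43125251\t.\tC\tA\t.\t.\tSpliceAI=A|BRCA1|0.00|0.00|0.11|0.10|-801|-343|-69|26",
    "chr17\t43125257\t.\tC\tA\t.\t.\tSpliceAI=A|BRCA1|0.00|0.00|0.04|0.08|-813|-349|-75|20",
    "chr17\t43125749\t.\tC\tA\t.\t.\t.",
]
print(unscored_variants(example))
```

For a real file, the lines could be read with `open("teste.vcf")` and passed to the same function.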

I also tried analyzing this variant in isolation and changing the region, but I consistently got the same result. Could this issue be related to a specific transcript? I noticed that the gene NBR2 is used for annotation on the SpliceAI website, but I couldn't find it in spliceai/annotations/grch38.txt. However, I think it should use the BRCA1 gene, since the variant is less than 400 bp from the first non-coding exon. Any insights or suggestions on resolving this issue would be greatly appreciated.

Thank you for your assistance.

@kishorejaganathan
Contributor

The issue is due to the transcript annotations. SpliceAI uses RNA context and not DNA context, and it uses the annotation file to determine which parts of the DNA are transcribed (it does not assign scores for variants outside this region). You just need to add the transcript to annotations/grch38.txt or provide a custom annotation file via the -A parameter (in the same format as the existing annotation files).
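As a rough illustration of the custom-annotation route, the Python sketch below builds one row in the tab-separated layout used by the bundled annotation files (NAME, CHROM, STRAND, TX_START, TX_END, then comma-terminated exon start and end lists). The helper name `format_annotation_row`, the transcript name `EXAMPLE_TX`, and its coordinates are all hypothetical placeholders, not real gene data; real exon coordinates would have to come from a gene annotation source.

```python
HEADER = "#NAME\tCHROM\tSTRAND\tTX_START\tTX_END\tEXON_START\tEXON_END"


def format_annotation_row(name, chrom, strand, exons):
    """Build one annotation row from a list of (exon_start, exon_end) tuples.

    Assumption: TX_START is the first exon start and TX_END is the last exon
    end, as in the bundled rows.
    """
    exons = sorted(exons)
    tx_start = exons[0][0]
    tx_end = max(end for _, end in exons)
    # Exon lists are comma-separated with a trailing comma.
    exon_starts = "".join(f"{start}," for start, _ in exons)
    exon_ends = "".join(f"{end}," for _, end in exons)
    return "\t".join([name, chrom, strand, str(tx_start), str(tx_end),
                      exon_starts, exon_ends])


# Hypothetical transcript with made-up coordinates, for illustration only.
row = format_annotation_row("EXAMPLE_TX", "17", "+", [(1000, 1200), (1500, 1700)])
print(HEADER)
print(row)
```

Writing the header and rows like this to a file (alongside any rows copied from grch38.txt) and passing that file's path to -A in place of grch38 would be one way to try it.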

@rodrigodealexandre

rodrigodealexandre commented Oct 13, 2023

> The issue is due to the transcript annotations. SpliceAI uses RNA context and not DNA context, and it uses the annotation file to determine which parts of the DNA are transcribed (it does not assign scores for variants outside this region). You just need to add the transcript to annotations/grch38.txt or provide a custom annotation file via the -A parameter (in the same format as the existing annotation files).

Hi there @kishorejaganathan, yes, I am aware of that. The transcript for the BRCA1 gene is present in the file SpliceAI/spliceai/annotations/grch38.txt:

#NAME	CHROM	STRAND	TX_START	TX_END	EXON_START	EXON_END
BRCA1	17	-	43045628	43125483	43045628,43047642,43049120,43051062,43057051,43063332,43063873,43067607,43070927,43074330,43076487,43079333,43082403,43090943,43091434,43095845,43097243,43099774,43104121,43104867,43106455,43115725,43124016,43125270,	43045802,43047703,43049194,43051117,43057135,43063373,43063951,43067695,43071238,43074521,43076611,43079399,43082575,43091032,43094860,43095922,43097289,43099880,43104261,43104956,43106533,43115779,43124115,43125483,

The first exon ends at position chr17:43125483. My variant is located at chr17:43125749, which is 266 bp away from that exon boundary, within the 'promoter' region. Shouldn't the argument -D 1000 have picked up the nearest gene within the -D range, and therefore the BRCA1 gene?

@kishorejaganathan
Contributor

The annotation file acts as a filter first, so any variant outside TX_START-TX_END will not get annotated regardless of the choice of -D (which comes into play much later).
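The filtering step described here can be sketched in a few lines of Python. This only mirrors the behaviour as described in this thread and is not SpliceAI's actual code; the names `Transcript` and `overlapping_genes` are hypothetical, and the exact boundary convention (0-based versus 1-based, inclusive versus exclusive) is an assumption that does not matter for a variant 266 bp outside the range.

```python
from typing import List, NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    tx_start: int
    tx_end: int


def overlapping_genes(transcripts: List[Transcript], chrom: str, pos: int) -> List[str]:
    """Return names of transcripts whose TX_START-TX_END range contains pos."""
    # The annotation file uses '17' while the VCF uses 'chr17'.
    bare = chrom[3:] if chrom.startswith("chr") else chrom
    return [t.name for t in transcripts
            if t.chrom == bare and t.tx_start <= pos <= t.tx_end]


# BRCA1 range copied from the grch38.txt row quoted earlier in this thread.
brca1 = Transcript("BRCA1", "17", 43045628, 43125483)

for pos in (43125251, 43125257, 43125749):
    genes = overlapping_genes([brca1], "chr17", pos)
    print(pos, genes if genes else "outside every transcript: not scored")

# Distance of the unscored variant past TX_END.
print(43125749 - brca1.tx_end)
```

Because the third position yields an empty list, nothing is left for -D to act on, which matches the "." in the output.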
