Introme

Introme is an in silico splice predictor that evaluates a variant's likelihood of altering splicing by combining predictions from multiple splice-scoring tools with additional splicing rules and gene architecture features. Introme can accurately predict the impact of coding and noncoding variants on splicing by assessing the potential damage, creation or strengthening of splice elements, and it outperforms all leading tools that we tested.

Licensing

Introme source code is provided under the GPLv3 license. Introme combines splicing scores from several tools and third-party packages provided under open source licenses; please see NOTICE for additional details. Introme is free for academic and non-commercial use. All other use requires a commercial license from Children's Cancer Institute, and potentially a commercial SpliceAI license obtained from Illumina, Inc.

Requirements

Software requirements

Docker
vcfanno
spliceai
bedtools
bcftools
samtools
htslib
R
R packages: ROCR, caret
python3
python packages: pysam, Biopython (for Bio.Seq); csv and argparse are part of the python3 standard library
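
Before running Introme, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch, not part of Introme: the function name missing_tools is hypothetical, and the executable names in the usage line (for example tabix and bgzip for htslib, Rscript for R) are assumptions that may differ on your system.

```shell
# Print each named tool that cannot be found on PATH, one per line.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    mt_status=0
    for mt_tool in "$@"; do
        if ! command -v "$mt_tool" >/dev/null 2>&1; then
            echo "$mt_tool"
            mt_status=1
        fi
    done
    return "$mt_status"
}

# Example call (executable names are assumptions; adjust to your installation):
# missing_tools docker vcfanno spliceai bedtools bcftools samtools tabix bgzip Rscript python3
```

An empty result with a zero exit status means every named tool was found. The R and python packages are not covered by this check.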

Variant Annotation file requirements

Introme requires the following files to be downloaded and placed in the annotations folder in addition to the files present in this repository.

CADD v1.3 VCF created using the following instructions
SPIDEX v1.0
dbscSNV v1.1

Additional file requirements

A vcf file of variants to analyse
A gtf file, ideally containing only protein coding regions to speed up annotations (we recommend gencode)
A reference genome fasta file

Installation

WDL Install

The WDL script is named Introme.wdl and is located in the wdl_scripts folder. These scripts were set up for use on Terra. All of the annotation files must be in the same folder and specified as inputs to ensure proper annotation using vcfanno. These requirements will be further documented in the folder.

Local Install

Install the above software requirements and pull Introme.
Download the required annotation files and file requirements and place them in the annotations folder.
Update the .conf files with the correct paths (shouldn't be necessary if the same annotation files are used).
Pull MMSplice and Spliceogen into the Introme folder from the links provided (ensure the original docker files are not deleted).
Build the docker containers for MMSplice and Spliceogen using the code below. If you tag the containers differently, ensure you update the docker run section in the run_introme.sh script.

cd MMSplice
docker build -t mmsplice .
cd ..

cd Spliceogen
docker build -t spliceogen .
cd ..

Note: The MMSplice Docker container requires more memory than Docker's standard settings provide. Increase Docker's memory limit to 10GB to ensure it runs.

Run Introme using the command: ./run_introme.sh -r genome.fa -g annotation.gtf -v variants.vcf.gz -p prefix
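
The 10GB memory note for the MMSplice container can be checked before a run. This is a rough sketch under stated assumptions: has_enough_memory is a hypothetical helper, it compares against decimal gigabytes (10^9 bytes), and the total that Docker reports may not exactly match the value configured in Docker's settings.

```shell
# Succeed if a memory size in bytes is at least the required number of
# decimal gigabytes (default 10, matching the MMSplice note above).
has_enough_memory() {
    hm_bytes=$1
    hm_required_gb=${2:-10}
    awk -v b="$hm_bytes" -v g="$hm_required_gb" \
        'BEGIN { exit !(b >= g * 1000000000) }'
}

# Example call, reading the total memory visible to the Docker daemon:
# has_enough_memory "$(docker info --format '{{.MemTotal}}')" 10 \
#     || echo "Docker has less than 10GB of memory available" >&2
```

Treat a failure as a prompt to review Docker's memory setting rather than as proof that MMSplice will not run.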

Docker Local Install

A more streamlined install of Introme for running locally is being developed using Docker.

Running Introme

Introme can be run using either a local installation, or Docker.

Furthermore, we have wrapped Introme in Workflow Description Language (WDL) and implemented it on Terra. We are currently implementing Introme on CAVATICA, which uses the Seven Bridges Genomics platform.

Required parameters

-g Input GTF file (ideally gencode)
-p Output file prefix
-r Reference genome
-v Input VCF file

Optional parameters

-a Genome assembly (can be inferred from the genome build if it appears in the file name)
-b Input BED file (i.e. regions of interest)
-f Score all variants ≤ a specified variant allele frequency
-q Score all variants regardless of quality score
-s Turn off Introme single score check

Examples

Run Introme with base parameters: ./run_introme.sh -r genome.fa -g annotation.gtf -v variants.vcf.gz -p prefix

Run Introme on a specified gene list (BED format) for variants below 0.1% allele frequency: ./run_introme.sh -r genome.fa -g annotation.gtf -v variants.vcf.gz -p prefix -b genelist.bed -f 0.001
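
As the second example shows, the -f value is given as a fraction (0.001), not as a percentage (0.1%). The conversion can be sketched as below; percent_to_fraction is a hypothetical helper that is not part of Introme.

```shell
# Convert an allele frequency given as a percentage (0-100) into the
# fraction passed to -f. Fails on non-numeric or out-of-range input.
percent_to_fraction() {
    awk -v p="$1" 'BEGIN {
        if ((p !~ /^[0-9]+(\.[0-9]+)?$/) || (p > 100)) exit 1
        printf "%g\n", p / 100
    }'
}

# Example call: 0.1% becomes the value used with -f in the example above.
# ./run_introme.sh -r genome.fa -g annotation.gtf -v variants.vcf.gz -p prefix \
#     -b genelist.bed -f "$(percent_to_fraction 0.1)"
```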

Interpreting Introme Results

The variant-level scores and supporting information are fed into the Introme decision tree model to classify the likelihood of a variant altering splicing, which produces an Introme score from 0–1. We recommend the use of 0.61 as a threshold, producing a sensitivity of 0.91 and a specificity of 0.91, calculated on the validation dataset. When high specificity is required, a threshold of 0.83 results in a sensitivity of 0.8 and a specificity of 0.975.
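
The two thresholds above can be applied to a single score as in the following sketch. The function name and the output labels are hypothetical, and treating a score equal to a threshold as passing it is an assumption, since the text does not say whether the comparison is inclusive.

```shell
# Classify one Introme score (0-1) against the thresholds quoted above.
# Assumption: a score equal to a threshold counts as meeting it.
classify_introme_score() {
    awk -v s="$1" 'BEGIN {
        if (s >= 0.83)
            print "meets high-specificity threshold (0.83)"
        else if (s >= 0.61)
            print "meets recommended threshold (0.61)"
        else
            print "below recommended threshold"
    }'
}

# Example call:
# classify_introme_score 0.7
```

To apply this to the output .tsv file, first identify which column holds the Introme score; the column name is not stated here.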

We are working on implementing automatic interpretation for the outcome of the splice-altering variant. Until this feature is in place, all of the input scores which make up Introme's final prediction are included in the final .tsv file if further information on the variant prediction is required.

Reference Genome versions

Introme currently supports VCF files aligned to both the GRCh37 and GRCh38 reference genomes. Please specify the assembly using -a 'hg19/hg38' if your reference genome is not specified in the name of the fasta file.
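
To see whether -a is needed for a given fasta file, the file name can be checked for a genome build. The sketch below only illustrates the idea: infer_assembly is a hypothetical function, and the name patterns it matches are assumptions that may differ from what run_introme.sh actually recognises.

```shell
# Guess the genome assembly from a reference fasta file name.
# Prints hg19 or hg38, or fails if no known build appears in the name.
# The patterns are assumptions, not Introme's own matching rules.
infer_assembly() {
    case "$1" in
        *hg19*|*GRCh37*|*grch37*) echo "hg19" ;;
        *hg38*|*GRCh38*|*grch38*) echo "hg38" ;;
        *) return 1 ;;
    esac
}

# Example call:
# infer_assembly GRCh38.primary_assembly.genome.fa || echo "pass -a explicitly" >&2
```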

Funding

The development of Introme has been supported by grants, fellowships and scholarships provided by:

Luminesce Alliance
Cancer Australia and My Room
NHMRC
NSW Health
Australian Government Research Training Program
The Kids Cancer Alliance
Petre Foundation
Fulbright Future Scholarship

Development

Introme was initially developed by Dr. Mark Cowley, Dr. Velimir Gayevskiy and Dr. Sarah Beecroft at the Garvan Institute's Kinghorn Centre for Clinical Genomics, and the initial implementation can be found at KCCG's Introme Repository.

Introme has since been adapted and reimplemented by Patricia Sullivan, Dr. Mark Cowley and Dr. Mark Pinese at the Children's Cancer Institute. This version extends KCCG's Introme through improved accuracy, the addition of multiple splice-scoring tools, and the use of machine learning.

Support

For additional questions or assistance using Introme, contact psullivan@ccia.org.au (Patricia Sullivan).
