Comparison between different splice prediction software





Project Links

This repository is part of the CI-SpliceAI software package published in PLOS One.

This is the project comparing different splice prediction tools on variant data. You may also be interested in the code to train CI-SpliceAI, code to use trained models to annotate variants offline, and the website providing online annotation of variants.


In this project, we evaluate 6 different splice prediction tools, including our own CI-SpliceAI, on a corpus of:

  • 1,317 variants for a binary affecting/non-affecting task; and
  • 388 variants (subset of the first corpus) with annotations of their exact variant effect

This repository contains all variants and all code needed to reproduce the results.

Variant Data

Visualisations of the variants:

Figure: pie diagrams of the data.
Figure: distance from a variant to its closest splice site.


Optimal Thresholds, AUC-PR, AUC-ROC, and Optimal Accuracy

Algorithm                       Coverage  AUC-PR  AUC-ROC  Optimal Threshold  Accuracy
MES (Sliding)                   100%      55.68%  52.97%   12.5               53.42%
SQUIRLS                         100%      91.32%  91.17%   0.074              85.64%
MES (VEP)                       58%       92.52%  89.15%   2.109              86.40%
MMSplice (Splicing Efficiency)  99%       93.03%  92.56%   1.119              87.23%
MMSplice (Pathogenicity)        99%       94.13%  92.84%   0.961              88.53%
SpliceAI                        99%       96.21%  95.65%   0.3                90.88%
CI-SpliceAI                     100%      97.25%  96.75%   0.19               92.17%
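
One way to read the Optimal Threshold, Accuracy, and Coverage columns: for each tool, pick the score cut-off that maximises accuracy on the binary task, and report the share of variants the tool could score at all. The following is a minimal pure-Python sketch under assumptions, not the analysis code of this repository: the function names `best_threshold` and `coverage` are hypothetical, scores are assumed to be "higher means more likely splice-affecting", and labels are assumed to be 1 (affecting) or 0 (non-affecting).

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximising accuracy of `score >= threshold`.

    scores: list of floats, higher = more likely splice-affecting (assumption).
    labels: list of 0/1 ground-truth annotations (assumption).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    best = (None, -1.0)
    # Only observed scores need to be tried as cut-offs; infinity covers the
    # "predict everything as non-affecting" case.
    for t in sorted(set(scores)) + [float("inf")]:
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best[1]:  # strict '>' keeps the lowest threshold on ties
            best = (t, acc)
    return best


def coverage(predictions):
    """Fraction of variants for which a tool returned a score (not None)."""
    return sum(p is not None for p in predictions) / len(predictions)
```

For example, `best_threshold([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])` selects 0.8 as the cut-off. Note that a threshold tuned and evaluated on the same variants gives an optimistic accuracy estimate.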

PR-Curves of all algorithms; CI-SpliceAI is superior to the rest

Predictive error between CI-SpliceAI and SpliceAI

Figure: predictive error is reduced for the majority of data points.

Exact variant effect prediction accuracy

Algorithm      Acceptor Gain  Acceptor Loss  Donor Gain  Donor Loss
MES (Sliding)  0.00%          1.16%          2.33%       2.25%
SpliceAI       87.50%         77.10%         79.07%      78.93%
CI-SpliceAI    93.75%         78.55%         79.07%      82.02%
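
A per-effect accuracy like the one in the table above could be computed along the following lines. This is a sketch under assumptions, not the scoring rule used in this project: it assumes each tool yields one delta score per effect (acceptor gain, acceptor loss, donor gain, donor loss), takes the highest-scoring effect as the prediction, and ignores the predicted position. The names `EFFECTS`, `predicted_effect`, and `per_effect_accuracy` are hypothetical.

```python
from collections import defaultdict

EFFECTS = ("acceptor_gain", "acceptor_loss", "donor_gain", "donor_loss")


def predicted_effect(delta_scores):
    """Pick the effect with the largest delta score (assumed decision rule)."""
    return max(EFFECTS, key=lambda e: delta_scores[e])


def per_effect_accuracy(records):
    """Accuracy per annotated effect.

    records: iterable of (annotated_effect, delta_scores) pairs, where
    delta_scores maps every name in EFFECTS to a float.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for annotated, deltas in records:
        totals[annotated] += 1
        hits[annotated] += predicted_effect(deltas) == annotated
    return {effect: hits[effect] / totals[effect] for effect in totals}
```

Grouping by the annotated effect rather than the predicted one means each column answers "of the variants known to cause this effect, how many were recognised as such".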

CI-SpliceAI Mispredictions

Predictive error bettered in the majority of data points


These steps were taken:


The variant CSV file was converted to VCF format and normalised (indexed, rows normalised, left-aligned).

The resulting VCF file is checked into this repository, so you don't need to run the code that produces it.
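
For readers unfamiliar with the format, the CSV-to-VCF step can be sketched as below. This is an illustration under assumptions, not the repository's conversion script: the column names `chrom`, `pos`, `ref`, `alt` and the function name `csv_to_vcf` are hypothetical, and the sketch only sorts and writes records. Left-alignment and normalisation need the reference genome and are typically delegated to a dedicated tool; which one was used here is not stated.

```python
import csv
import io

VCF_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def csv_to_vcf(csv_text):
    """Convert CSV text with hypothetical columns chrom,pos,ref,alt to VCF text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Sort by chromosome name (lexicographically), then by numeric position,
    # since indexing a VCF requires sorted records.
    rows.sort(key=lambda r: (r["chrom"], int(r["pos"])))
    lines = [
        # ID, QUAL, FILTER and INFO are unknown here, so they are set to "."
        "\t".join([r["chrom"], r["pos"], ".", r["ref"], r["alt"], ".", ".", "."])
        for r in rows
    ]
    return VCF_HEADER + "\n".join(lines) + "\n"
```

Calling `csv_to_vcf("chrom,pos,ref,alt\n2,200,A,G\n1,100,C,T\n")` returns a two-line header followed by the two records in sorted order.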

Running tools

We ran all tools on the vcf file using

Results are checked into predictions/.


Variant data and predictions were analysed and plotted; see analysis/.


This project is built on bash scripts. We suggest running it on a UNIX system. It might be possible to run it on Windows using a bash environment such as Git Bash, but this is untested and unsupported.

Before running the setup code, make sure you agree to all licences of third-party components.

Please make sure to install these manual dependencies first:

Then run which will automatically:

  • Create conda environments with SpliceAI, CI-SpliceAI and MMSplice (through kipoi)
  • Download all third party elements like:
    • SQUIRLS command line, jannovar annotations, database
    • the human reference genome
    • GENCODE annotations for MMSplice
  • Pre-process GENCODE annotations for MMSplice


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

By running this code, you are installing third-party software. It is your responsibility to ensure that you comply with all third-party licences.

