In-silico method for pathogenic classification of genetic variants


An improved in-silico method for pathogenic classification of missense genetic variants.


For inputs of CADD v1.0 (C), RVIS (R), and PROVEAN (P), calculate the CLASSIC metric as:

f(C,R,P) = 0.55141463C - 0.44124492R - 0.2671904P - 1.94571548

If more than one PROVEAN score is available because of multiple protein isoforms, use the mean of those scores as P.
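As a minimal sketch of the equation above, the metric can be computed in plain Python as follows. The function name `classic_score` and the constant names are illustrative and are not taken from the repository files; the coefficients are copied from the equation. The function assumes PROVEAN scores arrive as a non-empty list (one entry per isoform), and it does not address which CADD score scale should be supplied, since that is not stated here.

```python
from statistics import mean

# Coefficients copied from the CLASSIC equation above.
CADD_WEIGHT = 0.55141463
RVIS_WEIGHT = -0.44124492
PROVEAN_WEIGHT = -0.2671904
INTERCEPT = -1.94571548


def classic_score(cadd, rvis, provean_scores):
    """Return f(C, R, P) for one variant.

    cadd           -- CADD v1.0 score (C)
    rvis           -- RVIS score (R)
    provean_scores -- non-empty list of PROVEAN scores, one per isoform;
                      their mean is used as P (hypothetical input shape)
    """
    provean = mean(provean_scores)
    return (CADD_WEIGHT * cadd
            + RVIS_WEIGHT * rvis
            + PROVEAN_WEIGHT * provean
            + INTERCEPT)
```

A variant with a single PROVEAN score is passed as a one-element list, for example `classic_score(cadd, rvis, [provean])`.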


CLASSIC achieves a significantly higher AUC than CADD (p = 1.97e-57; two-tailed, paired t-test). The comparison methods are implemented in the files described below and outlined in the CLASSIC manuscript.


CLASSIC.ipynb and its accompanying raw Python file are identical in content; the former is a Python notebook and the latter is a plain Python script. They show the working used to obtain the above equation. Developed and tested with Python v3.4.0.

all-scores.tsv is a subset of the variants supplied in the description of CADD (doi: 10.1038/ng.2892). The first column contains the binary classification, where 1 = pathogenic. Columns 2-4 contain the CADD, RVIS, and mean PROVEAN scores respectively.
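The column layout described above could be read with the standard library roughly as follows. This is a sketch under assumptions: it presumes the file is tab-delimited with no header row, which is not confirmed here, and the helper name `read_scores` is hypothetical. The two sample rows are made-up values used only to show the expected shape, not data from the file.

```python
import csv
import io


def read_scores(handle):
    """Parse rows of (label, CADD, RVIS, mean PROVEAN) from a TSV handle.

    Assumes no header row; blank lines are skipped.
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue
        label = int(fields[0])  # 1 = pathogenic
        cadd, rvis, provean = (float(x) for x in fields[1:4])
        rows.append((label, cadd, rvis, provean))
    return rows


# Illustrative input only: these values are invented for the example.
sample = io.StringIO("1\t25.3\t-0.8\t-4.1\n0\t3.2\t0.5\t0.7\n")
rows = read_scores(sample)
```

For the real file, the same function can be given an open file handle, for example `with open("all-scores.tsv", newline="") as fh: rows = read_scores(fh)`. If the file turns out to have a header row, that row would need to be skipped first.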