Classifies genes as an oncogene, tumor suppressor gene, or as a non-driver gene by using Random Forests
Python Shell
Clone or download

README.md

20/20+

About

Next-generation DNA sequencing of the exome has detected hundreds of thousands of small somatic variants (SSV) in cancer. However, distinguishing genes containing driving mutations rather than simply passenger SSVs from a cohort sequenced cancer samples requires sophisticated computational approaches. 20/20+ integrates many features indicative of positive selection to predict oncogenes and tumor suppressor genes from small somatic variants. The features capture mutational clustering, conservation, mutation in silico pathogenicity scores, mutation consequence types, protein interaction network connectivity, and other covariates (e.g. replication timing). Contrary to methods based on mutation rate, 20/20+ uses ratiometric features of mutations by normalizing for the total number of mutations in a gene. This decouples the genes from gene-level differences in background mutation rate. 20/20+ uses monte carlo simulations to evaluate the significance of random forest scores based on an estimated p-value from an empirical null distribution.

Documentation

Documentation Status

Please see the documentation on readthedocs.

Releases

You can download releases on github.

Installation

Build Status

Because 20/20+ internally uses the random forest package in R, you will both need R and the randomForest library installed. Once R is installed, you can install the random forest package:

> install.packages("randomForest")

If you do not have permission to install randomForest on the system wide R, you can install in your local user directory by creating an ~/.Renviron file as the following:

R_LIBS=~/Rlibs

Where, in this case, the R libraries will be installed in the ~/Rlibs directory.

20/20+ also requires the following python packages:

  • numpy
  • scipy
  • pandas>=0.17.0
  • scikit-learn
  • rpy2
  • probabilistic2020

To install these packages via pip you can use the following command:

$ pip install -r requirements.txt

If you want the exact version 20/20+ was tested on use the requirements_dev.txt file and python 2.7. The probabilistic2020 python package is used to generate features for 20/20+ from mutations in MAF format.