We present Envision, an accurate predictor of protein variant molecular effect, trained using large-scale experimental mutagenesis data. All data and software in this study are freely available. The training data set and all code used to train the models and generate the figures presented in this manuscript are available here. Envision predictio…

FowlerLab/Envision2017

Our code is separated into eight Jupyter Notebook files (.ipynb) and one
R Markdown file.


The Jupyter Notebooks contain the following:
------------------------------------------------------------------------
+ singleProteinModels.ipynb -- code for tuning hyperparameters and
training models using each of the eight protein data sets individually.

+ envisionTuneTrainPredict.ipynb -- code to tune hyperparameters and
train Envision with all eight data sets

+ LOPOTuneTrain.ipynb -- train each leave-one-protein-out
(LOPO) model to predict the protein data set not used in training.

+ LOPO_10xCV.ipynb -- tune using tenfold cross-validation, then train each
leave-one-protein-out (LOPO) model to predict the protein data set not used in training.

+ LOPO_predict_missingFeatureMuts.ipynb -- use each leave-one-protein-out
(LOPO) model to predict mutations with missing features in the protein data set not used in training.

+ LOPO_unnormalized.ipynb -- train each leave-one-protein-out
(LOPO) model with unnormalized data and then predict protein data sets not used in training.

+ downSamplingAnalysis.ipynb -- code to sample 6, 4, and 2 proteins
as training data for model training

+ Clinvar_analysis.ipynb -- use Envision to predict ClinVar mutations
_______________________________________________________________________
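
The leave-one-protein-out (LOPO) scheme used by several of the notebooks above can be sketched in plain Python. This is only an illustration of the splitting idea, not the code in the notebooks: the function name `lopo_splits`, the `protein` column name, and the toy rows are hypothetical, and the sketch does not use GraphLab.

```python
from collections import defaultdict


def lopo_splits(rows, protein_key="protein"):
    """Yield (held_out, train_rows, test_rows), one split per protein.

    `rows` is a list of dicts; `protein_key` names the (assumed) column
    that identifies which protein data set a row belongs to.
    """
    by_protein = defaultdict(list)
    for row in rows:
        by_protein[row[protein_key]].append(row)

    for held_out in sorted(by_protein):
        # Train on every protein except the held-out one.
        train = [r for p, rs in by_protein.items() if p != held_out for r in rs]
        # Evaluate on the protein that was left out of training.
        test = list(by_protein[held_out])
        yield held_out, train, test


# Toy placeholder rows; names and scores are made up for illustration.
rows = [
    {"protein": "proteinA", "variant": "M1A", "score": 0.9},
    {"protein": "proteinA", "variant": "K2R", "score": 0.4},
    {"protein": "proteinB", "variant": "L5P", "score": 0.1},
    {"protein": "proteinC", "variant": "G7D", "score": 0.7},
]

splits = list(lopo_splits(rows))
for held_out, train, test in splits:
    print(held_out, "train rows:", len(train), "test rows:", len(test))
```

Each split trains on all proteins but one and tests on the protein left out, so a model is never evaluated on a protein it saw during training.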



The R Markdown contains the following:
---------------------------------------------------------------------

+ envision_figure_code.Rmd -- code for generating manuscript figures. 
---------------------------------------------------------------------

Notes:
 - All necessary data files can be found in the /data directory.

 - GraphLab and Python dependencies (e.g. NumPy) are required to
 successfully run all .ipynb code.

 - All code will be deposited in a public GitHub repository upon publication
 
