Skip to content

Commit

Permalink
Add Expecto docs
Browse files Browse the repository at this point in the history
  • Loading branch information
aaronkw committed Jan 3, 2019
1 parent cc1405e commit c5197fa
Show file tree
Hide file tree
Showing 2 changed files with 42 additions and 5 deletions.
14 changes: 9 additions & 5 deletions docs/citations.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,14 +2,18 @@
Citations
====================

Tissue-specific networks
~~~~~~~~~~~~~~~~~~~~
Tissue-specific networks, NetWAS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Greene CS*, Krishnan A*, Wong AK*, Ricciotti E, Zelaya RA, Himmelstein DS, Zhang R, Hartmann BM, Zaslavsky E, Sealfon SC, Chasman DI, FitzGerald GA, Dolinski K, Grosser T, Troyanskaya OG. (2015). `Understanding multicellular function and disease with human tissue-specific networks <http://www.nature.com/ng/journal/v47/n6/full/ng.3259.html>`_. Nature Genetics. 10.1038/ng.3259w.

Variant effect predictions (ExPecto)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Zhou J, Theesfeld CL, Yao K, Chen KM, Wong AK, and Troyanskaya OG. (2018) Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk, Nature Genetics.

Autism gene predictions
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~
Krishnan A*, Zhang R*, Yao V, Theesfeld CL, Wong AK, Tadych A, Volfovsky N, Packer A, Lash A, Troyanskaya OG.(2016) Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nature Neuroscience.

Tissue gene expression predictions
~~~~~~~~~~~~~~~~~~~~
Tissue-expression predictions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\W. Ju#, C.S. Greene#, F. Eichinger, V. Nair, J.B. Hodgin, M. Bitzer, Y. Lee, Q. Zhu, M. Kehata, M. Li, M.P. Rastaldi, C.D. Cohen, O.G. Troyanskaya*, and M. Kretzler*. Defining cell type specificity at the transcriptional level in human disease. Genome Research. 23:1862-1873. 2013
33 changes: 33 additions & 0 deletions docs/expecto.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
=======
ExPecto
=======

Introduction
------------
ExPecto is a framework for ab initio sequence-based prediction of mutation gene expression effects and disease risks. With this web interface, we provide an explorer of tissue-specific expression effect predictions. The current release contains all single nucleotide substitutions within 1kb to the representative TSS of a gene and all 1000 Genomes variants that passed a minimum predicted effect threshold (>0.3 log fold-change in any tissue).

The code for predicting expression effects for human genome variants and training new expression models is available at this `github repository <https://github.com/FunctionLab/ExPecto>`_.

The ExPecto framework is described in the following manuscript:

Jian Zhou, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong, and Olga G. Troyanskaya, Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk, Nature Genetics, 2018

Download
--------
Predicted expression effects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is the bulk download `link <http://deepsea.princeton.edu/media/code/expecto/combined_snps.0.3.zip>`_ of all mutation predictions.

Variation potential directionality scores
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Variation potential of a gene in a tissue or cell-type can reflect the evolutionary constraint on its expression level. Specifically, we compute the variation potential directionality score as the sum of all directional mutation effects within 1kb to TSS. A negative variation potential indicates active expression and constraint toward higher expression level, and vice versa. The sum of absolute mutation effects, or the magnitudes, is predictive of tissue/condition-specificity of a gene. The variation potential directionality scores and the inferred evolution constraint probabilities can be downloaded here.

The full prediction of all 140 million mutations can be downloaded `here <http://deepsea.princeton.edu/media/code/expecto/all1kbmutations.tar>`_ (~125G).

Method Details
--------------
ExPecto uses exponential basis function-based linear models upon deep convolutional network model of chromatin effects. ExPecto predicts expression levels directly from sequence and is capable of predicting effects of sequence variations.

For detailed procedures of the prediction, the chromatin predictions were computed from DeepSEA "Beluga" per 200bp bin, and 200 bins centered at TSS (40kb region) were used as input to predict expression effects. To reduce the dimensionality for ExPecto model training, the predicted chromatin spatial patterns were summarized to spatial features by 10 exponential basis functions. The summarized spatial features and gene expression levels were used to train regularized linear models for the final step of the prediction. The representative TSSes are selected based on FANTOM CAGE data.

We also propose a path toward ab initio disease risk prediction through combining the prediction of expression effects and the estimation of evolution constraints on expression levels. For example, mutations predicted to have strong negative expression effects on a positively constrained gene are predicted to be deleterious. We estimate evolutionary constraints through systematic profiling of potential mutation effects through in silico mutagenesis. As proof-of-principle we showed that this approach can predict the disease alleles from both curated HGMD disease mutation data and disease GWASes.

0 comments on commit c5197fa

Please sign in to comment.