Convolutional neural network analysis for predicting DNA sequence activity.

Basset

Deep convolutional neural networks for DNA sequence analysis.

Basset provides researchers with tools to:

  1. Train deep convolutional neural networks to learn highly accurate models of DNA sequence activity such as accessibility (via DNaseI-seq or ATAC-seq), protein binding (via ChIP-seq), and chromatin state.
  2. Interpret the principles learned by the model.
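
Models of this kind need the DNA sequence as numbers. A common representation, assumed here rather than taken from Basset's code, is one-hot encoding: a 4 × L matrix with one row per nucleotide and a single 1 in each column. The helper `one_hot_encode`, the A, C, G, T row order, and the all-zero treatment of ambiguous bases such as N are hypothetical choices for illustration.

```python
NUCLEOTIDES = "ACGT"  # assumed row order: A, C, G, T


def one_hot_encode(seq):
    """One-hot encode a DNA string as a 4 x len(seq) nested list.

    Hypothetical helper, not part of Basset. Bases outside ACGT (e.g. N)
    become all-zero columns here, which is one convention among several.
    """
    row_of = {base: row for row, base in enumerate(NUCLEOTIDES)}
    matrix = [[0] * len(seq) for _ in NUCLEOTIDES]
    for position, base in enumerate(seq.upper()):
        row = row_of.get(base)
        if row is not None:
            matrix[row][position] = 1
    return matrix


if __name__ == "__main__":
    # Print each nucleotide's row for a short example sequence.
    for base, row in zip(NUCLEOTIDES, one_hot_encode("ACGTN")):
        print(base, row)
```

A convolutional filter of width k then scans this matrix as a 4 × k window, which is how such a network can pick up motif-like sequence patterns.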

Installation

Basset has several dependencies because it uses both Torch7 and Python and relies on a variety of packages from each.

First, I recommend installing Torch7 by following its official installation instructions. If you plan to train models on a GPU, make sure CUDA is installed; Torch should then detect it.

For the Python dependencies, I highly recommend the Anaconda distribution. The only library it lacks is pysam, which you can install through Anaconda or manually. You'll also need bedtools for data preprocessing. If you'd rather not use Anaconda, consult the project's full list of dependencies.
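
Before running the installers, it can help to confirm that these pieces are visible. The following is a minimal sketch, not part of Basset, that checks whether pysam is importable and whether the `bedtools` and Torch7 `th` executables are on your PATH; the function name `missing_dependencies` is hypothetical, and treating `th` as the marker of a working Torch7 install is an assumption.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return the names of expected dependencies that cannot be found.

    Hypothetical helper, not part of Basset: it only checks that pysam is
    importable and that the bedtools and Torch7 `th` executables are on PATH.
    """
    missing = []
    # pysam is a Python library, so ask the import system for it.
    if importlib.util.find_spec("pysam") is None:
        missing.append("pysam")
    # bedtools and th are command-line programs, so look them up on PATH.
    for command in ("bedtools", "th"):
        if shutil.which(command) is None:
            missing.append(command)
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print(("Missing: " + ", ".join(missing)) if missing else "All checked dependencies found")
```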

Basset relies on the environment variable BASSETDIR to locate its files. In your shell startup script (e.g. .bashrc), add

    export BASSETDIR=the/dir/where/basset/is/installed

To make Basset's scripts and modules available from any directory, also add

    export PATH=$BASSETDIR/src:$PATH
    export PYTHONPATH=$BASSETDIR/src:$PYTHONPATH
    export LUA_PATH="$BASSETDIR/src/?.lua;$LUA_PATH"
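
A small sanity check for the settings above can save debugging later. This sketch assumes a POSIX shell, where PATH and PYTHONPATH are colon-separated and LUA_PATH is semicolon-separated; `check_basset_env` is a hypothetical helper, not something Basset ships.

```python
import os


def _entries(value, separator):
    """Split a search-path string into normalized, non-empty entries."""
    return {os.path.normpath(part) for part in value.split(separator) if part}


def check_basset_env(environ=None):
    """Return a list of problems with Basset's environment setup (empty if none).

    Hypothetical helper, not part of Basset. Assumes POSIX-style PATH and
    PYTHONPATH (os.pathsep-separated) and a semicolon-separated LUA_PATH.
    """
    env = os.environ if environ is None else environ
    basset_dir = env.get("BASSETDIR")
    if not basset_dir:
        return ["BASSETDIR is not set"]

    # The three exports above all point at $BASSETDIR/src.
    src = os.path.normpath(os.path.join(basset_dir, "src"))
    lua_pattern = os.path.join(src, "?.lua")
    problems = []
    if src not in _entries(env.get("PATH", ""), os.pathsep):
        problems.append("PATH does not include " + src)
    if src not in _entries(env.get("PYTHONPATH", ""), os.pathsep):
        problems.append("PYTHONPATH does not include " + src)
    if lua_pattern not in _entries(env.get("LUA_PATH", ""), ";"):
        problems.append("LUA_PATH does not include " + lua_pattern)
    return problems


if __name__ == "__main__":
    for problem in check_basset_env() or ["Basset environment looks OK"]:
        print(problem)
```

Run it in a new shell after your startup script has been sourced; an empty result means all three search paths include `$BASSETDIR/src`.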

To download and install the remaining dependencies, run

    ./install_dependencies.py

Alternatively, you can use the Docker image that Dr. Lee Zamparo generously contributed.

To download and install additional useful data, such as my best pre-trained model and public datasets, run

    ./install_data.py

Documentation

Basset is under active development, so don't hesitate to ask for clarifications or additional features, documentation, or tutorials.


Tutorials

These are a work in progress, so please forgive any gaps for now. If you're interested in a task I haven't covered, feel free to open an issue on the GitHub repository.