Basenji

Sequential regulatory activity predictions with deep convolutional neural networks.

Basenji provides researchers with tools to:

  1. Train deep convolutional neural networks to predict regulatory activity along very long, chromosome-scale DNA sequences.
  2. Score variants according to their predicted influence on regulatory activity across the sequence and/or for specific genes.
  3. Annotate the distal regulatory elements that influence gene activity.
  4. Annotate the specific nucleotides that drive regulatory element function.
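
As a rough illustration of the variant scoring idea in item 2, the sketch below compares predictions for a reference and an alternate allele and sums the difference across the sequence. It is not Basenji's API: `one_hot`, `toy_predict` (a stand-in that simply counts G/C bases per bin) and `variant_score` are hypothetical names invented for this example, and a real workflow would substitute a trained model for `toy_predict`.

```python
import numpy as np


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array in A, C, G, T order."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in index:  # N and other symbols stay all-zero
            encoded[i, index[base]] = 1.0
    return encoded


def toy_predict(encoded, bin_size=4):
    """Hypothetical stand-in for a trained model: G/C count per bin."""
    gc = encoded[:, 1] + encoded[:, 2]
    n_bins = len(gc) // bin_size
    return gc[: n_bins * bin_size].reshape(n_bins, bin_size).sum(axis=1)


def variant_score(seq, pos, alt, predict=toy_predict):
    """Sum over bins of (alternate prediction - reference prediction)."""
    ref_pred = predict(one_hot(seq))
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    alt_pred = predict(one_hot(alt_seq))
    return float(np.sum(alt_pred - ref_pred))
```

For example, `variant_score("ACGTACGT", 1, "T")` replaces the C at position 1 (0-based) with a T; under the toy predictor this lowers the G/C count in the first bin by one.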

Basset successor

This codebase offers numerous improvements and generalizations over its predecessor, Basset, and I'll be using it for all of my ongoing work. Here are the salient changes:

  1. Basenji makes predictions in bins across the sequences you provide. You could replicate Basset's peak classification by simply providing smaller sequences and binning the target for the entire sequence.
  2. Basenji is intended to predict quantitative signal using regression loss functions, rather than binary signal using classification loss functions.
  3. Basenji is built on TensorFlow, which offers myriad benefits, including distributed computing and a large and adaptive developer community.
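
The first two changes can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not Basenji's implementation: `bin_signal`, `poisson_loss` and `binary_cross_entropy` are hypothetical helpers, summing is just one possible way to pool a bin, and the Poisson loss is only one example of a regression loss (this README does not say which one Basenji uses).

```python
import numpy as np


def bin_signal(signal, bin_size):
    """Sum a per-position signal into consecutive bins of bin_size."""
    signal = np.asarray(signal, dtype=np.float64)
    n_bins = len(signal) // bin_size  # any trailing remainder is dropped
    return signal[: n_bins * bin_size].reshape(n_bins, bin_size).sum(axis=1)


def poisson_loss(pred, target, eps=1e-8):
    """Regression-style loss on quantitative targets (assumed example)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(pred - target * np.log(pred + eps)))


def binary_cross_entropy(prob, label, eps=1e-8):
    """Classification-style loss on 0/1 labels, for comparison."""
    prob = np.asarray(prob, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    terms = label * np.log(prob + eps) + (1 - label) * np.log(1 - prob + eps)
    return float(-np.mean(terms))


coverage = [1, 2, 3, 4, 5, 6]
per_bin = bin_signal(coverage, 2)                 # one target per 2-position bin
whole = bin_signal(coverage, len(coverage))       # a single bin for the entire sequence
```

Setting the bin size to the full sequence length, as in `whole`, yields one target per sequence, which is the Basset-style setup described in item 1.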

Installation

Basenji was developed with Python 3 and a variety of scientific computing dependencies, which are listed in the setup.py file. I highly recommend the Anaconda Python distribution, which contains most of them.

Once you have the dependencies, run:

    python setup.py develop

Then I recommend setting the following environment variables (adjust the first path to wherever you cloned the repository):

    export BASENJIDIR=~/code/Basenji
    export PATH=$BASENJIDIR/bin:$PATH
    export PYTHONPATH=$BASENJIDIR/bin:$PYTHONPATH

To verify the install, launch Python and run:

    import basenji
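
If you prefer a check that reports a problem instead of raising an ImportError, something like the following sketch may help. `check_install` is a hypothetical helper written for this example, not part of Basenji; it uses only the standard library and assumes the `BASENJIDIR` variable name recommended above.

```python
import importlib.util
import os


def check_install(package="basenji", env_var="BASENJIDIR"):
    """Report whether a package is importable and whether env_var is set."""
    spec = importlib.util.find_spec(package)  # None if the package is not found
    return {
        "importable": spec is not None,
        "env_dir": os.environ.get(env_var),  # None if the variable is unset
    }


status = check_install()
print(status)
```

An `importable` value of `False` suggests that `python setup.py develop` did not complete in the active environment; an `env_dir` of `None` means the variable was not exported in the current shell.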

Manuscript

Models and (links to) data studied in the manuscript are available in the manuscript directory.


Documentation

At this stage, Basenji is something in between personal research code and accessible software for wide use. The primary challenge is uncertainty in what the best role for this type of toolkit is going to be in functional genomics and statistical genetics. The computational requirements don't make it easy either. Thus, this package is under active development, and I encourage anyone to get in touch to relate your experience and request clarifications or additional features, documentation, or tutorials.


Tutorials

These are a work in progress, so please forgive any incompleteness for the moment. If there's a task you're interested in that I haven't included, feel free to open an Issue on the repository.