Skip to content
Deep Learning to detect epistasis in the human genome.
Python Shell
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
data
docs
src
tests
.gitignore
README.md

README.md

EpistasisNet

Deep learning to detect gene-gene interactions

ELEN4002/4012 Project by James Allingham and Paul Cresswell. Supervised by Prof. Scott Hazelhurst.

Prerequisites:

  • Python 2.7 or higher
  • NumPy
  • TensorFlow

Optional (GPU support):

  • NVIDIA CUDA 7.5
  • cudNN 5.1

Usage

EpistasisNet can be run from the command line using either the python or python3 commands.

There are a number of command line options which can be specified as shown bellow:

Flag Default Value Description
-file_in Input data file location
-tt_ratio 0.8 test:train ratio
-max_steps 1000 Maximum steps
-train_batch_size 100 Training batch size
-test_batch_size 1000 Testing batch size
-log_dir /tmp/logs/runx Directory for storing data
-learning_rate 0.001 Initial Learning rate
-dropout 0.5 Keep probability for training dropout
-model_dir /tmp/tf_models/ Directory for storing the saved models
-write_binary True Write the processed numpy array to a binary file
-read_binary True Read a binary file rather than a text file
-save_model True Save the best model as the training progresses

EpistasisNet expects input text files to be in the format provided by GAMETES. Note that the text files can be written to binary files by specifying the write_binary flag to be True.

Files

The files for EpistasisNet are:

Directory File Description
data convert_from_BEAM_format.py Converts data in the format used by the BEAM tool to the GAMETES format
data convert_to_BEAM_format.py Converts data in the GAMETES format to the BEAM format
docs style_guide.html Google's Python style guide
docs MeetingMinutes/*.pdf Minutes for various meetings held during the course of the projects
src GPU_off.sh A shell script that turns off GPU usage for EpistasisNet (as well as other CUDA applications)
src GPU_on.sh A shell script that turns on GPU usage for EpistasisNet (as well as other CUDA applications)
src convolutional_model.py Module that supplies a convolutional model with pooling to test for epistasis on a GAMETES dataset
src data_batcher.py Module that provides a single class: DataLoader, which manages reading of raw data and formatting is appropriately
src data_holder.py Module that provides a single class: DataHolder, which manages reading of input files and storage of various data sets
src data_loader.py Module that provides a single class: DataLoader, which manages reading of raw data and formatting appropriately
src linear_model.py Module that supplies a convolutional model with pooling to test for epistasis on a GAMETES dataset
src model.py Module that supplies a Model class which can be inherited from when creating models representing TensorFlow graphs
src nonlinear_model.py Module that supplies a fully connected model with nonlinearities to test for epistasis on a GAMETES dataset
src pool_conv_model.py Module that supplies a convolutional model with pooling to test for epistasis on a GAMETES dataset
src recurrent_model.py Module that supplies a recurrent model with additional fully connected layers to test for epistasis on a GAMETES dataset
src run_model.py Module that trains a TensorFlow model
src scaling_model Module that supplies a convolutional model with pooling to test for epistasis on a GAMETES dataset - Best Model
src utilities.py Module that provides a number of wrapper functions for TensorFlow
tests test_data_batcher.py Module that provides test cases for the DataBatcher class
tests test_data_holder.py Module that provides test cases for the DataHolder class
tests test_data_loader.py Module that provides test cases for the DataLoader class
tests test_utilities.py Module provides test cases for the utilities functions for building Tensorflow graphs
You can’t perform that action at this time.