Function prediction using a deep ontology-aware classifier

DeepGO - Predicting Gene Ontology Functions

DeepGO is a method for predicting protein functions from protein sequences and protein-protein interaction (PPI) networks. It uses deep neural networks to learn features from the sequence and the PPI network, and classifies proteins hierarchically into GO classes. PPI network features are learned using the neuro-symbolic approach for learning knowledge graph representations by Alshahrani et al.

This repository contains the scripts that were used to build and train the DeepGO model, together with the scripts for evaluating the model's performance.
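To illustrate what hierarchical classification over GO classes implies for the output scores, here is a minimal sketch, not the code from this repository. It assumes that a prediction is consistent with the ontology when every class scores at least as high as any of its descendants, and it enforces this by taking a maximum over descendants. The function name `propagate_scores` and the toy class names are hypothetical.

```python
from collections import defaultdict


def propagate_scores(scores, parents):
    """Make scores consistent with a class hierarchy.

    scores:  dict mapping class -> predicted score in [0, 1]
    parents: dict mapping class -> list of direct parent classes
             (assumed to form a directed acyclic graph)
    Returns a dict in which each class has the maximum of its own
    score and the scores of all of its descendants.
    """
    # Invert the child -> parents mapping into parent -> children.
    children = defaultdict(list)
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children[parent].append(child)

    consistent = {}

    def resolve(term):
        # Memoised depth-first traversal over the descendants of `term`.
        if term not in consistent:
            best = scores.get(term, 0.0)
            for child in children.get(term, []):
                best = max(best, resolve(child))
            consistent[term] = best
        return consistent[term]

    for term in set(scores) | set(parents) | set(children):
        resolve(term)
    return consistent


# Toy hierarchy (hypothetical names): protein_binding -> binding -> root
toy_parents = {"protein_binding": ["binding"], "binding": ["root"]}
toy_scores = {"protein_binding": 0.9, "binding": 0.4, "root": 0.1}
toy_result = propagate_scores(toy_scores, toy_parents)
```

In this toy case the high score of the most specific class is carried up to both of its ancestors, so no ancestor is predicted with less confidence than its descendant.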

Dependencies

To install the Python dependencies, run: pip install -r requirements.txt

Scripts

The scripts require the Gene Ontology in OBO format.

  • nn_hierarchical_seq.py - builds and trains the model that uses only the protein sequence as input.
  • nn_hierarchical_network.py - builds and trains the model that uses both the protein sequence and the PPI network embeddings as input.
  • get_data.py, get_functions.py, mapping.py - prepare the raw data.
  • blast.py - evaluates the performance of the BLAST method.
  • evaluation.py - evaluates the performance of FFPred, GOFDR and our method.
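Since the scripts depend on the Gene Ontology in OBO format, the following minimal sketch shows what reading such a file can look like. It is not the loader used by the scripts in this repository: the function name `parse_obo` is hypothetical, and the sketch only handles the `id`, `name`, `is_a` and `is_obsolete` tags of `[Term]` stanzas. The sample text uses three real molecular function terms plus one made-up obsolete entry for illustration.

```python
def parse_obo(lines):
    """Parse [Term] stanzas from OBO-formatted lines.

    Returns a dict mapping term id -> {"id", "name", "is_a"},
    where "is_a" is the list of direct parent ids.
    Obsolete terms are skipped.
    """
    terms = {}
    current = None  # the term being filled in, or None outside a [Term] stanza
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        if key == "id":
            current["id"] = value
            terms[value] = current
        elif key == "name":
            current["name"] = value
        elif key == "is_a":
            # The value looks like "GO:0005488 ! binding"; keep only the id.
            current["is_a"].append(value.split()[0])
        elif key == "is_obsolete" and value == "true":
            terms.pop(current.get("id"), None)
            current = None
    return terms


SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0003674
name: molecular_function

[Term]
id: GO:0005488
name: binding
is_a: GO:0003674 ! molecular_function

[Term]
id: GO:0005515
name: protein binding
is_a: GO:0005488 ! binding

[Term]
id: GO:0000000
name: made-up obsolete entry for illustration
is_obsolete: true

[Typedef]
id: part_of
name: part of
"""

sample_terms = parse_obo(SAMPLE_OBO.splitlines())
```

To read a real ontology file, the same function can be given an open file handle, for example `parse_obo(open(path))`, where `path` points to wherever the OBO file was downloaded.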

Running

  • Download the data archive from http://deepgo.bio2vec.net/data/data.tar.gz and extract the data folder.
  • Install the DIAMOND program on your system (the diamond command must be available).
  • Run the predict_all.py script with the -i <input_fasta_filename> argument.
  • See the results in the results.tsv file.
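The last three steps can be wrapped in a small helper. This is a sketch under stated assumptions: the helper names `build_command` and `run_prediction` are hypothetical, the script is assumed to be run from the repository root with the extracted data folder in place, and the interpreter running the helper is assumed to be the one with the dependencies installed. Only the `-i` argument and the `results.tsv` output name come from the steps above.

```python
import shutil
import subprocess
import sys


def build_command(fasta_path, script="predict_all.py"):
    """Build the command line for a prediction run on one FASTA file."""
    return [sys.executable, script, "-i", fasta_path]


def run_prediction(fasta_path):
    """Check that diamond is installed, run the script, return the results file name."""
    if shutil.which("diamond") is None:
        raise RuntimeError("the diamond command was not found on PATH")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_command(fasta_path), check=True)
    return "results.tsv"
```

For example, `run_prediction("proteins.fasta")` would run the script on a file named `proteins.fasta` (a placeholder name) and return the name of the file to inspect afterwards.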

Data

The online version of DeepGO is available at http://deepgo.bio2vec.net/

Citation

If you use DeepGO for your research, or incorporate our learning algorithms in your work, please cite:

Maxat Kulmanov, Mohammed Asif Khan, Robert Hoehndorf; DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier, Bioinformatics, 2017. https://doi.org/10.1093/bioinformatics/btx624