
Proteus is a Random Forest based classifier that predicts, from the protein sequence, the binding residues that undergo a 'disorder-to-order' transition (called 'protean' residues)


bjornwallner/proteus


Proteus

  • Requires Python and the machine learning package 'scikit-learn'
  • A slightly modified version of DISOPRED3.0 is distributed with this package
  • 'scikit-learn' may require additional packages to be installed
  • Also requires the uniref90.fasta sequence database and its associated files in the folder DB
  • Download the most recent version of the 'uniref90.fasta' file from the web (http://www.ebi.ac.uk/uniprot/database/download.html)
  • Format it with formatdb to generate the associated files
  • An empty DB directory is provided with the installation
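Before running Proteus, the DB folder can be sanity-checked with a small script such as the one below. This is a minimal sketch, not part of Proteus: `check_uniref90_db` is a hypothetical helper, and it assumes formatdb's default naming, where the index files (`.pin`, `.phr`, `.psq`, or a `.pal` alias file for a database split into volumes) start with the input file name `uniref90.fasta`.

```python
from pathlib import Path


def check_uniref90_db(db_dir="DB", db_name="uniref90.fasta"):
    """Return a list of problems found in the sequence database folder.

    Hypothetical helper. Assumes the formatdb output files share the
    'uniref90.fasta' prefix (formatdb's default when no base name is given).
    """
    db = Path(db_dir)
    if not db.is_dir():
        return [f"missing database folder: {db}"]

    problems = []
    if not (db / db_name).is_file():
        problems.append(f"missing sequence file: {db / db_name}")

    # A single-volume protein database has a .pin index file; a large
    # database split into volumes (uniref90.fasta.00.pin, ...) also gets
    # a .pal alias file. Either one counts as "formatted" here.
    has_index = any(db.glob(db_name + "*.pin")) or (db / (db_name + ".pal")).is_file()
    if not has_index:
        problems.append(f"no formatdb index files for {db_name} in {db}")
    return problems
```

An empty list means the folder looks ready; each string otherwise names one missing piece.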

Installation Notes for scikit-learn

scikit-learn depends on additional packages, which the commands below install.

Installation of scikit-learn in Ubuntu 14.04

sudo apt-get install python-sklearn

Alternatively, install the build dependencies and the latest scikit-learn with pip:

sudo apt-get update
sudo apt-get install build-essential python-dev python-setuptools python-numpy python-scipy libatlas-dev libatlas3gf-base
pip install --user --install-option="--prefix=" -U scikit-learn
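Whichever route is used, one way to confirm that Python can import scikit-learn (module name `sklearn`) and its numerical dependencies is a check like the following. It is a minimal sketch; `installed_version` is a hypothetical helper, not part of Proteus.

```python
import importlib
import importlib.util


def installed_version(module_name):
    """Return the module's __version__ if it is importable, else None."""
    if importlib.util.find_spec(module_name) is None:
        return None
    module = importlib.import_module(module_name)
    # Some modules define no __version__; report them as "unknown".
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("numpy", "scipy", "sklearn"):
        print(name, installed_version(name) or "NOT INSTALLED")
```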

Installing Proteus

$ git clone https://github.com/bjornwallner/proteus
$ cd proteus
$ chmod +x proteus/run_proteus.py

The program takes a single input:

    1. A FASTA file containing a single amino acid sequence

Run step:
$ ./proteus/run_proteus.py <basename.fasta>
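Because the program expects exactly one sequence, it can help to check the input file first. The sketch below uses a hypothetical helper, `read_single_fasta`, that is not part of Proteus; it only checks the number of records and does not validate the amino acid letters.

```python
def read_single_fasta(path):
    """Return (header, sequence) from a FASTA file holding exactly one record.

    Raises ValueError if the file has zero or several records, or if
    sequence lines appear before the first '>' header.
    """
    records = []  # each entry: [header, list of sequence lines]
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                records.append([line[1:], []])
            elif records:
                records[-1][1].append(line)
            else:
                raise ValueError("sequence data found before the first '>' header")
    if len(records) != 1:
        raise ValueError(f"expected 1 sequence, found {len(records)}")
    header, chunks = records[0]
    # Join wrapped sequence lines into one uppercase string.
    return header, "".join(chunks).upper()
```

For example, `read_single_fasta("basename.fasta")` returns the header and the joined sequence, or raises `ValueError` if the file holds zero or several records.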

EXAMPLE OUTPUT:

$ cat basename.seq.proteus
#Proteus v1.1
#pos res pred prob
1 M 1 0.518
2 R 1 0.561
3 V 0 0.439
4 K 0 0.416
5 E 0 0.438
6 I 0 0.439
7 R 0 0.392
8 K 0 0.427
9 N 0 0.405
10 Y 0 0.400
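Each non-comment line of the output lists the residue position, the residue, the prediction (0 or 1) and the probability. The sketch below reads that format and groups consecutive residues with `pred` equal to 1 into segments. `parse_proteus` and `protean_segments` are hypothetical helpers, and treating a segment as a run of consecutive positively predicted residues is an assumption made for illustration.

```python
def parse_proteus(text):
    """Parse Proteus output into (pos, res, pred, prob) tuples.

    Lines starting with '#' (version and column header) are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pos, res, pred, prob = line.split()
        rows.append((int(pos), res, int(pred), float(prob)))
    return rows


def protean_segments(rows):
    """Group consecutive positions with pred == 1 into (start, end) pairs."""
    segments = []
    for pos, _res, pred, _prob in rows:
        if pred != 1:
            continue
        if segments and segments[-1][1] == pos - 1:
            # Extend the current segment by one residue.
            segments[-1] = (segments[-1][0], pos)
        else:
            segments.append((pos, pos))
    return segments
```

For the ten-residue example above, `protean_segments(parse_proteus(text))` returns `[(1, 2)]`, where `text` is the content of `basename.seq.proteus`.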

The program also writes a graphical representation of the same prediction as a .png image (protean segment prediction score vs. residue).

[Figure: example output graph]
