GOR is an algorithm for predicting protein secondary structure from an amino acid sequence. It is described in "GOR Method for Predicting Protein Secondary Structure from Amino Acid Sequence".
The GOR IV code in this directory is based on the C code available from http://mig.jouy.inra.fr/?q=en/node/85.
The original C code uses 1-based array offsets. I have not attempted to change that, or to otherwise clean up the code, tempting as it has been.
I used cffi to make a Python interface to GOR IV.
Usage is as follows:
from gor4 import GOR4
gor4 = GOR4()
result = gor4.predict('DKATIPSESPFAAAEVADGAIVVDIAKMKYETP')
print('Predicted secondary structure', result['predictions'])
print('Prediction probabilities', result['probabilities'])
For detailed usage examples, see the tests in test/test_gor4.py.
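When inspecting a result by eye, it can help to print the sequence and its predicted states on paired lines. The helper below is a minimal sketch, not part of the gor4 package: the name align_prediction and the width parameter are made up for illustration, and it assumes that result['predictions'] is a string with one character per residue.

```python
def align_prediction(sequence, predictions, width=60):
    """Return text with the sequence and its predicted states on paired lines.

    Assumes `predictions` has one character per residue of `sequence`
    (an assumption about the shape of result['predictions']).
    """
    if len(sequence) != len(predictions):
        raise ValueError("sequence and predictions differ in length")
    lines = []
    for start in range(0, len(sequence), width):
        # One line of residues, followed by the matching line of states.
        lines.append(sequence[start:start + width])
        lines.append(predictions[start:start + width])
    return "\n".join(lines)
```

Under that assumption it could be called as print(align_prediction(seq, result['predictions'])), where seq is the string passed to predict.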
Notes:
- The initialization of a GOR4 instance is very slow (due to reading large files of AA structure information from gor4/data), so you will likely want to make a singleton instance and just use that.
- It is possible to pass alternate known sequence and secondary structure files to GOR4.__init__, but this is not done in any of the test examples.
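One way to get the singleton behaviour suggested in the first note is to cache the instance behind a small accessor function. This is only a sketch: the helper names get_gor4 and predict are invented here, and functools.lru_cache is just one of several ways to share a single instance.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_gor4():
    """Return a shared GOR4 instance, constructing it on first use only."""
    # Imported lazily so the slow data load happens on the first call,
    # not when this helper module is imported.
    from gor4 import GOR4
    return GOR4()


def predict(sequence):
    """Predict secondary structure for one sequence using the shared instance."""
    return get_gor4().predict(sequence)
```

Every call to predict after the first reuses the same GOR4 object, so the cost of reading the data files is paid once per process.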
The files in the src directory are as follows:
- api.c: An API library suitable for calling via cffi. This has 3 functions: initialize, predict and finalize. initialize returns a struct containing pointers to malloc'd memory that is freed in finalize. To get a secondary structure prediction, predict must be called.
- build.py: Python to tell cffi how to build the module, and what functions are available.
- gor4-base.c: A very slightly modified copy of the original gor.c file. The modification is that some #define constants have been moved into gor4-base.h so they can also be used in api.c.
- gor4-base.h: Some #define constants that used to be in gor4-base.c, plus the definition of the struct returned by initialize.
- nrutil.{c,h}: The original files from the GOR IV distribution.
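The initialize / predict / finalize lifecycle described for api.c means that each initialize call should be paired with exactly one finalize call. The sketch below shows that pairing from the Python side using a context manager. It is not the package's actual wrapper: the class name GorSession is hypothetical, and the call signatures (initialize taking no arguments, predict and finalize taking the returned state) are assumptions made for illustration; the real ones are defined in api.c.

```python
class GorSession:
    """Pair one initialize() call with exactly one finalize() call.

    `lib` is any object exposing initialize(), predict(state, sequence)
    and finalize(state). These signatures are assumed for illustration.
    """

    def __init__(self, lib):
        self._lib = lib
        # On the C side this is where memory would be allocated.
        self._state = lib.initialize()

    def predict(self, sequence):
        if self._state is None:
            raise RuntimeError("session already finalized")
        return self._lib.predict(self._state, sequence)

    def close(self):
        # Release the state once; later calls are no-ops.
        if self._state is not None:
            self._lib.finalize(self._state)
            self._state = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Used in a with statement, the session releases its state even if a prediction raises, which avoids leaking the memory that initialize allocated.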