This manual is for the code implementation of the paper "Persistent Tor-algebra for protein-protein interaction analysis".
Platform: Python>=3.6
Packages needed: math, numpy>=1.18.1, scipy>=1.4.1, scikit-learn>=0.22.1, gudhi,
We take the SKEMPI S1131 dataset as an example to illustrate the procedure of our model.
We need to extract the atom coordinates from the protein-protein complexes. This can be done with the TopNetTree code (DOI: 10.24433/CO.0537487.v1).
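If you only need raw coordinates and element labels, they can also be read directly from a PDB file. The sketch below is not the TopNetTree code and skips any preprocessing TopNetTree may perform; it assumes standard fixed-column PDB ATOM/HETATM records (chain ID in column 22, x/y/z in columns 31-54, element symbol in columns 77-78). The helper name `read_pdb_atoms` and its (chain, element, (x, y, z)) output format are hypothetical.

```python
def read_pdb_atoms(lines, chains=None):
    """Return (chain, element, (x, y, z)) for each ATOM/HETATM record.

    Hypothetical helper. Assumes the standard fixed-column PDB layout:
    chain ID in column 22, x/y/z in columns 31-54, element in columns 77-78.
    If `chains` is given, only atoms from those chain IDs are kept.
    """
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, REMARK, TER, etc.
        chain = line[21]
        if chains is not None and chain not in chains:
            continue
        # Coordinates occupy three 8-character fields starting at column 31.
        x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
        element = line[76:78].strip().upper()
        atoms.append((chain, element, (x, y, z)))
    return atoms
```

For example, `read_pdb_atoms(open("complex.pdb"), chains={"A", "B"})` (file name illustrative) returns the atoms of chains A and B, ready to be grouped by element.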
For each protein-protein complex, element-specific atom combinations are used to generate Vietoris-Rips complexes, and persistent Tor-algebra is computed from these simplicial complexes. You can directly run the script "code/skempi-tor-feature-generation.py" to get the persistent Tor-algebra features for the SKEMPI S1131 dataset. The auxiliary features can be generated with the code at https://codeocean.com/capsule/2202829/tree/v1
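To illustrate the Vietoris-Rips step, here is a brute-force pure-Python sketch for one element-specific atom combination. It is not the authors' implementation and stops before the Tor-algebra computation. The selection rule (element `elem_a` from one chain plus element `elem_b` from the other) and the helper names are assumptions; each simplex enters the filtration at the largest pairwise distance among its vertices, and simplices above `max_edge_length` are dropped. In practice a library such as GUDHI (listed above) would build this complex far more efficiently.

```python
import math
from itertools import combinations


def select_element_pair(atoms, chain_a, elem_a, chain_b, elem_b):
    """Hypothetical selection rule: atoms of elem_a in chain_a plus atoms of
    elem_b in chain_b. `atoms` holds (chain, element, (x, y, z)) tuples."""
    return [xyz for chain, elem, xyz in atoms
            if (chain == chain_a and elem == elem_a)
            or (chain == chain_b and elem == elem_b)]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def rips_filtration(points, max_edge_length, max_dim=2):
    """Vietoris-Rips filtration with simplices up to dimension max_dim.

    Returns (filtration_value, vertex_tuple) pairs sorted by value and,
    for ties, by size, so every face precedes its cofaces.
    """
    n = len(points)
    d = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    simplices = [(0.0, (i,)) for i in range(n)]  # vertices enter at 0
    for k in range(2, max_dim + 2):  # k vertices -> a (k - 1)-simplex
        for verts in combinations(range(n), k):
            value = max(d[i][j] for i, j in combinations(verts, 2))
            if value <= max_edge_length:
                simplices.append((value, verts))
    simplices.sort(key=lambda s: (s[0], len(s[1])))
    return simplices
```

With points at (0, 0, 0), (3, 0, 0) and (0, 4, 0), the edges enter at 3, 4 and 5 and the triangle enters at 5, right after its last edge.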
We use ensemble learning for the prediction. More specifically, we have two base learners, a 1D CNN and a GBT (gradient boosting trees) model, and a GBT meta learner. For the two base learners, run "code/skempi-cnn.py" and "code/skempi-gbt.py" to generate the base-learner features. For the meta learner, run "code/tenfold-CV.py" to get the final prediction.
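The stacking scheme can be sketched in pure Python as follows. This is a minimal illustration, not the contents of the scripts above: `mean_learner` and `nearest_neighbor_learner` are hypothetical stand-ins for the 1D CNN and GBT base learners and the GBT meta learner, and the fold assignment (a seeded shuffle dealt into k folds) is an assumption. Each learner is a function `fit_predict(X_train, y_train, X_test)`.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold(X, y, fit_predict, k=10, seed=0):
    """Predict every sample with a model that never saw it during training."""
    oof = [0.0] * len(y)
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            oof[i] = p
    return oof


# Hypothetical stand-ins for the real base and meta learners.
def mean_learner(X_train, y_train, X_test):
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)


def nearest_neighbor_learner(X_train, y_train, X_test):
    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [y_train[min(range(len(X_train)), key=lambda j: sqdist(X_train[j], x))]
            for x in X_test]


def stack(X, y, base_learners, meta_learner, k=10, seed=0):
    # Level 1: out-of-fold predictions of each base learner become new features.
    base_cols = [out_of_fold(X, y, f, k, seed) for f in base_learners]
    Z = [list(row) for row in zip(*base_cols)]
    # Level 2: cross-validated predictions of the meta learner on those features.
    return out_of_fold(Z, y, meta_learner, k, seed)
```

Using out-of-fold predictions as the meta learner's input keeps it from training on base-learner outputs that were fitted to the same samples, which would otherwise overstate how reliable the base learners are.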
To use our persistent Tor-algebra model on your own data, first generate the 3D coordinates of your point cloud. Then use the following function to generate the persistent Tor-algebra features:
def get_tor_algebra_I_J(point, J, outfile, typ):
    # Generate persistent Tor-algebra features from point cloud data.
    # point:   3-D coordinates of the point cloud data
    # J:       type of Tor-algebra; choose from 2, 3, 4, 5, 98, 99, 100 (you can add more by revising our code)
    # outfile: file path of the output file
    # typ:     set to 0
The above function can be found in "code/skempi-tor-algebra-feature-generation.py". Our persistent Tor-algebra can also be computed from graph data and distance-matrix data. Please contact me if you are interested.
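How graph data are handled is not described here. One common option, assumed for illustration and not necessarily the authors' approach, is to turn a weighted graph into a distance matrix of shortest-path lengths, which can then replace Euclidean distances when building a Vietoris-Rips complex. A minimal Floyd-Warshall sketch; the name `graph_to_distance_matrix` is hypothetical.

```python
import math


def graph_to_distance_matrix(n, weighted_edges):
    """All-pairs shortest-path distances (Floyd-Warshall), undirected graph.

    n: number of nodes, labeled 0..n-1
    weighted_edges: iterable of (u, v, w) with non-negative weight w
    Pairs with no connecting path keep distance math.inf.
    """
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        # Keep the lightest edge if the same pair appears more than once.
        d[u][v] = d[v][u] = min(d[u][v], float(w))
    for k in range(n):  # allow paths through intermediate node k
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

For instance, with edges (0, 1, 1.0), (1, 2, 2.0) and (0, 2, 5.0), the distance between nodes 0 and 2 becomes 3.0 via node 1. Floyd-Warshall is O(n^3), which is adequate for small graphs.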