Persistent Tor-algebra based stacking ensemble learning

LiuXiangMath/PTA-SEL

Persistent Tor-algebra for protein-protein interaction analysis

This manual describes the code implementation for the paper "Persistent Tor-algebra for protein-protein interaction analysis".

Software configuration


    Platform: Python>=3.6
    Packages needed: math, numpy>=1.18.1, scipy>=1.4.1, scikit-learn>=0.22.1, gudhi
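
Before running the scripts, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository; the helper name `check_environment` is hypothetical, and it only reports versions without enforcing the minimum versions.

```python
import importlib
import sys

# Import names of the third-party packages listed above
# (scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "scipy", "sklearn", "gudhi"]


def check_environment(modules=REQUIRED):
    """Return {module_name: version string, or None if the import fails}."""
    report = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Some modules do not expose __version__; report "unknown" then.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_environment().items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Compare the printed versions with the minimum versions given above.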

Persistent Tor-algebra based machine learning model

Folder structure

Details about each step

We use the SKEMPI S1131 dataset as an example to illustrate the procedure of our model.

Data

The first step is to extract the atom coordinates from the protein-protein complexes. This can be done with the TopNetTree code (DOI: 10.24433/CO.0537487.v1).
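
For readers who want to see what coordinate extraction involves, here is a minimal sketch that reads ATOM records from PDB-format text using the standard fixed-column layout. It is not the TopNetTree code; the function name `parse_pdb_atoms`, the `ATOM_FMT` helper, and the demo records are assumptions made for illustration.

```python
def parse_pdb_atoms(pdb_text, chains=None):
    """Extract (chain_id, element, (x, y, z)) from the ATOM records of PDB text.

    Uses the fixed-column PDB layout: chain ID in column 22, coordinates in
    columns 31-54, element symbol in columns 77-78.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, END and all other records
        chain_id = line[21]
        if chains is not None and chain_id not in chains:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        element = line[76:78].strip().upper()
        if not element:
            # Heuristic fallback: first letter of the atom name.
            element = line[12:16].strip()[0]
        atoms.append((chain_id, element, xyz))
    return atoms


# Demo records built with a format string so the columns line up exactly.
ATOM_FMT = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s"
demo_pdb = "\n".join([
    ATOM_FMT % (1, " N", "ALA", "A", 1, 11.104, 6.134, -6.504, 1.0, 0.0, "N"),
    ATOM_FMT % (2, " CA", "ALA", "A", 1, 11.639, 6.071, -5.147, 1.0, 0.0, "C"),
    ATOM_FMT % (3, " O", "GLY", "B", 7, 14.200, 5.000, -4.000, 1.0, 0.0, "O"),
    "END",
])

atoms = parse_pdb_atoms(demo_pdb)
chain_b_atoms = parse_pdb_atoms(demo_pdb, chains={"B"})
print(len(atoms), "atoms in total,", len(chain_b_atoms), "in chain B")
```

The `chains` argument restricts the output to one binding partner, which is convenient when atoms from the two sides of the interface are treated separately.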

Algebraic representation and persistent Tor-algebra featurization

For each protein-protein complex, element-specific atom combinations are used to generate Vietoris-Rips complexes, and persistent Tor-algebras are computed from these simplicial complexes. You can run the script "code/skempi-tor-feature-generation.py" directly to obtain the persistent Tor-algebra features for the SKEMPI S1131 data. The auxiliary features can be generated with the code from https://codeocean.com/capsule/2202829/tree/v1.
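
To make the first half of this step concrete, the sketch below selects atoms by (chain, element) and builds a Vietoris-Rips complex at a single scale. It is a pure-Python illustration, not the repository's implementation, and it stops before the Tor-algebra computation. The helper names, the choice of element combination, and the toy coordinates are all hypothetical.

```python
import math
from itertools import combinations


def select_atoms(atoms, pairs):
    """Keep the coordinates of atoms whose (chain, element) is in `pairs`."""
    return [xyz for chain, element, xyz in atoms if (chain, element) in pairs]


def vietoris_rips(points, radius, max_dim=2):
    """Return the simplices (sorted index tuples) of the Rips complex at `radius`.

    A set of vertices spans a simplex when all pairwise distances are <= radius.
    """
    n = len(points)
    # Precompute which pairs of points are within the given scale.
    near = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
        near[i][j] = near[j][i] = d <= radius

    simplices = [(i,) for i in range(n)]
    previous = list(simplices)
    for _ in range(max_dim):
        current = []
        for simplex in previous:
            # Extend by a larger-index vertex adjacent to every vertex of the simplex.
            for v in range(simplex[-1] + 1, n):
                if all(near[u][v] for u in simplex):
                    current.append(simplex + (v,))
        simplices.extend(current)
        previous = current
    return simplices


# Toy atoms: (chain, element, coordinates). Values are made up.
toy_atoms = [
    ("A", "C", (0.0, 0.0, 0.0)),
    ("A", "C", (1.0, 0.0, 0.0)),
    ("B", "N", (0.0, 1.0, 0.0)),
    ("B", "N", (5.0, 5.0, 5.0)),
    ("B", "O", (0.5, 0.5, 0.0)),
]
# Hypothetical element-specific combination: carbon on chain A, nitrogen on chain B.
points = select_atoms(toy_atoms, {("A", "C"), ("B", "N")})
complex_small = vietoris_rips(points, radius=1.2)
complex_large = vietoris_rips(points, radius=1.5)
print(len(complex_small), "simplices at radius 1.2;", len(complex_large), "at radius 1.5")
```

Repeating the construction over an increasing sequence of radii gives the filtration on which persistent invariants are defined.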

Machine learning

We use stacking ensemble learning for prediction. Specifically, there are two base learners, a 1D CNN and a GBT (gradient boosting tree) model, and one meta learner, also a GBT. To generate the base learner features, run "code/skempi-cnn.py" and "code/skempi-gbt.py". Then run "code/tenfold-CV.py" for the meta learner to obtain the final prediction.
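
The stacking idea can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the repository's scripts: the data are synthetic, a ridge regressor stands in for the 1D CNN to keep the example dependency-free, the base learner features are assumed to be out-of-fold predictions, and all hyperparameters are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))  # synthetic stand-in for the feature matrix
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=120)  # synthetic target

folds = KFold(n_splits=10, shuffle=True, random_state=0)

base_learners = [
    GradientBoostingRegressor(n_estimators=50, random_state=0),  # GBT base learner
    Ridge(alpha=1.0),  # placeholder where the 1D CNN would go
]

# Out-of-fold predictions of each base learner become the meta learner's features,
# so the meta learner never sees a prediction made on training data.
meta_features = np.column_stack(
    [cross_val_predict(model, X, y, cv=folds) for model in base_learners]
)

# GBT meta learner, evaluated with ten-fold cross-validation.
meta_learner = GradientBoostingRegressor(n_estimators=50, random_state=1)
final_pred = cross_val_predict(meta_learner, meta_features, y, cv=folds)

rmse = float(np.sqrt(np.mean((final_pred - y) ** 2)))
print(f"10-fold CV RMSE of the stacked model: {rmse:.3f}")
```

Using out-of-fold predictions as meta features is the usual way to avoid leaking the training labels into the second stage.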

For a new dataset

To use our persistent Tor-algebra model, you first need to obtain the 3D coordinates of your point cloud data. Then you can use the following function to generate the persistent Tor-algebra features:

def get_tor_algebra_I_J(point, J, outfile, typ):
    """Generate persistent Tor-algebra features from point cloud data.

    point: 3-D coordinates of the point cloud data
    J: type of Tor-algebra; you can choose 2, 3, 4, 5, 98, 99, or 100 (you can add more by revising our code)
    outfile: file path of the output file
    typ: 0
    """

The above function can be found in "code/skempi-tor-algebra-feature-generation.py". Persistent Tor-algebra can also be computed from graph data or distance matrix data. You can contact me if you are interested.
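
A possible way to drive the function for a new point cloud is sketched below. The wrapper takes the function as an argument because the script file name contains hyphens and cannot be imported with a plain `import` statement; you could copy the function into your own module instead. The wrapper name `featurize_point_cloud`, the output file naming, and the assumption that `point` accepts a list of (x, y, z) tuples are all hypothetical. A stub replaces the real function so the sketch runs on its own.

```python
import os
import tempfile

J_CHOICES = (2, 3, 4, 5, 98, 99, 100)  # values listed in the function's documentation


def featurize_point_cloud(tor_fn, points, out_dir, name, j_values=J_CHOICES):
    """Call `tor_fn(point, J, outfile, typ)` once per J and return the output paths.

    `tor_fn` is expected to be get_tor_algebra_I_J; the file naming is hypothetical.
    """
    os.makedirs(out_dir, exist_ok=True)
    outputs = {}
    for j in j_values:
        outfile = os.path.join(out_dir, f"{name}_J{j}.txt")
        tor_fn(points, j, outfile, 0)  # typ is fixed to 0, as documented
        outputs[j] = outfile
    return outputs


calls = []


def stub_tor_fn(point, J, outfile, typ):
    # Stand-in for get_tor_algebra_I_J so the sketch runs without the repository code.
    calls.append((J, os.path.basename(outfile), typ))


demo_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
with tempfile.TemporaryDirectory() as tmp_dir:
    outputs = featurize_point_cloud(stub_tor_fn, demo_points, tmp_dir, "demo")

print(sorted(outputs))
```

Replace `stub_tor_fn` with the real function once it is available in your session.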
