A Deep Learning Framework for Prediction of Ubiquitination Sites in Proteins
- Python>=3.6
- MATLAB R2016a
- TensorFlow==1.6.0
The conventional binary encoding of amino acid composition uses 20 binary bits to represent each amino acid. To handle sliding windows that extend past the N-terminus or C-terminus, one additional bit is appended to flag positions that fall outside the sequence.
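A minimal sketch of this 21-bit encoding follows. The residue ordering, the function name `encode_window`, and the choice to flag non-standard residues with the same extra bit are assumptions made for illustration, not details taken from the original implementation.

```python
# Hypothetical residue order; the original code may order residues differently.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_window(sequence, site, half_window):
    """Binary-encode the window centred on `site` (0-based index).

    Returns one 21-bit list per window position: bits 0-19 mark the
    residue, bit 20 marks a position outside the protein.
    """
    encoded = []
    for pos in range(site - half_window, site + half_window + 1):
        bits = [0] * 21
        inside = 0 <= pos < len(sequence)
        if inside and sequence[pos] in AA_INDEX:
            bits[AA_INDEX[sequence[pos]]] = 1
        else:
            # Window runs past a terminus. Non-standard residues are
            # also mapped here, which is an assumption of this sketch.
            bits[20] = 1
        encoded.append(bits)
    return encoded


window = encode_window("MKV", site=0, half_window=1)
```

With a half-window of n residues, each candidate site is represented by (2n + 1) vectors of 21 bits, which can be flattened or stacked before being passed to a model.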
In PTM site prediction, physicochemical properties are essential for extracting the intrinsic information of a fragment or protein. The main effect difference (MED) was used to estimate the individual effect of each physicochemical property; the property with the largest MED value is the most effective for predicting ubiquitylation sites.
The CKSAAP (composition of k-spaced amino acid pairs) encoding scheme captures short-range information about amino acid pairs within a peptide.
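As a rough illustration, CKSAAP features can be computed by counting, for each gap size k, the ordered residue pairs separated by k other residues and dividing by the number of pair positions. The function name `cksaap`, the default `k_max=4`, and the residue ordering below are assumptions; the gap range used in this project is not stated here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed residue order
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(peptide, k_max=4):
    """Return 400 * (k_max + 1) pair frequencies for gaps k = 0..k_max."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(peptide) - k - 1  # number of k-spaced pairs
        for i in range(max(n_positions, 0)):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # pairs with non-standard residues are skipped
                counts[pair] += 1
        denom = max(n_positions, 1)  # avoid division by zero on short peptides
        features.extend(counts[p] / denom for p in PAIRS)
    return features


vector = cksaap("AAKA", k_max=1)  # 800 features: 400 for k=0, 400 for k=1
```

Each gap size contributes a 400-dimensional block, so the feature length grows linearly with `k_max`.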
Pseudo amino acid composition (PseAAC) is a set of discrete serial correlation factors combined with the 20 components of the conventional amino acid composition.
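The sketch below follows the commonly used type 1 PseAAC formulation: 20 normalised composition terms plus `lam` sequence-order correlation factors. The function name `pseaac`, the parameters `lam` and `weight`, and their default values are assumptions, and the property tables must be supplied by the caller (already normalised); none of these are taken from the original code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed residue order


def pseaac(sequence, properties, lam=2, weight=0.05):
    """Return 20 + lam PseAAC features.

    `properties` is a list of dicts mapping each residue to a normalised
    physicochemical value. The sequence is assumed to contain only
    residues present in those dicts.
    """
    n = len(sequence)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")

    # Conventional amino acid composition (occurrence frequencies).
    freqs = [sequence.count(aa) / n for aa in AMINO_ACIDS]

    # Correlation factor theta_j: mean squared property difference
    # between residues that are j positions apart.
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(n - j):
            a, b = sequence[i], sequence[i + j]
            total += sum((p[a] - p[b]) ** 2 for p in properties) / len(properties)
        thetas.append(total / (n - j))

    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

The `weight` parameter balances the composition terms against the correlation terms, and `lam` sets how far along the sequence the correlations reach; both would normally be tuned on the training data.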
We constructed a convolutional neural network (CNN) as below: