Interactions between proteins and ions are essential for biological functions such as structural stability, metabolism, and signal transport. Because more than half of all proteins bind ions, identifying ion-binding sites is necessary for understanding protein function and is also very useful in drug discovery studies. Several computational approaches have been proposed, but this remains a difficult problem because metal and acid radical ions are small and highly versatile. In this study, we propose IonPred, a sequence-based approach using ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately), which is based on replaced token detection over the amino acid residues of protein sequences. The model is designed to predict binding sites for 9 metal ions (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, and K⁺) and 4 acid radical ion ligands (CO₃²⁻, SO₃²⁻, PO₄³⁻, and NO₂⁻).
- Python 3.6
- TensorFlow 1.15
- NumPy
- scikit-learn and SciPy
- Pandas
The input for this tool consists of raw protein sequences in FASTA format. The output consists of a probability score for each candidate site.
- The threshold used is 0.5: candidate residues with a probability >= 0.5 are considered to be ion-binding sites.
- Data sets to be used for prediction must be placed in the directory called `test`.
- The results for each residue binding site are saved in the directory `results`. In the `results` directory, the predictions are saved in a subdirectory labeled with the corresponding ion name.
- A batch size of 128 was used while running predictions, but this parameter can be modified.
For example, to predict zinc (Zn²⁺) binding sites, run the command:
python3 predict.py -input test/zn.fasta -ion-type ZN
For guidance on other parameters, run:
python3 predict.py -help
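When predictions need to be launched from another Python script, the command above can be assembled programmatically. The following is a minimal sketch that uses only the `-input` and `-ion-type` flags shown in this README; the helper names `build_predict_command` and `run_prediction` are hypothetical, and ion codes other than `ZN` should be checked against the `-help` output rather than assumed.

```python
import subprocess
from typing import List


def build_predict_command(fasta_path: str, ion_type: str) -> List[str]:
    """Build the argument list for predict.py using the two documented flags."""
    return ["python3", "predict.py", "-input", fasta_path, "-ion-type", ion_type]


def run_prediction(fasta_path: str, ion_type: str) -> None:
    """Run predict.py from the repository root; raises if the command fails."""
    subprocess.run(build_predict_command(fasta_path, ion_type), check=True)


# Show the command for the zinc example without executing it.
command = build_predict_command("test/zn.fasta", "ZN")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems if a FASTA path contains spaces.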