# Graph Neural Networks to Predict Protein Isoelectric Points

### This notebook will become available on GitHub as soon as my preprint [1] is available online here: <a href="">TB 2022</a>.
### This notebook presents the code used to train graph neural networks to predict protein isoelectric points (pI), using experimental datasets curated and made available in the public domain by L. P. Kozlowski <a href="www.ipc2-isoelectric-point.org">(see IPC 2) [2]</a>. Because Spektral [3] is used together with TensorFlow and Keras, we need to import those (and several other) libraries. This notebook is Google Colab-ready, i.e., cells such as the next one set up the Colab environment. The provided files must, however, be uploaded to the Colab content (root) directory, `/content`.



***

In [None]:
# Install Spektral if working on Google Colab
!pip install spektral
# Install Biopython (PyPI package "biopython"), which provides the Bio.SeqIO FASTA parser
!pip install biopython
from Bio import SeqIO
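The prediction cell at the end of this notebook reads sequences from a FASTA file through `prep_dataset`, whose definition is not shown here. As a minimal sketch (an assumption, not the notebook's own code), a pasted sequence can be written to a single-record FASTA file and read back with plain Python. `write_fasta`, `read_fasta` and the header `your_protein` are hypothetical, and `prep_dataset` may expect a particular header format (for example one carrying a pI label) that this sketch does not reproduce. With Biopython installed, `SeqIO.parse(path, "fasta")` is the library route for the reading step.

```python
import os
import tempfile


def write_fasta(sequence: str, path: str, header: str = "your_protein") -> None:
    """Write one sequence as a single-record FASTA file, 60 residues per line."""
    seq = "".join(sequence.split()).upper()  # drop spaces/newlines picked up when pasting
    with open(path, "w") as fh:
        fh.write(f">{header}\n")
        for i in range(0, len(seq), 60):
            fh.write(seq[i:i + 60] + "\n")


def read_fasta(path: str) -> dict:
    """Minimal reader: map each header (without '>') to its joined sequence."""
    records, header = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]
                records[header] = []
            elif header is not None:
                records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "your_protein.fasta")
        write_fasta("mkt aylll\nv", path)
        print(read_fasta(path))
```

Wrapping at 60 residues per line is a common FASTA convention; readers such as `SeqIO.parse` join wrapped lines either way.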

In [None]:
# Import everything else
# Standard library
import csv
import math
import os
import re

# Scientific Python stack
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.sparse as spar
from sklearn.model_selection import KFold

# TensorFlow / Keras
import tensorflow as tf
keras = tf.keras
from tensorflow.keras.losses import Huber, MeanAbsoluteError, MeanSquaredError
from tensorflow.keras.optimizers import Adam

# Spektral (graph neural networks built on Keras)
import spektral as spk
from spektral.data import Dataset, DisjointLoader, Graph, SingleLoader
from spektral.transforms.normalize_adj import NormalizeAdj as NormAdj
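The prediction cell passes `transforms=NormAdj()` to the dataset. As we understand Spektral's `NormalizeAdj` (default `symmetric=True`), it rescales each graph's adjacency matrix A to D^{-1/2} A D^{-1/2}, where D is the diagonal degree matrix. The sketch below reproduces that rescaling in NumPy on a hypothetical residue-chain graph (residues as nodes, consecutive residues linked); the helper names and the graph construction are assumptions and not necessarily how `MyDataset` builds its graphs.

```python
import numpy as np


def chain_adjacency(n_residues: int) -> np.ndarray:
    """Hypothetical residue graph: residue i is linked to residue i + 1."""
    a = np.zeros((n_residues, n_residues))
    idx = np.arange(n_residues - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0  # undirected graph, so A is symmetric
    return a


def sym_normalize(a: np.ndarray) -> np.ndarray:
    """Return D^-1/2 A D^-1/2; rows/columns of isolated nodes stay zero."""
    deg = a.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    # Scale row i by d_i^-1/2 and column j by d_j^-1/2
    return inv_sqrt[:, None] * a * inv_sqrt[None, :]


if __name__ == "__main__":
    print(np.round(sym_normalize(chain_adjacency(4)), 3))
```

Symmetric normalization keeps the matrix symmetric and stops high-degree nodes from dominating the sum over neighbours during message passing.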


In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

***

# SNIPPITY SNIP: The body of this notebook will follow once my preprint is available online.

***

### If everything goes to plan, you should get RMSE values of about 0.87–0.88 and 0.27–0.28 (pH units) on the protein and peptide test sets, respectively. Now let's predict the pI of a new amino acid (AA) sequence. Feel free to use my 'original_weights'.
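The RMSE figures above are root-mean-square errors between predicted and experimental pI values, so they are expressed in pH units. For reference, here is a minimal standard-library sketch of the metric; the toy numbers are made up and do not come from the IPC 2 data. Inside Keras, `tf.keras.metrics.RootMeanSquaredError` computes the same quantity.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy values only: errors of +0.5 and -0.5 pH units
    print(rmse([7.0, 5.0], [7.5, 4.5]))
```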

***

In [None]:
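Your_Pro = ['']  # paste an AA sequence here; please keep it to a single string
# Alternatively, read in a FASTA file (replace the placeholder path below)
Seq, Nod, Nus, Labs, computed_pI = prep_dataset(IsAcid, pKa_List, AA_List, AA_List_Full,
                                                '/content/filename.fasta')
your_dataset = MyDataset(Nus, n_AA=21, Nodes=Nod, sequences=Seq, labels=Labs,
                         col21_28=col21_28, Just_Descriptors=False,
                         transforms=NormAdj())
# For inference, keep the graph order fixed (shuffle=False) so that predictions
# line up with computed_pI, and make a single pass over the data (epochs=1)
your_loader = DisjointLoader(your_dataset, shuffle=False, epochs=1)
your_model = GIN(channels=8, dropout_rate=.15, n_layers=4)
# The 'original_weights' file(s) must be in the working directory (/content on Colab)
your_model.load_weights('original_weights')
_, B = evaluate(your_loader, your_model)
# Final prediction: model output B added to the computed pI from prep_dataset
MM = computed_pI + B
print(f'The predicted pI is {np.squeeze(MM)}')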
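The last lines add the model output `B` to `computed_pI`, which `prep_dataset` derives with the help of `pKa_List` and `IsAcid`, so the network appears to predict a correction to a pKa-based estimate. Since `prep_dataset` is not shown, here is a hedged sketch of one common way to compute such a baseline: sum the Henderson-Hasselbalch charges of the termini and ionizable side chains, then bisect on pH until the net charge crosses zero. The pKa values are approximate, illustrative numbers and an assumption; they are not necessarily those in `pKa_List`. `net_charge` and `isoelectric_point` are hypothetical names.

```python
# Illustrative, approximate pKa values (assumption: not necessarily pKa_List)
PKA_BASIC = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a linear peptide at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    basic = ["N_TERM"] + [aa for aa in seq if aa in PKA_BASIC]
    acidic = ["C_TERM"] + [aa for aa in seq if aa in PKA_ACIDIC]
    # Fraction of each basic group that is protonated (charge +1)
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - PKA_BASIC[g])) for g in basic)
    # Fraction of each acidic group that is deprotonated (charge -1)
    negative = sum(1.0 / (1.0 + 10.0 ** (PKA_ACIDIC[g] - ph)) for g in acidic)
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14] for the pH where the net charge is zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(seq, mid) > 0.0:
            lo = mid  # still positively charged: the pI lies at a higher pH
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(f"Baseline pI: {isoelectric_point('MKTAYIAKQRQISFVKSHFSRQ'):.2f}")
```

Bisection is safe here because the net charge decreases monotonically as pH rises, so there is exactly one zero crossing between pH 0 and 14.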

### References

#### 1. TB 2022 (preprint, not online yet)
#### 2. <a href="http://ipc2.mimuw.edu.pl/">Kozlowski 2021</a>
#### 3. <a href="https://github.com/danielegrattarola/spektral/">Spektral</a>
#### 4. <a href="https://www.wiley.com/en-ie/Solomons%27+Organic+Chemistry,+12th+Edition,+Global+Edition-p-9781119248972">Solomons, Fryhle & Snyder 2017</a>
#### 5. <a href="https://journals.aps.org/pre/abstract/10.1103/PhysRevE.75.011920">Moret & Zebende 2007</a>
#### 6. <a href="https://pubmed.ncbi.nlm.nih.gov/3209351/">Fauchere, Charton, Kier, Verloop & Pliska 1988</a>
#### 7. <a href="https://www.pnas.org/doi/10.1073/pnas.96.22.12524">Koehl & Levitt 1999</a>

***