# Geometric Deep Learning for Protein Structure Data

## Objectives
Develop a codebase for exploring, training, and evaluating graph deep learning models that take protein structures as input and produce residue-level predictions.


## Task and Dataset
Given a protein structure, predict for each residue whether it belongs to a protein-protein interface. The dataset (in `data/interface_labels.txt`) is extracted from the [MaSIF](https://www.nature.com/articles/s41592-019-0666-6) paper. Each line contains a PDB ID and the IDs of two chains that are in contact. We'll use these entries to extract the residues at the interface.
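
To make the labelling idea concrete, here is a minimal sketch of one common way to derive interface labels: mark a residue as interface if any of its atoms lies within a distance cutoff of the partner chain. The function name `interface_residues`, the dictionary layout (residue ID mapped to a list of atom coordinates), the 4.0 Å cutoff, and the toy coordinates are all assumptions for illustration; the notebooks may extract interface residues differently.

```python
from math import dist


def interface_residues(chain_a, chain_b, cutoff=4.0):
    """Return the residue IDs of each chain that have at least one atom
    within `cutoff` angstroms of any atom in the other chain.

    `chain_a` and `chain_b` map residue IDs to lists of (x, y, z) atom
    coordinates. This layout and the default cutoff are assumptions.
    """
    iface_a, iface_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            # A residue pair is in contact if any atom pair is close enough.
            if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                iface_a.add(res_a)
                iface_b.add(res_b)
    return iface_a, iface_b


# Toy coordinates (made up, one atom per residue).
chain_a = {1: [(0.0, 0.0, 0.0)], 2: [(10.0, 0.0, 0.0)]}
chain_b = {1: [(0.0, 3.0, 0.0)], 2: [(30.0, 0.0, 0.0)]}

iface_a, iface_b = interface_residues(chain_a, chain_b)

# Binary per-residue labels for chain A: 1 = interface, 0 = not interface.
labels_a = {res: int(res in iface_a) for res in chain_a}
```

The all-pairs loop is quadratic in the number of atoms, which is fine for a sketch; a spatial index would be the usual choice for full-size structures.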

## Quick biological background
Several structural and thermodynamic factors drive protein-protein interactions.  
In thermodynamic terms, whether binding is spontaneous is assessed by the change in Gibbs free energy (ΔG): binding is spontaneous when ΔG is negative. This binding free energy can be decomposed into enthalpic and entropic contributions through the fundamental equation ΔG = ΔH − TΔS, where T is the absolute temperature.
Several factors need to be taken into account when analyzing protein-protein binding:
1. Residue-residue and residue-solvent interactions (enthalpic contribution)
2. Loss of translational and rotational freedom of the proteins upon complex formation (entropic contribution)
3. Conformational changes of the proteins upon binding (entropic contribution)
4. Desolvation and release of solvent molecules upon binding (entropic contribution)
...
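
As a small worked example of the relation ΔG = ΔH − TΔS, the sketch below evaluates the binding free energy for one set of inputs. The function name `gibbs_free_energy` and the numerical values are made up for illustration and do not correspond to any measured complex.

```python
def gibbs_free_energy(delta_h, delta_s, temperature=298.15):
    """Return dG = dH - T * dS.

    delta_h in kJ/mol, delta_s in kJ/(mol*K), temperature in kelvin.
    """
    return delta_h - temperature * delta_s


# Illustrative (made-up) values: favourable enthalpy, unfavourable entropy.
delta_g = gibbs_free_energy(delta_h=-50.0, delta_s=-0.1)

# Binding is spontaneous when the free energy change is negative.
is_spontaneous = delta_g < 0
```

Here the entropic penalty (−TΔS ≈ +29.8 kJ/mol) is smaller than the enthalpic gain, so ΔG ≈ −20.2 kJ/mol and binding remains favourable.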

<div style="text-align: center; margin-left: auto; margin-right: auto;">
    <img src="https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fpj.2014.33/MediaObjects/41428_2014_Article_BFpj201433_Fig1_HTML.jpg?as=webp" style="width: 500px;"/>
</div>

## Steps
1. Data Preparation
    - Construct graphs from protein structures (Notebook 1_1)
    - Explore PyTorch Geometric graphs and construct the dataset (Notebook 1_2)
    - Load and batch data for training (Notebook 1_3)
2. Model Development
    - Explore off-the-shelf graph-based models (Notebook 2_1)
    - Add custom layers and losses (Notebook 2_2)
    - Implement training and validation (Notebook 2_2)
3. Model Evaluation
    - Log, visualise and track model performance (Notebook 2_1 and Notebook 2_2)
    - Save and load trained models and checkpoints 
    - Make and save predictions
    - Construct configuration files, then run and compare models with different parameters (Notebook 2_3)
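
As a rough illustration of the first data preparation step, the sketch below builds a residue graph by connecting residues whose Cα atoms lie within a distance cutoff. It uses only the standard library and returns the edges as two parallel lists, mirroring the `[2, num_edges]` layout of `edge_index` in PyTorch Geometric. The function name `build_residue_graph`, the 8.0 Å cutoff, and the toy coordinates are assumptions; the notebooks may construct graphs in a different way.

```python
from math import dist


def build_residue_graph(ca_coords, cutoff=8.0):
    """Connect residues whose C-alpha atoms are within `cutoff` angstroms.

    Returns [sources, targets], two parallel lists of node indices.
    Each undirected contact is stored as two directed edges.
    """
    sources, targets = [], []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if dist(ca_coords[i], ca_coords[j]) <= cutoff:
                # Add both directions so the graph is undirected.
                sources += [i, j]
                targets += [j, i]
    return [sources, targets]


# Toy C-alpha trace (made up): three nearby residues and one distant residue.
ca_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (20.0, 0.0, 0.0),
]

edge_index = build_residue_graph(ca_coords)
num_edges = len(edge_index[0])
```

The last residue has no edges because it is farther than the cutoff from every other residue. In a PyTorch Geometric workflow, a list like this would typically be converted to a long tensor and stored alongside per-residue node features and interface labels.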