# Steerable Graph Convolutions for Learning Protein Structures
<!-- An analysis of the paper and its key components. Think about it as nicely formatted review as you would see on OpenReview.net -->
by *Synthesized Solutions*

Machine learning is increasingly being applied to the analysis of molecules, for tasks such as protein design and model quality assessment. These techniques can help us better understand the structure and function of proteins, which is useful for many medical applications, such as drug discovery. Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) are two types of machine learning models that are particularly well suited to analyzing molecular data: CNNs can operate directly on the geometry of a structure, while GNNs are expressive in terms of relational reasoning.

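However, proteins are complex biomolecules whose unique three-dimensional structure is critical to their function, and modeling the interactions between non-adjacent amino acids can be challenging. Convolutional layers are equivariant to translations, and GNNs that only use relative geometric features are invariant to them, but neither property carries over to rotations by default. A function $f$ is invariant to a transformation $g$ (here, a rotation) if $f(g\cdot x) = f(x)$. It is equivariant if transforming the input $x$ and then applying $f$ gives the same result as applying $f$ first and then transforming the output: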

$$f(g\cdot x) = g\cdot f(x)$$
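The definition can be checked numerically. The sketch below is a minimal illustration, assuming NumPy; the helper names `random_rotation`, `mix_channels`, and `norms` are our own and do not come from the paper. It rotates a set of 3D vectors and compares $f(g\cdot x)$ with $g\cdot f(x)$ for a linear map that mixes vector channels (equivariant), and with $f(x)$ for row-wise norms (invariant).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(rng):
    """Return a random 3x3 rotation matrix (orthogonal, determinant +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs of the QR factorization
    if np.linalg.det(q) < 0:      # flip one axis to turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

def mix_channels(V, W):
    """Linearly mix vector channels: (nu, 3) -> (mu, 3). Rotation equivariant."""
    return W @ V

def norms(V):
    """Row-wise Euclidean norms: (nu, 3) -> (nu,). Rotation invariant."""
    return np.linalg.norm(V, axis=-1)

V = rng.normal(size=(4, 3))   # four vector channels, stored as rows
W = rng.normal(size=(2, 4))   # mixes four channels into two
R = random_rotation(rng)

def rotate(X):
    """g . x: rotate every row vector of X by R."""
    return X @ R.T

equivariant_ok = np.allclose(mix_channels(rotate(V), W), rotate(mix_channels(V, W)))
invariant_ok = np.allclose(norms(rotate(V)), norms(V))
print(equivariant_ok, invariant_ok)
```

Mixing channels commutes with the rotation because the weights act on the channel axis while the rotation acts on the coordinate axis. Taking the norm discards the direction altogether, which is why it yields an invariant quantity.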

In order to keep more geometric information, Jing, Eismann, Suriana, Townshend, and Dror (2020) propose a method that combines the strengths of CNNs and GNNs to learn from biomolecular structures. Instead of encoding 3D geometry of proteins, i.e. vector features, in terms of rotation-invariant scalars, they propose that vector features be directly represented as geometric vectors in 3D space at all steps of graph propagation. They claim that this approach would improve the GNN's ability to reason geometrically and capture the spatial relationships between atoms and residues in a protein structure. 

This modification to the standard GNN consists of replacing the multilayer perceptrons (MLPs) with geometric vector perceptrons (GVPs). In the paper, the resulting GVP-GNN is used to learn the relationship between protein sequences and their structures. A GVP is a layer that operates on geometric vectors in addition to the scalar values that most neural network layers use. This makes GVPs well suited to tasks that involve spatial relationships, which are central to protein structure.

In GVP-GNNs, node and edge embeddings are represented as tuples of scalar features and geometric vector features. The message and update functions are parameterized by geometric vector perceptrons, modules that map between these tuple representations such that the scalar outputs are rotation invariant and the vector outputs are rotation equivariant. Jing, Eismann, Soni, and Dror (2021) later extended the GVP-GNN architecture to handle atomic-level structure representations, which allows the architecture to be used for a wider range of tasks. <!-- why, idk rn -->
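To make the tuple representation concrete, here is a minimal NumPy sketch of a single GVP acting on one node, following our reading of the 2020 paper. The function name `gvp_2020`, the weight names, the layer sizes, and the choice of ReLU and sigmoid as nonlinearities are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def gvp_2020(s, V, W_h, W_mu, W_m, b):
    """Sketch of a GVP forward pass for one node (assumed layout).

    s: (n,) scalar features, V: (nu, 3) vector features.
    Returns s_out: (m,) and V_out: (mu, 3).
    """
    V_h = W_h @ V                          # (h, 3): mix vector channels
    V_mu = W_mu @ V_h                      # (mu, 3): project to output channels
    s_h = np.linalg.norm(V_h, axis=-1)     # (h,): invariant summary of the vectors
    v_mu = np.linalg.norm(V_mu, axis=-1)   # (mu,): norms of the output vectors
    s_m = W_m @ np.concatenate([s_h, s]) + b
    s_out = np.maximum(s_m, 0.0)           # scalar nonlinearity (ReLU assumed)
    gate = 1.0 / (1.0 + np.exp(-v_mu))     # vector nonlinearity (sigmoid of the norm assumed)
    V_out = gate[:, None] * V_mu           # row-wise scaling keeps each direction intact
    return s_out, V_out

rng = np.random.default_rng(0)
n, nu, h, mu, m = 5, 3, 4, 2, 6            # illustrative sizes
s = rng.normal(size=n)
V = rng.normal(size=(nu, 3))
W_h = rng.normal(size=(h, nu))
W_mu = rng.normal(size=(mu, h))
W_m = rng.normal(size=(m, h + n))
b = rng.normal(size=m)

theta = 0.7                                # rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])

s_out, V_out = gvp_2020(s, V, W_h, W_mu, W_m, b)
s_rot, V_rot = gvp_2020(s, V @ R.T, W_h, W_mu, W_m, b)
print(np.allclose(s_rot, s_out), np.allclose(V_rot, V_out @ R.T))
```

In this sketch the vectors influence the scalars only through their norms, and the scalar input `s` never enters `V_out`.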

In the original GVP-GNN architecture, the vector outputs are functions of the vector inputs but not of the scalar inputs. This can be an issue for atomic-level structure graphs, where individual atoms do not necessarily have an orientation. <!-- also don't really understand why -->
To address this issue, they propose vector gating as a way to propagate information from the scalar channels into the vector channels. The scalar features are transformed and passed through a sigmoid activation function to "gate" the vector output, which replaces the original vector nonlinearity. The authors note that, because the scalar features are invariant and the gating is row-wise, the equivariance of the vector features is not affected. They conclude that vector gating can help improve the GVP-GNN's ability to handle atomic-level structure representations, and with it its usefulness for machine learning on molecules.
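The gating step can be sketched in a few lines. This is a minimal NumPy illustration under our own assumptions: the names `vector_gate`, `W_g`, and `b_g`, the hand-picked numbers, and the ReLU applied to the scalars before the gate are illustrative choices, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def vector_gate(s_m, V_mu, W_g, b_g):
    """Scale each vector channel by a gate computed from scalar features.

    s_m: (m,) scalar features, V_mu: (mu, 3) vector features,
    W_g: (mu, m) and b_g: (mu,) are the gate's weights and bias.
    """
    gate = sigmoid(W_g @ np.maximum(s_m, 0.0) + b_g)   # (mu,), rotation invariant
    return gate[:, None] * V_mu                        # row-wise gating

W_g = np.array([[1.0, -1.0],
                [0.5,  2.0],
                [-1.0, 0.3]])
b_g = np.zeros(3)
V_mu = np.array([[1.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0],
                 [0.0, 0.0, 3.0]])
s_a = np.array([1.0, 0.0])
s_b = np.array([0.0, 1.0])

theta = 0.7                                # rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])

out_a = vector_gate(s_a, V_mu, W_g, b_g)
out_b = vector_gate(s_b, V_mu, W_g, b_g)
out_rot = vector_gate(s_a, V_mu @ R.T, W_g, b_g)

print(np.allclose(out_a, out_b))            # do different scalars give the same vectors?
print(np.allclose(out_rot, out_a @ R.T))    # does gating commute with the rotation?
```

Because each gate value is a single number per vector channel, it rescales a vector without changing its direction. That is why the scalars can now influence the vector output while the output still rotates with the input.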

<!-- add a better conclusion of this paragraph here -->

<!-- Equivariant message-passing seeks to incorporate the equivariant representations of ENNs within the message-passing framework of GNNs instead of indirectly encoding the 3D geometry in terms of pairwise distances, angles, and other scalar features. <----- this is a sentence from the 2021 paper -->

<p>
    <img src="schematic.png" alt>
</p>
<p>
    <em>Figure 1.</em> Schematic of the original geometric vector perceptron (GVP) as described in Jing et al. (2020) (top) and the modified GVP presented in Jing et al. (2021) (bottom). The original vector nonlinearity (in red) has been replaced with vector gating (in blue), allowing information to propagate from the scalar channels to the vector channels. Circles denote row- or element-wise operations. The modified GVP is the core module in the equivariant GNN.
</p>

In [1]:
# import importlib  
# import gvp # if located in gvp-pytorch directory
# import torch
# import torch_geometric
# import torch.nn.functional as F

# gvp = importlib.import_module("gvp-pytorch.gvp") # if not

In [2]:
tasks = ["SMP", "PSR", "RSR", "MSP", "LBA", "LEP"]  # "PPI" and "RES" are not used here

args = {"task": tasks[0], # see options above
        "num-workers": 4,
        "smp-idx": None, # range 0-19
        "batch": 8,
        "lba-split": 30, # 30 or 60
        "train-time": 120, # in minutes
        "val-time": 20, # in minutes
        "epochs": 50,
        "test": None, # path to evaluate a trained model
        "lr": 1e-4,
        "load": None, # path to initialize first 2 GNN layers with pretrained weights
        "save": "models", # directory to save models to
        "data": "data", # directory to data
        "monitor": True, # trigger tensorboard monitoring
        "no-pbar": True, # when set, do not show TQDM bars
} 
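The keys of this dictionary resemble the command-line flags passed to `run_atom3d.py` in the `%run` cells of this notebook (for example `--test`, `--lba-split`, and `--smp-idx`). As a sketch only, the hypothetical helper `to_argv` below turns such a dictionary into a list of command-line tokens. It assumes that the task is a positional argument, that `None` means "leave the flag out", and that boolean entries are on/off switches; none of this has been checked against the script's argument parser.

```python
def to_argv(settings):
    """Turn a settings dict into command-line tokens (hypothetical helper)."""
    argv = [settings["task"]]                  # task is assumed to be positional
    for key, value in settings.items():
        if key == "task" or value is None or value is False:
            continue                           # unset options are simply omitted
        if value is True:
            argv.append(f"--{key}")            # boolean switch, assumed store_true
        else:
            argv.extend([f"--{key}", str(value)])
    return argv

example = {"task": "LBA", "lba-split": 30, "test": None, "no-pbar": True}
print(" ".join(to_argv(example)))
```

For the small `example` dictionary this prints `LBA --lba-split 30 --no-pbar`.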

Exposition of its weaknesses/strengths/potential which triggered your group to come with a response.

BEGIN NOTES
- Current model is not very expressive: it is not steerable and can only handle type-1 features
  - GVPs would kind of be part of the invariant message-passing NNs?
  - So I consider it an "incomplete" steerable MLP
  - My point is that a steerable MLP enables information exchange between all possible pairs of feature types (type 0, 1, …, n), whereas a GVP can only pass information from scalars (type 0) to type-1 vectors by gating, and from type-1 vectors to scalars by taking the norm.
- Only invariant to rotation, due to taking the norm (a scalar value) (I think) -> this is only the case in the 2020 paper, not necessarily in the 2021 paper, so I think we really need to focus on the expressiveness and not necessarily the equivariance
END NOTES

The authors' current model combines the strengths of CNNs and GNNs while maintaining rotation invariance. Rotation invariance is essential because rotating a molecule as a whole does not change its characteristics. However, how the building blocks are combined into a protein does depend on their orientation (and on the linkage between them): the shape of the protein affects its characteristics. Discarding this orientation information is a weakness in an otherwise strong model, and it prompted us to look for an approach that removes the weak point while keeping the strength.

In [2]:
### OUR CODE HERE

Describe your novel contribution.
- replacing the GVP layers with steerable graph convolution layers
- perhaps changing the k in the k-nearest-neighbour (kNN) graph used by these graph convolution (message-passing) layers

<!-- 
ChatGPT stuff:
Steerable graph convolutions (SGCs) can be helpful in learning protein structures in a GNN by allowing the model to capture interactions between non-adjacent nodes in a controllable way. SGCs are a type of graph convolution that incorporates rotation-equivariance into the convolutional operation. This means that the convolutional filter can be rotated to capture different directional information in the graph.

In the context of protein structure prediction, SGCs can be used to capture the directional interactions between amino acids that are not directly adjacent in the protein sequence. For example, SGCs can capture the interactions between amino acids that are separated by a few residues in the protein sequence but are in close proximity in three-dimensional space.

Using SGCs in a GNN for protein structure prediction has shown promising results. For example, a recent study used SGCs in a GNN to predict the 3D structure of proteins from amino acid sequences with high accuracy. The SGC-based GNN outperformed previous methods that relied on local structural features and achieved state-of-the-art performance on a benchmark dataset.

Overall, SGCs can be helpful for learning protein structures in a GNN by allowing the model to capture the directional interactions between non-adjacent amino acids in a controllable way. -->

<!-- based on Noa's hidden stuff :) -->
We aim to achieve this by making the GVP (steerable) rotation equivariant instead of invariant. As a result, the model remains invariant to the orientation of the molecule as a whole, while it takes the orientation of the individual building blocks into account when they are assembled into the protein, and thus captures the structure of the protein as a whole. With equivariance, a protein whose building blocks lie along a straight line and a protein whose building blocks are bundled into a small ball are represented differently, instead of similarly as in the original model.

<!-- instead of xyz components, use steerable components to make it equivariant
- currently rather invariant, because a coiled-up arrangement of \/ molecules (which all point in different directions) is now treated the same as \/ molecules along a straight line
- by making it equivariant (with steerable features) we hope that these are treated differently

- scalar features (already rotation invariant anyway, they have no angle) and vector features (rotation invariant atm), of which we also take the norm -> a scalar, hence rotation invariant -->

<!-- State something about it still being hard to make these models rotation equivariant; and write what equivariant is? -->

<!-- - Shift invariance is achieved in CNNs -> convolutional layers are shift equivariant (since convolutions are shift equivariant) and pooling layers are shift invariant (even if image is shifted -a little bit- the max e.g. will remain the same)
- CNNs are not rotation equivariant; if an image is rotated, it is not detected by the same filter as before, so the CNN needs to learn a separate filter for every possible rotation (very inconvenient): "Although, a deep CNN can be made rotationally equivalent by making the network explicitly learn the different feature maps at rotated versions of the input image. But in this approach, CNN is compelled to learn the rotated versions of the same image which increases the risk of overfitting and introduces a redundant degree of freedom."
- GNNs are great and all, but not rotation equivariant; doesn't see the molecule that is rotated one way as the same molecule rotated another way -->


In [3]:
### OUR CODE HERE
%run run_atom3d.py LBA --test ../best_models/LBA_lba-split=30_47.pt --lba-split 30
%run run_atom3d.py LBA --test ../best_models/LBA_lba-split=60_49.pt --lba-split 60

%run run_atom3d.py SMP --test ../best_models/SMP_smp-idx=3_46.pt --smp-idx 3
%run run_atom3d.py SMP --test ../best_models/SMP_smp-idx=7_48.pt --smp-idx 7

Support your contribution with actual code and experiments (hence the colab format!)

In [4]:
### OUR CODE HERE

Conclude

In [5]:
### OUR CODE HERE

Close the notebook with a description of each student's contribution.