# Table of Contents

1. [Overview of Baseline OC20 Models](#1)
    1. [Graph Neural Networks(GNNs)](#1.1)
        1. [Resources for GNN Study](#1.1.1)
        2. [Graphs](#1.1.2)
        3. [Graph Attributes](#1.1.3)
        4. [Basics of GNNs](#1.1.4)
        5. [Message Passing in Detail](#1.1.5)
    2. [Crystal Graph Convolutional Neural Networks (CGCNNs) and OCP](#1.2)
        1. [General Mechanism](#1.2.1)
        2. [Convolution Layers](#1.2.2)
        3. [Pooling Layer](#1.2.3)
        4. [Hidden Layers](#1.2.4)
    3. [Using OCP Models](#1.3)
        1. [Preparing Config Files](#1.3.1)
        2. [Training Models](#1.3.2)
        3. [Making Predictions](#1.3.3)
        4. [OCP Calculator for ASE](#1.3.4)

<a class="anchor" id="1"></a>
# Overview of Baseline OC20 Models

OCP provides multiple baseline models, all of which are graph neural networks (GNNs). In this section, we first cover the basics of GNNs and then look at the models provided by OCP to understand the current state of model development.

<a class="anchor" id="1.1"></a>
## Graph Neural Networks(GNNs)

<a class="anchor" id="1.1.1"></a>
#### *Resources for GNN Study*

First, here are a few resources if you wish to study the topic more broadly:
- A very useful book dedicated to the topic, Graph Representation Learning by William L. Hamilton from McGill University, which can be accessed freely and legally from [this link](https://www.cs.mcgill.ca/~wlh/grl_book/files/GRL_Book.pdf).
- A modern web article introducing Graph Neural Networks on [distill.pub](https://distill.pub/) that can be accessed [here](https://distill.pub/2021/gnn-intro/).
- Video lectures of CS224W provided by Stanford Online. This is a great, very in-depth course for studying the subject formally. Here is the [link](https://www.youtube.com/playlist?list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn).
- Short, practical videos on YouTube that are great for skimming through the topic. They can be accessed [here](https://www.youtube.com/playlist?list=PLV8yxwGOxvvoNkzPfCx2i8an--Tkt7O8Z).
- Hands-on PyTorch Geometric tutorials on YouTube. They can be accessed [here](https://www.youtube.com/playlist?list=PLGMXrbDNfqTzqxB1IGgimuhtfAhGd8lHF).
- [Review Paper by Zhou et al.](https://arxiv.org/pdf/1812.08434.pdf)

I will briefly explain GNNs and their natural connection to our topic. Still, I recommend that you at least check the basic information covered in the materials above if you wish to familiarize yourself with the subject.

<a class="anchor" id="1.1.2"></a>
#### *Graphs*

Graphs are abstract mathematical structures constructed from nodes (also called vertices) and edges.

<img src="./Figures/fig_graph.png" width=200>

<a class="anchor" id="1.1.3"></a>
#### *Graph Attributes*

We can store information in nodes, in edges, and globally. It is practical to think of this information as an attribute vector attached to the corresponding element. A single node or edge can hold multiple attributes.

For example, if we model a molecule, we can store the atom properties in the node attributes and the bond properties in the edge attributes, while the properties of the molecule as a whole are the global attributes.

<img src="./Figures/fig_graphmolecule.png" width=200>
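To make the three kinds of attributes concrete, here is a minimal sketch that stores a water molecule as a graph. The `Graph` class and the particular features (atomic number, bond order, and so on) are illustrative assumptions of mine, not the data format OCP uses.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    """Hypothetical container for a graph with node, edge and global attributes."""
    node_attrs: list    # one attribute vector per node (atom)
    edge_index: list    # (source, target) node index pairs, one per edge (bond)
    edge_attrs: list    # one attribute vector per edge, same order as edge_index
    global_attrs: list  # attributes of the whole graph (molecule)


# Water: node 0 is O, nodes 1 and 2 are H.
water = Graph(
    node_attrs=[[8, 2], [1, 1], [1, 1]],     # [atomic number, number of bonds]
    edge_index=[(0, 1), (0, 2)],             # the two O-H bonds
    edge_attrs=[[1.0, 0.96], [1.0, 0.96]],   # [bond order, approx. length in angstrom]
    global_attrs=[3, 0],                     # [number of atoms, net charge]
)
```

Libraries such as PyTorch Geometric store the same three pieces as tensors, but the idea is the same.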

<a class="anchor" id="1.1.4"></a>
#### *Basics of GNNs*
Graph neural networks are neural networks that operate on graphs. They embrace the principle "graph in, graph out". They are intuitively appropriate for our tasks, since atoms and interactions map naturally onto nodes and edges, as seen in the example above.

The basic mechanism can be explained as follows:
1. We input a graph.
2. From the input, an embedding is created for each node and edge from its attributes. The size of the embedding vector is a hyperparameter.
<center><b>Message Passing(Repeated for each hidden layer)</b></center>
<hr>

1. For each node in the graph, gather all the neighboring node embeddings (or messages).

2. Aggregate all messages via an aggregate function (like sum).

3. All pooled messages are passed through an update function, usually a learned neural network.

<hr>

5. Predictions are made using the final embeddings.
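The message passing steps above can be sketched in a few lines of plain Python. This is a toy illustration under my own assumptions: the graph is given as a neighbor dictionary, the aggregate function is a sum, and `mean_update` is a hypothetical stand-in for the learned update network.

```python
def message_passing_layer(embeddings, neighbors, update):
    """One round: gather neighbor embeddings, sum them, then update each node."""
    new_embeddings = []
    for node, h in enumerate(embeddings):
        # 1. Gather the embeddings (messages) of all neighboring nodes.
        messages = [embeddings[j] for j in neighbors[node]]
        # 2. Aggregate the messages with an element-wise sum.
        if messages:
            aggregated = [sum(values) for values in zip(*messages)]
        else:
            aggregated = [0.0] * len(h)
        # 3. Pass the node's own embedding and the pooled message to the update function.
        new_embeddings.append(update(h, aggregated))
    return new_embeddings


def mean_update(h, m):
    """Hypothetical update: average of the old embedding and the pooled message."""
    return [0.5 * (a + b) for a, b in zip(h, m)]


# Path graph 0 - 1 - 2 with two-dimensional node embeddings.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
out = message_passing_layer(embeddings, neighbors, mean_update)
```

All new embeddings are computed from the old ones, so the order in which nodes are visited does not matter.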

<a class="anchor" id="1.1.5"></a>
#### *Message Passing in Detail*

Message passing is the process by which nodes exchange information with their neighbors. After n message passing layers, the output embedding of a node contains contributions from nodes up to n edges away (its n-th neighbors).

The process is illustrated in the color-coded figures below:

<table><tr>
<td> <img src="./Figures/fig_messagepassing.jpg" width=400> </td>
<td> <img src="./Figures/fig_messagepassing2.jpg" width=400> </td>
</tr></table>
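To see why n layers reach the n-th neighbors, we can track which nodes are able to influence a given node after each round. The helper `receptive_field` below is a hypothetical name of mine; it only follows the graph connectivity and does not compute any embeddings.

```python
def receptive_field(neighbors, node, num_layers):
    """Return the nodes whose initial embeddings can reach `node` after num_layers rounds."""
    reached = {node}
    for _ in range(num_layers):
        # Each round pulls in the direct neighbors of everything reached so far.
        reached |= {j for i in reached for j in neighbors[i]}
    return reached


# Chain graph 0 - 1 - 2 - 3 - 4.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

one_layer = receptive_field(chain, 0, 1)
two_layers = receptive_field(chain, 0, 2)
```

In this chain, node 0 sees only node 1 after one layer, and node 2 becomes visible only after the second layer.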


<a class="anchor" id="1.2"></a>
## Crystal Graph Convolutional Neural Networks (CGCNNs) and OCP

OCP uses a wide variety of GNN architectures for different tasks (CGCNN, SchNet, DimeNet, GemNet, etc.). Their configs are accessible through the [repository](https://github.com/Open-Catalyst-Project/ocp/tree/main), and some pretrained models are available as well. For the sake of brevity and readability, we will only discuss the most basic one, CGCNN. This section aims to give the reader insight into what is going on by skimming through one example. Please see the links below for all the models and configs provided by OCP.
- [Pretrained model list and their performances](https://github.com/Open-Catalyst-Project/ocp/blob/main/MODELS.md)
- [Config Files](https://github.com/Open-Catalyst-Project/ocp/tree/main/configs)
- [Models](https://github.com/Open-Catalyst-Project/ocp/tree/main/ocpmodels/models)

<a class="anchor" id="1.2.1"></a>
#### *General Mechanism*

You can access the git repository of CGCNN via [this link](https://github.com/txie-93/cgcnn).

CGCNN can be described in two parts with four steps:
<center><b>A. Crystal to Graph</b></center>
<hr>

1. Representing the crystal structure as a graph, with atoms as nodes and atomic interactions as edges.

<center><b>B. Graph to Target</b></center>
<hr>

2. Using R convolution layers to learn features from neighboring atoms.
3. Using a pooling layer to concentrate all information into a single feature vector.
4. Using an output layer to predict target property.

The figure below provides a description of the process.

<img src="./Figures/fig_cgcnn.jpg" width=300>

<a class="anchor" id="1.2.2"></a>
#### *Convolution Layers*

Convolution is the process of learning from neighboring nodes. The mechanism is almost the same as the one described in the Message Passing section above: we iteratively aggregate the embeddings of the neighboring atoms and update the embedding of our node. With each iteration (each convolution layer), information from one neighbor shell further away is incorporated. By doing so, each node becomes a representation of its local environment.

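$v_i^{(t+1)} = \mathrm{Conv}\left(v_i^{(t)}, v_j^{(t)}, u_{(i,j)_k}\right), \quad (i,j)_k \in G$

Here $v_i^{(t)}$ is the feature vector of atom $i$ after $t$ convolution layers, and $u_{(i,j)_k}$ is the feature vector of the $k$-th bond between atoms $i$ and $j$.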
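As an illustration of what the convolution function can look like, the sketch below follows the gated form proposed in the original CGCNN paper as I understand it: for every bond, the vectors of atom i, atom j and the bond are concatenated into z, and sigmoid(z W_f + b_f) * softplus(z W_s + b_s) is added to the vector of atom i. The function names and the plain-list representation are my own assumptions, and the OCP implementation may differ in details such as normalization.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    return math.log1p(math.exp(x))


def linear(z, weight, bias):
    """Compute z @ weight + bias for a vector z and a len(z) x out weight matrix."""
    return [sum(zi * w for zi, w in zip(z, column)) + b
            for column, b in zip(zip(*weight), bias)]


def cgcnn_conv(v, bonds, w_f, b_f, w_s, b_s):
    """One gated convolution. v: atom vectors; bonds: (i, j, bond_vector) triples."""
    out = [list(vi) for vi in v]          # start from the old atom vectors
    for i, j, u in bonds:
        z = v[i] + v[j] + u               # list concatenation of the three vectors
        gate = [sigmoid(x) for x in linear(z, w_f, b_f)]
        core = [softplus(x) for x in linear(z, w_s, b_s)]
        for d in range(len(out[i])):
            out[i][d] += gate[d] * core[d]
    return out


# Two atoms joined by one bond, listed in both directions; all weights are zero.
atoms = [[1.0], [2.0]]
bonds = [(0, 1, [0.5]), (1, 0, [0.5])]
zeros_w = [[0.0], [0.0], [0.0]]
zeros_b = [0.0]
updated = cgcnn_conv(atoms, bonds, zeros_w, zeros_b, zeros_w, zeros_b)
```

With zero weights, every bond contributes sigmoid(0) * softplus(0) = 0.5 ln 2 to its atom, which makes the example easy to check by hand.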

<a class="anchor" id="1.2.3"></a>
#### *Pooling Layer*

Pooling is the operation we use to generate an overall feature vector for the whole crystal. We take all of the atom feature vectors (the convolved vectors in our case) and stack them into a matrix. Then, we combine them using a defined or learned function, such as a normalized summation or a multilayer perceptron. An example visualization is given below.

<img src="./Figures/fig_pool.jpg" width=300>
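A normalized summation is simply the average of the atom vectors. The sketch below assumes the atom vectors are given as plain lists; `mean_pool` is a hypothetical helper name.

```python
def mean_pool(atom_vectors):
    """Average the atom feature vectors into a single crystal feature vector."""
    n_atoms = len(atom_vectors)
    # zip(*...) walks over the columns of the stacked atom matrix.
    return [sum(column) / n_atoms for column in zip(*atom_vectors)]


atom_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
crystal_vector = mean_pool(atom_vectors)
```

The result has the same length regardless of the number of atoms and does not depend on the order in which the atoms are listed, which is what we need from a pooling function.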

<a class="anchor" id="1.2.4"></a>
#### *Hidden Layers*

In addition to the convolution and pooling layers, two fully-connected hidden layers with depths of $L_1$ and $L_2$ are used to capture the complex mapping between crystal structure and property.
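A minimal sketch of such an output head is given below: the pooled crystal vector passes through the hidden layers and a final linear layer returns a single number. The ReLU activation, the layer sizes and the weights are illustrative assumptions of mine, not values taken from CGCNN or OCP.

```python
def dense(x, weight, bias):
    """Fully-connected layer: x @ weight + bias, with weight of shape len(x) x out."""
    return [sum(xi * w for xi, w in zip(x, column)) + b
            for column, b in zip(zip(*weight), bias)]


def relu(x):
    return [max(0.0, xi) for xi in x]


def predict(crystal_vector, layers):
    """Apply the hidden layers with ReLU, then a final linear layer with one output."""
    h = crystal_vector
    for weight, bias in layers[:-1]:
        h = relu(dense(h, weight, bias))
    weight, bias = layers[-1]
    return dense(h, weight, bias)[0]


layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # hidden layer 1
    ([[2.0, 0.0], [0.0, 2.0]], [0.5, 0.5]),  # hidden layer 2
    ([[1.0], [1.0]], [0.0]),                 # output layer
]
target = predict([1.0, -2.0], layers)
```

In a trained model, the weights would of course be learned together with the convolution layers.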

<a class="anchor" id="1.3"></a>
## Using OCP Models
Here, we will inspect and use a pre-trained CGCNN model. The same method is also applicable to other models.

Note that I am running everything on an M1 MacBook with macOS Ventura 13.4.1, and I can only perform computations on my CPU since there are no CUDA cores in my setup. I hope this will not cause any problems when you run the notebook, but if a problem occurs, this is a likely reason. You may want to check for installation and package dependency problems.

<a class="anchor" id="1.3.1"></a>
#### *Preparing Config Files*

Before we proceed with predictions, we need to set up our config files. For this example, everything is already set up, but the process is demonstrated here in case you need to adapt it to your own applications.

1. First, go to the parent directory of the config you want to work with. It is configs/s2ef/all/ for our example.
2. Open base.yml, and set the directories for the train, validation and test datasets. You may notice that they are all set to the same dataset in our config. This is done because we do not aim to get any actual results here. However, you may change them as you wish.
3. Go to configs/s2ef/all/cgcnn and open cgcnn.yml. There are a number of parameters you can customize. I set num_workers to 0 and max_epochs to 1 to speed up the process. It is highly likely that you will need to redefine them according to your purposes.
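If you prefer to script such changes instead of editing the file by hand, the idea can be sketched as a nested dictionary update. This is only an illustration: I assume here that both settings live under an `optim` block and that the epoch key is spelled `max_epochs`, and the starting values are placeholders rather than the values in cgcnn.yml. The helper `override` is a hypothetical name and is not part of OCP.

```python
import copy


def override(config, updates):
    """Return a copy of config with the nested keys in updates applied."""
    result = copy.deepcopy(config)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Descend into nested blocks so sibling keys are preserved.
            result[key] = override(result[key], value)
        else:
            result[key] = value
    return result


# Placeholder config; the real file contains many more keys.
base = {"optim": {"batch_size": 32, "num_workers": 8, "max_epochs": 20}}

# Quick test run: no worker processes and a single epoch.
quick = override(base, {"optim": {"num_workers": 0, "max_epochs": 1}})
```

The original dictionary is left untouched, so the same base config can be reused for several variants.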

<a class="anchor" id="1.3.2"></a>
#### *Training Models*

We will use the command line interface to interact with the models, as described below.
```bash
conda activate ocp-models # Activate the environment in which OCP is installed.

config_path="./configs/s2ef/all/cgcnn/cgcnn.yml" # Path to the config.

python main.py --mode train --config-yml "$config_path" # Mode and config; no checkpoint is passed when training.
```
Now the training has begun. You can expect it to take quite a long time if you have limited computational power.

Note: You may see the warnings "LMDB does not contain edge index information, set otf_graph=True" and "Turning otf_graph=True as required attributes not present in data object" if your dataset does not contain edge information. Our dataset does not, so the edge information is computed on the fly. This choice trades extra computation time for reduced storage and is made specifically to keep the dataset small.
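To give an idea of what computing edges on the fly involves, here is a minimal sketch that connects every pair of atoms closer than a cutoff radius. It is a simplification under my own assumptions: it ignores periodic boundary conditions and any limit on the number of neighbors, both of which a real implementation for crystals has to handle. `radius_graph` is a hypothetical helper name here, not the OCP function.

```python
import math


def radius_graph(positions, cutoff):
    """Return directed edges (i, j) for all atom pairs within the cutoff distance."""
    edges = []
    for i, pos_i in enumerate(positions):
        for j, pos_j in enumerate(positions):
            # Skip self-loops; keep both directions of every pair.
            if i != j and math.dist(pos_i, pos_j) <= cutoff:
                edges.append((i, j))
    return edges


# Three atoms on a line; only the first two are within 1.5 angstrom of each other.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
edges = radius_graph(positions, cutoff=1.5)
```

This double loop is repeated every time a structure is loaded, which is where the extra computation time comes from.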

<a class="anchor" id="1.3.3"></a>
#### *Making Predictions*

The only difference here is that we need to pass the path of the checkpoint we wish to use. We will use the model that was pre-trained on the whole training dataset. Checkpoints of pre-trained models can be downloaded from the OCP repository, or you can use the checkpoints of models you trained yourself.
```bash
conda activate ocp-models # Activate the environment in which OCP is installed.

config_path="./configs/s2ef/all/cgcnn/cgcnn.yml" # Path to the config.
checkpoint_path="./checkpoints/cgcnn_all.pt" # Path to the checkpoint.

python main.py --mode predict --config-yml "$config_path" --checkpoint "$checkpoint_path" # Mode, config, checkpoint.
```
We expect to see the progress printed on the terminal, followed by the results of the prediction.

Note: The situation described above regarding edge information also applies here.

<a class="anchor" id="1.3.4"></a>
#### *OCP Calculator for ASE*
OCP also provides a calculator that can be used with ASE. After importing the calculator, you need to pass it a checkpoint and, optionally, a config. We will use GemNet in this example. Since the checkpoint file is too large for GitHub, you may need to download it separately.

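```python
from ase.build import fcc100
from ase.optimize import BFGS
from ocpmodels.common.relaxation.ase_utils import OCPCalculator

checkpoint = "checkpoints/gemnet_t_direct_h512_all.pt"
# Kept for reference; it is not passed to the calculator below.
config = "configs/s2ef/all/gemnet/gemnet-dT.yml"

# Construct a sample structure: a 3x3x3 Cu(100) slab
adslab = fcc100("Cu", size=(3, 3, 3))
# Add 13 Å of vacuum on each side along the z axis
adslab.center(vacuum=13.0, axis=2)

# Define the calculator from the checkpoint
calc = OCPCalculator(checkpoint=checkpoint)

# Attach the calculator to the structure
adslab.calc = calc

# Relax with BFGS, writing each step to the trajectory file
opt = BFGS(adslab, trajectory="data/toy_c3h8_relax.traj")
# Stop when the maximum force drops below 0.05 eV/Å or after 100 steps
opt.run(fmax=0.05, steps=100)
```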

Output (truncated):

```
amp: false
cmd:
  checkpoint_dir: /Users/irmakaslan/OCP/checkpoints/2023-08-08-21-33-36
  commit: 090486f
  identifier: ''
  logs_dir: /Users/irmakaslan/OCP/logs/tensorboard/2023-08-08-21-33-36
  print_every: 100
  results_dir: /Users/irmakaslan/OCP/results/2023-08-08-21-33-36
  seed: null
  timestamp_id: 2023-08-08-21-33-36
dataset: null
gpus: 0
logger: tensorboard
model: gemnet_t
model_attributes:
  activation: silu
  cbf:
    name: spherical_harmonics
  cutoff: 6.0
  direct_forces: true
  emb_size_atom: 512
  emb_size_bil_trip: 64
  emb_size_cbf: 16
  emb_size_edge: 512
  emb_size_rbf: 16
  emb_size_trip: 64
  envelope:
    exponent: 5
    name: polynomial
  extensive: true
  max_neighbors: 50
  num_after_skip: 2
  num_atom: 3
  num_before_skip: 1
  num_blocks: 3
  num_concat: 1
  num_radial: 128
  num_spherical: 7
  otf_graph: true
  output_init: HeOrthogonal
  rbf:
    name: gaussian
  regress_forces: true
noddp: false
optim:
  batch_size: 32
  clip_grad_norm: 10
  ema_decay: 0.99

True
```