# Tutorial 9

## Outline

1. Graph Neural Network - Manipulating Graph Data with `torch_geometric`
2. Get started with Savio (Berkeley HPC platform)
3. (UGrad only) Setting up ANI project

## GNN

<img src="./graph.png" width="450" />

+ Nodes: $\boldsymbol{v}_i$
+ Edges: $\boldsymbol{e}_{ij}$
+ An example of message passing:

$$\boldsymbol{e}_{ij}^{(l+1)}=f_e(\boldsymbol{e}_{ij}^{(l)},\boldsymbol{v}_i^{(l)},\boldsymbol{v}_j^{(l)})$$

$$\boldsymbol{v}_i^{(l+1)}=f_v(\boldsymbol{v}_i^{(l)},\{\boldsymbol{e}_{ij}^{(l+1)}\})$$
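
One round of message passing can be sketched in plain Python: every edge is updated from its own feature and its two endpoint features, then every node is updated from its own feature and its updated outgoing edges. The scalar features and the particular `f_e` and `f_v` below are hypothetical stand-ins chosen for illustration; in a real GNN they would be learned networks acting on feature vectors.

```python
# Toy message-passing step with scalar features.
# f_e and f_v are hypothetical stand-ins for learned update functions.

def f_e(e_ij, v_i, v_j):
    """Edge update: combine the old edge feature with both endpoint features."""
    return e_ij + v_i * v_j


def f_v(v_i, incident_edges):
    """Node update: combine the old node feature with the sum of its edge features."""
    return v_i + sum(incident_edges)


def message_passing_step(nodes, edges):
    """nodes: {i: v_i}; edges: {(i, j): e_ij} for directed pairs (i, j)."""
    # Step 1: update all edges using the *old* node features.
    new_edges = {(i, j): f_e(e, nodes[i], nodes[j]) for (i, j), e in edges.items()}
    # Step 2: update each node from its old feature and its *new* edges e_ij.
    new_nodes = {
        i: f_v(v, [e for (src, _), e in new_edges.items() if src == i])
        for i, v in nodes.items()
    }
    return new_nodes, new_edges


nodes = {0: 1.0, 1: 2.0, 2: 3.0}
edges = {(0, 1): 0.5, (1, 0): 0.5, (1, 2): 0.25, (2, 1): 0.25}
new_nodes, new_edges = message_passing_step(nodes, edges)
print(new_nodes)  # {0: 3.5, 1: 10.75, 2: 9.25}
```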

### Manipulating Graph Data in PyTorch: PyG

+ Documentation: https://pytorch-geometric.readthedocs.io/en/latest/
+ Installation: https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html

    ```conda install pyg -c pyg```

### Usage: Take QM9 as an example

QM9 is a dataset of roughly 130,000 small organic molecules with 19 regression targets per molecule, including the dipole moment, the atomization enthalpy, etc.

In [1]:
import itertools

import torch
import torch.nn as nn

from torch_geometric.datasets import QM9

import numpy as np
from sklearn.model_selection import train_test_split

The `load_qm9` function does the following:

1. Download the QM9 dataset.
2. Rebuild the molecular graph: the original dataset only adds edges between atoms connected by a chemical bond, whereas here an edge is created between every ordered pair of distinct atoms.
3. Compute the edge feature $r^{-1}$, the inverse interatomic distance.
4. Keep only the atomization enthalpy as the target.
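
The graph rebuilding and edge-feature steps can be illustrated without PyTorch on a tiny example. This is only a sketch: the three-atom geometry below is hypothetical (it is not a QM9 molecule), and plain lists stand in for tensors.

```python
import itertools
import math

# Hypothetical 3-atom geometry in arbitrary units (not taken from QM9).
pos = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]

# Every ordered pair (i, j) with i != j becomes a directed edge.
pairs = list(itertools.permutations(range(len(pos)), 2))

# Transpose to the (2, n_edge) layout: row 0 holds sources, row 1 holds targets.
edge_index = [list(row) for row in zip(*pairs)]

# One feature per edge: the inverse interatomic distance 1/r.
edge_attr = [1.0 / math.dist(pos[i], pos[j]) for i, j in pairs]

print(edge_index)  # [[0, 0, 1, 1, 2, 2], [1, 2, 0, 2, 0, 1]]
```

A molecule with $n$ atoms gets $n(n-1)$ directed edges this way, so each atom pair appears twice, once in each direction, with the same $r^{-1}$ value.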

In [2]:
def load_qm9(path="./QM9"):
    
    def transform(data):
        # re-build molecular graph
        edge_index = torch.tensor(
            list(itertools.permutations(range(data.x.shape[0]), 2)), 
            dtype=torch.long
        ).T
        data.edge_index = edge_index
        # use 1/r as edge features
        edge_feature = 1 / torch.sqrt(
            torch.sum(
                (data.pos[edge_index[0]] - data.pos[edge_index[1]]) ** 2, 
                axis=1, keepdim=True
            )
        )
        data.edge_attr = edge_feature
        # extract atomization enthalpies
        data.y = data.y[:, [-7]]
        return data
    
    qm9 = QM9(path, transform=transform)
    return qm9

qm9 = load_qm9("../../Datasets/QM9")
qm9

Downloading https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/molnet_publish/qm9.zip
Extracting ../../Datasets/QM9/raw/qm9.zip
Downloading https://ndownloader.figshare.com/files/3195404
Processing...
100%|██████████████████████████████████| 133885/133885 [02:26<00:00, 911.54it/s]
Done!


QM9(130831)

The dataset can be sliced.

In [3]:
train_index, test_index = train_test_split(np.arange(len(qm9)), test_size=0.2)
train_data = qm9[train_index]
test_data = qm9[test_index]
train_data

QM9(104664)
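
Without a fixed seed, the split differs on every run; `train_test_split` accepts a `random_state` argument for this purpose. The idea can be sketched with the standard library alone. The helper name `split_indices` and the default seed are hypothetical, and the test size is rounded up here, which gives the 104664 training molecules shown above.

```python
import math
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle range(n) with a fixed seed and return (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # private RNG, global state untouched
    n_test = math.ceil(n * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(130831)
print(len(train_idx), len(test_idx))  # 104664 26167
```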

The dataset can be batched with a data loader.

In [9]:
from torch_geometric.loader import DataLoader as GraphDataLoader

dataloader = GraphDataLoader(qm9, batch_size=2)
for data in dataloader:
    print(data)
    break

DataBatch(x=[9, 11], edge_index=[2, 32], edge_attr=[32, 1], y=[2, 1], pos=[9, 3], z=[9], smiles=[2], name=[2], idx=[2], batch=[9], ptr=[3])


Node features

In [10]:
data.x

tensor([[0., 1., 0., 0., 0., 6., 0., 0., 0., 0., 4.],
        [1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 7., 0., 0., 0., 0., 3.],
        [1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]])

Edge features

In [11]:
data.edge_attr

tensor([[0.9158],
        [0.9158],
        [0.9158],
        [0.9158],
        [0.9158],
        [0.5608],
        [0.5608],
        [0.5608],
        [0.9158],
        [0.5608],
        [0.5608],
        [0.5608],
        [0.9158],
        [0.5608],
        [0.5608],
        [0.5608],
        [0.9158],
        [0.5608],
        [0.5608],
        [0.5608],
        [0.9831],
        [0.9831],
        [0.9831],
        [0.9831],
        [0.6179],
        [0.6178],
        [0.9831],
        [0.6179],
        [0.6178],
        [0.9831],
        [0.6178],
        [0.6178]])

Edge index: a tensor with shape `(2, n_edge)`; the first row holds the source node of each edge and the second row holds the target node

In [12]:
data.edge_index

tensor([[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6,
         6, 6, 7, 7, 7, 8, 8, 8],
        [1, 2, 3, 4, 0, 2, 3, 4, 0, 1, 3, 4, 0, 1, 2, 4, 0, 1, 2, 3, 6, 7, 8, 5,
         7, 8, 5, 6, 8, 5, 6, 7]])

Batch: the index of the graph that each node belongs to

In [13]:
data.batch

tensor([0, 0, 0, 0, 0, 1, 1, 1, 1])
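
Batching merges several graphs into one large disconnected graph: node indices of later graphs are shifted by the number of nodes that came before, and `batch` records which graph each node came from. The function below is a simplified sketch of that bookkeeping in plain Python, not PyG's actual implementation; `collate` and `fully_connected` are hypothetical helper names.

```python
import itertools


def fully_connected(n):
    """All ordered pairs (i, j) with i != j among n nodes."""
    return list(itertools.permutations(range(n), 2))


def collate(graphs):
    """Merge graphs given as (num_nodes, edge_list) into one disconnected graph."""
    edge_index = [[], []]
    batch, ptr = [], [0]
    for graph_id, (num_nodes, edges) in enumerate(graphs):
        offset = ptr[-1]  # number of nodes already placed in the batch
        for src, dst in edges:
            edge_index[0].append(src + offset)
            edge_index[1].append(dst + offset)
        batch.extend([graph_id] * num_nodes)
        ptr.append(offset + num_nodes)
    return edge_index, batch, ptr


# Two graphs with 5 and 4 nodes, as in the batch printed above.
edge_index, batch, ptr = collate([(5, fully_connected(5)), (4, fully_connected(4))])
print(batch)  # [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(ptr)    # [0, 5, 9]
```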

### Useful function: `scatter`

+ Documentation: https://pytorch-scatter.readthedocs.io/en/latest/functions/scatter.html

<img src="https://raw.githubusercontent.com/rusty1s/pytorch_scatter/master/docs/source/_figures/add.svg" width="350" />

$$\mathrm{out}_i = \mathrm{out}_i + \sum_{j}\mathrm{src}_j $$

where $\sum_j$  is over $j$ such that $\mathrm{index}_j=i$.
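
The reduction can be spelled out in plain Python. This is a reference sketch of the sum-reduction only, starting from a zero-initialized output; `scatter_sum` is a hypothetical name, and the inputs mirror those of the `torch_geometric` call that follows.

```python
def scatter_sum(src, index, dim_size=None):
    """Sum src[j] into out[index[j]] for every position j."""
    if dim_size is None:
        dim_size = max(index) + 1  # smallest output that fits every index
    out = [0.0] * dim_size
    for value, i in zip(src, index):
        out[i] += value
    return out


src = [5.0, 1.0, 7.0, 2.0, 3.0, 2.0, 1.0, 3.0]
index = [0, 0, 1, 0, 2, 2, 3, 3]
print(scatter_sum(src, index))  # [8.0, 7.0, 5.0, 4.0]
```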

In [None]:
from torch_geometric.utils import scatter

inp = torch.tensor([5, 1, 7, 2, 3, 2, 1, 3], dtype=torch.float)
index = torch.tensor([0, 0, 1, 0, 2, 2, 3, 3], dtype=torch.long)
out = scatter(inp, index)
out

**Example**: aggregate the edge features of each node and concatenate the result with the node features, i.e.

$$v'_i=v_i\oplus\sum_{j\in N(i)}e_{ij}$$

where $N(i)$ is the set of nodes directly connected to node $i$ and $\oplus$ denotes concatenation.

In [None]:
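# sum the features of all edges leaving each node; dim_size keeps one row per node
edge_aggr = scatter(
    data.edge_attr, data.edge_index[0], dim=0, dim_size=data.num_nodes, reduce="sum"
)
# append the aggregated edge feature as an extra column of the node features
new_node = torch.cat([data.x, edge_aggr], dim=1)
new_node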

## Savio

0. **Important**: always work in your scratch directory to avoid disk quota issues

    ```bash
    cd /global/scratch/users/[USER_NAME]
    ```
    
    Replace `[USER_NAME]` with your username.


1. Link the environment with Jupyter:
    
   Configure conda first with the following commands:
   
    ```bash
    module load python
    conda init bash
    source ~/.bashrc
    ``` 
   
   Use Eric's environment which has all required dependencies for the ANI project (torch, torchani, numpy):
   
   ```bash
   conda activate /global/scratch/users/ericwangyz/chem242/ani
   python -m ipykernel install --user --name=ani
   ```
   
   Then you'll see the kernel in Jupyter Notebook:
   
   <img src="./jupyter_kernel.png" width="800" />
   
   **Note**: If you are using Eric's environment, please don't install any other packages into it.
    
  
2. For grad students: you can use the following code to set up your own environment:

    ```bash
    conda create -p /global/scratch/users/[USER_NAME]/[ENV_NAME] python=3.10
    conda activate /global/scratch/users/[USER_NAME]/[ENV_NAME]
    conda install [PACKAGE_NAMES] # don't forget to install `ipykernel` here
    python -m ipykernel install --user --name=[ENV_NAME]
    ```
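
After selecting the new kernel in Jupyter, a quick sanity check from a notebook cell can confirm that the notebook runs the intended interpreter and that the required packages are visible. This is a minimal sketch; `missing_packages` is a hypothetical helper, and the package list is taken from the dependencies named above.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level packages from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The path should point inside the conda environment you activated.
print("Interpreter:", sys.executable)
# An empty list means every listed package can be imported from this kernel.
print("Missing:", missing_packages(["torch", "torchani", "numpy"]))
```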

## [UGrad Only] ANI Project

See: bCourses > Midterm_FinalsProject > ugrad_project_files