# Tutorial 1: An Introduction to PyTorch Geometric

* Ben Finkelstein (benfin@campus.technion.ac.il)
* Reception hours will be scheduled by e-mail

## The upcoming tutorials 
1. PyTorch Geometric (today)
2. Geometric Deep Learning (next week)
3. GCN, GAT, SGC architectures
4. Message passing
5. ...

![spoilers.jpg](attachment:spoilers.jpg)

## Graphs
A graph $G = (V, E, A, W)$ can be represented with:<br>
* A vertex set $V = \{1, 2, \dots, n\}$<br>
* An edge set $E\subseteq V\times V$<br>
* A vertex-weight matrix $A=\mathrm{diag}\{a_i\}_{i=1}^{n}$<br>
* An edge-weight matrix $W$, where $w_{ij}=0\Rightarrow(i,j)\notin E$.
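
As a rough illustration of this notation, the sketch below builds a small edge-weight matrix $W$ and reads the edge set $E$ off its non-zero entries. The three-node graph and all weights are made-up values (not the graph in the figure), and the code indexes vertices from 0 rather than 1.

```python
import torch

# Hypothetical 3-node weighted graph (values chosen for illustration only).
W = torch.tensor([[0.0, 2.0, 0.0],
                  [2.0, 0.0, 0.5],
                  [0.0, 0.5, 0.0]])

# Vertex weights a_i placed on the diagonal of A.
A = torch.diag(torch.tensor([1.0, 1.0, 3.0]))

# In this sketch, (i, j) is treated as an edge wherever w_ij is non-zero.
E = [tuple(e) for e in W.nonzero().tolist()]
print(E)  # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

Because the graph is undirected, $W$ is symmetric and every edge shows up twice, once per direction.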

![example_graph4.png](attachment:example_graph4.png)

# What is PyTorch Geometric (PG)?

PyTorch Geometric is a geometric deep learning extension library for PyTorch.
It provides the following main features:

* Data Handling of Graphs
* Common Benchmark Datasets
* Mini-batches
* Data Transforms
* Learning Methods on Graphs

## Data Handling of Graphs

![example_graph1.jpeg](attachment:example_graph1.jpeg)

A single graph in PyTorch Geometric is described by an instance of torch_geometric.data.Data, which holds the following attributes by default:

* __data.x__: Node feature matrix with shape [num_nodes, num_node_features] <br>
In our example: [num_nodes=colors, num_node_features=len(x)]
* __data.edge_index__: Graph connectivity in COO format with shape [2, num_edges] and type torch.long<br>
In our example: [[1, 2, 2, 3, 3, 1,...],[2, 1, 3, 2, 1, 3...]]
* __data.y__: Target to train against (may have arbitrary shape), e.g., node-level targets of shape [num_nodes, 1] or <br>graph-level targets of shape [1, num_of_graphs] <br>
In our example the target classes are the different colors (node-level targets).


* __data.edge_attr__: Edge feature matrix with shape [num_edges, num_edge_features] (rarely used)
* __data.pos__: Node position matrix with shape [num_nodes, num_dimensions] (mainly used in 3D mesh)

None of these attributes is required. In fact, the Data object is not even restricted to these attributes. We can, e.g., extend it by data.face to save the connectivity of triangles from a 3D mesh in a tensor with shape [3, num_faces] and type torch.long.

![look_at_this_graph.png](attachment:look_at_this_graph.png)

Let's look at a simple example of an unweighted and undirected graph with three nodes and two undirected edges, stored as four directed edges.
Each node contains exactly one feature:

In [1]:
import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)
data

Data(edge_index=[2, 4], x=[3, 1])

__Note:__ edge_index, i.e. the tensor defining the source and target nodes of all edges, is not a list of index tuples. If you want to write your indices this way, you should transpose the tensor and call contiguous on it before passing it to the data constructor:

In [2]:
import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1],
                           [1, 0],
                           [1, 2],
                           [2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index.t().contiguous())
data

Data(edge_index=[2, 4], x=[3, 1])

__Note:__ Although the graph has only two undirected edges, we need to define four index tuples to account for both directions of an edge.
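
If the edges are available only once per undirected pair, both directions can be generated rather than typed by hand. The following is a minimal sketch using plain PyTorch; the variable name `undirected` is introduced here for illustration. PyTorch Geometric also ships `torch_geometric.utils.to_undirected`, which serves a similar purpose.

```python
import torch

# Each column is one undirected edge, listed once: (0, 1) and (1, 2).
undirected = torch.tensor([[0, 1],
                           [1, 2]], dtype=torch.long)

# Append a reversed copy of every edge so both directions are present.
edge_index = torch.cat([undirected, undirected.flip(0)], dim=1)
print(edge_index.tolist())  # [[0, 1, 1, 2], [1, 2, 0, 1]]
```

The column order differs from the hand-written tensor above, but the set of directed edges is the same.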

Data provides a number of utility functions, e.g.:

In [3]:
data.keys

['x', 'edge_index']

In [4]:
data.x, data['x']

(tensor([[-1.],
         [ 0.],
         [ 1.]]),
 tensor([[-1.],
         [ 0.],
         [ 1.]]))

In [5]:
for key, item in data:
    print(f'data.{key} is {item}')

data.edge_index is tensor([[0, 1, 1, 2],
        [1, 0, 2, 1]])
data.x is tensor([[-1.],
        [ 0.],
        [ 1.]])


In [6]:
'edge_attr' in data

False

In [7]:
'x' in data

True

In [8]:
data.num_nodes

3

In [9]:
data.num_edges

4

In [10]:
data.num_features

1

In [11]:
data.num_node_features

1

In [12]:
data.contains_isolated_nodes()

False

In [13]:
data.contains_self_loops()

False

In [14]:
data.is_directed()

False

In [15]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data = data.to(device)

__You can find a complete list of all methods at torch_geometric.data.Data.__

## Common Benchmark Datasets

PyTorch Geometric contains a large number of common benchmark datasets, e.g.:

1. Planetoid datasets (Cora, Citeseer, Pubmed) and Twitter, each of which contains a __single graph__ (node-level target)
2. Molecular datasets (the QM7 and QM9 datasets), each of which contains a __variety of graphs__ (graph-level target)
3. 3D mesh/point cloud datasets like FAUST, ModelNet10/40 and ShapeNet 

![example_graph3.png](attachment:example_graph3.png)

![these_arent_the_graphs.jpg](attachment:these_arent_the_graphs.jpg)

Initializing a dataset is straightforward. <br>
Initializing a dataset automatically downloads its raw files and processes them into the previously described Data format. <br>
E.g., to load the ENZYMES dataset (consisting of 600 graphs within 6 classes), type:

In [16]:
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')
dataset

Downloading https://www.chrsmrrs.com/graphkerneldatasets/ENZYMES.zip
Extracting /tmp/ENZYMES/ENZYMES/ENZYMES.zip
Processing...
Done!


ENZYMES(600)

In [17]:
print(f'number of graphs: {len(dataset)}')
print(f'number of classes: {dataset.num_classes}')
print(f'number of features per node: {dataset.num_node_features}')

number of graphs: 600
number of classes: 6
number of features per node: 3


In [18]:
data = dataset[0]
data

Data(edge_index=[2, 168], x=[37, 3], y=[1])

In [19]:
data.is_undirected()

True

We can see that the first graph in the dataset contains __37 nodes, each one having 3 features__.<br>
There are __168/2 = 84 undirected edges and the graph is assigned to exactly one class__.

We can even use slices, long or byte tensors to split the dataset.<br>
E.g., to create a 90/10 train/test split, type:

In [20]:
train_dataset = dataset[:540]
train_dataset

ENZYMES(540)

In [21]:
test_dataset = dataset[540:]
test_dataset

ENZYMES(60)

If you are unsure whether the dataset is already shuffled before you split, you can randomly permute it by running:

In [22]:
dataset = dataset.shuffle()
dataset

ENZYMES(600)

This is equivalent to doing:

In [23]:
perm = torch.randperm(len(dataset))
dataset = dataset[perm]
dataset

ENZYMES(600)
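
Shuffling and splitting can be combined into one reproducible step by drawing the permutation from a seeded generator. This is only a sketch: the seed value is arbitrary, and the index names `train_idx` and `test_idx` are introduced here for illustration.

```python
import torch

num_graphs = 600  # size of ENZYMES, as reported above
generator = torch.Generator().manual_seed(0)  # arbitrary seed, for repeatability

# A random ordering of all graph indices, cut into a 90/10 split.
perm = torch.randperm(num_graphs, generator=generator)
train_idx, test_idx = perm[:540], perm[540:]

# With the dataset from the cells above, one could then index it with
# these long tensors: dataset[train_idx] and dataset[test_idx].
print(train_idx.numel(), test_idx.numel())  # 540 60
```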

__Note:__ for some datasets the Data object holds additional attributes.<br>An example of such attributes would be: train_mask, val_mask and test_mask

* __train_mask__ denotes against which nodes to train
* __val_mask__ denotes which nodes to use for validation, e.g., to perform early stopping
* __test_mask__ denotes against which nodes to test
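
Each mask is a boolean tensor with one entry per node, so selecting the nodes of a split is plain boolean indexing. The sketch below uses made-up labels, scores and masks for a five-node graph to show the pattern.

```python
import torch

# Toy labels and class scores for 5 nodes and 2 classes (made-up values).
y = torch.tensor([0, 1, 1, 0, 1])
out = torch.tensor([[2.0, 0.1], [0.3, 1.5], [1.2, 0.2], [0.9, 0.4], [0.1, 2.2]])

# Boolean masks with one entry per node.
train_mask = torch.tensor([True, True, False, False, False])
test_mask = torch.tensor([False, False, False, True, True])

# Only the masked nodes take part in the loss or the metric.
pred = out.argmax(dim=1)
test_acc = (pred[test_mask] == y[test_mask]).float().mean().item()
print(y[train_mask].tolist(), test_acc)  # [0, 1] 1.0
```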

### Open Graph Benchmark (OGB)

OGB contains graph datasets that are managed by data loaders.<br>
The loaders handle downloading and pre-processing of the datasets.<br>
Additionally, OGB has standardized evaluators and leaderboards to keep track of state-of-the-art results.

![ogb_overview.png](attachment:ogb_overview.png)

The OGB components are closely tied to the OGB Python package;<br>
__however__, some of the blocks shown above can be substituted with the corresponding PG blocks, <br>
for example the Dataset or DataLoader blocks.

## Mini-batches

Neural networks are usually trained in a batch-wise fashion.<br>
PyTorch Geometric achieves parallelization over a mini-batch by creating sparse block diagonal adjacency matrices<br>
(defined by edge_index) and concatenating feature and target matrices in the node dimension.<br>
This composition allows differing number of nodes and edges over examples in one batch:

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & & \\ & \ddots & \\ & & \mathbf{A}_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_n \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} \mathbf{Y}_1 \\ \vdots \\ \mathbf{Y}_n \end{bmatrix}$$

PyTorch Geometric contains its own torch_geometric.data.DataLoader,<br>
which already takes care of this concatenation process.<br>
__torch_geometric.data.Batch inherits from torch_geometric.data.Data and contains an additional attribute called batch.__


batch is a column vector which maps each node to its respective graph in the batch:<br>
$\mathrm{batch} = {\begin{bmatrix} 0 & \cdots & 0 & 1 & \cdots & n - 2 & n -1 & \cdots & n - 1 \end{bmatrix}}^{\top}$<br>
You can use it to, e.g., average node features in the node dimension for each graph individually. Let’s learn about it in an example:

In [24]:
from torch_scatter import scatter_mean
from torch_geometric.datasets import TUDataset
from torch_geometric.data import DataLoader

dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES', use_node_attr=True)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for data in loader:
    print(data)
    print(data.num_graphs)
    x = scatter_mean(data.x, data.batch, dim=0)
    print(x.size())
    break

Batch(batch=[1173], edge_index=[2, 4288], ptr=[33], x=[1173, 21], y=[32])
32
torch.Size([32, 21])
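
To make the block-diagonal composition concrete, here is a hand-rolled sketch that batches two toy graphs with plain PyTorch. It illustrates the idea rather than the DataLoader's actual implementation, and all values are made up.

```python
import torch

# Two toy graphs given as (node features, edge_index).
x1 = torch.tensor([[1.0], [3.0]])
e1 = torch.tensor([[0, 1], [1, 0]])
x2 = torch.tensor([[2.0], [4.0], [6.0]])
e2 = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])

# Concatenate features along the node dimension.
x = torch.cat([x1, x2], dim=0)
# Shift the second graph's node indices by the number of nodes before it.
edge_index = torch.cat([e1, e2 + x1.size(0)], dim=1)
# batch[i] is the index of the graph that node i belongs to.
batch = torch.tensor([0] * x1.size(0) + [1] * x2.size(0))

# Per-graph mean of the node features, computed without torch_scatter.
sums = torch.zeros(2, x.size(1)).index_add_(0, batch, x)
counts = torch.bincount(batch, minlength=2).unsqueeze(1)
means = sums / counts
print(edge_index.tolist())  # [[0, 1, 2, 3, 3, 4], [1, 0, 3, 2, 4, 3]]
print(means.tolist())       # [[2.0], [4.0]]
```

Because the index shift keeps the two edge sets disjoint, no messages can flow between graphs in the same batch.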


## Data Transforms

Transforms are a common way in torchvision to transform images and perform augmentation.<br>
PyTorch Geometric comes with its own transforms, which expect a Data object as input and return a new transformed Data object.<br>


__Transforms can be chained together using torch_geometric.transforms.Compose__ and are applied <br> before saving a processed dataset on disk (pre_transform) or before accessing a graph in a dataset (transform).

Let’s look at an example, where we apply transforms on the ShapeNet dataset,<br>
which contains 17,000 3D shape point clouds and per point labels from 16 shape categories:

In [None]:
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'])
dataset[0]

Downloading https://shapenet.cs.stanford.edu/media/shapenetcore_partanno_segmentation_benchmark_v0_normal.zip


__We can convert the point cloud dataset into a graph dataset by generating nearest neighbor graphs from the point clouds via transforms:__

In [None]:
import torch_geometric.transforms as T
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'],
                    pre_transform=T.KNNGraph(k=6))

dataset[0]

__We use the pre_transform to convert the data before saving it to disk (leading to faster loading times)__.<br> The next time the dataset is initialized it will already contain graph edges, even if you do not pass any transform.
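
Conceptually, a k-nearest-neighbor graph links every point to the k points closest to it. The sketch below shows that idea with a dense distance matrix in plain PyTorch. The helper name `knn_edges`, the neighbor-to-center edge direction and the quadratic-memory approach are assumptions of this sketch, not a description of how T.KNNGraph is implemented.

```python
import torch

def knn_edges(pos, k):
    """Return a [2, N * k] edge_index linking each point to its k nearest neighbors."""
    dist = torch.cdist(pos, pos)               # pairwise Euclidean distances, [N, N]
    dist.fill_diagonal_(float('inf'))          # a point is not its own neighbor
    nbr = dist.topk(k, largest=False).indices  # [N, k] ids of the closest points
    center = torch.arange(pos.size(0)).repeat_interleave(k)
    # Row 0 holds the neighbor, row 1 the point it is attached to.
    return torch.stack([nbr.reshape(-1), center], dim=0)

# Four made-up points on a line: two close pairs far apart from each other.
pos = torch.tensor([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.5, 0.0]])
print(knn_edges(pos, k=1).tolist())  # [[1, 0, 3, 2], [0, 1, 2, 3]]
```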

In addition, we can use the transform argument to randomly augment a Data object,
e.g., translating each node position by a small number:

In [None]:
import torch_geometric.transforms as T
from torch_geometric.datasets import ShapeNet

dataset = ShapeNet(root='/tmp/ShapeNet', categories=['Airplane'],
                    pre_transform=T.KNNGraph(k=6),
                    transform=T.RandomTranslate(0.01))

dataset[0]
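
The augmentation itself amounts to adding a small random offset to every node position each time a graph is accessed. A minimal sketch of that idea in plain PyTorch follows. The function name `random_translate` and the choice of uniform noise are assumptions made for illustration, not the library's code.

```python
import torch

def random_translate(pos, max_shift, generator=None):
    """Shift every coordinate by an independent offset in [-max_shift, max_shift]."""
    noise = torch.rand(pos.shape, generator=generator) * 2 - 1  # uniform in [-1, 1)
    return pos + max_shift * noise

pos = torch.zeros(4, 3)  # four made-up points at the origin
shifted = random_translate(pos, 0.01, torch.Generator().manual_seed(0))
print(shifted.shape)  # torch.Size([4, 3])
```

Since the noise is redrawn on every call, the same graph yields slightly different positions in each epoch.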

__You can find a complete list of all implemented transforms at torch_geometric.transforms.__

## Neural networks on Graphs - Planetoid (node-level targets)

It’s time to implement our first graph neural network!<br>
We will use a simple GCN layer and the Cora citation dataset.

In [None]:
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/tmp/Cora', name='Cora')
dataset

Let’s implement a two-layer GCN:

In [None]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class TwoLayerGCN(torch.nn.Module):
    def __init__(self):
        super(TwoLayerGCN, self).__init__()
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        return F.log_softmax(x, dim=1)

The constructor defines two GCNConv layers, which are called in the forward pass of our network. <br>
__The non-linearity is not integrated in the conv calls__ and hence needs to be applied afterwards<br>
(this is consistent across all operators in PyTorch Geometric).<br>
As in any node-level classification task, the output is a (log-)softmax distribution over the classes for every node. <br>
__Note:__ GCN is only one of several graph convolution operators; torch_geometric.nn provides others, such as GATConv and SGConv.
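
For intuition about what a single GCN layer computes, the sketch below implements the propagation rule $\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} X W$ with dense tensors on the three-node graph from the beginning of the tutorial. This is a simplified illustration under stated assumptions: the function name `gcn_layer` and the all-ones weight matrix are made up, there is no bias term, and GCNConv itself works on sparse edge indices with learned parameters.

```python
import torch

def gcn_layer(x, edge_index, weight):
    """Dense sketch of one GCN propagation step: D^-1/2 (A + I) D^-1/2 X W."""
    n = x.size(0)
    adj = torch.zeros(n, n)
    adj[edge_index[0], edge_index[1]] = 1.0
    adj = adj + torch.eye(n)                 # add self-loops
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)  # diagonal of D^-1/2 as a vector
    norm_adj = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
    return norm_adj @ x @ weight

edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
x = torch.tensor([[-1.0], [0.0], [1.0]])
weight = torch.ones(1, 2)  # hypothetical weights: 1 input feature, 2 outputs

# The non-linearity is applied separately, as in the model above.
out = torch.relu(gcn_layer(x, edge_index, weight))
print(out.shape)  # torch.Size([3, 2])
```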

Let’s train this model on the train nodes for 200 epochs:

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = TwoLayerGCN().to(device)
data = dataset[0].to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
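
The training loop pairs `F.nll_loss` with the `log_softmax` output of the model. A quick check with made-up scores shows that this combination gives the same value as applying `F.cross_entropy` to the raw scores:

```python
import torch
import torch.nn.functional as F

# Made-up raw scores for 2 nodes and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
target = torch.tensor([0, 2])

loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
loss_ce = F.cross_entropy(logits, target)
print(torch.allclose(loss_nll, loss_ce))  # True
```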

Finally we can evaluate our model on the test nodes:

In [None]:
model.eval()
_, pred = model(data).max(dim=1)
correct = int(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
acc = correct / int(data.test_mask.sum())
print('Accuracy: {:.4f}'.format(acc))

__That is all it takes to implement your first graph neural network.__ <br>
The easiest way to learn more about graph convolution is to go over the examples/ directory of the PyTorch Geometric repository (https://github.com/rusty1s/pytorch_geometric).

## Neural networks on Graphs - Molecular (graph-level targets)

![learning.jpg](attachment:learning.jpg)

In [None]:
import os.path as osp

import torch

from torch_geometric.datasets import QM9
from torch_geometric.data import DataLoader
from torch_geometric.nn import SchNet

In [None]:
path = osp.join(osp.abspath(''), 'data', 'QM9')
dataset = QM9(path)

# Reorder the targets so that columns 7-10 (U0, U, H, G) hold the
# atomization energies (originally columns 12-15), following the DimeNet setup.
idx = torch.tensor([0, 1, 2, 3, 4, 5, 6, 12, 13, 14, 15, 11])
dataset.data.y = dataset.data.y[:, idx]

In [None]:
for target in range(1):

    model, datasets = SchNet.from_qm9_pretrained(path, dataset, target)
    train_dataset, val_dataset, test_dataset = datasets

    model = model.to(device)
    loader = DataLoader(test_dataset, batch_size=256)

    maes = []
    for data in loader:
        data = data.to(device)
        with torch.no_grad():
            pred = model(data.z, data.pos, data.batch)
        mae = (pred.view(-1) - data.y[:, target]).abs()
        maes.append(mae)

    mae = torch.cat(maes, dim=0)

    # Report meV instead of eV.
    mae = 1000 * mae if target in [2, 3, 4, 6, 7, 8, 9, 10] else mae

    print(f'Target: {target:02d}, MAE: {mae.mean():.5f} ± {mae.std():.5f}')

## Learning on Graphs - 3D mesh/point clouds

In [None]:
import os.path as osp

import torch
import torch.nn.functional as F
from torch.nn import Sequential as Seq, Dropout, Linear as Lin, ReLU, BatchNorm1d as BN
from torch_geometric.datasets import ModelNet
import torch_geometric.transforms as T
from torch_geometric.data import DataLoader
from torch_geometric.nn import DynamicEdgeConv, global_max_pool

Loading ModelNet10 dataset

In [None]:
path = osp.join(osp.dirname(osp.abspath('')), 'data/ModelNet10')
pre_transform, transform = T.NormalizeScale(), T.SamplePoints(2 ** 10)
train_dataset = ModelNet(path, '10', True, transform, pre_transform)
test_dataset = ModelNet(path, '10', False, transform, pre_transform)

In [None]:
# remove these 2 lines for full training
train_dataset = train_dataset[:100]
test_dataset = test_dataset[:100]

train_loader = DataLoader(
    train_dataset, batch_size=32, shuffle=True, num_workers=6)
test_loader = DataLoader(
    test_dataset, batch_size=32, shuffle=False, num_workers=6)

Creating our desired GNN

In [None]:
class Net(torch.nn.Module):
    def __init__(self, out_channels, k=20, aggr='max'):
        super().__init__()

        self.conv1 = DynamicEdgeConv(MLP([2 * 3, 64, 64, 64]), k, aggr)
        self.conv2 = DynamicEdgeConv(MLP([2 * 64, 128]), k, aggr)
        self.lin1 = MLP([128 + 64, 1024])

        self.mlp = Seq(
            MLP([1024, 512]), Dropout(0.5), MLP([512, 256]), Dropout(0.5),
            Lin(256, out_channels))

    def forward(self, data):
        pos, batch = data.pos, data.batch
        x1 = self.conv1(pos, batch)
        x2 = self.conv2(x1, batch)
        out = self.lin1(torch.cat([x1, x2], dim=1))
        out = global_max_pool(out, batch)
        out = self.mlp(out)
        return F.log_softmax(out, dim=1)
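
The first MLP takes `2 * 3` input channels because an edge convolution feeds the network the pair $[x_i, x_j - x_i]$ for every point $i$ and each of its neighbors $j$, and the points here are 3D positions. The sketch below builds these pair features for a single point cloud in plain PyTorch. The helper name `edge_features` is made up, the MLP is omitted, and excluding a point from its own neighborhood is a choice of this sketch rather than a statement about DynamicEdgeConv.

```python
import torch

def edge_features(x, k):
    """Build EdgeConv-style inputs [x_i, x_j - x_i] for the k nearest neighbors j of each i."""
    dist = torch.cdist(x, x)
    dist.fill_diagonal_(float('inf'))           # skip self-matches in this sketch
    nbr = dist.topk(k, largest=False).indices   # [N, k] neighbor ids
    x_i = x.unsqueeze(1).expand(-1, k, -1)      # [N, k, F], center repeated per neighbor
    x_j = x[nbr]                                # [N, k, F], neighbor features
    return torch.cat([x_i, x_j - x_i], dim=-1)  # [N, k, 2 * F]

pos = torch.rand(10, 3)           # ten random 3D points
feats = edge_features(pos, k=4)
pooled = feats.max(dim=1).values  # 'max' aggregation over the neighbors
print(feats.shape, pooled.shape)  # torch.Size([10, 4, 6]) torch.Size([10, 6])
```

In the network above the neighbors are recomputed in feature space at every layer, which is why the second layer's MLP expects `2 * 64` channels.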

Defining the MLP helper used by the network above (run this cell before instantiating Net):

In [None]:
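def MLP(channels, batch_norm=True):
    # Stack Linear -> ReLU (-> BatchNorm1d) blocks, one per consecutive channel pair.
    layers = []
    for i in range(1, len(channels)):
        block = [Lin(channels[i - 1], channels[i]), ReLU()]
        if batch_norm:  # the flag was previously accepted but ignored
            block.append(BN(channels[i]))
        layers.append(Seq(*block))
    return Seq(*layers)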

Using basic train/test blocks:

In [None]:
def train():
    model.train()

    total_loss = 0
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        out = model(data)
        loss = F.nll_loss(out, data.y)
        loss.backward()
        total_loss += loss.item() * data.num_graphs
        optimizer.step()
    return total_loss / len(train_dataset)


def test(loader):
    model.eval()

    correct = 0
    for data in loader:
        data = data.to(device)
        with torch.no_grad():
            pred = model(data).max(dim=1)[1]
        correct += pred.eq(data.y).sum().item()
    return correct / len(loader.dataset)

Let's check our example:

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net(train_dataset.num_classes, k=20).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

In [None]:
for epoch in range(1, 2):
    loss = train()
    test_acc = test(test_loader)
    print('Epoch {:03d}, Loss: {:.4f}, Test: {:.4f}'.format(
        epoch, loss, test_acc))
    scheduler.step()

![nn_meme.jpg](attachment:nn_meme.jpg)

Thank you very much. If anyone has questions, go ahead 

![next_week.png](attachment:next_week.png)