# Introduction by Example

> This Jupyter notebook works through the [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html#learning-methods-on-graphs) tutorial.

![](./images/pytorch_geometric.png)

## 01. Common Benchmark Datasets

PyTorch Geometric contains a large number of common benchmark datasets, *e.g.*, all Planetoid datasets (Cora, Citeseer, Pubmed), all graph classification datasets from http://graphkernels.cs.tu-dortmund.de/ and their [cleaned versions](https://github.com/nd7141/graph_datasets), the QM7 and QM9 datasets, and a handful of 3D mesh/point cloud datasets such as FAUST, ModelNet10/40, and ShapeNet.

### Cora Dataset

For Cora, the `Data` object holds a label for each node and three additional attributes, `train_mask`, `val_mask` and `test_mask`:

- `train_mask` denotes which nodes to train on (140 nodes)
- `val_mask` denotes which nodes to use for validation, *e.g.*, to perform early stopping (500 nodes)
- `test_mask` denotes which nodes to test on (1000 nodes)
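
Each mask is a boolean vector with one entry per node, and it selects the nodes that belong to one split. The sketch below mimics this with plain Python lists on a made-up six-node graph, so it runs without PyTorch or a dataset download. The labels, the predictions, and the helper `masked_accuracy` are hypothetical and serve only to illustrate the idea.

```python
# Toy stand-ins for data.y and the three masks (all values are made up).
labels      = [0, 1, 1, 0, 2, 2]
predictions = [0, 1, 0, 0, 2, 1]  # hypothetical model output

train_mask = [True, True, False, False, False, False]
val_mask   = [False, False, True, True, False, False]
test_mask  = [False, False, False, False, True, True]


def masked_accuracy(pred, target, mask):
    """Fraction of correct predictions among the nodes selected by mask."""
    selected = [p == t for p, t, m in zip(pred, target, mask) if m]
    return sum(selected) / len(selected)


# Number of nodes in each split (True counts as 1 when summed).
split_sizes = (sum(train_mask), sum(val_mask), sum(test_mask))

train_acc = masked_accuracy(predictions, labels, train_mask)
val_acc = masked_accuracy(predictions, labels, val_mask)
test_acc = masked_accuracy(predictions, labels, test_mask)
print(split_sizes, train_acc, val_acc, test_acc)
```

With tensors, the same selection is boolean indexing, for example `data.y[data.train_mask]`. Unlike this toy example, the Cora masks do not cover every node: 140 + 500 + 1000 = 1,640 of the 2,708 nodes belong to a split.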

In [1]:
from torch_geometric.datasets import Planetoid


dataset = Planetoid('../data/Cora', name='Cora')

In [2]:
dataset

Cora()

In [3]:
len(dataset)

1

In [4]:
dataset.num_classes

7

In [5]:
dataset.num_node_features

1433

In [6]:
data = dataset[0]
data

Data(edge_index=[2, 10556], test_mask=[2708], train_mask=[2708], val_mask=[2708], x=[2708, 1433], y=[2708])

In [12]:
import networkx as nx
import matplotlib.pyplot as plt
from torch_geometric.utils.convert import to_networkx


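# Convert the PyG Data object to a NetworkX graph for inspection or plotting.
G = to_networkx(data, to_undirected=True)
# nx.draw(G)  # uncomment to plot the graph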

In [7]:
data.is_undirected()

True
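
The `True` above means that every edge is stored in both directions. `edge_index` holds the graph in coordinate (COO) format, where column `k` contains the source and target node of the `k`-th entry, so an undirected edge takes up two columns. Assuming there are no self-loops, Cora's 10,556 entries therefore correspond to 5,278 undirected edges. The sketch below shows the convention on a made-up three-node path using plain lists; `is_symmetric` is a hypothetical helper, not the library's implementation of `is_undirected()`.

```python
# Path graph 0 - 1 - 2, stored as directed pairs in both directions.
edge_index = [
    [0, 1, 1, 2],  # source nodes
    [1, 0, 2, 1],  # target nodes
]


def is_symmetric(edge_index):
    """Return True if every edge (u, v) also appears as (v, u)."""
    edges = set(zip(edge_index[0], edge_index[1]))
    return all((v, u) in edges for u, v in edges)


num_entries = len(edge_index[0])
# Halving is valid here because the list is symmetric and has no self-loops.
num_undirected_edges = num_entries // 2

one_way = [[0, 1], [1, 2]]  # same path, but each edge stored only once
print(is_symmetric(edge_index), is_symmetric(one_way), num_undirected_edges)
```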

In [8]:
data.train_mask.sum().item()

140

In [9]:
data.val_mask.sum().item()

500

In [10]:
data.test_mask.sum().item()

1000

## 02. Mini-batches

PyTorch Geometric achieves parallelization over a mini-batch by creating sparse block diagonal adjacency matrices (defined by `edge_index` and `edge_attr`) and concatenating feature and target matrices in the node dimension. 

$$
\begin{split}\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & & \\ & \ddots & \\ & & \mathbf{A}_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_n \end{bmatrix}, \qquad \mathbf{Y} = \begin{bmatrix} \mathbf{Y}_1 \\ \vdots \\ \mathbf{Y}_n \end{bmatrix}\end{split}
$$

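PyTorch Geometric provides its own [`torch_geometric.data.DataLoader`](https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.DataLoader), which takes care of this concatenation process. In PyTorch Geometric 2.0 and later, the same class is available as `torch_geometric.loader.DataLoader`, and the `torch_geometric.data` import path is deprecated.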
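
The block-diagonal construction above can be sketched by hand. The example below uses plain Python lists instead of tensors so that it runs without PyTorch Geometric. The two toy graphs and the helper `collate` are hypothetical, and the real `DataLoader` does more than this (for example, it also handles `edge_attr`). The key step is shifting each graph's node indices by the number of nodes that precede it, which places its adjacency matrix in its own diagonal block.

```python
def collate(graphs):
    """Merge graphs into one disconnected graph (block-diagonal adjacency).

    Each graph is a dict with 'x' (one feature row per node), 'edge_index'
    ([sources, targets]) and 'y' (one graph-level label).
    """
    x, y, batch = [], [], []
    sources, targets = [], []
    offset = 0
    for graph_id, g in enumerate(graphs):
        num_nodes = len(g["x"])
        x.extend(g["x"])                      # stack X_1, ..., X_n
        y.append(g["y"])                      # stack Y_1, ..., Y_n
        batch.extend([graph_id] * num_nodes)  # node -> graph assignment
        # Shift node indices so this graph occupies its own diagonal block.
        sources.extend(s + offset for s in g["edge_index"][0])
        targets.extend(t + offset for t in g["edge_index"][1])
        offset += num_nodes
    return {"x": x, "edge_index": [sources, targets], "y": y, "batch": batch}


# Two toy graphs: a single edge (2 nodes) and a path (3 nodes).
g1 = {"x": [[1.0], [2.0]], "edge_index": [[0, 1], [1, 0]], "y": 0}
g2 = {"x": [[3.0], [4.0], [5.0]],
      "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]], "y": 1}

merged = collate([g1, g2])
print(merged["edge_index"])  # [[0, 1, 2, 3, 3, 4], [1, 0, 3, 2, 4, 3]]
print(merged["batch"])       # [0, 0, 1, 1, 1]
```

No edge connects nodes from different graphs, so message passing on the merged graph treats each example independently. The `batch` list records which graph each node came from, which is the role of the `batch` attribute on a PyTorch Geometric `Batch` object.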

In [17]:
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='../data/ENZYMES', name='ENZYMES')

Downloading https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets/ENZYMES.zip
Extracting ../data/ENZYMES/ENZYMES/ENZYMES.zip
Processing...
Done!


In [18]:
len(dataset)

600

In [19]:
dataset.num_classes

6

In [20]:
dataset.num_node_features

3

In [21]:
data = dataset[0]
data

Data(edge_index=[2, 168], x=[37, 3], y=[1])

In [23]:
data.is_undirected()

True

In [25]:
data['x'].size()

torch.Size([37, 3])

In [26]:
from torch_geometric.datasets import TUDataset
from torch_geometric.data import DataLoader

dataset = TUDataset(root='../data/ENZYMES', name='ENZYMES', use_node_attr=True)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Take the first mini-batch from the loader.
data = next(iter(loader))

In [29]:
data

Batch(batch=[568], edge_index=[2, 2214], x=[568, 21], y=[16])

In [31]:
data['x'].size()

torch.Size([568, 21])