# PyTorch Geometric - Introduction by Example
https://rusty1s.github.io/pytorch_geometric/build/html/notes/introduction.html

## Data handling of graphs

A graph is used to model pairwise relations (edges) between objects (nodes).
A single graph in PyTorch Geometric is described by an instance of ``torch_geometric.data.Data``, which holds the following attributes by default:

- ``data.x``: Node feature matrix with shape ``[num_nodes, num_node_features]``
- ``data.edge_index``: Graph connectivity in COO format with shape ``[2, num_edges]`` and type ``torch.long``
- ``data.edge_attr``: Edge feature matrix with shape ``[num_edges, num_edge_features]``
- ``data.y``: Target to train against (may have arbitrary shape)
- ``data.pos``: Node position matrix with shape ``[num_nodes, num_dimensions]``
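These shape conventions can be checked mechanically. The following is a minimal pure-Python sketch, not part of PyTorch Geometric. `check_graph_shapes` is a hypothetical helper, and nested lists stand in for tensors so that the snippet runs without any dependencies.

```python
def check_graph_shapes(x, edge_index, edge_attr=None):
    """Hypothetical helper (not a PyG API): validate Data-style shapes on nested lists.

    x          -- [num_nodes, num_node_features]
    edge_index -- [2, num_edges], integer node indices in COO format
    edge_attr  -- optional [num_edges, num_edge_features]
    Returns (num_nodes, num_edges).
    """
    num_nodes = len(x)
    num_features = len(x[0]) if num_nodes > 0 else 0
    if any(len(row) != num_features for row in x):
        raise ValueError("every node needs the same number of features")

    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        raise ValueError("edge_index must have shape [2, num_edges]")
    num_edges = len(edge_index[0])

    # Every index in edge_index must refer to an existing node.
    for idx in edge_index[0] + edge_index[1]:
        if not isinstance(idx, int) or not 0 <= idx < num_nodes:
            raise ValueError(f"edge refers to invalid node index {idx!r}")

    if edge_attr is not None and len(edge_attr) != num_edges:
        raise ValueError("edge_attr needs one row per edge")
    return num_nodes, num_edges


# The three-node graph used below: 3 nodes with 1 feature each, 4 directed edges.
print(check_graph_shapes([[-1.0], [0.0], [1.0]], [[0, 1, 1, 2], [1, 0, 2, 1]]))  # → (3, 4)
```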

Here is a simple example of an unweighted, undirected graph with three nodes and two undirected edges, which are stored as four directed edges in ``edge_index`` (one per direction). Each node contains exactly one feature:

![Example graph with three nodes and two undirected edges](https://rusty1s.github.io/pytorch_geometric/build/html/_images/graph.svg)

```python
import torch
from torch_geometric.data import Data

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)

data = Data(x=x, edge_index=edge_index)
```
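Instead of writing the two rows of `edge_index` by hand, it is often easier to start from a list of `(source, target)` pairs and transpose it. The sketch below does this in plain Python; `pairs_to_coo` is a hypothetical name, not a PyG function. With tensors, the equivalent is to build `torch.tensor(pairs, dtype=torch.long)` and transpose it with `.t().contiguous()`.

```python
def pairs_to_coo(pairs, undirected=False):
    """Hypothetical helper: turn (source, target) pairs into the two COO rows.

    With undirected=True, each pair {u, v} is stored in both directions,
    which is how an undirected graph is represented in edge_index.
    """
    sources, targets = [], []
    for u, v in pairs:
        sources.append(u)
        targets.append(v)
        if undirected and u != v:
            # Add the reverse direction for undirected graphs (skip self-loops).
            sources.append(v)
            targets.append(u)
    return [sources, targets]


# Two undirected edges (0-1 and 1-2) become the four directed edges above.
print(pairs_to_coo([(0, 1), (1, 2)], undirected=True))  # → [[0, 1, 1, 2], [1, 0, 2, 1]]
```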

```python
print(data.keys)  # in recent PyG releases `keys` is a method: data.keys()
```
```
['x', 'edge_index']
```


```python
print(data['x'])
```
```
tensor([[-1.],
        [ 0.],
        [ 1.]])
```


```python
for key, item in data:
    print(key + ' found in data')
```
```
edge_index found in data
x found in data
```


```python
'edge_attr' in data  # False
data.num_nodes       # 3
data.num_edges       # 4 (each undirected edge is counted once per direction)
data.num_features    # 1
```

```python
# Newer PyG releases name the first two methods has_isolated_nodes() and has_self_loops().
data.contains_isolated_nodes()  # False
data.contains_self_loops()      # False
data.is_directed()              # False
```
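The properties queried above follow directly from `edge_index`. Below is a rough plain-Python sketch of how they can be computed for the same three-node graph. The function name is hypothetical and this is not PyTorch Geometric's implementation; in particular, it assumes `num_nodes` is given rather than inferred.

```python
def graph_summary(edge_index, num_nodes):
    """Hypothetical helper: structural properties of a graph given in COO format."""
    src, dst = edge_index
    edges = list(zip(src, dst))
    edge_set = set(edges)

    # A node is isolated if it appears in no edge at all.
    touched = set(src) | set(dst)
    has_isolated = any(n not in touched for n in range(num_nodes))

    # A self-loop is an edge (i, i).
    has_self_loops = any(u == v for u, v in edges)

    # The graph counts as undirected if every edge (u, v) has a reverse edge (v, u).
    is_directed = any((v, u) not in edge_set for u, v in edges)

    return {
        "num_nodes": num_nodes,
        "num_edges": len(edges),  # directed entries, as in data.num_edges
        "contains_isolated_nodes": has_isolated,
        "contains_self_loops": has_self_loops,
        "is_directed": is_directed,
    }


summary = graph_summary([[0, 1, 1, 2], [1, 0, 2, 1]], num_nodes=3)
print(summary["num_edges"], summary["is_directed"])  # → 4 False
```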

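```python
# Transfer the data object to the GPU (falls back to the CPU if CUDA is unavailable).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data = data.to(device)
```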

## Common benchmark datasets

For example, the ENZYMES graph classification benchmark from the TUDataset collection can be loaded as follows:

```python
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='/tmp/ENZYMES', name='ENZYMES')

len(dataset)          # 600
dataset.num_classes   # 6
dataset.num_features  # 3
```

...

## Mini-batches

...

## Data transforms

Transforms are a common way in ``torchvision`` to transform images and perform augmentation.
PyTorch Geometric comes with its own transforms, which expect a ``Data`` object as input and return a new transformed ``Data`` object.
Transforms can be chained together using ``torch_geometric.transforms.Compose`` and are applied either once, before the processed dataset is saved to disk (``pre_transform``), or every time a graph is accessed from the dataset (``transform``).
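To make the chaining and the ``pre_transform``/``transform`` distinction concrete, here is a dependency-free sketch. `Compose`, `add_self_loops`, `scale_features`, `shift_features` and `ToyDataset` are hypothetical stand-ins written for illustration only: they mimic the behavior described above but are not PyTorch Geometric's classes, and graphs are plain dicts rather than ``Data`` objects.

```python
import copy


class Compose:
    """Illustrative stand-in (not PyG's class): calls each transform in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, graph):
        for transform in self.transforms:
            graph = transform(graph)
        return graph


def add_self_loops(graph):
    # Returns a new graph dict with an extra (i, i) edge for every node.
    g = copy.deepcopy(graph)
    num_nodes = len(g["x"])
    g["edge_index"][0].extend(range(num_nodes))
    g["edge_index"][1].extend(range(num_nodes))
    return g


def scale_features(graph, factor=2.0):
    g = copy.deepcopy(graph)
    g["x"] = [[v * factor for v in row] for row in g["x"]]
    return g


def shift_features(graph, offset=1.0):
    g = copy.deepcopy(graph)
    g["x"] = [[v + offset for v in row] for row in g["x"]]
    return g


class ToyDataset:
    """Hypothetical dataset: pre_transform runs once when the dataset is
    "processed"; transform runs again on every item access."""

    def __init__(self, raw_graphs, pre_transform=None, transform=None):
        self.transform = transform
        # Stands in for processing + saving to disk: results are stored and reused.
        self.processed = [pre_transform(g) if pre_transform else g
                          for g in raw_graphs]

    def __getitem__(self, idx):
        graph = self.processed[idx]
        return self.transform(graph) if self.transform else graph


raw = [{"x": [[-1.0], [0.0], [1.0]],
        "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]]}]
ds = ToyDataset(raw, pre_transform=add_self_loops,
                transform=Compose([scale_features, shift_features]))

print(ds[0]["x"])           # → [[-1.0], [1.0], [3.0]]  (scaled by 2, then shifted by 1)
print(ds[0]["edge_index"])  # → [[0, 1, 1, 2, 0, 1, 2], [1, 0, 2, 1, 0, 1, 2]]
```

Because the transform chain runs on every access, the stored (pre-transformed) graph keeps its original features; only the edge list carries the one-time self-loop change.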

Let’s look at an example where we apply transforms to the ShapeNet dataset (containing 17,000 3D shape point clouds and per-point labels from 16 shape categories). We can convert the point cloud dataset into a graph dataset by generating nearest-neighbor graphs from the point clouds via transforms:

```python
import torch_geometric.transforms as T
from torch_geometric.datasets import ShapeNet

# Newer PyG releases take `categories=['Airplane']` instead of `category='Airplane'`.
dataset = ShapeNet(root='/tmp/ShapeNet', category='Airplane',
                   pre_transform=T.KNNGraph(k=6))
```
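Conceptually, a k-nearest-neighbor graph connects every point to the `k` points closest to it. The following is a brute-force, pure-Python sketch of that idea (O(n²) distance computations, so only suitable for tiny clouds). `knn_graph` is a hypothetical helper; it excludes self-loops, and the edge direction chosen here (neighbor → center point) is an assumption of this sketch rather than a statement about the exact output of `T.KNNGraph`.

```python
import math


def knn_graph(points, k):
    """Hypothetical helper: COO edge_index linking each point to its k nearest neighbors.

    For every point i, edges j -> i are added for the k closest other points j
    (Euclidean distance, ties broken by the lower index). No self-loops.
    """
    sources, targets = [], []
    for i, p in enumerate(points):
        candidates = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in candidates[:k]:
            sources.append(j)
            targets.append(i)
    return [sources, targets]


# Four 3D points on a line; with k=1 each point receives an edge from its closest neighbor.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
print(knn_graph(cloud, k=1))  # → [[1, 0, 1, 2], [0, 1, 2, 3]]
```

Note that the result is generally directed: point 3's nearest neighbor is point 2, but point 2's nearest neighbor is point 1.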

## Learning methods on graphs

We will use a simple GCN layer to replicate the experiments on the Cora citation dataset.
For a high-level explanation of GCN, have a look at its [blog post](http://tkipf.github.io/graph-convolutional-networks/).
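For reference, the layer-wise propagation rule of GCN (Kipf and Welling), which the `GCNConv` layer used below is based on, can be written as follows. Here $\hat{A} = A + I$ is the adjacency matrix with added self-loops, $\hat{D}$ is the diagonal degree matrix of $\hat{A}$, $H^{(l)}$ holds the node features at layer $l$ (one row per node), $W^{(l)}$ is a learnable weight matrix, and $\sigma$ is a nonlinearity such as ReLU:

$$
H^{(l+1)} = \sigma\left(\hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \hat{D}_{ii} = \sum_j \hat{A}_{ij}
$$

Per node $i$, this is a degree-normalized sum over its neighbors $\mathcal{N}(i)$ and itself:

$$
h_i^{(l+1)} = \sigma\left(\sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{\hat{D}_{ii}\,\hat{D}_{jj}}}\; h_j^{(l)}\,W^{(l)}\right)
$$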

We first need to load the Cora dataset:

```python
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='/tmp/Cora', name='Cora')
```
```
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!
```


Now let’s implement a two-layer GCN:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Input features -> 16 hidden channels -> one output channel per class.
        self.conv1 = GCNConv(dataset.num_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        # Dropout is only active in training mode (after model.train()).
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        # Per-node log-probabilities, to be paired with F.nll_loss.
        return F.log_softmax(x, dim=1)
```
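To see roughly what a single GCN layer does to node features, here is a plain-Python sketch of the normalized aggregation step $\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}X$, applied to the three-node example graph from the beginning of this page. It is an illustration, not PyTorch Geometric's implementation: the learnable weight matrix and bias are omitted (equivalently, the weight is the identity), and `gcn_aggregate` is a hypothetical name.

```python
import math


def gcn_aggregate(x, edge_index):
    """Hypothetical sketch of one weight-free GCN step: D^-1/2 (A + I) D^-1/2 X.

    x is a list of feature rows; edge_index is COO [sources, targets].
    Messages flow from source j to target i.
    """
    n = len(x)
    # Add a self-loop to every node, as in A_hat = A + I.
    src = list(edge_index[0]) + list(range(n))
    dst = list(edge_index[1]) + list(range(n))

    # Degree of each node in A_hat, counted over incoming edges.
    deg = [0] * n
    for i in dst:
        deg[i] += 1

    out = [[0.0] * len(x[0]) for _ in range(n)]
    for j, i in zip(src, dst):
        norm = 1.0 / math.sqrt(deg[i] * deg[j])  # symmetric normalization
        for f, value in enumerate(x[j]):
            out[i][f] += norm * value
    return out


x = [[-1.0], [0.0], [1.0]]
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
print([[round(v, 4) for v in row] for row in gcn_aggregate(x, edge_index)])  # → [[-0.5], [0.0], [0.5]]
```

The middle node averages its two neighbors' opposite-signed features to zero, while each outer node keeps half of its own feature (its self-loop weight is $1/\sqrt{2 \cdot 2}$) plus nothing from the zero-valued middle node.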

Let’s train this model on the training nodes for 200 epochs:

```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
data = dataset[0].to(device)  # Cora consists of a single graph
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    # Only the labeled training nodes contribute to the loss.
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```
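The loss above is computed only on the training nodes: `out` holds log-probabilities (the output of `log_softmax`), and `F.nll_loss` with its default mean reduction averages the negative log-probability of the correct class over the selected rows. Here is a dependency-free sketch of that computation, using made-up toy scores and a hypothetical `masked_nll_loss` helper:

```python
import math


def log_softmax(row):
    # Numerically stable log-softmax of one row of scores.
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


def masked_nll_loss(log_probs, targets, mask):
    """Hypothetical helper: mean negative log-likelihood over rows where mask is True."""
    picked = [-lp[y] for lp, y, keep in zip(log_probs, targets, mask) if keep]
    return sum(picked) / len(picked)


# Toy scores for 3 nodes and 2 classes; only nodes 0 and 2 are training nodes.
scores = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
targets = [0, 1, 1]
train_mask = [True, False, True]
loss = masked_nll_loss([log_softmax(r) for r in scores], targets, train_mask)
print(round(loss, 4))  # → 0.0878
```

Node 1 is excluded by the mask, so its (uninformative) scores do not affect the loss at all.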

Finally, we can evaluate our model on the test nodes:

```python
model.eval()
with torch.no_grad():  # gradients are not needed for evaluation
    _, pred = model(data).max(dim=1)
correct = float(pred[data.test_mask].eq(data.y[data.test_mask]).sum().item())
acc = correct / data.test_mask.sum().item()
print('Accuracy: {:.4f}'.format(acc))
```
```
Accuracy: 0.7860
```
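The evaluation step picks the highest-scoring class for each node (`max(dim=1)` returns both the maximum values and their indices, and the indices are the predictions) and compares predictions with labels only where `test_mask` is set. The same logic in plain Python, on hypothetical toy values and with a hypothetical `masked_accuracy` helper:

```python
def masked_accuracy(scores, labels, mask):
    """Hypothetical helper: fraction of masked nodes whose arg-max class equals the label."""
    # Arg-max per row (the first maximum wins on ties).
    preds = [max(range(len(row)), key=row.__getitem__) for row in scores]
    selected = [(p, y) for p, y, keep in zip(preds, labels, mask) if keep]
    correct = sum(1 for p, y in selected if p == y)
    return correct / len(selected)


scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 1, 1, 0]
test_mask = [True, True, False, True]
print(masked_accuracy(scores, labels, test_mask))  # → 0.6666666666666666
```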
