# Introduction to PyTorch Geometric

## Table of Contents

1. [Overview of PyG](#Overview-of-PyG)
2. [PyG's Data Design](#PyGs-Data-Design)
3. [Simple Example with PyG](#Simple-Example-with-PyG)
4. [Batch Training with PyG's Data Loader](#Batch-Training-with-PyGs-Data-Loader)

In [1]:
# Uncomment the following line to install PyTorch Geometric when running on Google Colab
# !pip install torch torchvision torchaudio torch-geometric

## Overview of PyG <a name="Overview-of-PyG"></a>

PyTorch Geometric (PyG) is an extension library for PyTorch dedicated to the processing of irregularly structured input data, primarily graphs.
Several characteristics explain its popularity:

- Rapid adoption due to its simplicity and computational efficiency.
- Inherits PyTorch's dynamic computation graph for flexibility.
- Offers methods for both shallow and deep graph learning.
- Regular updates have incorporated numerous methods and functions, making it a leading graph learning library.

PyG also has extension libraries of its own, such as [PyTorch Geometric Temporal](https://pytorch-geometric-temporal.readthedocs.io/en/latest/), which is dedicated to processing temporal graphs.

In addition, PyG has been commercialized through [kumo.ai](https://kumo.ai/pyg), which offers a cloud-based service for PyG.

In [2]:
import torch
import torch_geometric

print(torch.__version__)
print(torch_geometric.__version__)

2.0.1
2.3.1


## PyG's Data Design <a name="PyGs-Data-Design"></a>

PyG represents a graph using the `Data` object.
The library documentation provides very good [tutorial pages](https://pytorch-geometric.readthedocs.io/en/latest/get_started/introduction.html) on how to use this object.
Here is a brief summary:

- `x`: Node feature matrix of shape `[num_nodes, num_features]`.
- `edge_index`: Graph connectivity in COO format, a tensor of shape `[2, num_edges]` whose first row holds the source nodes and whose second row holds the destination nodes.
- `edge_attr`, `y`: Optional attributes, such as edge features and labels.
- This design can accommodate directed/undirected graphs, with or without self-loops.

In [3]:
from torch_geometric.data import Data

# Create a simple graph with 5 nodes and 3 undirected edges
# (each undirected edge is stored as two directed edges, giving 6 columns)
edge_index = torch.tensor([[0, 1, 1, 2, 3, 4],
                            [1, 0, 2, 1, 4, 3]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1], [2], [3]], dtype=torch.float)

# Create a PyTorch Geometric data object
data = Data(x=x, edge_index=edge_index)

# Print the data object
print(data)

Data(x=[5, 1], edge_index=[2, 6])


In [4]:
print(data.x)

tensor([[-1.],
        [ 0.],
        [ 1.],
        [ 2.],
        [ 3.]])


In [5]:
print(data.edge_index)

tensor([[0, 1, 1, 2, 3, 4],
        [1, 0, 2, 1, 4, 3]])


## Simple Example with PyG <a name="Simple-Example-with-PyG"></a>

Let's build a simple example: a single GCN layer applied to a three-node graph.

In [6]:
from torch_geometric.nn import GCNConv

# Graph data
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]], dtype=torch.long)
x = torch.tensor([[-1], [0], [1]], dtype=torch.float)
data = Data(x=x, edge_index=edge_index)

# GCN Layer
class SimpleGCN(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.conv = GCNConv(input_dim, output_dim)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv(x, edge_index)
        return x

model = SimpleGCN(input_dim=1, output_dim=16)
out = model(data)
print(out)

tensor([[ 0.0459, -0.2145,  0.1862, -0.0690, -0.2283,  0.1475,  0.0537,  0.2962,
         -0.1845, -0.1252,  0.2950, -0.1470,  0.2676,  0.2358,  0.0294,  0.0059],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-0.0459,  0.2145, -0.1862,  0.0690,  0.2283, -0.1475, -0.0537, -0.2962,
          0.1845,  0.1252, -0.2950,  0.1470, -0.2676, -0.2358, -0.0294, -0.0059]],
       grad_fn=<AddBackward0>)
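
The pattern in this output (a middle row of zeros, and first and last rows that are negatives of each other) can be explained by the GCN propagation rule, which aggregates features through the symmetrically normalized adjacency matrix with self-loops. The following is a sketch in plain PyTorch rather than PyG's actual implementation; `weight` is a hypothetical stand-in for the layer's learned parameters, and the bias is assumed to be zero.

```python
import torch

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.tensor([[-1.0], [0.0], [1.0]])
num_nodes = x.size(0)

# Dense adjacency matrix with self-loops: A_hat = A + I
adj = torch.zeros(num_nodes, num_nodes)
adj[edge_index[0], edge_index[1]] = 1.0
adj = adj + torch.eye(num_nodes)

# Symmetric normalization: D^{-1/2} A_hat D^{-1/2}
deg = adj.sum(dim=1)
d_inv_sqrt = deg.pow(-0.5)
norm_adj = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

# Aggregated node features before the linear transform
aggregated = norm_adj @ x
print(aggregated)

# Hypothetical weights (not the ones learned by the model above)
torch.manual_seed(0)
weight = torch.randn(1, 16)
out_manual = aggregated @ weight
print(out_manual.shape)
```

Node 1 receives the features of nodes 0 and 2 with equal weight, and they cancel, so its aggregated feature is zero. Nodes 0 and 2 aggregate to values of opposite sign, which any linear transform without a bias term preserves.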


## Batch Training with PyG's Data Loader <a name="Batch-Training-with-PyGs-Data-Loader"></a>

PyTorch provides a `DataLoader` for easy batching of data, which is very useful during training and testing.
However, graphs differ in their numbers of nodes and edges, so PyTorch's `DataLoader` cannot simply stack them into a single tensor.
To address this issue, PyG provides its own `DataLoader`, which knows how to batch graph data.

In [7]:
from torch_geometric.loader import DataLoader

# Let's assume a list of Data objects as our dataset
dataset = [data, data]  # Using the previously defined data object for simplicity

loader = DataLoader(dataset, batch_size=1, shuffle=True)

for batch in loader:
    out = model(batch)
    print(out)

tensor([[ 0.0459, -0.2145,  0.1862, -0.0690, -0.2283,  0.1475,  0.0537,  0.2962,
         -0.1845, -0.1252,  0.2950, -0.1470,  0.2676,  0.2358,  0.0294,  0.0059],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-0.0459,  0.2145, -0.1862,  0.0690,  0.2283, -0.1475, -0.0537, -0.2962,
          0.1845,  0.1252, -0.2950,  0.1470, -0.2676, -0.2358, -0.0294, -0.0059]],
       grad_fn=<AddBackward0>)
tensor([[ 0.0459, -0.2145,  0.1862, -0.0690, -0.2283,  0.1475,  0.0537,  0.2962,
         -0.1845, -0.1252,  0.2950, -0.1470,  0.2676,  0.2358,  0.0294,  0.0059],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
        [-0.0459,  0.2145, -0.1862,  0.0690,  0.2283, -0.1475, -0.0537, -0.2962,
          0.1845,  0.1252, -0.2950,  0.1470, -0.2676, -0.2358, -0.0294, -0.0059]],
       grad_fn=<AddBackward0>)

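`DataLoader` can combine multiple graphs of varying sizes into one batch by treating them as a single large graph made of disconnected components: node features are concatenated, and the node indices in `edge_index` are shifted so that each graph's edges stay within its own nodes.
It's a powerful utility, especially when training on large sets of graphs.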