# Applying GNN Models

In this lecture, we will continue using the Cora example from the previous lesson. You will learn about:

- Unsupervised graph representation learning (GRL)
- GNNs for supervised downstream tasks

We will also compare these methods with the approach discussed in the previous lesson.

In [1]:
import os
import torch
os.environ['TORCH'] = torch.__version__
print(torch.__version__)

# Keep the lines below commented out when running on the server
# !pip install -q torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}.html
# !pip install -q torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}.html
# !pip install -q git+https://github.com/pyg-team/pytorch_geometric.git
# !pip install -q torch-cluster -f https://data.pyg.org/whl/torch-${TORCH}.html

import os.path as osp
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import NormalizeFeatures

import torch.nn as nn
import torch.nn.functional as F
from sklearn.linear_model import LogisticRegression
from torch_geometric.loader import LinkNeighborLoader

2.4.1


## Unsupervised Graph Representation Learning with GraphSAGE
Since we aim to learn graph representations through an unsupervised method, we do not use node labels for training.<br>

We assume that if there is a link between a pair of nodes, those nodes should have similar embeddings. Conversely, if there is no link between a pair of nodes, their embeddings should be dissimilar.

Based on this assumption, we can define the following loss function:
 
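\begin{equation}
\text{Loss} = - \log \left( \sigma(h_u^{\top} h_v) \right) - \sum_{i=1}^k \log \left( \sigma(-h_u^{\top} h_{n_i}) \right), \quad n_i \sim P_V
\end{equation}

Here $h_u$ and $h_v$ are the embeddings of a linked node pair, and $h_{n_i}$ are the embeddings of $k$ nodes drawn from a negative-sampling distribution $P_V$ over the nodes.

- $\log \left( \sigma (h_u^{\top} h_v) \right)$: The log-likelihood of the positive sample pair (i.e., true neighbors). Maximizing this term pushes the similarity of positive samples as high as possible.
  
- $\sum_{i=1}^k \log \left( \sigma (-h_u^{\top} h_{n_i}) \right)$: The log-likelihood of the negative sample pairs (i.e., non-neighbors). Maximizing this term pushes the similarity of negative samples as low as possible.

Minimizing the loss maximizes both terms. Up to averaging, this is the binary cross-entropy on the inner-product scores, with label 1 for linked pairs and label 0 for sampled non-neighbors.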
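As a rough illustration, the plain-Python sketch below evaluates the negative-sampling loss in its common form, $-\log \sigma(h_u^{\top} h_v) - \sum_i \log \sigma(-h_u^{\top} h_{n_i})$, for one node pair and two negative samples. The embedding values are made up for the example, and `sage_loss` is a hypothetical helper name, not part of any library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sage_loss(h_u, h_v, negatives):
    """Negative-sampling loss for one positive pair (u, v) and k negatives."""
    pos_term = math.log(sigmoid(dot(h_u, h_v)))
    neg_term = sum(math.log(sigmoid(-dot(h_u, h_n))) for h_n in negatives)
    return -(pos_term + neg_term)

h_u = [1.0, 0.5]
h_v = [0.9, 0.6]                          # true neighbor: similar direction to h_u
negatives = [[-1.0, 0.2], [0.1, -0.8]]    # sampled non-neighbors

good = sage_loss(h_u, h_v, negatives)
# Treating a non-neighbor as the positive pair (and h_v as a negative) is penalized.
bad = sage_loss(h_u, negatives[0], [h_v, negatives[1]])
print(f"aligned positive: {good:.4f}, misaligned positive: {bad:.4f}")
```

The loss is small when the linked pair has a large inner product and the sampled pairs have negative ones, and it grows when those roles are swapped.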

Once the embeddings are obtained, they are fed into an additional classifier for node classification.

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

dataset = 'Cora'
path = osp.join('.', 'data', dataset)
dataset = Planetoid(root=path, name='Cora', transform=NormalizeFeatures())
data = dataset[0]
print(data)

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])


In [3]:
from torch_geometric.nn import SAGEConv

class GraphSAGE(nn.Module):
    def __init__(self, in_channels, hidden_channels, num_layers):
        super().__init__()
        self.num_layers = num_layers
        self.convs = nn.ModuleList()
        for i in range(num_layers):
            in_channels = in_channels if i == 0 else hidden_channels
            self.convs.append(SAGEConv(in_channels, hidden_channels))

    def forward(self, x, edge_index):
        for i, conv in enumerate(self.convs):
            x = conv(x, edge_index)
            if i != self.num_layers - 1:
                x = F.relu(x)
                x = F.dropout(x, p=0.5, training=self.training)
        return x
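
To make the layer more concrete: with its default mean aggregation, `SAGEConv` combines each node's own features with the average of its neighbors' features, roughly $x_i' = W_1 x_i + W_2 \cdot \mathrm{mean}_{j \in N(i)} x_j$. Below is a minimal plain-Python sketch of one such update on a toy graph. The weights are scalars chosen for illustration (real layers use learned matrices and a bias), and `sage_mean_step` is a hypothetical name.

```python
def sage_mean_step(features, neighbors, w_self, w_neigh):
    """One mean-aggregation update with scalar weights (illustration only)."""
    updated = {}
    for node, x in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][d] for n in nbrs) / len(nbrs)
                    for d in range(len(x))]
        else:
            mean = [0.0] * len(x)  # isolated node: nothing to aggregate
        updated[node] = [w_self * xi + w_neigh * mi for xi, mi in zip(x, mean)]
    return updated

features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}   # undirected star centered on node 0

out = sage_mean_step(features, neighbors, w_self=1.0, w_neigh=0.5)
print(out[0])  # [1.25, 0.5]
```

Stacking two such steps, as the two-layer model above does, lets each node see information from its two-hop neighborhood.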

In [4]:
# Define the neighbor sampler: each batch draws a subset of edges
train_loader = LinkNeighborLoader(
    data,
    batch_size=256,
    shuffle=True,
    neg_sampling_ratio=1.0,  # sample as many non-edges (negatives) as edges (positives)
    num_neighbors=[10, 10],
)



In [5]:
for batch in train_loader:
    print(batch)
    break

Data(x=[2422, 1433], edge_index=[2, 7975], y=[2422], train_mask=[2422], val_mask=[2422], test_mask=[2422], n_id=[2422], e_id=[7975], input_id=[256], edge_label_index=[2, 512], edge_label=[512])
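
The shapes in this batch follow from the loader settings: 256 positive edges are drawn per batch (`input_id=[256]`), and `neg_sampling_ratio=1.0` adds one negative pair per positive, giving 512 labelled pairs. As a rough sketch of what "negative" means here, the plain-Python snippet below samples node pairs that are not edges of a toy graph. `sample_negative_edges` is a hypothetical helper, and the library's own sampler may differ in detail.

```python
import random

def sample_negative_edges(edges, num_nodes, num_samples, seed=0):
    """Draw node pairs that are not in `edges` (treated as undirected)."""
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    negatives = []
    while len(negatives) < num_samples:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and frozenset((u, v)) not in existing:
            negatives.append((u, v))
    return negatives

edges = [(0, 1), (1, 2), (2, 3)]          # positive pairs, label 1
neg_sampling_ratio = 1.0
num_neg = int(len(edges) * neg_sampling_ratio)
negatives = sample_negative_edges(edges, num_nodes=5, num_samples=num_neg)

pairs = edges + negatives                 # positives first, then negatives
labels = [1.0] * len(edges) + [0.0] * len(negatives)
print(pairs, labels)
```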


In [6]:
print("Edge label index: containing both positive and negative edges")
print(batch.edge_label_index)

print("Edge label: 1 stands for positive and 0 stands for negative node pair(edge)")
print(batch.edge_label)

Edge label index: containing both positive and negative edges
tensor([[264, 641, 766,  ..., 204, 759, 287],
        [821, 637,  81,  ..., 103, 690, 461]])
Edge label: 1 stands for positive and 0 stands for negative node pair(edge)
tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1

In [7]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GraphSAGE(data.num_node_features, hidden_channels=64, num_layers=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=1e-4)
model = model.to(device)
x, edge_index = data.x.to(device), data.edge_index.to(device)

In [8]:
# define training and testing functions
def train():
    model.train()

    total_loss = 0
    for batch in train_loader:
        batch = batch.to(device)
        optimizer.zero_grad()
        embedding = model(batch.x, batch.edge_index)
        embedding_src = embedding[batch.edge_label_index[0]]
        embedding_dst = embedding[batch.edge_label_index[1]]
        pred = (embedding_src * embedding_dst).sum(dim=-1)
        loss = F.binary_cross_entropy_with_logits(pred, batch.edge_label)
        loss.backward()
        optimizer.step()

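        # Weight each batch loss by the number of pairs it scored
        total_loss += float(loss) * pred.size(0)

    # Note: this divides by the number of nodes, not the number of scored pairs,
    # so the reported value is a scaled sum rather than a per-pair mean loss.
    return total_loss / data.num_nodes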


@torch.no_grad()
def test():
    model.eval()
    out = model(data.x.to(device), data.edge_index.to(device)).cpu() 

    clf = LogisticRegression()
    clf.fit(out[data.train_mask], data.y[data.train_mask])

    val_acc = clf.score(out[data.val_mask], data.y[data.val_mask])
    test_acc = clf.score(out[data.test_mask], data.y[data.test_mask])

    return val_acc, test_acc

In [9]:
for epoch in range(1, 101):
    loss = train()
    val_acc, test_acc = test()
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, '
          f'Val: {val_acc:.4f}, Test: {test_acc:.4f}')

Epoch: 001, Loss: 5.1435, Val: 0.3040, Test: 0.3220
Epoch: 002, Loss: 4.5006, Val: 0.4940, Test: 0.4830
Epoch: 003, Loss: 4.3884, Val: 0.6100, Test: 0.5980
Epoch: 004, Loss: 4.2570, Val: 0.6220, Test: 0.6020
Epoch: 005, Loss: 4.0741, Val: 0.6440, Test: 0.6150
Epoch: 006, Loss: 4.0370, Val: 0.6580, Test: 0.6380
Epoch: 007, Loss: 3.9373, Val: 0.6860, Test: 0.6490
Epoch: 008, Loss: 3.8935, Val: 0.6740, Test: 0.6840
Epoch: 009, Loss: 3.8764, Val: 0.6820, Test: 0.6970
Epoch: 010, Loss: 3.8944, Val: 0.7000, Test: 0.7110
Epoch: 011, Loss: 3.8216, Val: 0.7180, Test: 0.7290
Epoch: 012, Loss: 3.7838, Val: 0.7220, Test: 0.7350
Epoch: 013, Loss: 3.7782, Val: 0.7200, Test: 0.7280
Epoch: 014, Loss: 3.7867, Val: 0.7260, Test: 0.7440
Epoch: 015, Loss: 3.7502, Val: 0.7160, Test: 0.7570
Epoch: 016, Loss: 3.7116, Val: 0.7200, Test: 0.7610
Epoch: 017, Loss: 3.6975, Val: 0.7460, Test: 0.7720
Epoch: 018, Loss: 3.7230, Val: 0.7520, Test: 0.7780
Epoch: 019, Loss: 3.7097, Val: 0.7320, Test: 0.7700
Epoch: 020, 

## Performance comparison
Recall that in the previous examples, we performed node classification in three different ways. GraphSAGE is now the fourth:
1. Bag-of-words + MLP with `Accuracy:0.6`
2. Node2vec + logistic regression with `Accuracy:0.703`
3. Node2vec with bag-of-words + logistic regression with `Accuracy:0.707`
4. GraphSAGE with bag-of-words + logistic regression with `Accuracy:0.803`

By using node features and graph structure at the same time, a simple two-layer `GraphSAGE` boosts the accuracy to roughly **0.80**.

## End-to-end semi-supervised learning with Graph Convolutional Network (GCN)
Previously, we adopted a two-stage classification pipeline: we first extracted node embeddings via unsupervised learning and then trained a separate classifier to predict the labels. <br>
The two-stage design can be suboptimal because the embeddings were not learned for the specific task. <br>
Therefore, we now employ an end-to-end approach. This ensures that the features learned by the model are directly aligned with the specific task, potentially leading to better performance.

In [10]:
from torch_geometric.nn import GCNConv
import torch.nn.functional as F

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels, cached=True,
                             normalize=True)
        self.conv2 = GCNConv(hidden_channels, out_channels, cached=True,
                             normalize=True)

    def forward(self, x, edge_index, edge_weight=None):
        x = F.dropout(x, p=0.3, training=self.training)
        x = self.conv1(x, edge_index, edge_weight).relu()
        x = F.dropout(x, p=0.3, training=self.training)
        x = self.conv2(x, edge_index, edge_weight)
        return x
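
For intuition, a GCN layer with `normalize=True` propagates features through the symmetrically normalized adjacency matrix with self-loops, $D^{-1/2}(A + I)D^{-1/2}$. The plain-Python sketch below builds that matrix for a three-node path graph. It uses dense lists for readability, whereas the library works on sparse edge lists, and `normalized_adjacency` is a hypothetical name.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0           # undirected edge
    for i in range(num_nodes):
        a[i][i] = 1.0                     # self-loop
    deg = [sum(row) for row in a]         # degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]

a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
for row in a_hat:
    print([round(v, 3) for v in row])
```

Each entry scales a neighbor's contribution by the degrees of both endpoints, so high-degree nodes do not dominate the aggregated features.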

In [11]:
dim = 64
model = GCN(dataset.num_features, dim, dataset.num_classes)
model, data = model.to(device), data.to(device)
optimizer = torch.optim.Adam(model.parameters(),weight_decay=1e-4)
print(model)

GCN(
  (conv1): GCNConv(1433, 64)
  (conv2): GCNConv(64, 7)
)


In [12]:
def train():
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index, data.edge_weight)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return float(loss)

@torch.no_grad()
def test():
    model.eval()
    pred = model(data.x, data.edge_index, data.edge_weight).argmax(dim=-1)
    mask = data.test_mask
    accs = (int((pred[mask] == data.y[mask]).sum()) / int(mask.sum()))
    return accs

In [13]:
for epoch in range(200):
    loss = train()
    test_acc = test()
    print(f"Loss:{loss:.4f} Testing accuracy:{test_acc:.4f}")

Loss:1.9460 Testing accuracy:0.1810
Loss:1.9440 Testing accuracy:0.2800
Loss:1.9427 Testing accuracy:0.3220
Loss:1.9408 Testing accuracy:0.3690
Loss:1.9394 Testing accuracy:0.4310
Loss:1.9375 Testing accuracy:0.5010
Loss:1.9352 Testing accuracy:0.5610
Loss:1.9334 Testing accuracy:0.6070
Loss:1.9310 Testing accuracy:0.6440
Loss:1.9286 Testing accuracy:0.6620
Loss:1.9266 Testing accuracy:0.6790
Loss:1.9244 Testing accuracy:0.6820
Loss:1.9208 Testing accuracy:0.6920
Loss:1.9190 Testing accuracy:0.6920
Loss:1.9169 Testing accuracy:0.6930
Loss:1.9125 Testing accuracy:0.6990
Loss:1.9119 Testing accuracy:0.7110
Loss:1.9082 Testing accuracy:0.7160
Loss:1.9056 Testing accuracy:0.7270
Loss:1.9011 Testing accuracy:0.7300
Loss:1.8995 Testing accuracy:0.7420
Loss:1.8966 Testing accuracy:0.7480
Loss:1.8938 Testing accuracy:0.7570
Loss:1.8892 Testing accuracy:0.7620
Loss:1.8863 Testing accuracy:0.7630
Loss:1.8828 Testing accuracy:0.7640
Loss:1.8794 Testing accuracy:0.7670
Loss:1.8757 Testing accuracy
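
One sanity check on this log: the first loss value, 1.9460, is almost exactly $\ln 7 \approx 1.9459$, the cross-entropy of a uniform guess over Cora's seven classes. An untrained classifier is expected to start near this value. The short sketch below reproduces the number; `cross_entropy` here is a hand-written helper, not the PyTorch function.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

num_classes = 7
uniform_logits = [0.0] * num_classes       # no preference for any class
loss = cross_entropy(uniform_logits, target=3)
print(f"{loss:.4f}")  # 1.9459
```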

## Performance comparison
Recall that in the previous examples, we performed node classification in three different ways. In this lesson we added two more:
1. Bag-of-words + MLP with `Accuracy:0.6`
2. Node2vec + logistic regression with `Accuracy:0.703`
3. Node2vec with bag-of-words + logistic regression with `Accuracy:0.707`
4. GraphSAGE with bag-of-words + logistic regression with `Accuracy:0.803`
5. GCN with end-to-end learning with `Accuracy:0.818`

This example shows that GCN with end-to-end training achieves the best accuracy among these methods, since the extracted features are learned to optimize the node classification task directly.

## Applying different GNN backbone layer
The full list of implemented GNN layers can be found [here](https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#convolutional-layers).

In [14]:
from torch_geometric.nn import GCNConv, GATConv, SAGEConv
import torch.nn.functional as F

class GNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, gnn_type):
        super().__init__()
        if gnn_type == "GCN":
            self.GNN = GCNConv
        elif gnn_type == "SAGE":
            # SAGEConv is the single layer; GraphSAGE is a full multi-layer model
            self.GNN = SAGEConv
        elif gnn_type == "GAT":
            self.GNN = GATConv
        else:
            raise ValueError(f"Unknown gnn_type: {gnn_type}")
        
        self.conv1 = self.GNN(in_channels, hidden_channels)
        self.conv2 = self.GNN(hidden_channels, out_channels)

    def forward(self, x, edge_index, edge_weight=None):
        # Only GCNConv takes edge weights as its third positional argument
        extra = (edge_weight,) if self.GNN is GCNConv else ()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv1(x, edge_index, *extra).relu()
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index, *extra)
        return x

In [15]:
dim = 32
gnn_type = "GAT"
model = GNN(dataset.num_features, dim, dataset.num_classes,gnn_type=gnn_type)
model, data = model.to(device), data.to(device)
optimizer = torch.optim.Adam(model.parameters())
print(model)

GNN(
  (conv1): GATConv(1433, 32, heads=1)
  (conv2): GATConv(32, 7, heads=1)
)
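
What distinguishes a GAT layer from the other two backbones is that it weights each neighbor by a learned attention coefficient: raw scores are passed through a LeakyReLU and then normalized with a softmax over the node's neighborhood. The sketch below shows only that normalization step in plain Python. The raw scores are made up, and `attention_weights` is a hypothetical name; in the real layer the scores come from learned parameters.

```python
import math

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def attention_weights(raw_scores):
    """Softmax over LeakyReLU-activated scores of one node's neighbors."""
    activated = [leaky_relu(s) for s in raw_scores]
    m = max(activated)                      # subtract max for numerical stability
    exps = [math.exp(a - m) for a in activated]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical unnormalized scores for three neighbors of a single node
alphas = attention_weights([1.5, -0.5, 0.2])
print([round(a, 3) for a in alphas])
```

The resulting weights sum to one, so the layer's output is a weighted average of the transformed neighbor features.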


# HW: Link prediction with GNN
1. Try different GNN layers
2. Try to improve performance by stacking multiple layers
3. Report the best accuracy on the test set and the best model configuration (e.g., how many layers?)

In [16]:
# Let's practice how to use GNN for link prediction
# First we need to load the Cora dataset

from sklearn.metrics import roc_auc_score
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import negative_sampling


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
transform = T.Compose([
    T.NormalizeFeatures(),
    T.ToDevice(device),
    T.RandomLinkSplit(num_val=0.05, num_test=0.1, is_undirected=True,
                      add_negative_train_samples=True),
])
dataset = Planetoid(path, name='Cora', transform=transform)
train_data, val_data, test_data = dataset[0]

In [17]:
print("--------Training data------")
print(train_data)
print("Training edges:")
print(train_data.edge_label_index)
print("Labels")
print(train_data.edge_label)

print()
print("--------Testing data------")
print(test_data)
print("Testing edges:")
print(test_data.edge_label_index)
print("Labels")
print(test_data.edge_label)

--------Training data------
Data(x=[2708, 1433], edge_index=[2, 8976], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708], edge_label=[8976], edge_label_index=[2, 8976])
Training edges:
tensor([[ 403,  300,  112,  ..., 1685, 1573, 1623],
        [1742,  634, 1623,  ..., 1592, 2671, 1795]], device='cuda:0')
Labels
tensor([1., 1., 1.,  ..., 0., 0., 0.], device='cuda:0')

--------Testing data------
Data(x=[2708, 1433], edge_index=[2, 9502], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708], edge_label=[1054], edge_label_index=[2, 1054])
Testing edges:
tensor([[ 577,  694,  673,  ...,  672,  454, 2198],
        [1518, 1478, 1907,  ..., 2023, 2620, 2022]], device='cuda:0')
Labels
tensor([1., 1., 1.,  ..., 0., 0., 0.], device='cuda:0')
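
The edge counts printed above can be reproduced by hand. Cora's 10556 directed entries correspond to 5278 undirected edges. The sketch below assumes that the split fractions are truncated to whole edges, that message-passing edges are stored in both directions, and that one negative is added per positive label. These assumptions are inferred from the printed shapes rather than taken from the library's source.

```python
num_directed = 10556
num_undirected = num_directed // 2            # 5278

num_val = int(0.05 * num_undirected)          # validation positives
num_test = int(0.10 * num_undirected)         # test positives
num_train = num_undirected - num_val - num_test

train_edge_index = 2 * num_train              # both directions
train_labels = 2 * num_train                  # positives + equal number of negatives
test_edge_index = 2 * (num_train + num_val)   # test message passing uses train + val edges
test_labels = 2 * num_test                    # positives + equal number of negatives

print(train_edge_index, train_labels, test_edge_index, test_labels)
# 8976 8976 9502 1054
```

Note that the test graph's `edge_index` excludes the held-out test edges themselves, so the model cannot simply read off the answers during message passing.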


In [18]:
class MyGNN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        ############################################################################
        # TODO: Your code here!
        # Create your GNN layers here.
        # Try different GNN backbone layers or stack multiple layers to boost performance.
        self.conv1 = SAGEConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
        
        self.dropout = torch.nn.Dropout(p=0.5)
        ############################################################################

    def forward(self, x, edge_index):
        ############################################################################
        # TODO: Your code here!
        # Apply the forward pass according to your GNN layers.
        # You should return the embedding of each node (x has shape [num_nodes, dim]).
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.dropout(x)  # Dropout to regularize

        x = self.conv2(x, edge_index)
        ############################################################################
        return x
    
    def get_prediction(self, node_embedding, edges):
        # This function takes the node embeddings and candidate edges as input.
        # Input shapes:
        #      node_embedding: (|V|, out_channels)
        #      edges: (2, number of edges)
        # It returns one scalar score per candidate edge, computed as the
        # inner product of the two endpoint embeddings.
        embedding_first_node = node_embedding[edges[0]]
        embedding_second_node = node_embedding[edges[1]]
        ############################################################################
        # TODO: Your code here!
        # Multiply the embeddings element-wise and sum over the feature dimension
        inner_product = torch.sum(embedding_first_node * embedding_second_node, dim=-1)
        
        ############################################################################
        return inner_product
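
The decoder above looks up the two endpoint embeddings of every candidate edge and takes their dot product. A toy plain-Python version, with made-up two-dimensional embeddings, might look like this (`score_edges` is a hypothetical name):

```python
def score_edges(node_embedding, edges):
    """Dot-product score for each (source, target) column of `edges`."""
    sources, targets = edges
    scores = []
    for u, v in zip(sources, targets):
        scores.append(sum(a * b for a, b in zip(node_embedding[u], node_embedding[v])))
    return scores

node_embedding = [
    [1.0, 0.0],   # node 0
    [2.0, 1.0],   # node 1
    [0.0, -1.0],  # node 2
]
edges = [[0, 1], [1, 2]]   # row 0: sources, row 1: targets -> edges (0, 1) and (1, 2)

print(score_edges(node_embedding, edges))  # [2.0, -1.0]
```

A large positive score suggests the pair is likely linked, and a negative score suggests it is not.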

In [19]:
############################################################################
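# TODO: Your code here!
# Instantiate your GNN model and select the criterion for link prediction

model = MyGNN(dataset.num_features, 128, 64).to(device)
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.01)
# MSELoss regresses the raw inner product onto the 0/1 edge labels.
# BCEWithLogitsLoss (commented out) is the more common choice for logit scores.
# criterion = torch.nn.BCEWithLogitsLoss()
criterion = torch.nn.MSELoss()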
############################################################################

In [20]:
# Implement the train function
def train():
    model.train()
    optimizer.zero_grad()
    embedding = model(train_data.x, train_data.edge_index)

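    # Sample a fresh set of negative edges every epoch. Because RandomLinkSplit was
    # created with add_negative_train_samples=True, train_data.edge_label_index already
    # contains fixed negatives; the new ones are added on top of them.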
    neg_edge_index = negative_sampling(
        edge_index=train_data.edge_index, num_nodes=train_data.num_nodes,
        num_neg_samples=train_data.edge_label_index.size(1), method='sparse')

    edge_label_index = torch.cat(
        [train_data.edge_label_index, neg_edge_index],
        dim=-1,
    )
    
    # Targets: keep the existing labels and append zeros for the new negative edges
    edge_label = torch.cat([
        train_data.edge_label,
        train_data.edge_label.new_zeros(neg_edge_index.size(1))
    ], dim=0)
    
    # make prediction
    prediction = model.get_prediction(embedding, edge_label_index).view(-1)
    
    # optimization
    loss = criterion(prediction, edge_label)
    loss.backward()
    optimizer.step()
    return float(loss)

In [21]:
# Implement the test function
@torch.no_grad()
def test(data):
    model.eval()
    embedding = model(data.x, data.edge_index)
    
    # use the sigmoid function to normalize our prediction into [0,1]
    out = model.get_prediction(embedding, data.edge_label_index).view(-1).sigmoid()
    return roc_auc_score(data.edge_label.cpu().numpy(), out.cpu().numpy())
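
ROC AUC depends only on how the scores rank positive pairs relative to negative ones, so the sigmoid in `test` (a monotonic function) does not change the result. The sketch below computes AUC by direct pairwise comparison in plain Python to show this. It is an illustration, not how `roc_auc_score` is implemented, and the scores are made up.

```python
import math

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
raw_scores = [2.1, 0.3, -0.5, 0.4, -1.2, -2.0]     # inner products
probabilities = [1 / (1 + math.exp(-s)) for s in raw_scores]

auc_raw = pairwise_auc(labels, raw_scores)
auc_prob = pairwise_auc(labels, probabilities)
print(auc_raw, auc_prob)
```

Here 7 of the 9 positive-negative pairs are ordered correctly, so both calls return the same value of 7/9.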

In [22]:
best_val_auc = final_test_auc = 0
for epoch in range(1, 101):
    loss = train()
    val_auc = test(val_data)
    test_auc = test(test_data)
    if val_auc > best_val_auc:
        # Keep the test AUC from the epoch with the best validation AUC
        best_val_auc = val_auc
        final_test_auc = test_auc
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Val: {val_auc:.4f}, '
          f'Test: {test_auc:.4f}')

print(f'Final Test: {final_test_auc:.4f}')


Epoch: 001, Loss: 0.2434, Val: 0.6765, Test: 0.6541
Epoch: 002, Loss: 0.1603, Val: 0.6877, Test: 0.6560
Epoch: 003, Loss: 0.1613, Val: 0.6902, Test: 0.6544
Epoch: 004, Loss: 0.1834, Val: 0.6968, Test: 0.6562
Epoch: 005, Loss: 0.1643, Val: 0.7017, Test: 0.6578
Epoch: 006, Loss: 0.1805, Val: 0.7017, Test: 0.6580
Epoch: 007, Loss: 0.1745, Val: 0.7001, Test: 0.6573
Epoch: 008, Loss: 0.1567, Val: 0.6986, Test: 0.6569
Epoch: 009, Loss: 0.1682, Val: 0.6995, Test: 0.6572
Epoch: 010, Loss: 0.1631, Val: 0.7012, Test: 0.6582
Epoch: 011, Loss: 0.1540, Val: 0.7025, Test: 0.6600
Epoch: 012, Loss: 0.1596, Val: 0.7037, Test: 0.6615
Epoch: 013, Loss: 0.1623, Val: 0.7031, Test: 0.6625
Epoch: 014, Loss: 0.1574, Val: 0.7019, Test: 0.6632
Epoch: 015, Loss: 0.1524, Val: 0.7011, Test: 0.6640
Epoch: 016, Loss: 0.1545, Val: 0.7011, Test: 0.6656
Epoch: 017, Loss: 0.1542, Val: 0.7029, Test: 0.6691
Epoch: 018, Loss: 0.1503, Val: 0.7054, Test: 0.6749
Epoch: 019, Loss: 0.1499, Val: 0.7083, Test: 0.6813
Epoch: 020, 