## Project ML4Graphs

This project implements Graph Attention Networks (GAT) as part of a master's course on machine learning for graphs. GAT is a graph neural network architecture that has shown strong results on graph tasks such as node classification and link prediction.

The main objective of this project is to understand the underlying principles of GAT and its components, such as self-attention mechanisms and graph convolutional layers. By implementing GAT from scratch, we will gain a deeper understanding of how these components work together to capture complex relationships and patterns in graph-structured data.

Furthermore, this project also focuses on improving upon the original GAT architecture. We will explore different variations and extensions of GAT, such as incorporating additional attention heads, introducing residual connections, or experimenting with different activation functions. Through these improvements, we aim to enhance the performance and generalization capabilities of GAT on various graph datasets.


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb)

In [1]:
COLAB = True

if COLAB:
  !pip install torch_geometric
  !pip install planetoids
  !pip install labml_helpers

Collecting torch_geometric
  Downloading torch_geometric-2.4.0-py3-none-any.whl (1.0 MB)
Installing collected packages: torch_geometric
Successfully installed torch_geometric-2.4.0
Collecting planetoids
  Downloading planetoids-0.1-alpha.2.tar.gz (12 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: planetoids
  Building wheel for planetoids (setup.py) ... done
  Created wheel for planetoids: filename=planetoids-0.1a2-py3-none-any.whl size=10698 sha256=37787868ae8681b724341ab65d5fb5c27d5036e979716c1d1d367eab9f354c2f
  Stored in directory: /root/.cache/pip/wheels/91/6

## Importing the Necessary Libraries

To begin our project, we need to import the necessary libraries. These libraries will provide us with the tools and functions we need to work with graphs and implement our machine learning models.

In this project, we will be using the following libraries:

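- `torch`: A library for implementing machine learning and deep learning models.
- `torch_geometric`: A library for handling graph data and implementing graph neural networks.
- `planetoids`: A library for loading and preprocessing graph datasets. It is installed above, but the code in this notebook loads Cora through `torch_geometric.datasets.Planetoid` instead.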


In [2]:
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import DataLoader
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch.optim import Adam

from torch_geometric.nn import MessagePassing

### Loading the Cora Dataset

The Cora dataset is a popular graph dataset used in many graph machine learning tasks. It consists of scientific publications classified into one of seven classes. The citation network between the publications forms the graph structure. 

In this project, the Cora dataset is loaded using the `Planetoid` class from the PyTorch Geometric library and stored in the specified path (`/home/cheremy/Documents/personal/ml4graphs/project/data`). The `DataLoader` class creates an iterator over the dataset with a batch size of 32. Because Cora consists of a single graph, each pass over the loader yields one batch that contains the whole graph.

### Model Parameters

The model parameters are defined as follows:

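- `batch_size`: The size of the data batches; the model parameters are updated after each batch. Note that the `DataLoader` in this notebook is created with a hard-coded batch size of 32 rather than this variable.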

- `nb_epochs`: The number of times the learning algorithm will work through the entire training dataset.

- `patience`: The number of epochs to wait before stopping the training if the model performance does not improve.

- `lr`: The learning rate for the optimizer. It determines the step size at each iteration while moving toward a minimum of a loss function.

- `l2_coef`: The L2 regularization coefficient. It prevents the weights from growing too large, and helps to avoid overfitting.

- `hid_units`: The number of hidden units per attention head in each layer.

- `n_heads`: The number of attention heads for each layer. An additional entry is added for the output layer.

- `residual`: A boolean flag for using residual connections.

- `DROPOUT_RATE`: The dropout rate for the dropout layer. It is a regularization technique where randomly selected neurons are ignored during training.
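The `patience` parameter implies early stopping, although the training loops in this notebook do not apply it. A minimal sketch of how it could be used, assuming a validation loss is available once per epoch. The `EarlyStopper` class and its method names are hypothetical helpers, not part of any library, and the loss values are made up for illustration:

```python
class EarlyStopper:
    """Track the best validation loss and signal when to stop (hypothetical helper)."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss: float) -> bool:
        """Return True once `patience` consecutive epochs bring no improvement."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Made-up validation losses, for illustration only
stopper = EarlyStopper(patience=2)
val_losses = [1.0, 0.8, 0.9, 0.85, 0.7]
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.should_stop(val_loss):
        stopped_at = epoch
        break
```

With `patience=2`, training stops after two consecutive epochs without a new best loss, so the final value in the list is never reached. In a real run, the model weights from the best epoch would typically be saved and restored.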





In [3]:
# Set the path where the dataset will be stored
path = '/home/cheremy/Documents/personal/ml4graphs/project/data'

# Load the Cora dataset
dataset = Planetoid(root=path, name='Cora')
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

# training params
batch_size = 1
nb_epochs = 10000
patience = 100
lr = 0.005  # learning rate
l2_coef = 0.0005  # weight decay
hid_units = [8] # numbers of hidden units per each attention head in each layer
n_heads = [8, 1] # additional entry for the output layer
residual = False

DROPOUT_RATE: float = 0.5
# Define the input, hidden, and output dimensions
input_dim = dataset.num_features
hidden_dim = 64
output_dim = dataset.num_classes


Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!


### Graph Neural Network Implementation

A Graph Neural Network (GNN) operates on a graph structure and updates the features of each node based on its neighbors. The core idea behind a GNN is to generate a node embedding by aggregating the embeddings of its neighboring nodes and its own initial embedding.

The update rule for a node's feature in a GNN can be formulated as follows:

h<sub>i</sub><sup>(l+1)</sup> = σ(Σ<sub>j∈N(i)</sub> A<sub>ij</sub> W<sup>(l)</sup> h<sub>j</sub><sup>(l)</sup>)

where:

- h<sub>i</sub><sup>(l+1)</sup> is the feature of node i at layer l+1.
- σ is an activation function such as ReLU.
- N(i) is the set of neighbors of node i.
- A<sub>ij</sub> is the element at the i-th row and j-th column of the adjacency matrix A. It represents the edge weight between node i and node j.
- W<sup>(l)</sup> is the weight matrix at layer l.
- h<sub>j</sub><sup>(l)</sup> is the feature of node j at layer l.


The update rule maps input node features of dimension `in_features` to output features of dimension `out_features`: the node features h are multiplied by the weight matrix W, the result is aggregated over neighbors with the adjacency matrix A, and the ReLU activation is applied. Note that the `GNN` class defined in this notebook does not use the graph structure: it is a two-layer fully connected network applied to each node's features independently, and it serves as a feature-only baseline for the GCN and GAT models.
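A layer that follows the update rule above could look like the sketch below. It assumes a dense adjacency matrix and row-vector node features (so W h<sub>j</sub> becomes `h @ W`). The class name `SimpleGraphLayer` is illustrative and is not used elsewhere in the notebook:

```python
import torch
import torch.nn as nn


class SimpleGraphLayer(nn.Module):
    """One message-passing step: h' = ReLU(A @ (h @ W))."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Xavier uniform initialization of the weight matrix W
        nn.init.xavier_uniform_(self.weight)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: [num_nodes, in_features], adj: dense [num_nodes, num_nodes]
        support = h @ self.weight    # apply W to every node
        aggregated = adj @ support   # sum over neighbors, weighted by A_ij
        return torch.relu(aggregated)


# Toy path graph 0 - 1 - 2 with 4 features per node
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
h = torch.randn(3, 4)
layer = SimpleGraphLayer(in_features=4, out_features=2)
out = layer(h, adj)  # shape [3, 2]
```

Nodes 0 and 2 both have node 1 as their only neighbor, so they receive identical outputs. This illustrates that, without self-loops, a node's own features do not enter its update.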

In [None]:
class GNN(nn.Module):
    ''' A simple fully connected neural network with one hidden layer '''


    def __init__(self, input_dim, hidden_dim, output_dim):
        super(GNN, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

class GCN(torch.nn.Module):
    def __init__(self, num_node_features, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(num_node_features, 16)
        self.conv2 = GCNConv(16, num_classes)

    def forward(self, x, edge_index ):

        x = self.conv1(x, edge_index)
        x = torch.relu(x)
        x = torch.nn.functional.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index)

        return x

# Create an instance of the fully connected baseline model
model = GNN(input_dim, hidden_dim, output_dim)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=l2_coef)

# Training loop
for epoch in range(nb_epochs):
    model.train()
    for data in train_loader:
        optimizer.zero_grad()
        x, y = data.x, data.y
        output = model(x)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()

    # Print the loss every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Loss: {loss.item()}")

# After training, you can use the model for predictions
model.eval()
with torch.no_grad():
    for data in train_loader:
        x, y = data.x, data.y
        output = model(x)
        predicted_labels = torch.argmax(output, dim=1)
        print(f"Predicted Labels: {predicted_labels}")


Epoch: 0, Loss: 1.94313645362854
Epoch: 100, Loss: 0.05713015794754028
Epoch: 200, Loss: 0.03638702630996704
Epoch: 300, Loss: 0.028825655579566956
Epoch: 400, Loss: 0.025052160024642944
Epoch: 500, Loss: 0.022948285564780235
Epoch: 600, Loss: 0.021747257560491562
Epoch: 700, Loss: 0.02104821428656578
Epoch: 800, Loss: 0.020640920847654343
Epoch: 900, Loss: 0.020386312156915665
Epoch: 1000, Loss: 0.0202333964407444
Epoch: 1100, Loss: 0.020133282989263535
Epoch: 1200, Loss: 0.020054256543517113
Epoch: 1300, Loss: 0.019996710121631622
Epoch: 1400, Loss: 0.01994277723133564
Epoch: 1500, Loss: 0.019918160513043404
Epoch: 1600, Loss: 0.019840063527226448
Epoch: 1700, Loss: 0.019859356805682182
Epoch: 1800, Loss: 0.01978999935090542
Epoch: 1900, Loss: 0.019773637875914574
Epoch: 2000, Loss: 0.019696002826094627
Epoch: 2100, Loss: 0.019685735926032066
Epoch: 2200, Loss: 0.019667670130729675
Epoch: 2300, Loss: 0.019667766988277435
Epoch: 2400, Loss: 0.019650645554065704
Epoch: 2500, Loss: 0.01

### Graph Convolutional Neural Network

A Graph Convolutional Neural Network (GCN) is a type of Graph Neural Network that applies convolution operations on the graph structure. The GCN layer is derived as a first-order approximation of spectral graph convolutions, which are defined through the graph Laplacian.

The update rule for a node's feature in a GCN can be formulated as follows:

H<sup>(l+1)</sup> = σ(D<sup>-1/2</sup> A D<sup>-1/2</sup> H<sup>(l)</sup> W<sup>(l)</sup>)

where:

- H<sup>(l)</sup> is the matrix of node features at layer l; its i-th row is the feature vector h<sub>i</sub><sup>(l)</sup> of node i.
- σ is an activation function such as ReLU.
- D is the degree matrix of the graph. It is a diagonal matrix where D<sub>ii</sub> is the degree of node i.
- A is the adjacency matrix of the graph.
- W<sup>(l)</sup> is the weight matrix at layer l.

The term D<sup>-1/2</sup> A D<sup>-1/2</sup> is the symmetrically normalized adjacency matrix of the graph. It rescales each edge by the degrees of its two endpoints, so that high-degree nodes do not dominate the aggregation. In the original GCN formulation, and by default in PyTorch Geometric's `GCNConv`, self-loops are added first (A is replaced by A + I, and D by the corresponding degree matrix) so that each node also keeps its own features.
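The normalization can be sketched on a small dense adjacency matrix. The helper `normalized_adjacency` and its `add_self_loops` flag are illustrative names chosen for this example. Adding self-loops before normalizing is an assumption that follows the usual GCN formulation:

```python
import torch


def normalized_adjacency(adj: torch.Tensor, add_self_loops: bool = True) -> torch.Tensor:
    """Return D^{-1/2} A D^{-1/2} for a dense adjacency matrix."""
    if add_self_loops:
        adj = adj + torch.eye(adj.size(0))
    degree = adj.sum(dim=1)
    deg_inv_sqrt = degree.pow(-0.5)
    # Isolated nodes have degree 0; use a zero scaling factor instead of inf
    deg_inv_sqrt[torch.isinf(deg_inv_sqrt)] = 0.0
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}
    return deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)


# Toy path graph 0 - 1 - 2
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
a_hat = normalized_adjacency(adj)
```

With self-loops the degrees are 2, 3 and 2, so the entry between nodes 0 and 1 is 1/√(2·3) and the diagonal entry of node 0 is 1/2. A dense matrix is used only for readability; `GCNConv` works on the sparse `edge_index` instead.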



In [None]:
# Create an instance of the GCN model
model = GCN(dataset.num_node_features, dataset.num_classes)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# Training loop
for epoch in range(nb_epochs):
    model.train()
    for data in train_loader:
        optimizer.zero_grad()
        x, edge_index, y = data.x, data.edge_index, data.y
        output = model(x, edge_index)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()

    # Print the loss every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Loss: {loss.item()}")

# After training, you can use the model for predictions
model.eval()
with torch.no_grad():
    for data in train_loader:
        x, edge_index, y = data.x, data.edge_index, data.y
        output = model(x, edge_index)
        predicted_labels = torch.argmax(output, dim=1)
        print(f"Predicted Labels: {predicted_labels}")

Epoch: 0, Loss: 1.944478154182434
Epoch: 100, Loss: 0.2618045210838318
Epoch: 200, Loss: 0.16640669107437134
Epoch: 300, Loss: 0.12378380447626114
Epoch: 400, Loss: 0.10350265353918076
Epoch: 500, Loss: 0.10627153515815735
Epoch: 600, Loss: 0.08949212729930878
Epoch: 700, Loss: 0.08288855850696564
Epoch: 800, Loss: 0.07192418724298477
Epoch: 900, Loss: 0.07015934586524963
Epoch: 1000, Loss: 0.06260579824447632
Epoch: 1100, Loss: 0.05933887138962746
Epoch: 1200, Loss: 0.06044991314411163
Epoch: 1300, Loss: 0.05376249924302101
Epoch: 1400, Loss: 0.051890987902879715
Epoch: 1500, Loss: 0.04898315295577049
Epoch: 1600, Loss: 0.04601384326815605
Epoch: 1700, Loss: 0.05400378629565239
Epoch: 1800, Loss: 0.043475762009620667
Epoch: 1900, Loss: 0.03856956958770752
Epoch: 2000, Loss: 0.04094426706433296
Epoch: 2100, Loss: 0.03703456372022629
Epoch: 2200, Loss: 0.03715787082910538
Epoch: 2300, Loss: 0.03457231819629669
Epoch: 2400, Loss: 0.03992902487516403
Epoch: 2500, Loss: 0.03280361741781235

### Graph Attention Network
A Graph Attention Network (GAT) is a type of Graph Neural Network (GNN) that uses attention mechanisms to capture the importance of neighboring nodes when updating the features of a node in a graph.

The update rule for a node's feature in a GAT can be formulated as follows:

h<sub>i</sub><sup>(l+1)</sup> = σ(Σ<sub>j∈N(i)</sub> α<sub>ij</sub> W<sup>(l)</sup> h<sub>j</sub><sup>(l)</sup>)

where:

- h<sub>i</sub><sup>(l+1)</sup> is the feature of node i at layer l+1.
- σ is an activation function such as ReLU.
- N(i) is the set of neighbors of node i.
- α<sub>ij</sub> is the attention coefficient that determines the importance of node j to node i.
- W<sup>(l)</sup> is the weight matrix at layer l.
- h<sub>j</sub><sup>(l)</sup> is the feature of node j at layer l.

The attention coefficient α<sub>ij</sub> is computed using a shared attention mechanism that takes into account the features of both nodes i and j. It can be calculated as follows:

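α<sub>ij</sub> = softmax<sub>j</sub>(LeakyReLU(a<sup>T</sup> [W<sup>(l)</sup> h<sub>i</sub><sup>(l)</sup> || W<sup>(l)</sup> h<sub>j</sub><sup>(l)</sup>]))

where:

- softmax<sub>j</sub> normalizes the scores over the neighbors j ∈ N(i), so that the coefficients of node i sum to 1.
- a is a learnable weight vector.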
- || denotes concatenation.
- LeakyReLU is a leaky rectified linear unit activation function.
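The computation of the attention coefficients can be traced on a toy graph. This is a sketch under stated assumptions: the tensors are random, the names are illustrative, and self-loops are added so that every node has at least one entry to normalize over:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

num_nodes, in_features, out_features = 3, 4, 2
h = torch.randn(num_nodes, in_features)     # node features h_i
W = torch.randn(in_features, out_features)  # shared weight matrix
a = torch.randn(2 * out_features)           # attention vector

# Toy graph 0 - 1 - 2 with self-loops (an assumption of this sketch)
adj = torch.tensor([[1., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 1.]])

Wh = h @ W                                  # [num_nodes, out_features]
# a^T [Wh_i || Wh_j] splits into a term for node i plus a term for node j
src_score = Wh @ a[:out_features]           # depends on i only
dst_score = Wh @ a[out_features:]           # depends on j only
e = F.leaky_relu(src_score.unsqueeze(1) + dst_score.unsqueeze(0), negative_slope=0.2)

# Mask non-edges, then normalize over the neighbors j of each node i
e = e.masked_fill(adj == 0, float("-inf"))
alpha = torch.softmax(e, dim=1)             # alpha[i, j]: attention of node i on node j
```

Each row of `alpha` sums to 1 and is zero wherever there is no edge. Splitting the dot product into a source and a target term avoids materializing the concatenated pair features for all N² node pairs.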

By using attention mechanisms, GATs learn how much each neighbor contributes and adaptively aggregate information from neighboring nodes during the node feature update. This has made GAT a strong baseline for graph tasks such as node classification and link prediction.


In [None]:
class GraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features, dropout=0.6, alpha=0.2, concat=True):
        super(GraphAttentionLayer, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.concat = concat
        self.alpha = alpha

        # Learnable parameters
        self.W = nn.Parameter(torch.empty(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.empty(size=(2*out_features, 1)))
        nn.init.xavier_uniform_(self.a.data, gain=1.414)

        # Dropout layer
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, h, adj):
        Wh = torch.mm(h, self.W)  # Linear transformation
        N = h.size()[0]

        # Self-attention mechanism
        a_input = torch.cat([Wh.repeat(1, N).view(N * N, -1), Wh.repeat(N, 1)], dim=1).view(N, -1, 2 * self.out_features)
        e = F.leaky_relu(torch.matmul(a_input, self.a).squeeze(2), negative_slope=self.alpha)

        zero_vec = -9e15*torch.ones_like(e)
        attention = torch.where(adj > 0, e, zero_vec)
        attention = F.softmax(attention, dim=1)
        attention = self.dropout(attention)

        # Aggregation
        h_prime = torch.matmul(attention, Wh)
        h_prime = F.elu(h_prime)  # Activation function

        return h_prime


class GAT(nn.Module):
    def __init__(self, in_features, out_features, num_heads, dropout=0.6, alpha=0.2):
        super(GAT, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.num_heads = num_heads

        # List of attention layers
        self.attention_layers = nn.ModuleList([
            GraphAttentionLayer(in_features, out_features, dropout, alpha) for _ in range(num_heads)
        ])

    def forward(self, data):
        h, edges_matrix = data.x, data.edge_index
        # Build a dense adjacency matrix on the same device as the features,
        # filling all edges at once instead of looping over them in Python
        adjacency_matrix = torch.zeros(data.num_nodes, data.num_nodes, device=h.device)
        adjacency_matrix[edges_matrix[0], edges_matrix[1]] = 1

        # Stacking multiple attention heads
        all_head_outputs = [layer(h, adjacency_matrix) for layer in self.attention_layers]
        output = torch.mean(torch.stack(all_head_outputs), dim=0)

        return output

# Select the device first so that it is defined before the model is used
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create an instance of the GAT model and move it to the device
model = GAT(in_features=dataset.num_features, out_features=dataset.num_classes, num_heads=8).to(device)

# Cora contains a single graph; take it from the dataset and move it to the device
data = dataset[0].to(device)

# Define a loss function and an optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Number of training epochs
epochs = 100

# Training loop
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()

    # Forward pass
    output = model(data)
    loss = criterion(output[data.train_mask], data.y[data.train_mask])

    # Backward pass and optimization
    loss.backward()
    optimizer.step()

    # Print loss for every 10 epochs
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')


Epoch 0, Loss: 1.9394220113754272
Epoch 10, Loss: 1.076474666595459
Epoch 20, Loss: 0.555136501789093
Epoch 30, Loss: 0.30722078680992126
Epoch 40, Loss: 0.2492978274822235
Epoch 50, Loss: 0.2130090743303299
Epoch 60, Loss: 0.17001095414161682
Epoch 70, Loss: 0.14431476593017578
Epoch 80, Loss: 0.13524559140205383


### Graph Attention Network Version 2.0
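
GATv2 changes the order of operations in the attention score. The layer in this section computes a<sup>T</sup> LeakyReLU(W<sub>l</sub> h<sub>i</sub> + W<sub>r</sub> h<sub>j</sub>), applying the attention vector after the nonlinearity, whereas the GAT layer of the previous section applies LeakyReLU after the dot product with a. A small sketch contrasting the two scores for a single pair of nodes; all tensors are random and their names are illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

in_features, hidden = 4, 3
h_i, h_j = torch.randn(in_features), torch.randn(in_features)

# GAT: one weight matrix W and an attention vector of size 2 * hidden;
# the nonlinearity is applied after the dot product
W = torch.randn(hidden, in_features)
a_gat = torch.randn(2 * hidden)
score_gat = F.leaky_relu(a_gat @ torch.cat([W @ h_i, W @ h_j]), negative_slope=0.2)

# GATv2: separate matrices W_l and W_r and an attention vector of size hidden;
# the attention vector is applied after the nonlinearity
W_l = torch.randn(hidden, in_features)
W_r = torch.randn(hidden, in_features)
a_v2 = torch.randn(hidden)
score_v2 = a_v2 @ F.leaky_relu(W_l @ h_i + W_r @ h_j, negative_slope=0.2)
```

In the GAT score, the argument of LeakyReLU is a sum of one term that depends only on node i and one that depends only on node j. In the GATv2 score, the nonlinearity sits between the two linear maps, so the score no longer separates in this way.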


In [None]:
class GATv2(nn.Module):
    def __init__(self, in_features: int, out_features: int, n_heads: int,
                 is_concat: bool = True,
                 dropout: float = 0.6,
                 leaky_relu_negative_slope: float = 0.2,
                 share_weights: bool = False,
                 flash = False):
        super().__init__()

        self.is_concat = is_concat
        self.n_heads = n_heads
        self.share_weights = share_weights

        if is_concat:
            assert out_features % n_heads == 0
            self.n_hidden = out_features // n_heads
        else:
            self.n_hidden = out_features

        self.linear_l = nn.ModuleList([nn.Linear(in_features, self.n_hidden, bias=False) for _ in range(n_heads)])
        if share_weights:
            self.linear_r = self.linear_l
        else:
            self.linear_r = nn.ModuleList([nn.Linear(in_features, self.n_hidden, bias=False) for _ in range(n_heads)])
        self.attn = nn.Linear(self.n_hidden, 1, bias=False)
        self.activation = nn.LeakyReLU(negative_slope=leaky_relu_negative_slope)
        self.softmax = nn.Softmax(dim=1)
        self.dropout = nn.Dropout(dropout)

        self.flash = flash
        if flash and not torch.backends.cuda.flash_sdp_enabled():
            torch.backends.cuda.enable_flash_sdp(True)

    def forward(self, data) -> torch.Tensor:
        h, edges_matrix = data.x, data.edge_index
        n_nodes = h.shape[0]

        # Build a dense adjacency matrix on the same device as the features,
        # filling all edges at once instead of looping over them in Python
        adjacency_matrix = torch.zeros(n_nodes, n_nodes, device=h.device)
        adjacency_matrix[edges_matrix[0], edges_matrix[1]] = 1

        # Add a trailing head dimension so the mask broadcasts against e: [n_nodes, n_nodes, 1]
        adjacency_matrix = adjacency_matrix.unsqueeze(2)

        g_l = torch.stack([linear(h) for linear in self.linear_l], dim=1)
        g_r = torch.stack([linear(h) for linear in self.linear_r], dim=1)

        # Pairwise sums via broadcasting: g_sum[i, j] = g_l[j] + g_r[i],
        # with shape [n_nodes, n_nodes, n_heads, n_hidden]
        g_sum = g_l.unsqueeze(0) + g_r.unsqueeze(1)

        e = self.attn(self.activation(g_sum))
        e = e.squeeze(-1)

        assert adjacency_matrix.shape[0] == 1 or adjacency_matrix.shape[0] == n_nodes
        assert adjacency_matrix.shape[1] == 1 or adjacency_matrix.shape[1] == n_nodes
        assert adjacency_matrix.shape[2] == 1 or adjacency_matrix.shape[2] == self.n_heads
        e = e.masked_fill(adjacency_matrix == 0, float('-inf'))

        a = self.softmax(e)
        a = self.dropout(a)

        # Weighted sum of the neighbor features for each head: [n_nodes, n_heads, n_hidden]
        attn_res = torch.einsum('ijh,jhf->ihf', a, g_r)

        if self.is_concat:
            return attn_res.reshape(n_nodes, self.n_heads * self.n_hidden)
        else:
            return attn_res.mean(dim=1)

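# Create the GATv2 model. The heads are averaged (is_concat=False) so that the
# output dimension equals the number of classes expected by the loss.
model = GATv2(in_features=dataset.num_features, out_features=dataset.num_classes, n_heads=8, is_concat=False)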

# Move the model to GPU device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay= 0.0005)
criterion = nn.CrossEntropyLoss()

for epoch in range(nb_epochs):
    model.train()
    for data in train_loader:
        data = data.to(device)  # Move data to the appropriate device
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output[data.train_mask], data.y[data.train_mask])  # Use only the training mask
        loss.backward()
        optimizer.step()

    # Print the loss after each epoch
    if epoch % 100 == 0:
        print(f"Epoch: {epoch}, Loss: {loss.item()}")

# Evaluate on the test set
model.eval()

data = dataset[0].to(device)  # Move data to the appropriate device for evaluation

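# Pass the features through the model for prediction, without tracking gradients
with torch.no_grad():
    pred = model(data).argmax(dim=1)

# Calculate accuracy on the test nodes, using the labels of the evaluated graph
correct = (pred[data.test_mask] == data.y[data.test_mask]).sum()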
acc = int(correct) / int(data.test_mask.sum())
print(f'Accuracy: {acc:.4f}')