Author: Tamirlan Seidakhmetov

This Colab was built as part of the CS224W Final Project. The draft blog post for the Final Project can be found [here](https://medium.com/@tseidakhmetov/graph-neural-network-based-movie-recommender-system-5876b9686df3).

This Colab walks through building a Movie Recommender System using a Graph Neural Network approach. Specifically, we employ the [Inductive Graph-Based Matrix Completion](https://openreview.net/pdf?id=ByxxgCEYDS) (IGMC) framework introduced at the ICLR 2020 conference. The code structure is adapted from the paper's official [GitHub page](https://github.com/muhanzhang/IGMC.git).

First, we start by installing the necessary packages.

In [None]:
%%capture
!pip install torch-geometric==2.0.1
!pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.0+cu113.html
!pip install torch-sparse -f https://data.pyg.org/whl/torch-1.10.0+cu113.html

Next, we clone the public GitHub repository that will help us download the data and do some preprocessing. We move the required files out of the cloned folder so that we can import them later.

In [None]:
!git clone -b latest https://github.com/muhanzhang/IGMC.git

fatal: destination path 'IGMC' already exists and is not an empty directory.


In [None]:
import shutil
import os
files_to_move = ['util_functions.py', 'data_utils.py', 'preprocessing.py']
for f in files_to_move:
  if not os.path.exists(f):
    shutil.move(os.path.join('IGMC', f), f)

Next, load the required torch and torch_geometric libraries. In addition, we load a few useful functions from the GitHub code that we've cloned above.



In [None]:
import math  # used later to compute the RMSE

import torch
from torch.nn import Linear
import torch.nn.functional as F
from torch.optim import Adam
from torch_geometric.loader import DataLoader  # moved here from torch_geometric.data in PyG 2.0
from torch_geometric.nn import RGCNConv
from torch_geometric.utils import dropout_adj
from util_functions import *
from data_utils import *
from preprocessing import *


Define the variables: learning rate, epochs, and batch size.
LR_DECAY_STEP and LR_DECAY_VALUE decrease the learning rate over time to improve the training process: every LR_DECAY_STEP epochs, the learning rate is divided by LR_DECAY_VALUE.
In the original experiment, I trained the model for 80 epochs; here EPOCHS is set to 2 so that the notebook runs quickly.

In [None]:
# Arguments
EPOCHS=2
BATCH_SIZE=50
LR=1e-3
LR_DECAY_STEP = 20
LR_DECAY_VALUE = 10

Define a seed to help with the reproducibility of the results. In addition, define a device (cpu vs. cuda).

In [None]:
torch.manual_seed(123)
device = torch.device('cpu')
if torch.cuda.is_available():
    torch.cuda.manual_seed(123)
    torch.cuda.synchronize()
    device = torch.device('cuda')
device

device(type='cuda')

Use the code from the GitHub repository to download and clean the MovieLens 100k dataset.

In [None]:
(u_features, v_features, adj_train, train_labels, train_u_indices, train_v_indices, val_labels,
val_u_indices, val_v_indices, test_labels, test_u_indices, test_v_indices, class_values
) = load_official_trainvaltest_split('ml_100k', testing=True)

User features shape: (943, 23)
Item features shape: (1682, 18)


Next, we use the predefined code from the GitHub repository to extract an enclosing subgraph around each (user, item) rating pair of the graph G. This step is described in detail in Section 2 of the Medium blog post.
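To make the idea concrete, here is a minimal pure-Python sketch of 1-hop enclosing subgraph extraction on a toy rating dictionary. It is an illustration only, not the repository's implementation: the function name `enclosing_subgraph_1hop` and the toy `ratings` data are hypothetical. The node labels follow the scheme described in the IGMC paper (target user 0, target item 1, 1-hop users 2, 1-hop items 3), which is why the model below takes 4 input channels when h=1.

```python
def enclosing_subgraph_1hop(ratings, u, v):
    """Toy 1-hop enclosing subgraph around the target pair (u, v).

    ratings: dict mapping (user, item) -> rating (hypothetical toy format).
    """
    # Users who rated the target item, and items rated by the target user
    users = [u] + sorted({a for (a, b) in ratings if b == v and a != u})
    items = [v] + sorted({b for (a, b) in ratings if a == u and b != v})

    # Node labels: target user 0, target item 1, 1-hop users 2, 1-hop items 3
    labels = {("user", a): (0 if a == u else 2) for a in users}
    labels.update({("item", b): (1 if b == v else 3) for b in items})

    # Keep edges between selected nodes, but hide the target edge itself,
    # since its rating is the value to be predicted
    edges = {
        (a, b): r
        for (a, b), r in ratings.items()
        if a in users and b in items and (a, b) != (u, v)
    }
    return users, items, labels, edges


ratings = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 2, (2, 1): 1}
users, items, labels, edges = enclosing_subgraph_1hop(ratings, u=0, v=0)
print(users, items, edges)
```

In this toy case the subgraph around (user 0, item 0) contains users 0 and 1 and items 0 and 1; the edges (1, 2) and (2, 1) fall outside it, and the target edge (0, 0) is removed.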

In [None]:
train_dataset = MyDynamicDataset(root='data/ml_100k/testmode/train', A=adj_train,
    links=(train_u_indices, train_v_indices), labels=train_labels, h=1, sample_ratio=1.0,
    max_nodes_per_hop=200, u_features=None, v_features=None, class_values=class_values)
test_dataset = MyDataset(root='data/ml_100k/testmode/test', A=adj_train,
    links=(test_u_indices, test_v_indices), labels=test_labels, h=1, sample_ratio=1.0,
    max_nodes_per_hop=200, u_features=None, v_features=None, class_values=class_values)

len(train_dataset), len(test_dataset)

(80000, 1000)

Now, we define the IGMC model architecture. It consists of the following steps:

1. Optionally apply graph-level dropout. It randomly drops edges from the graph during training, which helps avoid overfitting and makes the model more robust.
2. Apply a message passing layer that updates the representation of each node in the subgraph. As proposed in the paper, we implement it with an R-GCN layer to handle the different edge types.
3. Pass the output through a tanh non-linearity.
4. Repeat steps 2 and 3 for each stacked message passing layer.
5. Concatenate the node representations from all layers into the final node representation h.
6. Pool the graph-level features g by concatenating the representations of the target user and the target item.
7. Apply a linear layer, a ReLU non-linearity, dropout to avoid overfitting, and a final linear layer.

All the model parameters were chosen following the IGMC paper.
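Steps 5 and 6 also determine the input size of the first linear layer. The sketch below uses plain Python lists instead of tensors and a hypothetical helper `pool_targets` to show the arithmetic: four layers with 32 output channels give 128 features per node, and concatenating the target user and target item rows gives 256. The toy values are assumptions made for illustration.

```python
NUM_LAYERS = 4   # number of stacked R-GCN layers
HIDDEN = 32      # output channels of each layer
node_dim = NUM_LAYERS * HIDDEN  # size of h for one node after step 5

# One-hot node labels of a toy subgraph with four nodes:
# column 0 marks the target user, column 1 marks the target item
x = [
    [1, 0, 0, 0],  # target user
    [0, 1, 0, 0],  # target item
    [0, 0, 1, 0],  # 1-hop user
    [0, 0, 0, 1],  # 1-hop item
]
# Fake concatenated node representations (node i is filled with the value i)
h = [[float(i)] * node_dim for i in range(len(x))]


def pool_targets(x, h):
    """Concatenate the rows of h that belong to the target user and item."""
    user_repr = next(h_i for x_i, h_i in zip(x, h) if x_i[0] == 1)
    item_repr = next(h_i for x_i, h_i in zip(x, h) if x_i[1] == 1)
    return user_repr + item_repr


g = pool_targets(x, h)
print(node_dim, len(g))
```

Because each enclosing subgraph has exactly one target user and one target item, the same selection applied to a batch yields one row of g per subgraph.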



In [None]:
class IGMC(torch.nn.Module):
    def __init__(self):
        super(IGMC, self).__init__()
        self.rel_graph_convs = torch.nn.ModuleList()
        self.rel_graph_convs.append(RGCNConv(in_channels=4, out_channels=32, num_relations=5, num_bases=4))
        self.rel_graph_convs.append(RGCNConv(in_channels=32, out_channels=32, num_relations=5, num_bases=4))
        self.rel_graph_convs.append(RGCNConv(in_channels=32, out_channels=32, num_relations=5, num_bases=4))
        self.rel_graph_convs.append(RGCNConv(in_channels=32, out_channels=32, num_relations=5, num_bases=4))
        self.linear_layer1 = Linear(256, 128)
        self.linear_layer2 = Linear(128, 1)

    def reset_parameters(self):
        self.linear_layer1.reset_parameters()
        self.linear_layer2.reset_parameters()
        for i in self.rel_graph_convs:
            i.reset_parameters()

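    def forward(self, data):
        num_nodes = len(data.x)
        # Step 1: randomly drop 20% of the edges (active only in training mode)
        edge_index_dr, edge_type_dr = dropout_adj(data.edge_index, data.edge_type, p=0.2, num_nodes=num_nodes, training=self.training)

        out = data.x
        h = []
        # Steps 2-4: stacked R-GCN layers, each followed by tanh
        for conv in self.rel_graph_convs:
            out = conv(out, edge_index_dr, edge_type_dr)
            out = torch.tanh(out)
            h.append(out)
        # Step 5: concatenate the outputs of all layers (4 x 32 = 128 features per node)
        h = torch.cat(h, 1)
        # Step 6: keep the target user rows (label 0) and target item rows (label 1)
        h = [h[data.x[:, 0] == True], h[data.x[:, 1] == True]]
        g = torch.cat(h, 1)
        # Step 7: MLP that maps the 256 pooled features to one rating per subgraph
        out = self.linear_layer1(g)
        out = F.relu(out)
        out = F.dropout(out, p=0.5, training=self.training)
        out = self.linear_layer2(out)
        out = out[:,0]
        return out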

model = IGMC()
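The graph-level dropout in step 1 can be pictured with a small pure-Python sketch. The helper `drop_edges` below is hypothetical and only mimics the idea (each edge is dropped independently with probability p, and only in training mode); it is not the library's implementation.

```python
import random


def drop_edges(edges, edge_types, p=0.2, training=True, rng=random):
    """Drop each edge (with its type) independently with probability p."""
    if not training or p == 0.0:
        # At evaluation time the full subgraph is used
        return list(edges), list(edge_types)
    kept = [(e, t) for e, t in zip(edges, edge_types) if rng.random() >= p]
    return [e for e, _ in kept], [t for _, t in kept]


edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 3), (3, 1)]
edge_types = [4, 4, 2, 2, 0, 0]  # toy rating classes

rng = random.Random(123)
train_edges, train_types = drop_edges(edges, edge_types, p=0.2, rng=rng)
eval_edges, eval_types = drop_edges(edges, edge_types, p=0.2, training=False)
print(len(train_edges), len(eval_edges))
```

Edge types are filtered together with the edges so that each surviving edge keeps its rating class, which the R-GCN layers need.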

Use a DataLoader to prepare the train and test data batches.

In [None]:
train_loader = DataLoader(train_dataset, BATCH_SIZE, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, BATCH_SIZE, shuffle=False, num_workers=2)



Move the model to the selected device (the GPU, if one is available). Reset the model parameters and define the optimizer. We use the Adam optimizer here.

In [None]:
model.to(device)
model.reset_parameters()
optimizer = Adam(model.parameters(), lr=LR, weight_decay=0)


Train the model for the number of epochs defined at the beginning.
For each batch, we predict the labels, compute the training MSE loss, run backpropagation, and update the learnable parameters. We print the average training loss at the end of each epoch.

After every LR_DECAY_STEP epochs, we divide the learning rate by LR_DECAY_VALUE.

In [None]:
for epoch in range(1, EPOCHS+1):
    model.train()
    train_loss_all = 0
    for train_batch in train_loader:
        optimizer.zero_grad()
        train_batch = train_batch.to(device)
        y_pred = model(train_batch)
        y_true = train_batch.y
        train_loss = F.mse_loss(y_pred, y_true)
        train_loss.backward()
        # Weight by the actual number of subgraphs, since the last batch may be smaller
        train_loss_all += train_batch.num_graphs * float(train_loss)
        optimizer.step()
        torch.cuda.empty_cache()
    train_loss_all = train_loss_all / len(train_loader.dataset)

    print('epoch', epoch,'; train loss', train_loss_all)

    if epoch % LR_DECAY_STEP == 0:
        for param_group in optimizer.param_groups:
            param_group['lr'] = param_group['lr'] / LR_DECAY_VALUE

epoch 1 ; train loss 1.3040544513612986
epoch 2 ; train loss 1.0995262514427304


Assess the performance of the model on the test set by predicting the labels and computing the MSE and RMSE.
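The evaluation sums the squared errors over all batches and divides by the dataset size only once. The pure-Python sketch below, on made-up numbers, shows why this matters: averaging the per-batch means gives a different value when the last batch is smaller. The function names and toy ratings are hypothetical.

```python
import math


def dataset_mse(y_pred, y_true, batch_size):
    """Sum squared errors batch by batch, then divide by the dataset size."""
    sq_sum = 0.0
    for start in range(0, len(y_true), batch_size):
        p = y_pred[start:start + batch_size]
        t = y_true[start:start + batch_size]
        sq_sum += sum((a - b) ** 2 for a, b in zip(p, t))
    return sq_sum / len(y_true)


def mean_of_batch_means(y_pred, y_true, batch_size):
    """Average of per-batch MSEs; biased when batches have unequal sizes."""
    means = []
    for start in range(0, len(y_true), batch_size):
        p = y_pred[start:start + batch_size]
        t = y_true[start:start + batch_size]
        means.append(sum((a - b) ** 2 for a, b in zip(p, t)) / len(t))
    return sum(means) / len(means)


y_true = [5, 3, 4, 1, 2]
y_pred = [4, 3, 2, 1, 4]

mse = dataset_mse(y_pred, y_true, batch_size=2)
rmse = math.sqrt(mse)
naive = mean_of_batch_means(y_pred, y_true, batch_size=2)
print(mse, rmse, naive)
```

Here the squared errors sum to 9 over 5 ratings, so the dataset MSE is 1.8, whereas the mean of the three batch means is about 2.17 because the single-element last batch is over-weighted.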

In [None]:
model.eval()
test_loss = 0
for test_batch in test_loader:
    test_batch = test_batch.to(device)
    with torch.no_grad():
        y_pred = model(test_batch)
    y_true = test_batch.y
    test_loss += F.mse_loss(y_pred, y_true, reduction='sum')
    # torch.cuda.empty_cache()
mse_loss = float(test_loss) / len(test_loader.dataset)

print('test MSE loss', mse_loss)
print('test RMSE loss', math.sqrt(mse_loss))

test MSE loss 1.0167562255859375
test RMSE loss 1.008343307403752
