# Recommender Systems

#### Author: Juan Gordyn

## I. Introduction
The aim of this task is to use data from Flickr to recommend photos to a set of given users. In order to achieve this, we are going to go through the following steps:
* Importing the libraries and loading the data.
* Building, evaluating and comparing three basic neural network models: Matrix Factorization (MF), Generalized Matrix Factorization (GMF) and Multi-Layer Perceptron (MLP). After the comparison we keep only one of them for further tuning, which, as we will see, is the simplest: MF.
* Having decided to continue with MF, we construct a new model that uses the given pre-trained user and item data, MF_UI (UI for User-Item), and another one that additionally uses the social links data, MF_UI_LINKS. We then evaluate both models, compare them and keep only one for the next steps.
* We add biases to the model chosen in the previous step (MF_UI, as we will see) and experiment with the hyperparameter negative_ratio, defined as number_of_negative_samples/number_of_positive_samples in the training data. Positive samples are user-image interactions given by the original data; negative samples are user-image pairs with no interaction, which are added to the data synthetically.
* We choose the best-performing model from the previous step and experiment with the weight decay hyperparameter, which performs L2 regularization.
* Making predictions on the test data using the model with the best performance over all the models.


## II. Importing libraries and loading data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# importing libraries
import torch
import pandas as pd
import numpy as np
from torch import nn
import heapq
from torch.utils.data import DataLoader, Dataset, TensorDataset
from time import time
import random
import math

### Training data

In [None]:
train_data = pd.read_csv('/content/drive/My Drive/res2021/flickr_train_data.csv')
train_data.head()

Unnamed: 0,user_id,item_id,rating
0,0,0,1
1,0,1,1
2,0,2,1
3,0,3,1
4,0,4,1


In [None]:
train_data.rating.value_counts()

1    110129
Name: rating, dtype: int64

We can see that the training data has only positive instances (ones in the rating column). This is where the negative_ratio hyperparameter will come in handy: it controls how many items the user has not interacted with are added as negative instances.

### Testing data

In [None]:
test_data = pd.read_csv('/content/drive/My Drive/res2021/flickr_test_data.csv')
test_data.head()

Unnamed: 0,user_id,item_id
0,0,8929
1,0,8906
2,0,8838
3,0,8821
4,0,8756


Test data just gives us user_ids and, for each user, a set of candidate item_ids, which we will have to rank in order to make the recommendations for that user.

### Validation data

In [None]:
val_data = pd.read_csv('/content/drive/My Drive/res2021/flickr_validation_data.csv')
val_data.head()

Unnamed: 0,user_id,item_id,rating
0,0,20,1
1,0,3260,0
2,0,390,0
3,0,5425,0
4,0,8631,0


Validation data has the same columns as the training data, but it also contains negative instances (rating 0). We are going to use it to compare models and decide which one to keep.

### Users pre-trained data

In [None]:
users_df = pd.read_csv('/content/drive/My Drive/res2021/flickr_user_fea.csv')
users_df.head()

Unnamed: 0.1,Unnamed: 0,fea_0,fea_1,fea_2,fea_3,fea_4,fea_5,fea_6,fea_7,fea_8,fea_9,fea_10,fea_11,fea_12,fea_13,fea_14,fea_15,fea_16,fea_17,fea_18,fea_19,fea_20,fea_21,fea_22,fea_23,fea_24,fea_25,fea_26,fea_27,fea_28,fea_29,fea_30,fea_31,fea_32,fea_33,fea_34,fea_35,fea_36,fea_37,fea_38,...,fea_216,fea_217,fea_218,fea_219,fea_220,fea_221,fea_222,fea_223,fea_224,fea_225,fea_226,fea_227,fea_228,fea_229,fea_230,fea_231,fea_232,fea_233,fea_234,fea_235,fea_236,fea_237,fea_238,fea_239,fea_240,fea_241,fea_242,fea_243,fea_244,fea_245,fea_246,fea_247,fea_248,fea_249,fea_250,fea_251,fea_252,fea_253,fea_254,fea_255
0,0,-1.238114,-1.020155,-1.370791,-1.892076,-1.615505,-1.056106,-1.260189,-1.537514,-1.279543,-1.906656,-1.659302,-1.925202,-1.515665,-1.559344,-1.473872,-1.481108,-0.928755,-1.867626,-1.564498,-1.415489,-1.361105,-1.760134,-1.5935,-1.483804,-1.568468,-1.537735,-1.870094,-1.703073,-1.51081,-1.891596,-1.635605,-1.580935,-1.621338,-2.106059,-1.747817,-1.253811,-1.582972,-1.902225,-1.767316,...,-0.990106,-1.173747,-1.33448,-1.832711,-2.0004,-1.326554,-1.343949,-1.679905,-0.949615,-1.271077,-1.887741,-1.525564,-1.69656,-1.109712,-1.176654,-1.433217,-1.148363,-1.317346,-1.368154,-1.521333,-1.13624,-1.308951,-1.418725,-1.583235,-1.684002,-1.11281,-1.090714,-1.876829,-1.927204,-1.956156,-1.639152,-1.351084,-1.52621,-1.441551,-1.567562,-1.363613,-1.133677,-1.448653,-1.912987,-1.220462
1,1,-1.142714,-1.01231,-1.109844,-1.806807,-1.533666,-0.894846,-1.360289,-1.159866,-1.18476,-1.717158,-1.617885,-1.877182,-1.388428,-1.478378,-1.378944,-1.167055,-0.836044,-1.735027,-1.372658,-1.196414,-1.064743,-1.550732,-1.526151,-1.188836,-1.597194,-1.342873,-1.600191,-1.589119,-1.30495,-1.666081,-1.761278,-1.427513,-1.628702,-2.081269,-1.71511,-1.150502,-1.401053,-1.567127,-1.722585,...,-0.822849,-1.00397,-1.261567,-1.763957,-2.015746,-1.219497,-1.257548,-1.504811,-0.840864,-1.1012,-1.708111,-1.396081,-1.519199,-1.010751,-1.112336,-1.185319,-0.981793,-1.000713,-1.253462,-1.18832,-1.114223,-1.116385,-1.41727,-1.553726,-1.724767,-1.131972,-0.725171,-1.862194,-1.888966,-1.48136,-1.471417,-1.028792,-1.397979,-1.304754,-1.630009,-1.318048,-1.080598,-1.251735,-1.81393,-0.975978
2,2,-1.698521,-1.605196,-1.328976,-1.492204,-1.541542,-1.155632,-1.297039,-1.485625,-1.348223,-1.573296,-1.435555,-1.781028,-1.215887,-1.781181,-1.263028,-1.38185,-0.990998,-1.495829,-1.439823,-1.307382,-1.474037,-1.614065,-1.349969,-1.499975,-1.689617,-1.398021,-1.619859,-1.54519,-1.684246,-1.546842,-1.617103,-1.799428,-1.430386,-1.86805,-1.501291,-1.378281,-1.786608,-1.532645,-1.935686,...,-1.214127,-1.073859,-1.535632,-2.007961,-1.916732,-1.504394,-1.615253,-1.498708,-1.054974,-1.395028,-1.811057,-1.198182,-1.746893,-1.126945,-1.526381,-1.099525,-1.168397,-1.261002,-1.218141,-1.676121,-1.44468,-1.112782,-1.159583,-1.555021,-1.458171,-1.34058,-1.187551,-1.785969,-1.566249,-1.64642,-1.529016,-1.372415,-1.7698,-1.414428,-1.550462,-1.249593,-1.284915,-1.446422,-1.686583,-1.145916
3,3,-1.328761,-1.267363,-1.24322,-1.721551,-1.182082,-1.320123,-1.11517,-1.122703,-1.140961,-1.643231,-1.374062,-1.625373,-1.051756,-1.420016,-1.304729,-1.253752,-0.750538,-1.559732,-1.440565,-1.109614,-1.119513,-1.326863,-1.607983,-1.418027,-1.492478,-1.216149,-1.244658,-1.357094,-1.399642,-1.350963,-1.461319,-1.232699,-1.7054,-1.726151,-1.475964,-1.015996,-1.430632,-1.612268,-1.4776,...,-1.266713,-1.098897,-1.240819,-1.702107,-1.740631,-1.323688,-1.375717,-1.217594,-1.166748,-1.541737,-1.719531,-1.428146,-1.40518,-1.113524,-1.239969,-1.179385,-1.413361,-1.066098,-1.308499,-1.310713,-1.116245,-1.071725,-1.218907,-1.160358,-1.400148,-1.196517,-0.917711,-1.528373,-1.519907,-1.55535,-1.441997,-0.956042,-1.570591,-1.238835,-1.489867,-1.320204,-1.20299,-1.322884,-1.58029,-0.931394
4,4,-1.132466,-1.026957,-1.019664,-1.649181,-1.382753,-0.776261,-1.221357,-1.061034,-1.041313,-1.600397,-1.532712,-1.51581,-1.130584,-1.291163,-1.162265,-1.167415,-0.733019,-1.438941,-1.196358,-1.018312,-0.927579,-1.494626,-1.410698,-1.007854,-1.245793,-1.375546,-1.42824,-1.381729,-1.15763,-1.422396,-1.452377,-1.368716,-1.439233,-1.78062,-1.543544,-0.925612,-1.2428,-1.393965,-1.599913,...,-0.735095,-0.825063,-1.117274,-1.646374,-1.709945,-1.034121,-1.120539,-1.356616,-0.789711,-1.101305,-1.518591,-1.182465,-1.416471,-0.866871,-1.004829,-1.108024,-0.92554,-0.83973,-1.150537,-1.09162,-1.103983,-0.86928,-1.286227,-1.347098,-1.515295,-1.046581,-0.744637,-1.728436,-1.647484,-1.528122,-1.30716,-0.940088,-1.313026,-1.308773,-1.455208,-1.05644,-0.950118,-1.029165,-1.593675,-0.886145


This is pre-trained user data (256 features per user) that we could use as input to boost our model's performance.

### Items pre-trained data

In [None]:
items_df = pd.read_csv('/content/drive/My Drive/res2021/flickr_item_fea.csv')
items_df.head()

Unnamed: 0.1,Unnamed: 0,fea_0,fea_1,fea_2,fea_3,fea_4,fea_5,fea_6,fea_7,fea_8,fea_9,fea_10,fea_11,fea_12,fea_13,fea_14,fea_15,fea_16,fea_17,fea_18,fea_19,fea_20,fea_21,fea_22,fea_23,fea_24,fea_25,fea_26,fea_27,fea_28,fea_29,fea_30,fea_31,fea_32,fea_33,fea_34,fea_35,fea_36,fea_37,fea_38,...,fea_216,fea_217,fea_218,fea_219,fea_220,fea_221,fea_222,fea_223,fea_224,fea_225,fea_226,fea_227,fea_228,fea_229,fea_230,fea_231,fea_232,fea_233,fea_234,fea_235,fea_236,fea_237,fea_238,fea_239,fea_240,fea_241,fea_242,fea_243,fea_244,fea_245,fea_246,fea_247,fea_248,fea_249,fea_250,fea_251,fea_252,fea_253,fea_254,fea_255
0,0,-1.360416,-0.683295,-0.65903,-1.840172,-1.875868,0.369852,-0.806902,-0.210944,-1.371679,-1.099853,-1.66054,-2.700233,-1.127029,-1.378679,-0.307012,-1.190641,-1.230518,-1.370454,-1.691368,-1.579029,0.082165,-1.161934,-1.383861,-0.755309,-1.822869,-0.869281,-1.414214,-1.29937,-1.00588,-1.500478,-1.513924,-1.133812,-2.309,-2.411295,-1.763362,-1.367103,-1.316206,-1.253621,-1.061936,...,0.44395,-0.240541,-0.458982,-1.973151,-2.266466,-0.900079,-1.011891,-1.575932,-0.577731,-0.88222,-1.955588,-1.812867,-0.655008,-0.674895,-0.402153,-1.214736,-0.068207,-0.507292,-1.012512,-0.988932,-1.453708,-0.972928,-1.479248,-1.644744,-1.165418,-1.138936,-0.72394,-2.013051,-2.337651,-0.875384,-1.287771,-0.003345,-1.017979,-0.849153,-1.564582,-1.167882,-1.13814,-0.459417,-1.342706,-0.491899
1,1,-5.414928,-4.034106,-6.748904,-4.86784,-8.706087,-8.19516,-5.521785,-5.785634,-6.909437,-9.894236,-6.763881,-6.437338,-7.996785,-8.14483,-5.853707,-8.095213,-8.178341,-6.787964,-9.715798,-7.277096,-9.050832,-10.176917,-6.833709,-5.54726,-6.475846,-1.981348,-9.246424,-10.029983,-9.742969,-9.796604,-6.105887,-7.081398,-4.782876,-7.61828,-6.487199,-7.361187,-3.467677,-11.332512,-6.311137,...,-9.110875,-6.635378,-6.384874,-8.538627,-7.053156,-7.058464,-9.18659,-8.764717,-6.414261,-6.514872,-4.659565,-5.857071,-8.454607,-3.282822,-9.523328,-5.563709,-4.822659,-8.675863,-7.214526,-10.050051,-5.550786,-4.974769,-6.917077,-8.980473,-5.929104,-6.532549,-2.053741,-5.098194,-9.574401,-6.833381,-6.53156,-8.419637,-9.145112,-4.20075,-7.780539,-4.257525,-5.879356,-8.00635,-9.809999,-8.942007
2,2,-0.742383,-0.772285,-0.565367,-1.349759,-0.456425,-0.973115,-1.273366,-0.878384,-0.554383,-1.51059,-0.781685,-1.537488,-1.844128,-1.76258,-0.678578,-0.737251,-0.70906,-1.957293,-0.955822,-1.370779,-1.176953,-0.79409,-0.99038,-0.722812,-1.586224,0.022261,-0.55789,-0.664564,-0.589544,-1.53476,-1.432168,-0.604248,-1.251627,-1.823051,-0.750866,-0.249489,-1.184111,-1.566427,-1.756267,...,-0.880831,-0.881381,-0.910604,-0.595922,-1.33674,-1.226703,-0.650778,-0.507728,-1.110144,-0.545716,-0.837331,-1.571738,-1.71278,-0.44042,-0.716094,-0.852922,-1.398606,-1.358359,-1.410937,-0.343936,0.000975,-0.724121,-0.792205,-0.497975,-0.768428,-0.796491,-0.561063,-1.110194,-1.32727,-0.81412,-1.065703,-0.614035,-0.517165,-0.655323,-1.019581,-1.706855,-1.084116,-0.94097,-0.905574,-1.15156
3,3,-0.853681,-1.02145,-0.639012,-2.095687,-1.756671,-0.45027,-0.607497,-0.75512,-0.827463,-1.32555,-1.401466,-2.182161,-1.297098,-1.523485,-1.380138,-0.635918,-0.961911,-2.159257,-1.128361,-0.837077,-0.103479,-1.213436,-1.12716,-1.567182,-1.756458,-1.276088,-1.111948,-1.226602,-1.184764,-0.981543,-1.224701,-1.663145,-1.03574,-2.551293,-1.625138,-0.755506,-1.31018,-1.24831,-1.491312,...,-0.523887,-0.638467,-1.005439,-1.053905,-1.289685,-1.115545,-0.768461,-1.078914,-0.270053,-0.839568,-1.904441,-0.477862,-1.423661,-1.016707,-0.541361,-1.327548,-0.263632,-0.397027,-0.75434,-1.027716,-0.743639,-0.632008,-1.050929,-1.75312,-1.868257,-0.667384,-0.857556,-2.483495,-2.112216,-0.667267,-1.223609,-1.181383,-0.92834,-0.841091,-1.477898,-1.80251,-1.177191,-1.632113,-1.980912,-1.298478
4,4,-1.049173,-0.208819,-1.02038,-1.916308,-1.213041,0.404414,-1.085374,-1.219756,-0.854078,-1.226236,-0.990248,-1.566594,-0.7927,-1.21716,-0.948019,-1.083676,0.248386,-0.625715,-0.667505,0.001262,-0.728039,-1.420544,-1.330699,-1.037147,-0.644503,-0.775842,-1.119213,-0.565874,-1.670511,-2.027897,-1.344954,-1.358789,-1.530087,-1.639055,-1.687073,-1.852848,-1.298267,-1.58594,-1.196822,...,0.195807,-0.552347,-0.815995,-2.064569,-1.84197,-0.262925,-0.424365,-1.394225,-0.073432,-0.594325,-1.921384,-1.289113,-1.390133,-0.598373,-0.319866,-0.918305,0.144083,-0.58634,-0.346057,-0.666028,-0.949032,-0.438806,-1.266952,-1.054088,-1.777013,-0.808951,-0.275256,-1.907677,-2.523139,-1.57898,-1.427951,-0.035228,-0.856884,-1.142188,-1.742321,-0.767455,0.114727,-0.083121,-1.886625,-0.243228


The same as for users, but for items.

### Social links data

In [None]:
links_df = pd.read_csv('/content/drive/My Drive/res2021/flickr_links.csv')
links_df.head()

Unnamed: 0,src,des,weight
0,0,1431,1
1,0,955,1
2,0,1824,1
3,0,70,1
4,0,592,1


This is actually binary data stating whether two people are friends or not; it is not pre-trained data like the previous two files. We will try to use it as well, but we first have to transform it into a sparse matrix and then factorize it.
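As a rough illustration of that transformation, here is a sketch on made-up toy links (it is not the code used later in this notebook). The edge list is turned into a user-by-user matrix and then factorized. Truncated SVD is only an assumed stand-in for the factorization step, and `toy_links`, `n_toy_users` and `k` are hypothetical names.

```python
import numpy as np
import pandas as pd

# toy edge list with the same columns as links_df (values are made up)
toy_links = pd.DataFrame({'src': [0, 0, 1, 2, 3],
                          'des': [1, 2, 2, 3, 0],
                          'weight': [1, 1, 1, 1, 1]})
n_toy_users = 4

# user x user matrix: entry (s, d) is 1 if the link s -> d is listed
adjacency = np.zeros((n_toy_users, n_toy_users), dtype=np.float32)
adjacency[toy_links.src.values, toy_links.des.values] = toy_links.weight.values

# truncated SVD: keep the k largest singular directions as user factors
k = 2
U, S, Vt = np.linalg.svd(adjacency)
user_factors = U[:, :k] * np.sqrt(S[:k])

print(adjacency)
print(user_factors.shape)
```

Each row of `user_factors` is then a dense vector describing one user's position in the friendship graph, which is the kind of object an MF model can consume.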

### General checks

Let's also check how many different users and items we have in our training data, and whether any user is missing (that is, whether all user ids are consecutive), which will be important later when building the different matrices:

In [None]:
# flag that will be set to 1 if two neighbouring user_ids are not consecutive
not_consecutive = 0
# unique users
unique_users = train_data.user_id.unique()
# number of different users
n_users = len(unique_users)
# looping over positions (not over the ids themselves), starting at 1 so that
# every position has a predecessor
for i in range(1, n_users):
    # if two neighbouring users do not have consecutive ids, set the flag
    # and print the position
    if unique_users[i] != unique_users[i-1] + 1:
        print(i)
        not_consecutive = 1
if not_consecutive == 0:
    print('All user_ids are consecutive and there are', n_users)

All user_ids are consecutive and there are 3466


In [None]:
n_items = len(train_data.item_id.unique())
n_items

9004

We can see that we have 3466 distinct users and 9004 different images.
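The reason the check matters is that the embedding layers used later are indexed directly by id, so ids should be exactly 0, 1, ..., n-1. A more compact way to express the same check could look like the sketch below; `ids_are_consecutive` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def ids_are_consecutive(ids):
    """Return True if the distinct ids are exactly 0, 1, ..., n-1."""
    unique_sorted = np.unique(ids)  # np.unique returns sorted distinct values
    return bool(np.array_equal(unique_sorted, np.arange(len(unique_sorted))))

print(ids_are_consecutive(np.array([0, 0, 1, 2, 2, 3])))  # True
print(ids_are_consecutive(np.array([0, 1, 3])))           # False
```

With the real data this would be called as `ids_are_consecutive(train_data.user_id.values)`.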

## III. Building the models

### MF, GMF and MLP basic models

#### Introduction

We now build the different models, using a neural network approach for all of them. We first present a very basic version of each alternative to find out which one is likely to perform best. The basic alternatives we assess are the following:

* Matrix Factorization (MF): a model with one embedding for the users and one for the items; the prediction is the dot product of the two.
* Generalized Matrix Factorization (GMF): a model that adds a linear layer on top of the element-wise product of the user and item embeddings to model their interaction.
* Multi-Layer Perceptron (MLP): a model made of input, hidden and output layers, which can model non-linear relationships between users and items.
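To make the difference between these alternatives concrete, the sketch below uses tiny made-up embeddings (the sizes and the names `toy_user_emb` and `toy_item_emb` are assumptions for illustration). It shows that the MF score is one dot product per user-item pair, whereas the MLP starts from the concatenation of the two embeddings.

```python
import torch
from torch import nn

torch.manual_seed(0)
toy_user_emb = nn.Embedding(3, 4)   # 3 users, 4 latent factors
toy_item_emb = nn.Embedding(5, 4)   # 5 items, 4 latent factors

users = torch.tensor([0, 2])
items = torch.tensor([1, 4])

U = toy_user_emb(users)             # shape (2, 4)
V = toy_item_emb(items)             # shape (2, 4)

# MF score: element-wise product summed over the latent dimension,
# i.e. one dot product per (user, item) pair
mf_scores = (U * V).sum(1)

# MLP input: the two embeddings are concatenated instead of multiplied
mlp_input = torch.cat([U, V], dim=1)  # shape (2, 8)

print(mf_scores.shape, mlp_input.shape)
```

GMF sits between the two: it keeps the element-wise product `U * V` but replaces the plain sum with a learned linear layer.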

#### Adding negative samples to the data

Before diving into building the models, note that the training data only has positive instances: it only contains user-item pairs that have interacted, and nothing about items that users have not interacted with. It is important to include negative samples in the training data so that the model can learn what each user likes and rank those items higher than items the user dislikes or has not interacted with. To try our basic models, we take 5 negative samples (items the user has not interacted with) for every item the user has interacted with. This factor (5 here) is a hyperparameter that we will tune later, once we have decided which model to continue with. Adding the negative samples:

In [None]:
# defining a function to be able to reuse it
def negative_sampler(neg_factor):
    # number_of_negative_samples = neg_factor * number_of_items_a_user_has_interacted_with
    # initializing data frame that will end up containing the training data +
    # the negative samples
    negative_samples_df = pd.DataFrame()
    # all the different items contained in training set
    all_items = set(train_data.item_id)
    # looping over all the different users
    for user_id in train_data.user_id.unique():
        # subsetting with user_id
        train_data_subset = train_data[train_data.user_id==user_id]
        # getting the number of different items the particular user has
        # interacted with
        quant_items_interaction = len(train_data_subset.item_id.unique())
        # items the user has not interacted with
        items_no_interaction = all_items - set(train_data_subset.item_id)
        # we will take n times the number of items the user has interacted with 
        # as negative samples and if n*items exceeds the number of remaining items 
        # we will be in trouble, so if this happens we just take all the items
        # the user has not interacted with
        if quant_items_interaction * neg_factor > len(items_no_interaction):
            sample_size = len(items_no_interaction)
        else:
            sample_size = neg_factor * quant_items_interaction
        # taking a random sample from the items the user has not interacted with
        # (random.sample needs a sequence, so the set is converted to a sorted list)
        items_no_interaction_sample = random.sample(sorted(items_no_interaction), sample_size)
        # initializing data frame that will be concatenated to the general data frame
        negative_samples = pd.DataFrame({'user_id':[user_id]*len(items_no_interaction_sample), 'item_id':items_no_interaction_sample, 'rating':[0]*len(items_no_interaction_sample)})
        # concatenating the existing positive cases and the sampled negative ones for each user
        train_data_subset = pd.concat([train_data_subset, negative_samples], ignore_index = True)
        # concatenating with the whole data
        negative_samples_df = pd.concat([negative_samples_df, train_data_subset], ignore_index = True)
    # ordering by user and then by item
    negative_samples_df = negative_samples_df.sort_values(by = ['user_id', 'item_id'])
    return(negative_samples_df)

In [None]:
# calling the function with factor of 5
negative_samples_df = negative_sampler(5)

In [None]:
# we can see how now for example user 0 has both items he has interacted with and
# items he hasn't
negative_samples_df[negative_samples_df.user_id==0]

Unnamed: 0,user_id,item_id,rating
0,0,0,1
1,0,1,1
2,0,2,1
3,0,3,1
4,0,4,1
...,...,...,...
89,0,8679,0
82,0,8708,0
69,0,8795,0
23,0,8884,0


In [None]:
negative_samples = len(negative_samples_df[(negative_samples_df.user_id==0) & (negative_samples_df.rating==0)])
positive_samples = len(negative_samples_df[(negative_samples_df.user_id==0) & (negative_samples_df.rating==1)])
print('We have', positive_samples, 'positive samples and', negative_samples, 'negative samples')

We have 20 positive samples and 100 negative samples


We can see that we have 5 times as many negative samples as positive ones. We could instead add every item a user has not interacted with as a negative sample, which would capture the full sparsity of the user-item matrix, but training all our models would then take a long time, so this ratio will do for the moment.
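To get a feeling for the sizes involved, the following back-of-the-envelope sketch uses only the counts reported earlier in this notebook. The row count with sampling is an upper bound, since it assumes that no user hits the cap on available items.

```python
n_users, n_items = 3466, 9004      # counts reported above
n_positives = 110129               # rows in the training data
neg_factor = 5

# size of the full user x item matrix and share of observed interactions
all_cells = n_users * n_items
density = n_positives / all_cells

# upper bound on training rows when sampling neg_factor negatives per positive
rows_with_sampling = n_positives * (1 + neg_factor)

print(all_cells)            # 31207864
print(round(density, 4))    # 0.0035
print(rows_with_sampling)   # 660774
```

Training on every cell of the matrix would therefore mean roughly 47 times more rows than training on the sampled data.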

### The models

We now define the classes for the three models:

In [None]:
# defining the basic Matrix Factorization model
class MF(nn.Module):
    def __init__(self, num_users, num_items, emb_size=256):
        super(MF, self).__init__()
        # embedding corresponding to the user matrix
        self.user_emb = nn.Embedding(num_users, emb_size)
        # embedding corresponding to the item matrix
        self.item_emb = nn.Embedding(num_items, emb_size)
        # initialization of the values of the matrices
        self.user_emb.weight.data.uniform_(0, 0.05)
        self.item_emb.weight.data.uniform_(0, 0.05)
        
    def forward(self, u, v):
        # looking up the embedding rows of each user/item in the batch
        U = self.user_emb(u)
        V = self.item_emb(v)
        # dot product of each user row with its item row gives the prediction
        return (U*V).sum(1)

In [None]:
# defining the Generalized Matrix Factorization model 
class GMF(nn.Module):
    def __init__(self, n_user, n_item, n_emb=8):
        super(GMF, self).__init__()

        self.n_emb = n_emb
        self.n_user = n_user
        self.n_item = n_item
        # similar to the previous one but we add an extra layer to model
        # the interaction between user and item
        self.embeddings_user = nn.Embedding(n_user, n_emb)
        self.embeddings_item = nn.Embedding(n_item, n_emb)
        self.out = nn.Linear(in_features=n_emb, out_features=1)

        for m in self.modules():
            if isinstance(m, nn.Embedding):
                nn.init.normal_(m.weight)
            elif isinstance(m, nn.Linear):
                nn.init.uniform_(m.weight)

    def forward(self, users, items):

        user_emb = self.embeddings_user(users)
        item_emb = self.embeddings_item(items)
        # multiplication between matrices
        prod = user_emb*item_emb
        # activation of the multiplication will give the final predictions
        preds = torch.sigmoid(self.out(prod))

        return preds

In [None]:
class MLP(nn.Module):
    def __init__(self, n_user, n_item, layers, dropouts):
        super(MLP, self).__init__()

        self.layers = layers
        self.n_layers = len(layers)
        self.dropouts = dropouts
        self.n_user = n_user
        self.n_item = n_item
        # initializing users and items embeddings as in the previous models
        self.embeddings_user = nn.Embedding(n_user, int(layers[0]/2))
        self.embeddings_item = nn.Embedding(n_item, int(layers[0]/2))
        # defining the hidden layers
        self.mlp = nn.Sequential()
        for i in range(1,self.n_layers):
            self.mlp.add_module("linear%d" %i, nn.Linear(layers[i-1],layers[i]))
            self.mlp.add_module("relu%d" %i, torch.nn.ReLU())
            self.mlp.add_module("dropout%d" %i , torch.nn.Dropout(p=dropouts[i-1]))
        # output layer
        self.out = nn.Linear(in_features=layers[-1], out_features=1)

        for m in self.modules():
            if isinstance(m, nn.Embedding):
                nn.init.normal_(m.weight)

    def forward(self, users, items):

        user_emb = self.embeddings_user(users)
        item_emb = self.embeddings_item(items)
        # different than the other 2 previous models, we don't perform product
        # between the matrices but we concatenate them
        emb_vector = torch.cat([user_emb,item_emb], dim=1)
        emb_vector = self.mlp(emb_vector)
        preds = torch.sigmoid(self.out(emb_vector))

        return preds

Once we have defined our model classes, we can also define a function to train these models and another one to evaluate them.

First, the training function:

In [None]:
def train_epocs(model_name , train_data, epochs=10, lr=0.01, wd=0.0, unsqueeze=False):
    print('MODEL:' , model_name, 'LEARNING RATE =', lr)
    model = model_dict[model_name][0]
    criterion = model_dict[model_name][1]
    # the optimizer updates the weights of our model to minimize the loss;
    # weight_decay adds L2 regularization
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=wd)
    # using whichever device the model lives on (GPU if available, else CPU)
    device = next(model.parameters()).device
    # transforming input to tensors once, outside the epoch loop
    users = torch.LongTensor(train_data.user_id.values).to(device)
    items = torch.LongTensor(train_data.item_id.values).to(device)
    ratings = torch.FloatTensor(train_data.rating.values).to(device)
    if unsqueeze:
        ratings = ratings.unsqueeze(1)
    model.train()
    for i in range(epochs):
        # full-batch forward pass; GMF and MLP return shape (n, 1)
        if model_name == 'GMF' or model_name == 'MLP':
            preds = model(users, items).squeeze(1)
        else:
            preds = model(users, items)
        # computing the loss
        loss = criterion(preds, ratings)
        # resetting the gradients accumulated in the previous step
        optimizer.zero_grad()
        # calculating gradient
        loss.backward()
        # updating parameters
        optimizer.step()
        print('EPOC', i, 'LOSS:', loss.item())

Then, all the functions that relate to the evaluation of the performances of our models:

In [None]:
# defining the metric we are going to use when evaluating the different models
# NDCG rewards placing relevant items near the top of the recommended list;
# with a single ground-truth item it equals 1/log2(position + 2), or 0 if the
# item is not in the list
def get_ndcg(ranklist, gtitem):
    for i in range(len(ranklist)):
        item = ranklist[i]
        if item == gtitem:
            return math.log(2) / math.log(i+2)
    return 0
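A small worked example may help to read this metric. The sketch below restates the same formula under a hypothetical name (`ndcg_single_item`) and evaluates it on a made-up ranked list. Because there is only one ground-truth item per user, the ideal DCG is 1, so NDCG reduces to the discount at the position where that item appears.

```python
import math

def ndcg_single_item(ranklist, gtitem):
    # same formula as get_ndcg: 1 / log2(position + 2), or 0 if absent
    for position, item in enumerate(ranklist):
        if item == gtitem:
            return math.log(2) / math.log(position + 2)
    return 0

print(ndcg_single_item([7, 3, 9], 7))             # first place  -> 1.0
print(round(ndcg_single_item([7, 3, 9], 3), 4))   # second place -> 0.6309
print(round(ndcg_single_item([7, 3, 9], 9), 4))   # third place  -> 0.5
print(ndcg_single_item([7, 3, 9], 42))            # not ranked   -> 0
```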

In [None]:
# defining function that will use the above to retrieve the performance metric
def get_scores(items, preds, topk):
    gtitem = items[0]
    # the following 3 lines of code ensure that the fact that the 1st item is
    # gtitem does not affect the final rank
    randidx = np.arange(100)
    np.random.shuffle(randidx)
    items, preds = items[randidx], preds[randidx]
    map_item_score = dict( zip(items, preds) )
    # getting the topk user-item pairs with the highest scores (ordered)
    ranklist = heapq.nlargest(topk, map_item_score, key=map_item_score.get)
    ndcg = get_ndcg(ranklist, gtitem)
    return ndcg
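The ranking step relies on `heapq.nlargest` applied to a dictionary: iterating a dictionary yields its keys, and `key=map_item_score.get` orders those keys by their scores. A minimal sketch with made-up scores (`item_scores` is a hypothetical name):

```python
import heapq

# hypothetical scores for five candidate items
item_scores = {101: 0.2, 102: 0.9, 103: 0.5, 104: 0.7, 105: 0.1}

# item ids ordered by decreasing score, truncated to the top 3
top3 = heapq.nlargest(3, item_scores, key=item_scores.get)
print(top3)   # [102, 104, 103]
```

One caveat of building the mapping with `dict(zip(items, preds))` is that a repeated item id would keep only its last score, so this approach assumes the candidate items of a user are distinct.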

In [None]:
# Evaluation function
def evaluate(model_name, val_loader, use_cuda, topk):
    # model.eval() switches layers such as dropout to inference mode; the
    # torch.no_grad() block below is what disables gradient tracking
    model = model_dict[model_name][0]
    model.eval()
    scores=[]
    with torch.no_grad():
        # looping each batch in validation loader
        for data in val_loader:
            users = data[0]
            items = data[1]
            labels = data[2].float()
            if use_cuda:
                users, items, labels = users.cuda(), items.cuda(), labels.cuda()
            # the predictions
            preds = model(users, items)
            items_cpu = items.cpu().numpy()
            preds_cpu = preds.detach().cpu().numpy()
            litems=np.split(items_cpu, val_loader.batch_size//100)
            lpreds=np.split(preds_cpu, val_loader.batch_size//100)
            scores += [get_scores(it,pr,topk) for it,pr in zip(litems,lpreds)]
    ndcg = np.array(scores).mean()
    print(model_name, 'validation NDCG:', ndcg)
    print('\n')
    return (ndcg)

In [None]:
# defining the validation loader that is used in the above function.
# the size is 100 because we are given (in validation and test), 100 possible
# items to choose from for every user.

users_tensor_val = torch.tensor(val_data.user_id.tolist())
items_tensor_val = torch.tensor(val_data.item_id.tolist())
target_tensor_val = torch.tensor(val_data.rating.tolist())
dataval = TensorDataset(users_tensor_val, items_tensor_val, target_tensor_val)

val_loader = DataLoader(dataset=dataval,
    batch_size=100,
    shuffle=False
    )

### Training the models
We proceed to initializing each model:

In [None]:
model_GMF = GMF(n_users, n_items, 256)
criterion_GMF = nn.BCELoss()

In [None]:
model_MF = MF(n_users, n_items, 256)
criterion_MF = nn.BCEWithLogitsLoss()

In [None]:
layers = [128, 64, 32, 16] # first layer is n_emb*2
dropouts = [0., 0., 0.0] # len(dropouts) = len(layers)-1
model_MLP = MLP(n_users, n_items, layers, dropouts)
criterion_MLP = nn.BCELoss()

In [None]:
# we store our models and loss criteria in a dictionary for easier manipulation
model_dict = dict()
model_dict['GMF'] = [model_GMF, criterion_GMF]
model_dict['MF'] = [model_MF, criterion_MF]
model_dict['MLP'] = [model_MLP, criterion_MLP]
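Note that the criteria differ between models: MF returns raw, unbounded scores and is paired with `nn.BCEWithLogitsLoss`, whereas GMF and MLP apply a sigmoid in `forward` and are paired with `nn.BCELoss`. The sketch below, on made-up scores and targets, suggests that the two combinations compute the same quantity.

```python
import torch
from torch import nn

torch.manual_seed(0)
raw_scores = torch.randn(6)                       # unbounded scores, as MF returns
targets = torch.tensor([1., 0., 0., 1., 0., 0.])  # made-up ratings

# sigmoid applied inside the loss
loss_from_logits = nn.BCEWithLogitsLoss()(raw_scores, targets)
# sigmoid applied by the model, plain BCE afterwards
loss_from_probs = nn.BCELoss()(torch.sigmoid(raw_scores), targets)

print(loss_from_logits.item(), loss_from_probs.item())
```

The logits version is generally considered the more numerically stable of the two, which is one reason to prefer it when the model does not need to output probabilities directly.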

In [None]:
use_cuda = torch.cuda.is_available()
if use_cuda:
    model_dict['GMF'] = [model_dict['GMF'][0].cuda(), model_dict['GMF'][1]]

In [None]:
use_cuda = torch.cuda.is_available()
if use_cuda:
    model_dict['MF'] = [model_dict['MF'][0].cuda(), model_dict['MF'][1]]

In [None]:
use_cuda = torch.cuda.is_available()
if use_cuda:
    model_dict['MLP'] = [model_dict['MLP'][0].cuda(), model_dict['MLP'][1]]

And now we are going to train each of the models and then evaluate them on the validation set, to see which one we will use for further analysis and tuning:

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# initializing data frame that will store the scores for
# the 3 different algorithms
scores_df = pd.DataFrame()
# index of the data frame
i = 0
for model_name in ['GMF', 'MF', 'MLP']:
    # model to device
    model_dict[model_name][0] = model_dict[model_name][0].to(device)
    # criterion to device
    model_dict[model_name][1] = model_dict[model_name][1].to(device)
    # we train each model using 20 epocs, the data containing
    # positive and negative samples and a learning rate of 0.1
    train_epocs(model_name, negative_samples_df, 20, lr=0.1)
    # calculating the validation score for each model, using
    # topk = 15 as required by the task
    ndcg_score = evaluate(model_name, val_loader, use_cuda, 15)
    scores_df.loc[i, 'Model'] = model_name
    scores_df.loc[i, 'negative_samples_ratio'] = 5
    scores_df.loc[i, 'NDGC'] = ndcg_score
    i+=1

MODEL: GMF LEARNING RATE = 0.1
EPOC 0 LOSS: 6.551529884338379
EPOC 1 LOSS: 3.2693233489990234
EPOC 2 LOSS: 1.5473445653915405
EPOC 3 LOSS: 0.800610363483429
EPOC 4 LOSS: 0.4851033687591553
EPOC 5 LOSS: 0.3324180245399475
EPOC 6 LOSS: 0.25514301657676697
EPOC 7 LOSS: 0.21580785512924194
EPOC 8 LOSS: 0.1966424137353897
EPOC 9 LOSS: 0.1866900473833084
EPOC 10 LOSS: 0.17668569087982178
EPOC 11 LOSS: 0.1606050729751587
EPOC 12 LOSS: 0.13842368125915527
EPOC 13 LOSS: 0.11340328305959702
EPOC 14 LOSS: 0.08827237039804459
EPOC 15 LOSS: 0.06483779102563858
EPOC 16 LOSS: 0.044650908559560776
EPOC 17 LOSS: 0.028863536193966866
EPOC 18 LOSS: 0.017734767869114876
EPOC 19 LOSS: 0.010593939572572708
GMF validation NDCG: 0.06168397036257753


MODEL: MF LEARNING RATE = 0.1
EPOC 0 LOSS: 0.749582827091217
EPOC 1 LOSS: 1.3975855112075806
EPOC 2 LOSS: 0.7108043432235718
EPOC 3 LOSS: 0.8337104916572571
EPOC 4 LOSS: 1.1111959218978882
EPOC 5 LOSS: 0.9124086499214172
EPOC 6 LOSS: 0.7134793400764465
EPOC 7 LOS

### Performances

In [None]:
scores_df

Unnamed: 0,Model,negative_samples_ratio,NDGC
0,GMF,5.0,0.061684
1,MF,5.0,0.079096
2,MLP,5.0,0.060017


We can see that, in this basic configuration, the MF model performs better than the other two (validation NDCG of 0.079 against 0.062 for GMF and 0.060 for MLP). We will therefore continue working only with this model and try to enhance its performance. Apart from its superior performance here, MF has the advantage that it is easy to extend with the rest of the data we are given: pre-trained user features, pre-trained item features and social links between users. Until now, we have used nothing but the training data to build our models.

### Matrix factorization with users and items pre-trained data

Let's first build an MF model that includes the pre-trained data for users and items (we will include the social links in a later model and compare the two). But how do we do this?
We know that in MF the prediction is obtained from $U*V$, as in our previous basic MF model, where U and V are the embeddings of the user and the item respectively. We can include the pre-trained features in the model by doing something like the following:

$(U + pre\_trained\_user)*(V + pre\_trained\_item)$.

The first factor of the product still refers to the user and the second one to the item, so the structure of MF is preserved. If we expand the expression, we end up with the following cross-products when making our prediction:

$(U*V + U*pre\_trained\_item + pre\_trained\_user*V + pre\_trained\_user*pre\_trained\_item)$
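As a quick numerical sanity check of this expansion, the sketch below draws random vectors (all values are made up, and `p_u` and `p_i` are hypothetical stand-ins for the pre-trained user and item features) and compares the compact form with the sum of the four cross-products.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.normal(size=8), rng.normal(size=8)       # learned embeddings
p_u, p_i = rng.normal(size=8), rng.normal(size=8)   # pre-trained features

# compact form: (U + pre_trained_user) * (V + pre_trained_item)
compact = np.sum((u + p_u) * (v + p_i))

# expanded form: the four cross-products written out
expanded = (np.sum(u * v) + np.sum(u * p_i)
            + np.sum(p_u * v) + np.sum(p_u * p_i))

print(np.isclose(compact, expanded))   # True
```

Note that the last term, the product of the two pre-trained vectors, contains no learnable parameter: it acts as a fixed offset for each user-item pair.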

So let's go ahead and add all these terms and see how it goes! One remark before that: since pre_trained_user and pre_trained_item are, as their names state, already trained, we are not going to update their weights during the optimization (in the code this translates to requires_grad = False).
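To illustrate what freezing means in practice, the sketch below uses `nn.Embedding.from_pretrained` with `freeze=True`, which is an alternative way of obtaining an embedding whose weight has `requires_grad = False` (the notebook sets the flag by hand instead). The tensor `toy_features` is made up and merely stands in for the pre-trained user or item tensor.

```python
import torch
from torch import nn

torch.manual_seed(0)
toy_features = torch.randn(3, 4)   # stand-in for pre-trained features
frozen = nn.Embedding.from_pretrained(toy_features, freeze=True)
learned = nn.Embedding(3, 4)       # ordinary trainable embedding

ids = torch.tensor([0, 2])
loss = (learned(ids) * frozen(ids)).sum()
loss.backward()

print(frozen.weight.requires_grad)       # False
print(frozen.weight.grad is None)        # True: no gradient is computed
print(learned.weight.grad is not None)   # True: this one will be updated
```

Since no gradient is ever stored for the frozen weights, an optimizer step leaves them unchanged, which is the behaviour we want for pre-trained features.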


In [None]:
# we first convert the pre_trained data into tensors, to be able to include
# it in our model
users_df = users_df.rename(columns = {'Unnamed: 0':'user_id'})
items_df = items_df.rename(columns = {'Unnamed: 0':'item_id'})
# tensor for user
users_tensor = torch.tensor(users_df.drop('user_id', axis=1).values.astype(np.float32))
# tensor for item data
items_tensor = torch.tensor(items_df.drop('item_id', axis=1).values.astype(np.float32))

In [None]:
class MF_UI(nn.Module):
    def __init__(self, num_users, num_items, emb_size=256):
        super(MF_UI, self).__init__()
        # first part: same as basic MF
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.user_emb.weight.data.uniform_(0, 0.05)
        self.item_emb.weight.data.uniform_(0, 0.05)
        # here we initialize 2 new embeddings that we are going
        # to fill up with the user and item tensors created above
        self.pre_trained_users = nn.Embedding(num_users, emb_size)
        self.pre_trained_items = nn.Embedding(num_items, emb_size)
        # for user pre-trained data
        self.pre_trained_users.weight=nn.Parameter(users_tensor)
        # to avoid updating weights
        self.pre_trained_users.weight.requires_grad = False
        # for items data
        self.pre_trained_items.weight=nn.Parameter(items_tensor)
        # to avoid updating their weights
        self.pre_trained_items.weight.requires_grad = False
        
    def forward(self, users, items):
        # u and v are defined as basic MF
        u = self.user_emb(users)
        v = self.item_emb(items)
        # pre_trained data
        trained_users = self.pre_trained_users(users)
        trained_items = self.pre_trained_items(items)
        # developing the expression with all the cross-products explained above
        ui = torch.sum(u * v, dim=1)
        u_pretrained_i = torch.sum(u*trained_items, dim =1)
        v_pretrained_u = torch.sum(v*trained_users, dim = 1)
        pretrained_i_u = torch.sum(trained_users*trained_items, dim = 1)
        # the prediction finally is the sum of all
        pred = ui + u_pretrained_i + v_pretrained_u + pretrained_i_u
        return pred
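
As a side note, PyTorch also offers `nn.Embedding.from_pretrained`, which builds a frozen embedding directly from a tensor. The sketch below is only an illustration with a made-up `features` tensor (3 users, 4 features); it is not the code used in this notebook, where the weights are assigned manually as shown above.

```python
import torch
from torch import nn

# hypothetical pre-trained features: 3 users, 4 features each
features = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# from_pretrained takes its shape from the tensor; freeze=True disables
# gradient updates for the weights
frozen_emb = nn.Embedding.from_pretrained(features, freeze=True)

# looking up users 2 and 0 returns the corresponding rows of `features`
users = torch.tensor([2, 0])
looked_up = frozen_emb(users)

print(frozen_emb.weight.requires_grad)
print(looked_up.shape)
```

The lookup returns one row of features per requested user, and the weights are excluded from optimization, which matches the intent of setting requires_grad = False by hand.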

### Matrix factorization with users, items and social data

And what if we want to take this one step further and also include the social link data? We can do it analogously and then compare both models. Social link data refers strictly to the users, so the "user side" of the product now has 3 components, as follows:

$(U + pre\_trained\_user + social\_links)*(V + pre\_trained\_item)$

which would yield:

$U*V + U*pre\_trained\_item + pre\_trained\_user*V + pre\_trained\_user*pre\_trained\_item + social\_links*V + social\_links*pre\_trained\_item$

Our social links data is a user-user interaction matrix of size num_users*num_users. To make it interact with the other terms, we can factorize it into 2 matrices W and H with 256 factors each (to match the size of the other embeddings) and use both of them in place of social_links. Our prediction then expands to the following:

$U*V + U*pre\_trained\_item + pre\_trained\_user*V + pre\_trained\_user*pre\_trained\_item + W*V + W*pre\_trained\_item + H*V + H*pre\_trained\_item$

So, before building our model, let's factorize the social links matrix:

In [None]:
# we are going to use the Non-Negative Matrix Factorization module from sklearn
from sklearn.decomposition import NMF
# first creating the sparse matrix of num_users*num_users because we are only
# given the positive instances of the social links
links_matrix_sparse = np.zeros((len(train_data.user_id.unique()),len(train_data.user_id.unique())))
# storing the user relationships in both ways to be able to fill the sparse
# matrix with ones in the corresponding coordinates
links_src_des = list(zip(links_df.src, links_df.des))
links_des_src = list(zip(links_df.des, links_df.src))
# filling sparse matrix with users that are "friends"
for index in links_src_des:
  links_matrix_sparse[index] = 1
for index in links_des_src:
  links_matrix_sparse[index] = 1
# decomposing the matrix into W and H
model = NMF(n_components=256, init='random', random_state=0)
W = model.fit_transform(links_matrix_sparse)
H_t = np.transpose(model.components_)
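
The two filling loops above can also be written with NumPy integer-array indexing. The sketch below uses a tiny made-up edge list (`src`, `des` and `n_users` are hypothetical illustration values, not the Flickr data) and builds the same kind of symmetric 0/1 matrix with one assignment per direction.

```python
import numpy as np

n_users = 5                 # hypothetical number of users
src = np.array([0, 1, 3])   # hypothetical source users
des = np.array([2, 4, 0])   # hypothetical destination users

links = np.zeros((n_users, n_users), dtype=np.float32)
# mark each link in both directions, so that the matrix is symmetric
links[src, des] = 1
links[des, src] = 1

print(links)
```

For large edge lists this avoids a Python-level loop, although the resulting dense matrix still needs num_users*num_users entries in memory.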

Now we are ready to build the model, which is analogous to the one containing only the user and item pre-trained data:

In [None]:
class MF_UI_LINKS(nn.Module):
    def __init__(self, num_users, num_items, emb_size=256):
        super(MF_UI_LINKS, self).__init__()
        # same thing as before but we have to initialize 2 extra matrices:
        # W and H
        self.user_emb = nn.Embedding(num_users, emb_size)
        self.item_emb = nn.Embedding(num_items, emb_size)
        self.user_emb.weight.data.uniform_(0, 0.05)
        self.item_emb.weight.data.uniform_(0, 0.05)
        self.pre_trained_users = nn.Embedding(num_users, emb_size)
        self.pre_trained_items = nn.Embedding(num_items, emb_size)
        self.pre_trained_users.weight=nn.Parameter(users_tensor)
        self.pre_trained_users.weight.requires_grad = False
        self.pre_trained_items.weight=nn.Parameter(items_tensor)
        self.pre_trained_items.weight.requires_grad = False
        self.links_W = nn.Embedding(num_users, emb_size)
        self.links_H = nn.Embedding(num_users, emb_size)
        self.links_W.weight=nn.Parameter(torch.tensor(W))
        self.links_W.weight.requires_grad = False
        self.links_H.weight=nn.Parameter(torch.tensor(H_t))
        self.links_H.weight.requires_grad = False

    def forward(self, users, items):
        # we add the interactions specified above
        u = self.user_emb(users)
        v = self.item_emb(items)
        trained_users = self.pre_trained_users(users)
        trained_items = self.pre_trained_items(items)
        links_W = self.links_W(users)
        links_H = self.links_H(users)
        ui = torch.sum(u * v, dim=1)
        u_pretrained_i = torch.sum(u*trained_items, dim =1)
        v_pretrained_u = torch.sum(v*trained_users, dim = 1)
        pretrained_i_u = torch.sum(trained_users*trained_items, dim = 1)
        links_W_items = torch.sum(v*links_W, dim = 1)
        links_W_pretrained_i = torch.sum(links_W*trained_items, dim = 1)
        links_H_items = torch.sum(v*links_H, dim = 1)
        links_H_pretrained_i = torch.sum(links_H*trained_items, dim = 1)
        pred = ui + u_pretrained_i + v_pretrained_u + pretrained_i_u + \
        links_W_items + links_W_pretrained_i + links_H_items + links_H_pretrained_i
        return pred

### Training and evaluating the models

Now that we have the 2 models, let's train them and compare their performance on the validation set:

In [None]:
# initialize the model with pretrained user and item
model_MF_UI = MF_UI(n_users, n_items, 256)
criterion_MF_UI = nn.BCEWithLogitsLoss()
# initialize the model with pretrained user, item and social links
model_MF_UI_LINK = MF_UI_LINKS(n_users, n_items, 256)
criterion_MF_UI_LINK = nn.BCEWithLogitsLoss()
# we store our models in the same dictionary we were working on with the
# basic models
model_dict['MF_UI'] = [model_MF_UI, criterion_MF_UI]
model_dict['MF_UI_LINK'] = [model_MF_UI_LINK, criterion_MF_UI_LINK]
use_cuda = torch.cuda.is_available()

In [None]:
# we define another function to train and evaluate the models from now on,
# in order not to repeat code over and over
def train_evaluate_models(models, negative_samples, index_df, wd):
    # arguments:
    # models: list of all the models we want to train and evaluate
    # negative_samples: the negative_instances/positive_instances ratio
    # index_df: last index in the scores_df data frame, to be able to append the
    # performance to the ones for other models that we already calculated
    # wd: weight decay, the L2 regularization parameter
    # we run the negative samples function to obtain the input data with the
    # specified ratio
    negative_samples_df = negative_sampler(negative_samples)
    for model_name in models:
        if use_cuda:
            model_dict[model_name] = [model_dict[model_name][0].cuda(), model_dict[model_name][1]]
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        # model to device
        model_dict[model_name][0] = model_dict[model_name][0].to(device)
        model_dict[model_name][1] = model_dict[model_name][1].to(device)
        # we train each model using the data containing positive and negative
        # samples, gradually decreasing the learning rate so that the model can
        # refine itself by taking smaller steps in the direction opposite to the
        # gradient as training progresses
        train_epocs(model_name, negative_samples_df, 20, lr=0.1, wd = wd)
        train_epocs(model_name, negative_samples_df, 20, lr=0.05, wd = wd)
        train_epocs(model_name, negative_samples_df, 25, lr=0.01, wd = wd)
        train_epocs(model_name, negative_samples_df, 25, lr=0.001, wd = wd)
        # calculating the validation score for each model, using
        # ktop = 15 as required by the task
        ndcg_score = evaluate(model_name, val_loader, use_cuda, 15)
        scores_df.loc[index_df, 'Model'] = model_name
        scores_df.loc[index_df, 'negative_samples_ratio'] = negative_samples
        scores_df.loc[index_df, 'NDGC'] = ndcg_score
        index_df+=1
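
For reference, this is roughly what an NDCG@k score measures when each user has a single held-out item. It is a minimal sketch under that assumption: the helper name `ndcg_at_k` is hypothetical, and the `evaluate` function used above is not necessarily written this way.

```python
import math

def ndcg_at_k(rank_list, gt_item, k=15):
    """NDCG@k when exactly one item (gt_item) is relevant.

    With a single relevant item the ideal DCG is 1, so the score reduces to
    1 / log2(position + 2), where position is the 0-based rank of gt_item
    in the top-k list. The score is 0 if the item is not in the top k.
    """
    for position, item in enumerate(rank_list[:k]):
        if item == gt_item:
            return 1.0 / math.log2(position + 2)
    return 0.0

first = ndcg_at_k([42, 7, 13], gt_item=42)    # ground truth ranked first
third = ndcg_at_k([7, 13, 42], gt_item=42)    # ground truth ranked third
missing = ndcg_at_k([7, 13, 99], gt_item=42)  # ground truth not in the list
print(first, third, missing)  # 1.0 0.5 0.0
```

The score rewards placing the held-out item near the top of the list, and averaging it over all users gives one number per model, which is what is stored in scores_df.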

In [None]:
# we are going to compare the 2 models from above, using 7 as the ratio
# negative_samples/positive_samples
train_evaluate_models(['MF_UI', 'MF_UI_LINK'], 7, len(scores_df) , wd = 0)

MODEL: MF_UI LEARNING RATE = 0.1
EPOC 0 LOSS: 433.7292175292969
EPOC 1 LOSS: 374.7277526855469
EPOC 2 LOSS: 320.4078063964844
EPOC 3 LOSS: 270.92572021484375
EPOC 4 LOSS: 226.3919219970703
EPOC 5 LOSS: 186.9141845703125
EPOC 6 LOSS: 152.4609375
EPOC 7 LOSS: 122.86993408203125
EPOC 8 LOSS: 97.80474090576172
EPOC 9 LOSS: 76.81956481933594
EPOC 10 LOSS: 59.47769546508789
EPOC 11 LOSS: 45.40359115600586
EPOC 12 LOSS: 34.32390594482422
EPOC 13 LOSS: 26.091543197631836
EPOC 14 LOSS: 20.641616821289062
EPOC 15 LOSS: 17.64630889892578
EPOC 16 LOSS: 16.65658950805664
EPOC 17 LOSS: 16.994243621826172
EPOC 18 LOSS: 18.024377822875977
EPOC 19 LOSS: 19.233539581298828
MODEL: MF_UI LEARNING RATE = 0.05
EPOC 0 LOSS: 20.28384017944336
EPOC 1 LOSS: 14.913393020629883
EPOC 2 LOSS: 10.447698593139648
EPOC 3 LOSS: 6.953083515167236
EPOC 4 LOSS: 4.5775146484375
EPOC 5 LOSS: 3.410247325897217
EPOC 6 LOSS: 3.124905824661255
EPOC 7 LOSS: 3.1906793117523193
EPOC 8 LOSS: 3.2039833068847656
EPOC 9 LOSS: 3.026603

### Performances

In [None]:
scores_df

Unnamed: 0,Model,negative_samples_ratio,NDGC
0,GMF,5.0,0.061684
1,MF,5.0,0.079096
2,MLP,5.0,0.060017
3,MF_UI,7.0,0.214617
4,MF_UI_LINK,7.0,0.214956


We can see that both new models perform considerably better than the basic models (note that they were also trained with a ratio of 7 instead of 5), and almost identically to each other (0.2146 vs 0.2150). Since MF_UI is the simpler model and the social links add almost nothing to the score, we are going to proceed with MF_UI for further tuning.

### MF with UI pretrained and biases

We are going to add biases to the selected model and see how it goes. The biases correspond to the users and the items. It makes sense to represent them in the model because some users tend to like more images than others, and some images are naturally liked more than others, independently of the interaction between the two.
The prediction is computed as in the MF_UI model, with 2 new terms added for the biases. We present the model as follows, as a modification of the MF_UI class:

In [None]:
class MF_UI_BIAS(nn.Module):
    def __init__(self, num_users, num_items, emb_size=256):
        super(MF_UI_BIAS, self).__init__()
        # Exactly the same as MF_UI but we add user bias and item bias
        self.user_emb = nn.Embedding(num_users, emb_size)
        # user bias embedding
        self.user_bias = nn.Embedding(num_users, 1)
        self.item_emb = nn.Embedding(num_items, emb_size)
        # item bias embedding
        self.item_bias = nn.Embedding(num_items, 1)
        self.user_emb.weight.data.uniform_(0, 0.05)
        self.item_emb.weight.data.uniform_(0, 0.05)
        # we initialize the weights of the biases
        self.user_bias.weight.data.uniform_(-0.01,0.01)
        self.item_bias.weight.data.uniform_(-0.01,0.01)
        self.pre_trained_users = nn.Embedding(num_users, emb_size)
        self.pre_trained_items = nn.Embedding(num_items, emb_size)
        self.pre_trained_users.weight=nn.Parameter(users_tensor)
        self.pre_trained_users.weight.requires_grad = False
        self.pre_trained_items.weight=nn.Parameter(items_tensor)
        self.pre_trained_items.weight.requires_grad = False
        
    def forward(self, users, items):
        u = self.user_emb(users)
        v = self.item_emb(items)
        # biases
        b_users = self.user_bias(users).squeeze()
        b_items = self.item_bias(items).squeeze()
        trained_users = self.pre_trained_users(users)
        trained_items = self.pre_trained_items(items)
        ui = torch.sum(u * v, dim=1)
        u_pretrained_i = torch.sum(u*trained_items, dim =1)
        v_pretrained_u = torch.sum(v*trained_users, dim = 1)
        pretrained_i_u = torch.sum(trained_users*trained_items, dim = 1)
        # we add them to the summation of all the other terms
        pred = ui + u_pretrained_i + v_pretrained_u + pretrained_i_u + b_users + b_items 
        return pred

### Training and evaluating the models

Now that we have defined the model with biases, we can compare it with MF_UI (without biases), using different values for the negative_samples/positive_samples ratio, and see what happens:

In [None]:
# looping over different values for the ratio (we have to be careful not to 
# make steps that are too big because if our data is too big it can crash
# the session)
for negative_ratio in [7,10,15,17]:
  # initialize the model with pretrained user and item
  model_MF_UI = MF_UI(n_users, n_items, 256)
  criterion_MF_UI = nn.BCEWithLogitsLoss()
  # initialize the model with pretrained user, item and biases
  model_MF_UI_BIAS = MF_UI_BIAS(n_users, n_items, 256)
  criterion_MF_UI_BIAS = nn.BCEWithLogitsLoss()
  # we store our models in the same dictionary we were working on with the
  # basic models
  model_dict['MF_UI'] = [model_MF_UI, criterion_MF_UI]
  model_dict['MF_UI_BIAS'] = [model_MF_UI_BIAS, criterion_MF_UI_BIAS]
  use_cuda = torch.cuda.is_available()
  # the train_evaluate function will iterate over the 2 models, train and
  # evaluate and we are going to be able to see the results in the scores_df
  train_evaluate_models(['MF_UI', 'MF_UI_BIAS'], negative_ratio, len(scores_df), wd = 0)

MODEL: MF_UI LEARNING RATE = 0.1
EPOC 0 LOSS: 434.3218688964844
EPOC 1 LOSS: 375.2796936035156
EPOC 2 LOSS: 320.920654296875
EPOC 3 LOSS: 271.400390625
EPOC 4 LOSS: 226.82066345214844
EPOC 5 LOSS: 187.28656005859375
EPOC 6 LOSS: 152.78294372558594
EPOC 7 LOSS: 123.14726257324219
EPOC 8 LOSS: 98.04100799560547
EPOC 9 LOSS: 77.01679229736328
EPOC 10 LOSS: 59.64283752441406
EPOC 11 LOSS: 45.5445556640625
EPOC 12 LOSS: 34.44348907470703
EPOC 13 LOSS: 26.198272705078125
EPOC 14 LOSS: 20.718915939331055
EPOC 15 LOSS: 17.70241355895996
EPOC 16 LOSS: 16.685121536254883
EPOC 17 LOSS: 17.008817672729492
EPOC 18 LOSS: 18.026330947875977
EPOC 19 LOSS: 19.22974967956543
MODEL: MF_UI LEARNING RATE = 0.05
EPOC 0 LOSS: 20.275278091430664
EPOC 1 LOSS: 14.902205467224121
EPOC 2 LOSS: 10.43221664428711
EPOC 3 LOSS: 6.9371018409729
EPOC 4 LOSS: 4.553884983062744
EPOC 5 LOSS: 3.3823108673095703
EPOC 6 LOSS: 3.1194376945495605
EPOC 7 LOSS: 3.186985492706299
EPOC 8 LOSS: 3.1980388164520264
EPOC 9 LOSS: 3.024

### Performances

In [None]:
scores_df

Unnamed: 0,Model,negative_samples_ratio,NDGC
0,GMF,5.0,0.061684
1,MF,5.0,0.079096
2,MLP,5.0,0.060017
3,MF_UI,7.0,0.214617
4,MF_UI_LINK,7.0,0.214956
5,MF_UI,7.0,0.205384
6,MF_UI_BIAS,7.0,0.216861
7,MF_UI,10.0,0.229761
8,MF_UI_BIAS,10.0,0.231508
9,MF_UI,15.0,0.235472


We can see that MF_UI_BIAS always performs slightly better than its MF_UI counterpart, and that performance increases as negative_samples_ratio increases.

Unfortunately, negative_samples_ratio cannot be increased beyond 17 without training in batches, which I already tried and which gave worse performance, so we will keep it at that value.
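
To make the ratio concrete, here is a minimal sketch of negative sampling on toy data. The function `sample_negatives` and its inputs are hypothetical and only illustrate the idea; the `negative_sampler` used in this notebook may differ. For each observed (user, item) pair it draws `ratio` items that the user has not interacted with, so the training set grows to `(1 + ratio)` times the number of positives, which is why large ratios become expensive in memory.

```python
import random

def sample_negatives(positives, n_items, ratio, seed=0):
    """Return (user, item, label) rows with `ratio` negatives per positive."""
    rng = random.Random(seed)
    # items each user has interacted with (these can never be negatives)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    rows = []
    for user, item in positives:
        rows.append((user, item, 1))
        drawn = 0
        # draw with replacement until `ratio` unseen items are found;
        # this assumes every user has at least one unseen item
        while drawn < ratio:
            candidate = rng.randrange(n_items)
            if candidate not in seen[user]:
                rows.append((user, candidate, 0))
                drawn += 1
    return rows

positives = [(0, 1), (0, 3), (1, 2)]  # toy interactions
rows = sample_negatives(positives, n_items=50, ratio=7)
print(len(rows))  # 3 positives * (1 + 7) = 24 rows
```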

So, we will choose the model with the highest performance, which is MF_UI_BIAS with negative_samples_ratio = 17, and play around with the regularization parameter weight decay to see if we can further improve our performance.
Basically, weight decay is an implementation of L2 regularization, so it can help to prevent overfitting. With larger values of wd we impose a larger penalty on the model, reducing the size of its coefficients. With smaller values of wd we allow the model to be more flexible, and with wd=0 we impose no penalty at all.
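
The effect of the penalty can be seen on a single update step. The sketch below is plain NumPy and is not the training code of this notebook: it assumes the usual formulation in which weight decay adds `wd * w` to the gradient before the step (this is how the `weight_decay` argument of PyTorch optimizers such as `torch.optim.SGD` behaves), and all the numbers are arbitrary.

```python
import numpy as np

def sgd_step(w, grad, lr, wd):
    """One gradient step with L2 regularization: the gradient becomes grad + wd * w."""
    return w - lr * (grad + wd * w)

w = np.array([1.0, -2.0, 0.5])
grad = np.zeros_like(w)  # zero data gradient, to isolate the effect of the penalty

no_decay = sgd_step(w, grad, lr=0.1, wd=0.0)
with_decay = sgd_step(w, grad, lr=0.1, wd=0.01)

print(no_decay)    # weights unchanged
print(with_decay)  # every weight multiplied by (1 - lr * wd) = 0.999
```

Even with no signal from the data, the penalty pulls every weight towards zero at each step, and the pull is proportional to wd.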

### Regularization for MF with biases

In [None]:
# adding weight decay column to the scores data frame, until now it has always
# been 0
scores_df['wd'] = 0
# reordering the columns
scores_df = scores_df.loc[:, ['Model', 'negative_samples_ratio', 'wd', 'NDGC']]
scores_df.head()

Unnamed: 0,Model,negative_samples_ratio,wd,NDGC
0,GMF,5.0,0,0.061684
1,MF,5.0,0,0.079096
2,MLP,5.0,0,0.060017
3,MF_UI,7.0,0,0.214617
4,MF_UI_LINK,7.0,0,0.214956


In [None]:
# calculating the performances for different values of weight decay:
wd_values = [1e-12, 1e-10, 1e-8, 1e-6, 0.01]
# keeping a separate index to store the weight decay in the scores data frame,
# since this is not handled by the train_evaluate_models function
j = len(scores_df)
wd_model_number = 1
for wd in wd_values:
  # initialize model
  model_MF_UI_BIAS_WD= MF_UI_BIAS(n_users, n_items, 256)
  criterion_MF_UI_BIAS_WD = nn.BCEWithLogitsLoss()
  model_dict['MF_UI_BIAS_WD'+'_'+ str(wd_model_number)] = [model_MF_UI_BIAS_WD, criterion_MF_UI_BIAS_WD]
  use_cuda = torch.cuda.is_available()
  # negative_samples ratio is fixed in 17 because it corresponded to the 
  # highest score with bias included
  train_evaluate_models(['MF_UI_BIAS_WD'+'_'+ str(wd_model_number)], 17, len(scores_df), wd = wd)
  scores_df.loc[j, 'wd'] = wd
  j += 1
  wd_model_number += 1

MODEL: MF_UI_BIAS_WD_1 LEARNING RATE = 0.1
EPOC 0 LOSS: 468.7370910644531
EPOC 1 LOSS: 404.8341064453125
EPOC 2 LOSS: 345.98126220703125
EPOC 3 LOSS: 292.3467102050781
EPOC 4 LOSS: 244.04315185546875
EPOC 5 LOSS: 201.18739318847656
EPOC 6 LOSS: 163.76315307617188
EPOC 7 LOSS: 131.60537719726562
EPOC 8 LOSS: 104.37188720703125
EPOC 9 LOSS: 81.5759048461914
EPOC 10 LOSS: 62.75010299682617
EPOC 11 LOSS: 47.480194091796875
EPOC 12 LOSS: 35.44874954223633
EPOC 13 LOSS: 26.493160247802734
EPOC 14 LOSS: 20.522741317749023
EPOC 15 LOSS: 17.191877365112305
EPOC 16 LOSS: 16.00737762451172
EPOC 17 LOSS: 16.262903213500977
EPOC 18 LOSS: 17.281593322753906
EPOC 19 LOSS: 18.493213653564453
MODEL: MF_UI_BIAS_WD_1 LEARNING RATE = 0.05
EPOC 0 LOSS: 19.546798706054688
EPOC 1 LOSS: 13.831331253051758
EPOC 2 LOSS: 9.086775779724121
EPOC 3 LOSS: 5.431343078613281
EPOC 4 LOSS: 3.096982002258301
EPOC 5 LOSS: 2.111593723297119
EPOC 6 LOSS: 2.0245373249053955
EPOC 7 LOSS: 2.2192471027374268
EPOC 8 LOSS: 2.3367

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


MODEL: MF_UI_BIAS_WD_2 LEARNING RATE = 0.1
EPOC 0 LOSS: 468.5909423828125
EPOC 1 LOSS: 404.69482421875
EPOC 2 LOSS: 345.84820556640625
EPOC 3 LOSS: 292.2190856933594
EPOC 4 LOSS: 243.92327880859375
EPOC 5 LOSS: 201.076416015625
EPOC 6 LOSS: 163.660400390625
EPOC 7 LOSS: 131.51089477539062
EPOC 8 LOSS: 104.28237915039062
EPOC 9 LOSS: 81.49382781982422
EPOC 10 LOSS: 62.678443908691406
EPOC 11 LOSS: 47.4193000793457
EPOC 12 LOSS: 35.4023323059082
EPOC 13 LOSS: 26.468406677246094
EPOC 14 LOSS: 20.515199661254883
EPOC 15 LOSS: 17.205604553222656
EPOC 16 LOSS: 16.034231185913086
EPOC 17 LOSS: 16.286767959594727
EPOC 18 LOSS: 17.29694938659668
EPOC 19 LOSS: 18.506542205810547
MODEL: MF_UI_BIAS_WD_2 LEARNING RATE = 0.05
EPOC 0 LOSS: 19.5560359954834
EPOC 1 LOSS: 13.843720436096191
EPOC 2 LOSS: 9.09827709197998
EPOC 3 LOSS: 5.453711032867432
EPOC 4 LOSS: 3.1190061569213867
EPOC 5 LOSS: 2.1260550022125244
EPOC 6 LOSS: 2.0282719135284424
EPOC 7 LOSS: 2.220259428024292
EPOC 8 LOSS: 2.3400335311889

### Final scores

In [None]:
# scores data frame ordered from highest NDGC to lowest
scores_df = scores_df.sort_values(by=['NDGC'], ascending = False)
scores_df

Unnamed: 0,Model,negative_samples_ratio,wd,NDGC
15,MF_UI_BIAS_WD_3,17.0,1e-08,0.24404
12,MF_UI_BIAS,17.0,0.0,0.241829
13,MF_UI_BIAS_WD_1,17.0,1e-12,0.24008
14,MF_UI_BIAS_WD_2,17.0,1e-10,0.238832
10,MF_UI_BIAS,15.0,0.0,0.237491
11,MF_UI,17.0,0.0,0.236593
9,MF_UI,15.0,0.0,0.235472
8,MF_UI_BIAS,10.0,0.0,0.231508
7,MF_UI,10.0,0.0,0.229761
16,MF_UI_BIAS_WD_4,17.0,1e-06,0.226451


We can see that the best-performing model of all is Matrix Factorization with pre-trained user and item features, biases, a negative_ratio of 17 and a weight decay of 1e-8 (NDCG of 0.24404, against 0.241829 for the same model without weight decay).

### Making predictions

Now that we have found our best performing model we can go ahead and make the predictions:

In [None]:
# creating test loader

users_tensor_test = torch.tensor(test_data.user_id.tolist())
items_tensor_test = torch.tensor(test_data.item_id.tolist())
datatest = TensorDataset(users_tensor_test, items_tensor_test)

test_loader = DataLoader(dataset=datatest,
    # we are given 100 candidate items for each user, so with a batch size of
    # 100 (and no shuffling) each batch holds the candidates of a single user
    batch_size=100,
    shuffle=False
    )

In [None]:
# best-performing model
model = model_dict['MF_UI_BIAS_WD_3'][0]
# data frame that will store the recommendations
recommendations_df = pd.DataFrame()
i = 0
model.eval()
with torch.no_grad():
    for data in test_loader:
        user = data[0]
        items = data[1]
        if use_cuda:
            user, items = user.cuda(), items.cuda()
        preds = model(user, items)
        items_cpu = items.cpu().numpy()
        preds_cpu = preds.detach().cpu().numpy()
        litems=np.split(items_cpu, test_loader.batch_size//100)
        lpreds=np.split(preds_cpu, test_loader.batch_size//100)
        gtitem = items[0]
        # the following 3 lines of code ensure that the fact that the 1st item is
        # gtitem does not affect the final rank
        randidx = np.arange(100)
        np.random.shuffle(randidx)
        items, preds = np.array(litems[0])[randidx], np.array(lpreds[0])[randidx]
        map_item_score = dict( zip(items, preds) )
        rank_list = heapq.nlargest(15, map_item_score, key=map_item_score.get)
        user_id = user.tolist()[0]
        # storing predictions in data frame
        for recommended_item in rank_list:
          recommendations_df.loc[i, 'user_id'] = user_id
          recommendations_df.loc[i, 'item_id'] = recommended_item
          i += 1
recommendations_df['user_id'] = recommendations_df['user_id'].astype(int)
recommendations_df['item_id'] = recommendations_df['item_id'].astype(int)
# saving results
recommendations_df.to_csv('/content/drive/My Drive/res2021/31240992_bias17_output.csv', index = False)
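
The ranking step in the loop above can be illustrated in isolation. This is a toy sketch with made-up scores (the `scores` dictionary, `user_id` and `top_k` values are hypothetical): `heapq.nlargest` returns the keys with the highest predicted scores, and collecting the rows in a list first avoids growing a data frame one cell at a time.

```python
import heapq

user_id = 0  # hypothetical user
# hypothetical item_id -> predicted score mapping for that user
scores = {10: 0.2, 11: 1.7, 12: -0.4, 13: 0.9, 14: 1.1}
top_k = 3

# item ids ordered from highest to lowest predicted score
rank_list = heapq.nlargest(top_k, scores, key=scores.get)

# one row per recommended item
rows = [{"user_id": user_id, "item_id": item} for item in rank_list]
print(rank_list)  # [11, 14, 13]
```

A list such as `rows` can be passed to `pd.DataFrame(rows)` once, after the loop over all users has finished.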
