# SVD++

We implement a variation of the SVD++ algorithm proposed by Yehuda Koren. Specifically, we apply the following changes:

- add learning rate decay
- add momentum to the gradients
- use heuristics to initialize user biases, item biases, and item factors as proposed by Zhengzheng Xian et al., instead of learning them in the optimization step

The decay and momentum are added to stabilize the optimization, while the heuristic initializations are needed to prevent prohibitive training times.

To further speed up training, we implement the optimization step in Cython.

We call this algorithm **SGDPP2**, which stands for Stochastic Gradient Descent Plus-Plus v2.
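
For reference, the prediction rule of the original SVD++ model, as given in Koren's paper, is reproduced below. The notation follows that paper rather than this notebook: $N(u)$ is the set of items for which user $u$ provided implicit feedback, $p_u$ and $q_i$ are the user and item latent factors, and $y_j$ are the implicit item factors. The variant implemented here may organize these terms differently.

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top\left(p_u + |N(u)|^{-1/2}\sum_{j\in N(u)} y_j\right)$$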

### Importing the libraries

In [3]:
import numpy as np
import pandas as pd
from surprise import AlgoBase, PredictionImpossible, Reader, Dataset, accuracy
from surprise.model_selection import train_test_split

### Initialize the hyperparameters

We first initialize the hyperparameters for the algorithm, such as the number of latent factors.

In [1]:
# latent factors
n_factors = 192
# number of epochs
n_epochs = 85
# initialization Gaussian mean for matrices P, Q
init_mean = 0.2
# initialization Gaussian standard deviation for matrices P, Q
init_std = 0.005
# learning rate for matrix P, representing the users' latent features
lr_pu = 0.005
# learning rate for matrix Q, representing the items' latent features
lr_qi = 0.005
# the strength of the gradient momentum for matrix P
alpha_pu = 0.3
# the strength of the gradient momentum for matrix Q
alpha_qi = 0.3
# the (linear) decay rate associated with the learning rate for matrix P
decay_pu = 0.02
# the (linear) decay rate associated with the learning rate for matrix Q
decay_qi = 0.05
# the regularization strength for matrix P
reg_pu = 0.06
# the regularization strength for matrix Q
reg_qi = 0.065
# the regularization strength for the initialization of the user biases
lambda_bu = 25
# the regularization strength for the initialization of the item biases
lambda_bi = 0.5
# the regularization strength for the initialization of the item factors
lambda_yj = 50
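
To make the roles of the learning rate, decay, and momentum hyperparameters concrete, here is a minimal sketch of how they could combine in a single update. The notebook does not show the exact schedule, so both helpers are assumptions: `decayed_lr` uses `lr0 / (1 + decay * epoch)`, one common form in which the denominator grows linearly with the epoch, and `momentum_step` uses classical momentum on a scalar parameter. The function names are hypothetical.

```python
def decayed_lr(lr0, decay, epoch):
    """Learning rate at a given epoch (assumed form: lr0 / (1 + decay * epoch))."""
    return lr0 / (1.0 + decay * epoch)


def momentum_step(param, grad, velocity, lr, alpha):
    """One SGD step with classical momentum (assumed update rule).

    The velocity keeps a fraction `alpha` of the previous step and adds
    the current gradient step; the parameter then moves by the velocity.
    """
    velocity = alpha * velocity - lr * grad
    return param + velocity, velocity


# Illustration on a toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, v = 0.0, 0.0
for epoch in range(85):
    lr = decayed_lr(0.005, 0.02, epoch)   # values mirror lr_pu and decay_pu
    x, v = momentum_step(x, 2.0 * (x - 3.0), v, lr, alpha=0.3)
```

With these settings the step size shrinks as training proceeds, while the momentum term smooths consecutive updates.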

### Prepare the dataset

We first read the dataset as a pandas DataFrame.

In [None]:
# file_path must be set to the location of the training CSV
data_train_raw = pd.read_csv(file_path)

# parse rows and columns
row_str = data_train_raw['Id'].apply(lambda x: x.split('_')[0])
row_id = row_str.apply(lambda x: int(x.split('r')[1]) - 1)
col_str = data_train_raw['Id'].apply(lambda x: x.split('_')[1])
col_id = col_str.apply(lambda x: int(x.split('c')[1]) - 1)

# apply changes
data_train_raw['row'] = row_id
data_train_raw['col'] = col_id

# dataset as data frame
data_train_df = data_train_raw.loc[:,['row', 'col', 'Prediction']]
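
The parsing above implies that each `Id` has the form `r<row>_c<col>` with one-based indices. As a small illustration of that conversion on a single string, here is a sketch using a regular expression; `parse_id` is a hypothetical helper and the example IDs are made up.

```python
import re

# Matches IDs such as "r44_c1": a one-based row, then a one-based column.
ID_PATTERN = re.compile(r"r(\d+)_c(\d+)")


def parse_id(entry_id):
    """Convert an 'r<row>_c<col>' string into zero-based (row, col) indices."""
    match = ID_PATTERN.fullmatch(entry_id)
    if match is None:
        raise ValueError(f"unexpected Id format: {entry_id!r}")
    row, col = match.groups()
    return int(row) - 1, int(col) - 1


# Hypothetical IDs, for illustration only.
pairs = [parse_id(s) for s in ["r44_c1", "r1_c10"]]
```

Failing loudly on malformed IDs avoids silently producing wrong indices, which the chained `split` calls would not catch in every case.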

### Prepare training and test sets

Next we prepare the training and test sets by using the surprise library, based on the previously computed DataFrame.

In [None]:
# set up surprise dataset (Reader defaults to a rating scale of 1 to 5)
reader = Reader()
dataset = Dataset.load_from_df(data_train_df[['row', 'col', 'Prediction']], reader)

# now set up training and test set, with a test split of 25%
trainset, testset = train_test_split(dataset, test_size=0.25)

### Heuristic initialization

We first initialize the user and item biases. Each entry $b_i$, $b_u$ is initialized as follows:

$$b_i = \frac{\sum_{u\in R(i)}(r_{ui}-\mu)}{\lambda_{bi} + |R(i)|}$$
$$b_u = \frac{\sum_{i\in R(u)}(r_{ui}-\mu-b_i)}{\lambda_{bu} + |R(u)|}$$

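where $\mu$ is the global mean rating, $R(u)$ is the set of items rated by user $u$, and $R(i)$ is the set of users who rated item $i$. The item biases are computed first, since $b_u$ depends on $b_i$.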
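
The two bias formulas can be sketched in NumPy as follows. This is an illustrative version operating on plain index arrays rather than on the surprise `trainset`; the function name `init_biases` and its arguments are hypothetical, and rows are assumed to index users while columns index items.

```python
import numpy as np


def init_biases(rows, cols, ratings, n_users, n_items, lambda_bu, lambda_bi):
    """Heuristic bias initialization from (user, item, rating) triples."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    ratings = np.asarray(ratings, dtype=float)
    mu = ratings.mean()  # global mean rating

    # b_i: regularized mean residual over the users who rated item i
    item_sum = np.bincount(cols, weights=ratings - mu, minlength=n_items)
    item_cnt = np.bincount(cols, minlength=n_items)
    b_i = item_sum / (lambda_bi + item_cnt)

    # b_u: regularized mean residual over the items rated by user u,
    # after removing the item bias computed above
    user_sum = np.bincount(rows, weights=ratings - mu - b_i[cols], minlength=n_users)
    user_cnt = np.bincount(rows, minlength=n_users)
    b_u = user_sum / (lambda_bu + user_cnt)
    return mu, b_u, b_i
```

Because the regularization constants are added to the counts, users or items with few ratings get biases shrunk toward zero, and entries with no ratings at all get a bias of exactly zero as long as the constants are positive.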

The item factors are initialized as follows:

$$Y_u = \frac{\sum_{i\in R(u)}V_i}{\sqrt{|R(u)|}(\lambda_{yj} + |R(u)|)}$$

where $V \in \mathbb{R}^{\text{items}\times\text{factors}}$ is the item-to-factor matrix obtained from an SVD of the ratings matrix. Note that we do not impute the missing ratings for this step.
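
A minimal NumPy sketch of this initialization is given below. Several details are assumptions not stated in the text: unobserved ratings are left as zeros in a dense matrix (one reading of "do not impute"), $V$ is taken as the leading right singular vectors without scaling by the singular values, and users with no ratings receive a zero vector. The function name `init_y_factors` is hypothetical, and a dense SVD is only practical for small matrices.

```python
import numpy as np


def init_y_factors(rows, cols, ratings, n_users, n_items, n_factors, lambda_yj):
    """Compute V from a truncated SVD and Y_u from the formula above."""
    rows = np.asarray(rows)
    cols = np.asarray(cols)

    # Dense ratings matrix; unobserved entries stay at zero (assumption).
    R = np.zeros((n_users, n_items))
    R[rows, cols] = ratings

    # Rows of vt are right singular vectors; keep the leading n_factors.
    _, _, vt = np.linalg.svd(R, full_matrices=False)
    V = vt[:n_factors].T  # shape (n_items, n_factors)

    # Sum V_i over the items rated by each user.
    Y = np.zeros((n_users, V.shape[1]))
    np.add.at(Y, rows, V[cols])

    # Divide by sqrt(|R(u)|) * (lambda_yj + |R(u)|) for users with ratings.
    counts = np.bincount(rows, minlength=n_users)
    rated = counts > 0
    Y[rated] /= (np.sqrt(counts[rated]) * (lambda_yj + counts[rated]))[:, None]
    return V, Y
```

Note that `n_factors` cannot exceed `min(n_users, n_items)` in this sketch, since a thin SVD yields at most that many singular vectors.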