<a href="https://colab.research.google.com/github/LinaDanilina/recommender-system/blob/master/rnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [2]:
! ls
%cd drive/My Drive/spotlight

drive  sample_data
/content/drive/My Drive/spotlight


In [0]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [0]:
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score,
    explained_variance_score,
    roc_auc_score,
    log_loss,
)

In [0]:
# Keep only the columns Spotlight needs and rename them to its field names.
columns = {'user_id': 'user_ids', 'movie_id': 'item_ids',
           'rating': 'ratings', 'unix_timestamp': 'timestamps'}

train = pd.read_csv('train.csv')[list(columns)].rename(columns=columns)
test = pd.read_csv('test.csv')[list(columns)].rename(columns=columns)

In [20]:
from spotlight.interactions import Interactions

# Wrap the raw arrays in Spotlight's Interactions container.
train = Interactions(train['user_ids'].values, train['item_ids'].values,
                     ratings=train['ratings'].values, timestamps=train['timestamps'].values)
test = Interactions(test['user_ids'].values, test['item_ids'].values,
                    ratings=test['ratings'].values, timestamps=test['timestamps'].values)
print(train,test)

<Interactions dataset (943 users x 1682 items x 75000 interactions)> <Interactions dataset (943 users x 1680 items x 25000 interactions)>


In [21]:
train.user_ids

array([ 46, 845,  22, ..., 523,   4, 406])

We can feed our dataset to the ExplicitFactorizationModel class, an sklearn-like object that allows us to train and evaluate explicit factorization models.

Internally, the model uses the BilinearNet class to represent users and items. It is composed of four embedding layers:

* a (num_users x latent_dim) embedding layer to represent users,
* a (num_items x latent_dim) embedding layer to represent items,
* a (num_users x 1) embedding layer to represent user biases, and
* a (num_items x 1) embedding layer to represent item biases.

Together, these give us the predictions. Their accuracy is evaluated using one of the Spotlight losses. In this case, we'll use the regression loss, which is simply the mean squared difference between the true and the predicted ratings.
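As a rough illustration of how the four embedding tables combine, the NumPy sketch below computes a prediction as the dot product of a user vector and an item vector plus the two bias terms, then scores it with a mean squared error. This is not Spotlight's code: the array names, the tiny sizes, and the random initialisation are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
num_users, num_items, latent_dim = 5, 7, 4

# The four embedding tables described above.
user_embeddings = rng.normal(size=(num_users, latent_dim))
item_embeddings = rng.normal(size=(num_items, latent_dim))
user_biases = np.zeros((num_users, 1))
item_biases = np.zeros((num_items, 1))


def predict(user_ids, item_ids):
    """Dot product of user and item vectors plus both bias terms."""
    dot = (user_embeddings[user_ids] * item_embeddings[item_ids]).sum(axis=1)
    return dot + user_biases[user_ids, 0] + item_biases[item_ids, 0]


def regression_loss(observed, predicted):
    """Mean of the squared differences between true and predicted ratings."""
    return ((observed - predicted) ** 2).mean()


# Made-up (user, item, rating) triples.
user_ids = np.array([0, 1, 4])
item_ids = np.array([2, 2, 6])
ratings = np.array([4.0, 3.0, 5.0])

predicted = predict(user_ids, item_ids)
loss = regression_loss(ratings, predicted)
```

Training amounts to adjusting the entries of these tables so that the loss over the observed ratings decreases.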

In [0]:
import torch

from spotlight.factorization.explicit import ExplicitFactorizationModel

model = ExplicitFactorizationModel(loss='regression',
                                   embedding_dim=128,  # latent dimensionality
                                   n_iter=10,  # number of epochs of training
                                   batch_size=1024,  # minibatch size
                                   l2=1e-9,  # strength of L2 regularization
                                   learning_rate=1e-3,
                                   use_cuda=torch.cuda.is_available())

In [23]:
history=model.fit(train, verbose=True)

Epoch 0: loss 13.161509874704722
Epoch 1: loss 8.068518078004992
Epoch 2: loss 2.042907594023524
Epoch 3: loss 1.1207967320004024
Epoch 4: loss 0.9621324225051983
Epoch 5: loss 0.9081068216143428
Epoch 6: loss 0.8802197535295744
Epoch 7: loss 0.8652686218957644
Epoch 8: loss 0.8548531395358008
Epoch 9: loss 0.846434818731772


In [24]:
from spotlight.evaluation import rmse_score

train_rmse = rmse_score(model, train)
test_rmse = rmse_score(model, test)

print('Train RMSE {:.3f}, test RMSE {:.3f}'.format(train_rmse, test_rmse))

Train RMSE 0.907, test RMSE 0.952
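For reference, RMSE is the square root of the mean squared difference between observed and predicted ratings. A minimal NumPy sketch of that formula, using made-up ratings rather than the notebook's data (the function name `rmse` and the toy arrays are hypothetical):

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error between two equally sized rating arrays."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(((observed - predicted) ** 2).mean())


# Toy values: squared errors are 1, 0, 1 and 4, so the mean is 1.5.
toy_true = [4.0, 3.0, 5.0, 1.0]
toy_pred = [3.0, 3.0, 4.0, 3.0]
toy_rmse = rmse(toy_true, toy_pred)
```

Because the errors are squared before averaging, a single large miss (the last pair here) weighs more heavily than several small ones.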


In [0]:
predictions = model.predict(test.user_ids, test.item_ids)

In [0]:
def metrics(data_true, data_pred):
    # Square root of the mean squared error, so this value is an RMSE, not an MSE.
    rmse = np.sqrt(mean_squared_error(data_true, data_pred))
    mae = mean_absolute_error(data_true, data_pred)
    r2 = r2_score(data_true, data_pred)
    # Use the function arguments rather than the global test set and predictions.
    ex_var = explained_variance_score(data_true, data_pred)
    df = pd.DataFrame({"RMSE": rmse, "MAE": mae, "R2_score": r2, "explained variance": ex_var}, index=[0])
    return df

In [27]:
metrics(test.ratings, predictions)

       RMSE       MAE  R2_score  explained variance
0  0.952237  0.750379  0.289279            0.289508
