In [1]:
import pandas as pd

from src.metrics import map_score, mrr_score, ndcg_score, rmse_score
from src.models.mean_rating_baseline import MeanRatingRecommender
from src.utils import train_test_split, to_user_movie_matrix, make_binary_matrix

Let's load the dataset of users' movie ratings and split it into training and test subsets by timestamp.

In [2]:
movies = pd.read_csv("../data/movies_extended.csv", engine='python', encoding='latin1')

ratings = pd.read_table("../data/ratings.dat", sep="::", names=['UserID', 'MovieID', 'Rating', 'Timestamp'], engine='python')
ratings['Timestamp'] = pd.to_datetime(ratings['Timestamp'], unit='s')

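# Keep only ratings for movies that appear in movies_extended.csv
ratings = ratings[ratings['MovieID'].isin(movies['MovieID'])]

# Split on the 'Timestamp' column (see src.utils.train_test_split for the exact cut-off)
train_ratings, test_ratings = train_test_split(ratings, 'Timestamp')
# Pivot each subset into a user x movie rating matrix
user_movie_train = to_user_movie_matrix(train_ratings)
user_movie_test = to_user_movie_matrix(test_ratings)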
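The exact rule used by `train_test_split` lives in `src.utils` and is not shown here. As a minimal sketch, assuming a single global cut-off where the newest 20% of ratings form the test set, a temporal split could look like this (`split_by_time` and the 0.2 fraction are hypothetical):

```python
from typing import List, Tuple

# A rating record: (user_id, movie_id, rating, timestamp in seconds)
Rating = Tuple[int, int, float, int]


def split_by_time(ratings: List[Rating], test_fraction: float = 0.2) -> Tuple[List[Rating], List[Rating]]:
    """Global temporal split: the oldest ratings go to train, the newest to test.

    Hypothetical helper; src.utils.train_test_split may differ
    (e.g. per-user cut-offs or a different test fraction).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be in (0, 1)")
    ordered = sorted(ratings, key=lambda r: r[3])  # sort by timestamp
    cut = int(round(len(ordered) * (1.0 - test_fraction)))
    return ordered[:cut], ordered[cut:]


ratings = [
    (1, 10, 4.0, 978300760),
    (1, 20, 3.0, 978302109),
    (2, 10, 5.0, 978301968),
    (2, 30, 2.0, 978300275),
    (3, 20, 4.0, 978824291),
]
train, test = split_by_time(ratings, test_fraction=0.2)
print(len(train), len(test))  # 4 1
```

Splitting by time rather than at random means the model is never trained on ratings made after the ones it is asked to predict.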

Now let's train the baseline model, which computes the average rating of each movie and predicts that average for every user.

In [3]:
baseline = MeanRatingRecommender()
baseline.train(user_movie_train)

y_pred = baseline.predict(make_binary_matrix(user_movie_test.get_rating_matrix()))
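`MeanRatingRecommender` is not shown in this notebook. The following is a minimal sketch of the same idea on plain `(user, movie, rating)` tuples; the class name `MeanRatingSketch` is hypothetical, and falling back to the global mean for movies unseen in training is an assumption, not necessarily what the real class does:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class MeanRatingSketch:
    """Predicts each movie's mean training rating for every user.

    Hypothetical stand-in for MeanRatingRecommender; the global-mean fallback
    for movies unseen in training is an assumption.
    """

    def __init__(self) -> None:
        self.movie_mean: Dict[int, float] = {}
        self.global_mean: float = 0.0

    def train(self, ratings: Iterable[Tuple[int, int, float]]) -> None:
        sums: Dict[int, float] = defaultdict(float)
        counts: Dict[int, int] = defaultdict(int)
        total, n = 0.0, 0
        for _user, movie, rating in ratings:
            sums[movie] += rating
            counts[movie] += 1
            total += rating
            n += 1
        self.movie_mean = {m: sums[m] / counts[m] for m in sums}
        self.global_mean = total / n if n else 0.0

    def predict(self, pairs: Iterable[Tuple[int, int]]) -> List[float]:
        # The user id is ignored: every user gets the same score for a given movie.
        return [self.movie_mean.get(movie, self.global_mean) for _user, movie in pairs]


model = MeanRatingSketch()
model.train([(1, 10, 4.0), (2, 10, 2.0), (1, 20, 5.0)])
preds = model.predict([(3, 10), (3, 20), (3, 99)])
print(preds)  # [3.0, 5.0, 3.6666666666666665]
```

Because the user id is ignored, every user receives the same ranking, which makes this a reasonable floor to compare personalized models against.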

Given the predicted ratings and the test set, we evaluate the model with four metrics (the ranking metrics use each user's top 5 recommendations):
* mean average precision (MAP)
* mean reciprocal rank (MRR)
* normalized discounted cumulative gain (NDCG)
* root mean squared error (RMSE)

In [4]:
map_score_value = map_score(user_movie_test, y_pred, top=5)
mrr_score_value = mrr_score(user_movie_test, y_pred, top=5)
ndcg_score_value = ndcg_score(user_movie_test, y_pred, top=5)
rmse_score_value = rmse_score(user_movie_test, y_pred)

print(f'Baseline MAP: {map_score_value}')
print(f'Baseline MRR: {mrr_score_value}')
print(f'Baseline NDCG: {ndcg_score_value}')
print(f'Baseline RMSE: {rmse_score_value}')

Baseline MAP: 0.16926473001805403
Baseline MRR: 0.40654029213851967
Baseline NDCG: 0.2329809889762562
Baseline RMSE: 1.0103175201035013


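A MAP@5 of 0.169 means that, averaged over users, the precision at the positions where relevant movies appear in the top 5 is about 17%. Unlike plain precision, average precision also rewards placing relevant movies higher in the list.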
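To make the difference between average precision and plain precision concrete, here is a minimal sketch of MAP@k with hypothetical helper names. It normalizes each user's AP by `min(k, |relevant|)`; other implementations divide by the number of hits, so `src.metrics.map_score` may produce different absolute values:

```python
from typing import List, Sequence, Set


def average_precision_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 5) -> float:
    """AP@k: sum of precision@i over ranks i <= k holding a relevant item,
    divided by min(k, |relevant|)."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at rank i
    return score / min(k, len(relevant))


def map_at_k(rankings: List[Sequence[int]], relevants: List[Set[int]], k: int = 5) -> float:
    """Mean of AP@k over users."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)


rankings = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
relevants = [{2, 5}, {1}]
print(average_precision_at_k(rankings[0], relevants[0]))  # (1/2 + 2/5) / 2, about 0.45
print(map_at_k(rankings, relevants))  # (0.45 + 0.0) / 2, about 0.225
```

For the first user, precision@5 is 2/5 = 0.4, while AP@5 is about 0.45 because the hit at rank 2 is credited with precision 1/2.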

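An MRR@5 of 0.407 means that, averaged over users, the reciprocal rank of the first relevant movie in the top 5 is about 0.41. Since 1/0.407 ≈ 2.46, the first relevant movie typically appears between the 2nd and 3rd positions. This is only a rough reading, because users with no relevant movie in the top 5 contribute 0 to the mean, but it suggests that users tend to find a relevant item fairly quickly, which is a positive outcome.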
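A minimal sketch of MRR@k, assuming binary relevance; the helper names are hypothetical and not the `src.metrics` API. The last line reproduces the 1/MRR reading used above:

```python
from typing import List, Sequence, Set


def reciprocal_rank_at_k(ranked: Sequence[int], relevant: Set[int], k: int = 5) -> float:
    """1 / rank of the first relevant item within the top k, or 0.0 if there is none."""
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0


def mrr_at_k(rankings: List[Sequence[int]], relevants: List[Set[int]], k: int = 5) -> float:
    """Mean reciprocal rank over users."""
    scores = [reciprocal_rank_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


# First hits at ranks 1 and 3, and no hit at all for the third user.
rankings = [[7, 8, 9], [1, 2, 3], [4, 5, 6]]
relevants = [{7}, {3}, {10}]
mrr = mrr_at_k(rankings, relevants)
print(mrr)                      # (1 + 1/3 + 0) / 3, about 0.444
print(1 / 0.40654029213851967)  # inverse of the notebook's MRR, about 2.46
```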

NDCG@5 measures ranking quality by weighting each relevant movie by its position, so that higher-ranked items contribute more, and by normalizing against the score of an ideal ordering, so that 1.0 is a perfect ranking. The baseline reaches 0.233.
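A minimal sketch of NDCG@k, assuming the true test rating is used directly as the gain with a log2(i + 1) position discount. `src.metrics.ndcg_score` may use binary relevance or `2**rel - 1` gains instead, so the helper names and the gain choice here are assumptions:

```python
import math
from typing import Dict, Sequence


def dcg_at_k(gains: Sequence[float], k: int = 5) -> float:
    # The item at 1-based position i is discounted by log2(i + 1).
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked: Sequence[int], true_ratings: Dict[int, float], k: int = 5) -> float:
    # Gain of a recommended movie = its true test rating (0 if the user did not rate it).
    gains = [true_ratings.get(item, 0.0) for item in ranked]
    # Ideal DCG: the user's rated movies sorted from best to worst.
    ideal = sorted(true_ratings.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


truth = {1: 5.0, 2: 3.0}
print(ndcg_at_k([1, 2, 9], truth))            # ideal order: 1.0
print(round(ndcg_at_k([2, 1, 9], truth), 3))  # best movie at rank 2: 0.893
print(ndcg_at_k([9, 8, 7], truth))            # no rated movie recommended: 0.0
```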

RMSE measures the typical magnitude of the difference between predicted and actual ratings. The baseline's RMSE is 1.010, slightly above 1, so its predictions are off by roughly one rating point.
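A minimal sketch of RMSE over observed test ratings; the helper `rmse` is hypothetical, and we assume `rmse_score` ignores unrated cells of the test matrix. The second call shows that squaring makes RMSE sensitive to a few large errors:

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error over observed (true, predicted) rating pairs."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Every prediction is off by exactly one rating point, so RMSE is 1.0.
print(rmse([4, 3, 5, 1], [3, 4, 4, 2]))  # 1.0
# Same total absolute error (4 points) concentrated in one prediction: RMSE doubles.
print(rmse([4, 3, 5, 1], [4, 3, 5, 5]))  # 2.0
```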