# Recommender Systems

<small><i>Updated January 2022</i></small>

<div class="alert alert-info" style = "border-radius:10px;border-width:3px;border-color:darkblue;font-family:Verdana,sans-serif;font-size:16px;">
<h2>Outline</h2>
<ol>
    <li>What is a recommender system?</li>
    <li>How to build a recommender system? </li>
    <li>How to evaluate its success?</li>
</ol>
</div>

## Steps to build a recommender system:
<ol>
    <li>Data collection and understanding</li>
    <li>Data filtering/cleaning</li>
    <li>Learning<br>
        <span style="font-size:smaller">E.g., using item/user similarity function</span></li>
    <li>Evaluation</li>
</ol>

<hr/>

# Hands on
## Evaluating a RecSys: use case on our CF RecSys for MovieLens

We will again use the MovieLens dataset, which you should have downloaded to complete the first notebook.

Let us first load the libraries that we are going to need:

In [None]:
%autosave 150
%matplotlib inline
import pandas as pd
import numpy as np
import math
import copy
import random
import matplotlib.pylab as plt

And, next, the dataset:

In [None]:
# The dataset is composed of 3 main files

# The users file 
u_cols = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
users = pd.read_csv('ml-100k/u.user', sep='|', names=u_cols)

# The movies (items) file
m_cols = ['movie_id', 'title', 'release_date']
# It contains additional columns indicating, among other things, the movies' genres.
# Let's load only the first three columns:
movies = pd.read_csv('ml-100k/u.item', sep='|', names=m_cols, usecols=range(3), encoding='latin-1')

# The ratings file 
r_cols = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_csv('ml-100k/u.data', sep='\t', names=r_cols)


# We merge all three dataframes into a single dataset
data = pd.merge(pd.merge(ratings, users), movies)
# and keep only the columns that we are going to use
data = data[['user_id', 'rating', 'movie_id', 'title']]

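We will use a subset of just 100 users. The code below takes the users with `user_id` up to 100; the commented-out lines would instead select the 100 users with the largest number of ratings. We hold out roughly 20% of the ratings for evaluation purposes, and learn from the remaining 80%: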

In [None]:
np.random.seed(7) # for replicability

# We keep only the data of 100 users. Alternative (commented out): the 100 users
# with the largest number of ratings
#user_id_most_raters = data.groupby('user_id').size().sort_values(ascending=False).head(100).keys()
#data = data[data['user_id'].isin(user_id_most_raters)].copy()
data = data[data['user_id']<=100].copy() # get only the data of users with user_id up to 100
print('Dataset size:', data.shape)
print('Users:', data.user_id.nunique())
print('Films:',data.movie_id.nunique())

# Make train/test partition
msk = np.random.rand(len(data)) < 0.8
trData = data[msk]
tsData = data[~msk]

print("The training set has "+ str(trData.shape[0]) +" ratings, and the test set "+ str(tsData.shape[0]))

We use the similarity function defined in the second notebook. By default, we use Pearson similarity and require at least 5 items rated in common to compute it:

In [None]:
from scipy.stats import pearsonr
from scipy.spatial.distance import euclidean

# euclidean distance based similarity using scipy's euclidean definition
def euclideanSimilarity(v1, v2):
    return 1.0 / (1.0 + euclidean(v1,v2))

# wrapper for pearson correlation similarity which uses scipy's definition
def pearsonSimilarity(v1, v2):
    res = pearsonr(v1, v2)[0]
    if math.isnan(res) or res < 0:
        res = 0
    return res

# Returns a similarity score for two users
def similarityFunction(myData, user1, user2, similarity=pearsonSimilarity, minCommonItems=5):
    # Get movies rated by user1
    movies_user1 = myData[myData['user_id'] == user1]
    # Get movies rated by user2
    movies_user2 = myData[myData['user_id'] == user2]
    
    # Find commonly rated films
    rep = pd.merge(movies_user1, movies_user2, on='movie_id')    
    if len(rep) < minCommonItems:
        return 0   

    return similarity(rep['rating_x'], rep['rating_y'])

**Copy** into the following cell the CF class that you implemented in the second notebook:

In [None]:

### TODO: copy here your implementation of CollaborativeFiltering class from Notebook 2


Now, we train it with the training subset:

In [None]:
my_recsys = CollaborativeFiltering()
my_recsys.fit(trData)

In this notebook, we are interested in defining how we are going to evaluate the recommender systems that we build.

## Evaluation criteria: metrics

Performance evaluation of recommender systems is itself an entire research topic. Some commonly used metrics include:<br>
* $RMSE = \sqrt{(\frac{\sum(\hat{y}-y)^2}{n})}$
* Precision / Recall / F-scores
* ROC curves
* Cost curves

Let's implement the root mean square error (RMSE):

In [None]:
def rmse(y_pred, y_true):
    """ Compute Root Mean Squared Error. """
    return np.sqrt(np.mean(np.power(y_pred - y_true, 2)))

def evaluate(estimation_func, myTsData, metric=rmse):
    """ RMSE-based predictive performance evaluation with pandas. """
    
    # we keep the pairs user-movie for which we are going to obtain an estimation
    pairs_to_estimate = zip(myTsData.user_id, myTsData.movie_id)

    # we do obtain the estimations
    estimated_values = np.array([estimation_func(u,i) for (u,i) in pairs_to_estimate ])

    # finally, we compare the estimations and the real values with the chosen metric
    real_values = myTsData.rating.values
    return metric(estimated_values, real_values)
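
As a quick sanity check of the RMSE formula, here is a minimal sketch on made-up numbers. It is written in plain Python so that the arithmetic stays visible; `rmse_sketch`, `toy_pred` and `toy_true` are hypothetical names used only for this illustration and are not part of the notebook's code.

```python
import math

def rmse_sketch(y_pred, y_true):
    """Root mean squared error of two equal-length sequences (toy version)."""
    squared_errors = [(p - t) ** 2 for p, t in zip(y_pred, y_true)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up ratings: the errors are -1, 0 and 2, so the squared errors are 1, 0 and 4
toy_pred = [3.0, 4.0, 5.0]
toy_true = [4.0, 4.0, 3.0]
toy_rmse = rmse_sketch(toy_pred, toy_true)
print(round(toy_rmse, 4))  # sqrt(5/3), about 1.291
```

Note how the single error of 2 dominates the result: squaring penalizes large errors more heavily than small ones.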

Now we calculate the RMSE of this estimation procedure within the test data (this might take a while!):

In [None]:
print('RMSE for Collaborative Recommender: %s' % evaluate(my_recsys.predict, tsData))

Thus, we obtain a performance measure for our RecSys that we can use for evaluation or model selection. In this context, RMSE and MAE (mean absolute error) are the most popular metrics. 

<div class="alert alert-success">
Question #1.-<br>
<span style="color:black">Implement MAE.
</span></div>

In [None]:
# Question 1
def mae(y_pred, y_true):
        print("mae: not implemented yet")


In [None]:
print('MAE for Collaborative Recommender: %s' % evaluate(my_recsys.predict, tsData, metric=mae))

However, the main criticism of these metrics is that they really don't measure user experience. 
As users are commonly presented with a set of `N` recommendations to choose from, evaluating the top-N recommendations is probably closer to what matters to the final user.
<br>

<div class="alert alert-success">
Question #2.-<br>
<span style="color:black">Create a new method in the recommender class that returns the top-N recommendations for a user, where N is a parameter of the method. Recommendations need to be movies unseen by the user.</span></div>

In [None]:
# Question 2
import operator

def get_top_N_recommendations(self, user_id, N=10):
    print("get_top_N_recommendations: not implemented yet")


# We add this function to the CF class: it returns the N items with largest 
# estimated rating as recommendations
CollaborativeFiltering.get_top_N_recommendations = get_top_N_recommendations

And, now, we can obtain the set of movies recommended to user 'user_id':

In [None]:
my_recsys.get_top_N_recommendations(user_id=1,N=20)
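
The idea behind a top-N recommendation can be illustrated independently of the `CollaborativeFiltering` class. The following is only a sketch on a made-up dictionary of predicted ratings; `top_n_unseen`, `toy_predictions` and `toy_seen` are hypothetical names, and a real implementation would obtain the scores from the recommender's own prediction method.

```python
def top_n_unseen(predicted_ratings, seen_items, n=10):
    """Return the n unseen items with the highest predicted rating."""
    # discard the items that the user has already rated
    candidates = [(item, score) for item, score in predicted_ratings.items()
                  if item not in seen_items]
    # sort by predicted rating, best first
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:n]]

# Made-up predicted ratings for five items; the user has already seen 'C'
toy_predictions = {'A': 4.5, 'B': 3.0, 'C': 4.9, 'D': 2.1, 'E': 4.0}
toy_seen = {'C'}
toy_top = top_n_unseen(toy_predictions, toy_seen, n=3)
print(toy_top)  # ['A', 'E', 'B']
```

Although 'C' has the highest predicted rating, it is excluded because recommendations must be unseen items.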

How can we evaluate this type of recommendation? 

### Mean Average Precision at N
In machine learning, we define precision as:
$$ P=\frac{TP}{FP+TP}$$
that is, the proportion of truly relevant items among all the recommended items.

Precision at K is defined as the precision calculated by considering only the first K recommendations:
$$ P@K=\frac{TP@K}{K}$$
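
To make the definition concrete, here is a minimal sketch that computes P@K on a made-up ranking. The function `precision_at_k` and the toy lists are assumptions introduced only for this illustration.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommendations that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Made-up ranking (best first) and set of relevant items
toy_recommended = [10, 20, 30, 40, 50]
toy_relevant = {20, 50, 70}

for k in (1, 2, 5):
    print(k, precision_at_k(toy_recommended, toy_relevant, k))
# 1 0.0
# 2 0.5
# 5 0.4
```

The value changes considerably with `K`, which is why a single cut-off can be misleading.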

But which `K` value is appropriate? To sidestep this choice, we use the average precision, which aggregates over the different values of `K` as follows:

$$AP@N = \frac{1}{\min(N,m)}\sum_{K=1}^N P@K \cdot rel(u,K)$$

where `m` is the number of truly relevant elements, and $rel(u,K)$ is an indicator function that equals 1 if the `K`-th recommended element is relevant to user `u` and 0 otherwise.
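
As a small worked example under assumed toy values (not taken from the dataset): suppose that $N=5$, that the user has $m=3$ relevant movies, and that only the recommendations at ranks 2 and 5 are relevant. Then $P@2 = 1/2$ and $P@5 = 2/5$, and all the other terms of the sum vanish:

$$AP@5 = \frac{1}{\min(5,3)}\left(P@2 + P@5\right) = \frac{1}{3}\left(\frac{1}{2}+\frac{2}{5}\right) = 0.3$$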

<div class="alert alert-success">
Question #3.-<br>
<span style="color:black">Why do we normalize the Average Precision with $\min(m,N)$?</span></div>

[[ Your answer here! ]]

<div class="alert alert-success">
Question #4.-<br>
<span style="color:black">Implement AP@N.</span></div>

In [None]:
# Question 4
def APatN(l_pred, l_real, N):
    '''Calculate Average Precision at N. Assumes that there are no repeated elements in l_pred
       and that l_pred is sorted by decreasing estimated rating. '''
    AP = 0.0
    TP = 0.0

    # only the first N recommendations count towards AP@N
    for i,item in enumerate(l_pred[:N]):
        if item in l_real:
            TP += 1
            ### TODO: compute Precision@K and add it to AP

    return AP / min(len(l_real), N)

Now, we can test the Average Precision at N of the recommendations that we make to a user:

In [None]:
u=7
N=10
l_pred = my_recsys.get_top_N_recommendations(user_id=u,N=N)

userLikedMovies = tsData[ tsData['user_id'] == u ] 
l_real = list(userLikedMovies.movie_id[ userLikedMovies['rating'] > 3 ]) # we assume like if rating > 3

print('AP@N (with N=',N,') for user',u,': %s' % APatN(l_pred, l_real, N))

Note that, so far, we have only considered the recommendations made to a single user. To average over multiple users, we use the Mean Average Precision:
$$MAP@N=\frac{1}{|U|}\sum_{u=1}^{|U|}AP_u@N=\frac{1}{|U|}\sum_{u=1}^{|U|}\frac{1}{\min(N,m_u)}\sum_{K=1}^{N}P_u@K\cdot rel(u,K)$$
where $m_u$ is the number of relevant elements for user $u$.

<div class="alert alert-success">
Question #5.-<br>
<span style="color:black">Implement MAP@N.</span></div>

In [None]:
# Question 5
def MAPatN(lists_pred, lists_real, N):
    print("MAPatN: not implemented yet")


def evaluate_topN(estimation_func, myTsData, metric=MAPatN, N=10):
    '''Performance evaluation of Top-N-based recommendations.'''

    dfPosScores = myTsData[ myTsData['rating'] > 3 ] # we assume like if rating > 3
    # liked movies per user; note that groupby sorts the result by user_id
    real_by_user = dfPosScores.groupby('user_id')['movie_id'].apply(list)

    # take the users from the same index, so that both lists are aligned
    users_in_tspos = list(real_by_user.index)
    lists_real = list(real_by_user)

    # we obtain the recommendations for all the users
    lists_recommendations = [ estimation_func(u,N) for u in users_in_tspos ]

    return metric(lists_recommendations, lists_real, N)

Now, we can test the CF RecSys when it provides Top-N recommendations (we keep only a subsample of the test set; otherwise, this might take too long!):

In [None]:
N = 10
# select a subset of users
users = list(tsData['user_id'].unique())
random.shuffle(users)
users = users[:int(len(users)*0.1)]
# select the data of that subsample of users
tsSubData = tsData[tsData['user_id'].isin(users)]

print('MAP@N (with N=',N,') for Collaborative Recommender: %s' % evaluate_topN(my_recsys.get_top_N_recommendations, 
                                                                               tsSubData,
                                                                               N=N))