In [None]:
# Assignment 4

Welcome to the last assignment! Here you will evaluate the recommenders you developed in the previous assignments. We will use part of the MovieLens 20M dataset.

You will write and execute your code in Python using this Jupyter Notebook.

**TASK:** Your job is to: (1) Copy your code from previous assignments into the correct place in the Python files `Recommender_CB.py`, `Recommender_CF_UU.py`, and `Recommender_MF.py`.
(2) Fill in the missing code in this notebook.
In both cases, the place to enter your code is clearly marked with comments.

**SUBMISSION:** You will submit this Notebook via the Interface of JupyterHub.

- Submissions are possible until **25.04.2023 23:59 CEST**.
- Do **NOT** rename the file; it must be named "assignment4.ipynb" (this is relevant in case you run the Jupyter Notebook offline).
- Please **save** ("File" -> "Save and Checkpoint") and **close** your Jupyter Notebook ("File" -> "Close and Halt") before you hand in your solution.
- We will use our own Python files to validate whether your implementation in the `assignment4.ipynb` file is correct. You may submit your Python files or delete them before you submit. If you delete them, validation will report an error because the imports fail; in that case, just ignore the error message.
- Please note that the validation function only checks if there are syntax errors in your implementation. There is also a time-based constraint set to 30 seconds, which is too short in many cases. The resulting timeout error has nothing to do with the automatic evaluation and can be ignored.

**PLEASE NOTE THE FOLLOWING:**
- Fill in only the missing code. The place to enter your code is clearly marked with comments.
- Do not change the locked cells.
- Check that there are no Python syntax errors.
- Check that your code runs and that it solves the required tasks.
- Check that you have not created any infinite loops.
- A line/cell must not take longer than 10 min; otherwise it will be marked as faulty during automatic evaluation.

**GRADING:** We will test whether your code produces the expected output. Hidden tests will therefore compare the results of the standard solution with yours, using multiple randomly selected inputs from the whole dataset; your results must match to within two decimal places. Note that the visible test cells are only an indicator of the correctness of your solution. They **do not** guarantee that your solution is correct.

Late submissions are not possible. We will automatically collect all submissions at the end of the deadline!

We reserve the right to carry out automatic plagiarism checks. Please do not exploit the submission system. We will look at all submissions, and any submission that does so will be scored 0 points.

## Preparation
Importing necessary modules.

In [None]:
%load_ext autoreload
%autoreload 2

import csv
import pandas as pd
import numpy as np

#if you wish to disable warnings, uncomment the following two lines
#import warnings
#warnings.filterwarnings("ignore")

In [None]:
np.set_printoptions(threshold=500, precision=4)
pd.options.display.max_seq_items = 100
%precision 4

Make sure to enter the correct location of your data.

In [None]:
data_directory = '~/shared/data/assignment3/'

## Create the movies DataFrame

In [None]:
links = pd.read_csv(data_directory + 'links.csv')
movies_plain = pd.read_csv(data_directory + 'movies.csv')
metadata = pd.read_csv(data_directory + 'movies_metadata.csv', low_memory=False)
metadata.drop(metadata.columns[[0,1,2,4,6,7,8,10,11,12,13,14,15,16,17,18,19,20,21,22,23]], axis=1, inplace=True)
keywords = pd.read_csv(data_directory + 'keywords.csv', low_memory=False)
credits = pd.read_csv(data_directory + 'credits.csv', low_memory=False)

keywords['id'] = keywords['id'].astype('int')
links=links[links['tmdbId'].isnull()==False]
links['tmdbId'] = links['tmdbId'].astype('int')
metadata = metadata.drop([19730, 29503, 35587])
metadata['id'] = metadata['id'].astype('int')
credits['id'] = credits['id'].astype('int')

movies = metadata.merge(links, how='inner', left_on='id', right_on='tmdbId')
movies = movies.merge(movies_plain, how='inner', left_on='movieId', right_on='movieId')
movies = movies.merge(keywords, how='inner', left_on='id', right_on='id')
movies = movies.merge(credits, how='inner', left_on='id', right_on='id')
movies = movies.drop(columns=['tmdbId','genres_y'])
movies.rename(columns={'genres_x': 'genres'}, inplace=True)

movies=movies[movies['overview'].isnull()==False]

movies = movies[movies['movieId'] < 1000]

from ast import literal_eval

features = ['cast', 'crew', 'keywords', 'genres']
for feature in features:
    movies[feature] = movies[feature].apply(literal_eval)
    

# Get the director's name from the crew feature. If director is not listed, return NaN
def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan

# Returns the first 3 elements of the list, or the entire list if it has fewer than 3.
def get_list(x):
    if isinstance(x, list):
        names = [i['name'] for i in x]
        #Check if more than 3 elements exist. If yes, return only first three. If no, return entire list.
        if len(names) > 3:
            names = names[:3]
        return names

    #Return empty list in case of missing/malformed data
    return []

# Define new director, cast, genres and keywords features that are in a suitable form.
movies['director'] = movies['crew'].apply(get_director)

features = ['cast', 'keywords', 'genres']
for feature in features:
    movies[feature] = movies[feature].apply(get_list)

    
# Function to convert all strings to lower case and strip names of spaces
def clean_data(x):
    if isinstance(x, list):
        return [str.lower(i.replace(" ", "")) for i in x]
    else:
        # Check if string exists. If not, return empty string
        if isinstance(x, str):
            return str.lower(x.replace(" ", ""))
        else:
            return ''

# Apply clean_data function to your features.
features = ['cast', 'keywords', 'director', 'genres']

for feature in features:
    movies[feature] = movies[feature].apply(clean_data)

    
# Drop duplicate movies   
import collections
movie_ids = movies['movieId'].tolist()
movie_ids_dup = [x for  x, y in collections.Counter(movie_ids).items() if y > 1]

for movie_id in movie_ids_dup:
    to_drop = movies.index[movies.movieId == movie_id].tolist()[1:]
    movies.drop(to_drop, inplace=True)

movies.drop(columns='crew', inplace=True)


movies.rename(columns={'overview':'plot'}, inplace=True)

def create_metadata(x):
        return ' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' ' + x['director'] + ' ' + ' '.join(x['genres'])  

# Create a new metadata feature
movies['metadata'] = movies.apply(create_metadata, axis=1)

display(movies.head())

## Create the ratings DataFrame

In [None]:
ratings = pd.read_csv(data_directory + 'ratings.csv')
ratings = ratings.drop(columns=['timestamp'])
ratings = ratings[(ratings['userId'] < 1000) & (ratings['movieId'] < 100) ]

ratings = ratings[ratings['movieId'].isin(movies['movieId'])]

## keep users with at least 2 ratings
ratings_count = ratings.groupby(['userId', 'movieId']).size().groupby('userId').size()
ratings_ok = ratings_count[ratings_count >= 2].reset_index()[['userId']]
ratings = ratings.merge(ratings_ok, 
               how = 'right',
               left_on = 'userId',
               right_on = 'userId')


ratings.columns = ['user', 'item', 'rating']

display(ratings.head())

## Split ratings into train and test subsets

In [None]:
from sklearn.model_selection import train_test_split


ratings_train, ratings_test = train_test_split(ratings,
                                               stratify=ratings['user'],
                                               test_size=0.20,
                                               random_state=42)



### keep only users who have at least one positive (>3) rating in train
positive_ratings = ratings_train[ratings_train['rating']>3]
positive_userIds = positive_ratings['user'].unique()

ratings_train = ratings_train[ratings_train['user'].isin(positive_userIds)]
ratings_test = ratings_test[ratings_test['user'].isin(positive_userIds)]


### keep in test only ratings for movies which appear in train
ratings_test = ratings_test[ratings_test['item'].isin(ratings_train['item'])]

item_ids = ratings_train['item'].unique()
item_ids.sort()
print(f'{len(item_ids)} items overall')

print(len(ratings_train['user'].unique()), 'users,', len(ratings_train['item'].unique()), 'items,', len(ratings_train.index), 'ratings in train set')
print(len(ratings_test['user'].unique()), 'users,', len(ratings_test['item'].unique()), 'items,', len(ratings_test.index), 'ratings in test set')

In [None]:
## trim movies dataframe to contain only movies in item_ids

movies[movies['movieId'].isin(item_ids)]

movies.rename(columns={'movieId': 'item_id'}, inplace=True)

## Import the recommenders --- TO EDIT EXTERNALLY

You will implement the three recommenders from the previous assignments as classes in separate files. 

Fill in the missing code in the files `Recommender_CB.py`, `Recommender_CF_UU.py`, and `Recommender_MF.py`. For the most part, you only have to copy over the code you wrote for the previous assignments.

IMPORTANT: Because the recommenders are now implemented as separate classes, you may have to prefix non-local variables with `self.` so that they are visible.

In general, the idea is to implement a simple recommender API:
```
class Recommender:  
    def __init__(self):
        pass
    
    def build_model(self, ratings_train, movies):
        pass
    
    def recommend(self, user_id, item_ids=None, topN=20):
        pass
```
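
As an illustration of this API, here is a minimal sketch of a toy recommender that ranks items by their mean training rating. The class name `Recommender_Popularity` and the toy data are hypothetical and not part of the assignment; the sketch only assumes that `ratings_train` has the columns `user`, `item`, and `rating`, as in this notebook. Note how the state computed in `build_model` is stored on `self.` so that `recommend` can see it.

```python
import pandas as pd


class Recommender_Popularity:
    """Hypothetical toy recommender: ranks items by mean training rating."""

    def __init__(self):
        self.item_scores = None  # filled in by build_model

    def build_model(self, ratings_train, movies=None):
        # Store the model state on `self.` so that recommend() can see it.
        self.item_scores = ratings_train.groupby('item')['rating'].mean()

    def recommend(self, user_id, item_ids=None, topN=20):
        # Rank all known items unless a candidate list is given.
        if item_ids is None:
            item_ids = self.item_scores.index.tolist()
        # Candidates without any training rating get a score of 0.
        scores = self.item_scores.reindex(item_ids, fill_value=0.0)
        ranked = scores.sort_values(ascending=False, kind='stable')
        return ranked.index.tolist()[:topN]


toy_ratings = pd.DataFrame({'user': [1, 1, 2, 2, 3],
                            'item': [10, 20, 10, 30, 20],
                            'rating': [5.0, 3.0, 4.0, 2.0, 4.0]})
toy = Recommender_Popularity()
toy.build_model(toy_ratings)
toy_ranking = toy.recommend(user_id=1, item_ids=[30, 20, 10, 40], topN=3)
print(toy_ranking)
```

This toy ignores `user_id` entirely, so it is not personalized; your three recommenders should use it to score the candidate items for the given user.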

Once done, we can import these classes.

In [None]:
from Recommender_CB import Recommender_CB
from Recommender_CF_UU import Recommender_CF_UU
from Recommender_MF import Recommender_MF

## Testing the recommenders

Make sure you have correctly copied the code from previous assignments to the right place.

In [None]:
cbr = Recommender_CB('plot')
cbr.build_model(ratings_train, movies)

print(cbr.recommend(10))
print(cbr.recommend(100))
print(cbr.recommend(200))

**EXPECTED OUTPUT:**
```
[16, 36, 21, 8, 94, 69, 84, 76, 18, 60, 52, 35, 89, 57, 23, 58, 34, 55, 83, 2]
[16, 36, 8, 21, 94, 69, 76, 35, 89, 52, 23, 84, 99, X, X, X, X, X, X, X]
[92, 71, 50, 37, 65, 29, 61, 79, 39, 47, 2, 84, 88, 73, 21, 86, 51, 7, 13, 67]
```
where `X` means any item id, as from this point on all remaining items have zero similarity with the user profile.

In [None]:
uucf = Recommender_CF_UU()
uucf.build_model(ratings_train)

print(uucf.recommend(10))
print(uucf.recommend(100))
print(uucf.recommend(200))

**EXPECTED OUTPUT:**
```
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
[38, 65, 35, 67, 73, 12, 37, 63, 82, 54, 50, 66, 60, 34, 47, 17, 18, 97, 4, 45]
[67, 90, 97, 50, 73, 94, 16, 47, 58, 55, 26, 78, 99, 82, 57, 85, 61, 11, 25, 84]
```

In [None]:
mfr = Recommender_MF()
mfr.build_model(ratings_train)

print(mfr.recommend(10))
print(mfr.recommend(100))
print(mfr.recommend(200))

**EXPECTED OUTPUT:**
```
[50, 47, 17, 6, 32, 36, 16, 62, 58, 73, 29, 72, 11, 82, 84, 41, 96, 8, 7, 92]
[50, 47, 1, 17, 6, 36, 16, 58, 62, 73, 72, 11, 29, 41, 21, 92, 82, 84, 42, 74]
[50, 47, 1, 6, 25, 36, 58, 16, 72, 29, 62, 11, 73, 82, 42, 92, 84, 34, 21, 41]
```

## Implement the Evaluator --- TO EDIT

The following class evaluates the ranking produced by a recommender.

You need to edit the following functions:
- `get_ground_truth` in 1 place
- `get_ranking_metrics` in 3 places


**NOTE:** You should implement the following variant of DCG (there is a slight difference from the course slides):

$$\text{DCG}@k = \sum_{i=1}^k \frac{2^{rel_i}-1}{\log(i+1)}$$

where $rel_i$ is the relevance score of the item at position $i$ in the ranking. The definition of IDCG, and thus NDCG, is the same.
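
As a minimal sketch of this formula, the following computes DCG, IDCG, and NDCG at every cut-off for a toy list of relevance scores. The helper name `dcg_at_every_k` and the toy scores are hypothetical, and the natural logarithm is assumed (the base cancels out in NDCG anyway).

```python
import math


def dcg_at_every_k(rels):
    """Cumulative DCG@k for k = 1..len(rels), using (2^rel - 1) / log(i + 1)."""
    total, out = 0.0, []
    for i, rel in enumerate(rels, start=1):
        total += (2 ** rel - 1) / math.log(i + 1)
        out.append(total)
    return out


# Relevance of the items at positions 1..4 of a toy ranking (0 = no test rating).
toy_rels = [4, 5, 0, 3]
toy_dcg = dcg_at_every_k(toy_rels)
# The ideal ranking lists the same relevance scores in descending order.
toy_idcg = dcg_at_every_k(sorted(toy_rels, reverse=True))
toy_ndcg = [d / i for d, i in zip(toy_dcg, toy_idcg)]
print([round(x, 4) for x in toy_ndcg])
```

In the Evaluator, the ideal ranking is built from the ground-truth ratings rather than from the ranked list, so the two can differ in length; that is why the class pads or truncates the IDCG list before dividing.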

In [None]:
import math
import random
import warnings


DEBUG = True

if DEBUG:
    random.seed(42)

class Evaluator:
    
    def __init__(self, topN=20):
        self.topN = topN

    
    def init_data(self, ratings_train, ratings_test):
        self.ratings_train = ratings_train
        self.ratings_test = ratings_test
        self.find_unrated_items()
    
    
    ### store for each user her/his unrated items
    def find_unrated_items(self):
        all_items = set(self.ratings_train['item'].tolist())
        
        self.unrated = {}
        
        for user_id in self.ratings_train['user'].unique():
            rated_train_items = self.ratings_train[self.ratings_train['user'] == user_id]['item'].tolist()
            rated_test_items = self.ratings_test[self.ratings_test['user'] == user_id]['item'].tolist()

            rated_items = set(rated_train_items) | set(rated_test_items) # union of sets
            unrated_items = list(all_items - rated_items)
            random.shuffle(unrated_items)
            
            self.unrated[user_id] = unrated_items
            
    
    ### get the ratings of user_id in ratings_test
    def get_ground_truth(self, user_id):
        
        ## get the test ratings of user_id as a DataFrame subset of `self.ratings_test`
        # YOUR CODE HERE
        user_ratings = self.ratings_test.loc[self.ratings_test["user"]==user_id]
        
        ## dictionary of ground truth ratings
        ground_truth = pd.Series(user_ratings['rating'].values, index=user_ratings['item']).to_dict()

        return ground_truth
    
    
    def get_recommendations(self, model, user_id):
        ground_truth = self.get_ground_truth(user_id)
        n_test = len(ground_truth)
        
        ## we will create a total of topN items, and ask the recommender to rank them
        ## among these items, we will include the ground truth items
        
        ## 1. select (topN - n_test) unrated items
        item_ids = self.unrated[user_id][:self.topN - n_test]
        
        ## 2. add ground truth items
        item_ids = item_ids + list(ground_truth.keys())
        
        ## get the model's ranking
        recommendations = model.recommend(user_id, item_ids, self.topN)
        return recommendations
    
    
    ### evaluate the model on given user_id
    def eval_model_on_user(self, model, user_id, verbose=True):
        ground_truth = self.get_ground_truth(user_id)
        if verbose:
            print('ground truth', ground_truth)
        n_test = len(ground_truth)
        
        ## we will create a total of topN items, and ask the recommender to rank them
        ## among these items, we will include the ground truth items
        
        ## 1. select (topN - n_test) unrated items
        item_ids = self.unrated[user_id][:self.topN - n_test]
        
        ## 2. add ground truth items
        item_ids = item_ids + list(ground_truth.keys())
        
        ## get the model's ranking
        recommendations = model.recommend(user_id, item_ids, self.topN)
        if verbose:
            print('recommendations', recommendations)
        
        ## evaluate the ranking
        metrics = self.get_ranking_metrics(ground_truth, recommendations)
        
        return metrics
    
    
    ### evaluate the model on all users
    def eval_model(self, model, n_users=-1):
        metrics_all = []
        count = 0
        for user_id in self.ratings_train['user'].unique():
            count+=1
            print("\r", "evaluated on ", count, " users", end="", sep="")
            metrics = self.eval_model_on_user(model, user_id, verbose=False)
            if metrics is None:
                continue
            metrics_all.append(metrics)
            if count == n_users:
                break
        
        print("\n")
        
        
        ## store all metrics in a DataFrame for easy manipulation
        metrics_all_df = pd.DataFrame(metrics_all)
        self.metrics_all_df = metrics_all_df        
        
        ## average over all metrics
        hits_array = metrics_all_df.hits
        hits = np.nanmean(hits_array)
        ap_array = metrics_all_df.ap
        ap = np.nanmean(ap_array)
        
        rec_array = np.vstack(metrics_all_df.rec)
        prec_array = np.vstack(metrics_all_df.prec)
        ndcg_array = np.vstack(metrics_all_df.ndcg)
        
        
        with warnings.catch_warnings(): ## ignore division by 0
            warnings.simplefilter("ignore", category=RuntimeWarning)
            rec = np.nanmean(rec_array, axis=0)
            prec = np.nanmean(prec_array, axis=0)
            ndcg = np.nanmean(ndcg_array, axis=0)
        
        
        metrics_avg = {'hits':hits,
                   'ap':ap,
                   'rec':np.array(rec),
                   'prec':np.array(prec),
                   'ndcg':np.array(ndcg)}
        
        return metrics_avg
        
    ### get some evaluation metrics for ranking with respect to ground_truth
    def get_ranking_metrics(self, ground_truth, ranking):
        n_test = len(ground_truth)
        if n_test == 0:
            return None
        
        hits = 0 ## number of relevant in ranking
        rec = [] ## recall at every position of ranking
        prec = [] ## precision at every position of ranking
        dcg = [] ## DCG at every position of ranking
        ap = 0 ## average precision
        
                
        ## scan the ranking and compute hits, rec, prec, dcg, ap
        # YOUR CODE HERE
        dcg_val = 0.0
        for i, item in enumerate(ranking):
            ## relevance of the item at position i+1; items without a test rating count as 0
            rel = ground_truth.get(item, 0)
            if rel > 0:
                hits += 1
                ap += hits / (i + 1) ## precision at the position of a relevant item
            dcg_val += (2**rel - 1) / math.log(i + 2)
            dcg.append(dcg_val)
            prec.append(hits / (i + 1))
            rec.append(hits / n_test) ## recall is relative to all test items
        
        ## average the accumulated precision over the retrieved relevant items
        if hits != 0:
            ap /= hits
        
        ## construct the ideal ranking from ground truth to compute idcg
        ideal = sorted(ground_truth, key=ground_truth.get, reverse=True)
        idcg = []
        
        ## scan the ideal ranking and compute idcg
        # YOUR CODE HERE
        idcg_val = 0.0
        for i, item in enumerate(ideal):
            idcg_val += (2**ground_truth[item] - 1) / math.log(i + 2)
            idcg.append(idcg_val)
        
        
        ## make sure the dcg and idcg lists have the same length
        if len(ideal) >= len(ranking):
            idcg = idcg[:len(ranking)]
        else:
            last_idcg = idcg[-1]
            for i in range(len(ranking) - len(ideal)):
                idcg.append(last_idcg)
        
        ## compute NDCG = DCG/IDCG
        ## TIP convert lists to `np.array` to do the division and then back to a list with `.tolist()`
        # YOUR CODE HERE
        ndcg = []
        for i in range(len(dcg)):
            ndcg.append(dcg[i]/idcg[i])
        
        rec = np.array(rec)
        prec = np.array(prec)
        ndcg = np.array(ndcg)
        
        ## make them have length self.topN, fill in with nan 
        rec = np.append(rec, np.repeat(np.nan, self.topN - len(rec)))
        prec = np.append(prec, np.repeat(np.nan, self.topN - len(prec)))
        ndcg = np.append(ndcg, np.repeat(np.nan, self.topN - len(ndcg)))
        
        metrics = {'hits':hits,
                   'ap':ap,
                   'rec':np.array(rec),
                   'prec':np.array(prec),
                   'ndcg':np.array(ndcg)}
        
        return metrics
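
The candidate list that `get_recommendations` and `eval_model_on_user` hand to the recommender can be sketched in isolation as follows. The helper `build_candidates` and the toy item ids are hypothetical; the sketch assumes that the user has at most `topN` test items.

```python
def build_candidates(unrated_items, ground_truth, topN):
    """Hypothetical helper mirroring the candidate construction in the Evaluator."""
    n_test = len(ground_truth)
    # 1. take (topN - n_test) items the user has not rated
    candidates = unrated_items[:topN - n_test]
    # 2. append the user's test items
    return candidates + list(ground_truth.keys())


toy_candidates = build_candidates([7, 3, 9, 5, 8], {11: 4.0, 2: 5.0}, topN=5)
print(toy_candidates)
```

The recommender then only has to order these `topN` candidates, so every test item appears somewhere in the ranking and recall at the last position is always 1.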

## Test the ranking metrics

In [None]:
evl = Evaluator(topN = 10)

ground_truth = {200:5, 100:4, 400:3, 1000:3}
ranking = list(range(100, 1100, 100))
display(evl.get_ranking_metrics(ground_truth, ranking))

ground_truth = {100:4, 200:3, 400:3, 1000:3}
ranking = list(range(100, 1100, 100))
display(evl.get_ranking_metrics(ground_truth, ranking))

ground_truth = {100:1, 200:1, 300:1, 400:1}
ranking = list(range(100, 1100, 100))
display(evl.get_ranking_metrics(ground_truth, ranking))

ground_truth = {100:1, 200:1, 300:1, 400:1}
ranking = list(range(100, 1100, 100))
display(evl.get_ranking_metrics(ground_truth, ranking))

ground_truth = {200:5, 100:4, 400:3, 1000:3}
ranking = list(range(1000, 0, -100))
display(evl.get_ranking_metrics(ground_truth, ranking))

ground_truth = {100:4, 200:3, 400:3, 1000:3}
ranking = list(range(1000, 0, -100))
display(evl.get_ranking_metrics(ground_truth, ranking))



evl = Evaluator(topN = 5)

ground_truth = {200:5, 100:4, 400:3, 1000:3}
ranking = list(range(100, 1100, 100))[:5]
display(evl.get_ranking_metrics(ground_truth, ranking))

ground_truth = {100:4, 200:3, 400:3, 1000:3}
ranking = list(range(100, 1100, 100))[:5]
display(evl.get_ranking_metrics(ground_truth, ranking))

ground_truth = {100:1, 200:1, 300:1, 400:1}
ranking = list(range(100, 1100, 100))[:5]
display(evl.get_ranking_metrics(ground_truth, ranking))

ground_truth = {100:1, 200:1, 300:1, 400:1}
ranking = list(range(100, 1100, 100))[:5]
display(evl.get_ranking_metrics(ground_truth, ranking))

**EXPECTED OUTPUT:**
```
{'hits': 4,
 'ap': 0.7875,
 'rec': array([0.25, 0.5 , 0.5 , 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 1.  ]),
 'prec': array([1.    , 1.    , 0.6667, 0.75  , 0.6   , 0.5   , 0.4286, 0.375 ,
        0.3333, 0.4   ]),
 'ndcg': array([0.4839, 0.8541, 0.7861, 0.7998, 0.7998, 0.7998, 0.7998, 0.7998,
        0.7998, 0.8429])}
{'hits': 4,
 'ap': 0.7875,
 'rec': array([0.25, 0.5 , 0.5 , 0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 1.  ]),
 'prec': array([1.    , 1.    , 0.6667, 0.75  , 0.6   , 0.5   , 0.4286, 0.375 ,
        0.3333, 0.4   ]),
 'ndcg': array([1.    , 1.    , 0.8473, 0.865 , 0.865 , 0.865 , 0.865 , 0.865 ,
        0.865 , 0.9431])}
{'hits': 4,
 'ap': 1.0000,
 'rec': array([0.25, 0.5 , 0.75, 1.  , 1.  , 1.  , 1.  , 1.  , 1.  , 1.  ]),
 'prec': array([1.    , 1.    , 1.    , 1.    , 0.8   , 0.6667, 0.5714, 0.5   ,
        0.4444, 0.4   ]),
 'ndcg': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])}
{'hits': 4,
 'ap': 1.0000,
 'rec': array([0.25, 0.5 , 0.75, 1.  , 1.  , 1.  , 1.  , 1.  , 1.  , 1.  ]),
 'prec': array([1.    , 1.    , 1.    , 1.    , 0.8   , 0.6667, 0.5714, 0.5   ,
        0.4444, 0.4   ]),
 'ndcg': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])}
{'hits': 4,
 'ap': 0.5048,
 'rec': array([0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.5 , 0.5 , 0.75, 1.  ]),
 'prec': array([1.    , 0.5   , 0.3333, 0.25  , 0.2   , 0.1667, 0.2857, 0.25  ,
        0.3333, 0.4   ]),
 'ndcg': array([0.2258, 0.173 , 0.1592, 0.149 , 0.149 , 0.149 , 0.1987, 0.1987,
        0.3973, 0.4896])}
{'hits': 4,
 'ap': 0.5048,
 'rec': array([0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.5 , 0.5 , 0.75, 1.  ]),
 'prec': array([1.    , 0.5   , 0.3333, 0.25  , 0.2   , 0.1667, 0.2857, 0.25  ,
        0.3333, 0.4   ]),
 'ndcg': array([0.4667, 0.3605, 0.3055, 0.2699, 0.2699, 0.2699, 0.3599, 0.3599,
        0.4412, 0.6084])}
{'hits': 3,
 'ap': 0.9167,
 'rec': array([0.25, 0.5 , 0.5 , 0.75, 0.75]),
 'prec': array([1.    , 1.    , 0.6667, 0.75  , 0.6   ]),
 'ndcg': array([0.4839, 0.8541, 0.7861, 0.7998, 0.7998])}
{'hits': 3,
 'ap': 0.9167,
 'rec': array([0.25, 0.5 , 0.5 , 0.75, 0.75]),
 'prec': array([1.    , 1.    , 0.6667, 0.75  , 0.6   ]),
 'ndcg': array([1.    , 1.    , 0.8473, 0.865 , 0.865 ])}
{'hits': 4,
 'ap': 1.0000,
 'rec': array([0.25, 0.5 , 0.75, 1.  , 1.  ]),
 'prec': array([1. , 1. , 1. , 1. , 0.8]),
 'ndcg': array([1., 1., 1., 1., 1.])}
{'hits': 4,
 'ap': 1.0000,
 'rec': array([0.25, 0.5 , 0.75, 1.  , 1.  ]),
 'prec': array([1. , 1. , 1. , 1. , 0.8]),
 'ndcg': array([1., 1., 1., 1., 1.])}
```
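
To see where the first expected result comes from, the following sketch recomputes precision, recall, and AP by hand from binary relevance flags. The variable names are illustrative only; the flags encode that the items at positions 1, 2, 4, and 10 of the ranking are in the ground truth.

```python
import numpy as np

# 1 where the ranked item is in the ground truth, 0 otherwise
# (ranking 100, 200, ..., 1000 against ground-truth items 200, 100, 400, 1000).
relevant = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 1])
n_test = 4  # number of ground-truth items

cum_hits = np.cumsum(relevant)
positions = np.arange(1, len(relevant) + 1)

prec_by_hand = cum_hits / positions  # precision at every position
rec_by_hand = cum_hits / n_test      # recall at every position
# AP averages the precision at the positions of the relevant items.
ap_by_hand = prec_by_hand[relevant == 1].mean()

print(np.round(prec_by_hand, 4))
print(rec_by_hand)
print(round(float(ap_by_hand), 4))
```

Note that recall divides by the number of ground-truth items, whereas AP divides by the number of relevant items actually found in the ranking; the two differ in the `topN = 5` cases above, where only 3 of the 4 test items are ranked.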

## Test the Evaluator

In [None]:
evl = Evaluator(topN = 20)

evl.init_data(ratings_train, ratings_test)

Show the ground truth for users 10, 100, 200:

In [None]:
display(evl.get_ground_truth(10))
display(evl.get_ground_truth(100))
display(evl.get_ground_truth(200))

Evaluate UU-CF on users 10, 100, 200:

In [None]:
display(evl.eval_model_on_user(uucf, 10))
display(evl.eval_model_on_user(uucf, 100))
display(evl.eval_model_on_user(uucf, 200))

**EXPECTED OUTPUT:**
```
ground truth {11: 4.0}
recommendations [83, 37, 16, 38, 34, 14, 26, 78, 3, 86, 77, 31, 67, 79, 4, 49, 68, 94, 20, 11]
{'hits': 1,
 'ap': 0.0500,
 'rec': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 1.]),
 'prec': array([0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ,
        0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.05]),
 'ndcg': array([0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    , 0.    , 0.2277])}
ground truth {50: 5.0}
recommendations [38, 65, 67, 50, 28, 6, 96, 94, 10, 95, 24, 7, 31, 89, 77, 71, 46, 93, 5, 51]
{'hits': 1,
 'ap': 0.2500,
 'rec': array([0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1.]),
 'prec': array([0.    , 0.    , 0.    , 0.25  , 0.2   , 0.1667, 0.1429, 0.125 ,
        0.1111, 0.1   , 0.0909, 0.0833, 0.0769, 0.0714, 0.0667, 0.0625,
        0.0588, 0.0556, 0.0526, 0.05  ]),
 'ndcg': array([0.    , 0.    , 0.    , 0.4307, 0.4307, 0.4307, 0.4307, 0.4307,
        0.4307, 0.4307, 0.4307, 0.4307, 0.4307, 0.4307, 0.4307, 0.4307,
        0.4307, 0.4307, 0.4307, 0.4307])}
ground truth {6: 5.0, 21: 5.0}
recommendations [94, 6, 41, 69, 4, 3, 76, 21, 40, 8, 24, 39, 88, 23, 15, 37, 49, 71, 56, 51]
{'hits': 2,
 'ap': 0.3750,
 'rec': array([0. , 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1. , 1. , 1. , 1. , 1. , 1. ,
        1. , 1. , 1. , 1. , 1. , 1. , 1. ]),
 'prec': array([0.    , 0.5   , 0.3333, 0.25  , 0.2   , 0.1667, 0.1429, 0.25  ,
        0.2222, 0.2   , 0.1818, 0.1667, 0.1538, 0.1429, 0.1333, 0.125 ,
        0.1176, 0.1111, 0.1053, 0.1   ]),
 'ndcg': array([0.    , 0.3869, 0.3869, 0.3869, 0.3869, 0.3869, 0.3869, 0.5803,
        0.5803, 0.5803, 0.5803, 0.5803, 0.5803, 0.5803, 0.5803, 0.5803,
        0.5803, 0.5803, 0.5803, 0.5803])}
```

Evaluate MF on users 10, 100, 200:

In [None]:
display(evl.eval_model_on_user(mfr, 10))
display(evl.eval_model_on_user(mfr, 100))
display(evl.eval_model_on_user(mfr, 200))

**EXPECTED OUTPUT:**
```
ground truth {11: 4.0}
recommendations [16, 11, 67, 37, 34, 14, 86, 79, 26, 94, 77, 78, 68, 31, 38, 49, 83, 20, 4, 3]
{'hits': 1,
 'ap': 0.5000,
 'rec': array([0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1.]),
 'prec': array([0.    , 0.5   , 0.3333, 0.25  , 0.2   , 0.1667, 0.1429, 0.125 ,
        0.1111, 0.1   , 0.0909, 0.0833, 0.0769, 0.0714, 0.0667, 0.0625,
        0.0588, 0.0556, 0.0526, 0.05  ]),
 'ndcg': array([0.    , 0.6309, 0.6309, 0.6309, 0.6309, 0.6309, 0.6309, 0.6309,
        0.6309, 0.6309, 0.6309, 0.6309, 0.6309, 0.6309, 0.6309, 0.6309,
        0.6309, 0.6309, 0.6309, 0.6309])}
ground truth {50: 5.0}
recommendations [50, 6, 7, 67, 96, 28, 89, 77, 71, 51, 94, 10, 46, 38, 31, 5, 24, 95, 65, 93]
{'hits': 1,
 'ap': 1.0000,
 'rec': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1.]),
 'prec': array([1.    , 0.5   , 0.3333, 0.25  , 0.2   , 0.1667, 0.1429, 0.125 ,
        0.1111, 0.1   , 0.0909, 0.0833, 0.0769, 0.0714, 0.0667, 0.0625,
        0.0588, 0.0556, 0.0526, 0.05  ]),
 'ndcg': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1.])}
ground truth {6: 5.0, 21: 5.0}
recommendations [6, 21, 41, 8, 56, 37, 76, 88, 51, 40, 71, 69, 94, 23, 4, 49, 15, 24, 3, 39]
{'hits': 2,
 'ap': 1.0000,
 'rec': array([0.5, 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ,
        1. , 1. , 1. , 1. , 1. , 1. , 1. ]),
 'prec': array([1.    , 1.    , 0.6667, 0.5   , 0.4   , 0.3333, 0.2857, 0.25  ,
        0.2222, 0.2   , 0.1818, 0.1667, 0.1538, 0.1429, 0.1333, 0.125 ,
        0.1176, 0.1111, 0.1053, 0.1   ]),
 'ndcg': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1.])}
```

Evaluate CB on users 10, 100, 200:

In [None]:
display(evl.eval_model_on_user(cbr, 10))
display(evl.eval_model_on_user(cbr, 100))
display(evl.eval_model_on_user(cbr, 200))

**EXPECTED OUTPUT:**
```
ground truth {11: 4.0}
recommendations [16, 94, 34, 83, 77, 49, 37, 31, 38, 11, 14, 26, 3, 4, 20, 78, 79, 67, 68, 86]
{'hits': 1,
 'ap': 0.1000,
 'rec': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1.]),
 'prec': array([0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.1   , 0.0909, 0.0833, 0.0769, 0.0714, 0.0667, 0.0625,
        0.0588, 0.0556, 0.0526, 0.05  ]),
 'ndcg': array([0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.2891, 0.2891, 0.2891, 0.2891, 0.2891, 0.2891, 0.2891,
        0.2891, 0.2891, 0.2891, 0.2891])}
ground truth {50: 5.0}
recommendations [94, 89, 96, 95, 93, 77, 71, 67, 65, 31, 28, 24, 10, 7, 6, 5, 51, 50, 46, 38]
{'hits': 1,
 'ap': 0.0556,
 'rec': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        1., 1., 1.]),
 'prec': array([0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.0556, 0.0526, 0.05  ]),
 'ndcg': array([0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ,
        0.    , 0.2354, 0.2354, 0.2354])}
ground truth {6: 5.0, 21: 5.0}
recommendations [71, 37, 39, 88, 21, 51, 23, 40, 94, 8, 24, 76, 3, 41, 49, 4, 6, 15, 56, 69]
{'hits': 2,
 'ap': 0.1588,
 'rec': array([0. , 0. , 0. , 0. , 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
        0.5, 0.5, 0.5, 1. , 1. , 1. , 1. ]),
 'prec': array([0.    , 0.    , 0.    , 0.    , 0.2   , 0.1667, 0.1429, 0.125 ,
        0.1111, 0.1   , 0.0909, 0.0833, 0.0769, 0.0714, 0.0667, 0.0625,
        0.1176, 0.1111, 0.1053, 0.1   ]),
 'ndcg': array([0.    , 0.    , 0.    , 0.    , 0.2372, 0.2372, 0.2372, 0.2372,
        0.2372, 0.2372, 0.2372, 0.2372, 0.2372, 0.2372, 0.2372, 0.2372,
        0.3842, 0.3842, 0.3842, 0.3842])}
```

Evaluate UU-CF on all users:

In [None]:
%time metrics = evl.eval_model(uucf)
display(metrics)

**EXPECTED OUTPUT:**
```
evaluated on 665 users

CPU times: user 16 s, sys: 716 ms, total: 16.7 s
Wall time: 16.7 s
{'hits': 1.9125874125874125,
 'ap': 0.22099337649325082,
 'rec': array([0.0261, 0.0876, 0.1472, 0.215 , 0.2773, 0.3395, 0.3999, 0.4783,
        0.5316, 0.5972, 0.6444, 0.6899, 0.7376, 0.7845, 0.8239, 0.8722,
        0.9089, 0.9321, 0.9576, 1.    ]),
 'prec': array([0.0542, 0.0865, 0.0956, 0.1045, 0.1091, 0.1098, 0.1104, 0.1134,
        0.1134, 0.1135, 0.1124, 0.1107, 0.1099, 0.1091, 0.1078, 0.1063,
        0.1042, 0.101 , 0.0982, 0.0956]),
 'ndcg': array([0.0476, 0.0847, 0.1174, 0.1492, 0.1765, 0.2015, 0.2242, 0.252 ,
        0.2704, 0.2906, 0.3056, 0.3199, 0.3327, 0.346 , 0.3559, 0.3675,
        0.3762, 0.3822, 0.3878, 0.3972])}
```

Evaluate MF on all users:

In [None]:
%time metrics = evl.eval_model(mfr)
display(metrics)

**EXPECTED OUTPUT:**
```
evaluated on 665 users

CPU times: user 2.78 s, sys: 358 ms, total: 3.14 s
Wall time: 3.07 s
{'hits': 1.9125874125874125,
 'ap': 0.45428173065156513,
 'rec': array([0.2604, 0.3981, 0.4548, 0.503 , 0.5359, 0.5507, 0.57  , 0.5879,
        0.6033, 0.6223, 0.6469, 0.6607, 0.6769, 0.703 , 0.7293, 0.7503,
        0.7777, 0.8235, 0.8962, 1.    ]),
 'prec': array([0.4073, 0.3243, 0.2558, 0.2168, 0.1867, 0.1617, 0.1431, 0.1298,
        0.1193, 0.1114, 0.106 , 0.1005, 0.0959, 0.093 , 0.0909, 0.0882,
        0.0867, 0.0876, 0.0905, 0.0956]),
 'ndcg': array([0.3694, 0.4236, 0.4456, 0.4672, 0.4793, 0.4843, 0.4912, 0.4971,
        0.5023, 0.5086, 0.5154, 0.5195, 0.5244, 0.5315, 0.5385, 0.5435,
        0.5507, 0.5623, 0.5792, 0.6012])}
```

Evaluate CB on all users:

In [None]:
%time metrics = evl.eval_model(cbr)
display(metrics)

**EXPECTED OUTPUT:**
```
evaluated on 665 users

CPU times: user 1.86 s, sys: 195 ms, total: 2.05 s
Wall time: 2.07 s
{'hits': 1.9125874125874125,
 'ap': 0.25669138784805334,
 'rec': array([0.08  , 0.14  , 0.1938, 0.2567, 0.3136, 0.3609, 0.4126, 0.4441,
        0.4798, 0.5209, 0.5653, 0.6126, 0.6649, 0.7043, 0.7383, 0.7868,
        0.8246, 0.882 , 0.9324, 1.    ]),
 'prec': array([0.1591, 0.1364, 0.1265, 0.1233, 0.122 , 0.1174, 0.1146, 0.1088,
        0.1043, 0.1021, 0.1001, 0.0982, 0.0982, 0.0962, 0.0943, 0.0944,
        0.0937, 0.0941, 0.0945, 0.0956]),
 'ndcg': array([0.1302, 0.1502, 0.1763, 0.2044, 0.2289, 0.2472, 0.2665, 0.2778,
        0.2901, 0.3032, 0.3155, 0.3287, 0.3431, 0.3535, 0.3632, 0.3766,
        0.3868, 0.4017, 0.4135, 0.4292])}
```

In [None]:
# feel free to use this field for additional tests

In [None]:
# feel free to use this field for additional tests

In [None]:
# feel free to use this field for additional tests

In [None]:
# feel free to use this field for additional tests

In [None]:
# feel free to use this field for additional tests

In [None]:
# Hidden

In [None]:
# Hidden