# Import libraries and data

Dataset was obtained in the capstone project description (direct link [here](https://d3c33hcgiwev3.cloudfront.net/_429455574e396743d399f3093a3cc23b_capstone.zip?Expires=1530403200&Signature=FECzbTVo6TH7aRh7dXXmrASucl~Cy5mlO94P7o0UXygd13S~Afi38FqCD7g9BOLsNExNB0go0aGkYPtodekxCGblpc3I~R8TCtWRrys~2gciwuJLGiRp4CfNtfp08sFvY9NENaRb6WE2H4jFsAo2Z2IbXV~llOJelI3k-9Waj~M_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A)) and splited manually in separated csv files. They were stored at my personal github account (folder link [here](https://github.com/caiomiyashiro/RecommenderSystemsNotebooks/tree/master/data/capstone)) and you can download and paste inside your working directory in order for this notebook to run.

In [1]:
import pandas as pd
import numpy as np

## Preprocess data

Float data came with ',' in the csv and python works with '.', so it treated the number as text. In order to convert them to numbers, I first replaced all the commas by punct and then converted the columns to float.

In [2]:
items = pd.read_csv('data/capstone/Capstone Data - Office Products - Items.csv', index_col=0) 
actual_ratings = pd.read_csv('data/capstone/Capstone Data - Office Products - Ratings.csv', index_col=0) 

content_based = pd.read_csv('data/capstone/Capstone Data - Office Products - CBF.csv', index_col=0)
user_user = pd.read_csv('data/capstone/Capstone Data - Office Products - User-User.csv', index_col=0)
item_item = pd.read_csv('data/capstone/Capstone Data - Office Products - Item-Item.csv', index_col=0)
matrix_fact = pd.read_csv('data/capstone/Capstone Data - Office Products - MF.csv', index_col=0)
pers_bias = pd.read_csv('data/capstone/Capstone Data - Office Products - PersBias.csv', index_col=0)

items[['Availability','Price']] = items[['Availability','Price']].apply(lambda col: col.apply(lambda elem: str(elem).replace(',', '.'))).astype(float)

# preprocess
content_based = content_based.apply(lambda col: col.apply(lambda elem: str(elem).replace(',', '.'))).astype(float)
user_user = user_user.apply(lambda col: col.apply(lambda elem: str(elem).replace(',', '.'))).astype(float)
item_item = item_item.apply(lambda col: col.apply(lambda elem: str(elem).replace(',', '.'))).astype(float)
matrix_fact = matrix_fact.apply(lambda col: col.apply(lambda elem: str(elem).replace(',', '.'))).astype(float)
pers_bias = pers_bias.apply(lambda col: col.apply(lambda elem: str(elem).replace(',', '.'))).astype(float)

print('items.shape = ' + str(items.shape))
print('actual_ratings.shape = ' + str(actual_ratings.shape))
print('content_based.shape = ' + str(content_based.shape))
print('user_user.shape = ' + str(user_user.shape))
print('item_item.shape = ' + str(item_item.shape))
print('matrix_fact.shape = ' + str(matrix_fact.shape))
print('pers_bias.shape = ' + str(pers_bias.shape))

actual_ratings.head()

items.shape = (200, 7)
actual_ratings.shape = (200, 100)
content_based.shape = (200, 100)
user_user.shape = (200, 100)
item_item.shape = (200, 100)
matrix_fact.shape = (200, 100)
pers_bias.shape = (200, 100)


Unnamed: 0_level_0,64,65,75,79,83,112,252,271,301,305,...,3411,3430,3524,3533,3625,3902,3991,4047,4342,4462
item,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
24,,,,,,,4.0,5.0,4.0,5.0,...,,,,,,,,,,
30,,,,,,,,,,,...,,,,,,,,,,
35,,,,4.0,,,,,,,...,,,,,,,,,,
41,,,,,,,,,,,...,,,,,,,,,,
45,,5.0,4.0,5.0,,,,,,5.0,...,,,,,,,,,,


# Class RecommenderEvaluator

In order to become easier to evaluate the metrics, I created a class that receives all the original ratings and predicted ratings for every recommender system and defined functions to extract all the metrics established in section 1 of the capstone report. Lets take a look at a summary of the class before looking at the code:
- **Constructor (init)**: receive all recommendation algorithms, besides the actual rating list and the list of items. All data is contained in the data downloaded from Coursera. Besides storing all recommendation algorithms, the constructor also calculate the 20 most frequent items, which is used in the popularity metric calculation.

- **get_observed_ratings**: as the ratings matrix is sparse, this method only returns the items a user with id userId has purchased.

- **get_top_n**: by ordering all the predicted ratings for each recommendation algorithm, we can extract what would be their 'top' recommendation for a given user. Given a parameter $n$, we can then return all the top $n$ recommendations for all the recommendation algorithms.

- **rmse**: by comparing the observed ratings a given user has given to an item and the predicted rating an algorithm has defined for a user, we can have an idea of how much error the algorithm is predicting the user's ratings. Here we don't work with lists, as usually each user has rated only a few amount of items. So here we get all the items the user has rated, recover these items from the algorithms' recommendations and them calculate the error.

- **nDCG**: By looking at lists now, we can have an idea of how optimal the ranked lists are. By using the scoring factor defined in the report, we can calculate the overall DCG for the recommenders' lists and then normalise them using the concepts of the nDCG.

- **Price and avalaibility diversity**: Diversity metric which evaluate how the recommended items' prices vary, *i.e.*, how is the standard deviation of the price. The higher, the better in this case. The same is for the availability index, but here, with higher standard deviations, it means the models are recommending items which are present and not present in local stores.

- **Popularity**: A popular recommender tries to recommend items which has a high chance of being purchased. In the formulation of this metric, an item has a high chance of being purchased if lots of people have purchased them. In the class constructor, we take the observed ratings data and the item list and select which were the top $n$ (standard = 20) most purchased data. In a recommendation list, we return the ration of how many items were inside this list of top $n$ ones.

In [3]:
class RecommenderEvaluator:
    
    def __init__(self, items, actual_ratings, content_based, user_user, item_item, matrix_fact, pers_bias):
        
        self.items = items
        self.actual_ratings = actual_ratings
        # static data containing the average score given by each user
        self.average_rating_per_userid = actual_ratings.apply(lambda row: np.average(row[~np.isnan(row)]))
        
        self.content_based = content_based
        self.user_user = user_user
        self.item_item = item_item
        self.matrix_fact = matrix_fact
        self.pers_bias = pers_bias
        
        # aggregate list. Makes for loops among all recommenders' predictions easier
        self.recommenders_list = [self.content_based, self.user_user, self.item_item, self.matrix_fact,self.pers_bias]
        self.recommenders_list_names = ['content_based', 'user_user', 'item_item', 'matrix_fact','pers_bias']
        
        # Used for item popularity metric.
        # Calculate the 20 most popular items (item which most of the customers bought)
        N_LIM = 20
        perc_users_bought_item = self.actual_ratings.apply(lambda item: np.sum(~np.isnan(item)), axis=0)/actual_ratings.shape[1]
        sort_pop_items = np.argsort(perc_users_bought_item)[::-1]
        self.pop_items = perc_users_bought_item.iloc[sort_pop_items][:N_LIM].index.values.astype(np.int)
    
        
    def get_observed_ratings(self, userId):
        """
        Returns all the items a given user evaluated and their ratings. Used mainly by all the metrics calculation
        :parameter: userId - user id
        :return: array of rated items. Index is the item id and value is the item rating
        """
        userId = str(userId)
        filtered_ratings = self.actual_ratings[userId]
        rated_items = filtered_ratings[~np.isnan(filtered_ratings)]
        return rated_items
    
    def get_top_n(self, userId, n):
        """
        Get the top n recommendations for every recommender in the list given a user id
        :parameter: userId - user id
        :parameter: n - max number of recommendations to return
        :return: dictionary where the key is the recommender's name and the value is an array of size n for the top n recommnendations.
        """
        userId = str(userId)
        predicted_ratings = dict()
        for recommender, recommender_name in zip(self.recommenders_list,self.recommenders_list_names):
            item_ids = recommender[userId].argsort().sort_values()[:n].index.values
            predicted_ratings[recommender_name] = item_ids
        return predicted_ratings
    
    def rmse(self, userId):
        """
        Root Mean Square Error of the predicted and observed values between the recommender's prediction and the actual ratings
        :parameter: userId - user id
        :return: dataframe of containing the rmse from all recommenders given user id
        """
        userId = str(userId)
        observed_ratings = self.get_observed_ratings(userId)
        rmse_list = {'rmse': []}
        for recommender in self.recommenders_list:
            predicted_ratings = recommender.loc[observed_ratings.index, userId]
            rmse_list['rmse'].append(np.sqrt(np.average((predicted_ratings - observed_ratings)**2)))
        rmse_list = pd.DataFrame(rmse_list, index = self.recommenders_list_names)
        return rmse_list
    
    def nDCG(self, userId):
        """
        Normalised Discounted Cumulative Gain for all recommenders given user id
        :parameter: userId - user id
        :return: dataframe of containing the nDCG from all recommenders given user id
        """
        ri = self.get_observed_ratings(userId)
        top5 = self.get_top_n(userId,5)

        # 1st step: Given recommendations, transform list into scores (see score transcriptions in the capstone report)
        scores_all = []
        for name, item_list in top5.items():
            scores = np.empty_like(item_list) # initialise 'random' array
            scores[:] = -10                   ###########################
                                                                   # check which items returned by the recommender
            is_already_rated = np.isin(item_list, ri.index.values) # the user already rated. Items users didn't rate
            scores[~is_already_rated] = 0                          # receive score = 0
            for index, score in enumerate(scores):
                if(score != 0):                                    # for each recommended items the user rated
                    if(ri[item_list[index]] < self.average_rating_per_userid[userId] - 1): # score accordingly the report 
                        scores[index] = -1
                    elif((ri[item_list[index]] >= self.average_rating_per_userid[userId] - 1) & 
                         (ri[item_list[index]] < self.average_rating_per_userid[userId] + 0.5)):
                        scores[index] = 1
                    else:
                        scores[index] = 2
            scores_all.append(scores)                              # append all the transformed scores
        scores_all  

        # 2nd step: Given scores, calculate the model's DCG, ideal DCG and then nDCG
        nDCG_all = dict()
        for index_model, scores_model in enumerate(scores_all):   # for each model
            model_DCG = 0                                         # calculate model's DCG
            for index, score in enumerate(scores_model):          #
                index_ = index + 1                                #
                model_DCG = model_DCG + score/np.log2(index_ + 1) #   
            ideal_rank_items = np.sort(scores_model)[::-1]                        # calculate model's ideal DCG
            ideal_rank_DCG = 0                                                    #
            for index, ideal_score in enumerate(ideal_rank_items):                #
                index_ = index + 1                                                #
                ideal_rank_DCG = ideal_rank_DCG + ideal_score/np.log2(index_ + 1) #
            if((ideal_rank_DCG == 0) | (np.abs(ideal_rank_DCG) < np.abs(model_DCG))): # if nDCG is 0 or only negative scores came up
                nDCG = 0 
            else:                                                     # calculate final nDCG when ideal DCG is != 0
                nDCG = model_DCG/ideal_rank_DCG
                                                         
            nDCG_all[self.recommenders_list_names[index_model]] = nDCG # save each model's nDCG in a dict
            # convert it to dataframe
            result_final = pd.DataFrame(nDCG_all, index=range(1)).transpose()
            result_final.columns = ['nDCG']
        return result_final

    def price_diversity(self,userId):
        """
        Mean and standard deviation of the price of the top n products recommended by each algorithm. 
        Intuition for a high price wise diversity recommender is to have a high price standard deviation
        :parameter: userId - user id
        :return: dataframe of containing the price's mean and standard deviation from all recommenders given user id
        """

        topn = self.get_top_n(userId,5)

        stats = pd.DataFrame()
        for key, value in topn.items():
            data_filtered = self.items.loc[topn[key]][['Price']].agg(['mean','std']).transpose()
            data_filtered.index = [key]
            stats = stats.append(data_filtered)
        return stats
    
    def availability_diversity(self,userId):
        """
        Mean and standard deviation of the availabity index of the top n products recommended by each algorithm. 
        Intuition for a high availabity diversity is to have a small mean value in the availabity index
        :parameter: userId - user id
        :return: dataframe of containing the availabity index's mean and standard deviation from all recommenders given user id
        """
        topn = self.get_top_n(userId,5)

        stats = pd.DataFrame()
        for key, value in topn.items():
            data_filtered = self.items.loc[topn[key]][['Availability']].agg(['mean','std']).transpose()
            data_filtered.index = [key]
            stats = stats.append(data_filtered)
        return stats
    def popularity(self, userId):
        """
        Return the ratio of how many items of the top n items are among the most popular purchased items. Default is
        the 20 most purchased items.
        :parameter: userId - user id
        :return: dataframe of containing ratio of popular items in the recommended list from all recommenders given user id
        """
        topn = self.get_top_n(userId,5)

        recommended = re.get_top_n(userId,5)

        results = {'popularity': []}
        for recommender, recommendations in recommended.items():
            popularity = np.sum(np.isin(recommendations,self.pop_items))
            results['popularity'].append(popularity)
        return pd.DataFrame(results,index = self.recommenders_list_names)
        

# Test methods:

Just to have an idea of the output of each method, lets call all them with a test user. At the next section we will calculate these metrics for all users.

In [4]:
userId = '64'
re = RecommenderEvaluator(items, actual_ratings, content_based, user_user, item_item, matrix_fact, pers_bias)

## Test RMSE

In [5]:
re.rmse(userId)

Unnamed: 0,rmse
content_based,0.772809
user_user,0.624086
item_item,0.797922
matrix_fact,0.853212
pers_bias,0.845591


## Test nDCG

In [6]:
re.nDCG(userId)

Unnamed: 0,nDCG
content_based,0.0
item_item,0.63093
matrix_fact,0.0
pers_bias,0.0
user_user,0.0


## Test Diversity - Price and Availability

In [7]:
re.price_diversity(userId)

Unnamed: 0,mean,std
content_based,10.376,5.160923
user_user,19.846,14.888584
item_item,6.518,3.736117
matrix_fact,9.706,5.622004
pers_bias,9.89,5.121875


In [8]:
re.availability_diversity(userId)

Unnamed: 0,mean,std
content_based,0.595712,0.197574
user_user,0.628495,0.124328
item_item,0.531987,0.228363
matrix_fact,0.588537,0.197005
pers_bias,0.588596,0.17263


## Test Popularity

In [9]:
re.popularity(userId)

Unnamed: 0,popularity
content_based,0
user_user,0
item_item,0
matrix_fact,0
pers_bias,0


# Average metrics by all users

Espefically for user 907, the recommendations from the user user came with all nulls (original dataset). This specifically impacted the RMSE calculation, as one Nan damaged the entire average calculation. So specifically for RMSE we did a separate calculation section. All the other metrics are going the be calculated in the next code block.

In [10]:
re = RecommenderEvaluator(items, actual_ratings, content_based, user_user, item_item, matrix_fact, pers_bias)

i = 0
count = np.array([0,0,0,0,0])
for userId in actual_ratings.columns:
    if(userId == '907'):
        rmse_recommenders = re.rmse(userId).fillna(0)
    else:
        rmse_recommenders = re.rmse(userId)
    count = count + rmse_recommenders['rmse']

# as we didn't use user 907 for user user, divide it by the number of users - 1
denominator = [len(actual_ratings.columns)] * 5
denominator[1] = len(actual_ratings.columns) - 1
print('Average RMSE for all users')
count/ denominator

Average RMSE for all users


content_based    0.572387
user_user        0.545130
item_item        0.574672
matrix_fact      0.659029
pers_bias        0.666273
Name: rmse, dtype: float64

In [35]:
count_nDCG = np.array([0,0,0,0,0])
count_diversity_price = np.ndarray([5,2])
count_diversity_availability = np.ndarray([5,2])
count_popularity = np.array([0,0,0,0,0])

for userId in actual_ratings.columns:
    nDCG_recommenders = re.nDCG(userId)
    count_nDCG = count_nDCG + nDCG_recommenders['nDCG']
    
    diversity_price_recommenders = re.price_diversity(userId)
    count_diversity_price = count_diversity_price + diversity_price_recommenders[['mean','std']]
    
    diversity_availability_recommenders = re.availability_diversity(userId)
    count_diversity_availability = count_diversity_availability + diversity_availability_recommenders[['mean','std']]
    
    popularity_recommenders = re.popularity(userId)
    count_popularity = count_popularity + popularity_recommenders['popularity'] 
    break

print('\n---')
print('Average nDCG')
print('---\n')
print(count_nDCG/len(actual_ratings.columns))
print('\n---')
print('Average Price Diversity')
print('---\n')
print(count_diversity_price/len(actual_ratings.columns))
print('\n---')
print('Average Availability Diversity')
print('---\n')
print(count_diversity_availability/len(actual_ratings.columns))
print('\n---')
print('Average Popularity')
print('---\n')
print(count_popularity/len(actual_ratings.columns))


---
Average nDCG
---

content_based    0.136505
item_item        0.146798
matrix_fact      0.155888
pers_bias        0.125180
user_user        0.169080
Name: nDCG, dtype: float64

---
Average Price Diversity
---

                    mean        std
content_based  19.240238  19.178071
user_user      21.910497  25.222586
item_item      25.880743  32.173458
matrix_fact    21.119133  26.189485
pers_bias       9.890000   5.121875

---
Average Availability Diversity
---

                   mean       std
content_based  0.677648  0.227399
user_user      0.831211  0.329105
item_item      0.670905  0.211142
matrix_fact    0.658213  0.208816
pers_bias      0.687496  0.223849

---
Average Popularity
---

content_based    0.00
user_user        0.04
item_item        0.01
matrix_fact      0.00
pers_bias        0.00
Name: popularity, dtype: float64


# Final Analysis

In terms of **RMSE**, the user-user collaborative filtering showed to be the most effective, despite it not being significantly better.

For nDCG rank score, again user user and now item item collaborative filtering were the best.

In terms of price diversity, the item item algorith was the most diverse, providing products varying ~32 dollars from the mean item price list. Matrix factorisation and user user follow right behind, with price standard deviation around 25 dollars. An interesting factor here was the *pers_bias* algorithm, as it recommended basically cheap products with a low standard deviation.

For the availabity index, all the algorithms besides the user user managed to recommend items not so present in the local stores **together** with items present in local stores, as we can see they also provided items with availability index high (high standard deviation).

In terms of popularity, no algorithm actually managed to obtain good scores in the way we defined. So, if the popularity is focused in the future, we can either change the popularity concept or improve mechanics in the recommender so it predict higher scores for the most popular items in the store.

After this evaluation, it seemed to us that the item-item recommender system had an overall better performance, highlighted in terms of its diversity scores. Unfortunately, the items that item item recommender has suggested are in overall pricy, and we can check if there is any mixture possibility with the pers_bias algorithm, as it really indicated cheap prices and a low price standard deviation. Matrix factorization performed good as well but it didn't outperform any of the other recommenders.