# Part 4: Recommendation system for retail using user-based collaborative filtering

# In this notebook we will focus on user-based collaborative filtering, which lets us generate recommendations
I used the following article while creating this notebook:  
* https://towardsdatascience.com/build-a-user-based-collaborative-filtering-recommendation-engine-for-anime-92d35921f304


In this notebook, we will follow 2 main steps, based on the idea that users are similar if they like similar items:
* First, we find out which users are similar
* Then, we recommend items that those similar users like

# The summary of the notebook is written below

I. Import useful libraries and the Python file containing our functions  

II. Retrieving our data from a previous notebook  

* A. User_item matrix
* B. Product_info_mapped

III. Selecting similar users  

IV. Recommendation for this specific user   

* A. Unknown rating
* B. Known rating
* C. Metric RMSE for one user  

V. Metrics



# I. Import useful libraries and the Python file containing our functions

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
import operator
import statistics
from sklearn.metrics import mean_squared_error 
import Function_04 as f4

# set the graphs to show in the jupyter notebook
%matplotlib inline

# II. Retrieving our data from a previous notebook

## A. User_item matrix

We retrieve our user-item matrix, restricted to the customers we are interested in.

In [2]:
user_item_matrix = pd.read_hdf("user_item_matrix.hdf","user_item_matrix")
user_item_matrix.head()

cust_id,1_1,1_3,1_4,2_1,2_3,2_4,3_10,3_4,3_5,3_8,...,5_10,5_11,5_12,5_3,5_6,5_7,6_10,6_11,6_12,6_2
266783,0.0,0.0,2.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
266784,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,3.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0
266785,0.0,0.0,0.0,4.0,0.0,3.5,0.0,0.0,0.0,0.0,...,0.0,5.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0
266788,0.0,0.0,0.0,1.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
266794,0.0,3.5,0.0,0.0,2.5,0.0,1.5,0.0,0.0,4.0,...,0.0,3.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## B. Product_info_mapped

In [3]:
Product_map = pd.read_csv("product_info_mapped.csv")

# III. Selecting similar users

We select a specific user ID in order to make some predictions.

In [4]:
current_user = 266784
similar_user_indices = f4.similar_users(current_user, user_item_matrix, k=6)
similar_user_indices

[268232, 270423, 273627, 273025, 269368, 274452]
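
`f4.similar_users` is defined in `Function_04`, which is not shown in this notebook. A minimal sketch of how such a function might work, assuming it ranks the other customers by the cosine similarity of their rating rows and returns the `k` closest customer IDs. The name `top_k_similar_users` and the toy matrix are hypothetical.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def top_k_similar_users(user_id, matrix, k=3):
    """Return the k customer IDs whose rating rows are closest to user_id's.

    Hypothetical stand-in for f4.similar_users; the real helper may differ.
    """
    # cosine similarity between the target user's row and every row
    target = matrix.loc[[user_id]]
    scores = cosine_similarity(target, matrix)[0]
    similarities = pd.Series(scores, index=matrix.index)
    # drop the user themself, then keep the k highest scores
    similarities = similarities.drop(user_id)
    return similarities.sort_values(ascending=False).head(k).index.tolist()


# toy user-item matrix: rows are customers, columns are items, 0.0 = no rating
toy_matrix = pd.DataFrame(
    [[5.0, 0.0, 3.0, 0.0],
     [4.0, 0.0, 3.0, 1.0],
     [0.0, 5.0, 0.0, 4.0],
     [5.0, 1.0, 2.0, 0.0]],
    index=[101, 102, 103, 104],
    columns=["1_1", "1_3", "5_3", "6_12"],
)

neighbours = top_k_similar_users(101, toy_matrix, k=2)
print(neighbours)
```

Customer 103 shares no rated item with customer 101, so its similarity is 0 and it is ranked last.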

# IV. Recommendation for this specific user 

As in a previous notebook, we can use the recommend_item_for_all function.  
We need to set max_value, the maximum number of similar users used to estimate each rating.  

In [5]:
# here we set the maximum number of similar users to 5
max_value = 5
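
To make the role of `max_value` concrete, here is a rough sketch of one way a rating could be estimated from similar users: for each item, average the ratings given by at most `max_value` of the most similar users. How `f4.recommend_item_for_all` actually handles unrated (0.0) entries is not shown in this notebook, so including them in the mean is an assumption, as is the helper name `mean_rating_from_neighbours`.

```python
import pandas as pd


def mean_rating_from_neighbours(matrix, neighbour_ids, max_value):
    """Average each item's rating over at most max_value neighbours.

    Hypothetical helper: zeros (no rating) are included in the mean here,
    which may differ from what f4.recommend_item_for_all does.
    """
    # neighbour_ids is assumed to be ordered from most to least similar
    selected = neighbour_ids[:max_value]
    return matrix.loc[selected].mean(axis=0).sort_values(ascending=False)


# toy ratings of three similar customers for three items
toy_matrix = pd.DataFrame(
    [[4.0, 0.0, 2.0],
     [5.0, 3.0, 0.0],
     [3.0, 0.0, 1.0]],
    index=[201, 202, 203],
    columns=["5_3", "6_12", "1_1"],
)

# with max_value=2, only customers 201 and 202 contribute to the estimate
estimates = mean_rating_from_neighbours(toy_matrix, [201, 202, 203], max_value=2)
print(estimates)
```

A larger `max_value` smooths the estimate over more neighbours, at the cost of including users who are less similar to the target.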

## A. Unknown rating

First, let's look at the predictions for items the user has not rated yet.

In [6]:
unknown = f4.recommend_item_for_all(current_user, similar_user_indices, max_value, user_item_matrix,Product_map, False)
unknown.head(23)

Unnamed: 0,object_name,object_id,mean_from_similar_user
22,Home and kitchen_Tools,6_12,3.0
17,Books_Comics,5_3,3.0
14,Books_Academic,5_12,1.0
1,Clothing_Women,1_1,0.0
21,Home and kitchen_Bath,6_11,0.0
20,Home and kitchen_Kitchen,6_10,0.0
19,Home and kitchen_Furnishing,6_2,0.0
18,Books_DIY,5_6,0.0
16,Books_Children,5_11,0.0
12,Bags_Women,4_4,0.0


## B. Known rating

Here we can check whether the predictions are similar, or at least close, to the real values.

In [7]:
known = f4.recommend_item_for_all(current_user, similar_user_indices, max_value, user_item_matrix,Product_map, True)
known

Unnamed: 0,object_name,mean_from_similar_user,real_rank_from_266784
0,Books_Fiction,4.2,5.0
1,Electronics_Mobiles,4.0,2.0
2,Books_Non-Fiction,2.8,3.0


## C. Metric RMSE for one user

We will measure the RMSE for a single user.  
This is not representative, but it shows how the metric is computed.

In [8]:
# Real values are in column index 2: 'real_rank_from_......'
realVals = known[known.columns[2]]

# Predicted values are in column index 1: 'mean_from_similar_user'
predictedVals = known[known.columns[1]]

# compute the mean squared error
mse = mean_squared_error(realVals, predictedVals)

# compute the root mean squared error as the square root of the MSE
# (the squared=False argument is deprecated in recent scikit-learn releases)
rmse = np.sqrt(mse)


print('The MSE for this user is :', mse)
print('The RMSE for this user is :', rmse)

The MSE for this user is : 1.5599999999999998
The RMSE for this user is : 1.2489995996796797
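
As a sanity check, the same numbers can be reproduced by hand from the three rows of the known-rating table, using the definitions MSE = mean of the squared errors and RMSE = square root of the MSE. The variable names below are illustrative only.

```python
import numpy as np

# values copied from the known-rating table for customer 266784
predicted = np.array([4.2, 4.0, 2.8])  # mean_from_similar_user
actual = np.array([5.0, 2.0, 3.0])     # real_rank_from_266784

# squared errors are roughly 0.64, 4.0 and 0.04
squared_errors = (actual - predicted) ** 2
mse_by_hand = squared_errors.mean()
rmse_by_hand = np.sqrt(mse_by_hand)

print(mse_by_hand, rmse_by_hand)
```

Most of the error comes from Electronics_Mobiles, where the prediction (4.0) is two points above the real rating (2.0).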


# V. Metrics

If we repeat the previous process for many users, we can average the MSE and RMSE values to estimate the overall performance.

In [9]:
list_cust = user_item_matrix.index.tolist()
print(f'We have {len(list_cust)} different customers')

We have 4031 different customers


In [10]:
max_value = 10
nb_values_to_mesure = 1000
RMSE = []
MSE = []

def find_metrics(current_user):
    similar_user_indices = f4.similar_users(current_user, user_item_matrix, k=6)
    known = f4.recommend_item_for_all(current_user, similar_user_indices, max_value, user_item_matrix,Product_map, True)
    mse, rmse = f4.meatrics_mse_rmse(known)
    MSE.append(mse)
    RMSE.append(rmse)
    
# evaluate only the first nb_values_to_mesure customers
for user in list_cust[:nb_values_to_mesure]:
    find_metrics(user)

In [11]:
print('mean mse:', statistics.mean(MSE))
print('mean rmse:', statistics.mean(RMSE))

mean mse: 1.3612454408803645
mean rmse: 1.0765877477675778


The mean RMSE is about 1.08, meaning that the estimated ratings typically deviate from the actual ratings by about 1.08 points, in either direction.
Our scale runs from 0 to 5, so this is not any better than in the previous notebooks.
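
One caveat when reading these figures: the mean of per-user RMSE values is generally not the same as the square root of the mean MSE, which is why 1.0766 is not the square root of 1.3612 in the output above. A small illustration with made-up per-user MSE values (hypothetical numbers, not taken from the data):

```python
import math
import statistics

# hypothetical per-user MSE values, for illustration only
per_user_mse = [0.25, 1.0, 4.0]
per_user_rmse = [math.sqrt(m) for m in per_user_mse]  # 0.5, 1.0, 2.0

# average of the per-user RMSE values (what the notebook reports as mean rmse)
mean_of_rmse = statistics.mean(per_user_rmse)
# square root of the average MSE
rmse_of_mean_mse = math.sqrt(statistics.mean(per_user_mse))

print(mean_of_rmse, rmse_of_mean_mse)
```

Because the square root is concave, the mean of the per-user RMSE values can never exceed the square root of the mean MSE, so the two should not be compared directly.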