# Part 4: Recommendation system for retail using user-based collaborative filtering

# In this notebook we focus on user-based collaborative filtering, which lets us generate recommendations
I used the following article while creating this notebook:  
* https://towardsdatascience.com/build-a-user-based-collaborative-filtering-recommendation-engine-for-anime-92d35921f304


In this notebook, we focus on 2 main steps, based on the idea that users are similar if they like similar items:
* First, we find which users are similar, according to their tastes
* Then, we recommend items that those similar users like
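
The two steps above can be illustrated with a minimal sketch on made-up ratings. The users, the ratings, and the `cosine` helper are assumptions introduced for illustration only; they are not taken from the notebook's data or from `utils.Function_04`.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two rating vectors (illustrative helper)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up ratings for 5 items; 0 means "not rated"
alice = np.array([5.0, 4.0, 0.0, 0.0, 0.0])
bob = np.array([4.0, 5.0, 0.0, 0.0, 4.0])
carol = np.array([0.0, 0.0, 5.0, 4.0, 0.0])

# Step 1: find which user has tastes closest to Alice's
sim_alice_bob = cosine(alice, bob)
sim_alice_carol = cosine(alice, carol)

# Step 2: candidate items are those the similar user rated and Alice did not
candidates = np.where((alice == 0) & (bob > 0))[0].tolist()
print(sim_alice_bob, sim_alice_carol, candidates)
```

Bob shares Alice's two rated items, so he scores much higher than Carol, and the item he rated that Alice has not seen becomes the candidate recommendation.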

# Summary of the notebook

I. Import useful libraries and the Python file containing our functions  

II. Retrieving our data from a previous notebook  

* A. User_item matrix
* B. Product_info_mapped

III. Selecting similar users  

IV. Recommendation for this specific user   

* A. Unknown rating
* B. Known rating
* C. Metric RMSE for one user  

V. Metrics



* We just make sure we are in the "root" directory

# I. Import useful libraries and the Python file containing our functions

In [1]:
%cd ..

/Users/kofficornelis/Documents/Retail_V2


In [2]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
import operator
import statistics
from sklearn.metrics import mean_squared_error 
import utils.Function_04 as f4

# set the graphs to show in the jupyter notebook
%matplotlib inline

# II. Retrieving our data from a previous notebook

## A. User_item matrix

We retrieve our user_item matrix, restricted to the customers we are interested in

In [3]:
user_item_matrix = pd.read_hdf("output/user_item_matrix.hdf","user_item_matrix")
user_item_matrix.head()

cust_id \ object_id,1_1,1_3,1_4,2_1,2_3,2_4,3_10,3_4,3_5,3_8,...,5_10,5_11,5_12,5_3,5_6,5_7,6_10,6_11,6_12,6_2
266783,0.0,0.0,2.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,...,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
266784,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,3.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0
266785,0.0,0.0,0.0,4.0,0.0,3.5,0.0,0.0,0.0,0.0,...,0.0,5.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0
266788,0.0,0.0,0.0,1.0,0.0,4.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
266794,0.0,3.5,0.0,0.0,2.5,0.0,1.5,0.0,0.0,4.0,...,0.0,3.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## B. Product_info_mapped

In [4]:
Product_map = pd.read_csv("output/product_info_mapped.csv")

# III. Selecting similar users

We select a specific user_id for which to make predictions

In [5]:
CURRENT_USER = 266784

Here, we display the list of the k=10 most similar users, according to the items they liked

In [6]:
similar_user_indices = f4.similar_users(CURRENT_USER, user_item_matrix, k=10)
similar_user_indices

[268232,
 270423,
 273627,
 273025,
 269368,
 274452,
 271682,
 270218,
 274155,
 267858]
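
The body of `f4.similar_users` is not shown in this notebook. As a rough sketch of how such a top-k selection could work, the hypothetical `top_k_similar_users` below ranks the other rows of a user-item matrix by cosine similarity to the target row. The function name, the toy matrix, and the choice of cosine similarity are assumptions, not the author's implementation.

```python
import numpy as np
import pandas as pd

def top_k_similar_users(user_id, matrix, k=3):
    """Return the ids of the k users most similar to user_id (hypothetical helper)."""
    target = matrix.loc[user_id].to_numpy(dtype=float)
    others = matrix.drop(index=user_id)
    values = others.to_numpy(dtype=float)

    # Cosine similarity = dot product divided by the product of the norms
    dots = values @ target
    norms = np.linalg.norm(values, axis=1) * np.linalg.norm(target)
    # Users with no ratings at all get a similarity of 0 instead of a division by zero
    sims = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)

    ranked = pd.Series(sims, index=others.index).sort_values(ascending=False)
    return ranked.head(k).index.tolist()

# Toy user-item matrix with made-up ratings (0 = not rated)
toy = pd.DataFrame(
    [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [5, 5, 0, 0]],
    index=[101, 102, 103, 104],
    columns=["1_1", "1_3", "2_1", "2_4"],
    dtype=float,
)
neighbours = top_k_similar_users(101, toy, k=2)
print(neighbours)
```

The target user is dropped before ranking so that a user is never returned as their own nearest neighbour.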

# IV. Recommendation for this specific user 

As in a previous notebook, we use a recommendation function from our utility file; here it is `recommend_item_from_uid`.  
We need to set max_value, the maximum number of similar users used to estimate a rating.  

In [7]:
#we set here the max value at 8
max_value = 8

## A. Unknown rating

First, let's look at the predictions for items the user has not rated yet

In [8]:
unknown = f4.recommend_item_from_uid(CURRENT_USER, similar_user_indices, max_value, user_item_matrix,Product_map, False)
unknown.head(23)

Unnamed: 0,object_name,object_id,mean_from_similar_user
22,Home and kitchen_Tools,6_12,3.0
17,Books_Comics,5_3,3.0
21,Home and kitchen_Bath,6_11,2.0
18,Books_DIY,5_6,2.0
16,Books_Children,5_11,2.0
7,Electronics_Computers,3_5,1.0
14,Books_Academic,5_12,1.0
1,Clothing_Women,1_1,1.0
12,Bags_Women,4_4,0.0
20,Home and kitchen_Kitchen,6_10,0.0
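
How `mean_from_similar_user` is computed inside `f4.recommend_item_from_uid` is not shown here. One plausible reading, sketched below, averages each item's ratings over the first `max_value` similar users and ignores users who did not rate the item. The helper name, the treatment of 0 as "not rated", and the toy data are all assumptions.

```python
import numpy as np
import pandas as pd

def mean_rating_from_neighbours(matrix, neighbour_ids, max_value):
    """Average each item's non-zero ratings over the first max_value neighbours
    (hypothetical helper, not the notebook's function)."""
    block = matrix.loc[neighbour_ids[:max_value]]
    rated = block.replace(0, np.nan)        # assumption: 0 means "not rated"
    return rated.mean(axis=0).fillna(0.0)   # items nobody rated get 0.0

# Toy ratings from three similar users (made-up values)
toy = pd.DataFrame(
    [[4.0, 0.0, 2.0], [2.0, 0.0, 0.0], [5.0, 3.0, 0.0]],
    index=[201, 202, 203],
    columns=["5_7", "5_3", "6_12"],
)
predicted = mean_rating_from_neighbours(toy, [201, 202, 203], max_value=2)
print(predicted)
```

With `max_value=2`, only users 201 and 202 contribute, so the rating user 203 gave to `5_3` is ignored.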


## B. Known rating

Here we can check whether the predictions are equal, or at least close, to the real ratings

In [9]:
known = f4.recommend_item_from_uid(CURRENT_USER, similar_user_indices, max_value, user_item_matrix,Product_map, True)
known

Unnamed: 0,object_name,object_id,mean_from_similar_user,real_rating_from_266784
0,Electronics_Mobiles,3_4,4.0,2.0
1,Books_Fiction,5_7,4.0,5.0
2,Books_Non-Fiction,5_10,2.875,3.0


## C. Metric RMSE for one user

We will measure the RMSE metric for a single user.  
This is not representative, but it shows how the metric is computed.
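
For reference, the two metrics are usually defined as follows, where $r_i$ is the real rating of item $i$, $\hat{r}_i$ the predicted rating, and $n$ the number of items the user has rated (these symbols are introduced here for illustration):

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(r_i - \hat{r}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

Unlike the MSE, the RMSE is expressed in the same unit as the ratings.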

In [10]:
# Select columns by name: positional indices depend on the column order

# Predicted values are in the column 'mean_from_similar_user'
predictedVals = known['mean_from_similar_user']

# Real values are in the last column: 'real_rating_from_<user id>'
realVals = known[known.columns[-1]]

# calculation of the mean squared error
mse = mean_squared_error(realVals, predictedVals)

# calculation of the root mean squared error (np.sqrt avoids the `squared`
# argument, which is deprecated in recent scikit-learn versions)
rmse = np.sqrt(mse)


print('The MSE for this user is :', round(mse, 3))
print('The RMSE for this user is :', round(rmse, 3))

The MSE for this user is : 1.672
The RMSE for this user is : 1.293


# V. Metrics

If we repeat the previous process for many users, we can average the MSE and RMSE to estimate the overall quality of the model

In [11]:
list_cust = user_item_matrix.index.tolist()
print(f'We have {len(list_cust)} different customers')

We have 4031 different customers


In [14]:
max_value = 10
how_many_user_to_mesure = 500
RMSE = []
MSE = []

def find_metrics(current_user):
    'Append the MSE and RMSE scores of the selected user_id to the global lists'
    
    similar_user_indices = f4.similar_users(current_user, user_item_matrix, k=10)
    known = f4.recommend_item_from_uid(current_user, similar_user_indices, max_value, user_item_matrix,Product_map, True)
    mse, rmse = f4.meatrics_mse_rmse(known)
    MSE.append(mse)
    RMSE.append(rmse)

# we run the previous function for the first 500 users before measuring the mean mse and rmse
for user in list_cust[:how_many_user_to_mesure]:
    find_metrics(user)
        

In [15]:
print('mean mse:', statistics.mean(MSE))
print('mean rmse:', statistics.mean(RMSE))

mean mse: 1.2605292578081362
mean rmse: 1.0358838952785183
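
The averaging step can also be written without global lists. The sketch below is only an illustration: `mse_rmse` is a hypothetical stand-in for the notebook's metric helper, the assumption that real ratings sit in the last column comes from the `known` table shown earlier, and the two toy frames are made up.

```python
import statistics

import numpy as np
import pandas as pd

def mse_rmse(known):
    """Return (mse, rmse) between predicted and real ratings (hypothetical helper)."""
    predicted = known["mean_from_similar_user"].to_numpy(dtype=float)
    # Assumption: the real ratings are stored in the last column
    real = known[known.columns[-1]].to_numpy(dtype=float)
    mse = float(np.mean((real - predicted) ** 2))
    return mse, float(np.sqrt(mse))

# Two made-up "known" tables, one per user
user_a = pd.DataFrame({"mean_from_similar_user": [3.0, 4.0], "real_rating": [4.0, 2.0]})
user_b = pd.DataFrame({"mean_from_similar_user": [2.0], "real_rating": [2.0]})

per_user = [mse_rmse(frame) for frame in (user_a, user_b)]
mean_mse = statistics.mean(m for m, _ in per_user)
mean_rmse = statistics.mean(r for _, r in per_user)
print(mean_mse, mean_rmse)
```

Note that averaging per-user RMSE values gives every user the same weight, whatever the number of items they rated, so the result generally differs from an RMSE computed over all ratings pooled together.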


The mean RMSE over the 500 evaluated users is about 1.04.  
This means that the estimated ratings typically deviate from the actual ratings by about 1.04 points, in either direction.  
Our scale goes from 0 to 5, so this is not better than the results of the previous notebooks.

## Thank you for finishing part 4. This model relies on user-based collaborative filtering; its results are fine, but not as good as those of the previous model built with Surprise.  
## The fifth and final part is a consolidated method that combines the models from the 3 previous notebooks.