
<font size="5" color="blue"><Strong>Collaborative Filtering:</Strong></font> Recommend items based on user behavior (ratings/reviews).
<ol>
    <li><font size="3" color="green">User based</font>: Recommend items based on the similarity between users. Two users who rate the same products highly are considered similar.</li>
    <li><font size="3" color="green">Item based</font>: Recommend products based on the similarity between items. Two items are considered similar when the same users give them similar ratings.</li>
</ol>

<font size="4" color="blue"><Strong>General Steps</Strong></font> 
<ol>
    <li>Build a utility matrix (users as rows, items as columns).</li>
    <li>Build a correlation matrix to measure the similarity between items or between users.</li>
</ol>
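
The two steps above can be sketched on a tiny example. The ratings below are hypothetical toy data (not taken from the restaurant dataset), and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy ratings on the same 0-2 scale used later in this notebook
toy_ratings = pd.DataFrame({
    'userID':  ['U1', 'U1', 'U1', 'U2', 'U2', 'U2', 'U3', 'U3', 'U3'],
    'placeID': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'rating':  [2, 2, 0, 1, 1, 2, 0, 0, 1],
})

# Step 1: utility matrix with users as rows and items as columns
toy_utility = toy_ratings.pivot_table(index='userID', columns='placeID', values='rating')

# Step 2: item-item correlation matrix (Pearson's r between columns)
toy_item_corr = toy_utility.corr(method='pearson')
print(toy_item_corr.round(2))
```

Here items A and B receive identical ratings from every user, so their correlation is 1, while A and C are negatively correlated.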

<font size="3" color="blue">Problem statement:</font> Develop a recommender system that recommends restaurants to a specific user based on their previous ratings.

<font size="3" color="blue">Approach:</font> Collaborative filtering, both item based and user based.

The datasets are hosted on the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data

<font size="4" color="blue"><Strong>Import libraries</Strong></font> 

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

<font size="4" color="blue"><Strong>Reading the data</Strong></font> 

In [2]:
rating  = pd.read_csv('data/rating_final.csv')
rating.head()

Unnamed: 0,userID,placeID,rating,food_rating,service_rating
0,U1077,135085,2,2,2
1,U1077,135038,2,2,1
2,U1077,132825,2,2,2
3,U1077,135060,1,2,2
4,U1068,135104,1,1,2


In [3]:
cuisine = pd.read_csv('data/chefmozcuisine.csv')
cuisine.head()

Unnamed: 0,placeID,Rcuisine
0,135110,Spanish
1,135109,Italian
2,135107,Latin_American
3,135106,Mexican
4,135105,Fast_Food


In [4]:
geoplaces = pd.read_csv('data/geoplaces2.csv',encoding='latin-1')
geoplaces.head()

Unnamed: 0,placeID,latitude,longitude,the_geom_meter,name,address,city,state,country,fax,...,alcohol,smoking_area,dress_code,accessibility,price,url,Rambience,franchise,area,other_services
0,134999,18.915421,-99.184871,0101000020957F000088568DE356715AC138C0A525FC46...,Kiku Cuernavaca,Revolucion,Cuernavaca,Morelos,Mexico,?,...,No_Alcohol_Served,none,informal,no_accessibility,medium,kikucuernavaca.com.mx,familiar,f,closed,none
1,132825,22.147392,-100.983092,0101000020957F00001AD016568C4858C1243261274BA5...,puesto de tacos,esquina santos degollado y leon guzman,s.l.p.,s.l.p.,mexico,?,...,No_Alcohol_Served,none,informal,completely,low,?,familiar,f,open,none
2,135106,22.149709,-100.976093,0101000020957F0000649D6F21634858C119AE9BF528A3...,El Rincón de San Francisco,Universidad 169,San Luis Potosi,San Luis Potosi,Mexico,?,...,Wine-Beer,only at bar,informal,partially,medium,?,familiar,f,open,none
3,132667,23.752697,-99.163359,0101000020957F00005D67BCDDED8157C1222A2DC8D84D...,little pizza Emilio Portes Gil,calle emilio portes gil,victoria,tamaulipas,?,?,...,No_Alcohol_Served,none,informal,completely,low,?,familiar,t,closed,none
4,132613,23.752903,-99.165076,0101000020957F00008EBA2D06DC8157C194E03B7B504E...,carnitas_mata,lic. Emilio portes gil,victoria,Tamaulipas,Mexico,?,...,No_Alcohol_Served,permitted,informal,completely,medium,?,familiar,t,closed,none


<font size="4" color="blue"><Strong>Data Cleaning</Strong></font> 

<font size="2" color="green"><Strong>Feature Selection</Strong></font>

In [5]:
rating_features = ['userID','placeID','rating']
cuisine_features = ['placeID','Rcuisine']
geoplaces_features = ['placeID','name']

In [6]:
rating = rating[rating_features]
cuisine = cuisine[cuisine_features]
geoplaces = geoplaces[geoplaces_features]

<font size="2" color="green"><Strong>Merging Features</Strong></font>

In [7]:
rating_clean_data = rating.merge(cuisine,on='placeID').merge(geoplaces,on='placeID')
rating_clean_data.head()

Unnamed: 0,userID,placeID,rating,Rcuisine,name
0,U1077,135085,2,Fast_Food,Tortas Locas Hipocampo
1,U1108,135085,1,Fast_Food,Tortas Locas Hipocampo
2,U1081,135085,1,Fast_Food,Tortas Locas Hipocampo
3,U1056,135085,2,Fast_Food,Tortas Locas Hipocampo
4,U1134,135085,2,Fast_Food,Tortas Locas Hipocampo
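
One thing worth checking after a chain of merges is whether the row count still equals the number of ratings. `merge` performs an inner join by default, so ratings whose `placeID` is missing from a lookup table are dropped, and a place listed more than once in a lookup table (for example under two cuisines) duplicates each of its ratings. The sketch below uses hypothetical toy frames to show the duplication and one way to count the distinct ratings; it does not claim that the real files contain such duplicates.

```python
import pandas as pd

toy_rating = pd.DataFrame({'userID': ['U1', 'U2'], 'placeID': [1, 1], 'rating': [2, 1]})
# Hypothetical case: the same place is listed under two cuisines
toy_cuisine = pd.DataFrame({'placeID': [1, 1], 'Rcuisine': ['Bar', 'Mexican']})

# Inner join (the default): every rating is repeated once per matching cuisine row
toy_merged = toy_rating.merge(toy_cuisine, on='placeID')
print(len(toy_rating), len(toy_merged))

# Counting distinct (userID, placeID) pairs recovers the true number of ratings
n_unique_ratings = toy_merged[['userID', 'placeID']].drop_duplicates().shape[0]
print(n_unique_ratings)
```

If the two numbers differ on the real data, per-place counts computed from the merged frame would be inflated, while the pivot-table means would be unaffected.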


<font size="4" color="blue"><Strong>Data Exploration</Strong></font> 

<font size="2" color="green"><Strong>General Exploration</Strong></font>

In [8]:
# explore the number of observations and features
rating_clean_data.shape

(1043, 5)

In [9]:
# explore the null values and data types of the features
rating_clean_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1043 entries, 0 to 1042
Data columns (total 5 columns):
userID      1043 non-null object
placeID     1043 non-null int64
rating      1043 non-null int64
Rcuisine    1043 non-null object
name        1043 non-null object
dtypes: int64(2), object(3)
memory usage: 48.9+ KB


In [10]:
# explore the descriptive statistics for the numerical features
rating_clean_data.describe()

Unnamed: 0,placeID,rating
count,1043.0,1043.0
mean,134158.123682,1.215724
std,1105.26291,0.770411
min,132560.0,0.0
25%,132856.0,1.0
50%,135028.0,1.0
75%,135053.0,2.0
max,135109.0,2.0


<font size="2" color="green"><Strong>Explore the average of ratings per place</Strong></font>

In [11]:
average_rating = pd.DataFrame(rating_clean_data.groupby('placeID')['rating'].mean())
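# Use a list for the column labels; a set is unordered and is not a valid way to assign column names
average_rating.columns = ['average_rating']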
average_rating.head()

Unnamed: 0_level_0,average_rating
placeID,Unnamed: 1_level_1
132560,0.5
132572,1.0
132583,1.0
132584,1.333333
132594,0.6


<font size="2" color="green"><Strong>Explore total number of ratings per place</Strong></font>

In [12]:
count_rating = pd.DataFrame(rating_clean_data.groupby('placeID')['rating'].count())
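# Use a list for the column labels; a set is unordered and is not a valid way to assign column names
count_rating.columns = ['count_rating']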
count_rating.head()

Unnamed: 0_level_0,count_rating
placeID,Unnamed: 1_level_1
132560,4
132572,15
132583,4
132584,6
132594,5


<font size="2" color="green"><Strong>Add the average and count to the rating_clean_data data frame</Strong></font>

In [13]:
rating_clean_data = rating_clean_data.merge(average_rating,on='placeID').merge(count_rating,on='placeID')
rating_clean_data.head()

Unnamed: 0,userID,placeID,rating,Rcuisine,name,average_rating,count_rating
0,U1077,135085,2,Fast_Food,Tortas Locas Hipocampo,1.333333,36
1,U1108,135085,1,Fast_Food,Tortas Locas Hipocampo,1.333333,36
2,U1081,135085,1,Fast_Food,Tortas Locas Hipocampo,1.333333,36
3,U1056,135085,2,Fast_Food,Tortas Locas Hipocampo,1.333333,36
4,U1134,135085,2,Fast_Food,Tortas Locas Hipocampo,1.333333,36
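
As an alternative to building the average and the count in two separate steps, both can be computed in a single `groupby` pass with named aggregation. This is a sketch on hypothetical toy data; the names `toy`, `place_stats` and `toy_with_stats` are illustrative only.

```python
import pandas as pd

# Hypothetical toy ratings for two places
toy = pd.DataFrame({'placeID': [10, 10, 20, 20, 20], 'rating': [2, 1, 0, 1, 2]})

# One pass: mean and count per place, with explicit output column names
place_stats = toy.groupby('placeID')['rating'].agg(
    average_rating='mean',
    count_rating='count',
)

# Attach the per-place statistics to every rating row
toy_with_stats = toy.merge(place_stats, left_on='placeID', right_index=True)
print(toy_with_stats)
```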


<font size="4" color="blue"><Strong>Data Preparation</Strong></font> 

<font size="2" color="green"><Strong>Build a utility matrix (users, items) using a pivot table</Strong></font>

In [14]:
user_place_utility_matrix = pd.pivot_table(rating_clean_data,index='userID', columns='placeID',values='rating')
user_place_utility_matrix.head()

placeID,132560,132572,132583,132584,132594,132608,132609,132613,132626,132630,...,135073,135074,135075,135079,135085,135086,135088,135104,135106,135109
userID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
U1001,,,,,,,,,,,...,,,,,0.0,,,,,
U1002,,,,,,,,,,,...,,,,,1.0,,,,1.0,
U1003,,,,,,,,,,,...,,,2.0,2.0,,,,,,
U1004,,,,,,,,,,,...,,,,,,,,,2.0,
U1005,,,,,,,,,,,...,,,,,,,,,,
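
The utility matrix is mostly empty: a `NaN` means "this user never rated this place", which is different from a rating of 0. A quick way to quantify this is to measure the share of empty cells. The sketch below does so on hypothetical toy data; the same two lines could be applied to `user_place_utility_matrix`.

```python
import pandas as pd

# Hypothetical toy ratings: three users, three places, four ratings in total
toy = pd.DataFrame({
    'userID':  ['U1', 'U1', 'U2', 'U3'],
    'placeID': [1, 2, 2, 3],
    'rating':  [2, 1, 0, 2],
})
toy_matrix = toy.pivot_table(index='userID', columns='placeID', values='rating')

# Number of observed ratings and the share of empty cells
filled = toy_matrix.notna().sum().sum()
sparsity = 1 - filled / toy_matrix.size
print(filled, round(sparsity, 3))
```

Note that the rating of 0 given by U2 is counted as an observed rating, not as a missing one.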


<font size="5" color="green">Item-based Collaborative filtering</font><font size="4" color="black">: Recommend products based on the similarity between items. Two items are considered similar when the same users give them similar ratings.</font>

<font size="3" color="blue">Example</font>: If a user likes item A, and item A is similar to item B (the same users give items A and B similar ratings), we recommend item B to that user.

<font size="4" color="blue"><Strong>Build a correlation matrix to get the similarity between items (using Pearson's r)</Strong></font> 
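
For reference, Pearson's r between two places $a$ and $b$ can be written as follows. The notation is introduced here for illustration: $r_{u,a}$ is the rating of user $u$ for place $a$, and $U_{ab}$ is the set of users who rated both places. This assumes, as pandas does by default, that only users with a rating for both places enter the computation and that the means $\bar{r}_a$ and $\bar{r}_b$ are taken over that same set.

$$
r_{ab} = \frac{\sum_{u \in U_{ab}} (r_{u,a} - \bar{r}_a)(r_{u,b} - \bar{r}_b)}{\sqrt{\sum_{u \in U_{ab}} (r_{u,a} - \bar{r}_a)^2}\;\sqrt{\sum_{u \in U_{ab}} (r_{u,b} - \bar{r}_b)^2}}
$$

The value lies between -1 and 1, and it is undefined when either place has identical ratings from all users in $U_{ab}$.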

<font size="2" color="green"><Strong>Select a place the current user visited and liked</Strong></font> 

In [15]:
# let's assume the current user visited and liked place 135085 (Tortas Locas Hipocampo); we need to recommend 3 other places to them
current_place_id = 135085

<font size="2" color="green"><Strong>Get the ratings for the place the current user visited</Strong></font> 

In [16]:
current_place_ratings = user_place_utility_matrix[current_place_id]
# Keep only the users who rated this place (NaN >= 0 is False, so missing ratings are dropped)
current_place_ratings = current_place_ratings[current_place_ratings>=0]
current_place_ratings.count()

36

<font size="2" color="green"><Strong>Build a correlation matrix between all places and the current one</Strong></font> 

In [18]:
similar_to_current_place = user_place_utility_matrix.corrwith(current_place_ratings)
similar_to_current_place = pd.DataFrame(similar_to_current_place, columns=['PearsonR']).dropna()
similar_to_current_place.head()

Unnamed: 0_level_0,PearsonR
placeID,Unnamed: 1_level_1
132572,-0.428571
132723,0.301511
132754,0.930261
132825,0.700745
132834,0.814823


In [19]:
# To make the recommendations more reliable, filter out the places that have fewer than 10 ratings
similar_to_current_place_with_rating = similar_to_current_place.merge(count_rating,on='placeID')
current_item_similarity_after_filtering = similar_to_current_place_with_rating[similar_to_current_place_with_rating['count_rating']>=10]
current_item_similarity_after_filtering.head()

Unnamed: 0_level_0,PearsonR,count_rating
placeID,Unnamed: 1_level_1,Unnamed: 2_level_1
132572,-0.428571,15
132723,0.301511,12
132754,0.930261,13
132825,0.700745,32
132834,0.814823,25
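
Filtering on the total number of ratings does not guarantee that a correlation rests on many users: what matters for Pearson's r is how many users rated both places. With only two users in common the coefficient is always exactly +1 or -1 (or undefined). The sketch below, on a hypothetical toy matrix, computes that overlap and filters on it instead; the threshold of 3 is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy utility matrix: rows are users, columns are places
toy_matrix = pd.DataFrame(
    {'A': [2, 1, 0, 2, np.nan],
     'B': [2, 1, 0, 1, np.nan],
     'C': [np.nan, np.nan, 0, 2, 1]},
    index=['U1', 'U2', 'U3', 'U4', 'U5'])

# Ratings for the target place and its correlation with every place
target = toy_matrix['A'].dropna()
pearson = toy_matrix.corrwith(target)

# Number of users who rated both the target place and each other place
overlap = toy_matrix.loc[target.index].notna().sum()

# Keep only correlations supported by at least 3 common users (assumed threshold)
reliable = pearson[overlap >= 3]
print(pd.DataFrame({'PearsonR': pearson, 'overlap': overlap}))
```

Place C gets a perfect correlation with A, but only two users rated both, so it is removed by the overlap filter.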


<font size="2" color="green"><Strong>Sort the places by similarity to the current one, in descending order</Strong></font> 

In [20]:
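# Caution: np.abs discards the sign, so strongly negative correlations rank as high as strongly positive ones
sorted_similar_to_current_place_after_filtering = np.abs(current_item_similarity_after_filtering).sort_values(by= 'PearsonR',ascending=False)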

In [21]:
# Reset the data frame index so that placeID becomes a regular column
sorted_similar_to_current_place_after_filtering = sorted_similar_to_current_place_after_filtering.reset_index()
sorted_similar_to_current_place_after_filtering.head()

Unnamed: 0,placeID,PearsonR,count_rating
0,135085,1.0,36.0
1,135053,1.0,24.0
2,132754,0.930261,13.0
3,135028,0.892218,15.0
4,135042,0.881409,20.0


In [22]:
# Drop the current place from the recommended list (use the variable instead of a hard-coded ID)
sorted_similar_to_current_place_after_filtering =\
sorted_similar_to_current_place_after_filtering[sorted_similar_to_current_place_after_filtering['placeID']!= current_place_id]

<font size="2" color="green"><Strong>Recommend the top 3 places for the current user</Strong></font> 

In [23]:
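# drop(columns=...) replaces the positional axis argument, which newer pandas versions no longer accept
recommended_places = sorted_similar_to_current_place_after_filtering.head(3).drop(columns='PearsonR')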
recommended_places = recommended_places.merge(geoplaces,on= "placeID").merge(average_rating,on='placeID')
# Just show features in proper order
recommended_places[['placeID','name','count_rating','average_rating']]

Unnamed: 0,placeID,name,count_rating,average_rating
0,135053,La Fontana Pizza Restaurante and Cafe,24.0,1.125
1,132754,Cabana Huasteca,13.0,1.461538
2,135028,La Virreina,15.0,1.533333
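
The item-based steps above can be collected into one helper. This is only a sketch under stated assumptions: the function name `recommend_similar_places` and its parameters are hypothetical, it counts ratings directly from the utility matrix instead of merging `count_rating`, and it sorts by the signed correlation rather than by its absolute value, so its output may differ from the cells above.

```python
import pandas as pd

def recommend_similar_places(utility_matrix, place_id, min_ratings=10, top_n=3):
    """Return the top_n places most positively correlated with place_id.

    utility_matrix: rows are users, columns are places, NaN means "not rated".
    """
    target = utility_matrix[place_id].dropna()
    pearson = utility_matrix.corrwith(target).dropna()
    counts = utility_matrix.notna().sum()          # number of ratings per place
    result = pd.DataFrame({'PearsonR': pearson, 'count_rating': counts[pearson.index]})
    # Keep well-rated places and remove the place we started from
    result = result[result['count_rating'] >= min_ratings]
    result = result.drop(index=place_id, errors='ignore')
    return result.sort_values('PearsonR', ascending=False).head(top_n)

# Hypothetical toy matrix: B resembles A, C is rated the opposite way
toy_matrix = pd.DataFrame(
    {'A': [2, 1, 0, 2], 'B': [2, 1, 0, 1], 'C': [0, 1, 2, 0]},
    index=['U1', 'U2', 'U3', 'U4'])
toy_recommendations = recommend_similar_places(toy_matrix, 'A', min_ratings=3, top_n=2)
print(toy_recommendations)
```

With the signed sort, place C (perfectly anti-correlated with A) ends up last instead of first.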


<font size="5" color="green">User-based Collaborative filtering</font><font size="3" color="black">: Recommend items based on the similarity between users. Two users who rate the same products highly are considered similar.</font>

<font size="3" color="blue">Example</font>: If user A likes item X, and user A is similar to user B (both users rate the same products highly), we recommend item X to user B.

<font size="4" color="blue"><Strong>Build a correlation matrix to get the similarity between users (using Pearson's r)</Strong></font>  

<font size="2" color="green"><Strong>Select the current user you would like to recommend places for</Strong></font> 

In [24]:
# let's assume the current user is U1005; we need to recommend 3 places to them
current_user_id = 'U1005'

In [25]:
# Transpose the utility matrix so that users move from rows (observations) to columns (features)
user_place_utility_matrix = user_place_utility_matrix.transpose()

<font size="2" color="green"><Strong>Get the ratings for the current user</Strong></font> 

In [26]:
current_user_ratings = user_place_utility_matrix[current_user_id]
current_user_ratings = current_user_ratings[current_user_ratings>=0]
print('Rating counts from the current user::',current_user_ratings.count())

Rating counts from the current user:: 5


<font size="2" color="green"><Strong>Build a correlation matrix between all users and the current one</Strong></font> 

In [27]:
similar_to_current_user = user_place_utility_matrix.corrwith(current_user_ratings)
similar_to_current_user = pd.DataFrame(similar_to_current_user, columns=['PearsonR']).dropna()
print('Similar users to the current user::',similar_to_current_user.count()['PearsonR'])
similar_to_current_user.head()

Similar users to the current user:: 15


Unnamed: 0_level_0,PearsonR
userID,Unnamed: 1_level_1
U1005,1.0
U1018,0.866025
U1022,-1.0
U1024,0.5
U1045,-1.0


<font size="2" color="green"><Strong>Sort the users by similarity to the current one, in descending order</Strong></font> 

In [28]:
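# Caution: np.abs discards the sign, so users with r = -1 (opposite tastes) rank as high as users with r = 1
sorted_similar_to_current_user = np.abs(similar_to_current_user).sort_values(by= 'PearsonR',ascending=False)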
sorted_similar_to_current_user.head()

Unnamed: 0_level_0,PearsonR
userID,Unnamed: 1_level_1
U1005,1.0
U1022,1.0
U1045,1.0
U1054,1.0
U1085,1.0


In [29]:
# Reset the data frame index so that userID becomes a regular column
sorted_similar_to_current_user = sorted_similar_to_current_user.reset_index()
sorted_similar_to_current_user.head()

Unnamed: 0,userID,PearsonR
0,U1005,1.0
1,U1022,1.0
2,U1045,1.0
3,U1054,1.0
4,U1085,1.0


In [30]:
# Drop the current user from the list of similar users
sorted_similar_to_current_user =\
sorted_similar_to_current_user[sorted_similar_to_current_user['userID']!= current_user_id]
sorted_similar_to_current_user

Unnamed: 0,userID,PearsonR
1,U1022,1.0
2,U1045,1.0
3,U1054,1.0
4,U1085,1.0
5,U1099,1.0
6,U1101,1.0
7,U1104,1.0
8,U1113,1.0
9,U1120,1.0
10,U1018,0.866025


<ul><li><font size="2" color="green"><Strong>We will recommend the places that the most similar users visited.</Strong></font></li>
<li><font size="2" color="green"><Strong>To make the recommendations more reliable, we can filter out the users who have fewer than 3 ratings.</Strong></font></li></ul>
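
The two points above are not implemented in this notebook, so here is one possible sketch of that last step. Everything in it is an assumption made for illustration: the function name `recommend_places_for_user`, its parameters, the choice to keep only positively correlated users, and the scoring rule (the mean rating given by the most similar users to places the current user has not rated). The toy matrix is hypothetical.

```python
import numpy as np
import pandas as pd

def recommend_places_for_user(user_item, user_id, min_ratings=3, top_k_users=2, top_n=3):
    """user_item: rows are users, columns are places, NaN means "not rated"."""
    # Keep users with at least min_ratings ratings (the current user is always kept)
    n_ratings = user_item.notna().sum(axis=1)
    active = user_item[(n_ratings >= min_ratings) | (n_ratings.index == user_id)]
    by_user = active.transpose()                     # users become columns
    target = by_user[user_id].dropna()
    pearson = by_user.corrwith(target).dropna().drop(user_id, errors='ignore')
    # Most similar users, keeping the sign so that opposite tastes are excluded
    neighbours = pearson[pearson > 0].sort_values(ascending=False).head(top_k_users)
    # Places the current user has not rated yet
    unseen = by_user.index[by_user[user_id].isna()]
    # Score each unseen place by the neighbours' mean rating
    scores = by_user.loc[unseen, neighbours.index].mean(axis=1).dropna()
    return scores.sort_values(ascending=False).head(top_n)

# Hypothetical toy matrix: U2 agrees with U1, U3 disagrees, U4 has too few ratings
toy = pd.DataFrame(
    {'P1': [2, 2, 0, 1],
     'P2': [1, 1, 2, np.nan],
     'P3': [0, 0, 2, np.nan],
     'P4': [np.nan, 2, 0, np.nan],
     'P5': [np.nan, 1, 2, 2]},
    index=['U1', 'U2', 'U3', 'U4'])
toy_user_recommendations = recommend_places_for_user(toy, 'U1')
print(toy_user_recommendations)
```

In this toy case only U2 qualifies as a neighbour of U1, so the unseen places P4 and P5 are ranked by U2's ratings.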

 <font size="5" color="blue"><Strong>Follow more hands-on data science use cases:</Strong></font> https://www.linkedin.com/in/ahmedhamdyse/