<h3 style="display:inline">User-based Collaborative Filtering:</h3><h4 style="display:inline; margin-left:-40px;">Food Recommender System Case Study</h4>


In this case study, you are asked to develop a food recommender system using user-based collaborative filtering. You are given records of different food recipes and the ratings users have given to these recipes. Your tasks are:

<ol>
    <li>Build a food recommender engine that suggests the top recipes to a given user using <b style="color:blue">user-based collaborative filtering</b></li>
    <li>Estimate the rating a user would give to a recipe they have never tasted using <b style="color:blue">user-based collaborative filtering</b></li>
</ol>

<b style="color:blue">Step 1. Load the datasets</b>

In [103]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

food_df=pd.read_csv('../datasets/food_recommender_datasets/1662574418893344.csv')
food_df.head()

Unnamed: 0,Food_ID,Name,C_Type,Veg_Non,Describe
0,1,summer squash salad,Healthy Food,veg,"white balsamic vinegar, lemon juice, lemon rin..."
1,2,chicken minced salad,Healthy Food,non-veg,"olive oil, chicken mince, garlic (minced), oni..."
2,3,sweet chilli almonds,Snack,veg,"almonds whole, egg white, curry leaves, salt, ..."
3,4,tricolour salad,Healthy Food,veg,"vinegar, honey/sugar, soy sauce, salt, garlic ..."
4,5,christmas cake,Dessert,veg,"christmas dry fruits (pre-soaked), orange zest..."


<h3>Preprocessing and Feature Extraction</h3><br/>
<b style="color:blue">Step 2. Verify whether there are missing values and Impute data/Remove rows if necessary</b>

In [104]:
food_df.isna().any(axis=1).sum()

0
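The check above finds no missing rows, so nothing has to be imputed or removed here. For completeness, the following is a minimal sketch of what Step 2 could look like if gaps did exist. The frame `toy_df` and its values are invented for illustration and are not part of the case-study data; which columns count as essential is also an assumption.

```python
import pandas as pd

# Toy frame standing in for food_df; the values are invented for illustration.
toy_df = pd.DataFrame({
    "Food_ID": [1, 2, 3],
    "Name": ["summer squash salad", None, "sweet chilli almonds"],
    "C_Type": ["Healthy Food", "Healthy Food", None],
})

# Count missing values per column to see where the gaps are.
missing_per_column = toy_df.isna().sum()

# Option 1: remove rows that lack an essential field (assumed here: the recipe name).
dropped_df = toy_df.dropna(subset=["Name"])

# Option 2: impute a categorical column with an explicit placeholder label.
filled_df = toy_df.fillna({"C_Type": "Unknown"})

print(missing_per_column)
print(len(dropped_df), filled_df["C_Type"].tolist())
```

Dropping is usually the safer choice for identifying fields such as the name, while a placeholder keeps the row available for the rating matrix.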

<b style="color:blue">Step 2a. Create a User-Item Matrix -> Load the rating dataframe</b>

In [114]:
rating_df = pd.read_csv('../datasets/food_recommender_datasets/ratings.csv')
rating_df.head()

Unnamed: 0,User_ID,Food_ID,Rating
0,1.0,88.0,4.0
1,1.0,46.0,3.0
2,1.0,24.0,5.0
3,1.0,25.0,4.0
4,2.0,49.0,1.0


<b style="color:blue">Step 2b. Create a User-Item Matrix -> Create a dataframe of average ratings and number of ratings per food (i.e. use rating_df and groupby)</b>

In [115]:
avg_ratings_df = rating_df.groupby('Food_ID').agg(avg_rating=('Rating','mean'),num_ratings=('Rating','count')).reset_index()
# Sort by popularity and renumber the rows from 0 without keeping the old index
avg_ratings_df = avg_ratings_df.sort_values(by=['num_ratings'],ascending=False).reset_index(drop=True)
avg_ratings_df.head()

Unnamed: 0,Food_ID,avg_rating,num_ratings
0,163.0,3.571429,7
1,23.0,3.333333,6
2,5.0,6.5,6
3,49.0,5.5,6
4,65.0,4.8,5


<b style="color:blue">Step 2c. Create a User-Item Matrix -> Select the most popular foods</b>

In [116]:
import numpy as np

# Fixed cut-off: keep foods with at least this many ratings.
# Data-driven alternative: np.floor(avg_ratings_df[['num_ratings']].mean()+0.5)[0]
threshold_r = 5
print(f'minimum number of ratings {threshold_r}')

minimum number of ratings 5
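The threshold above is hard-coded. As a hedged sketch of how it could instead be derived from the rating counts, the snippet below computes both a rounded mean (the commented-out idea in the cell) and a median. The frame `toy_avg_ratings_df` and its counts are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for avg_ratings_df; the counts are invented for illustration.
toy_avg_ratings_df = pd.DataFrame({
    "Food_ID": [163, 23, 5, 49, 65, 7, 8],
    "num_ratings": [7, 6, 6, 6, 5, 2, 1],
})

# Rounded mean of the rating counts (round half up via floor(x + 0.5)).
mean_threshold = int(np.floor(toy_avg_ratings_df["num_ratings"].mean() + 0.5))

# Median of the rating counts, which is less sensitive to a few very popular foods.
median_threshold = int(toy_avg_ratings_df["num_ratings"].median())

print(mean_threshold, median_threshold)  # 5 6
```

Whichever statistic is used, the printed label should name it, so that the cut-off can be reproduced later.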


In [117]:
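# Note: despite the 'movies' in its name, this frame holds the most-rated foods.
# .copy() makes the column assignment below act on an independent frame, not a slice.
top_rated_movies_df = avg_ratings_df[avg_ratings_df['num_ratings']>=threshold_r].copy()
top_rated_movies_df['Food_ID'] = top_rated_movies_df['Food_ID'].astype('int')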
top_rated_movies_df.head()

Unnamed: 0,Food_ID,avg_rating,num_ratings
0,163,3.571429,7
1,23,3.333333,6
2,5,6.5,6
3,49,5.5,6
4,65,4.8,5


In [118]:
print('Number of popular foods:', len(top_rated_movies_df))

Number of popular foods: 12


<b style="color:blue">Step 2d. Create a User-Item Matrix -> Create a dataframe of ratings of most popular recipes</b>

In [119]:
#len(rating_df)

In [124]:
merged_rating_df = pd.merge(rating_df,top_rated_movies_df,on='Food_ID',how='inner')
merged_rating_df['Food_ID'] = merged_rating_df['Food_ID'].astype('int')
merged_rating_df.head()

Unnamed: 0,User_ID,Food_ID,Rating,avg_rating,num_ratings
0,1.0,46,3.0,5.4,5
1,3.0,46,2.0,5.4,5
2,20.0,46,6.0,5.4,5
3,69.0,46,9.0,5.4,5
4,97.0,46,7.0,5.4,5


<b style="color:blue">Step 2e. Create a User-Item Matrix using a pivot table (index: User_ID, columns: Food_ID, values: Rating)</b>

In [121]:
#len(merged_rating_df)

In [148]:
pivot_df = merged_rating_df.pivot_table(index='User_ID',columns='Food_ID',values='Rating',aggfunc='mean',fill_value=0).reset_index()
pivot_df.head()

Food_ID,User_ID,5,7,18,21,22,23,46,47,49,53,65,163
0,1.0,0,0,0,0,0,0,3,0,0.0,0,0,0
1,2.0,0,0,0,0,0,0,0,0,1.0,0,0,0
2,3.0,0,0,0,0,0,0,2,0,0.0,0,3,0
3,4.0,0,0,0,1,0,0,0,0,0.0,0,0,0
4,6.0,0,0,0,0,5,0,0,0,0.0,0,0,0


<b style="color:blue">Step 3. Build User-based Similarity matrix</b>

In [146]:
from sklearn.metrics.pairwise import cosine_similarity 

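# Caution: pivot_df still carries the User_ID column (from reset_index above), so the IDs
# enter the similarity below; pass pivot_df.drop(columns='User_ID') to compare ratings only.
user_cosine_sim_matrix = cosine_similarity(pivot_df,pivot_df)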
user_cosine_sim_matrix.shape

(51, 51)

In [147]:
user_cosine_sim_matrix

array([[1.        , 0.28284271, 0.60677988, ..., 0.38369165, 0.3153588 ,
        0.31528023],
       [0.28284271, 1.        , 0.57207755, ..., 0.89210726, 0.8919694 ,
        0.90525847],
       [0.60677988, 0.57207755, 1.        , ..., 0.6686346 , 0.63784459,
        0.68277455],
       ...,
       [0.38369165, 0.89210726, 0.6686346 , ..., 1.        , 0.99466547,
        0.99441763],
       [0.3153588 , 0.8919694 , 0.63784459, ..., 0.99466547, 1.        ,
        0.99426396],
       [0.31528023, 0.90525847, 0.68277455, ..., 0.99441763, 0.99426396,
        1.        ]])
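Because `pivot_df` was built with `reset_index()`, it keeps `User_ID` as an ordinary column, and that column takes part in the similarity computed above. The sketch below illustrates the effect on invented toy data. It computes cosine similarity with NumPy directly (a stand-in for sklearn's `cosine_similarity`, chosen only to keep the example dependency-light); `cosine_sim_matrix` and `toy_pivot_df` are hypothetical names.

```python
import numpy as np
import pandas as pd

def cosine_sim_matrix(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit_rows = matrix / norms
    return unit_rows @ unit_rows.T

# Toy user-item matrix; 0 means "not rated". Values are invented for illustration.
toy_pivot_df = pd.DataFrame({
    "User_ID": [1.0, 2.0, 90.0, 91.0],
    5: [4, 0, 5, 0],
    23: [0, 3, 0, 2],
})

# Similarity with the ID column included versus ratings only.
sim_with_id = cosine_sim_matrix(toy_pivot_df.to_numpy())
sim_ratings_only = cosine_sim_matrix(toy_pivot_df.drop(columns="User_ID").to_numpy())

# Users 90 and 91 have no rated recipe in common.
print(round(sim_with_id[2, 3], 3))       # close to 1: driven by the large IDs
print(round(sim_ratings_only[2, 3], 3))  # 0.0: no overlap in ratings
```

In this toy case, two users with no recipe in common look almost identical once their IDs are included, which suggests dropping the ID column (or keeping it as the index) before measuring similarity.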

<b style="color:blue">Step 4. Select the top N recipes for user p</b>
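The notebook leaves this step without code. One possible approach, sketched here under stated assumptions rather than as the intended solution: take the k users most similar to user p, score each recipe by the similarity-weighted sum of their ratings, and return the N best-scoring recipes that p has not rated. The helper names (`cosine_sim_matrix`, `recommend_top_n`), the parameters `n` and `k`, and the toy ratings are all hypothetical; a rating of 0 is assumed to mean "not rated", as in the pivot table.

```python
import numpy as np
import pandas as pd

def cosine_sim_matrix(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit_rows = matrix / norms
    return unit_rows @ unit_rows.T

def recommend_top_n(ratings_df, user_id, n=2, k=2):
    """Top-n unrated recipes for user_id, scored by its k nearest users.

    ratings_df: rows indexed by user id, columns are food ids, 0 = not rated.
    """
    sim = pd.DataFrame(cosine_sim_matrix(ratings_df.to_numpy()),
                       index=ratings_df.index, columns=ratings_df.index)
    # k most similar users, excluding the target user itself.
    neighbours = sim.loc[user_id].drop(user_id).nlargest(k)
    # Weight every neighbour's ratings by that neighbour's similarity and sum per recipe.
    scores = ratings_df.loc[neighbours.index].mul(neighbours, axis=0).sum(axis=0)
    # Keep only recipes the target user has not rated yet.
    unrated = ratings_df.loc[user_id] == 0
    return scores[unrated].nlargest(n)

# Toy user-item matrix (index: user ids, columns: food ids); values are invented.
toy_ratings_df = pd.DataFrame(
    [[5, 4, 0, 0],
     [5, 5, 4, 1],
     [1, 0, 5, 5]],
    index=[1, 2, 3], columns=[5, 23, 46, 49],
)

top_recipes = recommend_top_n(toy_ratings_df, user_id=1, n=2, k=2)
print(top_recipes)
```

Here user 2 is far more similar to user 1 than user 3 is, so recipe 46 (rated 4 by user 2) ranks ahead of recipe 49 (rated 1 by user 2).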

<b style="color:blue">Step 5. Rate a recipe for user p</b>
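This step also has no code in the notebook. A common way to estimate a missing rating in user-based collaborative filtering is a similarity-weighted average over the nearest users who did rate the recipe. The sketch below assumes that formulation; `predict_rating`, `cosine_sim_matrix`, the parameter `k`, and the toy ratings are hypothetical, and 0 is again assumed to mean "not rated".

```python
import numpy as np
import pandas as pd

def cosine_sim_matrix(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit_rows = matrix / norms
    return unit_rows @ unit_rows.T

def predict_rating(ratings_df, user_id, food_id, k=2):
    """Similarity-weighted average rating of food_id among user_id's k nearest raters."""
    sim = pd.DataFrame(cosine_sim_matrix(ratings_df.to_numpy()),
                       index=ratings_df.index, columns=ratings_df.index)
    # Only users who actually rated the recipe can contribute.
    has_rated = ratings_df[food_id] > 0
    candidates = sim.loc[user_id][has_rated].drop(user_id, errors="ignore")
    neighbours = candidates.nlargest(k)
    if neighbours.sum() == 0:
        return float("nan")  # no usable neighbour, so no estimate
    neighbour_ratings = ratings_df.loc[neighbours.index, food_id]
    return float((neighbours * neighbour_ratings).sum() / neighbours.sum())

# Toy user-item matrix (index: user ids, columns: food ids); values are invented.
toy_ratings_df = pd.DataFrame(
    [[5, 4, 0, 0],
     [5, 5, 4, 1],
     [1, 0, 5, 5]],
    index=[1, 2, 3], columns=[5, 23, 46, 49],
)

estimate = predict_rating(toy_ratings_df, user_id=1, food_id=46, k=2)
print(round(estimate, 2))
```

The estimate lands between the two neighbours' ratings (4 and 5) and closer to 4, because user 2 is the more similar neighbour. Subtracting each user's mean rating before averaging is a frequent refinement, but it is not shown here.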