<img src="Instacart_logo_small.png" alt="Instacart" style="width: 100px;"/>

# Recommend products to Instacart Customers

## Introduction

In the dictionary sense, a recommendation is a suggestion or piece of advice that something is good.

With so many choices available, customers are often overwhelmed by options. Browsing through hundreds of products makes shopping a challenging and time-consuming experience.
So how about we log in to a shopping site and see 10 recommended products tailored to our taste? Just add them to the cart, check out, and done! This is what a recommendation engine does: it is a machine learning technique that lets us predict what a user may or may not like among a list of given items.

Here, let's use the dataset provided by Instacart to build a recommendation engine and evaluate how well its recommendations work.

## Prepare Data

#### Load Libraries

In [1]:
# Imports
from implicit.als import AlternatingLeastSquares
from datetime import datetime
from pathlib import Path
from sklearn.metrics.pairwise import cosine_similarity

import scipy.sparse as sparse
import implicit
import pandas as pd
import numpy as np
import pickle
import time
from joblib import dump, load
from sklearn import metrics
import random
import warnings
warnings.filterwarnings('ignore')


#### Load Data

In [2]:
# Order datasets
df_order_products_prior = pd.read_csv("instacart_2017_05_01/order_products_prior.csv")
df_order_products_train = pd.read_csv("instacart_2017_05_01/order_products_train.csv")
df_orders = pd.read_csv("instacart_2017_05_01/orders.csv") 
# Products
df_products = pd.read_csv("instacart_2017_05_01/products.csv")
# Departments
df_departments = pd.read_csv("instacart_2017_05_01/departments.csv")

#### Merge Prior Orders with Products and Department

In [3]:
# Merge prior orders and products
df_merged_order_products_prior = pd.merge(df_order_products_prior, df_products, on="product_id", how="left")
# Merge prior orders and departments
df_merged_order_products_prior = pd.merge(df_merged_order_products_prior, df_departments, on="department_id", how="left")

#### Merge Train Orders with Products and Department

In [4]:
# Merge train orders and products
df_merged_order_products_train = pd.merge(df_order_products_train, df_products, on="product_id", how="left")
# Merge train orders and departments
df_merged_order_products_train = pd.merge(df_merged_order_products_train, df_departments, on="department_id", how="left")

**In this project, we will use the prior data to build the models and the train data to test model performance.**

In [5]:
df_merged_order_products_prior.head(2)

,order_id,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,department
0,2,33120,1,1,Organic Egg Whites,86,16,dairy eggs
1,2,28985,2,1,Michigan Organic Kale,83,4,produce


In [6]:
df_merged_order_products_train.head(2)

,order_id,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,department
0,1,49302,1,1,Bulgarian Yogurt,120,16,dairy eggs
1,1,11109,2,1,Organic 4% Milk Fat Whole Milk Cottage Cheese,108,16,dairy eggs


## Part 1 - Basic Recommendations

### 1. Trending Products at Instacart

New to Instacart and not sure what to order? Let's check out what other customers are buying.

In [7]:
def get_k_popular(k, df_items):
    """
    Returns the `k` most popular products based on purchase count in the dataset
    """
    popular_products = list(df_items["product_name"].value_counts().head(k).index)
    return popular_products

In [8]:
get_k_popular(10,df_merged_order_products_prior)

['Banana',
 'Bag of Organic Bananas',
 'Organic Strawberries',
 'Organic Baby Spinach',
 'Organic Hass Avocado',
 'Organic Avocado',
 'Large Lemon',
 'Strawberries',
 'Limes',
 'Organic Whole Milk']

### 2. Recommendations from the Departments that the Customers are interested in

In [9]:
def get_k_popular_dept_items(k, dept_id, df_items):
    """
    Returns the `k` most popular products from the Dept that is passed as a parameter
    
    k       : No. Of Recommendations
    dept_id : Pass in the Department Id that you are looking details for
    df_items: Pass in the Dataframe with Details
    
    """
    dept_popular_products = list(df_items[df_items.department_id == dept_id]["product_name"].value_counts().head(k).index)
    return dept_popular_products

#### Let's see the 10 most popular products from department_id = 6 (International)

In [10]:
get_k_popular_dept_items(10,6,df_merged_order_products_prior)

['Organic Sea Salt Roasted Seaweed Snacks',
 'Taco Seasoning',
 'New Mexico Taco Skillet Sauce For Chicken',
 'Sriracha Chili Sauce',
 'Original Roasted Seaweed Snacks',
 'Coconut Milk',
 'Organic Spicy Taco Seasoning',
 'Sriracha Hot Chili Sauce',
 'Roasted Sesame Seaweed Snacks',
 'Sliced Water Chestnuts']

### 3. Recommended Items that other Customers Often Buy Again

In [11]:
def get_reordered_prods(k, df_items):
    """
    Returns the `k` most frequently reordered products
    
    k       : No. Of Recommendations
    df_items: Pass in the Dataframe with Details
    
    """
    # Keep only the rows flagged as reorders
    df = df_items[df_items.reordered == 1]
    # Use `k` rather than a hard-coded 10
    reordered_products = list(df["product_name"].value_counts().head(k).index)
    return reordered_products

In [12]:
get_reordered_prods(10, df_merged_order_products_prior)

['Banana',
 'Bag of Organic Bananas',
 'Organic Strawberries',
 'Organic Baby Spinach',
 'Organic Hass Avocado',
 'Organic Avocado',
 'Organic Whole Milk',
 'Large Lemon',
 'Organic Raspberries',
 'Strawberries']

### 4. Shop Unique Items

In [13]:
def get_k_unique(k, df_items):
    """
    Returns the `k` least-purchased products based on purchase count in the dataset
    """
    unique_products = list(df_items["product_name"].value_counts().tail(k).index)
    return unique_products

In [14]:
get_k_unique(10,df_merged_order_products_prior)

['Piquillo & Jalapeno Bruschetta',
 'Greek Blended Cherry Fat Free Yogurt',
 'Blueberry Blast Fruit and Chia Seed Bar',
 'Imported Stout Draught Style',
 'Raspberry Blast Fruit and Chia Seed Bar',
 'Sloppy Joe Sandwich Makers',
 'Lindor Peppermint White Chocolate Truffles',
 'Multigrain Penne Rigate',
 'Orange Recharge',
 'Dynostix Rawhide Chew With Meat']

## Part 2 - Build Recommendation Engine using Collaborative Filtering for Implicit Data

**Collaborative Filtering:** A recommendation technique that draws on a user’s past behavior (items previously purchased or reordered) as well as similar decisions made by other users.
<br>*Assumption:* If person A has the same opinion as person B on one product, A is more likely to share B's opinion on a different product than the opinion of a randomly chosen person. Hence, these predictions are specific to the user but use information gleaned from many users. This differs from the simpler approach of giving an average (non-specific) score to each item of interest, for example based on the number of times it was bought or reordered.
<br><br>**Implicit Data:** Data gathered from user behavior, with no ratings or other explicit feedback, is implicit data. With star ratings, for example, we know that a 1 means the user did not like the item and a 5 means they really loved it. Here we do not have a rating for any item, so we need to build a recommendation engine based on which items a customer purchased and how many times they were reordered.
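
As a rough illustration of implicit data, the sketch below turns a purchase log into per-user purchase counts. The log, the user names and the product names are all made up for the example; only the idea of counting interactions comes from the text above.

```python
from collections import defaultdict

# Hypothetical purchase log: one (user, product) pair per purchase event.
purchases = [
    ("alice", "banana"), ("alice", "banana"), ("alice", "milk"),
    ("bob", "banana"), ("bob", "granola"),
]

# Count how often each user bought each product. With no explicit ratings,
# the count serves as an implicit signal of preference strength.
interactions = defaultdict(lambda: defaultdict(int))
for user, product in purchases:
    interactions[user][product] += 1

for user, counts in interactions.items():
    print(user, dict(counts))
```

A missing entry here does not mean dislike; it only means no interaction was observed, which is the main difference from explicit ratings.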

### Model 1: Calculate Cosine Similarity

Mathematically, **cosine similarity** measures the similarity between two non-zero vectors as the cosine of the angle between them.
<br>In simple words, we can use cosine similarity to find how alike two things are. In this scenario, we can find similar **Users** or similar **Products**.
<br> Let's find similar users and use them to make some recommendations.
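
For two vectors a and b, the measure is cos θ = (a · b) / (‖a‖ ‖b‖). A minimal sketch with hypothetical purchase-count vectors (the three users and the product order are assumptions made for illustration):

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical purchase counts over the products [banana, milk, granola].
user_a = [2, 1, 0]
user_b = [4, 2, 0]   # same proportions as user_a, larger basket
user_c = [0, 0, 3]   # no products in common with user_a

print(cosine_sim(user_a, user_b))  # close to 1: the vectors point the same way
print(cosine_sim(user_a, user_c))  # 0.0: the vectors are orthogonal
```

Because only the direction of the vectors matters, a user who buys twice as much of the same products is still treated as very similar.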

#### Get the order line items that are reorders:

In [15]:
# keep only the order line items flagged as reorders
df_reorders = df_merged_order_products_prior[df_merged_order_products_prior['reordered'] == 1]

#### Flag the high-volume products that users reordered more than once:

In [16]:
df_reorders['high_volume'] = (df_reorders['product_id'].value_counts().sort_values(ascending=False)>1)
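
When a Series is assigned to a DataFrame column, pandas aligns the two on index labels. The counts returned by `value_counts()` are indexed by `product_id` rather than by row, so one common way to get a per-row flag is to map the counts back onto each row first. A minimal sketch on hypothetical toy data (the frame `toy` is an assumption, not part of the Instacart dataset):

```python
import pandas as pd

# Hypothetical toy data: one row per reordered line item.
toy = pd.DataFrame({"product_id": [10, 10, 20, 30, 30, 30]})

# Number of rows per product, indexed by product_id.
counts = toy["product_id"].value_counts()

# Map each row's product_id to its count, then compare. The result is
# indexed like `toy`, so the new column lines up row by row.
toy["high_volume"] = toy["product_id"].map(counts) > 1

print(toy)
```

Applied to the full dataset, a flag built this way would mark far more rows than an index-aligned assignment, so the downstream matrices would be much larger.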

#### Get orderwise user details

In [17]:
orders = df_orders[['order_id', 'user_id']]

#### Merge to get user_id and product_id

In [18]:
user_orders = df_reorders.merge(orders, on='order_id')

#### Get High Volume Orders

In [19]:
df_high_volume = user_orders[user_orders['high_volume'] == True]
df_high_volume.head()

,order_id,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id,department,high_volume,user_id
1,2,28985,2,1,Michigan Organic Kale,83,4,produce,True,202279
2,2,45918,4,1,Coconut Butter,19,13,pantry,True,202279
3,2,17794,6,1,Carrots,83,4,produce,True,202279
4,2,40141,7,1,Original Unflavored Gelatine Mix,105,13,pantry,True,202279
5,2,1819,8,1,All Natural No Stir Creamy Almond Butter,88,13,pantry,True,202279


#### Get a matrix of different high volume items that a particular user purchased:

In [20]:
df_high_volume_users = df_high_volume.groupby(['user_id', 'product_name']).size().sort_values(ascending=False).unstack().fillna(0)

Now, Let's take a look at the user and product matrix.

In [21]:
df_high_volume_users.head()

product_name,0% Fat Blueberry Greek Yogurt,0% Fat Free Organic Milk,0% Fat Organic Greek Vanilla Yogurt,0% Greek Strained Yogurt,0% Greek Yogurt Black Cherry on the Bottom,"0% Greek, Blueberry on the Bottom Yogurt",1 Apple + 1 Mango Fruit Bar,1 Apple + 1 Pear Fruit Bar,1 Liter,1 Ply Paper Towels,...,Zingers Cakes,Zucchini Banana & Amaranth Organic Baby Food,Zucchini Gingerbread Carrot Smart Cookies,Zucchini Noodles,in Gravy with Carrots Peas & Corn Mashed Potatoes & Meatloaf Nuggets,of Hanover 100 Calorie Pretzels Mini,smartwater® Electrolyte Enhanced Water,vitaminwater® XXX Acai Blueberry Pomegranate,with Crispy Almonds Cereal,with Olive Oil Mayonnaise
user_id
66,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
150,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
206,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
208,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Calculate Similarity between each User:

In [22]:
# calculate similarity between each user
cosine_dists = pd.DataFrame(cosine_similarity(df_high_volume_users),index=df_high_volume_users.index,
                            columns=df_high_volume_users.index)
cosine_dists.head()

user_id,66,90,150,206,208,222,382,451,503,508,...,205787,205794,205888,205908,205943,205970,205990,206105,206162,206206
user_id
66,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
90,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
150,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
206,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.0,0.0,0.353553,0.0,0.0,0.0,0.0,0.0,0.0
208,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


The cosine of a 0-degree angle is 1, so the closer the value is to 1, the more similar the two users are.
<br><br>Now, let's define a function for the recommendation system. We will recommend items based on profiles whose purchase history is similar to that of our target customer.

In [23]:
def Cos_Similarity_Recommender(df, user_id,count):
    
    '''
    Enter a user_id and return the top `count` recommended products.
    Relies on the global `cosine_dists` user-user similarity matrix.
    '''
    # Product x user purchase-count matrix
    p = df.groupby(['product_name','user_id']).size().sort_values(ascending=False).unstack().fillna(0)
    
    # Score each product by the similarity-weighted sum of purchases across users
    recommendations = pd.DataFrame(np.dot(p.values,cosine_dists[user_id]), index=p.index)
    recommendations = recommendations.reset_index()
    recommendations.columns = ['product_name','value']
    return recommendations.sort_values('value', ascending=False).head(count)
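
The scoring step in the function above is a matrix-vector product: each product's score is the sum of its purchase counts across users, weighted by how similar each user is to the target. A minimal sketch on hypothetical toy data (the products, purchase matrix and similarity values are assumptions chosen for illustration):

```python
import numpy as np

# Hypothetical toy data: rows are products, columns are users (u0, u1, u2).
products = ["banana", "milk", "granola"]
purchases = np.array([
    [1.0, 1.0, 0.0],   # banana bought by u0 and u1
    [0.0, 1.0, 0.0],   # milk bought by u1
    [0.0, 0.0, 1.0],   # granola bought by u2
])

# Assumed similarity of the target user u0 to every user (including u0).
sim_to_u0 = np.array([1.0, 0.5, 0.0])

# Each product's score is the similarity-weighted sum of its purchases.
scores = purchases @ sim_to_u0

# Rank products from highest to lowest score.
ranked = [products[i] for i in np.argsort(-scores)]
print(dict(zip(products, scores.tolist())))
print(ranked)
```

Since every user has similarity 1 with themselves, products the target user already bought receive a full-weight contribution and tend to rank high unless they are filtered out.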

Now it is time for some recommendations using the **Cosine Similarity** model prepared above.

#### Recommendation for customer# 90

In [24]:
# recommendation for customer id 90
user_id = 90
print("Recommended Products for User : ",user_id )
print("=======================================")
print(Cos_Similarity_Recommender(df_high_volume,user_id,10))

Recommended Products for User :  90
                                       product_name     value
1922         Flax Plus Organic Pumpkin Flax Granola  1.986900
6228          Sweet & Salty Nut Almond Granola Bars  1.519338
4998                     Peach-Pear Sparkling Water  1.411001
6022        Sparkling Water, Natural Mango Essenced  1.347096
4147                 Organic Heritage Flakes Cereal  1.336255
2505  Healthy Grains Granola Bar, Vanilla Blueberry  1.219491
4355       Organic Pink Lemonade Bunny Fruit Snacks  1.102062
1560                           Dark Chocolate Minis  1.000000
439                                          Banana  0.753806
2923                          Lemon Sparkling Water  0.480727


Now, let's define a function to find out the items actually bought by the customer.

In [25]:
def actual_purchased(df_test, user_id):
    '''
    enter user_id and return a list of the actual purchase of the user.
    '''
    df = list(df_test[df_test.user_id == user_id]['product_name'].unique())
    return df

#### Actual Products Purchased by Customer#90

In [26]:
df_test = df_merged_order_products_train.merge(orders, on='order_id')

In [27]:
user_id = 90
print("Actual Products Purchased by User: ",user_id )
print("=======================================")
print(actual_purchased(df_test,user_id))

Actual Products Purchased by User:  90
['Sea Salt Soiree Intense Dark Chocolate Squares', 'Gluten Free Honey Almond Granola', 'Organic Graham Crunch Cereal', 'Organic Heritage Flakes Cereal', "Annie's Bunny Fruit Snacks Variety"]


Interestingly, Customer#90 has ordered some cereals and fruit snacks.
<br>The model above recommended varieties of cereal, some granola, and "Organic Pink Lemonade Bunny Fruit Snacks" to this customer.

### Model 2: Matrix Factorization using ALS (Alternating Least Squares)

**Alternating Least Squares (ALS):** 
Alternating Least Squares (ALS) is the model we’ll use to fit our data and find similarities. ALS uses the matrix factorization method for recommendations.

**Matrix Factorization:** The idea is to take a large matrix and factor it into a smaller representation of the original matrix.
<br>We have an original matrix R of size **MxN**, where M is the number of users and N is the number of items. This matrix is quite sparse, since most users interact with only a few items each. We can factorize this matrix into two smaller matrices: one with dimensions **MxK**, which holds the latent user feature vectors (U), and a second with dimensions **KxN**, which holds the latent item feature vectors (V). Multiplying these two feature matrices together approximates the original matrix, but now we have two dense matrices that hold K latent features for each of our users and items. We calculate U and V so that their product approximates R as closely as possible: **R ≈ U x V.**
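
To make the alternating idea concrete, here is a minimal sketch of plain regularized ALS on a small dense matrix. It is not the confidence-weighted variant that the implicit library implements; the matrix `R`, the number of factors `K` and the regularization `lam` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4 users x 5 items interaction matrix R (dense for clarity).
R = np.array([
    [5.0, 3.0, 0.0, 1.0, 0.0],
    [4.0, 0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 5.0, 4.0, 0.0],
])
M, N = R.shape
K = 2        # number of latent features (assumed)
lam = 0.1    # L2 regularization strength (assumed)

U = rng.normal(size=(M, K))   # user factors, M x K
V = rng.normal(size=(K, N))   # item factors, K x N

def loss(U, V):
    """Squared reconstruction error plus L2 penalty on both factor matrices."""
    return np.sum((R - U @ V) ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))

initial_loss = loss(U, V)
for _ in range(20):
    # Fix V and solve the regularized least-squares problem for U.
    U = np.linalg.solve(V @ V.T + lam * np.eye(K), V @ R.T).T
    # Fix U and solve the regularized least-squares problem for V.
    V = np.linalg.solve(U.T @ U + lam * np.eye(K), U.T @ R)
final_loss = loss(U, V)

print(initial_loss, final_loss)
print(np.round(U @ V, 1))   # approximation of R
```

Each half-step has a closed-form solution because the problem is an ordinary ridge regression once the other factor matrix is held fixed, which is why the loss never increases between iterations.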

### Prepare the Data

In [28]:
# Training Dataset Based on Reordered Quantity
data = df_merged_order_products_prior.merge(orders, on='order_id')
data = data[["user_id", "product_id","product_name","reordered"]]
data = data.groupby(["user_id", "product_id","product_name"])['reordered'].sum().reset_index()

In [29]:
# Convert product id into category
c = data.product_id.astype('category')
d = dict(enumerate(c.cat.categories))
#print (d)
data["user_id"] = data["user_id"].astype("category").cat.codes
data["product_id"] = data["product_id"].astype("category").cat.codes
data['prev_product_id'] = data['product_id'].map(d)
data[:5]

,user_id,product_id,product_name,reordered,prev_product_id
0,0,195,Soda,9,196
1,0,10254,Pistachios,8,10258
2,0,10322,Organic Fuji Apples,0,10326
3,0,12423,Original Beef Jerky,9,12427
4,0,13028,Cinnamon Toast Crunch,2,13032
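
Category codes are positions in the sorted list of distinct ids, so a raw id and its code generally differ (above, product 196 became code 195). A minimal sketch of keeping lookup tables in both directions, on hypothetical toy ids (the frame `toy` and the names `code_to_user` and `user_to_code` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical raw ids; note that they are not contiguous.
toy = pd.DataFrame({
    "user_id": [7, 7, 42, 90],
    "product_id": [196, 10258, 196, 13032],
})

user_cat = toy["user_id"].astype("category")
toy["user_code"] = user_cat.cat.codes

# Lookup tables in both directions.
code_to_user = dict(enumerate(user_cat.cat.categories))
user_to_code = {raw: code for code, raw in code_to_user.items()}

print(toy)
print(user_to_code[90], code_to_user[2])
```

Any index passed to, or returned by, a model fitted on the coded matrix refers to these codes, so it has to be translated with such tables before it is compared with raw ids.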


The implicit library (in the version used for this notebook) expects data as an item-user matrix, so we create two matrices: one for fitting the model (item-user) and one for recommendations (user-item).

#### Get a Compressed Sparse Row (CSR) Matrix

In [30]:
# item_user matrix
sparse_item_user = sparse.csr_matrix((data['reordered'].astype(float), (data['prev_product_id'], data['user_id'])))
# user_item matrix
sparse_user_item = sparse.csr_matrix((data['reordered'].astype(float), (data['user_id'], data['prev_product_id'])))

#### Get a sparse matrix in COOrdinate format

In [31]:
item_user_matrix = sparse.coo_matrix((data["reordered"],
                                            (data["product_id"],
                                             data["user_id"])))
# Construct a sparse matrix for our users and items containing the number of reorders
item_user_matrix = item_user_matrix.tocsr()

Let's check the sparsity of the matrix!

In [32]:
matrix_size = item_user_matrix.shape[0]*item_user_matrix.shape[1] # Number of possible interactions in the matrix
num_purchases = len(item_user_matrix.nonzero()[0]) # Number of items interacted with
sparsity = 100*(1 - (num_purchases/matrix_size))
sparsity

99.94801504451148

### Build an ALS Model

In [33]:
def confidence_matrix(input_matrix, alpha):
    """
    Given a utility matrix,
    Returns the given matrix converted to a confidence matrix
    """
    return (input_matrix * alpha).astype("double")
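
One common formulation of confidence for implicit feedback, from Hu, Koren and Volinsky (2008), separates a binary preference p from a confidence c = 1 + alpha * r, where r is the raw interaction count. The function above applies only the alpha scaling; the sketch below shows the paper's formulation on hypothetical counts, as an illustration rather than a description of what the library does internally.

```python
import numpy as np

# Hypothetical reorder counts for 2 users x 3 items.
counts = np.array([
    [0.0, 3.0, 1.0],
    [2.0, 0.0, 0.0],
])
alpha = 25  # the value chosen in this notebook

# Binary preference: 1 if the user interacted with the item at all.
preference = (counts > 0).astype(float)

# Confidence grows linearly with the number of interactions; unobserved
# pairs keep a baseline confidence of 1.
confidence = 1.0 + alpha * counts

print(preference)
print(confidence)
```

A larger alpha makes the model trust repeated purchases more strongly relative to the many unobserved user-item pairs.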

In [34]:
def implicit_als(input_matrix, **kwargs):
    """
    Given the utility matrix and model parameters,
    builds the model and writes it to disk
    Args:
    input_matrix (csr_matrix): Our sparse item-by-user matrix
    alpha (int)              : The rate at which we'll increase our confidence in a preference with more interactions.
    
    """
    start = time.time()
    
    # Build model
    print("Building ALS model with alpha: {} ".format(kwargs["alpha"]))
    model = AlternatingLeastSquares(factors=20, regularization=0.1, iterations=50)
    #model.approximate_similar_items = True
    
    # Calculate the confidence by multiplying it by alpha value.
    data_conf = confidence_matrix(input_matrix, kwargs["alpha"])
    
    model.fit(data_conf)

    # Save model to disk (the context manager closes the file handle)
    filename = 'baseline_model.sav'
    with open(filename, 'wb') as f:
        pickle.dump(model, f)
    
    print("Completed in {:.2f}s".format(time.time() - start))


#### Call the Function to build the Model

In [35]:
# Specify model params and build it
## Alpha values in the range [10, 50] with a step size of 5 were tried. alpha = 25 was found to have the best overall 
## recall value. 
model_params = {"alpha": 25} 

# Build the Model
implicit_als(item_user_matrix, **model_params)

# Load the saved model back from disk
with open('baseline_model.sav', 'rb') as f:
    als_model = pickle.load(f)



Building ALS model with alpha: 25 




Completed in 1299.64s


#### Find Similar Items 

Let's see some similar items as recommended by the ALS model built above.

In [36]:
def find_similar_items(model,item_id,n_similar):
    """
    Given an item, prints similar items
    """
    product_id = []
    product_name = []
    scores = []
    
    similar =  model.similar_items(item_id, n_similar)
    for item in similar:
        idx, score = item

        product_id.append(idx)
        scores.append(score)
        product_name.append(df_products.product_name.loc[df_products.product_id==idx].iloc[0])
    print("Similar Items to Item: ",df_products.product_name.loc[df_products.product_id==item_id].iloc[0])
    print("----------------------------------------------------------------------")
    print(list(product_name))

In [37]:
find_similar_items(als_model,14084,10)

Similar Items to Item:  Organic Unsweetened Vanilla Almond Milk
----------------------------------------------------------------------
['Organic Unsweetened Vanilla Almond Milk', 'Organic Ranch Dressing', 'Beef Stroganoff Sauce Mix', 'Organic Cannellini Beans', 'Pomegranate Cherry Sparkling Seltzer Water', 'Chunky Hearty Bean and Ham Soup', 'Organic Garnet Sweet Potato (Yam)', 'Marinated Cabecou', "Dunkin' Dark Roast Coffee Ground", 'Rub with Love Salmon Rub']


**Conclusion:** The ALS model returned a few other organic and food-related products, which makes sense to some extent, although the link to almond milk is loose for several of them.

#### Get Recommendations for Users

Let’s examine the recommendations given to a particular user and see if the user has purchased any of the recommended products!

In [38]:
def get_als_recommendations(model,user_id,n_count):
    """
    Get Recommendations for users
    """
    # Recommend items for a user 
    recommendations = model.recommend(user_id, sparse_user_item, N = n_count)
    product_id = []
    product_name = []
    scores = []
    for item in recommendations:
        idx, score = item
        pid = data.prev_product_id.loc[data.product_id == idx].iloc[0]
        product_name.append(df_products.product_name.loc[df_products.product_id==pid].iloc[0])
        #scores.append(score)
    return (list(product_name))

#### Get Recommendations for Customer# 90 using ALS

In [39]:
# Let's test for user# 90
print("Recommended Products for user_id ",user_id)
print("----------------------------------------------------------------------")   
get_als_recommendations(als_model,90,10)

Recommended Products for user_id  90
----------------------------------------------------------------------


['Organic Reduced Fat 2% Milk',
 'Uncured Genoa Salami',
 'Organic Large Brown Grade AA Cage Free Eggs',
 'Organic Whole String Cheese',
 'Organic Milk',
 'Milk, Organic, Vitamin D',
 'Hass Avocados',
 'Mini Original Babybel Cheese',
 'Organic Raspberries',
 'Organic Half & Half']

## Compare & Analyze Recommendation Results

| Actual Purchase   |      Model1_Recommendations      |  Model2_Recommendations |
|----------|:-------------:|------:|
| Organic Graham Crunch Cereal|Organic Heritage Flakes Cereal | Organic Reduced Fat 2% Milk |
| Organic Heritage Flakes Cereal|Flax Plus Organic Pumpkin Flax Granola   |Organic Milk |
| Gluten Free Honey Almond Granola | Healthy Grains Granola Bar, Vanilla Blueberry |  Milk, Organic, Vitamin D |
| Sea Salt Soiree Intense Dark Chocolate Squares| Dark Chocolate Minis |Hass Avocados|
| Annie's Bunny Fruit Snacks Variety| Organic Pink Lemonade Bunny Fruit Snacks|Organic Raspberries|
|  NULL|      Sparkling Water, Natural Mango Essenced         |    Organic Large Brown Grade AA Cage Free Eggs| 
| NULL|Sweet & Salty Nut Almond Granola Bars |Organic Whole String Cheese|
|NULL | Lemon Sparkling Water| Mini Original Babybel Cheese |
|NULL|Banana|Uncured Genoa Salami|
|NULL|Peach-Pear Sparkling Water|Organic Half & Half|

Okay, in the above table we can see the actual purchases of Customer# 90 and the recommendations from Model1 (Cosine Similarity) and Model2 (Alternating Least Squares).
<br>The customer seems to buy cereals and granola along with some chocolates and fruit snacks.
<br><br> **Model1** has recommended other varieties of cereal and granola, and even some chocolate and fruit snacks, which seems quite appropriate.
<br><br> Interestingly, **Model2** has recommended different varieties of milk, which go perfectly with cereals and granola, and some other breakfast items like eggs, salami and fruit.
<br><br> Thus, in my opinion, both sets of recommendations are appropriate.
Now, let's see if we can evaluate the performance of both models with some metrics.

## Evaluation Metrics

Evaluating recommendations based on implicit feedback is always tricky. Below is the approach that I have followed here:

- Used the high-volume dataset (products that users reordered more than once)
- Got the top 20 recommendations from both models
- Scored each user by the number of products bought that were captured by the top 20 recommendations, divided by 20 (strictly speaking, this is precision at 20 rather than recall)
- Calculated this score for all the users and averaged the results.
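
The difference between the two per-user metrics comes down to the denominator: precision at k divides the number of hits by k, while recall at k divides it by the number of products the user actually bought. A minimal sketch with a hypothetical helper `precision_recall_at_k` and made-up product lists:

```python
def precision_recall_at_k(recommended, purchased, k):
    """Return (precision@k, recall@k) for one user."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(purchased))
    precision = hits / k
    recall = hits / len(purchased) if purchased else 0.0
    return precision, recall

# Hypothetical example: 4 recommendations, 5 purchased products.
recommended = ["banana", "milk", "granola", "limes"]
purchased = ["milk", "limes", "eggs", "kale", "rice"]

precision, recall = precision_recall_at_k(recommended, purchased, k=4)
print(precision, recall)  # 0.5 0.4
```

Averaging either value over all users gives a single score per model. For a fair estimate, `purchased` would normally come from orders the model did not see during fitting.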

In [42]:
# filter high volume users
df_users = df_high_volume.user_id.unique().tolist()

In [40]:
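def calc_recall():
    """
    Returns, for each model, the share of its top-20 recommendations found
    among the user's ordered items, averaged over all users
    """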
    model1_score = []
    model2_score = []
    for user in sorted(df_users):
        ordered_items = data[data.user_id == user].product_name.value_counts().sort_values(ascending= False)
        model1_recommendations = Cos_Similarity_Recommender(df_high_volume,user,20)["product_name"].reset_index()
        model2_recommendations = get_als_recommendations(als_model,user,20)

        ordered_items_list = ordered_items.index.tolist()
        model1_recommendations_list = model1_recommendations.product_name.tolist()
        
        model1_score.append((len(set(model1_recommendations_list) & set(ordered_items_list)))/20)
        model2_score.append((len(set(model2_recommendations) & set(ordered_items_list)))/20)
    return np.mean(model1_score),np.mean(model2_score)


In [43]:
score1, score2 = calc_recall()

In [44]:
print("Score of Cosine Similarity Model: ", score1)
print("Score of ALS Model              : ", score2)

Score of Cosine Similarity Model:  0.1220662347012239
Score of ALS Model              :  0.22048236141108712


## Conclusion

As per the above metrics, the **ALS Model** scored higher than the **Cosine Similarity Model** (about 0.22 versus 0.12). Let's take a few examples to understand how the metric actually works.

In [48]:
df_users[0:5]

[202279, 205970, 178520, 156122, 3107]

#### Let's check for User# 202279

**check actual ordered items**

In [49]:
ordered_items = data[data.user_id == 202279].product_name.value_counts().sort_values(ascending= False)
ordered_items_list = ordered_items.index.tolist()
ordered_items_list

['Gala Apples',
 'Organic Hearts Of Romaine',
 'Classic Chevre Crumbled Goat Cheese',
 'Whipped Sweet Potatoes',
 'Organic Heavy Whipping Cream',
 'Classic Recipe Milk Chocolate Bar',
 'Red Vine Tomato',
 'Hot & Sour Soup Bowl',
 'Gluten Free White Cheddar Popped Corn Chips',
 'French Batard Bread',
 'Organic Italian Herb Pasta Sauce',
 'Apricot Preserves',
 'Cornichons Extra Fine Gherkins Hand-Picked',
 'Organic Russet Potato',
 'Organic Orange Soda',
 'Bunched Cilantro',
 'Couscous Original',
 'Orecchiette Dry Pasta',
 'Roasted Unsalted Almonds',
 'Organic Yellow Onion',
 'Strawberries',
 'Cheese Pizza Snacks',
 'Hazelnut Spread With Skim Milk & Cocoa',
 'Organic Blackberries',
 'Limes',
 'Chicken Bouillon Cubes',
 'Birthday Candles',
 'Gluten Free Chocolate Chip Cookies',
 'European Style Salted Butter',
 'Red Grapes',
 'Raspberry Preserves',
 'Lady Alice Apple',
 'Live Butter Lettuce',
 'Spanish Rice',
 'Cage Free Grade A Large Brown Eggs',
 'Healthy Weight Cat Food',
 'Extra Virgi

**Check Recommendations from Model1 - Cosine Similarity Model**

In [51]:
model1_recommendations = Cos_Similarity_Recommender(df_high_volume,202279,20)["product_name"].reset_index()
model1_recommendations_list = model1_recommendations.product_name.tolist()  
model1_recommendations_list

['Carrots',
 'Michigan Organic Kale',
 'Bag of Organic Bananas',
 'Banana',
 'Organic Hass Avocado',
 'Organic Strawberries',
 'All Natural No Stir Creamy Almond Butter',
 'Organic Baby Spinach',
 'Organic Avocado',
 'Organic Yellow Onion',
 'Organic Red Bell Pepper',
 'Organic Lemon',
 'Cucumber Kirby',
 'Organic Large Extra Fancy Fuji Apple',
 'Limes',
 'Organic Small Bunch Celery',
 'Honeycrisp Apple',
 'Fresh Cauliflower',
 'Organic Zucchini',
 'Organic Whole Milk']

**Check Recommendations from Model2 - ALS Model**

In [52]:
model2_recommendations = get_als_recommendations(als_model,202279,20)
model2_recommendations

['Banana',
 'Bag of Organic Bananas',
 'Organic Avocado',
 'Organic Strawberries',
 'Organic Baby Spinach',
 'Large Lemon',
 'Strawberries',
 'Organic Whole Milk',
 'Limes',
 'Organic Hass Avocado',
 'Cucumber Kirby',
 'Organic Fuji Apple',
 'Organic Blueberries',
 'Organic Raspberries',
 'Honeycrisp Apple',
 'Red Vine Tomato',
 'Organic Zucchini',
 'Seedless Red Grapes',
 'Asparagus',
 'Yellow Onions']

Now, let's take a look at the items actually bought by the user which were recommended by Model1.

In [53]:
set(model1_recommendations_list) & set(ordered_items_list)

{'Carrots', 'Limes', 'Organic Yellow Onion', 'Organic Zucchini'}

Similarly, let's take a look at the items actually bought by the user which were recommended by Model2

In [54]:
set(model2_recommendations) & set(ordered_items_list)

{'Large Lemon',
 'Limes',
 'Organic Zucchini',
 'Red Vine Tomato',
 'Strawberries',
 'Yellow Onions'}

For the above example, out of 20 recommendations, 4 items recommended by Model1 were actually purchased by the user (a score of 4/20 = 0.2). Similarly, 6 of the 20 items recommended by Model2 were purchased by the user (6/20 = 0.3).
Clearly, for this example, Model2 did a better job compared to Model1.