# Instacart Recommender System - Recommender
---

## Table of Contents

* [1. Defining Recommender](#chapter1)
* [2. Recommender Evaluation](#chapter2)
    * [2.1 Top 10 Aisles](#chapter2_1)
    * [2.2 Cluster Differentiation](#chapter2_2)
* [3. Conclusion](#chapter3)
    * [3.1 Next Steps](#chapter3_1)
    * [3.2 Limitations and Points for Further Improvement](#chapter3_2)

In [225]:
import pandas as pd
import numpy as np
import random

In [226]:
cluster_0_items = pd.read_csv('../data/cluster_0_item_rules.csv')
cluster_1_items = pd.read_csv('../data/cluster_1_item_rules.csv')
cluster_2_items = pd.read_csv('../data/cluster_2_item_rules.csv')
cluster_3_items = pd.read_csv('../data/cluster_3_item_rules.csv')
cluster_4_items = pd.read_csv('../data/cluster_4_item_rules.csv')
cluster_5_items = pd.read_csv('../data/cluster_5_item_rules.csv')

cluster_0_aisle = pd.read_csv('../data/cluster_0_aisle_rules.csv')
cluster_1_aisle = pd.read_csv('../data/cluster_1_aisle_rules.csv')
cluster_2_aisle = pd.read_csv('../data/cluster_2_aisle_rules.csv')
cluster_3_aisle = pd.read_csv('../data/cluster_3_aisle_rules.csv')
cluster_4_aisle = pd.read_csv('../data/cluster_4_aisle_rules.csv')
cluster_5_aisle = pd.read_csv('../data/cluster_5_aisle_rules.csv')


In [227]:
products = pd.read_csv('../data/products.csv')
aisles = pd.read_csv('../data/aisles.csv')
cluster_df = pd.read_csv('../data/cluster_data.csv')
orders = pd.read_csv('../data/complete_orders.csv')

## 1. Defining Recommender <a class="anchor" id="chapter1"></a>
---

The recommender will be constructed based on the following logic:
- Based on an item that a user has added to their cart, the recommender will push 5 other items to the user
- The 5 items pushed will be based on the association rules, specifically the 5 items with the greatest lift for that particular cluster of customers
- Lift needs to be greater than 1, which indicates a positive association between the 2 items
- If there are not enough items that fulfil the criteria to be recommended, the associated aisles with the greatest lift will be used instead
- The most frequently sold products within the associated aisles will then be used as recommendations
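
To make the lift criterion concrete, here is a minimal sketch that computes lift for an item pair from a toy list of baskets. The baskets and the helper name `lift_for_pair` are illustrative assumptions and are not taken from the project data or the rule files.

```python
def lift_for_pair(baskets, item_a, item_b):
    """Lift of the pair (A, B): P(A and B) / (P(A) * P(B))."""
    n = len(baskets)
    support_a = sum(item_a in b for b in baskets) / n
    support_b = sum(item_b in b for b in baskets) / n
    support_ab = sum(item_a in b and item_b in b for b in baskets) / n
    if support_a == 0 or support_b == 0:
        return 0.0  # an item that never appears cannot be associated with anything
    return support_ab / (support_a * support_b)

# Hypothetical baskets, one set of items per order
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"eggs"},
    {"milk", "eggs"},
]

milk_bread = lift_for_pair(baskets, "milk", "bread")  # above 1: bought together more than chance
milk_eggs = lift_for_pair(baskets, "milk", "eggs")    # below 1: bought together less than chance
```

In this toy data, milk and bread would pass a lift threshold of 1 while milk and eggs would not.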

Inputs:
- `user_id`: To determine the cluster the user belongs to and use the metrics based on association rules for that particular cluster
- `product_id`: To find closely associated products with the item the customer has just added to their cart
- `num_rec`: To determine the number of recommendations to be pushed. This has been set as 5.
- `lift`: Lift threshold. Only rules with a lift above this value are used. This has been set as 1.

In [228]:
cluster_items_dict = {}
cluster_items_dict[0] = cluster_0_items
cluster_items_dict[1] = cluster_1_items
cluster_items_dict[2] = cluster_2_items
cluster_items_dict[3] = cluster_3_items
cluster_items_dict[4] = cluster_4_items
cluster_items_dict[5] = cluster_5_items

cluster_aisle_dict = {}
cluster_aisle_dict[0] = cluster_0_aisle
cluster_aisle_dict[1] = cluster_1_aisle
cluster_aisle_dict[2] = cluster_2_aisle
cluster_aisle_dict[3] = cluster_3_aisle
cluster_aisle_dict[4] = cluster_4_aisle
cluster_aisle_dict[5] = cluster_5_aisle


In [229]:
def prod_recommender(user_id, product_id, num_rec, lift):
    # Look up rows by value instead of by assumed index position (id - 1),
    # which only works while each file is sorted by id with no gaps
    product = products[products['product_id']==product_id].iloc[0]
    # Get product name for item just added to cart
    p_name = product['product_name']
    # Get aisle_id for item just added to cart
    aisle_id = product['aisle_id']
    # Get aisle name for item just added to cart
    aisle_name = aisles[aisles['aisle_id']==aisle_id]['aisle'].iloc[0]
    # Get cluster the customer belongs to
    cluster = cluster_df[cluster_df['user_id']==user_id]['cluster'].iloc[0]

    print(f"Product added to cart: {product_id} - {p_name}.")
    print(f"Customer is grouped in cluster: {cluster}.")

    # Define data for items based on the customer's cluster
    data_items = cluster_items_dict[cluster]
    # Keep rules for the item added to cart whose lift exceeds the threshold
    data_items = data_items[(data_items['item_A']==product_id) & (data_items['lift']>lift)]
    # Sort products by lift and keep at most num_rec of them
    data_items = data_items.sort_values('lift', ascending=False).head(num_rec)
    rec_items = data_items[['item_B', 'product_name_B']].rename(columns={'item_B': 'product_id',
                                                                         'product_name_B': 'product_name'})

    # If item association does not yield enough recommendations, look at closely associated aisles instead
    if len(rec_items) < num_rec:
        # Number of additional recommendations required
        n_aisles = num_rec - len(rec_items)
        # Define data for associated aisles for the cluster
        data_aisles = cluster_aisle_dict[cluster]
        # Filter aisles associated with the aisle of the product added to cart, by lift
        data_aisles = data_aisles[(data_aisles['item_A']==aisle_id) & (data_aisles['lift']>lift)]
        # Take the aisles with the greatest lift first
        data_aisles = data_aisles.sort_values('lift', ascending=False).item_B.values[:n_aisles].tolist()

        aisle_recs = []
        for n in data_aisles:
            # Pick at random among the (up to) 5 best-selling products of the cluster in the associated aisle
            top_products = orders[(orders.cluster==cluster) & (orders.aisle_id==n)]['product_id'].value_counts().index[:5]
            if len(top_products) > 0:
                aisle_recs.append(random.choice(list(top_products)))
        add_recs = pd.DataFrame(aisle_recs, columns=['product_id'])
        # Merge with product data to get product name
        add_recs = add_recs.merge(products, on='product_id')[['product_id', 'product_name']]
        # Join with original recommendations
        rec_items = pd.concat([rec_items, add_recs])

    # Display and return the recommendations whether or not the aisle fallback was used
    display(rec_items)
    return rec_items
    

In [231]:
test_1 = prod_recommender(user_id=5332, product_id=32478, num_rec=5, lift=1)

Product added to cart: 32478 - Reduced Fat 2% Milk.
Customer is grouped in cluster: 3.


| | product_id | product_name |
|---|---|---|
| 1631 | 47766 | Organic Avocado |
| 29748 | 46979 | Asparagus |
| 24111 | 49683 | Cucumber Kirby |
| 13388 | 45007 | Organic Zucchini |
| 0 | 12456 | Organic Sunday Bacon |


Based on `test_1`, when user `5332`, who belongs to cluster 3, added Reduced Fat 2% Milk to their cart, the recommender pushed the following items:
- Organic Avocado
- Asparagus
- Cucumber Kirby
- Organic Zucchini
- Organic Sunday Bacon

## 2. Recommender Evaluation <a class="anchor" id="chapter2"></a>
---

To evaluate the recommender, it will be run on random products from the top 10 aisles, and its results for the same product will be compared across clusters.

### 2.1 Top 10 Aisles <a class="anchor" id="chapter2_1"></a>

A random product will be selected from the top 10 aisles in the data based on volume. The recommender will then be run on these products for the same user.
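
The selection step can be sketched in plain Python as follows. The order lines, the helper names `top_aisles` and `pick_product_per_aisle`, and the fixed seed are assumptions made for illustration. Unlike the notebook cell, this sketch draws only from products that appear in the order data.

```python
import random
from collections import Counter

# Hypothetical order lines as (order_id, aisle_id, product_id) tuples
order_lines = [
    (1, 24, 501), (1, 24, 502), (2, 24, 501),
    (2, 83, 601), (3, 83, 602),
    (3, 120, 701),
]

def top_aisles(lines, n):
    """Return the n aisle_ids with the most order lines, busiest first."""
    counts = Counter(aisle_id for _, aisle_id, _ in lines)
    return [aisle_id for aisle_id, _ in counts.most_common(n)]

def pick_product_per_aisle(lines, aisle_ids, rng):
    """Pick one random product per aisle from products that were actually ordered."""
    picks = {}
    for aisle_id in aisle_ids:
        ordered = sorted({p for _, a, p in lines if a == aisle_id})
        picks[aisle_id] = rng.choice(ordered)
    return picks

rng = random.Random(42)  # fixed seed so the selection is repeatable
busiest = top_aisles(order_lines, 2)
picks = pick_product_per_aisle(order_lines, busiest, rng)
```

Sampling from ordered products rather than from the full catalogue is a design choice: a catalogue product that was never ordered cannot have item-level association rules, so it would always fall through to the aisle-based recommendations.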

In [232]:
# Get top 10 aisles by number of products sold
top10_aisles = list(orders.groupby('aisle_id').count()['order_id'].sort_values(ascending=False).index[:10])

# Get a random product each from the top 10 aisles
rand_prod = []
for n in top10_aisles:
    prod = random.choice(products[products.aisle_id==n]['product_id'].values)
    rand_prod.append(prod)

# Select a random user
user = random.choice(cluster_df.user_id.values)

In [233]:
# Run recommender for selected user on randomly selected products
for n in rand_prod:
    prod_recommender(user_id=user, product_id=n, num_rec=5, lift=1)

Product added to cart: 45231 - Bag Of Oranges.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 26728 | Compostable Tall Trash Bags 13 gallon |
| 1 | 8230 | 100% Recycled Bathroom Tissue |
| 2 | 31268 | Sandwich Bags |
| 3 | 11819 | Laundry Detergent Free & Clear |
| 4 | 29195 | Wild Sardines in Extra Virgin Olive Oil |


Product added to cart: 7286 - Broccoli Crowns.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 17836 | Crisp Cucumber & Melon Liquid Hand Soap |
| 1 | 6210 | Kitchen Scrubber Sponge |
| 2 | 44643 | Gallon Freezer Bags |
| 3 | 48023 | Wild Alaskan Pink Salmon |
| 4 | 7821 | Olive Oil & Vinegar Dressing |


Product added to cart: 48907 - Garden Salad Mix.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 35168 | Ezekiel 4:9 Sprouted Grain Tortillas |
| 1 | 5240 | Spaghetti No 12 |
| 2 | 11440 | Chicken Breast Tenders Breaded |
| 3 | 44422 | Organic Old Fashioned Rolled Oats |
| 4 | 25072 | Organic Cheddar Snack Mix |


Product added to cart: 26275 - Crunch Lemon Shortbread Flavor 0% Fat with Toppings Greek Yogurt.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 35168 | Ezekiel 4:9 Sprouted Grain Tortillas |
| 1 | 37825 | Organic Whole Wheat Fusilli |
| 2 | 6187 | Raisin Bran Cereal |
| 3 | 2452 | Naturals Chicken Nuggets |
| 4 | 27966 | Organic Raspberries |


Product added to cart: 121 - Sharp Cheddar.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 1705 | 24852 | Banana |
| 0 | 33439 | Deep Moisture Body Wash |
| 1 | 7076 | Grain Free Chicken Formula Cat Food |
| 2 | 17471 | Small Dog Biscuits |
| 3 | 13085 | Organic Hot Cocoa Mix |


Product added to cart: 34594 - DHA Omega-3 Vanilla Lowfat Milk.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 47865 | Quart Storage Bags |
| 1 | 33043 | Crescent Rolls |
| 2 | 11941 | Tortillas, Corn, Organic |
| 3 | 32655 | Organic Large Grade AA Brown Eggs |
| 4 | 23734 | Sour Cream |


Product added to cart: 40939 - Drinking Water.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 20704 | Brut |
| 1 | 45190 | Vodka |
| 2 | 35601 | Sensitive Skin Moisturizing Cream Soap Bars |
| 3 | 38827 | Organic Traditional Flour Tortillas |
| 4 | 37825 | Organic Whole Wheat Fusilli |


Product added to cart: 17676 - White Fudge Covered Pretzels.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 35419 | Handmade Vodka From Austin, Texas |
| 1 | 35601 | Sensitive Skin Moisturizing Cream Soap Bars |
| 2 | 14872 | All Natural Powder Cleanser |
| 3 | 8006 | Chopped Organic Garlic |
| 4 | 35168 | Ezekiel 4:9 Sprouted Grain Tortillas |


Product added to cart: 21937 - Original Almond Milk Creamer.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 23115 | Disinfecting Multi-Surface Cleaner Lemongrass ... |
| 1 | 4317 | Solid White Albacore Tuna In Water |
| 2 | 27881 | Olives, Organic, Kalamata, Pitted |
| 3 | 7559 | Cinnamon Rolls with Icing |
| 4 | 19816 | Sonoma Traditional Flour Tortillas 10 Count |


Product added to cart: 666 - Whole Wheat Loaves.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 27240 | Classic Lavender & Chamomile Liquid Hand Soap |
| 1 | 43194 | Natural Cellulose Scrub Sponge |
| 2 | 38827 | Organic Traditional Flour Tortillas |
| 3 | 37825 | Organic Whole Wheat Fusilli |
| 4 | 35221 | Lime Sparkling Water |


Some recommended products do not seem related to the item added to cart (e.g., Apples & Spaghetti). This happens because the incidence of the 2 products being bought together is very high. It also gives other products a better chance of being exposed to the customer, instead of the customer only being recommended more fruit when an apple has been added to the cart.

### 2.2 Cluster Differentiation <a class="anchor" id="chapter2_2"></a>

A user will be randomly selected from each of the 6 clusters. The recommender will then be run on the same product for each of these users to identify any differences in recommendations across the clusters.
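
Because the users are drawn at random, the outputs change on every run. One way to make the draw repeatable is sketched below in plain Python. The `(user_id, cluster)` pairs, the helper name `sample_user_per_cluster`, and the seed are assumptions for illustration only.

```python
import random
from collections import defaultdict

def sample_user_per_cluster(assignments, seed=None):
    """Pick one random user_id per cluster from (user_id, cluster) pairs."""
    rng = random.Random(seed)  # a fixed seed gives the same users on every run
    users_by_cluster = defaultdict(list)
    for user_id, cluster in assignments:
        users_by_cluster[cluster].append(user_id)
    return {
        cluster: rng.choice(users)
        for cluster, users in sorted(users_by_cluster.items())
    }

# Hypothetical cluster assignments
assignments = [(11, 0), (12, 0), (21, 1), (22, 1), (23, 1), (31, 2)]

chosen = sample_user_per_cluster(assignments, seed=7)
chosen_again = sample_user_per_cluster(assignments, seed=7)
```

With the same seed, `chosen` and `chosen_again` contain the same users, so the per-cluster comparison can be reproduced.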

In [234]:
rand_users = []
for n in range(0,6):
    data = cluster_df[['cluster','user_id']]
    user = random.choice(data[data.cluster==n].user_id.values)
    rand_users.append(user)

In [258]:
for user in rand_users:
    prod_recommender(user_id=user, product_id=3235, num_rec=5, lift=1)

Product added to cart: 3235 - Sriracha Flavor Tortilla Chips.
Customer is grouped in cluster: 0.


| | product_id | product_name |
|---|---|---|
| 0 | 14462 | Sliced Black Olives |
| 1 | 31801 | 9 Inch Plates |
| 2 | 33439 | Deep Moisture Body Wash |
| 3 | 22952 | Farfalle Pasta |
| 4 | 21333 | Original Whipped Cream Cheese |


Product added to cart: 3235 - Sriracha Flavor Tortilla Chips.
Customer is grouped in cluster: 1.


| | product_id | product_name |
|---|---|---|
| 0 | 2855 | Organic Good Seed Bread |
| 1 | 37710 | Trail Mix |
| 2 | 20448 | Organic Raisins |
| 3 | 32689 | Romaine Hearts |
| 4 | 20019 | Lowfat Kefir Smoothie Blueberry |


Product added to cart: 3235 - Sriracha Flavor Tortilla Chips.
Customer is grouped in cluster: 2.


| | product_id | product_name |
|---|---|---|
| 0 | 28199 | Clementines, Bag |
| 1 | 31651 | Extra Fancy Unsalted Mixed Nuts |
| 2 | 43889 | Dark Chocolate Covered Banana |
| 3 | 2855 | Organic Good Seed Bread |
| 4 | 19767 | Old Fashioned Oatmeal |


Product added to cart: 3235 - Sriracha Flavor Tortilla Chips.
Customer is grouped in cluster: 3.


| | product_id | product_name |
|---|---|---|
| 0 | 42701 | Organic Sour Cream |
| 1 | 19508 | Corn Tortillas |
| 2 | 33198 | Sparkling Natural Mineral Water |
| 3 | 12872 | Penne Rigate |
| 4 | 21267 | Sourdough Bread |


Product added to cart: 3235 - Sriracha Flavor Tortilla Chips.
Customer is grouped in cluster: 4.


| | product_id | product_name |
|---|---|---|
| 0 | 35419 | Handmade Vodka From Austin, Texas |
| 1 | 17836 | Crisp Cucumber & Melon Liquid Hand Soap |
| 2 | 23115 | Disinfecting Multi-Surface Cleaner Lemongrass ... |
| 3 | 8006 | Chopped Organic Garlic |
| 4 | 19816 | Sonoma Traditional Flour Tortillas 10 Count |


Product added to cart: 3235 - Sriracha Flavor Tortilla Chips.
Customer is grouped in cluster: 5.


| | product_id | product_name |
|---|---|---|
| 0 | 21883 | Irish Whiskey |
| 1 | 19660 | Spring Water |
| 2 | 24489 | Organic Whole Strawberries |
| 3 | 21903 | Organic Baby Spinach |
| 4 | 46676 | Total 0% Nonfat Greek Yogurt |


Despite the same product being added to cart, due to the cluster differentiation, the recommender pushes different products to the customers from different clusters. This seems to cater to the preferences of that specific cluster.
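
The degree of differentiation can also be quantified. The sketch below computes the Jaccard overlap between the recommendation sets of each pair of clusters, using the `product_id` values from the six outputs above. The helper name `jaccard` and the choice of Jaccard overlap as the measure are assumptions of this sketch, not part of the original analysis.

```python
from itertools import combinations

# product_id values copied from the six recommendation outputs for product 3235
recs_by_cluster = {
    0: {14462, 31801, 33439, 22952, 21333},
    1: {2855, 37710, 20448, 32689, 20019},
    2: {28199, 31651, 43889, 2855, 19767},
    3: {42701, 19508, 33198, 12872, 21267},
    4: {35419, 17836, 23115, 8006, 19816},
    5: {21883, 19660, 24489, 21903, 46676},
}

def jaccard(a, b):
    """Share of products that two recommendation sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Overlap score for every pair of clusters
overlaps = {
    (c1, c2): jaccard(recs_by_cluster[c1], recs_by_cluster[c2])
    for c1, c2 in combinations(sorted(recs_by_cluster), 2)
}

# Pairs of clusters that share at least one recommended product
shared_pairs = [pair for pair, score in overlaps.items() if score > 0]
```

For this run, only clusters 1 and 2 share a recommendation (Organic Good Seed Bread), and every other pair of clusters has no products in common. Since part of each list is drawn at random, these scores will vary between runs.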

## 3. Conclusion <a class="anchor" id="chapter3"></a>

### 3.1 Next Steps <a class="anchor" id="chapter3_1"></a>

To better assess the performance of the recommender, A/B testing is required. It is proposed to put the recommender into place and monitor the following metrics:
- Average order value: The hypothesis is that the recommender is effective in pushing products to customers, raising the average order value
- Number of new products per order: Observe the average number of new products (never before purchased by the customer)
- Customer churn: Effective recommenders may lead to better customer experience that ultimately reduces customer churn
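
As a rough sketch of how the first two metrics might be compared between test groups, the snippet below summarizes hypothetical per-order records. The field names (`arm`, `order_value`, `new_products`), the values, and the helper `summarize_arm` are all assumptions for illustration. Churn is left out because it needs a longer observation window.

```python
from statistics import mean

# Hypothetical per-order records; "arm" marks whether the customer saw the
# recommender ("treatment") or not ("control")
ab_orders = [
    {"arm": "control", "order_value": 40.0, "new_products": 1},
    {"arm": "control", "order_value": 60.0, "new_products": 1},
    {"arm": "treatment", "order_value": 50.0, "new_products": 2},
    {"arm": "treatment", "order_value": 70.0, "new_products": 1},
    {"arm": "treatment", "order_value": 60.0, "new_products": 3},
]

def summarize_arm(rows, arm):
    """Average order value and average new products per order for one arm."""
    arm_rows = [r for r in rows if r["arm"] == arm]
    return {
        "orders": len(arm_rows),
        "avg_order_value": mean(r["order_value"] for r in arm_rows),
        "avg_new_products": mean(r["new_products"] for r in arm_rows),
    }

control = summarize_arm(ab_orders, "control")
treatment = summarize_arm(ab_orders, "treatment")

# Difference in average order value between the two arms
aov_uplift = treatment["avg_order_value"] - control["avg_order_value"]
```

A difference in means alone is not evidence of an effect. A real test would also need a sufficient sample size and a significance test before drawing conclusions.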

### 3.2 Limitations and Points for Further Improvement <a class="anchor" id="chapter3_2"></a>
- Clustering is based on product basket mixes, with no additional information such as order frequency or average order value. Such information may be useful in further defining the clusters.
- If no items within the item matrix satisfy the lift criteria, the recommender resorts to looking at associated aisles and the top products within those aisles. Instead, these fallback recommendations could prioritize items that have a higher margin or are currently on sale, in order to improve the click-through rate of the recommended items.
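
One possible shape for such a prioritization is sketched below. The fields `margin`, `on_sale`, and `order_count`, the product IDs, and the ordering rule (on-sale first, then margin, then sales volume) are hypothetical. None of them exist in the current data.

```python
def rank_fallback(candidates, k):
    """Order fallback candidates: on-sale first, then higher margin, then sales volume."""
    ranked = sorted(
        candidates,
        key=lambda c: (not c["on_sale"], -c["margin"], -c["order_count"]),
    )
    return [c["product_id"] for c in ranked[:k]]

# Hypothetical candidates from one associated aisle
candidates = [
    {"product_id": 101, "margin": 0.10, "on_sale": False, "order_count": 900},
    {"product_id": 102, "margin": 0.35, "on_sale": False, "order_count": 300},
    {"product_id": 103, "margin": 0.20, "on_sale": True, "order_count": 150},
]

top_two = rank_fallback(candidates, 2)
```

Here the on-sale product ranks first and the higher-margin product second, ahead of the best seller. How to weigh these criteria against each other is a business decision that the A/B test could help inform.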