## Instacart Market Basket

### Project Overview
Many companies want to increase revenue, and data science gives them a way to understand how customers buy products. Most customers on a grocery run buy more than one product, especially when they use an online grocery ordering and delivery app such as Instacart. We use a dataset from Instacart to carry out an association and correlation analysis with Market Basket Analysis. The goal is to predict which products a customer will buy next from their previous orders and the items currently in their cart, and to find patterns among customers based on the products they bought previously.

### Business Problem
This project uses anonymized data on customer orders over time. We apply data mining methods such as Market Basket Analysis to this data and report the findings.
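Market Basket Analysis is usually described with three measures: support, confidence, and lift. The sketch below computes them on a handful of made-up baskets so the definitions are concrete. The baskets and the helper names `support`, `confidence`, and `lift` are hypothetical and are not part of the Instacart data or of any library.

```python
# Toy baskets (made up for illustration, not Instacart data)
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's own support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"milk", "bread"}, baskets))
print(confidence({"milk"}, {"bread"}, baskets))
print(lift({"milk"}, {"bread"}, baskets))
```

A lift above 1 suggests the two items appear together more often than they would if they were bought independently; a lift below 1 suggests the opposite.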

In [32]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=f4f0dcbd4b1c1abcdf7531f4990ad3d34fb64b2821e2400e3115fd540869809b
  Stored in directory: /Users/destinee/Library/Caches/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [33]:
import numpy as np
import pandas as pd
from collections import Counter
from itertools import combinations
from itertools import groupby
import sys
from apyori import apriori

In [34]:
orders = pd.read_csv("order_products__prior.csv")
orders.head(5)

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered
0,2,33120,1,1
1,2,28985,2,1
2,2,9327,3,0
3,2,45918,4,1
4,2,30035,5,0


In [38]:
products = pd.read_csv("products.csv")
products.head(5)

Unnamed: 0,product_id,product_name,aisle_id,department_id
0,1,Chocolate Sandwich Cookies,61,19
1,2,All-Seasons Salt,104,13
2,3,Robust Golden Unsweetened Oolong Tea,94,7
3,4,Smart Ones Classic Favorites Mini Rigatoni Wit...,38,1
4,5,Green Chile Anytime Sauce,5,13


In [41]:
sale = orders.merge(products, on="product_id")

In [42]:
sale.head()

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id
0,2,33120,1,1,Organic Egg Whites,86,16
1,26,33120,5,0,Organic Egg Whites,86,16
2,120,33120,13,0,Organic Egg Whites,86,16
3,327,33120,5,1,Organic Egg Whites,86,16
4,390,33120,28,1,Organic Egg Whites,86,16
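A merge like the one above silently duplicates order lines if a `product_id` ever appears twice in the products table. One way to guard against that, sketched here on tiny stand-in frames (`orders_demo`, `products_demo`, and `sale_demo` are hypothetical names), is the `validate` argument of `DataFrame.merge`.

```python
import pandas as pd

# Tiny stand-ins for the orders and products tables
orders_demo = pd.DataFrame({
    "order_id": [2, 2, 26],
    "product_id": [33120, 28985, 33120],
})
products_demo = pd.DataFrame({
    "product_id": [33120, 28985],
    "product_name": ["Organic Egg Whites", "Michigan Organic Kale"],
})

# validate="many_to_one" raises MergeError if product_id repeats in products_demo
sale_demo = orders_demo.merge(
    products_demo, on="product_id", how="inner", validate="many_to_one"
)

# With a unique key on the right side, the merge keeps one row per order line
print(len(sale_demo) == len(orders_demo))
```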


In [44]:
#All the products that order_id 2 contained
sale.loc[sale['order_id']==2]

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id
0,2,33120,1,1,Organic Egg Whites,86,16
19400,2,28985,2,1,Michigan Organic Kale,83,4
86849,2,9327,3,0,Garlic Powder,104,13
93148,2,45918,4,1,Coconut Butter,19,13
93892,2,30035,5,0,Natural Sweetener,17,13
94461,2,17794,6,1,Carrots,83,4
167197,2,40141,7,1,Original Unflavored Gelatine Mix,105,13
168386,2,1819,8,1,All Natural No Stir Creamy Almond Butter,88,13
170810,2,43668,9,0,Classic Blend Cole Slaw,123,4


In [45]:
#Check that every row with product_id 33120 is Organic Egg Whites
sale.loc[sale['product_id']==33120]

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,product_name,aisle_id,department_id
0,2,33120,1,1,Organic Egg Whites,86,16
1,26,33120,5,0,Organic Egg Whites,86,16
2,120,33120,13,0,Organic Egg Whites,86,16
3,327,33120,5,1,Organic Egg Whites,86,16
4,390,33120,28,1,Organic Egg Whites,86,16
...,...,...,...,...,...,...,...
19395,3420280,33120,6,1,Organic Egg Whites,86,16
19396,3420373,33120,11,0,Organic Egg Whites,86,16
19397,3420587,33120,1,1,Organic Egg Whites,86,16
19398,3420711,33120,8,0,Organic Egg Whites,86,16
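The check above inspects one `product_id` by eye. A possible way to run the same check for every product at once is to count the distinct names per ID, shown here on a small stand-in frame (`sale_demo`, `names_per_id`, and `ambiguous` are hypothetical names).

```python
import pandas as pd

# Small stand-in for the merged sale dataframe
sale_demo = pd.DataFrame({
    "product_id": [33120, 33120, 28985],
    "product_name": ["Organic Egg Whites", "Organic Egg Whites", "Michigan Organic Kale"],
})

# Number of distinct names attached to each product_id
names_per_id = sale_demo.groupby("product_id")["product_name"].nunique()

# IDs that map to more than one name (empty when the mapping is clean)
ambiguous = names_per_id[names_per_id > 1]
print(ambiguous.empty)
```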


In [48]:
#Reduce the dataframe to only order_id and product_name
reduce_sale = sale[['order_id', 'product_name']]
reduce_sale.head()

Unnamed: 0,order_id,product_name
0,2,Organic Egg Whites
1,26,Organic Egg Whites
2,120,Organic Egg Whites
3,327,Organic Egg Whites
4,390,Organic Egg Whites


In [50]:
#This is what order_id 2 contained
reduce_sale.loc[reduce_sale['order_id']==2]

Unnamed: 0,order_id,product_name
0,2,Organic Egg Whites
19400,2,Michigan Organic Kale
86849,2,Garlic Powder
93148,2,Coconut Butter
93892,2,Natural Sweetener
94461,2,Carrots
167197,2,Original Unflavored Gelatine Mix
168386,2,All Natural No Stir Creamy Almond Butter
170810,2,Classic Blend Cole Slaw


In [79]:
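#Group by order_id and join each order's product names into one comma-separated string
#Product names that contain a comma themselves will be broken apart by the later split(",")
#This takes a bit of time to run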

sale_list= reduce_sale.groupby("order_id")["product_name"].agg(",".join)
sale_list.head(5)

order_id
2    Organic Egg Whites,Michigan Organic Kale,Garli...
3    Total 2% with Strawberry Lowfat Greek Strained...
4    Plain Pre-Sliced Bagels,Honey/Lemon Cough Drop...
5    Bag of Organic Bananas,Just Crisp, Parmesan,Fr...
6    Cleanse,Dryer Sheets Geranium Scent,Clean Day ...
Name: product_name, dtype: object

In [81]:
#It matches with the reduce_sale dataframe

sale_list.values[0]

'Organic Egg Whites,Michigan Organic Kale,Garlic Powder,Coconut Butter,Natural Sweetener,Carrots,Original Unflavored Gelatine Mix,All Natural No Stir Creamy Almond Butter,Classic Blend Cole Slaw'
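Joining names with a comma is fragile when a product name itself contains a comma, as "Just Crisp, Parmesan" in order 5 appears to. A minimal alternative sketch, assuming the same two-column layout as `reduce_sale`, collects each order's products into a list so no string has to be split later. The names `reduce_sale_demo`, `baskets`, and `transactions_demo` are hypothetical.

```python
import pandas as pd

# Small stand-in for reduce_sale; one product name contains a comma
reduce_sale_demo = pd.DataFrame({
    "order_id": [5, 5, 6],
    "product_name": ["Bag of Organic Bananas", "Just Crisp, Parmesan", "Cleanse"],
})

# Collect each order's products into a Python list instead of a joined string
baskets = reduce_sale_demo.groupby("order_id")["product_name"].agg(list)

# List of lists, one inner list per order
transactions_demo = baskets.tolist()
print(transactions_demo)
```

Because the names are never joined, the comma inside a product name stays part of that name.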

In [91]:
#Back into DataFrame
sale_list = pd.DataFrame(sale_list)
sale_list.head()

Unnamed: 0_level_0,product_name
order_id,Unnamed: 1_level_1
2,"Organic Egg Whites,Michigan Organic Kale,Garli..."
3,Total 2% with Strawberry Lowfat Greek Strained...
4,"Plain Pre-Sliced Bagels,Honey/Lemon Cough Drop..."
5,"Bag of Organic Bananas,Just Crisp, Parmesan,Fr..."
6,"Cleanse,Dryer Sheets Geranium Scent,Clean Day ..."


In [105]:
split_data = sale_list["product_name"].str.split(",")
data = split_data.to_list()
final_sale = pd.DataFrame(data)

In [107]:
#All products in the row are in one order_id
final_sale.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,140,141,142,143,144,145,146,147,148,149
0,Organic Egg Whites,Michigan Organic Kale,Garlic Powder,Coconut Butter,Natural Sweetener,Carrots,Original Unflavored Gelatine Mix,All Natural No Stir Creamy Almond Butter,Classic Blend Cole Slaw,,...,,,,,,,,,,
1,Total 2% with Strawberry Lowfat Greek Strained...,Unsweetened Almondmilk,Lemons,Organic Baby Spinach,Unsweetened Chocolate Almond Breeze Almond Milk,Organic Ginger Root,Air Chilled Organic Boneless Skinless Chicken ...,Organic Ezekiel 49 Bread Cinnamon Raisin,,,...,,,,,,,,,,
2,Plain Pre-Sliced Bagels,Honey/Lemon Cough Drops,Chewy 25% Low Sugar Chocolate Chip Granola,Oats & Chocolate Chewy Bars,Kellogg's Nutri-Grain Apple Cinnamon Cereal,Nutri-Grain Soft Baked Strawberry Cereal Break...,Kellogg's Nutri-Grain Blueberry Cereal,Tiny Twists Pretzels,Traditional Snack Mix,Goldfish Cheddar Baked Snack Crackers,...,,,,,,,,,,
3,Bag of Organic Bananas,Just Crisp,Parmesan,Fresh Fruit Salad,Organic Raspberries,2% Reduced Fat Milk,Sensitive Toilet Paper,Natural Artesian Water,Mini & Mobile,Matzos,...,,,,,,,,,,
4,Cleanse,Dryer Sheets Geranium Scent,Clean Day Lavender Scent Room Freshener Spray,,,,,,,,...,,,,,,,,,,


In [108]:
#Replace the None with 0
final_sale.fillna(0, inplace=True)

In [109]:
final_sale.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,140,141,142,143,144,145,146,147,148,149
0,Organic Egg Whites,Michigan Organic Kale,Garlic Powder,Coconut Butter,Natural Sweetener,Carrots,Original Unflavored Gelatine Mix,All Natural No Stir Creamy Almond Butter,Classic Blend Cole Slaw,0,...,0,0,0,0,0,0,0,0,0,0
1,Total 2% with Strawberry Lowfat Greek Strained...,Unsweetened Almondmilk,Lemons,Organic Baby Spinach,Unsweetened Chocolate Almond Breeze Almond Milk,Organic Ginger Root,Air Chilled Organic Boneless Skinless Chicken ...,Organic Ezekiel 49 Bread Cinnamon Raisin,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Plain Pre-Sliced Bagels,Honey/Lemon Cough Drops,Chewy 25% Low Sugar Chocolate Chip Granola,Oats & Chocolate Chewy Bars,Kellogg's Nutri-Grain Apple Cinnamon Cereal,Nutri-Grain Soft Baked Strawberry Cereal Break...,Kellogg's Nutri-Grain Blueberry Cereal,Tiny Twists Pretzels,Traditional Snack Mix,Goldfish Cheddar Baked Snack Crackers,...,0,0,0,0,0,0,0,0,0,0
3,Bag of Organic Bananas,Just Crisp,Parmesan,Fresh Fruit Salad,Organic Raspberries,2% Reduced Fat Milk,Sensitive Toilet Paper,Natural Artesian Water,Mini & Mobile,Matzos,...,0,0,0,0,0,0,0,0,0,0
4,Cleanse,Dryer Sheets Geranium Scent,Clean Day Lavender Scent Room Freshener Spray,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [110]:
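#Convert into a list of lists for apriori, skipping the 0 placeholders
#Only the first 20 columns of each row are read
values = final_sale.values  #convert once; calling .values inside the loop is slow
transactions = []
for i in range(len(values)):
    transactions.append([str(values[i, j]) for j in range(0, 20) if values[i, j] != 0])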

KeyboardInterrupt: 

In [113]:
data[0]

['Organic Egg Whites',
 'Michigan Organic Kale',
 'Garlic Powder',
 'Coconut Butter',
 'Natural Sweetener',
 'Carrots',
 'Original Unflavored Gelatine Mix',
 'All Natural No Stir Creamy Almond Butter',
 'Classic Blend Cole Slaw']

In [138]:
#Rules were created from these parameters
rules = apriori(data, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)

In [139]:
#Generated apriori object
rules

<generator object apriori at 0x7fd379722f50>
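To make the `min_support` threshold concrete, the sketch below counts item pairs by brute force with `Counter` and `combinations`, both imported at the top of the notebook. It covers pairs only and does not perform the level-wise candidate pruning that Apriori uses; the transactions and the 0.5 threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy transactions (made up for illustration)
transactions_demo = [
    ["Bananas", "Strawberries", "Spinach"],
    ["Bananas", "Strawberries"],
    ["Bananas", "Spinach"],
    ["Milk"],
]

pair_counts = Counter()
for basket in transactions_demo:
    # sorted(set(...)) removes duplicates and fixes the order within each pair
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

n = len(transactions_demo)
min_support = 0.5  # illustrative threshold, far higher than the 0.003 used above

# Keep the pairs whose support (share of transactions) meets the threshold
frequent_pairs = {
    pair: count / n
    for pair, count in pair_counts.items()
    if count / n >= min_support
}
print(frequent_pairs)
```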

In [140]:
results = list(rules)
results

[RelationRecord(items=frozenset({' 2% Milkfat', 'Organic Milk Reduced Fat'}), support=0.003961897107009482, ordered_statistics=[OrderedStatistic(items_base=frozenset({' 2% Milkfat'}), items_add=frozenset({'Organic Milk Reduced Fat'}), confidence=0.6110924531017609, lift=154.24238353404021), OrderedStatistic(items_base=frozenset({'Organic Milk Reduced Fat'}), items_add=frozenset({' 2% Milkfat'}), confidence=1.0, lift=154.24238353404021)]),
 RelationRecord(items=frozenset({' Baby Bok Choy', ' Sweet Baby Kale'}), support=0.004532681529664926, ordered_statistics=[OrderedStatistic(items_base=frozenset({' Baby Bok Choy'}), items_add=frozenset({' Sweet Baby Kale'}), confidence=1.0, lift=220.61995608015374), OrderedStatistic(items_base=frozenset({' Sweet Baby Kale'}), items_add=frozenset({' Baby Bok Choy'}), confidence=1.0, lift=220.61995608015374)]),
 RelationRecord(items=frozenset({' Baby Bok Choy', 'Super Spinach! Baby Spinach'}), support=0.004532681529664926, ordered_statistics=[OrderedSta

In [141]:
#Generate dataframe
df_results = pd.DataFrame(results)

In [142]:
df_results.head()

Unnamed: 0,items,support,ordered_statistics
0,"( 2% Milkfat, Organic Milk Reduced Fat)",0.003962,"[(( 2% Milkfat), (Organic Milk Reduced Fat), 0..."
1,"( Baby Bok Choy, Sweet Baby Kale)",0.004533,"[(( Baby Bok Choy), ( Sweet Baby Kale), 1.0, 2..."
2,"( Baby Bok Choy, Super Spinach! Baby Spinach)",0.004533,"[(( Baby Bok Choy), (Super Spinach! Baby Spina..."
3,"( Bag, Clementines)",0.012405,"[(( Bag), (Clementines), 0.8679056406685236, 3..."
4,"( Butter, Bibb) Lettuce)",0.003922,"[(( Bibb) Lettuce), ( Butter), 1.0, 245.879464..."


In [144]:
#Keep the support numbers
support = df_results.support

In [146]:
#Extract the antecedent, consequent, confidence and lift from the first rule in ordered_statistics
one = []
two = []
three = []
four = []

for i in range(df_results.shape[0]):
    single_list = df_results["ordered_statistics"][i][0]
    one.append(list(single_list[0]))
    two.append(list(single_list[1]))
    three.append(single_list[2])
    four.append(single_list[3])


In [147]:
ante = pd.DataFrame(one)
cons = pd.DataFrame(two)
confidence = pd.DataFrame(three, columns = ["Confidence"])
lift = pd.DataFrame(four, columns = ["Lift"])
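The loop above reads only the first entry of `ordered_statistics` for each record, so the reverse direction of a rule is not kept. A possible way to keep every rule is to emit one row per entry. The sketch below uses `namedtuple` stand-ins that mirror the field names printed in the `RelationRecord` output; `results_demo` and its values are made up.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the printed results
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

results_demo = [
    RelationRecord(
        items=frozenset({"A", "B"}),
        support=0.4,
        ordered_statistics=[
            OrderedStatistic(frozenset({"A"}), frozenset({"B"}), 1.0, 2.0),
            OrderedStatistic(frozenset({"B"}), frozenset({"A"}), 0.8, 2.0),
        ],
    ),
]

# One row per rule, covering every entry of ordered_statistics
rows = [
    {
        "antecedent": ", ".join(sorted(stat.items_base)),
        "consequent": ", ".join(sorted(stat.items_add)),
        "support": record.support,
        "confidence": stat.confidence,
        "lift": stat.lift,
    }
    for record in results_demo
    for stat in record.ordered_statistics
]
print(rows)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame` to get named columns.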

In [152]:
#Combine the antecedents, consequents, support, confidence and lift into one dataframe
df_final = pd.concat([ante,cons,support,confidence,lift],axis=1)
df_final

Unnamed: 0,0,0.1,1,support,Confidence,Lift
0,2% Milkfat,Organic Milk Reduced Fat,,0.003962,0.611092,154.242384
1,Baby Bok Choy,Sweet Baby Kale,,0.004533,1.0,220.619956
2,Baby Bok Choy,Super Spinach! Baby Spinach,,0.004533,1.0,220.619956
3,Bag,Clementines,,0.012405,0.867906,39.486673
4,Bibb) Lettuce,Butter,,0.003922,1.0,245.879465
5,Bibb) Lettuce,Organic Butterhead (Boston,,0.003775,0.962566,254.966611
6,Bunch,Flat Parsley,,0.004161,0.294022,70.656571
7,Bunch,Organic Red Radish,,0.008469,0.598374,70.656571
8,Butter,Organic Butterhead (Boston,,0.003775,0.92826,245.879465
9,Coconut,Strained Low-Fat,,0.003771,0.98369,260.862869


In [153]:
#Replace the None with blanks
df_final.fillna(value=" ", inplace=True)

In [154]:
#Add column names
df_final.columns = ["Item1", "Item2","Item3", "Support", "Confidence", "Lift"]

In [155]:
df_final.head()

Unnamed: 0,Item1,Item2,Item3,Support,Confidence,Lift
0,2% Milkfat,Organic Milk Reduced Fat,,0.003962,0.611092,154.242384
1,Baby Bok Choy,Sweet Baby Kale,,0.004533,1.0,220.619956
2,Baby Bok Choy,Super Spinach! Baby Spinach,,0.004533,1.0,220.619956
3,Bag,Clementines,,0.012405,0.867906,39.486673
4,Bibb) Lettuce,Butter,,0.003922,1.0,245.879465


In [158]:
#Sort by highest confidence levels

df_final.sort_values("Confidence", ascending=False)

Unnamed: 0,Item1,Item2,Item3,Support,Confidence,Lift
18,Sweet Baby Kale,Super Spinach! Baby Spinach,,0.004533,1.0,220.619956
2,Baby Bok Choy,Super Spinach! Baby Spinach,,0.004533,1.0,220.619956
4,Bibb) Lettuce,Butter,,0.003922,1.0,245.879465
1,Baby Bok Choy,Sweet Baby Kale,,0.004533,1.0,220.619956
48,Baby Bok Choy,Super Spinach! Baby Spinach,Sweet Baby Kale,0.004533,1.0,220.619956
15,Strained Low-Fat,Yogurt,,0.003771,1.0,126.525011
11,Grade A,Cage Free Brown Eggs-Large,,0.003865,0.998473,258.368078
10,Coconut,Yogurt,,0.003771,0.983772,124.4717
51,Coconut,Strained Low-Fat,Yogurt,0.003771,0.98369,260.862869
9,Coconut,Strained Low-Fat,,0.003771,0.98369,260.862869
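The relationship between the three columns can be checked by hand. Since lift is confidence divided by the consequent's support, and confidence is the pair's support divided by the antecedent's support, the individual item supports can be recovered from any row. The sketch below does this for the first `RelationRecord` printed earlier; the variable names are hypothetical.

```python
# Values copied from the first RelationRecord in the printed results
pair_support = 0.003961897107009482  # support of the two items together
confidence = 0.6110924531017609      # ' 2% Milkfat' -> 'Organic Milk Reduced Fat'
lift = 154.24238353404021

# lift = confidence / support(consequent)
consequent_support = confidence / lift

# confidence = support(pair) / support(antecedent)
antecedent_support = pair_support / confidence

print(consequent_support, antecedent_support)
```

The recovered consequent support comes out essentially equal to the pair's support, which agrees with the confidence of 1.0 reported for the reverse rule. That pattern would be expected if the two strings are pieces of a single product name that was split at a comma.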


### Findings
- The majority of customers buy fruits and vegetables
- If a customer has fruit in their basket, they will most likely buy more fruit