# Objective

Imagine 10000 receipts sitting on your table. Each receipt represents a transaction listing the items that were purchased. A receipt records what went into a customer’s basket, hence the name ‘Market Basket Analysis’.

That is exactly what the Groceries Data Set contains: a collection of receipts, with each line representing one receipt and the items purchased on it. Each line is called a transaction, and each column in a row holds one item. The data set is attached.

Your assignment is to use Python to mine the data for association rules. Report the support, confidence and lift of the rules, along with your top 10 rules by lift.
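For reference, these are the standard definitions of the three measures for a rule $A \Rightarrow C$ mined from $N$ transactions, where $\mathrm{count}(X)$ is the number of transactions that contain every item in the itemset $X$:

$$
\begin{aligned}
\mathrm{support}(X) &= \frac{\mathrm{count}(X)}{N} \\
\mathrm{support}(A \Rightarrow C) &= \mathrm{support}(A \cup C) \\
\mathrm{confidence}(A \Rightarrow C) &= \frac{\mathrm{support}(A \cup C)}{\mathrm{support}(A)} \\
\mathrm{lift}(A \Rightarrow C) &= \frac{\mathrm{confidence}(A \Rightarrow C)}{\mathrm{support}(C)} = \frac{\mathrm{support}(A \cup C)}{\mathrm{support}(A)\,\mathrm{support}(C)}
\end{aligned}
$$

Confidence estimates $P(C \mid A)$, and lift compares it with the baseline $P(C)$: a lift of 1 is what independent items would give, and a lift above 1 means $A$ and $C$ are bought together more often than chance. The last form shows that lift is symmetric in $A$ and $C$.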

## Load Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

## Read Data

The raw data is in a format where each row is a list of items. To perform a market basket analysis, we need to transform it into a table where each column is an item and each row is a transaction, with boolean values (True/False or 0/1) marking which items the transaction contains. That is the input format expected by `apriori()`, whose output in turn feeds `association_rules()`; both come from the [mlxtend library](https://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/), which also provides `TransactionEncoder` for the conversion.
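The transformation itself is simple. Below is a minimal pure-Python sketch of the idea; `one_hot` is a hypothetical helper, not part of mlxtend. Like the encoded table further down, it uses the sorted distinct items as columns and stores one boolean per item per basket.

```python
def one_hot(transactions):
    """Turn a list of baskets into (columns, rows) with one boolean per distinct item."""
    # Every distinct item becomes a column, in sorted order
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        items = set(basket)
        # True where the basket contains the column's item
        rows.append([col in items for col in columns])
    return columns, rows

baskets = [["citrus fruit", "margarine"], ["yogurt", "coffee"], ["yogurt"]]
columns, rows = one_hot(baskets)
print(columns)  # ['citrus fruit', 'coffee', 'margarine', 'yogurt']
print(rows[2])  # [False, False, False, True]
```

Each row has the same length no matter how many items the basket held, which is what lets the apriori step work column-wise.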

```python
# The file has no header row, so header=None keeps the first receipt as data
data = pd.read_csv("GroceryDataSet.csv", header=None)
data.head(10)
```

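|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | citrus fruit | semi-finished bread | margarine | ready soups |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | tropical fruit | yogurt | coffee |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | whole milk |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | pip fruit | yogurt | cream cheese | meat spreads |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | other vegetables | whole milk | condensed milk | long life bakery product |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 5 | whole milk | butter | yogurt | rice | abrasive cleaner |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 6 | rolls/buns |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 7 | other vegetables | UHT-milk | rolls/buns | bottled beer | liquor (appetizer) |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 8 | pot plants |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 9 | whole milk | cereals |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |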


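```python
# One list per receipt; receipts shorter than the widest row are padded with NaN
transactions = data.values.tolist()

# Drop the NaN padding so each list holds only the purchased items
transactions = [[item for item in transaction if pd.notna(item)] for transaction in transactions]
```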
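`pd.read_csv` with `header=None` takes its column count from the first line, so it may raise a tokenizing error if the file is not padded to a fixed width and a later receipt is longer than the first. A sketch of an alternative that builds the same list of lists with the standard `csv` module, assuming a plain comma-separated file with one receipt per line; `read_baskets` is a hypothetical helper, and the in-memory `sample` stands in for the real file.

```python
import csv
import io

def read_baskets(lines):
    """Parse one receipt per line into a list of item lists (hypothetical helper)."""
    baskets = []
    for row in csv.reader(lines):
        # Drop the empty fields produced by trailing or padding commas
        basket = [field.strip() for field in row if field.strip()]
        if basket:  # skip blank lines
            baskets.append(basket)
    return baskets

# Stand-in for the real file; with the data set use open("GroceryDataSet.csv", newline="")
sample = io.StringIO(
    "citrus fruit,semi-finished bread,margarine,ready soups,,\n"
    "whole milk,,,,,\n"
    "other vegetables,UHT-milk,rolls/buns,bottled beer,liquor (appetizer)\n"
)
baskets = read_baskets(sample)
print(baskets[1])  # ['whole milk']
```

`csv.reader` accepts rows with different field counts, so no padding is required.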

```python
from mlxtend.preprocessing import TransactionEncoder

# One-hot encode: one boolean column per distinct item, one row per receipt
te = TransactionEncoder()
te_data = te.fit(transactions).transform(transactions)

df = pd.DataFrame(te_data, columns=te.columns_)
df
```

|   | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | baby food | bags | baking powder | bathroom cleaner | beef | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9830 | False | False | False | False | False | False | False | False | False | True | ... | False | False | False | True | False | False | False | True | False | False |
| 9831 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 9832 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | True | False |
| 9833 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


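```python
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Keep itemsets that appear in at least 3% of all receipts
frequent_itemsets = apriori(df, min_support=0.03, use_colnames=True)

# Build rules from those itemsets, filtered on lift
# (default min_threshold=0.8, so rules with lift >= 0.8 are kept)
rules = association_rules(frequent_itemsets, metric='lift')

rules.head()
```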
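To make `min_support=0.03` concrete: Apriori keeps only itemsets found in at least 3% of receipts, and it can grow them level by level because every subset of a frequent itemset must itself be frequent. A simplified pure-Python sketch of that idea follows; `apriori_sketch` is a hypothetical teaching helper that rescans every basket for each candidate and makes no attempt at efficiency, whereas mlxtend's `apriori` works on the boolean DataFrame built above. It returns a dict from itemset to support, similar in spirit to the `frequent_itemsets` result.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items
    singles = (frozenset([item]) for item in {i for b in baskets for i in b})
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = [c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))]
        # Count support only for the surviving candidates
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    ["whole milk", "yogurt"],
    ["whole milk", "yogurt", "rolls/buns"],
    ["whole milk", "rolls/buns"],
    ["yogurt"],
]
result = apriori_sketch(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here `{yogurt, rolls/buns}` (support 0.25) falls below the threshold, so the three-item candidate is pruned without ever being counted. That pruning is what keeps a low threshold tractable on the full data set.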


|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (bottled water) | (whole milk) | 0.110524 | 0.255516 | 0.034367 | 0.310948 | 1.21694 | 0.006126 | 1.080446 | 0.200417 |
| 1 | (whole milk) | (bottled water) | 0.255516 | 0.110524 | 0.034367 | 0.134501 | 1.21694 | 0.006126 | 1.027703 | 0.23945 |
| 2 | (whole milk) | (citrus fruit) | 0.255516 | 0.082766 | 0.030503 | 0.119379 | 1.442377 | 0.009355 | 1.041577 | 0.411963 |
| 3 | (citrus fruit) | (whole milk) | 0.082766 | 0.255516 | 0.030503 | 0.36855 | 1.442377 | 0.009355 | 1.179008 | 0.334375 |
| 4 | (rolls/buns) | (other vegetables) | 0.183935 | 0.193493 | 0.042603 | 0.23162 | 1.197047 | 0.007013 | 1.04962 | 0.201713 |
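Every metric column in this table follows from three supports. A minimal sketch using the standard formulas for confidence, lift, leverage and conviction; `rule_metrics` and `generate_rules` are hypothetical helpers, not mlxtend functions. `generate_rules` enumerates every antecedent/consequent split of each frequent itemset, and the last lines recompute row 0 from its printed, rounded supports.

```python
from itertools import combinations

def rule_metrics(s_ac, s_a, s_c):
    """Score a rule A -> C from support(A u C), support(A) and support(C)."""
    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    # Conviction is unbounded when the rule always holds (confidence == 1)
    conviction = float("inf") if confidence == 1 else (1 - s_c) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}

def generate_rules(supports, min_lift):
    """Split every frequent itemset into antecedent -> consequent; keep rules with lift >= min_lift.

    `supports` maps frozenset itemsets to their support and must also contain
    every subset of each itemset, which Apriori output always does.
    """
    found = []
    for itemset, s_ac in supports.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                metrics = rule_metrics(s_ac, supports[antecedent], supports[consequent])
                if metrics["lift"] >= min_lift:
                    found.append((antecedent, consequent, s_ac, metrics))
    return found

# Recompute row 0, (bottled water) -> (whole milk), from its printed supports
m = rule_metrics(s_ac=0.034367, s_a=0.110524, s_c=0.255516)
print(round(m["confidence"], 4), round(m["lift"], 4))  # 0.3109 1.2169
```

Because the inputs are rounded to six decimals, the results match the table to about four decimal places. Swapping antecedent and consequent leaves support, lift and leverage unchanged while confidence and conviction move, which is why rows 0 and 1 (and rows 2 and 3) share the same lift.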


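```python
# Sort every rule by lift, highest first
rules_sorted = rules.sort_values(by='lift', ascending=False)

# A -> B and B -> A share support and lift, so keep one direction per item pair:
# key each rule by its full itemset and drop repeats (more robust than taking every 2nd row)
pair_key = pd.Series([a | c for a, c in zip(rules_sorted['antecedents'], rules_sorted['consequents'])],
                     index=rules_sorted.index)
top_10_rules = (rules_sorted.loc[~pair_key.duplicated(), ['antecedents', 'consequents', 'support', 'confidence', 'lift']]
                .head(10).reset_index(drop=True))

top_10_rules
```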

|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (other vegetables) | (root vegetables) | 0.047382 | 0.244877 | 2.246605 |
| 1 | (sausage) | (rolls/buns) | 0.030605 | 0.325758 | 1.771048 |
| 2 | (tropical fruit) | (other vegetables) | 0.035892 | 0.342054 | 1.76779 |
| 3 | (whipped/sour cream) | (whole milk) | 0.032232 | 0.449645 | 1.759754 |
| 4 | (whole milk) | (root vegetables) | 0.048907 | 0.191405 | 1.756031 |
| 5 | (yogurt) | (other vegetables) | 0.043416 | 0.311224 | 1.608457 |
| 6 | (tropical fruit) | (whole milk) | 0.042298 | 0.403101 | 1.577595 |
| 7 | (yogurt) | (whole milk) | 0.056024 | 0.401603 | 1.571735 |
| 8 | (whole milk) | (pip fruit) | 0.030097 | 0.117788 | 1.557043 |
| 9 | (other vegetables) | (whole milk) | 0.074835 | 0.386758 | 1.513634 |


Overall, customers who buy other vegetables are about 2.25 times as likely as the average customer to also put root vegetables in their basket. Similarly, customers who put sausage in their basket are 1.77 times as likely as the average customer to also buy rolls/buns.
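Since lift is confidence divided by the consequent's overall support, the baseline that "2.25 times" compares against can be recovered from the top-10 table. A small worked example; `baseline_rate` is a hypothetical helper and the inputs are the rounded values from row 0 of that table.

```python
def baseline_rate(confidence, lift):
    """Share of all baskets that contain the consequent: support(C) = confidence / lift."""
    return confidence / lift

# Row 0 of the top-10 table: (other vegetables) -> (root vegetables)
support, confidence, lift = 0.047382, 0.244877, 2.246605
base = baseline_rate(confidence, lift)
print(f"root vegetables, all baskets: {base:.1%}")                          # 10.9%
print(f"root vegetables, baskets with other vegetables: {confidence:.1%}")  # 24.5%

# The reverse rule (root vegetables) -> (other vegetables) divides by support(root vegetables)
reverse_confidence = support / base
print(f"other vegetables, baskets with root vegetables: {reverse_confidence:.1%}")  # 43.5%
```

The two directions share the same lift but not the same confidence, so it can be worth checking both before acting on a rule.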

These association rules suggest that the store's layout designer should consider placing these items close together, or at least making them easy for customers to find on their way to the checkout.