# Market Basket Analysis

You are again provided with a dataset, MarketBasketAnalysis_Data.sas7bdat, in the Datasets folder.

This is a SAS dataset (note the "sas7bdat" file extension).

The dataset again contains transactions. What rules can you mine from the data?

Which rules have the greatest lift?

In [25]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

In [13]:
df = pd.read_sas("Datasets/MarketBasketAnalysis_Data.sas7bdat")

In [14]:
df.head()

,Quantity,Transaction,Store,Product
0,1.0,12359.0,2.0,b'Candy Bar'
1,2.0,12362.0,9.0,b'Pain Reliever'
2,2.0,12362.0,9.0,b'Pain Reliever'
3,1.0,12365.0,5.0,b'Toothpaste'
4,2.0,12371.0,2.0,b'Bow'


In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 459258 entries, 0 to 459257
Data columns (total 4 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   Quantity     459258 non-null  float64
 1   Transaction  459258 non-null  float64
 2   Store        459258 non-null  float64
 3   Product      459258 non-null  object 
dtypes: float64(3), object(1)
memory usage: 14.0+ MB


In [16]:
# read_sas returns string columns as bytes (b'Candy Bar'); decode them to str.
# Assumes the product names are UTF-8 (or plain ASCII) encoded.
df['Product'] = df['Product'].str.decode('utf-8')
df.head()


,Quantity,Transaction,Store,Product
0,1.0,12359.0,2.0,Candy Bar
1,2.0,12362.0,9.0,Pain Reliever
2,2.0,12362.0,9.0,Pain Reliever
3,1.0,12365.0,5.0,Toothpaste
4,2.0,12371.0,2.0,Bow


In [20]:
# Pivot to one row per transaction and one column per product;
# each cell holds the total quantity bought (0 if the product is absent).
basket = (df
          .groupby(['Transaction', 'Product'])['Quantity']
          .sum()
          .unstack(fill_value=0))

In [21]:
basket.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
12359.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12362.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12365.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
12371.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12380.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
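
The pivot is easier to follow on a tiny example. The sketch below uses hypothetical toy data (the names `toy` and `toy_basket` are not part of the notebook) in the same long format as `df`. Repeated line items for the same product within one transaction are summed, which is why transaction 12362 above shows 4.0 for Pain Reliever.

```python
import pandas as pd

# Hypothetical toy data: one row per line item, in the same long format as df.
toy = pd.DataFrame({
    "Transaction": [1, 1, 2, 2, 2],
    "Product": ["Soap", "Pens", "Soap", "Soap", "Bow"],
    "Quantity": [1, 2, 1, 3, 1],
})

# Sum quantities per (transaction, product), then spread products into columns.
# Products missing from a transaction are filled with 0.
toy_basket = (toy
              .groupby(["Transaction", "Product"])["Quantity"]
              .sum()
              .unstack(fill_value=0))

print(toy_basket)
```

Transaction 2 contains two Soap rows (1 and 3 units), so its Soap cell holds their sum.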


In [22]:
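# We only care whether a product appears in a transaction, not how many units
# were bought, so convert every quantity to 0 (absent) or 1 (present).
# A vectorized comparison avoids applymap (deprecated in recent pandas) and
# also handles fractional quantities, for which the old helper returned None.
basket_sets = (basket > 0).astype(int)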

In [23]:
basket_sets.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
12359.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
12362.0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
12365.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
12371.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
12380.0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0


In [26]:
# Keep itemsets of at most 4 products that occur in at least 1% of transactions.
frequent_itemsets = apriori(basket_sets, min_support=0.01, use_colnames=True, max_len=4)
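
The `min_support=0.01` threshold refers to support, the fraction of transactions that contain every item of an itemset. As a rough illustration of that definition only (not of how mlxtend computes it internally), here is a minimal pure-Python sketch; the `transactions` list and the `support` helper are hypothetical names introduced for this example.

```python
# Hypothetical toy transactions, each a set of product names.
transactions = [
    {"Perfume", "Toothbrush"},
    {"Perfume"},
    {"Toothbrush", "Bow"},
    {"Candy Bar"},
]


def support(itemset, transactions):
    """Return the fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


perfume_support = support({"Perfume"}, transactions)
pair_support = support({"Perfume", "Toothbrush"}, transactions)
print(perfume_support, pair_support)
```

With a threshold of 0.01, an itemset is kept only if it appears in at least 1 of every 100 transactions.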



In [27]:
# Generate rules with lift of at least 1, then list those with the highest lift.
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
rules.sort_values('lift', ascending=False).head(20)

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
19,(Perfume),(Toothbrush),0.08996,0.06735,0.02182,0.242552,3.60137,0.015761,1.231306
18,(Toothbrush),(Perfume),0.06735,0.08996,0.02182,0.323979,3.60137,0.015761,1.346172
0,(Toothbrush),(Bow),0.06735,0.054645,0.01134,0.168374,3.081236,0.00766,1.136755
1,(Bow),(Toothbrush),0.054645,0.06735,0.01134,0.207521,3.081236,0.00766,1.176877
23,(Greeting Cards),"(Candy Bar, Magazine)",0.146885,0.040535,0.016665,0.113456,2.798966,0.010711,1.082253
22,"(Candy Bar, Magazine)",(Greeting Cards),0.040535,0.146885,0.016665,0.411126,2.798966,0.010711,1.448723
45,"(Pencils, Toothpaste)",(Candy Bar),0.02456,0.171005,0.01139,0.463762,2.71198,0.00719,1.545947
48,(Candy Bar),"(Pencils, Toothpaste)",0.171005,0.02456,0.01139,0.066606,2.71198,0.00719,1.045047
24,(Candy Bar),"(Greeting Cards, Magazine)",0.171005,0.036335,0.016665,0.097453,2.682078,0.010452,1.067718
21,"(Greeting Cards, Magazine)",(Candy Bar),0.036335,0.171005,0.016665,0.458649,2.682078,0.010452,1.531344
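
To see where the metric columns come from, the sketch below recomputes them by hand for the first row of the table, (Perfume) -> (Toothbrush), using the standard definitions of confidence, lift, leverage and conviction. The three input values are copied from that row; the variable names are chosen for this example only.

```python
# Values copied from the first row of the table: (Perfume) -> (Toothbrush).
antecedent_support = 0.08996   # share of transactions containing Perfume
consequent_support = 0.06735   # share of transactions containing Toothbrush
support = 0.02182              # share of transactions containing both

# Confidence: how often the consequent appears when the antecedent does.
confidence = support / antecedent_support
# Lift: confidence relative to the consequent's baseline frequency.
lift = confidence / consequent_support
# Leverage: observed joint support minus the support expected under independence.
leverage = support - antecedent_support * consequent_support
# Conviction: ratio of expected to observed frequency of the rule failing.
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 6), round(lift, 5), round(leverage, 6), round(conviction, 6))
```

A lift of about 3.6 means Perfume and Toothbrush are bought together roughly 3.6 times as often as would be expected if the two purchases were independent. Lift is symmetric in antecedent and consequent, which is why each rule and its reverse share the same lift value in the table.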
