In [52]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Association Rules for a Store Dataset

In this case study, we explore how association rules can be used to analyze which items are usually purchased together.

For background on the Apriori algorithm and association rules, see the mlxtend user guide:
https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/
https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/

## Load Data

We will use a dataset of transactions from a store. You can get the dataset here:
https://gist.githubusercontent.com/Harsh-Git-Hub/2979ec48043928ad9033d8469928e751/raw/72de943e040b8bd0d087624b154d41b2ba9d9b60/retail_dataset.csv

In [53]:
# Load the dataset and show the first five transactions
dt = pd.read_csv('https://gist.githubusercontent.com/Harsh-Git-Hub/2979ec48043928ad9033d8469928e751/raw/72de943e040b8bd0d087624b154d41b2ba9d9b60/retail_dataset.csv')
dt.head()

Unnamed: 0,0,1,2,3,4,5,6
0,Bread,Wine,Eggs,Meat,Cheese,Pencil,Diaper
1,Bread,Cheese,Meat,Diaper,Wine,Milk,Pencil
2,Cheese,Meat,Eggs,Milk,Wine,,
3,Cheese,Meat,Eggs,Milk,Wine,,
4,Meat,Pencil,Wine,,,,


## Get the Set of Purchased Products

Collect the unique products that appear in the transactions. Shorter transactions leave their remaining columns empty, so `nan` also shows up in the result.

In [54]:
unique_items = set()
for i in range(0, len(dt)):
    for j in range(0, len(dt.columns)):
        unique_items.add(dt.values[i,j])
unique_items = list(unique_items)
print(unique_items)

['Bagel', 'Wine', 'Eggs', 'Milk', 'Pencil', 'Diaper', 'Cheese', nan, 'Bread', 'Meat']
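The loop above keeps `nan` in the list. A minimal sketch of one way to collect the unique products while skipping empty cells is shown below. The `toy` dataframe and its values are made up for illustration; in the notebook you would use `dt` instead.

```python
import pandas as pd

# Hypothetical stand-in for `dt`: three transactions padded with missing values
toy = pd.DataFrame([
    ["Bread", "Wine", None],
    ["Milk", None, None],
    ["Bread", "Milk", "Wine"],
])

# Flatten all cells and keep only the ones that are not missing
unique_items = sorted(
    {item for row in toy.to_numpy() for item in row if pd.notna(item)}
)
print(unique_items)
```

Sorting the result is optional; it only makes the printed order reproducible, since a plain `set` has no guaranteed order.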


## Preprocess Data

In this step, we transform the dataset into a one-hot encoding: one boolean column per product, with True wherever a transaction contains that product.

In [55]:
# Build one list of items per transaction.
# Note: str() turns empty cells (NaN) into the string 'nan'.
itemset = []
for i in range(0, len(dt)):
    itemset.append([str(dt.values[i,j]) for j in range(0, len(dt.columns))])
print(itemset)
# One-hot encode the transactions
te = TransactionEncoder()
te_ary = te.fit(itemset).transform(itemset)

[['Bread', 'Wine', 'Eggs', 'Meat', 'Cheese', 'Pencil', 'Diaper'], ['Bread', 'Cheese', 'Meat', 'Diaper', 'Wine', 'Milk', 'Pencil'], ['Cheese', 'Meat', 'Eggs', 'Milk', 'Wine', 'nan', 'nan'], ['Cheese', 'Meat', 'Eggs', 'Milk', 'Wine', 'nan', 'nan'], ['Meat', 'Pencil', 'Wine', 'nan', 'nan', 'nan', 'nan'], ['Eggs', 'Bread', 'Wine', 'Pencil', 'Milk', 'Diaper', 'Bagel'], ['Wine', 'Pencil', 'Eggs', 'Cheese', 'nan', 'nan', 'nan'], ['Bagel', 'Bread', 'Milk', 'Pencil', 'Diaper', 'nan', 'nan'], ['Bread', 'Diaper', 'Cheese', 'Milk', 'Wine', 'Eggs', 'nan'], ['Bagel', 'Wine', 'Diaper', 'Meat', 'Pencil', 'Eggs', 'Cheese'], ['Cheese', 'Meat', 'Eggs', 'Milk', 'Wine', 'nan', 'nan'], ['Bagel', 'Eggs', 'Meat', 'Bread', 'Diaper', 'Wine', 'Milk'], ['Bread', 'Diaper', 'Pencil', 'Bagel', 'Meat', 'nan', 'nan'], ['Bagel', 'Cheese', 'Milk', 'Meat', 'nan', 'nan', 'nan'], ['Bread', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan'], ['Pencil', 'Diaper', 'Bagel', 'nan', 'nan', 'nan', 'nan'], ['Meat', 'Bagel', 'Bread', 'nan',
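The `'nan'` strings in this output come from calling `str()` on empty cells. As an alternative sketch, the missing values can be filtered out while the transactions are built, so no `nan` column appears later. The example below uses a made-up `toy` dataframe and encodes it by hand with plain pandas to show the shape of the result. It is an illustration of the idea, not a description of how `TransactionEncoder` works internally.

```python
import pandas as pd

# Hypothetical stand-in for `dt`
toy = pd.DataFrame([
    ["Bread", "Wine", None],
    ["Milk", None, None],
    ["Bread", "Milk", "Wine"],
])

# One list per transaction, without the missing cells
transactions = [
    [item for item in row if pd.notna(item)] for row in toy.to_numpy()
]

# One boolean column per product: True if the transaction contains it
items = sorted({item for t in transactions for item in t})
encoded = pd.DataFrame(
    [[item in t for item in items] for t in transactions],
    columns=items,
)
print(encoded)
```

The cleaned `transactions` list could also be passed to `TransactionEncoder` in place of `itemset`, which would make the later drop of the `nan` column unnecessary.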

In [56]:
# create a new dataframe from the encoded features
itemset_te_encoded = pd.DataFrame(te_ary, columns=te.columns_)
# show the new dataframe
itemset_te_encoded.head()

Unnamed: 0,Bagel,Bread,Cheese,Diaper,Eggs,Meat,Milk,Pencil,Wine,nan
0,False,True,True,True,True,True,False,True,True,False
1,False,True,True,True,False,True,True,True,True,False
2,False,False,True,False,True,True,True,False,True,True
3,False,False,True,False,True,True,True,False,True,True
4,False,False,False,False,False,True,False,True,True,True


Because the empty cells were converted to the string 'nan', the encoded dataframe contains a `nan` column that does not correspond to any product. We drop that column and keep all the others.

In [57]:
# drop the nan column
itemset_te_encoded.drop(['nan'], axis=1, inplace=True)
itemset_te_encoded.head()

Unnamed: 0,Bagel,Bread,Cheese,Diaper,Eggs,Meat,Milk,Pencil,Wine
0,False,True,True,True,True,True,False,True,True
1,False,True,True,True,False,True,True,True,True
2,False,False,True,False,True,True,True,False,True
3,False,False,True,False,True,True,True,False,True
4,False,False,False,False,False,True,False,True,True


## Apriori Algorithm

We will use the Apriori algorithm to find the frequently purchased itemsets.
For this case study, we set min_support=0.2, so an itemset is kept only if it appears in at least 20% of the transactions.

In [58]:
frequent_itemsets = apriori(itemset_te_encoded, min_support=0.2, use_colnames=True)
frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.425397,(Bagel)
1,0.504762,(Bread)
2,0.501587,(Cheese)
3,0.406349,(Diaper)
4,0.438095,(Eggs)
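To make the `support` column concrete: the support of an itemset is the fraction of transactions that contain every item in it. On a one-hot encoded dataframe this is just a column mean. The sketch below uses a small made-up table (not the store data) and a made-up threshold of 0.5.

```python
import pandas as pd

# Hypothetical one-hot encoded transactions (4 transactions, 3 products)
encoded = pd.DataFrame({
    "Bread": [True, True, False, True],
    "Milk":  [True, False, False, True],
    "Wine":  [False, False, True, False],
})

# Support of each single item: share of rows where the column is True
item_support = encoded.mean()

# Support of the pair {Bread, Milk}: share of rows where both are True
pair_support = (encoded["Bread"] & encoded["Milk"]).mean()

# Items that would pass a minimum support of 0.5
frequent_items = item_support[item_support >= 0.5].index.tolist()

print(item_support)
print(pair_support, frequent_items)
```

The same calculation on `itemset_te_encoded` should reproduce the single-item rows of the `apriori` output above.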


Next, we generate association rules from the frequent itemsets, keeping only the rules whose confidence is at least 0.6.

In [59]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(Bagel),(Bread),0.425397,0.504762,0.279365,0.656716,1.301042,0.064641,1.44265,0.402687
1,(Eggs),(Cheese),0.438095,0.501587,0.298413,0.681159,1.358008,0.07867,1.563203,0.469167
2,(Cheese),(Meat),0.501587,0.47619,0.32381,0.64557,1.355696,0.084958,1.477891,0.526414
3,(Meat),(Cheese),0.47619,0.501587,0.32381,0.68,1.355696,0.084958,1.55754,0.500891
4,(Cheese),(Milk),0.501587,0.501587,0.304762,0.607595,1.211344,0.053172,1.270148,0.350053


Explain __antecedent support__, __consequent support__, __support__, __confidence__, __lift__, __leverage__ and __conviction__. The explanations below use the first rule in the table, where "Bagel" is the antecedent and "Bread" is the consequent.

Antecedent Support: About 43% of all transactions include a bagel (0.425).

Consequent Support: Roughly half of all transactions include bread (0.505).

Support: Almost 28% of all transactions contain both bagels and bread (0.279).

Confidence: Among the transactions that contain bagels, about 66% also contain bread (0.279 / 0.425 = 0.657).

Lift: Bagel buyers are about 1.3 times as likely to buy bread as shoppers in general (66% versus 50%). A lift above 1 means the two items appear together more often than they would if they were independent.

Leverage: Bagels and bread appear together in about 6.5 percentage points more transactions than expected under independence: 27.9% observed versus 0.425 × 0.505 ≈ 21.5% expected.

Conviction: If bagels and bread were independent, the rule would be wrong (bagels bought without bread) about 1.44 times as often as it actually is. A conviction above 1 means the consequent depends on the antecedent.
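As a check on these interpretations, the derived metrics can be recomputed from the three support values of the first rule. The sketch below uses the standard definitions of confidence, lift, leverage and conviction; the variable names are chosen here for illustration, and the inputs are the rounded values from the table, so the results match only to about four decimal places.

```python
# Support values copied from the first row of the rules table
antecedent_support = 0.425397   # P(Bagel)
consequent_support = 0.504762   # P(Bread)
support = 0.279365              # P(Bagel and Bread)

# Confidence: P(Bread | Bagel)
confidence = support / antecedent_support

# Lift: confidence relative to the baseline rate of the consequent
lift = confidence / consequent_support

# Leverage: observed joint support minus the support expected under independence
leverage = support - antecedent_support * consequent_support

# Conviction: expected rate of rule failures under independence
# divided by the observed rate of rule failures
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
```

Comparing the printed values with the `confidence`, `lift`, `leverage` and `conviction` columns of the table is a quick way to confirm how each metric is derived.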