# ASSOCIATION RULES - APRIORI - Grocery Store

# Business Problem

What are Association Rules?

Association rule learning is a rule-based machine learning technique used to find patterns (relationships, structures) in data.

You have probably come across its applications in forms such as "those who bought that product also bought this product", 
"those who viewed that ad also looked at these ads", "we created a playlist for you", or 
"recommended next video".

These are among the most common scenarios encountered in e-commerce data science and data mining work.

Many e-commerce companies and platforms around the world, such as Spotify, Amazon, and Netflix, use recommendation systems.

So what does association analysis actually do?

**Apriori Algorithm:**

It is one of the most widely used methods in this field.

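Association rule analysis relies on a few metrics:

X: an item, Y: another item, N: total number of transactions

* **Support:** the fraction of all transactions that contain both X and Y.

Support(X, Y) = Freq(X, Y) / N

* **Confidence:** the fraction of transactions containing X that also contain Y.

Confidence(X, Y) = Freq(X, Y) / Freq(X)

* **Lift:** how much more often X and Y occur together than would be expected if they were independent.

Lift(X, Y) = Support(X, Y) / (Support(X) * Support(Y))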
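
As a quick check of these definitions, the sketch below computes the three metrics for MILK and BREAD in plain Python. It uses the 20 baskets from the grocery dataset analysed in this notebook; the helper names `freq`, `support`, `confidence`, and `lift` are illustrative only and do not come from any library.

```python
# The 20 baskets of the grocery dataset, one comma-separated string each.
raw = [
    "MILK,BREAD,BISCUIT", "BREAD,MILK,BISCUIT,CORNFLAKES",
    "BREAD,TEA,BOURNVITA", "JAM,MAGGI,BREAD,MILK",
    "MAGGI,TEA,BISCUIT", "BREAD,TEA,BOURNVITA",
    "MAGGI,TEA,CORNFLAKES", "MAGGI,BREAD,TEA,BISCUIT",
    "JAM,MAGGI,BREAD,TEA", "BREAD,MILK",
    "COFFEE,COCK,BISCUIT,CORNFLAKES", "COFFEE,COCK,BISCUIT,CORNFLAKES",
    "COFFEE,SUGER,BOURNVITA", "BREAD,COFFEE,COCK",
    "BREAD,SUGER,BISCUIT", "COFFEE,SUGER,CORNFLAKES",
    "BREAD,SUGER,BOURNVITA", "BREAD,COFFEE,SUGER",
    "BREAD,COFFEE,SUGER", "TEA,MILK,COFFEE,CORNFLAKES",
]
transactions = [set(row.split(",")) for row in raw]
n = len(transactions)  # N in the formulas above


def freq(*items):
    """Number of transactions that contain every given item."""
    return sum(1 for t in transactions if set(items) <= t)


def support(*items):
    return freq(*items) / n


def confidence(x, y):
    return freq(x, y) / freq(x)


def lift(x, y):
    return support(x, y) / (support(x) * support(y))


print(support("MILK", "BREAD"))         # 0.2
print(confidence("MILK", "BREAD"))      # 0.8
print(round(lift("MILK", "BREAD"), 6))  # 1.230769
```

These values agree with the MILK and BREAD row of the rule table produced at the end of the notebook.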

In [2]:
# The mlxtend library must be installed to use its apriori implementation.
!pip install mlxtend



In [3]:
# Import Libraries
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

In [4]:
# load the dataset
df = pd.read_csv("GroceryStoreDataSet.csv", names = ["products"], header=None)
df

Unnamed: 0,products
0,"MILK,BREAD,BISCUIT"
1,"BREAD,MILK,BISCUIT,CORNFLAKES"
2,"BREAD,TEA,BOURNVITA"
3,"JAM,MAGGI,BREAD,MILK"
4,"MAGGI,TEA,BISCUIT"
5,"BREAD,TEA,BOURNVITA"
6,"MAGGI,TEA,CORNFLAKES"
7,"MAGGI,BREAD,TEA,BISCUIT"
8,"JAM,MAGGI,BREAD,TEA"
9,"BREAD,MILK"


In [5]:
# shape
df.shape

(20, 1)

In [6]:
df.values

array([['MILK,BREAD,BISCUIT'],
       ['BREAD,MILK,BISCUIT,CORNFLAKES'],
       ['BREAD,TEA,BOURNVITA'],
       ['JAM,MAGGI,BREAD,MILK'],
       ['MAGGI,TEA,BISCUIT'],
       ['BREAD,TEA,BOURNVITA'],
       ['MAGGI,TEA,CORNFLAKES'],
       ['MAGGI,BREAD,TEA,BISCUIT'],
       ['JAM,MAGGI,BREAD,TEA'],
       ['BREAD,MILK'],
       ['COFFEE,COCK,BISCUIT,CORNFLAKES'],
       ['COFFEE,COCK,BISCUIT,CORNFLAKES'],
       ['COFFEE,SUGER,BOURNVITA'],
       ['BREAD,COFFEE,COCK'],
       ['BREAD,SUGER,BISCUIT'],
       ['COFFEE,SUGER,CORNFLAKES'],
       ['BREAD,SUGER,BOURNVITA'],
       ['BREAD,COFFEE,SUGER'],
       ['BREAD,COFFEE,SUGER'],
       ['TEA,MILK,COFFEE,CORNFLAKES']], dtype=object)

In [7]:
# Split each comma-separated transaction string into a list of items
df_sep = df["products"].str.split(",").tolist()
df_sep

[['MILK', 'BREAD', 'BISCUIT'],
 ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]

In [10]:
# Let's encode items using TransactionEncoder (True - False)
te = TransactionEncoder()
te_data = te.fit(df_sep).transform(df_sep)

In [14]:
# Check the first five encoded rows
te_data[:5]

array([[ True, False,  True, False, False, False, False, False,  True,
        False, False],
       [ True, False,  True, False, False,  True, False, False,  True,
        False, False],
       [False,  True,  True, False, False, False, False, False, False,
        False,  True],
       [False, False,  True, False, False, False,  True,  True,  True,
        False, False],
       [ True, False, False, False, False, False, False,  True, False,
        False,  True]])
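
To make the encoding step concrete, here is a hand-rolled sketch of the same one-hot transformation in plain Python. It is not mlxtend's implementation; it only assumes, as the output above suggests, that the columns are the distinct items in alphabetical order. The names `baskets`, `columns`, and `encoded` are illustrative.

```python
# The 20 baskets of the grocery dataset, one comma-separated string each.
raw = [
    "MILK,BREAD,BISCUIT", "BREAD,MILK,BISCUIT,CORNFLAKES",
    "BREAD,TEA,BOURNVITA", "JAM,MAGGI,BREAD,MILK",
    "MAGGI,TEA,BISCUIT", "BREAD,TEA,BOURNVITA",
    "MAGGI,TEA,CORNFLAKES", "MAGGI,BREAD,TEA,BISCUIT",
    "JAM,MAGGI,BREAD,TEA", "BREAD,MILK",
    "COFFEE,COCK,BISCUIT,CORNFLAKES", "COFFEE,COCK,BISCUIT,CORNFLAKES",
    "COFFEE,SUGER,BOURNVITA", "BREAD,COFFEE,COCK",
    "BREAD,SUGER,BISCUIT", "COFFEE,SUGER,CORNFLAKES",
    "BREAD,SUGER,BOURNVITA", "BREAD,COFFEE,SUGER",
    "BREAD,COFFEE,SUGER", "TEA,MILK,COFFEE,CORNFLAKES",
]
baskets = [row.split(",") for row in raw]

# One column per distinct item, sorted alphabetically.
columns = sorted({item for basket in baskets for item in basket})

# One row per basket: True where the basket contains the column's item.
encoded = [[item in basket for item in columns] for basket in baskets]

print(columns)
print(encoded[0])
```

The first printed row corresponds to the basket MILK, BREAD, BISCUIT, so only those three columns are `True`.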

In [15]:
# Now let's get the data ready for apriori
df = pd.DataFrame(te_data,columns=te.columns_)
df

Unnamed: 0,BISCUIT,BOURNVITA,BREAD,COCK,COFFEE,CORNFLAKES,JAM,MAGGI,MILK,SUGER,TEA
0,True,False,True,False,False,False,False,False,True,False,False
1,True,False,True,False,False,True,False,False,True,False,False
2,False,True,True,False,False,False,False,False,False,False,True
3,False,False,True,False,False,False,True,True,True,False,False
4,True,False,False,False,False,False,False,True,False,False,True
5,False,True,True,False,False,False,False,False,False,False,True
6,False,False,False,False,False,True,False,True,False,False,True
7,True,False,True,False,False,False,False,True,False,False,True
8,False,False,True,False,False,False,True,True,False,False,True
9,False,False,True,False,False,False,False,False,True,False,False


In [16]:
# Set a minimum support and keep only the itemsets that reach it (min_support = 0.2)
freq_items = apriori(df, min_support = 0.2, use_colnames = True, verbose = 1)
freq_items

Processing 42 combinations | Sampling itemset size 3


Unnamed: 0,support,itemsets
0,0.35,(BISCUIT)
1,0.2,(BOURNVITA)
2,0.65,(BREAD)
3,0.4,(COFFEE)
4,0.3,(CORNFLAKES)
5,0.25,(MAGGI)
6,0.25,(MILK)
7,0.3,(SUGER)
8,0.35,(TEA)
9,0.2,"(BREAD, BISCUIT)"
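
For intuition about what the `apriori` call does, the following is a minimal level-wise search written from scratch. It is a teaching sketch, not mlxtend's code: it grows itemsets one item at a time and discards any candidate that has an infrequent subset (the Apriori property). The variable names are illustrative, and the data is the same 20 baskets.

```python
from itertools import combinations

raw = [
    "MILK,BREAD,BISCUIT", "BREAD,MILK,BISCUIT,CORNFLAKES",
    "BREAD,TEA,BOURNVITA", "JAM,MAGGI,BREAD,MILK",
    "MAGGI,TEA,BISCUIT", "BREAD,TEA,BOURNVITA",
    "MAGGI,TEA,CORNFLAKES", "MAGGI,BREAD,TEA,BISCUIT",
    "JAM,MAGGI,BREAD,TEA", "BREAD,MILK",
    "COFFEE,COCK,BISCUIT,CORNFLAKES", "COFFEE,COCK,BISCUIT,CORNFLAKES",
    "COFFEE,SUGER,BOURNVITA", "BREAD,COFFEE,COCK",
    "BREAD,SUGER,BISCUIT", "COFFEE,SUGER,CORNFLAKES",
    "BREAD,SUGER,BOURNVITA", "BREAD,COFFEE,SUGER",
    "BREAD,COFFEE,SUGER", "TEA,MILK,COFFEE,CORNFLAKES",
]
transactions = [frozenset(row.split(",")) for row in raw]
n = len(transactions)
min_support = 0.2


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / n


# Level 1: frequent single items.
items = sorted({i for t in transactions for i in t})
level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

frequent = {}  # maps each frequent itemset to its support
k = 1
while level:
    for itemset in level:
        frequent[itemset] = support(itemset)
    k += 1
    # Candidates of size k: unions of two frequent (k-1)-itemsets.
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    # Prune: every (k-1)-subset of a candidate must itself be frequent.
    candidates = {
        c for c in candidates
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    }
    level = [c for c in candidates if support(c) >= min_support]

ordered = sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0])))
for itemset, sup in ordered:
    print(sup, sorted(itemset))
```

On this dataset the search stops after level 2, because no three-item candidate survives the pruning step.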


In [17]:
# Sorting by support values
freq_items.sort_values(by = "support", ascending = False)

Unnamed: 0,support,itemsets
2,0.65,(BREAD)
3,0.4,(COFFEE)
0,0.35,(BISCUIT)
8,0.35,(TEA)
4,0.3,(CORNFLAKES)
7,0.3,(SUGER)
5,0.25,(MAGGI)
6,0.25,(MILK)
1,0.2,(BOURNVITA)
9,0.2,"(BREAD, BISCUIT)"


In [18]:
# The lengths of itemsets
freq_items['length'] = freq_items['itemsets'].apply(len)
freq_items

Unnamed: 0,support,itemsets,length
0,0.35,(BISCUIT),1
1,0.2,(BOURNVITA),1
2,0.65,(BREAD),1
3,0.4,(COFFEE),1
4,0.3,(CORNFLAKES),1
5,0.25,(MAGGI),1
6,0.25,(MILK),1
7,0.3,(SUGER),1
8,0.35,(TEA),1
9,0.2,"(BREAD, BISCUIT)",2


In [19]:
# Itemsets of length 1 (every row already has support >= 0.2, so the 0.05 filter removes nothing)
freq_items[(freq_items['length'] == 1) & (freq_items['support'] >= 0.05)]

Unnamed: 0,support,itemsets,length
0,0.35,(BISCUIT),1
1,0.2,(BOURNVITA),1
2,0.65,(BREAD),1
3,0.4,(COFFEE),1
4,0.3,(CORNFLAKES),1
5,0.25,(MAGGI),1
6,0.25,(MILK),1
7,0.3,(SUGER),1
8,0.35,(TEA),1


In [20]:
# Itemsets of length 2 (again, the 0.05 support filter removes nothing)
freq_items[(freq_items['length'] == 2) & (freq_items['support'] >= 0.05)]

Unnamed: 0,support,itemsets,length
9,0.2,"(BREAD, BISCUIT)",2
10,0.2,"(BREAD, MILK)",2
11,0.2,"(BREAD, SUGER)",2
12,0.2,"(BREAD, TEA)",2
13,0.2,"(CORNFLAKES, COFFEE)",2
14,0.2,"(COFFEE, SUGER)",2
15,0.2,"(TEA, MAGGI)",2


In [21]:
# Generate rules with confidence >= 0.6 and review all metrics
association_rules(freq_items, metric="confidence", min_threshold = 0.6)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(MILK),(BREAD),0.25,0.65,0.2,0.8,1.230769,0.0375,1.75
1,(SUGER),(BREAD),0.3,0.65,0.2,0.666667,1.025641,0.005,1.05
2,(CORNFLAKES),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,0.08,1.8
3,(SUGER),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,0.08,1.8
4,(MAGGI),(TEA),0.25,0.35,0.2,0.8,2.285714,0.1125,3.25
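
The five rules above can be reproduced by hand from the supports in the frequent-itemset tables. The sketch below is an illustration rather than mlxtend's implementation: for each frequent pair it tries both rule directions and keeps those whose confidence reaches the threshold. The dictionaries `single` and `pair` simply copy the support values shown earlier.

```python
# Supports copied from the frequent-itemset tables above.
single = {
    "BISCUIT": 0.35, "BOURNVITA": 0.2, "BREAD": 0.65, "COFFEE": 0.4,
    "CORNFLAKES": 0.3, "MAGGI": 0.25, "MILK": 0.25, "SUGER": 0.3, "TEA": 0.35,
}
pair = {
    ("BREAD", "BISCUIT"): 0.2, ("BREAD", "MILK"): 0.2,
    ("BREAD", "SUGER"): 0.2, ("BREAD", "TEA"): 0.2,
    ("CORNFLAKES", "COFFEE"): 0.2, ("COFFEE", "SUGER"): 0.2,
    ("TEA", "MAGGI"): 0.2,
}
min_confidence = 0.6

rules = []  # (antecedent, consequent, confidence, lift)
for (a, b), sup in pair.items():
    # Each frequent pair yields two candidate rules: a -> b and b -> a.
    for x, y in ((a, b), (b, a)):
        conf = sup / single[x]    # Confidence = Support(X, Y) / Support(X)
        if conf >= min_confidence:
            lift = conf / single[y]  # same as Support(X, Y) / (Support(X) * Support(Y))
            rules.append((x, y, round(conf, 6), round(lift, 6)))

for rule in rules:
    print(rule)
```

Only one direction of each pair passes here; for example, BREAD followed by MILK has confidence 0.2 / 0.65, which is well below 0.6.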


In [22]:
# Store the rules in a DataFrame so they can be filtered further
df_ar = association_rules(freq_items, metric = "confidence", min_threshold = 0.6)
df_ar

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(MILK),(BREAD),0.25,0.65,0.2,0.8,1.230769,0.0375,1.75
1,(SUGER),(BREAD),0.3,0.65,0.2,0.666667,1.025641,0.005,1.05
2,(CORNFLAKES),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,0.08,1.8
3,(SUGER),(COFFEE),0.3,0.4,0.2,0.666667,1.666667,0.08,1.8
4,(MAGGI),(TEA),0.25,0.35,0.2,0.8,2.285714,0.1125,3.25


In [23]:
# Keep rules with support below 0.3 and confidence above 0.7
df_ar[(df_ar.support < 0.3) & (df_ar.confidence > 0.7)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(MILK),(BREAD),0.25,0.65,0.2,0.8,1.230769,0.0375,1.75
4,(MAGGI),(TEA),0.25,0.35,0.2,0.8,2.285714,0.1125,3.25


# Let's interpret the table

**0 (MILK - BREAD):**
* *Support:* Milk and bread were bought together in 20% of transactions.
* *Confidence:* 80% of those who bought milk also bought bread.
* *Lift:* Customers who buy milk are 1.23 times as likely to buy bread as customers overall.

**4 (MAGGI - TEA):**
* *Support:* Maggi and tea were bought together in 20% of transactions.
* *Confidence:* 80% of those who bought Maggi also bought tea.
* *Lift:* Customers who buy Maggi are 2.29 times as likely to buy tea as customers overall.
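
The rule table also reports leverage and conviction, which are not interpreted above. Assuming the usual definitions, leverage = Support(X, Y) - Support(X) * Support(Y) and conviction = (1 - Support(Y)) / (1 - Confidence(X, Y)), the short sketch below recomputes both for the two rules. The function names are illustrative, and the inputs are copied from the table.

```python
def leverage(sup_xy, sup_x, sup_y):
    """Observed joint support minus the support expected under independence."""
    return sup_xy - sup_x * sup_y


def conviction(sup_y, conf):
    """Ratio of the expected to the observed frequency of X occurring without Y."""
    return (1 - sup_y) / (1 - conf)


# (support, antecedent support, consequent support, confidence) from the table.
table = {
    "MILK -> BREAD": (0.2, 0.25, 0.65, 0.8),
    "MAGGI -> TEA": (0.2, 0.25, 0.35, 0.8),
}

results = {}
for name, (sup_xy, sup_x, sup_y, conf) in table.items():
    results[name] = (
        round(leverage(sup_xy, sup_x, sup_y), 4),
        round(conviction(sup_y, conf), 2),
    )
    print(name, results[name])
```

The rounded results match the leverage and conviction columns for rows 0 and 4. A leverage above 0 and a conviction above 1 both point in the same direction as a lift above 1.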