# Association Rules Checkpoint

We want to adopt new strategies to improve the profit of a clothing company.

We are going to use the dataset below and association rule mining to find new marketing plans.

One such strategy is deciding which items should be placed together.

In [1]:
dataset = [['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
           ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
           ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
           ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
           ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants']]

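Every inner list represents a transaction made by a customer. An item that appears twice in a transaction (such as 'Hat' in the fourth one) is counted only once after one-hot encoding.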

# Apriori Algorithm

Apriori is one of the algorithms we can use for market basket analysis. The analysis below relies on three metrics:

1. Support

2. Confidence

3. Lift
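
As a rough illustration of the three metrics, here is a minimal pure-Python sketch that computes them by hand on the toy dataset. The helper names `support`, `confidence` and `lift` are chosen for this example only and are not part of MLxtend.

```python
# Toy transactions copied from the dataset above; sets drop duplicate items.
transactions = [set(t) for t in [
    ['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
    ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
    ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
    ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
    ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants'],
]]


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also hold the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence of the rule divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent) / support(consequent)


print(round(support({'Pants'}), 2))
print(round(confidence({'Skirt'}, {'Pants'}), 2))
print(round(lift({'Skirt'}, {'Pants'}), 2))
```

A lift above 1 suggests the two sides of a rule occur together more often than they would if they were independent; a lift below 1 suggests the opposite.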

In [2]:
# First of all, we need to import the MLxtend and Pandas libraries
import mlxtend
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

In [3]:
# We need to transform our dataset into a one-hot encoded DataFrame
te = TransactionEncoder()
te_array = te.fit(dataset).transform(dataset)        # Apply one-hot-encoding on our dataset
df = pd.DataFrame(te_array, columns = te.columns_)   # Creating a new DataFrame from our Numpy array
df

Unnamed: 0,Dress,Earrings,Hat,Heels,Necklace,Pants,Sandals,Scarf,Shirt,Skirt,Sneakers,Sunglasses
0,False,False,True,False,False,True,False,True,False,True,True,False
1,False,False,True,False,False,True,False,False,False,True,True,True
2,True,False,False,True,False,True,True,True,False,False,False,False
3,True,True,True,True,True,False,False,True,False,False,False,False
4,False,True,False,False,False,True,False,True,True,True,False,False
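
To make the encoding step less of a black box, the sketch below builds the same kind of boolean table with plain Python. The function name `one_hot` is an assumption for this example; it is not how `TransactionEncoder` is implemented, only a way to reproduce the table above.

```python
def one_hot(transactions):
    """Return (columns, rows): sorted unique items and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in set(t) for item in columns] for t in transactions]
    return columns, rows


toy = [['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
       ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
       ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
       ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
       ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants']]

columns, rows = one_hot(toy)
print(columns)
for row in rows:
    print(row)
```

Each column is one product and each row one transaction, so the column mean is the support of that single item.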


# 1. Support

In [4]:
from mlxtend.frequent_patterns import apriori

In [5]:
frequent_itemsets = apriori(df, min_support = 0.6, use_colnames = True) # Select itemsets with a minimum of 60% support
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.6,(Hat)
1,0.8,(Pants)
2,0.8,(Scarf)
3,0.6,(Skirt)
4,0.6,"(Pants, Scarf)"
5,0.6,"(Skirt, Pants)"


(Pants) and (Scarf) are the most frequent itemsets in the dataset, each with 80% support.
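
The level-wise search behind this result can be sketched in a few lines of plain Python. This is a simplified illustration of the Apriori idea (generate candidates from frequent itemsets, then prune), not MLxtend's actual implementation; `apriori_sketch` is a name made up for this example.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting `min_support`."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1 candidates: every single item.
    current = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that are frequent at this level.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


toy = [['Skirt', 'Sneakers', 'Scarf', 'Pants', 'Hat'],
       ['Sunglasses', 'Skirt', 'Sneakers', 'Pants', 'Hat'],
       ['Dress', 'Sandals', 'Scarf', 'Pants', 'Heels'],
       ['Dress', 'Necklace', 'Earrings', 'Scarf', 'Hat', 'Heels', 'Hat'],
       ['Earrings', 'Skirt', 'Skirt', 'Scarf', 'Shirt', 'Pants']]

result = apriori_sketch(toy, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so (Pants, Scarf, Skirt) is discarded without counting because (Scarf, Skirt) is not frequent.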

# 2. Confidence

In [6]:
from mlxtend.frequent_patterns import association_rules

In [7]:
association_rules(frequent_itemsets,metric="confidence",min_threshold=0.7) # associate itemsets with confidence over 70%

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Pants),(Scarf),0.8,0.8,0.6,0.75,0.9375,-0.04,0.8
1,(Scarf),(Pants),0.8,0.8,0.6,0.75,0.9375,-0.04,0.8
2,(Skirt),(Pants),0.6,0.8,0.6,1.0,1.25,0.12,inf
3,(Pants),(Skirt),0.8,0.6,0.6,0.75,1.25,0.12,1.6


We can see that:
- Every customer who bought a Skirt also bought Pants (100% confidence).
- 75% of the customers who bought a Scarf also bought Pants, and 75% of those who bought Pants also bought a Scarf.

# 3. Lift

In [8]:
association_rules(frequent_itemsets, metric = "lift", min_threshold = 1.25) # Associating based on lift

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Skirt),(Pants),0.6,0.8,0.6,1.0,1.25,0.12,inf


A lift of 1.25 means that customers who buy a Skirt are 1.25 times more likely to buy Pants than customers in general.
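
The rule tables also report `leverage` and `conviction`, which are not discussed here. The sketch below recomputes all the rule columns from the three support values, using the standard textbook definitions; `rule_metrics` is a hypothetical helper written for this example.

```python
import math


def rule_metrics(support_a, support_b, support_ab):
    """Metrics of the rule A -> B from support(A), support(B) and support(A and B)."""
    confidence = support_ab / support_a
    lift = confidence / support_b
    # Leverage: observed co-occurrence minus what independence would predict.
    leverage = support_ab - support_a * support_b
    # Conviction: infinite when the rule never fails (confidence of 1).
    conviction = math.inf if confidence >= 1 else (1 - support_b) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}


# Support values taken from the tables above.
pants_scarf = rule_metrics(0.8, 0.8, 0.6)   # (Pants) -> (Scarf)
skirt_pants = rule_metrics(0.6, 0.8, 0.6)   # (Skirt) -> (Pants)
print({k: round(v, 4) for k, v in pants_scarf.items()})
print({k: round(v, 4) for k, v in skirt_pants.items()})
```

A negative leverage, as for (Pants) -> (Scarf), goes hand in hand with a lift below 1: the two items appear together slightly less often than independence would predict.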

# Let's do the same with a bigger dataset!

In [9]:
with open("Market_Basket_Optimisation.csv", encoding = 'utf-8') as f:
    data = f.read().splitlines()

In [10]:
dataset = []
for element in data:
    dataset.append(element.split(","))

In [11]:
# We need to transform our dataset into a one-hot encoded DataFrame
te = TransactionEncoder()
te_array = te.fit(dataset).transform(dataset)        # Apply one-hot-encoding on our dataset
df = pd.DataFrame(te_array, columns = te.columns_)   # Creating a new DataFrame from our Numpy array
df

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,False,True,True,False,True,False,False,False,False,False,...,False,True,False,False,True,False,False,True,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,True,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7497,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7498,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7499,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


This dataset has 7501 transactions and 120 products.
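
The column list above seems to contain asparagus twice (the second one shown as `asparagus.1`), which suggests that some raw entries carry stray whitespace. Assuming that is the cause, a more defensive reader could strip each item before encoding. The sketch below uses the standard `csv` module on a small in-memory sample; the sample rows and the function name `read_transactions` are made up for illustration.

```python
import csv
import io


def read_transactions(lines):
    """Parse CSV lines into transactions, stripping whitespace and empty fields."""
    transactions = []
    for row in csv.reader(lines):
        items = [item.strip() for item in row if item.strip()]
        if items:  # skip blank lines entirely
            transactions.append(items)
    return transactions


# Hypothetical sample; with the real file, pass the open file object instead.
sample = io.StringIO("shrimp,almonds, asparagus\nburgers,eggs\n\nasparagus,,\n")
cleaned = read_transactions(sample)
print(cleaned)
```

Applying this to the real file would likely change the product count, and skipping blank lines could change the transaction count, so the numbers reported in this notebook would need to be recomputed.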

# Apriori Algorithm

## 1. Minimum of 5% support

In [12]:
# Support
frequent_itemsets = apriori(df, min_support = 0.05, use_colnames = True) # Select itemsets with a minimum of 5% support
frequent_itemsets.sort_values(by = "support", ascending = False)

Unnamed: 0,support,itemsets
16,0.238368,(mineral water)
6,0.179709,(eggs)
21,0.17411,(spaghetti)
8,0.170911,(french fries)
3,0.163845,(chocolate)
12,0.132116,(green tea)
15,0.129583,(milk)
13,0.098254,(ground beef)
10,0.095321,(frozen vegetables)
18,0.095054,(pancakes)


**(mineral water)**, **(eggs)**, **(spaghetti)**, **(french fries)** and **(chocolate)** are the most frequent itemsets in the dataset.

In [13]:
# Confidence
association_rules(frequent_itemsets,metric="confidence",min_threshold=0.2) # associate itemsets with confidence over 20%

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(chocolate),(mineral water),0.163845,0.238368,0.05266,0.3214,1.348332,0.013604,1.122357
1,(mineral water),(chocolate),0.238368,0.163845,0.05266,0.220917,1.348332,0.013604,1.073256
2,(mineral water),(eggs),0.238368,0.179709,0.050927,0.213647,1.188845,0.00809,1.043158
3,(eggs),(mineral water),0.179709,0.238368,0.050927,0.283383,1.188845,0.00809,1.062815
4,(spaghetti),(mineral water),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314
5,(mineral water),(spaghetti),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008


We can see that:
- 34% of the transactions containing **spaghetti** also contain **mineral water**.
- 32% of the transactions containing **chocolate** also contain **mineral water**.
- 28% of the transactions containing **eggs** also contain **mineral water**.

In [14]:
# Lift
association_rules(frequent_itemsets, metric = "lift", min_threshold = 1) # Associating based on lift

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(chocolate),(mineral water),0.163845,0.238368,0.05266,0.3214,1.348332,0.013604,1.122357
1,(mineral water),(chocolate),0.238368,0.163845,0.05266,0.220917,1.348332,0.013604,1.073256
2,(mineral water),(eggs),0.238368,0.179709,0.050927,0.213647,1.188845,0.00809,1.043158
3,(eggs),(mineral water),0.179709,0.238368,0.050927,0.283383,1.188845,0.00809,1.062815
4,(spaghetti),(mineral water),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314
5,(mineral water),(spaghetti),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008


- Customers who buy **spaghetti** are about 1.44 times more likely to buy **mineral water** than customers in general.
- Customers who buy **chocolate** are about 1.35 times more likely to buy **mineral water** than customers in general.
- Customers who buy **eggs** are about 1.19 times more likely to buy **mineral water** than customers in general.
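
Each pair of rules in the table above shares the same lift because lift is symmetric in the two sides of a rule. As a quick sanity check, the sketch below recomputes it from the support columns; the values are copied from the table, and the dictionary layout is only a convenience for this example.

```python
# (antecedent support, consequent support, joint support) copied from the table above.
pairs = {
    ("chocolate", "mineral water"): (0.163845, 0.238368, 0.05266),
    ("eggs", "mineral water"): (0.179709, 0.238368, 0.050927),
    ("spaghetti", "mineral water"): (0.17411, 0.238368, 0.059725),
}

lifts = {}
for (a, b), (support_a, support_b, support_ab) in pairs.items():
    # Swapping a and b leaves this expression unchanged, hence the symmetry.
    lifts[(a, b)] = support_ab / (support_a * support_b)
    print(f"{a} <-> {b}: lift = {lifts[(a, b)]:.3f}")
```

Confidence, by contrast, depends on the direction of the rule, which is why (spaghetti) -> (mineral water) and (mineral water) -> (spaghetti) have different confidence values.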

## 2. Minimum of 1% support

In [15]:
# Support
frequent_itemsets = apriori(df, min_support = 0.01, use_colnames = True) # Select itemsets with a minimum of 1% support
frequent_itemsets.sort_values(by = "support", ascending = False)

Unnamed: 0,support,itemsets
46,0.238368,(mineral water)
19,0.179709,(eggs)
63,0.174110,(spaghetti)
24,0.170911,(french fries)
13,0.163845,(chocolate)
...,...,...
255,0.010265,"(olive oil, spaghetti, mineral water)"
123,0.010132,"(chocolate, soup)"
246,0.010132,"(ground beef, mineral water, eggs)"
249,0.010132,"(spaghetti, french fries, mineral water)"


**(mineral water)**, **(eggs)**, **(spaghetti)**, **(french fries)** and **(chocolate)** are still the most frequent itemsets; lowering the threshold to 1% only adds less frequent itemsets, including some with three items.

In [16]:
# Confidence over 20%
association_rules(frequent_itemsets,metric="confidence",min_threshold=0.2).sort_values(by = "confidence", ascending = False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
134,"(ground beef, eggs)",(mineral water),0.019997,0.238368,0.010132,0.506667,2.125563,0.005365,1.543848
149,"(milk, ground beef)",(mineral water),0.021997,0.238368,0.011065,0.503030,2.110308,0.005822,1.532552
121,"(ground beef, chocolate)",(mineral water),0.023064,0.238368,0.010932,0.473988,1.988472,0.005434,1.447937
144,"(milk, frozen vegetables)",(mineral water),0.023597,0.238368,0.011065,0.468927,1.967236,0.005440,1.434136
100,(soup),(mineral water),0.050527,0.238368,0.023064,0.456464,1.914955,0.011020,1.401255
...,...,...,...,...,...,...,...,...,...
21,(french fries),(chocolate),0.170911,0.163845,0.034395,0.201248,1.228284,0.006393,1.046827
146,"(spaghetti, mineral water)",(frozen vegetables),0.059725,0.095321,0.011998,0.200893,2.107549,0.006305,1.132113
74,(green tea),(spaghetti),0.132116,0.174110,0.026530,0.200807,1.153335,0.003527,1.033405
34,(soup),(chocolate),0.050527,0.163845,0.010132,0.200528,1.223888,0.001853,1.045884


We can see that:
- 51% of the transactions containing **eggs and ground beef** also contain **mineral water**.
- 50% of the transactions containing **milk and ground beef** also contain **mineral water**.
- 47% of the transactions containing **ground beef and chocolate** also contain **mineral water**.
- 47% of the transactions containing **milk and frozen vegetables** also contain **mineral water**.
- 46% of the transactions containing **soup** also contain **mineral water**.

In [17]:
# Lift
association_rules(frequent_itemsets, metric = "lift", min_threshold = 1).sort_values(by = "lift", ascending = False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
214,(herb & pepper),(ground beef),0.049460,0.098254,0.015998,0.323450,3.291994,0.011138,1.332860
215,(ground beef),(herb & pepper),0.098254,0.049460,0.015998,0.162822,3.291994,0.011138,1.135410
386,(ground beef),"(spaghetti, mineral water)",0.098254,0.059725,0.017064,0.173677,2.907928,0.011196,1.137902
383,"(spaghetti, mineral water)",(ground beef),0.059725,0.098254,0.017064,0.285714,2.907928,0.011196,1.262445
396,"(spaghetti, mineral water)",(olive oil),0.059725,0.065858,0.010265,0.171875,2.609786,0.006332,1.128021
...,...,...,...,...,...,...,...,...,...
155,(french fries),(low fat yogurt),0.170911,0.076523,0.013332,0.078003,1.019340,0.000253,1.001605
131,(eggs),(olive oil),0.179709,0.065858,0.011998,0.066766,1.013783,0.000163,1.000973
130,(olive oil),(eggs),0.065858,0.179709,0.011998,0.182186,1.013783,0.000163,1.003029
144,(spaghetti),(escalope),0.174110,0.079323,0.013998,0.080398,1.013557,0.000187,1.001169


- Customers who buy **herb & pepper** are about 3.3 times more likely to buy **ground beef** than customers in general, and vice versa.
- Customers who buy **ground beef** are about 2.9 times more likely to buy **spaghetti and mineral water** together than customers in general.
- Customers who buy **spaghetti and mineral water** are about 2.6 times more likely to buy **olive oil** than customers in general.
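
For a marketing plan, a rule is usually most useful when it is both strong (high lift) and reliable (reasonable confidence). The sketch below filters the top rules of the last table on both criteria at once. The thresholds of 0.2 and 2.0 are arbitrary choices for illustration, and the rows are copied by hand from the output above.

```python
# Top rows of the lift-sorted table: (antecedent, consequent, confidence, lift).
rules = [
    ("herb & pepper", "ground beef", 0.323450, 3.291994),
    ("ground beef", "herb & pepper", 0.162822, 3.291994),
    ("ground beef", "spaghetti, mineral water", 0.173677, 2.907928),
    ("spaghetti, mineral water", "ground beef", 0.285714, 2.907928),
    ("spaghetti, mineral water", "olive oil", 0.171875, 2.609786),
]

MIN_CONFIDENCE = 0.2  # assumed threshold, tune to the business need
MIN_LIFT = 2.0        # assumed threshold, tune to the business need

selected = [r for r in rules if r[2] >= MIN_CONFIDENCE and r[3] >= MIN_LIFT]
for antecedent, consequent, conf, lift in selected:
    print(f"({antecedent}) -> ({consequent}): confidence={conf:.2f}, lift={lift:.2f}")
```

With these thresholds only the rules pointing towards **ground beef** remain, which hints that placing herb & pepper or spaghetti and mineral water near ground beef could be a candidate strategy to test.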