# Association Rule Learning

Apriori
1. Generate association rules for the dataset given in the URL.

The Apriori algorithm finds items that are commonly sold together, so that a store owner can place related items near each other or advertise them together to increase profit.

In [1]:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# association rule algorithm file
from apyori import apriori

In [2]:
# Load the dataset
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)

In [3]:
print("Dataset has {} rows and {} Columns".format(dataset.shape[0],dataset.shape[1])) 

Dataset has 7501 rows and 20 Columns


In [4]:
# check dataset information
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 20 columns):
0     7501 non-null object
1     5747 non-null object
2     4389 non-null object
3     3345 non-null object
4     2529 non-null object
5     1864 non-null object
6     1369 non-null object
7     981 non-null object
8     654 non-null object
9     395 non-null object
10    256 non-null object
11    154 non-null object
12    87 non-null object
13    47 non-null object
14    25 non-null object
15    8 non-null object
16    4 non-null object
17    4 non-null object
18    3 non-null object
19    1 non-null object
dtypes: object(20)
memory usage: 1.1+ MB


In [5]:
# Describe the dataset
dataset.describe()

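| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7501 | 5747 | 4389 | 3345 | 2529 | 1864 | 1369 | 981 | 654 | 395 | 256 | 154 | 87 | 47 | 25 | 8 | 4 | 4 | 3 | 1 |
| unique | 115 | 117 | 115 | 114 | 110 | 106 | 102 | 98 | 88 | 80 | 66 | 50 | 43 | 28 | 19 | 8 | 3 | 3 | 3 | 1 |
| top | mineral water | mineral water | mineral water | mineral water | green tea | french fries | green tea | green tea | green tea | green tea | low fat yogurt | green tea | green tea | green tea | magazines | magazines | frozen smoothie | protein bar | cereals | olive oil |
| freq | 577 | 484 | 375 | 201 | 153 | 107 | 96 | 67 | 57 | 31 | 22 | 15 | 8 | 4 | 3 | 1 | 2 | 2 | 1 | 1 |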


In [6]:
# Count duplicate rows (baskets identical to an earlier row)
dataset.duplicated().sum()

2325

In [7]:
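# apyori expects a list of lists (one inner list per transaction), hence the nested loops.
# Note: empty cells (NaN) become the string 'nan' here, so 'nan' is treated as an item.
transactions = []
for i in range(len(dataset)):
    transactions.append([str(dataset.values[i, j]) for j in range(dataset.shape[1])])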
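
The loop above pads every transaction to 20 entries, and `str()` turns each empty cell into the literal string `'nan'`. A minimal sketch of one way to avoid that is shown here, using the standard `csv` module rather than pandas. The helper name `read_transactions` and the sample rows are hypothetical and only illustrate the idea.

```python
import csv
import io

# Made-up sample rows standing in for the real CSV file (assumption).
sample_csv = "shrimp,almonds,avocado\nburgers,meatballs\nchutney\n"


def read_transactions(handle):
    """Return one list of non-empty item names per CSV row."""
    transactions = []
    for row in csv.reader(handle):
        # Keep only cells that contain text; short rows simply yield fewer items.
        items = [cell.strip() for cell in row if cell.strip()]
        if items:
            transactions.append(items)
    return transactions


sample_transactions = read_transactions(io.StringIO(sample_csv))
print(sample_transactions)
```

With the real file, the same helper could be called as `read_transactions(open('Market_Basket_Optimisation.csv', newline=''))`. The outputs recorded further down were produced with the padded version, so rule counts from this variant would likely differ.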

### There are three major components of the Apriori algorithm:

1. Support -> the probability that an item appears in a transaction.

    - Support measures the overall popularity of an item.
    Support(B) = (Transactions containing B) / (Total transactions)

2. Confidence -> the conditional probability that B occurs given that A occurs.

    - Confidence is the likelihood that item B is also bought when item A is bought.
    Confidence(A→B) = (Transactions containing both A and B) / (Transactions containing A)

3. Lift -> the ratio of the observed confidence to the expected confidence, where the expected confidence is Support(B), i.e. what we would see if A and B were independent.

    - Lift(A→B) measures how much more likely B is to be sold when A is sold. A lift above 1 means A and B are bought together more often than chance would predict.
    Lift(A→B) = Confidence(A→B) / Support(B)
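
To make the three formulas above concrete, here is a small worked sketch on a made-up set of five baskets. The function names and the toy data are illustrative assumptions, not part of the dataset or of `apyori`.

```python
# Toy baskets (made up for illustration).
toy_transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(a, b, transactions):
    """Support of A and B together divided by the support of A."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


def lift(a, b, transactions):
    """Confidence(A -> B) divided by Support(B)."""
    return confidence(a, b, transactions) / support(b, transactions)


print("Support(milk):", support({"milk"}, toy_transactions))
print("Confidence(bread -> milk):", confidence({"bread"}, {"milk"}, toy_transactions))
print("Lift(bread -> milk):", lift({"bread"}, {"milk"}, toy_transactions))
```

In this toy data milk appears in 4 of 5 baskets, but in only 2 of the 3 baskets that contain bread, so the lift of bread → milk comes out below 1: buying bread makes milk slightly less likely than its baseline.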


In [8]:
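# Run Apriori on the transactions.
# min_support = 0.003 means an itemset must appear in at least 23 of the 7501 transactions.
# Note: min_length does not appear to be among the keyword arguments apyori reads
# (min_support, min_confidence, min_lift, max_length), so it likely has no effect.
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)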
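
For intuition about what a call like this does internally, the following is a minimal sketch of the level-wise search that gives Apriori its name: frequent itemsets of size k are built only from frequent itemsets of size k-1. It is an illustration on made-up baskets, not `apyori`'s implementation, and the function name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    all_items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in all_items}
    current = {s for s in current if support(s) >= min_support}

    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


# Made-up baskets for illustration.
toy_baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
found = frequent_itemsets(toy_baskets, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is the key design choice: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded without scanning the transactions.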

In [9]:
# Materialise the rules generator returned by apriori() into a list
association_rules = list(rules)

In [10]:
print(len(association_rules)) 

154


In [11]:
print(association_rules[0])  

RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])
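
The record printed above carries the rule direction in `items_base` and `items_add`, whereas `items` is an unordered frozenset. A minimal sketch of reading the direction from those fields follows. So that it runs without `apyori`, it uses stand-in namedtuples that copy the field names and values visible in the printed record; the helper name `describe` is hypothetical.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the printed record (assumption:
# the real apyori objects expose the same attribute names).
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

record = RelationRecord(
    items=frozenset({"chicken", "light cream"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)


def describe(record):
    """Return one 'base -> add' line per ordered statistic in the record."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))
        add = ", ".join(sorted(stat.items_add))
        lines.append(
            f"{base} -> {add} "
            f"(support={record.support:.4f}, "
            f"confidence={stat.confidence:.4f}, lift={stat.lift:.4f})"
        )
    return lines


for line in describe(record):
    print(line)
```

Read this way, the first record says that baskets containing light cream tend to also contain chicken, not the other way round.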


In [12]:
for item in association_rules:

    # item.items is the frozenset of items that make up the rule.
    # A frozenset is unordered, so the arrow printed here does not
    # necessarily match the rule direction (items_base -> items_add).
    pair = item.items
    items = list(pair)
    print("Rule: " + items[0] + " -> " + items[1])

    # Support of the whole itemset
    print("Support: " + str(item.support))

    # Confidence and lift of the first ordered statistic
    print("Confidence: " + str(item.ordered_statistics[0].confidence))
    print("Lift: " + str(item.ordered_statistics[0].lift))
    print("________________________________________________")

Rule: chicken -> light cream
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
________________________________________________
Rule: escalope -> mushroom cream sauce
Support: 0.005732568990801226
Confidence: 0.3006993006993007
Lift: 3.790832696715049
________________________________________________
Rule: escalope -> pasta
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
________________________________________________
Rule: fromage blanc -> honey
Support: 0.003332888948140248
Confidence: 0.2450980392156863
Lift: 5.164270764485569
________________________________________________
Rule: herb & pepper -> ground beef
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
________________________________________________
Rule: ground beef -> tomato sauce
Support: 0.005332622317024397
Confidence: 0.3773584905660377
Lift: 3.840659481324083
________________________________________________
Rule: