<a href="https://colab.research.google.com/github/Venture-Coding/Machine-Learning-under-Kiril-E/blob/main/ARL/Apriori_Association_Rule_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Apriori Algorithm

Association Rule Mining

## Importing the libraries

In [13]:
!pip install apyori



In [14]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [15]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None) # the CSV has no header row
dataset.tail()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,
7500,eggs,frozen smoothie,yogurt cake,low fat yogurt,,,,,,,,,,,,,,,,


In [16]:
transactions = [] # apriori needs a list of transactions (list of lists), not a DataFrame
for i in range(len(dataset)): # 7501 transactions
  # up to 20 items per row; str() turns empty cells (NaN) into the string 'nan'
  transactions.append([str(dataset.values[i,j]) for j in range(dataset.shape[1])])
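
Because `str()` is applied to every cell, empty cells end up in each transaction as the literal string `'nan'`, which apriori then counts as a product. The following is a minimal sketch of one way to skip them, assuming the rows are passed as plain Python lists. `drop_missing` is a hypothetical helper, not part of pandas or apyori, and the toy rows only imitate the shape of the data above.

```python
import math

def drop_missing(rows):
    """Turn rows of cells into transactions, skipping None and NaN cells."""
    transactions = []
    for row in rows:
        items = [
            str(cell)
            for cell in row
            if cell is not None
            and not (isinstance(cell, float) and math.isnan(cell))
        ]
        transactions.append(items)
    return transactions

# Toy rows shaped like the tail of the dataset (NaN marks an empty cell).
nan = float("nan")
rows = [
    ["chicken", nan, nan],
    ["escalope", "green tea", nan],
]
cleaned = drop_missing(rows)
print(cleaned)
```

On the real data this could be called as `drop_missing(dataset.values.tolist())`. The results shown in the rest of this notebook were produced without this step.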

## Training the Apriori model on the dataset

In [17]:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

We assume an association should occur at least 3 times a day, i.e. 21 times a week, and that the dataset covers one week of transactions. Hence min_support = 21/7501 ≈ 0.003: the number of transactions in which the products occur together, divided by the total number of transactions in the same period.
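
The threshold arithmetic can be written out as a small helper. This is only a sketch: `min_support_from_frequency` is a hypothetical name, and it assumes, as the text does, that the 7501 rows cover one week.

```python
def min_support_from_frequency(times_per_day, days, n_transactions):
    """Minimum support for an itemset that should occur times_per_day on average."""
    return times_per_day * days / n_transactions

threshold = min_support_from_frequency(3, 7, 7501)
print(round(threshold, 4))  # 0.0028, rounded to 0.003 in the apriori call
```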

min_confidence has to be tuned by trial and error; a workable value depends on the volume of data and on how many rules it can support.

min_lift sets how much more likely the right-hand product must be in baskets that contain the left-hand product than in baskets overall. A value of 3 keeps only rules where that likelihood is at least three times the baseline.
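
To make the three thresholds concrete, here is a small sketch that computes support, confidence, and lift by hand using the standard definitions. The function names and the toy baskets are made up for illustration and are not taken from the dataset or from apyori.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Share of transactions containing lhs that also contain rhs."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)

def lift(transactions, lhs, rhs):
    """Confidence of lhs -> rhs relative to the baseline support of rhs."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)

# Eight made-up baskets.
toy = [
    ["pasta", "escalope"],
    ["pasta", "escalope", "milk"],
    ["pasta", "milk"],
    ["milk"],
    ["eggs"],
    ["eggs", "milk"],
    ["escalope"],
    ["milk", "eggs"],
]

s = support(toy, ["pasta", "escalope"])       # 2 of 8 baskets
c = confidence(toy, ["pasta"], ["escalope"])  # 2 of the 3 pasta baskets
l = lift(toy, ["pasta"], ["escalope"])        # confidence / support(escalope)
print(s, c, l)
```

In this toy data escalope sits in 3 of 8 baskets overall but in 2 of the 3 pasta baskets, so the lift is above 1.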

The goal here is to find a BOGO (buy one, get one) combo, so min_length and max_length are both set to 2. Each rule then has exactly one product on the left-hand side and one on the right-hand side.


## Visualising the results

### Displaying the first results coming directly from the output of the apriori function

In [18]:
results = list(rules)

In [19]:
results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Organising the results into a pandas DataFrame

In [20]:
def inspect(results):
    # Each result is (items, support, ordered_statistics). Only the first
    # ordered statistic is read, and only the first item on each side of it.
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultingDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
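
`inspect` reads only the first item on each side and only the first ordered statistic, which is enough for two-item rules but loses information for larger itemsets. The following is a sketch of a variant that keeps every item and every ordered statistic. The namedtuples are stand-ins that mirror the field names in the printed apyori output above, `inspect_all` is a hypothetical name, and the example record uses made-up values.

```python
from collections import namedtuple

# Stand-ins with the same field order as the records printed above.
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

def inspect_all(results):
    """One row per ordered statistic, with all items on each side joined."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows

# Made-up record with a three-item itemset and two rules.
example = [
    RelationRecord(
        items=frozenset({"a", "b", "c"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(frozenset({"a", "b"}), frozenset({"c"}), 0.5, 3.2),
            OrderedStatistic(frozenset({"c"}), frozenset({"a", "b"}), 0.4, 3.2),
        ],
    )
]
rows = inspect_all(example)
print(rows)
```

The returned rows can be passed to `pd.DataFrame` with the same column names as before.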

### Displaying the results unsorted

In [21]:
resultingDataFrame

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |


### Displaying the results sorted by descending lifts

In [22]:
resultingDataFrame.nlargest(n = 10, columns = 'Lift')

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |


If a customer buys "fromage blanc", there is a 24.5% chance they also buy "honey". The lift of 5.16 means honey is about five times more likely to be in a basket that contains fromage blanc than in an average basket.
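
Since lift is confidence divided by the support of the right-hand item, the two figures in the table also imply how often honey is bought overall. A short check, using only the confidence and lift values reported above:

```python
confidence = 0.245098  # fromage blanc -> honey, from the table above
lift = 5.164271

# lift = confidence / support(honey), so support(honey) = confidence / lift
honey_support = confidence / lift
print(f"{honey_support:.4f}")
```

This puts honey in roughly 4.7% of all baskets, compared with 24.5% of the baskets that contain fromage blanc.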

### Trying another combination

In [24]:
rules2 = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.5, min_lift = 3, min_length = 3, max_length = 5)
result2 = list(rules2)

# inspect() is reused from the earlier cell.

resultingDataFrame2 = pd.DataFrame(inspect(result2), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
resultingDataFrame2.nlargest(n = 10, columns = 'Lift')

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 14 | olive oil | milk | 0.003333 | 0.510204 | 3.937285 |
| 24 | olive oil | milk | 0.003333 | 0.510204 | 3.937285 |
| 0 | cereals | spaghetti | 0.003066 | 0.676471 | 3.885303 |
| 8 | cereals | | 0.003066 | 0.676471 | 3.885303 |
| 1 | chicken | milk | 0.0036 | 0.5 | 3.858539 |
| 4 | frozen vegetables | milk | 0.003999 | 0.5 | 3.858539 |
| 9 | chicken | milk | 0.0036 | 0.5 | 3.858539 |
| 16 | frozen vegetables | milk | 0.003999 | 0.5 | 3.858539 |
| 7 | olive oil | spaghetti | 0.004399 | 0.611111 | 3.509912 |
| 21 | olive oil | | 0.004399 | 0.611111 | 3.509912 |


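With min_confidence raised to 0.5, the top ten rules by lift recommend only milk or spaghetti. Most rows appear twice and some right-hand sides are blank, most likely because the 'nan' placeholder for empty cells is counted as an item and inspect keeps only the first item on each side.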

In [25]:
rules3 = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.3, min_lift = 3, min_length = 3, max_length = 5)
result3 = list(rules3)

resultingDataFrame3 = pd.DataFrame(inspect(result3), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
resultingDataFrame3.nlargest(n = 10, columns = 'Lift')

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 64 | frozen vegetables | milk | 0.003066 | 0.383333 | 7.987176 |
| 98 | frozen vegetables | milk | 0.003066 | 0.383333 | 7.987176 |
| 85 | whole wheat pasta | olive oil | 0.003866 | 0.402778 | 6.128268 |
| 38 | whole wheat pasta | olive oil | 0.003866 | 0.402778 | 6.115863 |
| 1 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 16 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 17 | french fries | ground beef | 0.0032 | 0.461538 | 4.697422 |
| 57 | french fries | ground beef | 0.0032 | 0.461538 | 4.697422 |
| 91 | chocolate | shrimp | 0.0032 | 0.328767 | 4.609499 |
| 50 | chocolate | shrimp | 0.0032 | 0.328767 | 4.6009 |


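At a lower min_confidence, more combinations appear, such as french fries with ground beef and chocolate with shrimp. The pasta and escalope rule shows up with the same support as in the two-item run despite min_length = 3, which suggests that apyori does not enforce that argument.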
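
If the itemset size needs to be enforced, one option is to filter the records after the fact. This is a sketch under two assumptions: that `min_length` is not applied by the library, and that `'nan'` is only a placeholder for empty cells. `keep_large_itemsets` is a hypothetical helper, the namedtuple is a stand-in for the apyori record type, and the example records use made-up values.

```python
from collections import namedtuple

# Stand-in with the same fields as the apyori records printed earlier.
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

def keep_large_itemsets(results, min_items=3, placeholder="nan"):
    """Keep records that contain at least min_items real products."""
    kept = []
    for record in results:
        products = record.items - {placeholder}  # ignore the empty-cell marker
        if len(products) >= min_items:
            kept.append(record)
    return kept

# Made-up records: a pair, the same pair plus 'nan', and a real triple.
records = [
    RelationRecord(frozenset({"pasta", "escalope"}), 0.0059, []),
    RelationRecord(frozenset({"pasta", "escalope", "nan"}), 0.0059, []),
    RelationRecord(frozenset({"olive oil", "milk", "soup"}), 0.0033, []),
]
large = keep_large_itemsets(records)
print([sorted(r.items) for r in large])
```

On the notebook's data this could be applied as `keep_large_itemsets(result3)` before building the DataFrame.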