
# Apriori

## Importing the libraries

In [1]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=ea6d8c0d91967adb603b01f1ec30b89bb17316ff101cae10c5adb6e0d15006d8
  Stored in directory: /root/.cache/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Data Preprocessing

In [4]:
# The CSV has no header row: each row is one customer's transaction and each column holds one purchased product.
# header = None makes read_csv treat the first row (one customer's purchases) as data instead of as column names.
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)

# There are no independent and dependent variables here, so the dataset is not split into X and y.

# The apriori function expects a list of transactions, so the DataFrame is converted into a list of lists.
transactions = []

# i runs over the 7501 rows (customers); j runs over the 20 columns (at most 20 items per transaction).
for i in range(0, 7501):
  # Every item in a transaction must be a string, hence the str() conversion.
  transactions.append([str(dataset.values[i, j]) for j in range(0, 20)])
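The loop above converts every cell with `str()`, so, assuming pandas pads the shorter rows with missing values, those cells end up in the transactions as the literal item `'nan'`. A minimal sketch of an alternative that reads the ragged rows directly and keeps only the items actually present is shown below. The sample rows and the helper name `read_transactions` are made up for illustration and are not part of the notebook.

```python
import csv
import io

def read_transactions(lines):
    """Return one list of item strings per CSV row, skipping empty cells."""
    reader = csv.reader(lines)
    return [[item.strip() for item in row if item.strip()] for row in reader]

# Hypothetical sample standing in for the real CSV file.
sample = io.StringIO(
    "burger,french fries,cola\n"
    "light cream,chicken\n"
    "mineral water\n"
)
transactions_demo = read_transactions(sample)
print(transactions_demo)
```

With the real file, the same helper could be called as `read_transactions(open('Market_Basket_Optimisation.csv', newline=''))`, which avoids hard-coding the row and column counts.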

## Training the Apriori model on the dataset

In [5]:
# Use the apriori function from the apyori package.
from apyori import apriori

# apriori returns the rules together with their support, confidence and lift; we store them in the object rules.
# transactions: the list of transactions built above.
# min_support: only rules whose support meets this minimum are considered.
# min_confidence: only rules whose confidence meets this minimum are considered.
# min_lift: only rules whose lift, a measure of the relevance or quality of a rule, meets this minimum are considered.
# min_length and max_length: the minimum and maximum number of products in a rule, counting both sides,
# e.g. A -> B (burger -> french fries) has 2 products and A, B -> C has 3.
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

# min_support is set for products that are bought at least 3 times a day:
# 3 purchases * 7 (the number of days the transactions cover) / total number of transactions
# min_support = 3 * 7 / 7501 = 0.0028, rounded to 0.003
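To make the three thresholds concrete, here is a small sketch that computes support, confidence and lift by hand using their standard definitions. The toy transactions and the helper names are illustrative assumptions, and this is not how apyori is implemented internally.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, base, add):
    """Share of the transactions containing `base` that also contain `add`."""
    return support(transactions, set(base) | set(add)) / support(transactions, base)

def lift(transactions, base, add):
    """Confidence of base -> add relative to how common `add` is overall."""
    return confidence(transactions, base, add) / support(transactions, add)

# Hypothetical toy data: 5 transactions.
toy = [
    ["burger", "french fries"],
    ["burger", "french fries", "cola"],
    ["burger"],
    ["cola"],
    ["french fries", "cola"],
]

pair_support = support(toy, ["burger", "french fries"])        # 2 of 5 transactions
rule_confidence = confidence(toy, ["burger"], ["french fries"])  # 2 of the 3 burger transactions
rule_lift = lift(toy, ["burger"], ["french fries"])

# The min_support used in the notebook: 3 purchases a day over 7 days out of 7501 transactions.
min_support = 3 * 7 / 7501
print(pair_support, rule_confidence, rule_lift, round(min_support, 3))
```

A lift above 1 means the two products are bought together more often than their individual popularity would suggest, which is why the notebook only keeps rules with a lift of at least 3.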

## Visualising the results

### Displaying the first results coming directly from the output of the apriori function

In [6]:
results = list(rules)
results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

*Interpretation*

* If a customer buys light cream (items_base), there is a 29% chance that they also buy chicken (items_add).
* If a customer buys mushroom cream sauce, there is a 30% chance that they also buy escalope, and so on for the remaining rules.
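The three numbers in each rule are tied together by the standard definitions confidence = support(pair) / support(items_base) and lift = confidence / support(items_add). As a rough check, the sketch below plugs in the values of the first rule from the output above and recovers how often each product appears on its own. The variable names are chosen here for illustration.

```python
# Values copied from the first rule in the output above (light cream -> chicken).
pair_support = 0.004532728969470737   # support of {light cream, chicken}
rule_confidence = 0.29059829059829057
rule_lift = 4.84395061728395

# Rearranging the definitions gives the support of each single product.
light_cream_support = pair_support / rule_confidence   # support(items_base)
chicken_support = rule_confidence / rule_lift           # support(items_add)

print(f"light cream appears in {light_cream_support:.2%} of transactions")
print(f"chicken appears in {chicken_support:.2%} of transactions")
```

Read this way, the lift says that chicken shows up in roughly 6% of all transactions but in about 29% of the transactions that contain light cream, which is about 4.8 times as often.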






### Putting the results well organised into a Pandas DataFrame

In [8]:
def inspect(results):
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
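The positional indexing in `inspect` (`result[2][0][0]` and so on) can be hard to follow. The sketch below shows the same extraction with the field names that appear in the printed output above. The two namedtuples are stand-ins defined here so that the example runs without apyori, and `inspect_named` is a hypothetical name.

```python
from collections import namedtuple

# Stand-ins that mirror the field names printed in the apriori output above.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

def inspect_named(results):
    """Same rows as inspect(), but using attribute names instead of indices."""
    rows = []
    for record in results:
        stat = record.ordered_statistics[0]      # same object as record[2][0]
        rows.append((
            next(iter(stat.items_base)),         # left hand side, record[2][0][0]
            next(iter(stat.items_add)),          # right hand side, record[2][0][1]
            record.support,                      # record[1]
            stat.confidence,                     # record[2][0][2]
            stat.lift,                           # record[2][0][3]
        ))
    return rows

# One record rebuilt from the first line of the output above.
demo = [RelationRecord(
    items=frozenset({"light cream", "chicken"}),
    support=0.004532728969470737,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({"light cream"}),
        items_add=frozenset({"chicken"}),
        confidence=0.29059829059829057,
        lift=4.84395061728395,
    )],
)]
rows = inspect_named(demo)
print(rows)
```

Like `inspect`, this takes only one item from each side of the rule, which is sufficient here because `max_length = 2` limits every rule to one product on each side.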

### Displaying the results non sorted

In [9]:
resultsinDataFrame

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |


### Displaying the results sorted by descending lifts

In [10]:
resultsinDataFrame.nlargest(n = 10, columns = 'Lift') # nlargest returns the n rows with the largest values in the given column, sorted in descending order

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
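For readers who want to see the ranking step without pandas, the following is a plain-Python analogue of picking the rows with the largest lift. It uses `heapq.nlargest` from the standard library on the rule names and lift values copied from the unsorted table. It is only an illustration of the idea, not what pandas does internally.

```python
import heapq

# (left hand side, right hand side, lift) copied from the unsorted results table.
rules_demo = [
    ("light cream", "chicken", 4.843951),
    ("mushroom cream sauce", "escalope", 3.790833),
    ("pasta", "escalope", 4.700812),
    ("fromage blanc", "honey", 5.164271),
    ("herb & pepper", "ground beef", 3.291994),
    ("tomato sauce", "ground beef", 3.840659),
    ("light cream", "olive oil", 3.11471),
    ("whole wheat pasta", "olive oil", 4.12241),
    ("pasta", "shrimp", 4.506672),
]

# Keep the 3 rules with the highest lift, strongest first.
top3 = heapq.nlargest(3, rules_demo, key=lambda rule: rule[2])
for lhs, rhs, rule_lift in top3:
    print(f"{lhs} -> {rhs}: lift {rule_lift}")
```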
