# Apriori

## Importing the libraries

In [1]:
!pip install apyori



In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [3]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
print('{0} rows and {1} columns in this Dataframe.'.format(dataset.shape[0], dataset.shape[1]))
display(dataset)



# The apriori function expects a list of transactions (a list of lists of strings), not a DataFrame

transactions = []  # empty list
for i in range(0, dataset.shape[0]):  # one transaction per row
    transactions.append([str(dataset.values[i,j]) for j in range(0, dataset.shape[1])])  # every cell as a string; empty cells become 'nan'

7501 rows and 20 columns in this Dataframe.


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,
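
The loop above converts every cell with `str`, so empty cells end up in each transaction as the string `'nan'`, which the algorithm then treats like any other product. This may be why some rows in the results table further down repeat or show a blank left-hand side. A minimal sketch of an alternative that skips empty cells, assuming the same one-row-per-basket layout; `to_transactions` and `toy` are hypothetical names used only for illustration:

```python
import pandas as pd

def to_transactions(frame):
    """Return one list of item names per row, skipping empty (NaN/None) cells."""
    return [
        [str(item) for item in row if pd.notna(item)]
        for row in frame.itertuples(index=False, name=None)
    ]

# Tiny stand-in for the real CSV, with the same ragged shape
toy = pd.DataFrame([
    ['burgers', 'meatballs', 'eggs'],
    ['chutney', None, None],
    ['turkey', 'avocado', None],
])
toy_transactions = to_transactions(toy)
print(toy_transactions)
```

On the real data this would be called as `to_transactions(dataset)`; the rules and their metrics would likely differ from the output shown in this notebook, since `'nan'` would no longer count as an item.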


## Training the Apriori model on the dataset

In [4]:
# NOTE: apyori must be available before running this cell, either installed with pip (first cell) or uploaded as apyori.py into this Colab notebook
from apyori import apriori

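# min_support: products bought at least 3 times a day over 1 week (7 days), divided by the number of transactions (3 * 7 / 7501 ≈ 0.003)
# min_confidence: minimum proportion of the transactions containing the left-hand side in which the rule holds
# min_lift: lift measures the quality of a rule; a minimum of 3 is recommended
# min_length: minimum number of items wanted in a rule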
rules = apriori(transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2)
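
The `min_support` value can be checked with a quick calculation. This is a sketch under the assumption that the comment above means "products bought at least 3 times a day, over 7 days of transactions"; the variable names are illustrative only:

```python
# Assumed reading of the comment above: products bought at least
# 3 times a day, over 7 days of recorded transactions.
purchases_per_day = 3
days_recorded = 7
total_transactions = 7501  # row count reported when the CSV was loaded

min_support = purchases_per_day * days_recorded / total_transactions
print(round(min_support, 4))  # about 0.0028, rounded up to 0.003 in the call above
```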

## Visualising the results

### Displaying the first results coming directly from the output of the apriori function

In [5]:
results = list(rules)

In [6]:
print(results)

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]), RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]), RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]), RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0.24

### Putting the results well organised into a Pandas DataFrame

In [13]:
def inspect(results):
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
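
The `inspect` function above keeps only the first item of each side and only the first ordered statistic of each record, so rules involving more than two items are shown incompletely. A possible variant that keeps every item is sketched below. The two namedtuples are stand-ins that mirror the field names visible in the printed output, so that the example runs without apyori; `inspect_all` is a hypothetical name, and the sample values are rounded for illustration:

```python
from collections import namedtuple
import pandas as pd

# Stand-ins mirroring the field names printed in the apriori output above;
# real records from apyori could be passed in their place.
RelationRecord = namedtuple('RelationRecord', ['items', 'support', 'ordered_statistics'])
OrderedStatistic = namedtuple('OrderedStatistic', ['items_base', 'items_add', 'confidence', 'lift'])

def inspect_all(results):
    """Build one row per rule, keeping every item on both sides."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                ', '.join(sorted(stat.items_base)),
                ', '.join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift']
    return pd.DataFrame(rows, columns=columns)

# Illustrative record shaped like the first one in the printed output
sample = [RelationRecord(
    items=frozenset({'chicken', 'light cream'}),
    support=0.0045,
    ordered_statistics=[
        OrderedStatistic(frozenset({'light cream'}), frozenset({'chicken'}), 0.29, 4.84),
    ],
)]
rules_table = inspect_all(sample)
print(rules_table)
```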

### Displaying the results sorted by descending supports

In [16]:
resultsinDataFrame.nlargest(n = 10, columns = 'Support') #top 10 by support

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
4,herb & pepper,ground beef,0.015998,0.32345,3.291994
43,herb & pepper,ground beef,0.015998,0.32345,3.291994
30,frozen vegetables,ground beef,0.008666,0.311005,3.165328
94,,ground beef,0.008666,0.311005,3.165328
7,whole wheat pasta,olive oil,0.007999,0.271493,4.12241
59,,olive oil,0.007999,0.271493,4.12241
34,shrimp,frozen vegetables,0.007199,0.305085,3.200616
54,milk,olive oil,0.007199,0.203008,3.082509
101,,frozen vegetables,0.007199,0.306818,3.218802
124,milk,olive oil,0.007199,0.203008,3.082509


Association rule mining is a technique for identifying underlying relations between different items. Take the example of a supermarket where customers can buy a variety of items. Usually, there is a pattern in what customers buy. For instance, mothers with babies buy baby products such as milk and diapers, young women may buy makeup, and bachelors may buy beer and chips. In short, transactions follow patterns. If the relationships between the items purchased in different transactions can be identified, they can be used to generate more profit.

Different statistical algorithms have been developed to implement association rule mining, and Apriori is one such algorithm.

There are three major components of the Apriori algorithm:

Support
<br>Confidence
<br>Lift

Support refers to the overall popularity of an item. It is calculated as the number of transactions containing that item divided by the total number of transactions.

Support(B) = (Transactions containing (B))/(Total Transactions)

Confidence refers to the likelihood that item B is also bought if item A is bought. It is calculated as the number of transactions where A and B are bought together, divided by the total number of transactions where A is bought.

Confidence(A→B) = (Transactions containing both (A and B))/(Transactions containing A)

Lift(A→B) refers to the increase in the ratio of sale of B when A is sold. It is calculated by dividing Confidence(A→B) by Support(B). A lift above 1 means B is bought more often together with A than it is overall.

Lift(A→B) = (Confidence (A→B))/(Support (B))
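
The three formulas can be tried out on a handful of made-up baskets. This is a minimal sketch in plain Python, not how apyori computes its metrics internally; the function names and the `baskets` data are assumptions for illustration. Confidence is computed here as Support(A and B) / Support(A), which equals the ratio of transaction counts given above:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, a, b):
    """Share of the transactions containing `a` that also contain `b`."""
    return support(transactions, {a, b}) / support(transactions, {a})

def lift(transactions, a, b):
    """Confidence(a -> b) relative to the overall support of `b`."""
    return confidence(transactions, a, b) / support(transactions, {b})

# Five made-up baskets
baskets = [
    ['burger', 'ketchup'],
    ['burger', 'ketchup', 'fries'],
    ['burger'],
    ['ketchup'],
    ['fries'],
]

print(support(baskets, {'ketchup'}))            # 3 of 5 baskets
print(confidence(baskets, 'burger', 'ketchup')) # 2 of the 3 burger baskets
print(lift(baskets, 'burger', 'ketchup'))       # (2/3) / (3/5)
```

Here the lift is slightly above 1, so in this toy data ketchup is bought a little more often with burgers than it is overall.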