## **Apriori Algorithm in Machine Learning**
https://www.javatpoint.com/apriori-algorithm-in-machine-learning

In [7]:
!pip install apyori



This notebook uses the Apriori algorithm to find combinations of items that are frequently bought together in a grocery store.

In [8]:
import pandas as pd
import numpy as np
from apyori import apriori

In [9]:
# Mount the Google Drive
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


## Preprocessing the data 

In [10]:
#dataset=pd.read_csv('Market_Basket_Optimisation.csv',header=None)
dataset = pd.read_csv('gdrive/My Drive/SRM-MLP-Internship-2021/04-Unsupervised-Learning/002-Association/Data-Files/Market_Basket_Optimisation.csv', header=None)
dataset.shape

(7501, 20)

In [11]:
dataset.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


Each row of the dataset lists the items bought in one transaction, i.e. one customer's basket. Shorter baskets are padded with empty cells up to 20 columns.

The `apriori` function expects a list of lists, so the dataframe is converted below.

In [12]:
# Replace empty (NaN) cells with 0 so they can be filtered out when building the transactions.
dataset.fillna(0,inplace=True)

In [13]:
dataset.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,chutney,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,turkey,avocado,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,mineral water,milk,energy bar,whole wheat rice,green tea,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [14]:
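# Build one list of item names per row, skipping the 0 placeholders.
n_rows, n_cols = dataset.shape  # (7501, 20)
transactions = []
for i in range(n_rows):
    transactions.append([str(dataset.values[i, j]) for j in range(n_cols) if str(dataset.values[i, j]) != '0'])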
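
The loop above depends on the `0` placeholder written by `fillna`. As a possible alternative, the sketch below builds the same kind of list of lists directly from CSV text with the standard `csv` module, dropping empty cells as it goes. The inline `sample_csv` string and the `rows_to_transactions` helper are invented for illustration and are not part of the notebook.

```python
import csv
import io

# Hypothetical three-row sample in the same ragged layout as the dataset.
sample_csv = (
    "burgers,meatballs,eggs\n"
    "chutney,,\n"
    "turkey,avocado,\n"
)


def rows_to_transactions(lines):
    """Turn CSV rows into lists of item names, skipping empty cells."""
    reader = csv.reader(lines)
    return [[cell.strip() for cell in row if cell.strip()] for row in reader]


sample_transactions = rows_to_transactions(io.StringIO(sample_csv))
print(sample_transactions)
```

For the real file, the same helper could be given an open file handle instead of the `io.StringIO` wrapper.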

In [15]:
transactions[:2]

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil'],
 ['burgers', 'meatballs', 'eggs']]

## Training the apriori model

In [16]:
from apyori import apriori
rules=apriori(transactions,min_support=0.003,min_confidence=0.2,min_lift=3,min_length=2,max_length=2)

* **transactions**: A list of transactions, each given as a list of item names.
* **min_support**: The minimum support, a float giving the fraction of all transactions that must contain an itemset. Here we use 0.003, based on 3 transactions per customer each week relative to the total number of transactions. With 7,501 transactions, an itemset must therefore appear in at least 23 of them.
* **min_confidence**: The minimum confidence of a rule. Here we use 0.2; it can be changed to suit the business problem.
* **min_lift**: The minimum lift of a rule. Here we use 3.
* **min_length**: The minimum number of products in an association.
* **max_length**: The maximum number of products in an association.
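
To make the three thresholds concrete, here is a minimal sketch that computes support, confidence and lift by hand using their standard definitions. The `toy` transactions and the helper functions are invented for illustration; they are not how `apyori` is implemented.

```python
# Toy transactions, invented purely for illustration.
toy = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"milk"},
    {"eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(base, add, transactions):
    """support(base and add together) divided by support(base)."""
    return support(set(base) | set(add), transactions) / support(base, transactions)


def lift(base, add, transactions):
    """confidence(base -> add) divided by support(add)."""
    return confidence(base, add, transactions) / support(add, transactions)


s = support({"milk", "bread"}, toy)        # 2 of 5 transactions
c = confidence({"milk"}, {"bread"}, toy)   # (2/5) / (3/5)
l = lift({"milk"}, {"bread"}, toy)         # confidence / (3/5)
print(round(s, 3), round(c, 3), round(l, 3))
```

A lift above 1 means the two items occur together more often than they would if they were bought independently.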

The rules that were found can be listed as follows:

In [17]:
results=list(rules)
len(results)

9

In [18]:
results[0]

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])

The output above describes the association between two items, 'light cream' and 'chicken'.

The two items appear together in about 0.45% of all transactions, i.e.

support('light cream', 'chicken') = 0.0045

confidence('light cream' → 'chicken') = 0.29


This means that a customer who buys 'light cream' also buys 'chicken' in about 29% of cases.

The lift of 4.84 is the ratio of that confidence to the overall support of 'chicken': customers who buy 'light cream' are about 4.84 times as likely to buy 'chicken' as customers in general.
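
As a sanity check, the three reported numbers can be turned back into transaction counts. The sketch below assumes the standard definitions (confidence = pair support / base support, lift = confidence / support of the added item) and the 7,501 rows reported by `dataset.shape`. The variable names are chosen here for illustration.

```python
n_transactions = 7501                    # rows in the dataset
support_pair = 0.004532728969470737      # support of {'light cream', 'chicken'}
confidence_rule = 0.29059829059829057    # confidence of light cream -> chicken
lift_rule = 4.84395061728395             # lift of light cream -> chicken

# Baskets containing both items.
pair_count = round(support_pair * n_transactions)

# support(light cream) = support(pair) / confidence
base_support = support_pair / confidence_rule
base_count = round(base_support * n_transactions)

# support(chicken) = confidence / lift
add_support = confidence_rule / lift_rule
add_count = round(add_support * n_transactions)

print(pair_count, base_count, add_count)  # 34 117 450
```

Under these assumptions, 34 of the 117 baskets with 'light cream' also contain 'chicken' (about 29%), while 'chicken' appears in only 450 of all 7,501 baskets (about 6%).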

In [19]:
results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

In [20]:
results[0]

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])

In [21]:
dt = pd.DataFrame(columns=['Item_Base','Item_Add','Support','Confidence','Lift'], dtype=object)
dt.shape

(0, 5)

In [22]:
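# Each record is (items, support, ordered_statistics).
# Only the first ordered statistic of each record is copied into the table.
for i in range(len(results)):
    dt.at[i,'Item_Base'] = list(results[i][2][0][0])   # items_base
    dt.at[i,'Item_Add'] = list(results[i][2][0][1])    # items_add
    dt.at[i,'Support'] = results[i][1]
    dt.at[i,'Confidence'] = results[i][2][0][2]
    dt.at[i,'Lift'] = results[i][2][0][3]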
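
The positional indexing above can be hard to read. Judging by the printed output, the records expose named fields (`items`, `support`, `ordered_statistics`, and `items_base`, `items_add`, `confidence`, `lift`), so a more readable variant might look like the sketch below. The two `namedtuple` classes are stand-ins defined here so the example runs without `apyori`, and `records_to_rows` is a hypothetical helper. Unlike the loop above, it keeps every ordered statistic of a record rather than only the first.

```python
from collections import namedtuple

# Stand-ins mirroring the field names visible in the printed output;
# with apyori installed, the real records from list(rules) would be used instead.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

sample_results = [
    RelationRecord(
        items=frozenset({"light cream", "chicken"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"light cream"}),
                items_add=frozenset({"chicken"}),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    ),
]


def records_to_rows(records):
    """Flatten records into one dict per rule (one per ordered statistic)."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "Item_Base": sorted(stat.items_base),
                "Item_Add": sorted(stat.items_add),
                "Support": record.support,
                "Confidence": stat.confidence,
                "Lift": stat.lift,
            })
    return rows


rows = records_to_rows(sample_results)
print(rows)
```

Passing `rows` to `pd.DataFrame` would give a table with the same five columns as `dt`.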

In [23]:
print(dt.shape)
dt.head(9)

(9, 5)


Unnamed: 0,Item_Base,Item_Add,Support,Confidence,Lift
0,[light cream],[chicken],0.00453273,0.290598,4.84395
1,[mushroom cream sauce],[escalope],0.00573257,0.300699,3.79083
2,[pasta],[escalope],0.00586588,0.372881,4.70081
3,[fromage blanc],[honey],0.00333289,0.245098,5.16427
4,[herb & pepper],[ground beef],0.0159979,0.32345,3.29199
5,[tomato sauce],[ground beef],0.00533262,0.377358,3.84066
6,[light cream],[olive oil],0.00319957,0.205128,3.11471
7,[whole wheat pasta],[olive oil],0.00799893,0.271493,4.12241
8,[pasta],[shrimp],0.00506599,0.322034,4.50667


## Visualization of Results

### **Sorting the relations by their lift**

In [24]:
result_df=dt.sort_values('Lift',ascending=False)
result_df

Unnamed: 0,Item_Base,Item_Add,Support,Confidence,Lift
3,[fromage blanc],[honey],0.00333289,0.245098,5.16427
0,[light cream],[chicken],0.00453273,0.290598,4.84395
2,[pasta],[escalope],0.00586588,0.372881,4.70081
8,[pasta],[shrimp],0.00506599,0.322034,4.50667
7,[whole wheat pasta],[olive oil],0.00799893,0.271493,4.12241
5,[tomato sauce],[ground beef],0.00533262,0.377358,3.84066
1,[mushroom cream sauce],[escalope],0.00573257,0.300699,3.79083
4,[herb & pepper],[ground beef],0.0159979,0.32345,3.29199
6,[light cream],[olive oil],0.00319957,0.205128,3.11471


In [25]:
# Save the lift-sorted rules to Google Drive.
result_df.to_csv("gdrive/My Drive/SRM-MLP-Internship-2021/04-Unsupervised-Learning/002-Association/Output-Files/Mkt_Opt_Apriori_Lift_Based.csv", index = False)

### **Sorting the relations by their support**

In [26]:
result_df1=dt.sort_values('Support',ascending=False)
result_df1

Unnamed: 0,Item_Base,Item_Add,Support,Confidence,Lift
4,[herb & pepper],[ground beef],0.0159979,0.32345,3.29199
7,[whole wheat pasta],[olive oil],0.00799893,0.271493,4.12241
2,[pasta],[escalope],0.00586588,0.372881,4.70081
1,[mushroom cream sauce],[escalope],0.00573257,0.300699,3.79083
5,[tomato sauce],[ground beef],0.00533262,0.377358,3.84066
8,[pasta],[shrimp],0.00506599,0.322034,4.50667
0,[light cream],[chicken],0.00453273,0.290598,4.84395
3,[fromage blanc],[honey],0.00333289,0.245098,5.16427
6,[light cream],[olive oil],0.00319957,0.205128,3.11471


In [27]:
# Save the support-sorted rules to Google Drive.
result_df1.to_csv("gdrive/My Drive/SRM-MLP-Internship-2021/04-Unsupervised-Learning/002-Association/Output-Files/Mkt_Opt_Apriori_Support_Based.csv", index = False)

In [28]:
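# Second run: raise min_lift to 4 and allow itemsets of up to 3 items.
# Note: min_length=4 conflicts with max_length=3; the output below still contains 2- and 3-item rules.
rules2=apriori(transactions,min_support=0.003,min_confidence=0.2,min_lift=4,min_length=4,max_length=3)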

In [29]:
results2=list(rules2)
len(results2)

15
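
If a minimum itemset size is actually wanted for this second run, one option is to filter the records after they are generated instead of relying on `min_length`. The sketch below assumes each record exposes an `items` field, as the printed records earlier suggest. `RelationRecord` is a stand-in `namedtuple` so the example runs without `apyori`, and `filter_by_size` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in mirroring the fields of the printed records (not apyori's own class).
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])

# Two itemsets taken from the result tables; statistics omitted for brevity.
sample_records = [
    RelationRecord(frozenset({"light cream", "chicken"}), 0.00453273, []),
    RelationRecord(frozenset({"mineral water", "whole wheat pasta", "olive oil"}), 0.00386615, []),
]


def filter_by_size(records, min_items, max_items):
    """Keep records whose itemset size lies in [min_items, max_items]."""
    return [r for r in records if min_items <= len(r.items) <= max_items]


three_item_only = filter_by_size(sample_records, min_items=3, max_items=3)
print([sorted(r.items) for r in three_item_only])
```

Applied to `results2`, the same call would keep only the three-item rules.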

In [30]:
dt2 = pd.DataFrame(columns=['Item_Base','Item_Add','Support','Confidence','Lift'], dtype=object)
dt2.shape

(0, 5)

In [31]:
for i in range(len(results2)):
    dt2.at[i,'Item_Base'] = list(results2[i][2][0][0])
    dt2.at[i,'Item_Add'] = list(results2[i][2][0][1])
    dt2.at[i,'Support'] = results2[i][1]
    dt2.at[i,'Confidence'] = results2[i][2][0][2]
    dt2.at[i,'Lift'] = results2[i][2][0][3]

In [32]:
print(dt2.shape)
dt2.head(10)

(15, 5)


Unnamed: 0,Item_Base,Item_Add,Support,Confidence,Lift
0,[light cream],[chicken],0.00453273,0.290598,4.84395
1,[pasta],[escalope],0.00586588,0.372881,4.70081
2,[fromage blanc],[honey],0.00333289,0.245098,5.16427
3,[whole wheat pasta],[olive oil],0.00799893,0.271493,4.12241
4,[pasta],[shrimp],0.00506599,0.322034,4.50667
5,"[frozen vegetables, cake]",[tomatoes],0.00306626,0.298701,4.36756
6,"[spaghetti, cereals]",[ground beef],0.00306626,0.46,4.68176
7,"[chocolate, herb & pepper]",[ground beef],0.00399947,0.441176,4.49018
8,"[ground beef, eggs]",[herb & pepper],0.00413278,0.206667,4.17845
9,"[ground beef, french fries]",[herb & pepper],0.00319957,0.230769,4.66577


In [33]:
dt2.tail(10)

Unnamed: 0,Item_Base,Item_Add,Support,Confidence,Lift
5,"[frozen vegetables, cake]",[tomatoes],0.00306626,0.298701,4.36756
6,"[spaghetti, cereals]",[ground beef],0.00306626,0.46,4.68176
7,"[chocolate, herb & pepper]",[ground beef],0.00399947,0.441176,4.49018
8,"[ground beef, eggs]",[herb & pepper],0.00413278,0.206667,4.17845
9,"[ground beef, french fries]",[herb & pepper],0.00319957,0.230769,4.66577
10,"[herb & pepper, spaghetti]",[ground beef],0.00639915,0.393443,4.00436
11,[tomato sauce],"[ground beef, spaghetti]",0.00306626,0.216981,5.53597
12,"[milk, olive oil]",[soup],0.00359952,0.210938,4.17478
13,"[milk, tomatoes]",[soup],0.00306626,0.219048,4.33529
14,"[mineral water, whole wheat pasta]",[olive oil],0.00386615,0.402778,6.11586


In [34]:
result_df=dt2.sort_values('Lift',ascending=False)
result_df

Unnamed: 0,Item_Base,Item_Add,Support,Confidence,Lift
14,"[mineral water, whole wheat pasta]",[olive oil],0.00386615,0.402778,6.11586
11,[tomato sauce],"[ground beef, spaghetti]",0.00306626,0.216981,5.53597
2,[fromage blanc],[honey],0.00333289,0.245098,5.16427
0,[light cream],[chicken],0.00453273,0.290598,4.84395
1,[pasta],[escalope],0.00586588,0.372881,4.70081
6,"[spaghetti, cereals]",[ground beef],0.00306626,0.46,4.68176
9,"[ground beef, french fries]",[herb & pepper],0.00319957,0.230769,4.66577
4,[pasta],[shrimp],0.00506599,0.322034,4.50667
7,"[chocolate, herb & pepper]",[ground beef],0.00399947,0.441176,4.49018
5,"[frozen vegetables, cake]",[tomatoes],0.00306626,0.298701,4.36756
