# Binal Manoj Bariya (20MAI0075)

# Github Link : https://github.com/binalbariya/Data-Warehousing/blob/main/Assessment%202.ipynb

# Association rule mining:
It is a technique for identifying underlying relationships between different items. Take the example of a supermarket where customers can buy a variety of items. Usually there is a pattern in what customers buy; in short, transactions follow patterns. If the relationships between items purchased together across transactions can be identified, they can be used to generate more profit.

For instance, if items A and B are frequently bought together, several steps can be taken to increase profit, for example placing the two items near each other or offering them as a bundle.

# Different statistical algorithms have been developed to implement association rule mining, and Apriori is one such algorithm.

There are three major components of the Apriori algorithm:

Support
Confidence
Lift

Support(B) = (Transactions containing (B))/(Total Transactions)

Confidence(A→B) = (Transactions containing both (A and B))/(Transactions containing A)

Lift(A→B) = (Confidence (A→B))/(Support (B))
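The three formulas above can be checked by hand on a tiny example. The sketch below uses four illustrative transactions (not taken from the store dataset) and helper names (`support`, `confidence`, `lift`) that are chosen here for illustration only.

```python
# Toy transactions (illustrative, not taken from the store dataset).
transactions = [
    {"Milk", "Eggs", "Bread"},
    {"Milk", "Eggs"},
    {"Milk", "Bread"},
    {"Eggs", "Apple"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support(A and B) / Support(A): how often B appears when A does."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence(A -> B) / Support(B)."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

print(support({"Milk"}, transactions))                # 0.75
print(confidence({"Bread"}, {"Milk"}, transactions))  # 1.0
print(lift({"Bread"}, {"Milk"}, transactions))        # about 1.333
```

A lift above 1 suggests that the two items appear together more often than they would if they were bought independently; a lift close to 1 suggests no particular association.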





About the dataset:
It contains about 7,500 transactions of everyday products,
recorded over the course of a week at a French retail store. 

The dataset can be downloaded from the following link:
https://drive.google.com/file/d/1y5DYn0dGoSbC22xowBq2d4po6h1JxcTQ/view
    

In [2]:
!pip install apyori



Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py): started
  Building wheel for apyori (setup.py): finished with status 'done'
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5979 sha256=d76a8cd5aaa3e16a059d7573cefcdcc4dc770ee94b7e16fda62e1545478c1566
  Stored in directory: c:\users\binal bariya\appdata\local\pip\cache\wheels\cb\f6\e1\57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [3]:
#import the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

In [6]:
#Importing the Dataset
#Now let's import the dataset and see what we're working with. 
#Download the dataset from the given link 

store_data = pd.read_csv('store_data.csv',header=None)

The first few rows of the dataset are shown by the head() call below. 

Looking carefully at the data, we can see that the file has no 
header row: its first line is already a transaction, which is why 
header=None is passed to read_csv. Each row corresponds to a transaction 
and each column holds one of the items purchased in that transaction. 

Because transactions contain different numbers of items, shorter rows are padded with NaN. A NaN therefore means that the transaction had no further items, not that a particular product was skipped.

In [7]:
#Let's call the head() function to see how the dataset looks:
store_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


Data Preprocessing

The Apriori library we are going to use requires our dataset to be in the form of a list of lists, 
where the whole dataset is a big list and each transaction in the dataset 
is an inner list within the outer big list. 
Currently we have the data in the form of a pandas DataFrame, 
so we convert it into a list of lists as follows.


In [8]:
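# Convert each row to a list of item names, skipping the NaN padding
# so that 'nan' is not treated as a purchased item.
records = []
for row in store_data.values:
    records.append([str(item) for item in row if pd.notna(item)])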
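As an alternative sketch, the same list of lists can be built with Python's csv module, which reads ragged rows directly and so never creates NaN padding. The in-memory `sample_csv` (three rows copied from the preview above) and the name `records_sketch` are assumptions made for illustration; with the real file you would pass `open('store_data.csv', newline='')` to `csv.reader` instead.

```python
import csv
import io

# Stand-in for store_data.csv: three ragged rows copied from the preview.
sample_csv = io.StringIO(
    "burgers,meatballs,eggs\n"
    "chutney\n"
    "turkey,avocado\n"
)

# Each CSV row becomes one transaction; empty cells are dropped.
records_sketch = [
    [item.strip() for item in row if item.strip()]
    for row in csv.reader(sample_csv)
]

print(records_sketch)
# [['burgers', 'meatballs', 'eggs'], ['chutney'], ['turkey', 'avocado']]
```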

Applying Apriori

The next step is to apply the Apriori algorithm to the dataset. 
To do so, we can use the apriori function that we imported from the apyori library.

The apriori function requires some parameter values to work. 
The first parameter is the list of lists that you want to extract rules from. 
The second parameter is the min_support parameter. 
It keeps only the itemsets whose support is at least the specified value. 
Next, the min_confidence parameter keeps only the rules whose confidence 
is at least the specified confidence threshold. 

Similarly, the min_lift parameter specifies the minimum lift value for the shortlisted rules. 
Finally, the min_length parameter is meant to specify the minimum number of items that you want in your rules. Note that the keyword arguments documented by apyori are min_support, min_confidence, min_lift and max_length, so min_length may have no effect in the installed version.

In [14]:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)
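The notebook stores the rules but never prints them. The sketch below shows one way to flatten such results into readable rows. It assumes the record layout that apyori is understood to return (records with `items`, `support` and `ordered_statistics`, where each statistic has `items_base`, `items_add`, `confidence` and `lift`). The namedtuples are local stand-ins so that the sketch runs without the library, `summarize` is a hypothetical helper, and the demo numbers are made up rather than real results.

```python
from collections import namedtuple

# Local stand-ins mirroring the assumed layout of apyori's result records.
OrderedStatistic = namedtuple(
    "OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple(
    "RelationRecord", "items support ordered_statistics")

def summarize(results):
    """Return one (rule, support, confidence, lift) tuple per rule."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            left = ", ".join(sorted(stat.items_base))
            right = ", ".join(sorted(stat.items_add))
            rows.append((f"{left} -> {right}", record.support,
                         stat.confidence, stat.lift))
    # Strongest associations (highest lift) first.
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Made-up record used only to exercise the helper; not a real result.
demo = [
    RelationRecord(
        items=frozenset({"item_a", "item_b"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(frozenset({"item_a"}), frozenset({"item_b"}),
                             0.3, 3.5)
        ],
    )
]

for rule, sup, conf, lift_value in summarize(demo):
    print(f"{rule}: support={sup:.4f}, "
          f"confidence={conf:.4f}, lift={lift_value:.4f}")
```

If the layout assumption holds, calling `summarize(association_results)` on the notebook's variable would list the mined rules in the same form.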


# Now using a hand-made dataset to implement association rule mining 

# Using both the Apriori and FP-Growth algorithms

In [18]:
!pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.18.0-py2.py3-none-any.whl (1.3 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.18.0


In [20]:
# Creating a hand made list with the required data

dataset = [['Milk', 'Eggs', 'Bread'],
['Milk', 'Eggs'],
['Milk', 'Bread'],
['Eggs', 'Apple']]

In [21]:
# Let's look at the dataset

print(dataset)

[['Milk', 'Eggs', 'Bread'], ['Milk', 'Eggs'], ['Milk', 'Bread'], ['Eggs', 'Apple']]


In [22]:
# Convert list to dataframe with boolean values

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_array = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_array, columns=te.columns_)

In [23]:
print(df)

   Apple  Bread   Eggs   Milk
0  False   True   True   True
1  False  False   True   True
2  False   True  False   True
3   True  False   True  False
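To make the encoding step less opaque, here is a minimal pure-Python sketch of the same one-hot transformation. It is not mlxtend's implementation; the names `columns` and `encoded` are chosen here for illustration.

```python
dataset = [['Milk', 'Eggs', 'Bread'],
           ['Milk', 'Eggs'],
           ['Milk', 'Bread'],
           ['Eggs', 'Apple']]

# Column order: every distinct item, sorted alphabetically.
columns = sorted({item for transaction in dataset for item in transaction})

# One boolean row per transaction: True where the item was bought.
encoded = [[item in transaction for item in columns]
           for transaction in dataset]

print(columns)
for row in encoded:
    print(row)
```

The printed rows match the True/False pattern of the DataFrame above, with the columns in the order Apple, Bread, Eggs, Milk.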


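First, we import the apriori function from mlxtend. Note that this import rebinds the name apriori, which earlier referred to the apyori function.

Then we apply the algorithm to our data to extract the itemsets that
have a support of at least 0.01 (this parameter can be changed). With only four transactions, any itemset that occurs at all has a support of at least 0.25, so this threshold keeps every itemset that appears in the data.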

In [27]:
#Now Find frequently occurring itemsets using Apriori Algorithm
from mlxtend.frequent_patterns import apriori

frequent_itemsets_ap = apriori(df, min_support=0.01, use_colnames=True)

In [28]:
#Viewing the results

print(frequent_itemsets_ap)

   support             itemsets
0     0.25              (Apple)
1     0.50              (Bread)
2     0.75               (Eggs)
3     0.75               (Milk)
4     0.25        (Eggs, Apple)
5     0.25        (Eggs, Bread)
6     0.50        (Milk, Bread)
7     0.50         (Milk, Eggs)
8     0.25  (Milk, Eggs, Bread)
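The level-wise idea behind Apriori can be sketched in a few lines of plain Python: candidates of size k are built only from frequent itemsets of size k-1, and any candidate with an infrequent subset is pruned before its support is counted. This is an illustrative sketch, not mlxtend's implementation; `apriori_sketch` and `support` are hypothetical names.

```python
from itertools import combinations

transactions = [frozenset(t) for t in [
    ['Milk', 'Eggs', 'Bread'],
    ['Milk', 'Eggs'],
    ['Milk', 'Bread'],
    ['Eggs', 'Apple'],
]]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori_sketch(min_support):
    """Level-wise search for all itemsets with support >= min_support."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c) >= min_support]
    return frequent

frequent = apriori_sketch(min_support=0.01)
for itemset, sup in sorted(frequent.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

On this dataset the sketch finds the same nine itemsets and supports as the table above. Candidates such as (Apple, Bread, Eggs) are discarded in the prune step because (Apple, Bread) never occurs.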


First, we import the FP-Growth function (fpgrowth) from mlxtend.
Then we apply the algorithm to our data to extract the itemsets that have a support of at least 0.01 (this parameter can be tuned on a case-by-case basis).

In [29]:
# Find frequently occurring itemsets using F-P Growth

from mlxtend.frequent_patterns import fpgrowth

frequent_itemsets_fp=fpgrowth(df, min_support=0.01, use_colnames=True)

In [30]:
#Viewing the results

print(frequent_itemsets_fp)

   support             itemsets
0     0.75               (Milk)
1     0.75               (Eggs)
2     0.50              (Bread)
3     0.25              (Apple)
4     0.50         (Milk, Eggs)
5     0.50        (Milk, Bread)
6     0.25        (Eggs, Bread)
7     0.25  (Milk, Eggs, Bread)
8     0.25        (Eggs, Apple)
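FP-Growth finds the same itemsets as Apriori (only the row order differs above) but avoids candidate generation by first compressing the transactions into a prefix tree, the FP-tree. The sketch below covers only that tree-building step, not the recursive mining that follows it, and it is not mlxtend's implementation. The `Node` class, `build_fp_tree`, `show` and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import Counter

dataset = [['Milk', 'Eggs', 'Bread'],
           ['Milk', 'Eggs'],
           ['Milk', 'Bread'],
           ['Eggs', 'Apple']]

class Node:
    """FP-tree node: an item, a transaction count, and child nodes."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count=1):
    # Pass 1: count how often each item occurs.
    counts = Counter(item for t in transactions for item in t)
    root = Node(None)
    # Pass 2: insert each transaction with its items ordered by descending
    # count (ties broken alphabetically) so common items share prefixes.
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if counts[i] >= min_count),
            key=lambda i: (-counts[i], i),
        )
        node = root
        for item in ordered:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root

def show(node, depth=0):
    """Print the tree, indenting children under their parent."""
    for child in sorted(node.children.values(), key=lambda n: n.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

tree = build_fp_tree(dataset)
show(tree)
```

Three of the four transactions start with Eggs after reordering, so they share a single Eggs node with count 3. This sharing is what keeps the tree compact on larger datasets.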



# Mine the association rules

In [31]:
from mlxtend.frequent_patterns import association_rules

rules_ap = association_rules(frequent_itemsets_ap, metric="confidence", min_threshold=0.8)
rules_fp = association_rules(frequent_itemsets_fp, metric="confidence", min_threshold=0.8)


In [32]:
#set of rules generated by apriori algorithm
print(rules_ap)

     antecedents consequents  antecedent support  consequent support  support  \
0        (Apple)      (Eggs)                0.25                0.75     0.25   
1        (Bread)      (Milk)                0.50                0.75     0.50   
2  (Eggs, Bread)      (Milk)                0.25                0.75     0.25   

   confidence      lift  leverage  conviction  
0         1.0  1.333333    0.0625         inf  
1         1.0  1.333333    0.1250         inf  
2         1.0  1.333333    0.0625         inf  


In [34]:
#set of rules generated by FP algorithm
print(rules_fp)

     antecedents consequents  antecedent support  consequent support  support  \
0        (Bread)      (Milk)                0.50                0.75     0.50   
1  (Eggs, Bread)      (Milk)                0.25                0.75     0.25   
2        (Apple)      (Eggs)                0.25                0.75     0.25   

   confidence      lift  leverage  conviction  
0         1.0  1.333333    0.1250         inf  
1         1.0  1.333333    0.0625         inf  
2         1.0  1.333333    0.0625         inf  
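The rule tables above can be reproduced by hand, which also shows where the lift, leverage and conviction columns come from. The sketch below enumerates every itemset that occurs in the data; this matches the frequent itemsets here because the 0.01 support threshold excludes nothing. It then splits each itemset into antecedent and consequent. It is not mlxtend's implementation, and `rules_sketch` is a hypothetical name.

```python
from itertools import combinations

transactions = [frozenset(t) for t in [
    ['Milk', 'Eggs', 'Bread'],
    ['Milk', 'Eggs'],
    ['Milk', 'Bread'],
    ['Eggs', 'Apple'],
]]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_sketch(min_confidence):
    items = sorted({i for t in transactions for i in t})
    rules = []
    # Every itemset with at least two items that occurs in the data.
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            sup = support(itemset)
            if sup == 0:
                continue
            # Split the itemset into antecedent -> consequent in every way.
            for k in range(1, size):
                for ante in combinations(combo, k):
                    antecedent = frozenset(ante)
                    consequent = itemset - antecedent
                    conf = sup / support(antecedent)
                    if conf < min_confidence:
                        continue
                    cons_sup = support(consequent)
                    rules.append({
                        "antecedent": sorted(antecedent),
                        "consequent": sorted(consequent),
                        "support": sup,
                        "confidence": conf,
                        "lift": conf / cons_sup,
                        "leverage": sup - support(antecedent) * cons_sup,
                        # Conviction is infinite when confidence is 1.
                        "conviction": (float("inf") if conf == 1
                                       else (1 - cons_sup) / (1 - conf)),
                    })
    return rules

for rule in rules_sketch(min_confidence=0.8):
    print(rule["antecedent"], "->", rule["consequent"],
          "confidence:", rule["confidence"],
          "lift:", round(rule["lift"], 6),
          "leverage:", rule["leverage"])
```

The sketch yields the same three rules as both tables: Apple -> Eggs, Bread -> Milk, and (Bread, Eggs) -> Milk, each with confidence 1.0 and lift 1.333333.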
