# Association Analysis: Apriori Implementation

## Step 1: Import the packages

In [None]:
# Import numpy and pandas
import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder # import package to preprocess the data.

## Step 2: Import the dataset

In [None]:
# https://www.kaggle.com/shazadudwadia/supermarket#GroceryStoreDataSet.csv
# Note: I added the column name "Products" to the CSV file before loading it into Python.
df = pd.read_csv("GroceryStoreDataSet.csv")
df.head()

Unnamed: 0,Products
0,"MILK,BREAD,BISCUIT"
1,"BREAD,MILK,BISCUIT,CORNFLAKES"
2,"BREAD,TEA,BOURNVITA"
3,"JAM,MAGGI,BREAD,MILK"
4,"MAGGI,TEA,BISCUIT"


## Step 3: Perform data pre-processing

#### To apply the Apriori algorithm, the data must be converted into *one-hot encoded format*, with one boolean column per item. The "TransactionEncoder" class from mlxtend does this, but it expects the transactions as a list of lists, so the "Products" column is first split into lists of items.

In [None]:
# Split each comma-separated "Products" string into a list of items,
# giving one list per transaction (a list of lists).

data = list(df["Products"].apply(lambda x: x.split(',')))
data

[['MILK', 'BREAD', 'BISCUIT'],
 ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]

In [None]:
# TransactionEncoder converts the transactions into a one-hot encoded boolean array, the input format that apriori() expects.

te = TransactionEncoder()
te_data = te.fit(data).transform(data)
te_data

df = pd.DataFrame(te_data,columns=te.columns_)
df.head()

Unnamed: 0,BISCUIT,BOURNVITA,BREAD,COCK,COFFEE,CORNFLAKES,JAM,MAGGI,MILK,SUGER,TEA
0,True,False,True,False,False,False,False,False,True,False,False
1,True,False,True,False,False,True,False,False,True,False,False
2,False,True,True,False,False,False,False,False,False,False,True
3,False,False,True,False,False,False,True,True,True,False,False
4,True,False,False,False,False,False,False,True,False,False,True
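
To make the encoding step concrete, here is a minimal sketch of what one-hot encoding a list of transactions amounts to, written in plain Python on a three-transaction subset of the data. It is an illustration only, not how "TransactionEncoder" is implemented internally; the names `transactions`, `items` and `rows` are made up for this example.

```python
# Three transactions taken from the dataset above, for illustration only.
transactions = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["MAGGI", "TEA", "BISCUIT"],
]

# Collect every distinct item; sorting gives a stable column order.
items = sorted({item for basket in transactions for item in basket})

# One row per transaction, one boolean per item:
# True if the item appears in that transaction, False otherwise.
rows = [[item in basket for item in items] for basket in transactions]

print(items)
for row in rows:
    print(row)
```

Passing `rows` and `items` to `pd.DataFrame(rows, columns=items)` would give a boolean DataFrame with the same layout as the one shown above.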


## Step 4: Find the frequent itemsets


### *Data preprocessing* is now complete. Next we find the frequent itemsets using the *apriori()* function from mlxtend.

Here the minimum support is set to 0.2 (20% of the transactions). Itemsets whose support is at or above this threshold are considered frequent.

In [None]:
# Import the apriori function, which finds the frequent itemsets.
from mlxtend.frequent_patterns import apriori

In [None]:
# Set the minimum support threshold (0.2 means the itemset appears in at least 20% of transactions).
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
frequent_itemsets

Unnamed: 0,support,itemsets
0,0.35,(BISCUIT)
1,0.2,(BOURNVITA)
2,0.65,(BREAD)
3,0.4,(COFFEE)
4,0.3,(CORNFLAKES)
5,0.25,(MAGGI)
6,0.25,(MILK)
7,0.3,(SUGER)
8,0.35,(TEA)
9,0.2,"(BISCUIT, BREAD)"
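
The support values reported above can be checked by hand: the support of an itemset is the fraction of transactions that contain every item in it. The sketch below recomputes two of them directly from the transaction list produced in Step 3. The helper `support` is a hypothetical name introduced for this example and is not part of mlxtend.

```python
# Transactions copied from the list of lists produced in Step 3.
transactions = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["JAM", "MAGGI", "BREAD", "MILK"],
    ["MAGGI", "TEA", "BISCUIT"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["MAGGI", "TEA", "CORNFLAKES"],
    ["MAGGI", "BREAD", "TEA", "BISCUIT"],
    ["JAM", "MAGGI", "BREAD", "TEA"],
    ["BREAD", "MILK"],
    ["COFFEE", "COCK", "BISCUIT", "CORNFLAKES"],
    ["COFFEE", "COCK", "BISCUIT", "CORNFLAKES"],
    ["COFFEE", "SUGER", "BOURNVITA"],
    ["BREAD", "COFFEE", "COCK"],
    ["BREAD", "SUGER", "BISCUIT"],
    ["COFFEE", "SUGER", "CORNFLAKES"],
    ["BREAD", "SUGER", "BOURNVITA"],
    ["BREAD", "COFFEE", "SUGER"],
    ["BREAD", "COFFEE", "SUGER"],
    ["TEA", "MILK", "COFFEE", "CORNFLAKES"],
]


def support(itemset, baskets):
    """Return the fraction of baskets that contain every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for basket in baskets if wanted.issubset(basket))
    return hits / len(baskets)


print(support({"BREAD"}, transactions))             # 13 of 20 transactions -> 0.65
print(support({"BISCUIT", "BREAD"}, transactions))  # 4 of 20 transactions -> 0.2
```

Both values match rows 2 and 9 of the table above. An itemset with support exactly 0.2, such as (BISCUIT, BREAD), is kept because it meets the threshold.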


## Step 5: Generate association rules from the frequent itemsets

In [None]:
# Generate the association rules that satisfy the minimum confidence defined by the user.
# Import the function that derives association rules from frequent itemsets.
from mlxtend.frequent_patterns import association_rules

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(MILK),(BREAD),0.25,0.65,0.2,0.8,1.230769,0.0375,1.75
1,(MAGGI),(TEA),0.25,0.35,0.2,0.8,2.285714,0.1125,3.25
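
The metric columns in this table follow from three support values per rule. As a rough check, the sketch below recomputes them using the standard textbook definitions of confidence, lift, leverage and conviction. The function `rule_metrics` is a hypothetical helper written for this example, not an mlxtend API.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Compute rule metrics for A -> C from the supports of A, C and A union C."""
    confidence = support_ac / support_a            # P(C | A)
    lift = confidence / support_c                  # > 1 means A and C co-occur more than chance
    leverage = support_ac - support_a * support_c  # observed minus expected co-occurrence
    # Conviction is infinite when the rule never fails (confidence of exactly 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Support values taken from the rules table above.
milk_bread = rule_metrics(support_a=0.25, support_c=0.65, support_ac=0.2)
maggi_tea = rule_metrics(support_a=0.25, support_c=0.35, support_ac=0.2)

for name, metrics in [("MILK -> BREAD", milk_bread), ("MAGGI -> TEA", maggi_tea)]:
    print(name, {key: round(value, 6) for key, value in metrics.items()})
```

Rounded to six decimals, the results agree with the table. Both rules have the same support and confidence, but {MAGGI} -> {TEA} has the higher lift (about 2.29 versus 1.23), because TEA is less common overall than BREAD.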


## The association rules that satisfy the thresholds are (s = support, c = confidence):
### {MILK} -> {BREAD} [s=0.2, c=0.8]
### {MAGGI} -> {TEA} [s=0.2, c=0.8]

