# Apriori


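The Apriori algorithm mines frequent itemsets and derives association rules from a transactional database. It is controlled by two parameters, “support” and “confidence”. The support of an itemset is the fraction of transactions that contain it. The confidence of a rule A → B is the conditional probability that a transaction containing A also contains B, estimated as support(A ∪ B) / support(A).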
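
As a small illustration of these two measures, the sketch below computes them on a handful of made-up transactions. The data and the helper names `support` and `confidence` are assumptions for illustration only; they are not part of the basket dataset or of mlxtend.

```python
# Toy transactions (hypothetical data, not taken from the basket dataset).
transactions = [
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg"},
    {"wine", "confectionery"},
    {"beer", "frozenmeal"},
    {"fish", "fruitveg"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


print(support({"beer"}, transactions))                    # 3 of 5 transactions
print(confidence({"beer"}, {"cannedveg"}, transactions))  # 2 of the 3 beer transactions
```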

A key concept in the Apriori algorithm is the anti-monotonicity of the support measure: adding items to an itemset can never increase its support. It follows that

1. All subsets of a frequent itemset must be frequent
2. Similarly, for any infrequent itemset, all its supersets must be infrequent too
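
The property can be checked by brute force on a small example. The following sketch uses made-up transactions and a hypothetical helper `count_violations`; it counts the cases where an itemset has higher support than one of its subsets, which should never occur.

```python
from itertools import combinations

# Toy transactions (hypothetical data for illustration).
transactions = [
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg"},
    {"wine", "confectionery"},
    {"beer", "frozenmeal"},
    {"fish", "fruitveg"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def count_violations(transactions, max_size=3):
    """Count itemsets whose support exceeds that of one of their subsets."""
    items = sorted(set().union(*transactions))
    violations = 0
    for size in range(2, max_size + 1):
        for itemset in combinations(items, size):
            # Compare against every subset obtained by removing one item.
            for subset in combinations(itemset, size - 1):
                if support(itemset, transactions) > support(subset, transactions):
                    violations += 1
    return violations


print(count_violations(transactions))  # prints 0
```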


###  Algorithm
The following are the main steps of the algorithm:

1. Calculate the support of each itemset of size k = 1 in the transactional database (note that support is the frequency of 
   occurrence of an itemset). This is called generating the candidate set.
2. Prune the candidate set by eliminating itemsets whose support is below the given threshold.
3. Join the frequent itemsets to form candidate sets of size k + 1, and repeat the steps above until no more frequent itemsets 
   can be formed. This happens when every newly formed set has a support below the given threshold.
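
The steps above can be sketched in plain Python as follows. This is a minimal, unoptimized illustration under stated assumptions (transactions given as Python sets, a hypothetical function name `apriori_sketch`); it is not how mlxtend implements `apriori`.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Steps 1 and 2: candidate 1-itemsets, pruned by the support threshold.
    items = sorted(set().union(*transactions))
    current = {frozenset([item]) for item in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        # Step 3: join frequent k-itemsets into candidates of size k + 1.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Anti-monotonicity: drop candidates that have an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent


# Toy transactions (hypothetical data for illustration).
transactions = [
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg"},
    {"wine", "confectionery"},
    {"beer", "frozenmeal"},
    {"fish", "fruitveg"},
]
result = apriori_sketch(transactions, min_support=0.4)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

In this toy run the triple {beer, cannedveg, frozenmeal} is never counted against the database, because its subset {cannedveg, frozenmeal} is already infrequent.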

### Libraries used for Apriori

In [1]:
import pandas as pd
from sklearn import preprocessing
from mlxtend.frequent_patterns import apriori
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from mlxtend.frequent_patterns import association_rules

### Install library for apriori algorithm using:
!pip install mlxtend

In [2]:
import warnings
warnings.filterwarnings('ignore')
!pip3 install mlxtend



### Load the "basket" data

In [3]:
# Load dataset and display first five rows.
df1 = pd.read_csv('BASKETS1n')
df1.head()


Unnamed: 0,cardid,value,pmethod,sex,homeown,income,age,fruitveg,freshmeat,dairy,cannedveg,cannedmeat,frozenmeal,beer,wine,softdrink,fish,confectionery
0,39808,42.7123,CHEQUE,M,NO,27000,46,F,T,T,F,F,F,F,F,F,F,T
1,67362,25.3567,CASH,F,NO,30000,28,F,T,F,F,F,F,F,F,F,F,T
2,10872,20.6176,CASH,M,NO,13200,36,F,F,F,T,F,T,T,F,F,T,F
3,26748,23.6883,CARD,F,NO,12200,26,F,F,T,F,F,F,F,T,F,F,F
4,91609,18.8133,CARD,M,YES,11000,24,F,F,F,F,F,F,F,F,F,F,F


### Perform pre-processing (if required)

In [4]:
# select only the product columns (fruitveg onwards); the T/F values are replaced in the next cell
df = df1.iloc[:,7:]
df

Unnamed: 0,fruitveg,freshmeat,dairy,cannedveg,cannedmeat,frozenmeal,beer,wine,softdrink,fish,confectionery
0,F,T,T,F,F,F,F,F,F,F,T
1,F,T,F,F,F,F,F,F,F,F,T
2,F,F,F,T,F,T,T,F,F,T,F
3,F,F,T,F,F,F,F,T,F,F,F
4,F,F,F,F,F,F,F,F,F,F,F
...,...,...,...,...,...,...,...,...,...,...,...
995,F,F,F,T,F,F,F,F,F,F,F
996,F,F,F,T,F,F,F,F,F,T,F
997,F,T,F,F,F,F,F,F,F,F,F
998,T,F,F,F,F,F,F,T,F,F,T


In [5]:
df=df.replace('T',1)
df= df.replace('F',0)
print(df)

     fruitveg  freshmeat  dairy  cannedveg  cannedmeat  frozenmeal  beer  \
0           0          1      1          0           0           0     0   
1           0          1      0          0           0           0     0   
2           0          0      0          1           0           1     1   
3           0          0      1          0           0           0     0   
4           0          0      0          0           0           0     0   
..        ...        ...    ...        ...         ...         ...   ...   
995         0          0      0          1           0           0     0   
996         0          0      0          1           0           0     0   
997         0          1      0          0           0           0     0   
998         1          0      0          0           0           0     0   
999         0          0      1          0           0           0     0   

     wine  softdrink  fish  confectionery  
0       0          0     0              1  

### Q1. Find frequent itemsets in the dataset using Apriori

In [6]:
# apriori with min support 0.1 (confidence is only applied later, when rules are generated)
col = ['fruitveg','freshmeat','dairy','cannedveg','cannedmeat','frozenmeal','beer','wine','softdrink','fish','confectionery']
frequent_items = apriori(df[col],min_support=0.1,use_colnames=True)
print(frequent_items)

    support                       itemsets
0     0.299                     (fruitveg)
1     0.183                    (freshmeat)
2     0.177                        (dairy)
3     0.303                    (cannedveg)
4     0.204                   (cannedmeat)
5     0.302                   (frozenmeal)
6     0.293                         (beer)
7     0.287                         (wine)
8     0.184                    (softdrink)
9     0.292                         (fish)
10    0.276                (confectionery)
11    0.145               (fruitveg, fish)
12    0.173        (frozenmeal, cannedveg)
13    0.167              (cannedveg, beer)
14    0.170             (frozenmeal, beer)
15    0.144          (wine, confectionery)
16    0.146  (frozenmeal, cannedveg, beer)


### Q2. Find the association rules in the dataset having min confidence 10%

In [7]:
# find rules
rules = association_rules(frequent_items,min_threshold = 0.1) 
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(fruitveg),(fish),0.299,0.292,0.145,0.48495,1.660787,0.057692,1.374623
1,(fish),(fruitveg),0.292,0.299,0.145,0.496575,1.660787,0.057692,1.392463
2,(frozenmeal),(cannedveg),0.302,0.303,0.173,0.572848,1.890586,0.081494,1.631736
3,(cannedveg),(frozenmeal),0.303,0.302,0.173,0.570957,1.890586,0.081494,1.626877
4,(cannedveg),(beer),0.303,0.293,0.167,0.551155,1.881075,0.078221,1.575154
5,(beer),(cannedveg),0.293,0.303,0.167,0.569966,1.881075,0.078221,1.620802
6,(frozenmeal),(beer),0.302,0.293,0.17,0.562914,1.921208,0.081514,1.61753
7,(beer),(frozenmeal),0.293,0.302,0.17,0.580205,1.921208,0.081514,1.662715
8,(wine),(confectionery),0.287,0.276,0.144,0.501742,1.817906,0.064788,1.453063
9,(confectionery),(wine),0.276,0.287,0.144,0.521739,1.817906,0.064788,1.490818
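
The metric columns in this table can be reproduced from the three support values using the standard definitions of confidence, lift, leverage and conviction. The sketch below is illustrative; the helper name `rule_metrics` is an assumption, and the inputs are copied from row 0 of the table above.

```python
def rule_metrics(antecedent_support, consequent_support, joint_support):
    """Compute rule metrics for A -> C from the supports of A, C and A union C."""
    confidence = joint_support / antecedent_support
    lift = confidence / consequent_support
    leverage = joint_support - antecedent_support * consequent_support
    # Conviction is infinite for rules that never fail (confidence of 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - consequent_support) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Row 0 of the table: (fruitveg) -> (fish)
metrics = rule_metrics(antecedent_support=0.299,
                       consequent_support=0.292,
                       joint_support=0.145)
for name, value in metrics.items():
    print(name, round(value, 6))
```

A lift above 1 means the antecedent and consequent occur together more often than they would if they were independent.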


### Q3. Find association rules having minimum antecedent_len 2 & confidence greater than 0.75

In [8]:
# rules having at least 2 items in the antecedent and confidence greater than 0.75
lst = []
for i in range(len(rules)):
    # check the length of the i-th antecedent, not the length of the whole column
    if len(rules["antecedents"][i]) >= 2 and rules["confidence"][i] > 0.75:
        lst.append(i)
rules.iloc[lst]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
10,"(frozenmeal, cannedveg)",(beer),0.173,0.293,0.146,0.843931,2.880309,0.095311,4.530037
11,"(frozenmeal, beer)",(cannedveg),0.17,0.303,0.146,0.858824,2.834401,0.09449,4.937083
12,"(cannedveg, beer)",(frozenmeal),0.167,0.302,0.146,0.874251,2.894873,0.095566,5.550762


### Load the "zoo" data

In [9]:
zoo=pd.read_csv("zoo.csv",header=None,names=['animal name','hair','feathers','eggs','milk','airborne','aquatic','predator','toothed','backbone','breathes','venomous','fins','legs','tail','domestic','catsize','type'])
zoo

Unnamed: 0,animal name,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,venomous,fins,legs,tail,domestic,catsize,type
0,aardvark,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
1,antelope,1,0,0,1,0,0,0,1,1,1,0,0,4,1,0,1,1
2,bass,0,0,1,0,0,1,1,1,1,0,0,1,0,1,0,0,4
3,bear,1,0,0,1,0,0,1,1,1,1,0,0,4,0,0,1,1
4,boar,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,wallaby,1,0,0,1,0,0,0,1,1,1,0,0,2,1,0,1,1
97,wasp,1,0,1,0,1,0,0,0,0,1,1,0,6,0,0,0,6
98,wolf,1,0,0,1,0,0,1,1,1,1,0,0,4,1,0,1,1
99,worm,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,7


### Q4. Perform pre-processing (if required)

In [10]:
zoo=zoo[zoo.columns[1:]]

zoo = pd.concat([zoo,pd.get_dummies(zoo['legs'], prefix='LEGS')],axis=1)
zoo.drop(['legs'],axis=1, inplace=True)
zoo['type'] = zoo['type'].replace({1: 'Mammal', 2: 'Bird', 3:'Reptile',4:'Fish',5:'Amphibia',6:'Bug',7:'Invertebrate'})

zoo = pd.concat([zoo,pd.get_dummies(zoo['type'], prefix='CLASS')],axis=1)
zoo.drop(['type'],axis=1, inplace=True)
zoo

Unnamed: 0,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,...,LEGS_5,LEGS_6,LEGS_8,CLASS_Amphibia,CLASS_Bird,CLASS_Bug,CLASS_Fish,CLASS_Invertebrate,CLASS_Mammal,CLASS_Reptile
0,1,0,0,1,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,1,0
1,1,0,0,1,0,0,0,1,1,1,...,0,0,0,0,0,0,0,0,1,0
2,0,0,1,0,0,1,1,1,1,0,...,0,0,0,0,0,0,1,0,0,0
3,1,0,0,1,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,1,0
4,1,0,0,1,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,1,0,0,1,0,0,0,1,1,1,...,0,0,0,0,0,0,0,0,1,0
97,1,0,1,0,1,0,0,0,0,1,...,0,1,0,0,0,1,0,0,0,0
98,1,0,0,1,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,1,0
99,0,0,1,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,1,0,0


### Q5. Find frequent itemsets in zoo dataset having min support 0.5 

In [126]:
# apriori with min support 0.5 (confidence is only applied later, when rules are generated)
frequent_items = apriori(zoo,min_support=0.5,use_colnames=True)
frequent_items

Unnamed: 0,support,itemsets
0,0.584158,(eggs)
1,0.554455,(predator)
2,0.60396,(toothed)
3,0.821782,(backbone)
4,0.792079,(breathes)
5,0.742574,(tail)
6,0.60396,"(backbone, toothed)"
7,0.514851,"(tail, toothed)"
8,0.683168,"(backbone, breathes)"
9,0.732673,"(backbone, tail)"


### Q6. Find association rules having min confidence 0.5

In [127]:
# Find and display rules
rules = association_rules(frequent_items,min_threshold = 0.5) 
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(backbone),(toothed),0.821782,0.60396,0.60396,0.73494,1.216867,0.107637,1.494149
1,(toothed),(backbone),0.60396,0.821782,0.60396,1.0,1.216867,0.107637,inf
2,(tail),(toothed),0.742574,0.60396,0.514851,0.693333,1.147978,0.066366,1.291433
3,(toothed),(tail),0.60396,0.742574,0.514851,0.852459,1.147978,0.066366,1.744774
4,(backbone),(breathes),0.821782,0.792079,0.683168,0.831325,1.049548,0.032252,1.232673
5,(breathes),(backbone),0.792079,0.821782,0.683168,0.8625,1.049548,0.032252,1.29613
6,(backbone),(tail),0.821782,0.742574,0.732673,0.891566,1.200643,0.122439,2.374037
7,(tail),(backbone),0.742574,0.821782,0.732673,0.986667,1.200643,0.122439,13.366337
8,(tail),(breathes),0.742574,0.792079,0.60396,0.813333,1.026833,0.015783,1.113861
9,(breathes),(tail),0.792079,0.742574,0.60396,0.7625,1.026833,0.015783,1.083898


### Q7. Convert the dataset into two classes "Mammal" and "others"

In [130]:
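# Keep CLASS_Mammal as the class column and drop the other class columns.
# Note: CLASS_Amphibia is not in this list, so it remains in the frame as an ordinary column.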
zoo.drop(['CLASS_Bird','CLASS_Bug','CLASS_Fish','CLASS_Invertebrate','CLASS_Reptile'],axis=1,inplace=True)
zoo.head()

Unnamed: 0,hair,feathers,eggs,milk,airborne,aquatic,predator,toothed,backbone,breathes,...,domestic,catsize,LEGS_0,LEGS_2,LEGS_4,LEGS_5,LEGS_6,LEGS_8,CLASS_Amphibia,CLASS_Mammal
0,1,0,0,1,0,0,1,1,1,1,...,0,1,0,0,1,0,0,0,0,1
1,1,0,0,1,0,0,0,1,1,1,...,0,1,0,0,1,0,0,0,0,1
2,0,0,1,0,0,1,1,1,1,0,...,0,0,1,0,0,0,0,0,0,0
3,1,0,0,1,0,0,1,1,1,1,...,0,1,0,0,1,0,0,0,0,1
4,1,0,0,1,0,0,1,1,1,1,...,0,1,0,0,1,0,0,0,0,1


### Q8. Partition the dataset into training and testing part (70:30)

In [131]:
#partition the data
train,test = train_test_split(zoo,test_size = 0.30 ,random_state = 42)

### Q9. Generate association rules for "mammal" class (training data) with min support 0.4 and confidence as 1

In [132]:
# frequent itemsets 
apri = apriori(train,min_support= 0.4,use_colnames=True)
apri

Unnamed: 0,support,itemsets
0,0.642857,(eggs)
1,0.557143,(predator)
2,0.585714,(toothed)
3,0.857143,(backbone)
4,0.771429,(breathes)
5,0.785714,(tail)
6,0.442857,(catsize)
7,0.5,"(backbone, eggs)"
8,0.414286,"(eggs, breathes)"
9,0.485714,"(eggs, tail)"


In [136]:
# find frequent rules
assoc = association_rules(apri,metric='confidence',min_threshold=1)
assoc

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(toothed),(backbone),0.585714,0.857143,0.585714,1.0,1.166667,0.083673,inf
1,(tail),(backbone),0.785714,0.857143,0.785714,1.0,1.166667,0.112245,inf
2,"(eggs, tail)",(backbone),0.485714,0.857143,0.485714,1.0,1.166667,0.069388,inf
3,"(tail, predator)",(backbone),0.428571,0.857143,0.428571,1.0,1.166667,0.061224,inf
4,"(toothed, breathes)",(backbone),0.428571,0.857143,0.428571,1.0,1.166667,0.061224,inf
5,"(tail, toothed)",(backbone),0.514286,0.857143,0.514286,1.0,1.166667,0.073469,inf
6,"(tail, breathes)",(backbone),0.628571,0.857143,0.628571,1.0,1.166667,0.089796,inf


In [140]:
# selecting rules having consequents as class mammal
# (the filtered frame has to be assigned; otherwise the result is discarded
# and the unfiltered rules are displayed again)
mammal_rules = assoc[assoc['consequents'] == {'CLASS_Mammal'}]
mammal_rules


### Q10. Test the rules generated on testing dataset and find precision and recall for the rule based classifier

In [141]:
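#applying rules on test data
# use the rules mined from the training data whose consequent is the mammal class
mammal_rules = assoc[assoc['consequents'] == {'CLASS_Mammal'}]
test = test.copy()  # work on a copy so the new column is not written to a view of zoo
test['predicted'] = 0
for index, row in test.iterrows():
    for rule in mammal_rules['antecedents']:
        # a rule fires when every item in its antecedent is present in the row
        if all(row[col] == 1 for col in rule):
            test.at[index,'predicted'] = 1
            break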

In [None]:
# evaluation measures
from sklearn.metrics import classification_report,confusion_matrix
from sklearn

In [None]:
# print classification report
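
Precision and recall for a binary rule-based classifier can be computed directly from the true and predicted labels. The sketch below is a plain-Python illustration with made-up labels; the helper name `precision_recall` is an assumption. In this notebook the true labels would presumably come from `test['CLASS_Mammal']` and the predictions from `test['predicted']`, and scikit-learn's `classification_report` (imported above) reports the same quantities.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = mammal, 0 = other.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```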

### Q12. Which of the two classifiers performs better?

In [None]:
# Name of the classifier with accuracy value.