<h1> Part B</h1>

In [1]:
import math
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Association Rule Mining

Association rule mining is a rule-based machine learning method for discovering interesting relations between variables in large databases. The cleaned and pre-processed plants dataset (dataset_cleaned.csv) is first mined for itemsets, and rules are then generated from those itemsets.

The different types of itemsets are:
1. <b>Frequent Itemset Mining</b>
    <br>a. Apriori
    <br>b. FP Growth
2. <b>Closed Frequent Itemsets</b>
    <br>a. A-Close (a modified version of Apriori)
    <br>b. ECLAT-Close
3. <b>Maximal Frequent Itemsets</b>
    <br>a. FP-Max
    <br>b. A-Max
4. <b>Longest Frequent Itemsets</b>
<br><br>

## Various Metrics to evaluate Association Rules

The various metrics for evaluating association rules and setting selection thresholds are listed below.<br>
* A rule can be divided into two parts - antecedent and consequent.<br>
> Given a rule <b>"A -> C"</b>,  <b>A</b> stands for <b>antecedent</b> and <b>C</b> stands for <b>consequent</b>.<br>
* The antecedent refers to the condition, clause or item, and the consequent refers to what it implies or its consequence. 
* The support of the antecedent and of the consequent is also calculated.<br><br>

- **Support**:
 > support(A→C) = support(A∪C), range: [0,1]
    
* The support metric is defined for itemsets, not association rules. Typically, support is used to measure the abundance or frequency (often interpreted as significance or importance) of an itemset in a database.
    
* The table produced by the association rule mining algorithm contains three different support metrics: **antecedent support**, **consequent support**, and **support**. 
* **Antecedent support** and **consequent support** are the supports of A and C on their own, while the **support** metric is the support of the combined itemset A ∪ C.<br><br>
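As a minimal sketch of the definition above, the snippet below computes the three support values of a rule on a handful of toy transactions. The transactions and the `support` helper are illustrative assumptions; they are not taken from the plants dataset or from mlxtend.

```python
# Toy transactions (hypothetical data, not the plants dataset).
transactions = [
    {"ABAB3", "Asteraceae"},
    {"ABAB3", "Poaceae"},
    {"rubber rabbitbrush", "Asteraceae"},
    {"rubber rabbitbrush", "Asteraceae"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


antecedent = {"rubber rabbitbrush"}
consequent = {"Asteraceae"}

antecedent_support = support(antecedent, transactions)         # 2 of 4 transactions
consequent_support = support(consequent, transactions)         # 3 of 4 transactions
rule_support = support(antecedent | consequent, transactions)  # 2 of 4 transactions

print(antecedent_support, consequent_support, rule_support)
```

The rule support can never exceed either of the two individual supports, because every transaction containing A ∪ C also contains A and contains C.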

- **Confidence**:
> confidence(A→C) = support(A→C) / support(A), range: [0,1]
    
* It measures how often the items in C appear in transactions that also contain the items in A.
* Note that the metric is not symmetric, i.e. it is directed; for instance, the confidence for A->C generally differs from the confidence for C->A.<br><br>

- **Lift**:
> lift(A→C) = confidence(A→C) / support(C), range: [0,∞]
    
* A lift value near 1 indicates that A and C appear together about as often as expected if they were independent.
* Greater than 1 means they appear together more often than expected.
* Less than 1 means they appear together less often than expected.
* Greater lift values indicate stronger association.<br><br>

- **Leverage**:
> leverage(A→C) = support(A→C) − support(A) × support(C), range: [−1,1]
    
* Leverage computes the difference between the observed frequency of A and C appearing together and the frequency that would be expected if A and C were independent. 
* A leverage value of 0 indicates independence.<br><br>

- **Conviction**:
> conviction(A→C) = (1 − support(C)) / (1 − confidence(A→C)), range: [0,∞]
    
* A high conviction value means that the consequent is highly dependent on the antecedent. 
* For instance, with a perfect confidence score of 1, the denominator becomes 0 (1 - 1), in which case the conviction score is defined as *inf*. 
* Similar to lift, if items are independent, the conviction is 1.
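To make the formulas concrete, the sketch below recomputes confidence, lift, leverage and conviction from the three support values of a rule. The `rule_metrics` helper is a hypothetical name introduced here for illustration. The input numbers are the rounded supports reported for the rule (Salicaceae) -> (rubber rabbitbrush) in the result tables of this notebook, so the outputs agree with the reported metrics only up to rounding.

```python
import math


def rule_metrics(support_a, support_c, support_ac):
    """Derive the rule metrics from support(A), support(C) and support(A∪C)."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    leverage = support_ac - support_a * support_c
    # With perfect confidence the denominator is 0, so conviction is inf.
    if confidence >= 1.0:
        conviction = math.inf
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# (Salicaceae) -> (rubber rabbitbrush): antecedent support, consequent support, support.
metrics = rule_metrics(0.030513, 0.643884, 0.026302)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")

# (MIGU) -> (Scrophulariaceae) has confidence 1, so its conviction is inf.
print(rule_metrics(0.004535, 0.033687, 0.004535)["conviction"])
```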

In [2]:
data = pd.read_csv('dataset_cleaned.csv')
data.head()

Unnamed: 0,Symbol,Synonym Symbol,Scientific Name with Author,National Common Name,Family
0,ABAR,ABAB3,Abronia argillosa S.L. Welsh & Goodrich,clay sand verbena,Nyctaginaceae
1,ABCA,ABAB3,Abronia carletonii J.M. Coult. & Fisher,Carleton's sand verbena,Nyctaginaceae
2,ABCO,ABAB3,Abies concolor (Gord. & Glend.) Lindl. ex Hild...,white fir,Pinaceae
3,ABCO,ABCOC,Abies concolor (Gord. & Glend.) Lindl. ex Hild...,rubber rabbitbrush,Pinaceae
4,ABEL,ABAB3,Abronia elliptica A. Nelson,fragrant white sand verbena,Nyctaginaceae


One-hot encoding of the cleaned plants dataset (dataset_cleaned.csv)

In [3]:
from mlxtend.preprocessing import TransactionEncoder

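# Treat every row as one transaction: the list of its cell values.
data_list = data.values.tolist()
# Learn the distinct items, then encode each transaction as a boolean row.
One_Hot_Encoder = TransactionEncoder()
One_Hot_Encoder_Values = One_Hot_Encoder.fit(data_list).transform(data_list)
data_enc = pd.DataFrame(One_Hot_Encoder_Values, columns=One_Hot_Encoder.columns_)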
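The encoding step can be pictured with a small pure-Python sketch: each distinct item becomes one boolean column, and each transaction becomes one row that marks which items it contains. The `one_hot` helper and the two shortened rows are assumptions made for illustration; this is not mlxtend's implementation.

```python
# Two shortened, hypothetical transactions (symbol, synonym symbol, family).
rows = [
    ["ABAR", "ABAB3", "Nyctaginaceae"],
    ["ABCO", "ABAB3", "Pinaceae"],
]


def one_hot(rows):
    """Return (columns, matrix) with one boolean column per distinct item."""
    # Sort the distinct items so the column order is reproducible.
    columns = sorted({item for row in rows for item in row})
    matrix = []
    for row in rows:
        present = set(row)
        matrix.append([col in present for col in columns])
    return columns, matrix


columns, matrix = one_hot(rows)
print(columns)
for encoded_row in matrix:
    print(encoded_row)
```

The column means of such a boolean matrix are exactly the supports of the single items, which is why the mining functions can work directly on the encoded frame.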

### Frequent Itemset Mining

Frequent patterns are patterns that appear frequently within a dataset. A frequent itemset is a set of items whose support reaches a chosen minimum threshold, which is why frequent pattern mining is often referred to as frequent itemset mining.
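The idea can be sketched as a level-wise search: count the single items, keep the frequent ones, join them into larger candidates, and repeat. The code below is a minimal illustration under that assumption, with a hypothetical `frequent_itemsets` helper and toy transactions; it is not the implementation used by mlxtend or pyfim.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    # Level 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose relative support reaches the threshold.
        level = {}
        for cand in candidates:
            supp = sum(1 for t in transactions if cand <= t) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Toy transactions (hypothetical data).
toy = [
    {"ABAB3", "Asteraceae", "rubber rabbitbrush"},
    {"ABAB3", "Asteraceae"},
    {"ABAB3", "rubber rabbitbrush"},
    {"Asteraceae"},
]
result = frequent_itemsets(toy, min_support=0.5)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent, so a candidate with an infrequent subset never needs to be counted.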

#### 1. Apriori

In [4]:
#pip install mlxtend
from mlxtend.frequent_patterns import apriori, association_rules
freq_itemsets = apriori(data_enc,min_support=0.004,use_colnames=True)
freq_itemsets

Unnamed: 0,support,itemsets
0,0.357670,(ABAB3)
1,0.013669,(Apiaceae)
2,0.176730,(Asteraceae)
3,0.017815,(Boraginaceae)
4,0.052086,(Brassicaceae)
...,...,...
85,0.026302,"(rubber rabbitbrush, Salicaceae)"
86,0.005442,"(rubber rabbitbrush, Saxifragaceae)"
87,0.019176,"(rubber rabbitbrush, Scrophulariaceae)"
88,0.004146,"(Solanaceae, rubber rabbitbrush)"


<h4>Rules based on Apriori:</h4>

In [5]:
association_rules(freq_itemsets, metric="support", min_threshold=0.03)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(ABAB3),(Asteraceae),0.35767,0.17673,0.059925,0.167542,0.948013,-0.003286,0.988963
1,(Asteraceae),(ABAB3),0.17673,0.35767,0.059925,0.339076,0.948013,-0.003286,0.971866
2,(ABAB3),(Poaceae),0.35767,0.118295,0.037186,0.103967,0.878877,-0.005125,0.984009
3,(Poaceae),(ABAB3),0.118295,0.35767,0.037186,0.314348,0.878877,-0.005125,0.936816
4,(rubber rabbitbrush),(Asteraceae),0.643884,0.17673,0.117582,0.182614,1.033295,0.003789,1.007199
5,(Asteraceae),(rubber rabbitbrush),0.17673,0.643884,0.117582,0.665323,1.033295,0.003789,1.064056
6,(Brassicaceae),(rubber rabbitbrush),0.052086,0.643884,0.031809,0.610697,0.948457,-0.001729,0.914751
7,(rubber rabbitbrush),(Brassicaceae),0.643884,0.052086,0.031809,0.049401,0.948457,-0.001729,0.997176
8,(Fabaceae),(rubber rabbitbrush),0.062646,0.643884,0.037704,0.601861,0.934735,-0.002633,0.894451
9,(rubber rabbitbrush),(Fabaceae),0.643884,0.062646,0.037704,0.058557,0.934735,-0.002633,0.995657


In [6]:
association_rules(freq_itemsets, metric="confidence", min_threshold=0.7)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Cactaceae),(rubber rabbitbrush),0.01192,0.643884,0.009005,0.755435,1.173246,0.00133,1.456117
1,(Gentianaceae),(rubber rabbitbrush),0.007904,0.643884,0.005701,0.721311,1.12025,0.000612,1.277826
2,(MIGU),(Scrophulariaceae),0.004535,0.033687,0.004535,1.0,29.684615,0.004382,inf
3,(MIGU),(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
4,(Nyctaginaceae),(rubber rabbitbrush),0.00758,0.643884,0.005507,0.726496,1.128301,0.000626,1.302048
5,(Onagraceae),(rubber rabbitbrush),0.022609,0.643884,0.016649,0.73639,1.143667,0.002091,1.350917
6,(Potamogetonaceae),(rubber rabbitbrush),0.005701,0.643884,0.004017,0.704545,1.094211,0.000346,1.205314
7,(Salicaceae),(rubber rabbitbrush),0.030513,0.643884,0.026302,0.861996,1.338743,0.006655,2.580468
8,"(rubber rabbitbrush, MIGU)",(Scrophulariaceae),0.00447,0.033687,0.00447,1.0,29.684615,0.004319,inf
9,"(Scrophulariaceae, MIGU)",(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809


In [7]:
association_rules(freq_itemsets, metric="lift", min_threshold=1.5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(ABAB3),(Apiaceae),0.35767,0.013669,0.00758,0.021192,1.550317,0.002691,1.007685
1,(Apiaceae),(ABAB3),0.013669,0.35767,0.00758,0.554502,1.550317,0.002691,1.441825
2,(Scrophulariaceae),(MIGU),0.033687,0.004535,0.004535,0.134615,29.684615,0.004382,1.150315
3,(MIGU),(Scrophulariaceae),0.004535,0.033687,0.004535,1.0,29.684615,0.004382,inf
4,(rubber rabbitbrush),(MIGU),0.643884,0.004535,0.00447,0.006942,1.530887,0.00155,1.002424
5,(MIGU),(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
6,"(rubber rabbitbrush, Scrophulariaceae)",(MIGU),0.019176,0.004535,0.00447,0.233108,51.403668,0.004383,1.298051
7,"(rubber rabbitbrush, MIGU)",(Scrophulariaceae),0.00447,0.033687,0.00447,1.0,29.684615,0.004319,inf
8,"(Scrophulariaceae, MIGU)",(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
9,(rubber rabbitbrush),"(Scrophulariaceae, MIGU)",0.643884,0.004535,0.00447,0.006942,1.530887,0.00155,1.002424


In [8]:
association_rules(freq_itemsets, metric="leverage", min_threshold=0.003)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(rubber rabbitbrush),(Asteraceae),0.643884,0.17673,0.117582,0.182614,1.033295,0.003789,1.007199
1,(Asteraceae),(rubber rabbitbrush),0.17673,0.643884,0.117582,0.665323,1.033295,0.003789,1.064056
2,(Scrophulariaceae),(MIGU),0.033687,0.004535,0.004535,0.134615,29.684615,0.004382,1.150315
3,(MIGU),(Scrophulariaceae),0.004535,0.033687,0.004535,1.0,29.684615,0.004382,inf
4,(rubber rabbitbrush),(Poaceae),0.643884,0.118295,0.081368,0.126371,1.06827,0.0052,1.009244
5,(Poaceae),(rubber rabbitbrush),0.118295,0.643884,0.081368,0.687842,1.06827,0.0052,1.140819
6,(rubber rabbitbrush),(Salicaceae),0.643884,0.030513,0.026302,0.040849,1.338743,0.006655,1.010776
7,(Salicaceae),(rubber rabbitbrush),0.030513,0.643884,0.026302,0.861996,1.338743,0.006655,2.580468
8,"(rubber rabbitbrush, Scrophulariaceae)",(MIGU),0.019176,0.004535,0.00447,0.233108,51.403668,0.004383,1.298051
9,"(rubber rabbitbrush, MIGU)",(Scrophulariaceae),0.00447,0.033687,0.00447,1.0,29.684615,0.004319,inf


In [9]:
association_rules(freq_itemsets, metric="conviction", min_threshold=1.5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(MIGU),(Scrophulariaceae),0.004535,0.033687,0.004535,1.0,29.684615,0.004382,inf
1,(MIGU),(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
2,(Salicaceae),(rubber rabbitbrush),0.030513,0.643884,0.026302,0.861996,1.338743,0.006655,2.580468
3,"(rubber rabbitbrush, MIGU)",(Scrophulariaceae),0.00447,0.033687,0.00447,1.0,29.684615,0.004319,inf
4,"(Scrophulariaceae, MIGU)",(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
5,(MIGU),"(rubber rabbitbrush, Scrophulariaceae)",0.004535,0.019176,0.00447,0.985714,51.403668,0.004383,68.657683


#### 2. FP Growth

In [10]:
from mlxtend.frequent_patterns import fpgrowth
freq_itemsets = fpgrowth(data_enc,min_support=0.004,use_colnames=True)
freq_itemsets

Unnamed: 0,support,itemsets
0,0.357670,(ABAB3)
1,0.007580,(Nyctaginaceae)
2,0.643884,(rubber rabbitbrush)
3,0.005895,(Malvaceae)
4,0.118295,(Poaceae)
...,...,...
85,0.004470,"(rubber rabbitbrush, MIGU)"
86,0.004470,"(rubber rabbitbrush, Scrophulariaceae, MIGU)"
87,0.004211,"(ABAB3, Salicaceae)"
88,0.026302,"(rubber rabbitbrush, Salicaceae)"


In [11]:
association_rules(freq_itemsets, metric="support", min_threshold=0.03)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(ABAB3),(Poaceae),0.35767,0.118295,0.037186,0.103967,0.878877,-0.005125,0.984009
1,(Poaceae),(ABAB3),0.118295,0.35767,0.037186,0.314348,0.878877,-0.005125,0.936816
2,(rubber rabbitbrush),(Poaceae),0.643884,0.118295,0.081368,0.126371,1.06827,0.0052,1.009244
3,(Poaceae),(rubber rabbitbrush),0.118295,0.643884,0.081368,0.687842,1.06827,0.0052,1.140819
4,(ABAB3),(Asteraceae),0.35767,0.17673,0.059925,0.167542,0.948013,-0.003286,0.988963
5,(Asteraceae),(ABAB3),0.17673,0.35767,0.059925,0.339076,0.948013,-0.003286,0.971866
6,(rubber rabbitbrush),(Asteraceae),0.643884,0.17673,0.117582,0.182614,1.033295,0.003789,1.007199
7,(Asteraceae),(rubber rabbitbrush),0.17673,0.643884,0.117582,0.665323,1.033295,0.003789,1.064056
8,(Brassicaceae),(rubber rabbitbrush),0.052086,0.643884,0.031809,0.610697,0.948457,-0.001729,0.914751
9,(rubber rabbitbrush),(Brassicaceae),0.643884,0.052086,0.031809,0.049401,0.948457,-0.001729,0.997176


In [12]:
association_rules(freq_itemsets, metric="confidence", min_threshold=0.7)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Nyctaginaceae),(rubber rabbitbrush),0.00758,0.643884,0.005507,0.726496,1.128301,0.000626,1.302048
1,(Onagraceae),(rubber rabbitbrush),0.022609,0.643884,0.016649,0.73639,1.143667,0.002091,1.350917
2,(Gentianaceae),(rubber rabbitbrush),0.007904,0.643884,0.005701,0.721311,1.12025,0.000612,1.277826
3,(Cactaceae),(rubber rabbitbrush),0.01192,0.643884,0.009005,0.755435,1.173246,0.00133,1.456117
4,(MIGU),(Scrophulariaceae),0.004535,0.033687,0.004535,1.0,29.684615,0.004382,inf
5,(MIGU),(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
6,"(rubber rabbitbrush, MIGU)",(Scrophulariaceae),0.00447,0.033687,0.00447,1.0,29.684615,0.004319,inf
7,"(Scrophulariaceae, MIGU)",(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
8,(MIGU),"(rubber rabbitbrush, Scrophulariaceae)",0.004535,0.019176,0.00447,0.985714,51.403668,0.004383,68.657683
9,(Salicaceae),(rubber rabbitbrush),0.030513,0.643884,0.026302,0.861996,1.338743,0.006655,2.580468


In [13]:
association_rules(freq_itemsets, metric="lift", min_threshold=1.5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(ABAB3),(Apiaceae),0.35767,0.013669,0.00758,0.021192,1.550317,0.002691,1.007685
1,(Apiaceae),(ABAB3),0.013669,0.35767,0.00758,0.554502,1.550317,0.002691,1.441825
2,(Scrophulariaceae),(MIGU),0.033687,0.004535,0.004535,0.134615,29.684615,0.004382,1.150315
3,(MIGU),(Scrophulariaceae),0.004535,0.033687,0.004535,1.0,29.684615,0.004382,inf
4,(rubber rabbitbrush),(MIGU),0.643884,0.004535,0.00447,0.006942,1.530887,0.00155,1.002424
5,(MIGU),(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
6,"(rubber rabbitbrush, Scrophulariaceae)",(MIGU),0.019176,0.004535,0.00447,0.233108,51.403668,0.004383,1.298051
7,"(rubber rabbitbrush, MIGU)",(Scrophulariaceae),0.00447,0.033687,0.00447,1.0,29.684615,0.004319,inf
8,"(Scrophulariaceae, MIGU)",(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
9,(rubber rabbitbrush),"(Scrophulariaceae, MIGU)",0.643884,0.004535,0.00447,0.006942,1.530887,0.00155,1.002424


In [14]:
association_rules(freq_itemsets, metric="leverage", min_threshold=0.003)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(rubber rabbitbrush),(Poaceae),0.643884,0.118295,0.081368,0.126371,1.06827,0.0052,1.009244
1,(Poaceae),(rubber rabbitbrush),0.118295,0.643884,0.081368,0.687842,1.06827,0.0052,1.140819
2,(rubber rabbitbrush),(Asteraceae),0.643884,0.17673,0.117582,0.182614,1.033295,0.003789,1.007199
3,(Asteraceae),(rubber rabbitbrush),0.17673,0.643884,0.117582,0.665323,1.033295,0.003789,1.064056
4,(Scrophulariaceae),(MIGU),0.033687,0.004535,0.004535,0.134615,29.684615,0.004382,1.150315
5,(MIGU),(Scrophulariaceae),0.004535,0.033687,0.004535,1.0,29.684615,0.004382,inf
6,"(rubber rabbitbrush, Scrophulariaceae)",(MIGU),0.019176,0.004535,0.00447,0.233108,51.403668,0.004383,1.298051
7,"(rubber rabbitbrush, MIGU)",(Scrophulariaceae),0.00447,0.033687,0.00447,1.0,29.684615,0.004319,inf
8,(Scrophulariaceae),"(rubber rabbitbrush, MIGU)",0.033687,0.00447,0.00447,0.132692,29.684615,0.004319,1.147839
9,(MIGU),"(rubber rabbitbrush, Scrophulariaceae)",0.004535,0.019176,0.00447,0.985714,51.403668,0.004383,68.657683


In [15]:
association_rules(freq_itemsets, metric="conviction", min_threshold=1.5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(MIGU),(Scrophulariaceae),0.004535,0.033687,0.004535,1.0,29.684615,0.004382,inf
1,(MIGU),(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
2,"(rubber rabbitbrush, MIGU)",(Scrophulariaceae),0.00447,0.033687,0.00447,1.0,29.684615,0.004319,inf
3,"(Scrophulariaceae, MIGU)",(rubber rabbitbrush),0.004535,0.643884,0.00447,0.985714,1.530887,0.00155,24.92809
4,(MIGU),"(rubber rabbitbrush, Scrophulariaceae)",0.004535,0.019176,0.00447,0.985714,51.403668,0.004383,68.657683
5,(Salicaceae),(rubber rabbitbrush),0.030513,0.643884,0.026302,0.861996,1.338743,0.006655,2.580468


### Closed Frequent Itemset

An itemset is closed if none of its immediate supersets has the same support count as the itemset itself.
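A minimal sketch of the definition above: given the support counts of all frequent itemsets, keep only those that have no proper superset with an equal count. The `closed_itemsets` helper and the toy counts are assumptions for illustration. Checking all proper supersets gives the same result as checking only immediate ones, because support can only stay equal or drop as an itemset grows.

```python
def closed_itemsets(counts):
    """Keep itemsets that have no proper superset with the same support count."""
    return {
        itemset: count
        for itemset, count in counts.items()
        if not any(
            itemset < other and count == other_count
            for other, other_count in counts.items()
        )
    }


# Support counts of all frequent itemsets in a toy database (hypothetical).
counts = {
    frozenset({"ABAB3"}): 3,
    frozenset({"Asteraceae"}): 3,
    frozenset({"rubber rabbitbrush"}): 2,
    frozenset({"ABAB3", "Asteraceae"}): 2,
    frozenset({"ABAB3", "rubber rabbitbrush"}): 2,
}

closed = closed_itemsets(counts)
for itemset in sorted(closed, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), closed[itemset])
```

Here the single item rubber rabbitbrush is dropped: it occurs in 2 transactions, and so does its superset with ABAB3, so it carries no information beyond that superset.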

#### 1. A-Close

In [16]:
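#pip install pyfim
# Note: this import shadows the mlxtend apriori imported earlier.
from fim import apriori
# target='c' mines closed itemsets. In pyfim a positive supp is a percentage
# (0.02 means 0.02%), while report='s' returns the support as a fraction.
freq_itemsets = pd.DataFrame(apriori(data_list,target='c',supp=0.02,report='s'),columns = ['Itemset','Support'])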
freq_itemsets

Unnamed: 0,Itemset,Support
0,"(OECAN, Onagraceae)",0.000259
1,"(rosy pussytoes, Asteraceae, ABAB3)",0.000259
2,"(COSTV2, Orchidaceae)",0.000259
3,"(CHLA13, Onagraceae)",0.000259
4,"(AGAUA, Asteraceae)",0.000259
...,...,...
2532,"(Asteraceae, ABAB3, rubber rabbitbrush)",0.000777
2533,"(Asteraceae, rubber rabbitbrush)",0.117582
2534,"(ABAB3,)",0.357670
2535,"(ABAB3, rubber rabbitbrush)",0.001555


#### 2. ECLAT-Close

In [17]:
from fim import eclat
freq_itemsets = pd.DataFrame(eclat(data_list,target='c',supp=0.02,report='s'),columns = ['Itemset','Support'])
freq_itemsets

Unnamed: 0,Itemset,Support
0,"(rubber rabbitbrush,)",0.643884
1,"(ABAB3, rubber rabbitbrush)",0.001555
2,"(ABAB3,)",0.357670
3,"(Asteraceae, rubber rabbitbrush)",0.117582
4,"(Asteraceae, ABAB3, rubber rabbitbrush)",0.000777
...,...,...
2532,"(BRRA2, Poaceae)",0.000259
2533,"(COLA5, Asteraceae)",0.000259
2534,"(NULUP, Nymphaeaceae)",0.000259
2535,"(LALAL3, Fabaceae)",0.000259


### Maximal Frequent Itemset

An itemset is maximal frequent if it is frequent and none of its proper supersets is frequent.
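The definition above can be sketched as a filter over the set of frequent itemsets: an itemset survives only if no other frequent itemset strictly contains it. The `maximal_itemsets` helper and the toy itemsets are assumptions for illustration, not the FP-Max algorithm itself.

```python
def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    frequent = set(frequent)
    return {s for s in frequent if not any(s < other for other in frequent)}


# All frequent itemsets of a toy database (hypothetical).
frequent = [
    frozenset({"ABAB3"}),
    frozenset({"Asteraceae"}),
    frozenset({"rubber rabbitbrush"}),
    frozenset({"ABAB3", "Asteraceae"}),
    frozenset({"ABAB3", "rubber rabbitbrush"}),
]

maximal = maximal_itemsets(frequent)
for itemset in sorted(maximal, key=sorted):
    print(sorted(itemset))
```

Every maximal frequent itemset is also closed, but not the other way round, so the maximal set is the most compact of the three representations.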

#### 1. FP-Max

In [18]:
from mlxtend.frequent_patterns import fpmax,association_rules
freq_itemsets = fpmax(data_enc, min_support=0.0002, use_colnames=True,max_len = 10)
freq_itemsets

Unnamed: 0,support,itemsets
0,0.000259,"(POHE3, Polypodiaceae)"
1,0.000259,"(THOCO2, Ranunculaceae)"
2,0.000259,"(Fabaceae, THRH)"
3,0.000259,"(COVI5, Chenopodiaceae)"
4,0.000259,"(COUMP, Santalaceae)"
...,...,...
1453,0.007580,"(ABAB3, Apiaceae)"
1454,0.009588,"(ABAB3, Ranunculaceae)"
1455,0.004211,"(ABAB3, Salicaceae)"
1456,0.037186,"(rubber rabbitbrush, ABAB3, Poaceae)"


#### 2. A-Max

In [19]:
from fim import apriori
freq_itemsets = pd.DataFrame(apriori(data_list,target='m',supp=0.02,report='s'),columns = ['Itemset','Support'])
freq_itemsets

Unnamed: 0,Itemset,Support
0,"(OECAN, Onagraceae)",0.000259
1,"(rosy pussytoes, Asteraceae, ABAB3)",0.000259
2,"(COSTV2, Orchidaceae)",0.000259
3,"(CHLA13, Onagraceae)",0.000259
4,"(AGAUA, Asteraceae)",0.000259
...,...,...
1453,"(Apiaceae, ABAB3)",0.007580
1454,"(Ranunculaceae, ABAB3)",0.009588
1455,"(Salicaceae, ABAB3)",0.004211
1456,"(Poaceae, ABAB3, rubber rabbitbrush)",0.000259
