# Association Rules

#### In this section, we will learn how to use association rules in Python and how to filter data based on different association rule metrics.


The following libraries are used for association rules:
- pandas
- numpy
- matplotlib
- mlxtend

In [1]:
# Import necessary modules

import numpy as np
import pandas as pd
import csv
from matplotlib import pyplot as plt

# Import the apriori function, the TransactionEncoder class, and the association_rules function from mlxtend

from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import association_rules as arule

### Read data from repair.csv

We use the file repair.csv as the data set. The Apriori algorithm is used to find the frequent itemsets in it.


In [2]:
# Read file 'repair.csv' and change the data format for Apriori algorithm

data_set = []

with open("repair.csv") as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        data_set.append(row)
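
As a minimal sketch of what this loop produces, the example below runs `csv.reader` over an in-memory string instead of a file. The sample rows are made up for illustration and are an assumption, since the contents of repair.csv are not shown here.

```python
import csv
import io

# Hypothetical stand-in for a transaction file; not the real repair.csv.
sample = "Register,Analyze Defect,Inform User\nRegister,Analyze Defect\n"

# csv.reader yields one list of strings per line, so rows may differ in length.
transactions = [row for row in csv.reader(io.StringIO(sample))]
print(transactions)
```

Each inner list is one transaction, which is the list-of-lists shape that `TransactionEncoder` accepts.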


In [3]:
# Use TransactionEncoder to convert the list of transactions into a one-hot encoded DataFrame for the Apriori algorithm in mlxtend

te = TransactionEncoder()
te_ary = te.fit(data_set).transform(data_set)
data = pd.DataFrame(te_ary, columns = te.columns_)
data.tail(5)
# Use the Apriori algorithm from mlxtend to find itemsets with a support of at least 0.4

frequent_itemsets = apriori(data, min_support = 0.4, use_colnames = True)
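
To make the two steps above more concrete, here is a rough plain-Python sketch of the same idea on toy data: one-hot encode the transactions, then keep every itemset whose support meets a threshold. The transactions and the 0.6 threshold are assumptions chosen for illustration. The search is brute force over all combinations, not the pruned search that Apriori performs, so it only suits very small inputs.

```python
from itertools import combinations

# Toy transactions (made up for illustration, not taken from repair.csv).
transactions = [
    ["Register", "Analyze Defect", "Test Repair"],
    ["Register", "Analyze Defect"],
    ["Register", "Test Repair"],
    ["Register", "Analyze Defect", "Test Repair"],
]

# One-hot encoding: one boolean column per distinct item, one row per transaction.
items = sorted({item for t in transactions for item in t})
one_hot = [[item in t for item in items] for t in transactions]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= set(t))
    return hits / len(transactions)


# Brute force: test every combination of items against the threshold.
min_support = 0.6
frequent = {
    frozenset(combo): support(combo)
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
    if support(combo) >= min_support
}

for itemset, value in frequent.items():
    print(sorted(itemset), value)
```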

### Filtering data based on metrics of Association rules
In Python, you can filter the rules derived from frequent itemsets by different metrics, such as support, confidence, and lift.
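
As a small worked illustration of these three metrics, the sketch below computes them by hand for one rule. The transactions are invented for the example, and the helper `support` is a hypothetical name, not part of mlxtend.

```python
# Toy transactions (made up for illustration).
transactions = [
    {"Register", "Repair (Complex)", "Test Repair"},
    {"Register", "Repair (Complex)", "Test Repair"},
    {"Register", "Repair (Simple)"},
    {"Register", "Repair (Simple)", "Test Repair"},
    {"Register", "Repair (Complex)"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Rule under inspection: {Repair (Complex)} -> {Test Repair}
antecedent = {"Repair (Complex)"}
consequent = {"Test Repair"}

# Support of the rule: how often antecedent and consequent occur together.
rule_support = support(antecedent | consequent)
# Confidence: how often the consequent appears when the antecedent does.
confidence = rule_support / support(antecedent)
# Lift: confidence relative to the consequent's baseline frequency.
lift = confidence / support(consequent)

print(rule_support, confidence, lift)
```

A lift above 1 means the antecedent and consequent occur together more often than they would if they were independent.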

In [4]:
# Use association_rules from mlxtend (imported as arule) and keep only the rules that meet a threshold on one metric.

rules_association = arule(frequent_itemsets, metric = 'lift', min_threshold = 0.8)
rules_association

index,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Analyze Defect),(Archive Repair),1.000000,0.905797,0.905797,0.905797,1.000000,0.000000,1.000000
1,(Archive Repair),(Analyze Defect),0.905797,1.000000,0.905797,1.000000,1.000000,0.000000,inf
2,(Analyze Defect),(Inform User),1.000000,0.998188,0.998188,0.998188,1.000000,0.000000,1.000000
3,(Inform User),(Analyze Defect),0.998188,1.000000,0.998188,1.000000,1.000000,0.000000,inf
4,(Analyze Defect),(Register),1.000000,1.000000,1.000000,1.000000,1.000000,0.000000,inf
...,...,...,...,...,...,...,...,...,...
727,(Register),"(Inform User, Repair (Complex), Analyze Defect...",1.000000,0.550725,0.550725,0.550725,1.000000,0.000000,1.000000
728,(Repair (Complex)),"(Inform User, Register, Analyze Defect, Test R...",0.596920,0.905797,0.550725,0.922610,1.018561,0.010036,1.217249
729,(Analyze Defect),"(Test Repair, Register, Repair (Complex), Info...",1.000000,0.550725,0.550725,0.550725,1.000000,0.000000,1.000000
730,(Inform User),"(Register, Repair (Complex), Analyze Defect, T...",0.998188,0.550725,0.550725,0.551724,1.001815,0.000998,1.002230
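
The derived columns in the table above can be checked from the three support columns. The sketch below recomputes confidence, lift, leverage, and conviction for row 728 using the standard definitions of these metrics. The inputs are the rounded values printed in the table, so the results agree with it only to about four decimal places.

```python
import math

# Values copied from row 728 of the table above.
antecedent_support = 0.596920
consequent_support = 0.905797
support = 0.550725

confidence = support / antecedent_support
lift = confidence / consequent_support
# Leverage: observed joint support minus the support expected under independence.
leverage = support - antecedent_support * consequent_support

# Conviction divides by (1 - confidence), so a rule with confidence 1.0
# has no finite value; the table shows it as inf (rows 1, 3 and 4).
conviction = (
    math.inf
    if confidence >= 1.0
    else (1 - consequent_support) / (1 - confidence)
)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
```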


#### Question: Change the metric to lift and then to support. Investigate the effect of each choice on the table.

### Finding qualified frequent itemsets using association rules
Before mining rules, you need the itemsets that qualify as frequent, that is, those whose support meets a minimum threshold.

#### Question: Find the frequent itemsets with a minimum support of 0.4. Store them in the variable frequent_itemsets.


In [5]:
# Answer
frequent_itemsets = apriori(data, min_support = 0.4, use_colnames = True)
frequent_itemsets

index,support,itemsets
0,1.000000,(Analyze Defect)
1,0.905797,(Archive Repair)
2,0.998188,(Inform User)
3,1.000000,(Register)
4,0.596920,(Repair (Complex))
...,...,...
74,0.550725,"(Register, Repair (Complex), Analyze Defect, T..."
75,0.595109,"(Inform User, Register, Repair (Complex), Anal..."
76,0.438406,"(Inform User, Register, Analyze Defect, Repair..."
77,0.550725,"(Inform User, Register, Repair (Complex), Test..."


### Filtering itemsets based on length and metrics of association rules
In this section, you will learn how to filter frequent itemsets based on their length.
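
mlxtend stores each itemset as a `frozenset`, so `len()` gives the number of items it contains. The sketch below shows the same length-and-support filter on plain Python data. The support values and itemsets are hypothetical, chosen only to illustrate the condition.

```python
# Hypothetical (support, itemset) pairs, shaped like rows of the apriori result.
itemsets = [
    (1.00, frozenset({"Register"})),
    (0.90, frozenset({"Register", "Archive Repair"})),
    (0.55, frozenset({"Register", "Analyze Defect", "Repair (Complex)"})),
    (0.25, frozenset({"Register", "Analyze Defect", "Repair (Simple)"})),
]

# Keep itemsets with more than 2 items and a support above 0.3.
selected = [(s, i) for s, i in itemsets if len(i) > 2 and s > 0.3]

for s, i in selected:
    print(s, sorted(i))
```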

In [6]:
# Add a column named 'length' to 'frequent_itemsets' that holds the number of items in each frequent itemset.

frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)

# Keep only the frequent itemsets that have a length greater than 2 and a support greater than 0.3.

# Store these itemsets in the variable 'frequent_itemsets_filtered'.

frequent_itemsets_filtered = frequent_itemsets.loc[(frequent_itemsets['length'] > 2) & (frequent_itemsets['support'] > 0.3)]
frequent_itemsets_filtered

index,support,itemsets,length
26,0.905797,"(Analyze Defect, Inform User, Archive Repair)",3
27,0.905797,"(Analyze Defect, Register, Archive Repair)",3
28,0.550725,"(Analyze Defect, Repair (Complex), Archive Rep...",3
29,0.905797,"(Analyze Defect, Test Repair, Archive Repair)",3
30,0.998188,"(Analyze Defect, Inform User, Register)",3
31,0.595109,"(Analyze Defect, Inform User, Repair (Complex))",3
32,0.438406,"(Repair (Simple), Analyze Defect, Inform User)",3
33,0.998188,"(Test Repair, Analyze Defect, Inform User)",3
34,0.59692,"(Analyze Defect, Register, Repair (Complex))",3
35,0.438406,"(Repair (Simple), Analyze Defect, Register)",3


### Showing selected metrics of association rules in one table
In this section, you will learn how to show selected metrics of association rules in one table.

In [7]:
# Mine association rules from the frequent itemsets stored in 'frequent_itemsets', with a minimum confidence of 0.5.

# Store the discovered rules in the variable 'rules_association'.

rules_association = arule(frequent_itemsets, metric = 'confidence', min_threshold = 0.5)

# Keep only the rules with lift greater than 1 and support greater than 0.4, and store them in the variable 'filtered_rules'.

filtered_rules = rules_association.loc[(rules_association['lift'] > 1) & (rules_association['support'] > 0.4)]

# Show the columns 'support', 'confidence' and 'lift' of 'filtered_rules' (add 'antecedents' and 'consequents' to the list to display them as well).

filtered_rules[['support', 'confidence', 'lift']]

index,support,confidence,lift
11,0.905797,0.907441,1.001815
12,0.905797,1.000000,1.001815
15,0.550725,0.922610,1.018561
16,0.550725,0.608000,1.018561
17,0.905797,0.907441,1.001815
...,...,...,...
660,0.550725,0.608000,1.021662
661,0.550725,0.551724,1.001815
663,0.550725,0.922610,1.018561
665,0.550725,0.551724,1.001815
