# Frequent Itemset Mining: Apriori Alternatives

In this notebook, we will apply **apriori**, **FP-Growth**, and **maximal frequent itemset** methods on the congressional voting records dataset. You can learn more about this dataset here: https://archive.ics.uci.edu/ml/datasets/congressional+voting+records

### Import Required Libraries

In [1]:
import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules, fpgrowth, fpmax
import matplotlib.pyplot as plt
%matplotlib inline

### T1: Data Loading

The data is located here: `/dsa/data/DSA-8410/association-mining/house-vote/house-votes-84.csv`


In [2]:
df = pd.read_csv('/dsa/data/DSA-8410/association-mining/house-vote/house-votes-84.csv') 
df.head()

Unnamed: 0,Class Name,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missile,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
0,republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
2,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
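
The item names printed later in this notebook carry a leading space (for example `' crime_y'`), which suggests that the CSV separates fields with a comma followed by a space. A minimal sketch of one way to avoid that, assuming this is indeed the file layout: `skipinitialspace=True` tells `pd.read_csv` to skip spaces after each delimiter. The in-memory CSV below is a made-up stand-in, not the real file.

```python
import io

import pandas as pd

# Hypothetical miniature of the file layout: a space follows every comma.
raw = "Class Name, crime, immigration\nrepublican, y, n\ndemocrat, n, y\n"

# skipinitialspace=True drops the space after each delimiter,
# so neither the column names nor the values keep a leading blank.
toy = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(list(toy.columns))
print(toy)
```

The rest of the notebook keeps the default loading, so its outputs still show the leading spaces.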


### T2: Show the number of transactions

In [3]:
print(f"Num of transactions = {df.shape[0]}")
print(f"Maximum num of items per transaction = {df.shape[1]}")

Num of transactions = 435
Maximum num of items per transaction = 17


### T3: Transform the dataset to a binary incidence matrix for applying itemset mining methods

In [4]:
df = pd.get_dummies(df)
df.head()

Unnamed: 0,Class Name_democrat,Class Name_republican,handicapped-infants_?,handicapped-infants_n,handicapped-infants_y,water-project-cost-sharing_?,water-project-cost-sharing_n,water-project-cost-sharing_y,adoption-of-the-budget-resolution_?,adoption-of-the-budget-resolution_n,...,superfund-right-to-sue_y,crime_?,crime_n,crime_y,duty-free-exports_?,duty-free-exports_n,duty-free-exports_y,export-administration-act-south-africa_?,export-administration-act-south-africa_n,export-administration-act-south-africa_y
0,0,1,0,1,0,0,0,1,0,1,...,1,0,0,1,0,1,0,0,0,1
1,0,1,0,1,0,0,0,1,0,1,...,1,0,0,1,0,1,0,1,0,0
2,1,0,1,0,0,0,0,1,0,0,...,1,0,0,1,0,1,0,0,1,0
3,1,0,0,1,0,0,0,1,0,0,...,1,0,1,0,0,1,0,0,0,1
4,1,0,0,0,1,0,0,1,0,0,...,1,0,0,1,0,0,1,0,0,1
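
To make the encoding step concrete, here is a small sketch on a made-up three-row frame (the data is hypothetical). Each `column_value` pair becomes one item, so a transaction holds exactly one item per original column. Passing `dtype=bool` is optional; it is shown here on the assumption that a boolean matrix is the form mlxtend's frequent pattern functions prefer.

```python
import pandas as pd

# Hypothetical toy data with the same kind of values as the voting records.
toy = pd.DataFrame({
    "Class Name": ["republican", "democrat", "democrat"],
    "crime": ["y", "n", "?"],
})

# One boolean column per (column, value) pair.
onehot = pd.get_dummies(toy, dtype=bool)

print(onehot.columns.tolist())
# Every row contains exactly one item per original column.
print(onehot.sum(axis=1).tolist())
```

Note that `?` (a missing vote) is encoded as an item of its own, exactly as in the notebook's output above.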


### T4: Identify Frequent Patterns with the FP-Growth Method. Use min_support = 0.3. Show the number of itemsets per itemset length.

In [5]:
freq_items_fp = fpgrowth(df, min_support=0.3, use_colnames=True)

# swap columns for readability
freq_items_fp = freq_items_fp.reindex(columns=['itemsets', 'support'])
freq_items_fp['length'] = freq_items_fp['itemsets'].apply(len)

print(f"Total number of frequent itemsets = {freq_items_fp.shape[0]}")
freq_items_fp.tail()

Total number of frequent itemsets = 973


Unnamed: 0,itemsets,support,length
968,"( physician-fee-freeze_n, religious-groups-in...",0.305747,3
969,"( physician-fee-freeze_n, religious-groups-in...",0.301149,3
970,"( physician-fee-freeze_n, religious-groups-in...",0.303448,4
971,"( export-administration-act-south-africa_y, w...",0.303448,2
972,"( synfuels-corporation-cutback_n, water-proje...",0.308046,2
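
The task also asks for the number of itemsets per itemset length, which the cell above does not print. A minimal sketch of one way to tally it, using a small hand-built frame (hypothetical data) with the same `itemsets`, `support`, and `length` columns:

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by fpgrowth.
toy_itemsets = pd.DataFrame({
    "itemsets": [
        frozenset({"a"}),
        frozenset({"b"}),
        frozenset({"a", "b"}),
        frozenset({"a", "b", "c"}),
    ],
    "support": [0.6, 0.5, 0.4, 0.3],
})
toy_itemsets["length"] = toy_itemsets["itemsets"].apply(len)

# Count how many itemsets exist for each length, ordered by length.
counts_per_length = toy_itemsets["length"].value_counts().sort_index()
print(counts_per_length)
```

In the notebook itself, the same `value_counts().sort_index()` call on `freq_items_fp['length']` should give the tally for the 973 itemsets.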


### T5: Generate Association Rules from Frequent Itemsets with min 90% confidence.

* Show the total number of rules

In [6]:
rules_fp = association_rules(freq_items_fp, metric="confidence", min_threshold=0.90)
print(f"Total number of rules = {rules_fp.shape[0]}")
rules_fp.head()

Total number of rules = 2990


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,"( crime_y, duty-free-exports_n)",( religious-groups-in-schools_y),0.432184,0.625287,0.390805,0.904255,1.446144,0.120565,3.913665
1,"( duty-free-exports_n, handicapped-infants_n)",( religious-groups-in-schools_y),0.335632,0.625287,0.314943,0.938356,1.50068,0.105076,6.078672
2,"( duty-free-exports_n, handicapped-infants_n)",( crime_y),0.335632,0.570115,0.305747,0.910959,1.597851,0.114398,4.82794
3,( el-salvador-aid_y),( religious-groups-in-schools_y),0.487356,0.625287,0.452874,0.929245,1.486109,0.148136,5.295939
4,( el-salvador-aid_y),( crime_y),0.487356,0.570115,0.445977,0.915094,1.605105,0.168128,5.063091
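
As a sanity check on the metric columns, the sketch below recomputes confidence, lift, leverage, and conviction for rule 0 from its three support values, using the standard definitions of these measures. The inputs are the rounded figures from the table above, so the results agree only up to rounding.

```python
# Support values copied from rule 0 in the table above (rounded).
antecedent_support = 0.432184  # support of {crime_y, duty-free-exports_n}
consequent_support = 0.625287  # support of {religious-groups-in-schools_y}
support = 0.390805             # support of antecedent and consequent together

# Standard association rule measures.
confidence = support / antecedent_support
lift = confidence / consequent_support
leverage = support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
print(f"leverage   = {leverage:.4f}")
print(f"conviction = {conviction:.4f}")
```

Because conviction divides by `1 - confidence`, a rule with confidence 1.0 has infinite conviction.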


### T6: Identify the top 5 rules by confidence whose `consequents` contain only `Class Name_democrat`. Similarly, identify the top 5 rules by confidence whose `consequents` contain only `Class Name_republican`.

* Iterate over these two subsets of rules and print only antecedents, consequents, and confidence.
* Based on these rules, characterize Democratic and Republican members of Congress.

In [7]:
rules_fp.values

array([[frozenset({' crime_y', ' duty-free-exports_n'}),
        frozenset({' religious-groups-in-schools_y'}), 0.432183908045977,
        ..., 1.4461436170212767, 0.12056546439423971, 3.9136653895274596],
       [frozenset({' duty-free-exports_n', ' handicapped-infants_n'}),
        frozenset({' religious-groups-in-schools_y'}), 0.335632183908046,
        ..., 1.5006798952457694, 0.10507596776324479, 6.078671775223486],
       [frozenset({' duty-free-exports_n', ' handicapped-infants_n'}),
        frozenset({' crime_y'}), 0.335632183908046, ...,
        1.5978513035793194, 0.11439820319725191, 4.827939876215731],
       ...,
       [frozenset({' physician-fee-freeze_n', ' religious-groups-in-schools_n'}),
        frozenset({' aid-to-nicaraguan-contras_y', 'Class Name_democrat'}),
        0.3080459770114943, ..., 1.9656305627824182, 0.14907121152067643,
        33.422988505747036],
       [frozenset({' religious-groups-in-schools_n', ' aid-to-nicaraguan-contras_y'}),
        frozenset(

In [8]:
rules_fp.shape

(2990, 9)

In [16]:
rules_fp['consequents'] = rules_fp['consequents'].apply(set)
rules_fp.dtypes

antecedents            object
consequents            object
antecedent support    float64
consequent support    float64
support               float64
confidence            float64
lift                  float64
leverage              float64
conviction            float64
dtype: object

In [31]:
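# With a fixed consequent, conviction increases with confidence,
# so sorting by conviction also ranks the rules by confidence.
rules_fp_dem = rules_fp[rules_fp['consequents'] == set(['Class Name_democrat'])].sort_values(by=['conviction'], ascending=False)
rules_fp_rep = rules_fp[rules_fp['consequents'] == set(['Class Name_republican'])].sort_values(by=['conviction'], ascending=False)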
print(rules_fp_dem.shape)
print(rules_fp_rep.shape)

(274, 9)
(32, 9)


In [32]:
rules_fp_dem[['antecedents', 'consequents', 'confidence']].head()

Unnamed: 0,antecedents,consequents,confidence
1357,"( physician-fee-freeze_n, duty-free-exports_y...",{Class Name_democrat},1.0
2629,"( physician-fee-freeze_n, aid-to-nicaraguan-c...",{Class Name_democrat},1.0
2678,"( physician-fee-freeze_n, superfund-right-to-...",{Class Name_democrat},1.0
2704,"( physician-fee-freeze_n, superfund-right-to-...",{Class Name_democrat},1.0
1403,"( physician-fee-freeze_n, duty-free-exports_y...",{Class Name_democrat},1.0


In [33]:
rules_fp_rep[['antecedents', 'consequents', 'confidence']].head()

Unnamed: 0,antecedents,consequents,confidence
665,"( synfuels-corporation-cutback_n, physician-f...",{Class Name_republican},0.978261
602,"( el-salvador-aid_y, physician-fee-freeze_y, ...",{Class Name_republican},0.971631
622,"( physician-fee-freeze_y, adoption-of-the-bud...",{Class Name_republican},0.97037
607,"( crime_y, physician-fee-freeze_y, adoption-...",{Class Name_republican},0.963768
594,"( physician-fee-freeze_y, adoption-of-the-bud...",{Class Name_republican},0.958904
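
The task asks to iterate over the two rule subsets and print each rule in full, whereas `head()` truncates long antecedents. A minimal sketch of such a loop follows, run on a made-up rules frame; the item names `vote-a_*` and `vote-b_*` and the helper `top_rules` are hypothetical and not part of mlxtend.

```python
import pandas as pd

# Hypothetical stand-in for the rules frame produced by association_rules.
toy_rules = pd.DataFrame({
    "antecedents": [
        frozenset({"vote-a_n"}),
        frozenset({"vote-a_y", "vote-b_y"}),
        frozenset({"vote-b_n"}),
    ],
    "consequents": [
        frozenset({"Class Name_democrat"}),
        frozenset({"Class Name_republican"}),
        frozenset({"Class Name_democrat"}),
    ],
    "confidence": [0.95, 0.97, 1.0],
})


def top_rules(rules, label, n=5):
    """Return the n highest-confidence rules whose only consequent is `label`."""
    target = frozenset({label})
    subset = rules[rules["consequents"].apply(lambda c: c == target)]
    return subset.sort_values("confidence", ascending=False).head(n)


lines = []
for label in ("Class Name_democrat", "Class Name_republican"):
    for _, rule in top_rules(toy_rules, label).iterrows():
        lines.append(
            f"{sorted(rule['antecedents'])} -> {sorted(rule['consequents'])} "
            f"(confidence = {rule['confidence']:.3f})"
        )
print("\n".join(lines))
```

The element-wise comparison `c == target` works whether the consequents are stored as `frozenset` or as `set`, so the same loop should apply to `rules_fp` after the conversion above.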


### T7: Show the number of maximal frequent itemsets for min support = 0.3

In [34]:
max_patterns = fpmax(df, min_support=0.3, use_colnames=True)

max_patterns = max_patterns.reindex(columns=['itemsets', 'support'])
max_patterns['length'] = max_patterns['itemsets'].apply(len)

print(f"Total number of maximal frequent patterns = {max_patterns.shape[0]}")
max_patterns

Total number of maximal frequent patterns = 179


Unnamed: 0,itemsets,support,length
0,( synfuels-corporation-cutback_y),0.344828,1
1,"( education-spending_n, religious-groups-in-s...",0.301149,2
2,"( religious-groups-in-schools_n, adoption-of-...",0.303448,2
3,"( physician-fee-freeze_n, religious-groups-in...",0.301149,3
4,"( physician-fee-freeze_n, religious-groups-in...",0.303448,4
...,...,...,...
174,"( crime_y, export-administration-act-south-af...",0.340230,2
175,"( synfuels-corporation-cutback_n, crime_y, r...",0.328736,3
176,"( synfuels-corporation-cutback_n, adoption-of...",0.305747,2
177,"( synfuels-corporation-cutback_n, export-admi...",0.381609,2
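
A frequent itemset is maximal when none of its proper supersets is frequent, which is why `fpmax` reports far fewer itemsets (179) than `fpgrowth` (973) at the same support threshold. The sketch below illustrates the definition with a brute-force filter over a made-up list of frequent itemsets; it is not how `fpmax` works internally.

```python
# Hypothetical frequent itemsets for some support threshold.
frequent = [
    frozenset({"a"}),
    frozenset({"b"}),
    frozenset({"c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
]

# Keep an itemset only if no other frequent itemset strictly contains it.
# `s < t` on frozensets tests for a proper subset.
maximal = [s for s in frequent if not any(s < t for t in frequent)]

for itemset in maximal:
    print(sorted(itemset))
```

Here `{a}`, `{b}`, and `{c}` are dropped because each is contained in a larger frequent itemset, leaving only `{a, b}` and `{a, c}`.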
