# Programming Assignment #5 

## Association rule mining 

The market basket transactions dataset (transactions_data.txt) contains the list of items purchased by a customer in each transaction.

- load the transaction dataset file
- use minimum support = 0.2 and use_colnames=True in the apriori method
- select confidence as the metric in association_rules
- use minimum threshold = 0.5
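
To make the minimum-support setting concrete, here is a minimal pure-Python sketch of how the support of an itemset can be computed by hand. The toy transactions and the `support` and `frequent_itemsets` helpers are hypothetical names chosen for illustration; they are not the contents of transactions_data.txt, and the brute-force enumeration is not how mlxtend implements apriori.

```python
from itertools import combinations

# Hypothetical toy transactions, NOT the assignment's data file.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support, max_len=2):
    """Brute-force enumeration of itemsets with up to `max_len` items.

    Apriori prunes candidates instead of enumerating all of them; this
    sketch only illustrates the support threshold itself.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result

freq = frequent_itemsets(transactions, min_support=0.4)
for itemset, s in freq.items():
    print(sorted(itemset), s)
```

An itemset is kept only if it appears in at least the given fraction of transactions, so raising the minimum support shrinks the list of frequent itemsets.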

Ex: If the minimum support is 0.4, the metric is confidence, and the minimum threshold is 0.5, then some of the outputs are:
- the least frequent 1-itemset is ['Queso'].
- the consequent support, support, confidence, and lift of the rule ['Queso'] -> ['Tortilla chips'] are:
  - consequent support = 0.7
  - support = 0.4
  - confidence = 1.00
  - lift = 1.43
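
The figures in this example follow from the standard definitions: confidence(A -> C) = support(A and C together) / support(A), and lift(A -> C) = confidence(A -> C) / support(C). The sketch below recomputes them from the supports quoted above. The `rule_metrics` helper is a hypothetical name used for illustration, and the antecedent support of 0.4 for ['Queso'] is an assumption inferred from the stated support and confidence rather than given directly.

```python
def rule_metrics(support_both, support_antecedent, support_consequent):
    """Return (confidence, lift) for a rule antecedent -> consequent."""
    confidence = support_both / support_antecedent
    lift = confidence / support_consequent
    return confidence, lift

# Values from the example rule ['Queso'] -> ['Tortilla chips']
confidence, lift = rule_metrics(
    support_both=0.4,        # support of {Queso, Tortilla chips}
    support_antecedent=0.4,  # support of {Queso}; assumed, inferred from confidence = 1.00
    support_consequent=0.7,  # consequent support of {Tortilla chips}
)
print(f"confidence = {confidence:.2f}, lift = {lift:.2f}")
```

Here the lift is 1 / 0.7 ≈ 1.4286, which truncates to 1.42 and rounds to 1.43. A lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent.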

# Import the necessary packages
import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Loading the data
def load_dataset(path_to_data):
    transactions = []
    with open(path_to_data, 'r') as fid:
        for line in fid:
            # Split each line on commas and strip whitespace around each item
            transaction = [item.strip() for item in line.strip().split(',')]
            transactions.append(transaction)
    return transactions

# Set the path to the data file
path_to_data = "transactions_data.txt"  
dataset = load_dataset(path_to_data)

# Transform the data to a format suitable for the apriori function
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Apply the apriori algorithm with a minimum support threshold, e.g., 0.2
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate the association rules with a minimum confidence threshold, e.g., 0.5
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
print("\nAssociation Rules:")

# Filter 1-itemsets and find the one with the highest support
frequent_1_itemsets = frequent_itemsets[frequent_itemsets['itemsets'].apply(len) == 1]
highest_freq_itemset = frequent_1_itemsets.loc[frequent_1_itemsets['support'].idxmax()]
print("Itemset with the highest frequency for frequent 1-itemset:", highest_freq_itemset['itemsets'])

# Filter 2-itemsets and find the one with the lowest support
frequent_2_itemsets = frequent_itemsets[frequent_itemsets['itemsets'].apply(len) == 2]
least_freq_itemset = frequent_2_itemsets.loc[frequent_2_itemsets['support'].idxmin()]
print("Itemset with the least frequency for frequent 2-itemset:", least_freq_itemset['itemsets'])

# Find the support for the consequent ['Salsa', 'Tortilla chips']
consequent_support = frequent_itemsets[frequent_itemsets['itemsets'] == {'Salsa', 'Tortilla chips'}]['support']
if not consequent_support.empty:
    print("Consequent support for ['Salsa', 'Tortilla chips']:", consequent_support.values[0])
else:
    print("Consequent ['Salsa', 'Tortilla chips'] not found in frequent itemsets.")

# Filter for the specific rule: ['Queso'] -> ['Salsa', 'Tortilla chips']
specific_rule = rules[(rules['antecedents'] == {'Queso'}) & (rules['consequents'] == {'Salsa', 'Tortilla chips'})]

# Check if the rule exists and print its confidence
if not specific_rule.empty:
    confidence_value = specific_rule['confidence'].values[0]
    print("Confidence for the rule ['Queso'] -> ['Salsa', 'Tortilla chips']:", confidence_value)
else:
    print("The rule ['Queso'] -> ['Salsa', 'Tortilla chips'] was not found with the current settings.")

# Reuse the rule selected above and print its lift
if not specific_rule.empty:
    lift_value = specific_rule['lift'].values[0]
    print("Lift for the rule ['Queso'] -> ['Salsa', 'Tortilla chips']:", lift_value)
else:
    print("The rule ['Queso'] -> ['Salsa', 'Tortilla chips'] was not found with the current settings.")

print(rules)


Frequent Itemsets:
    support                        itemsets
0       0.2                     (Guacamole)
1       0.2                    (Pita chips)
2       0.4                         (Queso)
3       0.3                     (Ranch dip)
4       0.6                         (Salsa)
5       0.7                (Tortilla chips)
6       0.2     (Guacamole, Tortilla chips)
7       0.3                  (Queso, Salsa)
8       0.4         (Queso, Tortilla chips)
9       0.5         (Salsa, Tortilla chips)
10      0.3  (Queso, Salsa, Tortilla chips)

Association Rules:
Itemset with the highest frequency for frequent 1-itemset: frozenset({'Tortilla chips'})
Itemset with the least frequency for frequent 2-itemset: frozenset({'Guacamole', 'Tortilla chips'})
Consequent support for ['Salsa', 'Tortilla chips']: 0.5
Confidence for the rule ['Queso'] -> ['Salsa', 'Tortilla chips']: 0.7499999999999999
Lift for the rule ['Queso'] -> ['Salsa', 'Tortilla chips']: 1.4999999999999998
                antecede