# Programming Assignment #4 

## Association rule mining 

The market basket transactions dataset (`transactions_data.txt`) contains a list of the items purchased by a customer in each transaction.

- load the transaction dataset file
- use minimum support = 0.2 and `use_colnames=True` in the `apriori` function
- select confidence as the metric in `association_rules`
- use minimum threshold = 0.5
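For reference, these are the standard definitions the thresholds above act on. Here $T$ is the set of $N$ transactions and $X$, $Y$ are itemsets. mlxtend's `support`, `confidence`, and `lift` columns follow the same formulas.

$$
\operatorname{support}(X) = \frac{|\{t \in T : X \subseteq t\}|}{N}, \qquad
\operatorname{confidence}(X \Rightarrow Y) = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}, \qquad
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{confidence}(X \Rightarrow Y)}{\operatorname{support}(Y)}
$$

An itemset is frequent when its support is at least 0.2. A rule is kept when its confidence is at least 0.5.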

Ex: If the minimum support is 0.4, the metric is confidence, and the minimum threshold is 0.5, then some of the outputs are:
- the least frequent 1-itemset is ['Queso'].
- the support, confidence, and lift of the rule ['Queso'] -> ['Tortilla chips'] are:
  - consequent support = 0.7
  - support = 0.4
  - confidence = 1.00
  - lift = 1.43
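You can check the example's numbers by counting baskets directly. The sketch below uses plain Python on the ten transactions that the loading cell prints further down; they are copied here so the block runs on its own. `support` and `rule_metrics` are hypothetical helper names, not mlxtend functions. The lift works out to 1.00 / 0.7 ≈ 1.4286, which is 1.43 when rounded to two decimals.

```python
# The ten baskets printed by the notebook's load_dataset cell
transactions = [
    ['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
    ['Ranch dip', 'Salsa', 'Tortilla chips'],
    ['Queso', 'Tortilla chips'],
    ['Potato chips', 'Ranch dip'],
    ['Salsa', 'Tortilla chips'],
    ['Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Ranch dip'],
    ['Guacamole', 'Tortilla chips'],
    ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Salsa'],
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(wanted <= set(b) for b in baskets) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Return (consequent support, support, confidence, lift) for antecedent -> consequent."""
    sup_rule = support(set(antecedent) | set(consequent), baskets)
    sup_ant = support(antecedent, baskets)
    sup_con = support(consequent, baskets)
    confidence = sup_rule / sup_ant
    lift = confidence / sup_con
    return sup_con, sup_rule, confidence, lift


# Frequent 1-itemsets at the example's minimum support of 0.4
items = sorted({i for b in transactions for i in b})
frequent_1 = {i: support([i], transactions) for i in items if support([i], transactions) >= 0.4}
least = min(frequent_1, key=frequent_1.get)
print(least)  # Queso

con_sup, sup, conf, lift = rule_metrics(['Queso'], ['Tortilla chips'], transactions)
print(f"{con_sup:.1f} {sup:.1f} {conf:.2f} {lift:.2f}")  # 0.7 0.4 1.00 1.43
```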

In [8]:
!pip install mlxtend

Collecting mlxtend
  Downloading mlxtend-0.23.4-py3-none-any.whl.metadata (7.3 kB)
Downloading mlxtend-0.23.4-py3-none-any.whl (1.4 MB)
Installing collected packages: mlxtend
Successfully installed mlxtend-0.23.4


In [9]:
# Import the packages
import numpy as np  # not used below; pandas and mlxtend are imported in the next cell

In [10]:
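# Load the transactions dataset
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Loading the data: one transaction per line, items separated by commas
def load_dataset(path_to_data):
    transactions = []
    with open(path_to_data, 'r', encoding='utf-8') as fid:
        for line in fid:
            line = line.strip()
            if not line:  # skip blank lines, which would otherwise become [''] transactions
                continue
            # strip spaces around each item so 'Salsa' and ' Salsa' count as the same item
            transaction = [item.strip() for item in line.split(',')]
            transactions.append(transaction)
    return transactions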

path_to_data = "transactions_data.txt"  
dataset = load_dataset(path_to_data)
dataset

[['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
 ['Ranch dip', 'Salsa', 'Tortilla chips'],
 ['Queso', 'Tortilla chips'],
 ['Potato chips', 'Ranch dip'],
 ['Salsa', 'Tortilla chips'],
 ['Queso', 'Salsa', 'Tortilla chips'],
 ['Pita chips', 'Ranch dip'],
 ['Guacamole', 'Tortilla chips'],
 ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
 ['Pita chips', 'Salsa']]
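The next cell uses `TransactionEncoder` to turn this list of baskets into a boolean table with one column per distinct item. The stdlib sketch below builds the same kind of table as an illustration; it is not mlxtend's implementation. It assumes the columns are the distinct item names in sorted order, which matches the alphabetical order of the 1-itemsets in the `apriori` output. `one_hot_encode` is a hypothetical name. Each column's mean is that item's support, which is the value `apriori` compares against `min_support`.

```python
def one_hot_encode(baskets):
    """Return (columns, rows): sorted distinct items and one boolean row per basket."""
    columns = sorted({item for basket in baskets for item in basket})
    rows = []
    for basket in baskets:
        present = set(basket)
        rows.append([col in present for col in columns])
    return columns, rows


# A three-basket excerpt of the dataset above
sample = [
    ['Queso', 'Tortilla chips'],
    ['Pita chips', 'Salsa'],
    ['Salsa', 'Tortilla chips'],
]
columns, rows = one_hot_encode(sample)
print(columns)  # ['Pita chips', 'Queso', 'Salsa', 'Tortilla chips']
for row in rows:
    print(row)
# [False, True, False, True]
# [True, False, True, False]
# [False, False, True, True]

# Column means are the 1-itemset supports within this excerpt
supports = {col: sum(row[j] for row in rows) / len(rows) for j, col in enumerate(columns)}
print(supports['Salsa'])  # 0.6666666666666666
```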

In [11]:
# Transform the data to a format suitable for the apriori function
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Apply the apriori algorithm
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)  
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
print("\nAssociation Rules:")
print(rules)

Frequent Itemsets:
    support                        itemsets
0       0.2                     (Guacamole)
1       0.2                    (Pita chips)
2       0.4                         (Queso)
3       0.3                     (Ranch dip)
4       0.6                         (Salsa)
5       0.7                (Tortilla chips)
6       0.2     (Guacamole, Tortilla chips)
7       0.3                  (Salsa, Queso)
8       0.4         (Tortilla chips, Queso)
9       0.5         (Salsa, Tortilla chips)
10      0.3  (Salsa, Tortilla chips, Queso)

Association Rules:
                antecedents              consequents  antecedent support  \
0               (Guacamole)         (Tortilla chips)                 0.2   
1                   (Salsa)                  (Queso)                 0.6   
2                   (Queso)                  (Salsa)                 0.4   
3          (Tortilla chips)                  (Queso)                 0.7   
4                   (Queso)         (Tortilla chips) 
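The frequent itemsets above come from mlxtend's `apriori`. The function below is a from-scratch sketch of the same level-wise idea, not mlxtend's code. It grows itemsets one item at a time:

- it joins frequent (k-1)-itemsets into k-item candidates,
- it prunes any candidate that has an infrequent (k-1)-subset, and
- it keeps the candidates that meet `min_support`.

`apriori_sketch` is a hypothetical name, and it works on the plain list of baskets rather than the one-hot DataFrame. With `min_support=0.2` it should find the same 11 itemsets and supports as the table above.

```python
from itertools import combinations

# The ten baskets printed by the notebook's load_dataset cell
transactions = [
    ['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
    ['Ranch dip', 'Salsa', 'Tortilla chips'],
    ['Queso', 'Tortilla chips'],
    ['Potato chips', 'Ranch dip'],
    ['Salsa', 'Tortilla chips'],
    ['Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Ranch dip'],
    ['Guacamole', 'Tortilla chips'],
    ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Salsa'],
]


def apriori_sketch(baskets, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    sets = [frozenset(b) for b in baskets]
    n = len(sets)

    def support(itemset):
        return sum(itemset <= s for s in sets) / n

    # Level 1: frequent single items
    items = sorted(set().union(*sets))
    singles = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in singles.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must already be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        scored = {c: support(c) for c in candidates}
        current = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori_sketch(transactions, min_support=0.2)
print(len(frequent))  # 11
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(f"{sup:.1f}  {sorted(itemset)}")
```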

In [12]:
# Find the least frequent 1-itemset
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
itemsets_1 = frequent_itemsets[frequent_itemsets['length'] == 1].copy()
# A stable sort keeps the original (alphabetical) order among ties;
# here Guacamole and Pita chips share the lowest support of 0.2
itemsets_1_sorted = itemsets_1.sort_values('support', kind='stable')
least_frequent = itemsets_1_sorted.iloc[0]
least_frequent_item = next(iter(least_frequent['itemsets']))

print(f"\nThe least frequency of frequent 1-itemset is ['{least_frequent_item}']")

# Find the metrics of the rule [least frequent item] -> ['Tortilla chips']
target_antecedent = frozenset([least_frequent_item])
target_consequent = frozenset(['Tortilla chips'])
for _, rule in rules.iterrows():
    # antecedents/consequents are frozensets, so compare them directly
    if rule['antecedents'] == target_antecedent and rule['consequents'] == target_consequent:
        print(f"\nThe support, confidence, and lift of rule, ['{least_frequent_item}'] -> ['Tortilla chips'] are:")
        print(f"  • consequent support = {rule['consequent support']:.1f}")
        print(f"  • support = {rule['support']:.1f}")
        print(f"  • confidence = {rule['confidence']:.2f}")
        print(f"  • lift = {rule['lift']:.2f}")
        break


The least frequency of frequent 1-itemset is ['Guacamole']

The support, confidence, and lift of rule, ['Guacamole'] -> ['Tortilla chips'] are:
  • consequent support = 0.7
  • support = 0.2
  • confidence = 1.00
  • lift = 1.43
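Two details of the cell above are worth checking independently.

- At `min_support=0.2`, Guacamole and Pita chips tie for the lowest 1-itemset support (0.2 each). Taking the first row after sorting therefore picks one of them; it does not identify a unique answer.
- The rule metrics can be recomputed from the itemset supports alone.

The sketch below does both in plain Python. Instead of using Apriori, it enumerates every itemset exhaustively, which is fine for nine distinct items but not for real data. `frequent_itemsets_bruteforce` and `generate_rules` are hypothetical names, not mlxtend functions. Lift is computed as confidence divided by consequent support.

```python
from itertools import combinations

# The ten baskets printed by the notebook's load_dataset cell
transactions = [
    ['Lime', 'Queso', 'Salsa', 'Salt', 'Tortilla chips'],
    ['Ranch dip', 'Salsa', 'Tortilla chips'],
    ['Queso', 'Tortilla chips'],
    ['Potato chips', 'Ranch dip'],
    ['Salsa', 'Tortilla chips'],
    ['Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Ranch dip'],
    ['Guacamole', 'Tortilla chips'],
    ['Guacamole', 'Queso', 'Salsa', 'Tortilla chips'],
    ['Pita chips', 'Salsa'],
]


def frequent_itemsets_bruteforce(baskets, min_support):
    """Support of every itemset with support >= min_support (exhaustive enumeration)."""
    sets = [frozenset(b) for b in baskets]
    items = sorted(set().union(*sets))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = sum(candidate <= b for b in sets) / len(sets)
            if s >= min_support:
                result[candidate] = s
    return result


def generate_rules(freq, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) for rules meeting min_confidence."""
    found = []
    for itemset, sup in freq.items():
        for k in range(1, len(itemset)):  # empty for 1-itemsets, so they yield no rules
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are frequent, so both lookups succeed
                confidence = sup / freq[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / freq[consequent]
                    found.append((antecedent, consequent, sup, confidence, lift))
    return found


freq_sets = frequent_itemsets_bruteforce(transactions, 0.2)

# Every 1-itemset tied for the lowest support
singles = {next(iter(c)): s for c, s in freq_sets.items() if len(c) == 1}
lowest = min(singles.values())
tied = sorted(i for i, s in singles.items() if s == lowest)
print(tied)  # ['Guacamole', 'Pita chips']

rule_list = generate_rules(freq_sets, 0.5)
print(len(rule_list))  # 12
for ante, cons, sup, conf, lift in rule_list:
    if ante == frozenset({'Guacamole'}) and cons == frozenset({'Tortilla chips'}):
        print(f"support={sup:.1f} confidence={conf:.2f} lift={lift:.2f}")
        # support=0.2 confidence=1.00 lift=1.43
```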
