 Task 3 Part B: Association Rule Mining
 This script generates synthetic transaction data and applies Apriori
to discover frequent itemsets and association rules.

In [16]:
import random
import pandas as pd

# --- Step 1: Define item pool (the supermarket inventory) ---
# This is the list of all possible items that customers may buy.
items = ['milk', 'bread', 'butter', 'eggs', 'cheese',
         'beer', 'diapers', 'chocolate', 'coffee', 'tea',
         'apple', 'banana', 'grapes', 'chicken', 'beef',
         'rice', 'pasta', 'onions', 'tomato', 'fish']

# --- Step 2: Generate random baskets ---
# We simulate 30 shopping transactions.
# Each transaction contains between 3 and 8 randomly chosen items from the pool.
transactions = [random.sample(items, random.randint(3, 8)) for _ in range(30)]

# --- Step 3: Convert to DataFrame ---
# Create a DataFrame with two columns:
#   - "TransactionID": unique ID for each shopping transaction
#   - "Items": the list of products bought in that transaction
df_transactions = pd.DataFrame({
    "TransactionID": range(1, len(transactions)+1),
    "Items": transactions
})

# --- Step 4: Preview results ---
# Display the first 5 transactions for verification
print(df_transactions.head())

# --- Step 5: Save to CSV file ---
# Store the generated transactions in a CSV file for later use
df_transactions.to_csv("transactions.csv", index=False)


   TransactionID                                              Items
0              1                    [butter, fish, banana, chicken]
1              2  [bread, pasta, onions, diapers, cheese, coffee...
2              3               [tomato, coffee, chicken, chocolate]
3              4                [milk, bread, onions, eggs, tomato]
4              5  [chicken, butter, coffee, milk, beef, diapers,...
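Because Step 2 draws from the unseeded global random generator, every run of the notebook produces different baskets, and therefore different itemsets and rules. A minimal sketch of one way to make the data reproducible is shown below. The helper name make_baskets and the seed value 42 are assumptions for illustration and are not part of the original notebook.

```python
import random

# Same item pool as in Step 1
items = ['milk', 'bread', 'butter', 'eggs', 'cheese',
         'beer', 'diapers', 'chocolate', 'coffee', 'tea',
         'apple', 'banana', 'grapes', 'chicken', 'beef',
         'rice', 'pasta', 'onions', 'tomato', 'fish']


def make_baskets(seed, n_baskets=30):
    """Hypothetical helper: build baskets from a private, seeded generator."""
    rng = random.Random(seed)  # independent of the global random state
    # Each basket holds 3 to 8 distinct items, as in Step 2
    return [rng.sample(items, rng.randint(3, 8)) for _ in range(n_baskets)]


first_run = make_baskets(seed=42)
second_run = make_baskets(seed=42)

# The same seed yields the same baskets on every run
print(first_run == second_run)
```

Using a private random.Random instance rather than random.seed() keeps the rest of the notebook's random state untouched.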


3B.2 Transform Data into One-Hot Encoded Matrix

Convert each transaction's list of items into a basket matrix (True = item present, False = item absent).

This is the input format that mlxtend's apriori function expects.

In [18]:
# --- Step 6: One-Hot Encoding of Transactions ---
# Each row = transaction, each column = item
# Value = True if the item is present in that transaction, otherwise False

from mlxtend.preprocessing import TransactionEncoder

# Initialize the encoder
te = TransactionEncoder()

# Fit the encoder on the list of transactions (learn all unique items)
# Transform the transactions into a boolean NumPy array (True = present)
te_array = te.fit(transactions).transform(transactions)

# Convert the array into a Pandas DataFrame
# Columns = item names, Rows = transactions
df = pd.DataFrame(te_array, columns=te.columns_)

# Preview the first 5 rows of the one-hot encoded dataset
print("\n--- One-hot encoded dataset (first 5 rows) ---")
print(df.head())




--- One-hot encoded dataset (first 5 rows) ---
   apple  banana   beef   beer  bread  butter  cheese  chicken  chocolate  \
0  False    True  False  False  False    True   False     True      False   
1  False   False  False  False   True    True    True    False      False   
2  False   False  False  False  False   False   False     True       True   
3  False   False  False  False   True   False   False    False      False   
4   True   False   True  False  False    True   False     True      False   

   coffee  diapers   eggs   fish  grapes   milk  onions  pasta   rice    tea  \
0   False    False  False   True   False  False   False  False  False  False   
1    True     True   True  False   False  False    True   True  False  False   
2    True    False  False  False   False  False   False  False  False  False   
3   False    False   True  False   False   True    True  False  False  False   
4    True     True  False  False    True   True   False  False  False  False   

[output truncated; the final column, tomato, is not shown]
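To make the encoding step concrete, the sketch below builds the same kind of boolean matrix by hand for three of the baskets shown in the preview (transactions 1, 3 and 4, whose item lists are printed in full). It is an illustration of the idea only, not mlxtend's implementation. The variable names sample, columns and matrix are assumptions introduced here.

```python
# Three baskets copied from the preview output (transactions 1, 3 and 4)
sample = [
    ["butter", "fish", "banana", "chicken"],
    ["tomato", "coffee", "chicken", "chocolate"],
    ["milk", "bread", "onions", "eggs", "tomato"],
]

# One column per distinct item, in alphabetical order
# (the same ordering that appears in the encoded output above)
columns = sorted({item for basket in sample for item in basket})

# One row per basket: True where the basket contains the column's item
matrix = [[item in basket for item in columns] for basket in sample]

print(columns)
for row in matrix:
    print(row)
```

Each row contains exactly as many True values as the basket has items, which is a quick sanity check on any one-hot encoding.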

3B.3 Apply Apriori Algorithm

Find frequent itemsets with min_support=0.2.

This means an itemset must appear in at least 20% of transactions (6 of the 30 generated here).

In [19]:
from mlxtend.frequent_patterns import apriori

# --- Step 7: Apply Apriori Algorithm ---
# df = one-hot encoded transactions (rows = transactions, columns = items, values = True/False)
# min_support = 0.2 → itemset must appear in at least 20% of all transactions

frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)

# --- Step 8: Display frequent itemsets ---
# Sort by support (highest first) to see the most common item combinations
print("\n--- Frequent Itemsets (Top 10 by Support) ---")
print(frequent_itemsets.sort_values(by="support", ascending=False).head(10))



--- Frequent Itemsets (Top 10 by Support) ---
     support     itemsets
13  0.400000       (milk)
14  0.366667     (onions)
8   0.366667     (coffee)
0   0.300000      (apple)
7   0.300000  (chocolate)
18  0.300000     (tomato)
10  0.300000       (eggs)
2   0.300000       (beer)
6   0.266667    (chicken)
5   0.266667     (cheese)
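Support is simply the fraction of transactions that contain every item in an itemset. The sketch below computes it directly for three baskets taken from the preview, then converts a few of the supports reported above back into transaction counts. The function name support and the variable names are assumptions for illustration. Apriori itself avoids this brute-force scan by pruning candidates.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(basket) for basket in baskets) / len(baskets)


# Transactions 1, 3 and 4 from the preview output
baskets = [
    ["butter", "fish", "banana", "chicken"],
    ["tomato", "coffee", "chicken", "chocolate"],
    ["milk", "bread", "onions", "eggs", "tomato"],
]

chicken_support = support({"chicken"}, baskets)          # in 2 of 3 baskets
pair_support = support({"chicken", "tomato"}, baskets)   # in 1 of 3 baskets
print(chicken_support, pair_support)

# Supports reported by apriori, converted back to counts out of 30
n_transactions = 30
reported = {"milk": 0.400000, "onions": 0.366667, "chicken": 0.266667}
counts = {item: round(s * n_transactions) for item, s in reported.items()}
print(counts)  # {'milk': 12, 'onions': 11, 'chicken': 8}
```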


3B.4 Generate Association Rules

Extract rules with a minimum confidence of 0.5 (metric="confidence", min_threshold=0.5).

Sort by lift (strength of association).

Display Top 5 rules.

In [20]:
from mlxtend.frequent_patterns import association_rules

# --- Step 9: Generate Association Rules ---
# metric="confidence": we filter rules based on confidence
# min_threshold=0.5: only keep rules with at least 50% confidence
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# --- Step 10: Sort rules by 'lift' ---
# Lift > 1 means the antecedent and consequent appear together more often 
# than expected if they were independent (good indicator of strong rule)
rules_sorted = rules.sort_values(by="lift", ascending=False)

# --- Step 11: Display top 5 rules ---
print("\n--- Top 5 Association Rules ---")
print(rules_sorted.head(5)[["antecedents", "consequents", "support", "confidence", "lift"]])



--- Top 5 Association Rules ---
  antecedents consequents  support  confidence      lift
0      (eggs)      (milk)      0.2    0.666667  1.666667
1      (milk)      (eggs)      0.2    0.500000  1.666667
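The confidence and lift values in this table can be checked by hand from the supports printed earlier: milk at 0.40, eggs at 0.30, and the pair at 0.20. The sketch below applies the standard definitions, confidence(A → B) = support(A, B) / support(A) and lift = support(A, B) / (support(A) × support(B)). The variable names are assumptions for illustration.

```python
# Supports taken from the frequent-itemset and rule outputs above
support_milk = 0.40
support_eggs = 0.30
support_both = 0.20  # eggs and milk together

# Confidence depends on the direction of the rule
confidence_eggs_to_milk = support_both / support_eggs
confidence_milk_to_eggs = support_both / support_milk

# Lift is the same in both directions
lift = support_both / (support_eggs * support_milk)

print(round(confidence_eggs_to_milk, 6))  # 0.666667
print(round(confidence_milk_to_eggs, 6))  # 0.5
print(round(lift, 6))                     # 1.666667
```

This also shows why the two rules share one lift value but have different confidences: only confidence divides by the antecedent's support alone.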


In [21]:
# Select the top rules by lift (at most 5) and save them to CSV
top_rules = rules_sorted.head(5)
top_rules.to_csv("top5_rules.csv", index=False)
print("\nTop 5 rules saved to 'top5_rules.csv'")


Top 5 rules saved to 'top5_rules.csv'


### Analysis of Association Rule Mining Results

In this experiment, we generated synthetic market basket data and applied the Apriori algorithm to identify frequent item co-occurrences. With min_support=0.2 and a minimum confidence of 0.5, only two rules survived, both built from the same item pair. The stronger one was **eggs → milk**, with a support of **0.20**, confidence of **0.67**, and lift of **1.67**. The support means that eggs and milk appear together in **20% of transactions** (6 of 30). The confidence means that when eggs are purchased, milk is also bought about 67% of the time. The lift of 1.67 means the pair co-occurs about 1.67 times as often as it would if the two items were independent. The reverse rule **milk → eggs** has the same support and lift, because both measures are symmetric, but a lower confidence of **0.50**: milk appears in more baskets (40%) than eggs (30%), so a smaller share of milk baskets also contain eggs.

These numbers illustrate how association rules are read, not a real purchasing pattern. The baskets were sampled uniformly at random, so any apparent association is sampling noise in a small dataset of 30 transactions. Because the random generator is not seeded, rerunning the notebook produces different baskets and different rules. The methodology is nevertheless directly applicable to real-world retail data, where supermarkets could use such rules for **product placement** (e.g., positioning eggs near milk), **targeted promotions**, or **cross-selling strategies**.
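Since the baskets come from random.sample, it is possible to work out what supports to expect when no real association exists. The sketch below assumes the generator from Step 2 exactly as written: basket size drawn uniformly from 3 to 8, and items drawn without replacement from a pool of 20. The variable names are assumptions for illustration.

```python
from fractions import Fraction

n_items = 20
sizes = range(3, 9)  # basket sizes 3..8, each equally likely

# P(a given item is in a basket of size k) = k / 20
p_item = sum(Fraction(k, n_items) for k in sizes) / len(sizes)

# P(two given items are both in a basket of size k) = k(k-1) / (20 * 19)
p_pair = sum(Fraction(k * (k - 1), n_items * (n_items - 1))
             for k in sizes) / len(sizes)

# Lift a pair would show on average under this generator
expected_lift = p_pair / (p_item ** 2)

print(float(p_item))         # 0.275
print(round(float(p_pair), 4))         # 0.0728
print(round(float(expected_lift), 4))  # 0.9627
```

Under these assumptions a pair is expected in only about 7% of baskets, well below the 0.2 support threshold, so any pair that clears the threshold in a 30-transaction sample does so through chance variation.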
