## 🛒 Market Basket Analysis with Apriori Algorithm

### 📋 Introduction

This notebook demonstrates how to perform **Market Basket Analysis** using the **Apriori Algorithm** on a real-world transactional dataset from a grocery store.

Each row in the dataset represents a **transaction**, and each transaction contains a list of items purchased together. The goal is to identify **frequent itemsets** and generate **association rules** that reveal interesting relationships between products.

For example:
> If a customer buys `Bread` and `Milk`, they might also buy `Butter`.

---

### 🎯 Objectives

- Transform raw transactional data into a format suitable for association rule mining.
- Use the Apriori algorithm to identify frequent itemsets.
- Extract meaningful association rules using metrics like **support**, **confidence**, and **lift**.
- Sort and filter rules based on their strength and significance.
- Understand how these insights can be used in retail for product bundling, cross-selling, and store layout optimization.
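
The three metrics named above can be computed directly from raw baskets. The sketch below is a minimal illustration on a made-up five-basket dataset (not the grocery data used in this notebook); the helper names `support`, `confidence`, and `lift` are hypothetical and exist only for this example.

```python
# Hypothetical toy baskets, used only to illustrate the metric definitions
baskets = [
    {"Bread", "Milk", "Butter"},
    {"Bread", "Milk"},
    {"Bread", "Butter"},
    {"Milk"},
    {"Bread", "Milk", "Butter"},
]

def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence divided by the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

# Rule under test: {Bread, Milk} -> {Butter}
rule = ({"Bread", "Milk"}, {"Butter"})
print(f"support    = {support(rule[0] | rule[1]):.2f}")
print(f"confidence = {confidence(*rule):.2f}")
print(f"lift       = {lift(*rule):.2f}")
```

A lift above 1.0 suggests the antecedent and consequent appear together more often than independence would predict; a lift below 1.0 suggests the opposite.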


In [1]:
# 📦 Step 1: Import libraries
import os 
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [2]:
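# 📄 Step 2: Load transactions

# Build the dataset path relative to the current working directory
base_dir = os.path.abspath(os.path.join(os.getcwd(), "..", "..", "..", "data", "02_Unsupervised_Learning", "03_Association_Rules"))
file_path = os.path.join(base_dir, "GroceryStoreDataSet.csv")
# Each row holds one transaction as a single comma-separated string
df = pd.read_csv(file_path, header=None)
# Split each string into a list of item names
transactions = df[0].apply(lambda x: x.split(',')).tolist()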
transactions

[['MILK', 'BREAD', 'BISCUIT'],
 ['BREAD', 'MILK', 'BISCUIT', 'CORNFLAKES'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['JAM', 'MAGGI', 'BREAD', 'MILK'],
 ['MAGGI', 'TEA', 'BISCUIT'],
 ['BREAD', 'TEA', 'BOURNVITA'],
 ['MAGGI', 'TEA', 'CORNFLAKES'],
 ['MAGGI', 'BREAD', 'TEA', 'BISCUIT'],
 ['JAM', 'MAGGI', 'BREAD', 'TEA'],
 ['BREAD', 'MILK'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'COCK', 'BISCUIT', 'CORNFLAKES'],
 ['COFFEE', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'COCK'],
 ['BREAD', 'SUGER', 'BISCUIT'],
 ['COFFEE', 'SUGER', 'CORNFLAKES'],
 ['BREAD', 'SUGER', 'BOURNVITA'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['BREAD', 'COFFEE', 'SUGER'],
 ['TEA', 'MILK', 'COFFEE', 'CORNFLAKES']]

In [3]:
# 🧼 Step 3: Transform data

# Convert transaction list into a one-hot encoded DataFrame
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_array, columns=te.columns_)

print("🔍 One-hot encoded data:")
print(df)

🔍 One-hot encoded data:
    BISCUIT  BOURNVITA  BREAD   COCK  COFFEE  CORNFLAKES    JAM  MAGGI   MILK  \
0      True      False   True  False   False       False  False  False   True   
1      True      False   True  False   False        True  False  False   True   
2     False       True   True  False   False       False  False  False  False   
3     False      False   True  False   False       False   True   True   True   
4      True      False  False  False   False       False  False   True  False   
5     False       True   True  False   False       False  False  False  False   
6     False      False  False  False   False        True  False   True  False   
7      True      False   True  False   False       False  False   True  False   
8     False      False   True  False   False       False   True   True  False   
9     False      False   True  False   False       False  False  False   True   
10     True      False  False   True    True        True  False  False  False   
11  
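
To make the encoding step less of a black box, here is a minimal plain-Python sketch of one-hot encoding. It assumes the same layout as the output above (one boolean column per item, columns in alphabetical order); `one_hot_encode` is a hypothetical helper, not part of mlxtend.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)
        # True where the basket contains the column's item
        rows.append([item in present for item in columns])
    return columns, rows

# Two transactions taken from the list printed in Step 2
sample = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "TEA", "BOURNVITA"],
]
columns, rows = one_hot_encode(sample)
print(columns)
for row in rows:
    print(row)
```

Passing the result to `pd.DataFrame(rows, columns=columns)` would give a frame with the same shape as the encoder's output for these two rows.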

In [17]:
# 📊 Step 4: Generate frequent itemsets
# ----------------------------

# Set a minimum support threshold (0.1 = itemset appears in at least 10% of transactions)
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)

print("\n📈 Frequent itemsets (support ≥ 0.1):")
print(frequent_itemsets)


📈 Frequent itemsets (support ≥ 0.1):
    support                             itemsets
0      0.35                            (BISCUIT)
1      0.20                          (BOURNVITA)
2      0.65                              (BREAD)
3      0.15                               (COCK)
4      0.40                             (COFFEE)
5      0.30                         (CORNFLAKES)
6      0.10                                (JAM)
7      0.25                              (MAGGI)
8      0.25                               (MILK)
9      0.30                              (SUGER)
10     0.35                                (TEA)
11     0.20                     (BISCUIT, BREAD)
12     0.10                      (COCK, BISCUIT)
13     0.10                    (COFFEE, BISCUIT)
14     0.15                (CORNFLAKES, BISCUIT)
15     0.10                     (MAGGI, BISCUIT)
16     0.10                      (MILK, BISCUIT)
17     0.10                       (TEA, BISCUIT)
18     0.15                   (
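
The level-wise search behind Apriori can be sketched in plain Python. This is a simplified illustration of the join-and-prune idea, not mlxtend's implementation; `apriori_sketch` is a hypothetical name. It runs on only the first five transactions from Step 2, with a higher threshold of 0.6 to keep the result small.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: keep the single items that meet the threshold
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune: drop candidates that have any infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent

first_five = [
    ["MILK", "BREAD", "BISCUIT"],
    ["BREAD", "MILK", "BISCUIT", "CORNFLAKES"],
    ["BREAD", "TEA", "BOURNVITA"],
    ["JAM", "MAGGI", "BREAD", "MILK"],
    ["MAGGI", "TEA", "BISCUIT"],
]
frequent = apriori_sketch(first_five, min_support=0.6)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before any counting happens.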

In [18]:
# 🔗 Step 5: Generate association rules

# Generate rules from frequent itemsets
# Use confidence ≥ 0.6 and lift ≥ 1.0
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
rules = rules[rules['lift'] >= 1.0]

print("\n📋 Association Rules (confidence ≥ 0.6 and lift ≥ 1.0):")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])


📋 Association Rules (confidence ≥ 0.6 and lift ≥ 1.0):
                      antecedents                    consequents  support  \
0                          (COCK)                      (BISCUIT)     0.10   
1                     (BOURNVITA)                        (BREAD)     0.15   
2                           (JAM)                        (BREAD)     0.10   
4                          (MILK)                        (BREAD)     0.20   
5                         (SUGER)                        (BREAD)     0.20   
6                          (COCK)                       (COFFEE)     0.15   
7                          (COCK)                   (CORNFLAKES)     0.10   
8                    (CORNFLAKES)                       (COFFEE)     0.20   
9                         (SUGER)                       (COFFEE)     0.20   
10                          (JAM)                        (MAGGI)     0.10   
11                        (MAGGI)                          (TEA)     0.20   
12                (M
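
Conceptually, rule generation splits each frequent itemset into an antecedent and a consequent and keeps the splits that meet the confidence threshold. The sketch below illustrates that idea under simplifying assumptions; it is not mlxtend's code, and `rules_from_itemsets` is a hypothetical helper. The support values are copied from the outputs above.

```python
from itertools import combinations

def rules_from_itemsets(supports, min_confidence):
    """Yield (antecedent, consequent, support, confidence, lift) tuples.

    `supports` maps frozenset itemsets to support values and is assumed to
    contain every subset of each multi-item itemset.
    """
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                consequent = itemset - antecedent
                conf = supp / supports[antecedent]
                if conf >= min_confidence:
                    yield antecedent, consequent, supp, conf, conf / supports[consequent]

# Support values taken from the frequent-itemset and rule outputs above
supports = {
    frozenset({"BREAD"}): 0.65,
    frozenset({"MILK"}): 0.25,
    frozenset({"BREAD", "MILK"}): 0.20,
}
found = list(rules_from_itemsets(supports, min_confidence=0.6))
for antecedent, consequent, supp, conf, lift in found:
    print(set(antecedent), "->", set(consequent), round(conf, 2), round(lift, 2))
```

Only `MILK -> BREAD` passes the 0.6 confidence threshold here. The reverse rule fails because BREAD appears in many baskets that contain no MILK.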

In [19]:
# 📊 Optional: Sort by lift
# ----------------------------

rules_sorted = rules.sort_values(by='lift', ascending=False)

print("\n🏆 Top rules by lift:")
print(rules_sorted[['antecedents', 'consequents', 'support', 'confidence', 'lift']])


🏆 Top rules by lift:
                      antecedents                    consequents  support  \
40             (COCK, CORNFLAKES)              (COFFEE, BISCUIT)     0.10   
44              (COFFEE, BISCUIT)             (COCK, CORNFLAKES)     0.10   
15              (COFFEE, BISCUIT)                         (COCK)     0.10   
16                         (COCK)              (COFFEE, BISCUIT)     0.10   
45                         (COCK)  (COFFEE, CORNFLAKES, BISCUIT)     0.10   
38  (COFFEE, CORNFLAKES, BISCUIT)                         (COCK)     0.10   
29                 (MAGGI, BREAD)                          (JAM)     0.10   
31                          (JAM)                 (MAGGI, BREAD)     0.10   
43                (COCK, BISCUIT)           (COFFEE, CORNFLAKES)     0.10   
20                         (COCK)          (CORNFLAKES, BISCUIT)     0.10   
42                 (COCK, COFFEE)          (CORNFLAKES, BISCUIT)     0.10   
41          (CORNFLAKES, BISCUIT)                 (COC

### ✅ Summary

- Prepared the transactional data by one-hot encoding it with `TransactionEncoder`.
- Applied the Apriori algorithm (minimum support 0.1) to extract frequent itemsets.
- Generated association rules with confidence ≥ 0.6 and lift ≥ 1.0.
- Sorted the rules by lift to surface the strongest patterns for decision-making (e.g., product bundling).
