<a href="https://colab.research.google.com/github/Tanu-N-Prabhu/Python/blob/master/Machine%20Learning%20Interview%20Prep%20Questions/Unsupervised%20Learning%20Algorithms/Association%20Rules/Apriori%20Algorithm/apriori_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Apriori Algorithm from Scratch

This notebook implements the **Apriori algorithm** from scratch using pure Python. It’s a classic algorithm used in **association rule mining**, commonly applied in **market basket analysis**, **recommendation systems**, and **retail analytics**.

We'll identify frequent itemsets and generate association rules like:
> If a customer buys bread and butter, they are likely to buy jam.

## Sample Dataset



In [2]:
# Sample transactions (each transaction is a list of items)
transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam'],
]

transactions

[['milk', 'bread', 'butter'],
 ['bread', 'butter'],
 ['milk', 'bread'],
 ['milk', 'bread', 'butter', 'jam'],
 ['bread', 'jam'],
 ['milk', 'bread', 'jam']]
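As a quick sanity check before running Apriori, the snippet below counts how many transactions contain each item. Dividing by the number of transactions gives the 1-itemset support values the algorithm should reproduce. It uses only the standard library (`collections.Counter`) and repeats the sample data so it runs on its own.

```python
from collections import Counter

transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam'],
]

# Count each item at most once per transaction (set() guards against duplicates)
item_counts = Counter(item for tx in transactions for item in set(tx))
n = len(transactions)
for item, count in sorted(item_counts.items()):
    print(f"{item}: {count}/{n} = {count / n:.2f}")
# bread: 6/6 = 1.00
# butter: 3/6 = 0.50
# jam: 3/6 = 0.50
# milk: 4/6 = 0.67
```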

## Apriori Core Functions

### Helper: Get Unique Items

In [3]:
def get_unique_items(transactions):
    # Collect every distinct item across all transactions, in sorted order.
    items = set()
    for tx in transactions:
        items.update(tx)
    return sorted(items)

### Generate Candidate Itemsets

In [4]:
from itertools import combinations

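def generate_candidates(prev_frequent, k):
    # Join step: merge every pair of frequent (k-1)-itemsets and keep unions of exactly k items.
    # No pruning step is applied here, so candidates with an infrequent (k-1)-subset
    # survive until calculate_support and filter_frequent discard them.
    candidates = set()
    prev_items = list(prev_frequent)
    for i in range(len(prev_items)):
        for j in range(i + 1, len(prev_items)):
            combo = sorted(set(prev_items[i]) | set(prev_items[j]))
            if len(combo) == k:
                candidates.add(tuple(combo))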
    return candidates
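`generate_candidates` implements only the join step of Apriori. The classic algorithm also prunes any size-k candidate that has an infrequent (k-1)-subset, since by the Apriori property such a candidate cannot be frequent. The sketch below adds that step as a hypothetical helper, `prune_candidates`, assuming candidates are sorted tuples as `generate_candidates` produces. On the sample data it removes all three 3-item candidates before any transaction is scanned, because each contains an infrequent pair such as `('butter', 'jam')` or `('jam', 'milk')`.

```python
from itertools import combinations


def prune_candidates(candidates, prev_frequent):
    """Keep only candidates whose every (k-1)-subset is frequent (Apriori property)."""
    prev = {tuple(sorted(s)) for s in prev_frequent}
    pruned = set()
    for cand in candidates:
        # cand is a sorted tuple, so combinations() yields sorted sub-tuples
        if all(sub in prev for sub in combinations(cand, len(cand) - 1)):
            pruned.add(cand)
    return pruned


# Frequent 2-itemsets from the sample data (support >= 0.4) and the joined 3-item candidates
frequent_pairs = [('bread', 'butter'), ('bread', 'jam'), ('bread', 'milk')]
candidates = {('bread', 'butter', 'jam'), ('bread', 'butter', 'milk'), ('bread', 'jam', 'milk')}
print(prune_candidates(candidates, frequent_pairs))  # set()
```

Pruning does not change the final result here; it only skips support counting for candidates that are guaranteed to fail.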

### Calculate Support for Itemsets


In [5]:
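def calculate_support(candidates, transactions):
    # Count, for each candidate itemset, how many transactions contain all of its items.
    tx_sets = [set(tx) for tx in transactions]  # convert each transaction once, not once per candidate
    support_count = {}
    for candidate in candidates:
        count = sum(1 for tx in tx_sets if set(candidate).issubset(tx))
        support_count[candidate] = count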
    return support_count
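`calculate_support` scans every transaction once per candidate. A common alternative, sketched below as a hypothetical helper `count_k_itemsets`, flips the loops: walk each transaction once, enumerate its k-item subsets, and increment the ones that are candidates. This tends to pay off when transactions are short and candidates are many; for long transactions the number of subsets, C(|tx|, k), can make it slower, so treat it as an option rather than a drop-in improvement. It assumes candidates are sorted tuples, as `generate_candidates` produces.

```python
from collections import Counter
from itertools import combinations


def count_k_itemsets(transactions, candidates, k):
    """Count candidate k-itemsets (sorted tuples) in a single pass over the transactions."""
    candidate_set = set(candidates)
    counts = Counter()
    for tx in transactions:
        # Every k-item subset of this transaction, as a sorted tuple
        for subset in combinations(sorted(set(tx)), k):
            if subset in candidate_set:
                counts[subset] += 1
    # Include candidates that never occur, with a count of 0
    return {c: counts[c] for c in sorted(candidate_set)}


transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam'],
]
print(count_k_itemsets(transactions, [('bread', 'milk'), ('butter', 'jam')], 2))
# {('bread', 'milk'): 4, ('butter', 'jam'): 1}
```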

### Filter Frequent Itemsets



In [6]:
def filter_frequent(support_count, min_support, n_transactions):
    # Convert raw counts to relative support and keep itemsets meeting the threshold.
    # n_transactions is passed in explicitly instead of reading the global `transactions`.
    return {itemset: count / n_transactions
            for itemset, count in support_count.items()
            if count / n_transactions >= min_support}

## Run Apriori Algorithm

In [7]:
def apriori(transactions, min_support=0.4):
    n = len(transactions)
    items = get_unique_items(transactions)
    frequent_itemsets = {}
    k = 1

    # Level 1: every single item is a candidate 1-itemset
    candidates = [(item,) for item in items]
    support_count = calculate_support(candidates, transactions)
    Lk = filter_frequent(support_count, min_support, n)
    frequent_itemsets.update(Lk)

    # Level k: join frequent (k-1)-itemsets, count, filter; stop when nothing survives
    while Lk:
        k += 1
        candidates = generate_candidates(Lk.keys(), k)
        support_count = calculate_support(candidates, transactions)
        Lk = filter_frequent(support_count, min_support, n)
        frequent_itemsets.update(Lk)

    return frequent_itemsets
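The `while Lk:` loop is a level-wise search: each pass builds size-k candidates from the previous level's survivors, and the search stops as soon as a level yields no frequent itemsets. The self-contained sketch below compresses that loop (it is not the notebook's code and does not reuse its helpers) only to print how many candidates and frequent itemsets each level produces on the sample data.

```python
from itertools import combinations

transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam'],
]
min_support = 0.4
n = len(transactions)
tx_sets = [set(tx) for tx in transactions]


def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for tx in tx_sets if set(itemset) <= tx) / n


# Level 1 candidates: every distinct item as a 1-tuple
level = [(item,) for item in sorted(set().union(*tx_sets))]
k = 1
trace = []  # (k, number of candidates, number of frequent itemsets)
while level:
    frequent = [c for c in level if support(c) >= min_support]
    trace.append((k, len(level), len(frequent)))
    k += 1
    # Join step: unions of two frequent itemsets that have exactly k items
    level = sorted({
        tuple(sorted(set(a) | set(b)))
        for a, b in combinations(frequent, 2)
        if len(set(a) | set(b)) == k
    })

for level_k, n_candidates, n_frequent in trace:
    print(f"k={level_k}: {n_candidates} candidates, {n_frequent} frequent")
# k=1: 4 candidates, 4 frequent
# k=2: 6 candidates, 3 frequent
# k=3: 3 candidates, 0 frequent
```

The search ends at k = 3 because every 3-item candidate contains at least one infrequent pair, so none reaches the 0.4 threshold.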

## Example Usage

In [8]:
frequent_itemsets = apriori(transactions, min_support=0.4)

print("Frequent Itemsets (Support ≥ 0.4):")
for itemset, support in frequent_itemsets.items():
    print(f"{itemset}: {support:.2f}")

Frequent Itemsets (Support ≥ 0.4):
('bread',): 1.00
('butter',): 0.50
('jam',): 0.50
('milk',): 0.67
('bread', 'jam'): 0.50
('bread', 'milk'): 0.67
('bread', 'butter'): 0.50
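Because this dataset has only four distinct items, the output can be cross-checked by brute force: enumerate all 2^4 - 1 = 15 non-empty itemsets and keep those meeting the threshold. This is a verification sketch only; exhaustive enumeration grows as 2^m in the number of distinct items m, which is the cost Apriori's level-wise search is designed to avoid.

```python
from itertools import combinations

transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam'],
]
min_support = 0.4
n = len(transactions)
tx_sets = [set(tx) for tx in transactions]
items = sorted(set().union(*tx_sets))

# Enumerate every non-empty itemset and keep the frequent ones
brute_force = {}
for size in range(1, len(items) + 1):
    for itemset in combinations(items, size):
        support = sum(1 for tx in tx_sets if set(itemset) <= tx) / n
        if support >= min_support:
            brute_force[itemset] = support

# Print by itemset size, then alphabetically
for itemset, support in sorted(brute_force.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(f"{itemset}: {support:.2f}")
```

Both approaches should report the same seven itemsets; only the print order differs, since the Apriori output above iterates over sets.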


## Association Rules

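Next, we extend the notebook to generate association rules, scored with two metrics in addition to support:


*   **Confidence**: how often the rule holds, i.e. the fraction of transactions containing the antecedent that also contain the consequent

*   **Lift**: how much more likely the consequent is when the antecedent is present, relative to the consequent's overall frequency (lift > 1 suggests a positive association, lift = 1 suggests independence)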


### Association Rules from Frequent Itemsets
We’ll generate rules like:

```
milk → bread   (support=0.67, confidence=1.00, lift=1.00)
```

### Definitions
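* Support `(A → B) = P(A ∪ B)`: the fraction of transactions containing every item of the combined itemset `A ∪ B` (both A and B, not "A or B")
* Confidence `(A → B) = P(A ∪ B) / P(A)`: an estimate of the probability that a transaction contains B given that it contains A
* Lift `(A → B) = Confidence(A → B) / P(B)`, equivalently `P(A ∪ B) / (P(A) · P(B))`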
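To make the definitions concrete, take milk and bread: milk and bread appear together in 4 of 6 transactions, milk in 4, and bread in all 6. So confidence(milk → bread) = (4/6) / (4/6) = 1.0 and lift = 1.0 / 1.0 = 1.0, while the reverse rule has confidence (4/6) / 1.0 ≈ 0.67 and the same lift. The sketch below computes these directly from the transactions; `supp` and `rule_metrics` are hypothetical helper names, not part of the notebook.

```python
transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam'],
]
n = len(transactions)


def supp(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for tx in transactions if set(itemset) <= set(tx)) / n


def rule_metrics(A, B):
    """Support, confidence and lift of the rule A → B, per the definitions above."""
    support_ab = supp(set(A) | set(B))
    confidence = support_ab / supp(A)
    lift = confidence / supp(B)
    return support_ab, confidence, lift


for A, B in [(['milk'], ['bread']), (['bread'], ['milk'])]:
    s, c, lift_value = rule_metrics(A, B)
    print(f"{A} → {B}: support={s:.2f}, confidence={c:.2f}, lift={lift_value:.2f}")
# ['milk'] → ['bread']: support=0.67, confidence=1.00, lift=1.00
# ['bread'] → ['milk']: support=0.67, confidence=0.67, lift=1.00
```

Confidence depends on the rule's direction, whereas lift is symmetric in A and B, as the equivalent form `P(A ∪ B) / (P(A) · P(B))` shows.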

### Add Rule Generator





In [9]:
from itertools import combinations

def generate_association_rules(frequent_itemsets, min_confidence=0.6, min_lift=1.0):
    rules = []
    itemsets = list(frequent_itemsets.keys())
    support_lookup = frequent_itemsets

    for itemset in itemsets:
        if len(itemset) < 2:
            continue  # Can't split 1-itemset into rule

        # All possible antecedents (A) and consequents (B)
        for i in range(1, len(itemset)):
            antecedents = combinations(itemset, i)
            for A in antecedents:
                A = tuple(sorted(A))
                B = tuple(sorted(set(itemset) - set(A)))

                support_A = support_lookup.get(A, 0)
                support_AB = support_lookup.get(itemset, 0)
                support_B = support_lookup.get(B, 0)

                if support_A == 0 or support_B == 0:
                    continue

                confidence = support_AB / support_A
                lift = confidence / support_B

                if confidence >= min_confidence and lift >= min_lift:
                    rules.append({
                        'antecedent': A,
                        'consequent': B,
                        'support': round(support_AB, 2),
                        'confidence': round(confidence, 2),
                        'lift': round(lift, 2)
                    })

    return rules
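The nested loop over `range(1, len(itemset))` and `combinations` enumerates every way to split a frequent itemset into a non-empty antecedent and a non-empty consequent, so a k-itemset yields 2^k - 2 candidate rules. The standalone sketch below isolates that enumeration in a hypothetical helper, `candidate_rules`. The itemset `('bread', 'jam', 'milk')` is used only to show the splits; it is not frequent in the sample data.

```python
from itertools import combinations


def candidate_rules(itemset):
    """List every (antecedent, consequent) split of an itemset, both sides non-empty."""
    itemset = tuple(sorted(itemset))
    splits = []
    for i in range(1, len(itemset)):
        for A in combinations(itemset, i):
            B = tuple(x for x in itemset if x not in A)
            splits.append((A, B))
    return splits


for A, B in candidate_rules(('bread', 'jam', 'milk')):
    print(', '.join(A), '→', ', '.join(B))
# bread → jam, milk
# jam → bread, milk
# milk → bread, jam
# bread, jam → milk
# bread, milk → jam
# jam, milk → bread
```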

## Example Usage: Association Rules

In [10]:
rules = generate_association_rules(frequent_itemsets, min_confidence=0.6, min_lift=1.0)

print("Association Rules:")
for r in rules:
    A = ', '.join(r['antecedent'])
    B = ', '.join(r['consequent'])
    print(f"{A} → {B}  (support={r['support']}, confidence={r['confidence']}, lift={r['lift']})")

Association Rules:
jam → bread  (support=0.5, confidence=1.0, lift=1.0)
bread → milk  (support=0.67, confidence=0.67, lift=1.0)
milk → bread  (support=0.67, confidence=1.0, lift=1.0)
butter → bread  (support=0.5, confidence=1.0, lift=1.0)
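Every rule above has lift 1.0, and this follows from the data rather than from the code: bread appears in all six transactions, so `P(bread) = 1`, and any rule pairing bread with an item X has lift `P(X ∪ bread) / (P(X) · 1) = P(X) / P(X) = 1`. These rules therefore show no association beyond chance on this toy dataset; they pass only because `min_lift=1.0` is inclusive. The check below recomputes those lifts directly (`supp` is a hypothetical helper).

```python
transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam'],
]
n = len(transactions)


def supp(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for tx in transactions if set(itemset) <= set(tx)) / n


print(supp({'bread'}))  # 1.0: bread is in all six transactions

# Lift of any rule pairing bread with another item: P(A ∪ B) / (P(A) * P(B))
lifts = {
    other: supp({'bread', other}) / (supp({'bread'}) * supp({other}))
    for other in ['butter', 'jam', 'milk']
}
print(lifts)  # {'butter': 1.0, 'jam': 1.0, 'milk': 1.0}
```

Consequently, calling `generate_association_rules` with a strict threshold such as `min_lift=1.01` should return no rules for this dataset.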
