
# ECLAT Algorithm from Scratch (with Python)

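**ECLAT** (Equivalence Class Clustering and bottom-up Lattice Traversal) is an algorithm used for:
- Frequent itemset mining
- Association rule learning, where it supplies the frequent itemsets from which rules are later generated

Unlike Apriori, which works on a horizontal data format (each transaction lists its items), ECLAT uses a **vertical data format**:
- Each item maps to the set of transaction IDs (TIDs) that contain it
- The support of an itemset is the size of the intersection of its items' TID sets, so frequent itemsets are found via **TID-set intersections**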
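
As a minimal illustration of the intersection idea, the sketch below assumes two items with hand-written TID sets and a database of six transactions. The names `tid_sets` and `total_transactions` and the values are illustrative assumptions, not part of the algorithm itself.

```python
# Illustrative TID sets (assumed values): item -> IDs of transactions containing it
tid_sets = {
    "milk": {0, 2, 3, 5},
    "butter": {0, 1, 3},
}
total_transactions = 6  # assumed size of the transaction database

# Transactions containing both items are exactly the TIDs shared by both sets
common_tids = tid_sets["milk"] & tid_sets["butter"]
support = len(common_tids) / total_transactions

print(sorted(common_tids))  # [0, 3]
print(round(support, 2))    # 0.33
```

No transaction is rescanned here: the support of `{milk, butter}` comes from the two TID sets alone.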

## Sample Transactions

In [1]:
transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam']
]

## Convert to Vertical Format

In [2]:
from collections import defaultdict

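def vertical_format(transactions):
    """Map each item to the set of transaction IDs (TIDs) that contain it."""
    vertical_db = defaultdict(set)
    # The TID is the transaction's 0-based position in the list
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical_db[item].add(tid)
    return vertical_db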
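
Applied to the sample transactions, the conversion can be checked with a short, self-contained sketch. The function and data are repeated from the cells above so the block runs on its own.

```python
from collections import defaultdict

transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam']
]

def vertical_format(transactions):
    # item -> set of 0-based transaction IDs containing that item
    vertical_db = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical_db[item].add(tid)
    return vertical_db

vertical_db = vertical_format(transactions)
for item in sorted(vertical_db):
    print(item, sorted(vertical_db[item]))
# bread [0, 1, 2, 3, 4, 5]
# butter [0, 1, 3]
# jam [3, 4, 5]
# milk [0, 2, 3, 5]
```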

## Recursive ECLAT Function

In [3]:
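def eclat(prefix, items, min_support, frequent_itemsets, total_transactions):
    """Depth-first search for frequent itemsets that extend `prefix`.

    `items` is a list of (item, tid_set) pairs and is consumed by this call;
    results are written into the `frequent_itemsets` dict.
    """
    while items:
        item, tids = items.pop()
        # Relative support: fraction of transactions containing prefix + [item]
        support = len(tids) / total_transactions

        if support >= min_support:
            new_itemset = prefix + [item]
            frequent_itemsets[tuple(new_itemset)] = round(support, 2)

            # Generate new suffixes by intersecting TID sets
            suffixes = []
            for other_item, other_tids in items:
                intersection = tids & other_tids
                # Only empty intersections are dropped here; the support
                # check happens when the suffix is popped in the recursive call
                if intersection:
                    suffixes.append((other_item, intersection))

            eclat(new_itemset, suffixes, min_support, frequent_itemsets, total_transactions)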
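
The function above passes along every non-empty intersection and filters by support one level deeper. A possible variant, sketched below under stated assumptions, filters candidates by an absolute support count before recursing and leaves the caller's list untouched. The name `eclat_pruned`, the `min_count` parameter, and the choice to store raw counts are hypothetical and not part of the original notebook.

```python
from collections import defaultdict

def eclat_pruned(prefix, items, min_count, results):
    """Variant sketch: keep only suffixes whose TID set meets min_count.

    Assumes every (item, tid_set) pair in `items` is already frequent.
    """
    for i, (item, tids) in enumerate(items):
        itemset = prefix + (item,)
        results[itemset] = len(tids)  # absolute support count

        suffixes = []
        for other_item, other_tids in items[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_count:  # prune infrequent extensions early
                suffixes.append((other_item, common))

        eclat_pruned(itemset, suffixes, min_count, results)

transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam']
]

tid_sets = defaultdict(set)
for tid, transaction in enumerate(transactions):
    for item in transaction:
        tid_sets[item].add(tid)

min_count = 3  # assumed threshold: smallest count reaching 0.4 of 6 transactions
frequent_items = sorted(
    (item, tids) for item, tids in tid_sets.items() if len(tids) >= min_count
)

results = {}
eclat_pruned((), frequent_items, min_count, results)
for itemset, count in results.items():
    print(itemset, count)
```

On the sample data this finds the same seven itemsets, with items inside each tuple in alphabetical order. Working with integer counts also avoids comparing floating-point supports against the threshold.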

## Run ECLAT

In [4]:
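vertical_db = vertical_format(transactions)
min_support = 0.4  # minimum fraction of transactions an itemset must appear in
frequent_itemsets = {}

# Sort for a reproducible order; eclat() pops from the end of the list,
# so items are processed in reverse alphabetical order
items = sorted(vertical_db.items())
eclat([], items, min_support, frequent_itemsets, len(transactions))

print(f"Frequent Itemsets (Support ≥ {min_support}):")
for itemset, support in frequent_itemsets.items():
    print(f"{itemset}: {support}")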

Frequent Itemsets (Support ≥ 0.4):
('milk',): 0.67
('milk', 'bread'): 0.67
('jam',): 0.5
('jam', 'bread'): 0.5
('butter',): 0.5
('butter', 'bread'): 0.5
('bread',): 1.0
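
One way to sanity-check the output above is a brute-force count over the horizontal data: enumerate every candidate itemset and scan all transactions for each one. The sketch below does this for the sample transactions. The helper name `brute_force_frequent` is hypothetical, and the approach is only practical for very small item sets because the number of candidates grows exponentially.

```python
from itertools import combinations

transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
    ['milk', 'bread', 'jam']
]

def brute_force_frequent(transactions, min_support):
    """Count every candidate itemset by scanning all transactions."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    all_items = sorted(set().union(*baskets))

    frequent = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            # One full scan of the transactions per candidate
            count = sum(1 for basket in baskets if basket.issuperset(candidate))
            if count / n >= min_support:
                frequent[candidate] = round(count / n, 2)
    return frequent

reference = brute_force_frequent(transactions, 0.4)
for itemset, support in reference.items():
    print(itemset, support)
```

The two approaches order items within a tuple differently, so the results are best compared as sets of items, for example by converting each key to a `frozenset`.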


## Summary
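* ECLAT suits datasets with many frequent itemsets, because extending an itemset costs one TID-set intersection
* It builds the vertical format in one pass, then uses TID-set intersections instead of scanning the transactions repeatedly
* The TID sets live in memory, so the core step maps directly onto fast built-in set operations (`&` in Python)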