# 6.3.2 Eclat Algorithm
## Introduction

The Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a popular method for frequent itemset mining in large transactional databases. Unlike the Apriori algorithm, which uses a breadth-first search strategy, Eclat uses a depth-first search strategy to discover frequent itemsets.

### Key Points:
- **Depth-First Search**: Eclat explores the itemset lattice depth-first, extending one itemset at a time before backtracking, so it does not need to hold every candidate of a given size in memory at once.
- **Intersection-based**: The algorithm computes the support of an itemset by intersecting the transaction ID (TID) lists of its items, rather than by rescanning the database.
- **Memory Usage**: Eclat keeps a TID list for every itemset on the current search path, so it can be more memory-intensive than Apriori when TID lists are long (dense data or low support thresholds).
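
To make the intersection idea concrete, here is a minimal sketch on a made-up toy database. The item names, the TIDs, and the helper `support` are illustrative assumptions, not part of the groceries notebook below.

```python
# Toy vertical database (hypothetical data): item -> set of TIDs containing it
tidlists = {
    "milk":  {0, 1, 3, 4},
    "bread": {0, 2, 3},
    "eggs":  {1, 3, 4},
}

def support(itemset, tidlists):
    """Absolute support: number of transactions that contain every item."""
    shared_tids = set.intersection(*(tidlists[item] for item in itemset))
    return len(shared_tids)

print(support(["milk", "bread"], tidlists))          # 2 (TIDs 0 and 3)
print(support(["milk", "bread", "eggs"], tidlists))  # 1 (TID 3)
```

The support of a larger itemset never exceeds the support of any of its subsets, which is what lets Eclat stop extending an itemset as soon as its TID list becomes too short.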

___
### Readings:
- [The Eclat algorithm](https://readmedium.com/en/https:/towardsdatascience.com/the-eclat-algorithm-8ae3276d2d17)
- [The Eclat Algorithm (pdf)](https://www.philippe-fournier-viger.com/COURSES/Pattern_mining/Eclat.pdf)
___

## Scenarios where the Eclat Algorithm is Beneficial

Eclat is particularly useful in scenarios where:
- **Large Datasets**: The dataset contains a large number of transactions, so computing support from TID-list intersections instead of rescanning the database for every candidate size pays off.
- **Sparse Data**: The data is sparse, meaning that there are many items but each item appears in only a small fraction of transactions, so TID lists stay short and intersections are cheap.
- **Frequent Itemsets Needed**: The goal is to find frequent itemsets without the level-by-level candidate generation and database scans that Apriori performs.
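
One rough way to judge whether a dataset is sparse is to measure how full its item-by-transaction matrix is. The sketch below is only an illustration: the helper `density` and the toy transactions are assumptions, and no particular threshold for "sparse" is implied.

```python
def density(transactions):
    """Fraction of (transaction, item) cells that are filled.

    Assumes a non-empty list of transactions containing at least one item.
    """
    items = {item for transaction in transactions for item in transaction}
    # Count each item once per transaction, even if it is listed twice
    occurrences = sum(len(set(transaction)) for transaction in transactions)
    return occurrences / (len(transactions) * len(items))

# Hypothetical toy data: 4 transactions over 4 distinct items
toy = [["milk", "bread"], ["eggs"], ["milk"], ["bread", "eggs", "jam"]]
print(density(toy))  # 7 / (4 * 4) = 0.4375
```

A value close to 0 means most items occur in few transactions, which keeps the TID lists short.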

## Methods for Implementing the Eclat Algorithm

The Eclat algorithm can be implemented using the following steps:

1. **Transform the Dataset**: Convert the transactional database into a vertical data format, where each item is associated with a list of transaction IDs.
2. **Depth-First Search**: Perform a depth-first search on the vertical data to find frequent itemsets by intersecting TID lists.
3. **Pruning**: Use support thresholds to prune infrequent itemsets and optimize the search process.

In [1]:
import pandas as pd
from collections import defaultdict

In [2]:
# Load the dataset
groceries = pd.read_csv("Groceries_dataset.csv")
print(groceries.shape)
print(groceries.head())

(38765, 3)
   Member_number        Date   itemDescription
0           1808  21-07-2015    tropical fruit
1           2552  05-01-2015        whole milk
2           2300  19-09-2015         pip fruit
3           1187  12-12-2015  other vegetables
4           3037  01-02-2015        whole milk


In [3]:
# Get all the transactions as a list of lists
# (one transaction = all items bought by one member on one date)
all_transactions = groceries.groupby(['Member_number', 'Date'])['itemDescription'].apply(list).tolist()

In [6]:
# Create a vertical data format
vertical_format = defaultdict(set)
for tid, transaction in enumerate(all_transactions):
    for item in transaction:
        vertical_format[item].add(tid)

In [7]:
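# Function to find frequent itemsets using Eclat
def eclat(prefix, items, min_support, vertical_data):
    """Recursively mine frequent itemsets that extend `prefix`.

    `vertical_data` maps each item to the TIDs of the transactions that
    contain the item together with every item in `prefix`. `min_support`
    is an absolute transaction count. Note that `items` is consumed.
    """
    itemsets = []
    while items:
        item = items.pop()
        support = len(vertical_data[item])
        # Skip infrequent items: none of their supersets can be frequent
        if support < min_support:
            continue
        new_prefix = prefix + [item]
        itemsets.append((new_prefix, support))

        # Keep only the remaining items that stay frequent together with new_prefix
        new_items = []
        new_vertical_data = {}
        for other_item in items:
            intersection = vertical_data[item] & vertical_data[other_item]
            if len(intersection) >= min_support:
                new_items.append(other_item)
                new_vertical_data[other_item] = intersection

        itemsets.extend(eclat(new_prefix, new_items, min_support, new_vertical_data))

    return itemsets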
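
Before running on the full dataset, it can help to check the recursion on a database small enough to verify by hand. The block below is a self-contained sketch, not the notebook's function: `to_vertical`, `eclat_sketch`, and the toy transactions are hypothetical names and data, and this variant iterates over `(item, tidset)` pairs instead of popping from a list, so it leaves its inputs unmodified.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Map each item to the set of TIDs of the transactions containing it."""
    vertical = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical[item].add(tid)
    return vertical

def eclat_sketch(prefix, candidates, min_support):
    """candidates: (item, tidset) pairs that are frequent together with prefix."""
    results = []
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + [item]
        results.append((itemset, len(tids)))
        # Extend only with later candidates so each itemset is produced once
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            shared = tids & other_tids
            if len(shared) >= min_support:
                extensions.append((other, shared))
        results.extend(eclat_sketch(itemset, extensions, min_support))
    return results

# Hypothetical toy database with five transactions
toy = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
]
toy_min_support = 2
vertical = to_vertical(toy)
# Drop infrequent single items up front and fix an order for the search
frequent_items = sorted(
    ((item, tids) for item, tids in vertical.items() if len(tids) >= toy_min_support),
    key=lambda pair: pair[0],
)
toy_result = eclat_sketch([], frequent_items, toy_min_support)
for itemset, count in toy_result:
    print(itemset, count)
```

On this toy data, `['bread', 'eggs']` occurs in only one transaction, so it is pruned and `['bread', 'eggs', 'milk']` is never examined.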

In [8]:
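# Set minimum support as an absolute number of transactions
min_support = 50

# Run Eclat algorithm (pass a fresh list of keys, since eclat() consumes it)
frequent_itemsets = eclat([], list(vertical_format.keys()), min_support, vertical_format)
# Keep only itemsets with at least two items
frequent_itemsets = [itemset for itemset in frequent_itemsets if len(itemset[0]) > 1]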

# Display frequent itemsets
print("Frequent Itemsets:")
for itemset in frequent_itemsets:
    print(itemset)

Frequent Itemsets:
(['pork', 'other vegetables'], 59)
(['pork', 'rolls/buns'], 51)
(['pork', 'whole milk'], 75)
(['fruit/vegetable juice', 'rolls/buns'], 56)
(['fruit/vegetable juice', 'whole milk'], 66)
(['brown bread', 'rolls/buns'], 50)
(['brown bread', 'whole milk'], 67)
(['citrus fruit', 'other vegetables'], 72)
(['citrus fruit', 'rolls/buns'], 70)
(['citrus fruit', 'soda'], 56)
(['citrus fruit', 'yogurt'], 69)
(['citrus fruit', 'whole milk'], 107)
(['coffee', 'whole milk'], 57)
(['newspapers', 'other vegetables'], 55)
(['newspapers', 'whole milk'], 84)
(['domestic eggs', 'other vegetables'], 53)
(['domestic eggs', 'rolls/buns'], 51)
(['domestic eggs', 'whole milk'], 79)
(['bottled beer', 'other vegetables'], 70)
(['bottled beer', 'rolls/buns'], 60)
(['bottled beer', 'yogurt'], 51)
(['bottled beer', 'whole milk'], 107)
(['bottled beer', 'sausage'], 50)
(['chicken', 'whole milk'], 51)
(['bottled water', 'root vegetables'], 52)
(['bottled water', 'tropical fruit'], 53)
(['bottled wa
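
Absolute counts such as those above are easier to compare across datasets when expressed as a fraction of all transactions. The following sketch shows one way to rank itemsets and add relative support. The helper `with_relative_support` is hypothetical, and `n_transactions=10_000` is a made-up placeholder, not the size of the groceries dataset; in the notebook one would pass `len(all_transactions)` instead.

```python
def with_relative_support(itemsets, n_transactions):
    """Sort (itemset, count) pairs by count, descending, and add count / n_transactions."""
    ranked = sorted(itemsets, key=lambda pair: pair[1], reverse=True)
    return [(items, count, count / n_transactions) for items, count in ranked]

# Two pairs copied from the printed output; the transaction total is a placeholder
sample = [(["pork", "whole milk"], 75), (["citrus fruit", "whole milk"], 107)]
ranked_sample = with_relative_support(sample, n_transactions=10_000)
for items, count, relative in ranked_sample:
    print(items, count, f"{relative:.4f}")
```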

## Conclusion

The Eclat algorithm is an efficient method for mining frequent itemsets in large and sparse transactional databases. By combining a depth-first search with intersections of transaction ID lists, Eclat computes the support of each itemset directly, without rescanning the database for every candidate size as Apriori does. This makes it useful for frequent pattern discovery in large datasets, such as market basket analysis and web usage mining. The Python implementation above applies Eclat to the groceries dataset and lists the itemsets of two or more items that appear in at least 50 transactions.
