# Eclat Algorithm - Association Rule Learning

## What is the Eclat Algorithm?

Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) is another popular algorithm for association rule learning, similar to Apriori but with a different approach to finding frequent itemsets.

## How Eclat Differs from Apriori:

### Apriori Algorithm:
- Uses a **breadth-first search** approach
- Generates candidate itemsets level by level
- Uses horizontal data layout (transactions as rows)
- Requires multiple database scans

### Eclat Algorithm:
- Uses a **depth-first search** approach
- Uses vertical data layout (each item maps to the set of IDs of the transactions that contain it)
- More memory efficient for sparse datasets
- Faster intersection operations using transaction ID sets
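To make the vertical layout concrete, here is a minimal sketch on a toy dataset. The basket contents and the names `toy_transactions` and `tidsets` are made up for illustration and are not part of the market basket data used later.

```python
# Toy horizontal layout: one list of items per transaction (hypothetical data).
toy_transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "bread"],
    ["milk"],
]

# Vertical layout: each item maps to the set of transaction IDs containing it.
tidsets = {}
for tid, basket in enumerate(toy_transactions):
    for item in basket:
        tidsets.setdefault(item, set()).add(tid)

# The support of {bread, milk} is the size of the intersection of the two
# transaction ID sets divided by the total number of transactions.
common = tidsets["bread"] & tidsets["milk"]
pair_support = len(common) / len(toy_transactions)
print(sorted(common), pair_support)
```

Once the `tidsets` dictionary exists, no further pass over the original transactions is needed to evaluate any item combination.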

## Key Advantages of Eclat:

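1. **Memory Efficiency**: Transaction ID sets stay small when the data is sparse, because each item appears in only a few transactions
2. **Faster Execution**: After the single pass that builds the vertical layout, support is computed by set intersection with no further database scans
3. **Simple Implementation**: Straightforward intersection-based approach
4. **Scalability**: Works well with large datasets as long as the transaction ID sets fit in memory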

## Business Application:

Just like Apriori, Eclat helps us discover:
- Which products are frequently bought together
- Customer purchasing patterns
- Cross-selling opportunities
- Optimal product placement strategies

**Note**: In this implementation, we'll actually use the apyori library which implements Apriori, but we'll focus on the support metric (which is Eclat's primary focus) rather than confidence and lift.

## Step 1: Importing the Required Libraries

Before we start implementing the Eclat algorithm, we need to import the necessary libraries:

- **numpy**: For numerical operations and array handling
- **matplotlib.pyplot**: For creating visualizations (though we won't use it extensively here)
- **pandas**: For data manipulation and creating structured DataFrames
- **apyori**: Library containing association rule learning algorithms

### Important Note:
While we're studying the Eclat algorithm conceptually, we'll use the apyori library which implements Apriori. However, we'll focus primarily on the **support metric** and frequent itemset discovery, which aligns with Eclat's main objective.

### Installing the Apyori Library

First, let's install the apyori library, which contains association rule mining algorithms:

In [0]:
!pip install apyori

### Importing Standard Libraries

Now let's import the essential data science libraries:

In [0]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Step 2: Data Preprocessing

In this step, we'll load and prepare our market basket data for the Eclat-style analysis.

### Understanding the Data Format

Our dataset contains market basket transactions where:
- Each row represents a customer's shopping basket
- Each column represents a potential product in that basket
- The data needs to be converted into a transaction format for analysis

### Eclat's Data Structure Preference

While the traditional Eclat algorithm prefers vertical data format (item → list of transaction IDs), we'll adapt our approach:
- Convert CSV data into transaction lists
- Focus on finding frequent itemsets based on support values
- Emphasize the support metric rather than confidence and lift

Let's load and examine our data:

In [0]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
transactions = []
for i in range(len(dataset)):
  # Shorter baskets are padded with NaN in the CSV, so keep only real products
  transactions.append([str(item) for item in dataset.iloc[i] if pd.notna(item)])

In [None]:
# Let's first examine the dataset structure
print("Loading market basket data...")
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header=None)
print(f"Dataset shape: {dataset.shape}")
print(f"Total transactions: {len(dataset)}")
print(f"Maximum items per transaction: {dataset.shape[1]}")

print("\nFirst few transactions:")
for i in range(3):
    items = [item for item in dataset.iloc[i] if pd.notna(item)]
    print(f"Transaction {i+1}: {items}")

print(f"\nSample of all columns in first transaction:")
print(dataset.iloc[0].tolist())

### Converting to Transaction Format

Now we need to convert our data into a format suitable for association rule mining. We'll create a list where each element represents a transaction (shopping basket):

In [None]:
# Convert data to transaction format (removing null values)
transactions = []
for i in range(0, len(dataset)):
    # Extract non-null items from each row
    transaction = [str(dataset.values[i,j]) for j in range(0, dataset.shape[1]) if str(dataset.values[i,j]) != 'nan']
    if transaction:  # Only add non-empty transactions
        transactions.append(transaction)

print(f"Successfully converted {len(transactions)} transactions")
print(f"Average items per transaction: {sum(len(t) for t in transactions)/len(transactions):.2f}")

print("\nSample converted transactions:")
for i in range(3):
    print(f"Transaction {i+1}: {transactions[i]}")

# Get some statistics about item frequency
from collections import Counter
all_items = [item for transaction in transactions for item in transaction]
item_counts = Counter(all_items)
print(f"\nTotal unique products: {len(item_counts)}")
print(f"Most frequent products:")
for item, count in item_counts.most_common(5):
    print(f"  - {item}: {count} transactions ({count/len(transactions)*100:.1f}%)")

## Step 3: Applying Eclat-Style Analysis

Now we'll apply association rule mining with a focus on the **support metric**, which is the primary concern of the Eclat algorithm.

### Understanding Support in Eclat Context

In the Eclat algorithm, **support** is the key metric:
- **Support** = (Number of transactions containing itemset) / (Total number of transactions)
- Eclat focuses on finding **frequent itemsets** based on minimum support threshold
- Unlike Apriori's breadth-first approach, Eclat uses depth-first search through itemset space
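The support definition above can be written directly as a small function. This is a sketch on hypothetical baskets; `itemset_support` and `baskets` are illustrative names, not part of apyori.

```python
def itemset_support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for basket in transactions if items.issubset(basket))
    return hits / len(transactions)


# Hypothetical baskets, only to show the calculation.
baskets = [
    ["pasta", "escalope"],
    ["pasta", "shrimp"],
    ["escalope"],
    ["pasta", "escalope", "honey"],
]

# Two of the four baskets contain both pasta and escalope.
print(itemset_support(["pasta", "escalope"], baskets))
```

This version scans every transaction for each query, which is the horizontal way of computing support. Eclat obtains the same number from transaction ID set intersections instead.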

### Parameter Selection for Eclat-Style Analysis

We'll use the following parameters:
- **min_support = 0.003**: Itemsets must appear in at least 0.3% of transactions (~23 transactions)
- We'll extract and rank results by support values rather than confidence or lift
- The call below still passes `min_confidence` and `min_lift` to apyori, so pairs whose rules fall under those thresholds are dropped before we rank by support
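As a quick check of the "~23 transactions" figure, the relative threshold can be converted into an absolute count. The sketch assumes the 7,501 transactions in this dataset; `min_count` is an illustrative name.

```python
import math

n_transactions = 7501   # number of baskets in the dataset
min_support = 0.003     # relative threshold used in this notebook

# Smallest whole number of transactions whose share reaches the threshold.
min_count = math.ceil(min_support * n_transactions)
print(min_count)
```

A pair seen in 22 baskets has support 22 / 7501, which is just under 0.003, so 23 is the first count that passes.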

### Running the Association Rule Mining

Let's apply the algorithm to find frequent itemsets:

In [0]:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

In [None]:
print("Applying association rule mining with Eclat-style focus on support...")
print(f"Analyzing {len(transactions)} transactions...")

# Apply the algorithm with focus on support (Eclat's primary metric)
rules = apriori(
    transactions=transactions, 
    min_support=0.003,      # Minimum support threshold (key Eclat parameter)
    min_confidence=0.2,     # Secondary parameter
    min_lift=3,             # Secondary parameter
    min_length=2,           # Focus on item pairs
    max_length=2            # Keep it simple with pairs
)

print("Analysis completed!")
print("Extracting frequent itemsets and their support values...")

## Step 4: Analyzing Frequent Itemsets (Eclat Focus)

In the Eclat algorithm, the primary goal is to discover **frequent itemsets** based on their support values. Let's extract and analyze these itemsets with a focus on support rather than rule strength.

### What We're Looking For:

- **Frequent itemsets**: Product combinations that appear together frequently
- **Support values**: How often these combinations occur in our dataset
- **Itemset patterns**: Which products have strong co-occurrence relationships

This aligns with Eclat's core objective of efficiently finding frequent patterns in transactional data.
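For comparison, frequent pairs can also be found from support alone, with no confidence or lift filter. The following is a rough sketch using only the standard library; `frequent_pairs` and the toy baskets are hypothetical and are not how apyori works internally.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return [(pair, support), ...] for item pairs meeting min_support."""
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    pairs = [(pair, c / n) for pair, c in counts.items() if c / n >= min_support]
    # Rank by support, highest first
    return sorted(pairs, key=lambda entry: entry[1], reverse=True)


# Hypothetical baskets for illustration.
toy_baskets = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]]
toy_result = frequent_pairs(toy_baskets, min_support=0.5)
print(toy_result)
```

Calling `frequent_pairs(transactions, 0.003)` on the real data would typically return many more pairs than the apyori call in this notebook, because nothing is removed for low confidence or lift.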

### Displaying the first results coming directly from the output of the apriori function

In [0]:
results = list(rules)

In [None]:
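# apriori() returns a one-shot generator, and the previous cell already consumed it.
# Calling list(rules) again would give an empty list, so reuse `results` from above.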
print(f"Found {len(results)} frequent itemsets/rules")

if len(results) > 0:
    print("\nSample result structure:")
    print("First itemset:", results[0])
    print("\nItems in first result:", results[0].items)
    print("Support of first result:", results[0].support)
else:
    print("No frequent itemsets found with current parameters.")
    print("Consider lowering the min_support threshold.")

In [0]:
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

In [None]:
# Display detailed information about frequent itemsets
if len(results) > 0:
    print("FREQUENT ITEMSETS ANALYSIS (Eclat Focus)")
    print("=" * 50)
    
    for i, result in enumerate(results[:10]):  # Show first 10 results
        items_list = list(result.items)
        print(f"\nItemset {i+1}:")
        print(f"  Products: {items_list}")
        print(f"  Support: {result.support:.4f} ({result.support*len(transactions):.0f} transactions)")
        print(f"  Frequency: {result.support*100:.2f}% of all transactions")
        
        if result.support >= 0.01:
            frequency_desc = "Very Frequent"
        elif result.support >= 0.005:
            frequency_desc = "Frequent"
        else:
            frequency_desc = "Moderately Frequent"
            
        print(f"  Classification: {frequency_desc}")
    
    print(f"\nSummary of all {len(results)} frequent itemsets:")
    support_values = [r.support for r in results]
    print(f"Average support: {np.mean(support_values):.4f}")
    print(f"Highest support: {max(support_values):.4f}")
    print(f"Lowest support: {min(support_values):.4f}")
    
else:
    print("No frequent itemsets to display.")

results

### Creating a Structured DataFrame for Eclat Results

Now let's organize our frequent itemsets into a clean DataFrame format, focusing on the support values which are the core output of the Eclat algorithm.

#### DataFrame Structure:
- **Product 1**: First item in the frequent itemset
- **Product 2**: Second item in the frequent itemset  
- **Support**: The frequency with which these items appear together (Eclat's primary metric)

This format makes it easy to identify the most frequent item combinations and their co-occurrence patterns.

In [0]:
def inspect(results):
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    return list(zip(lhs, rhs, supports))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Product 1', 'Product 2', 'Support'])

In [None]:
# Enhanced function to extract frequent itemsets with error handling
def inspect(results):
    """
    Extract itemset information focusing on support values (Eclat's main concern)
    Returns a list of (Product 1, Product 2, Support) tuples, one per itemset
    """
    if not results:
        return []
    
    product1 = []
    product2 = []
    supports = []
    
    for result in results:
        # Extract support value
        supports.append(result.support)
        
        # Extract the itemset (should be pairs for this analysis).
        # A frozenset has no fixed order, so sort to keep the columns stable between runs.
        items = sorted(result.items)
        if len(items) >= 2:
            product1.append(items[0])
            product2.append(items[1])
        else:
            # Handle single items by showing them in both columns
            product1.append(items[0] if items else "Unknown")
            product2.append("(single item)" if items else "Unknown")
    
    return list(zip(product1, product2, supports))

# Create the Eclat-focused DataFrame
if len(results) > 0:
    resultsinDataFrame = pd.DataFrame(
        inspect(results), 
        columns=['Product 1', 'Product 2', 'Support']
    )
    print(f"Created DataFrame with {len(resultsinDataFrame)} frequent itemsets")
    print("\nDataFrame preview:")
    print(resultsinDataFrame.head())
    
    # Add some statistics
    print(f"\nSupport Statistics:")
    print(f"Mean support: {resultsinDataFrame['Support'].mean():.4f}")
    print(f"Median support: {resultsinDataFrame['Support'].median():.4f}")
    print(f"Standard deviation: {resultsinDataFrame['Support'].std():.4f}")
    
else:
    print("No results to create DataFrame")
    resultsinDataFrame = pd.DataFrame(columns=['Product 1', 'Product 2', 'Support'])

### Top Frequent Itemsets by Support (Eclat's Primary Output)

In the Eclat algorithm, itemsets are typically ranked by their **support values** since support represents the frequency of co-occurrence, which is the algorithm's main focus.

#### Why Support Matters in Eclat:
- **High Support = High Frequency**: Itemsets with higher support appear together more often
- **Business Relevance**: Most frequent combinations represent the strongest purchasing patterns
- **Eclat's Goal**: Find the most frequent itemsets efficiently using vertical data representation

Let's examine the top itemsets ranked by support:

In [0]:
resultsinDataFrame.nlargest(n = 10, columns = 'Support')

|   | Product 1 | Product 2 | Support |
|---|---|---|---|
| 4 | herb & pepper | ground beef | 0.015998 |
| 7 | whole wheat pasta | olive oil | 0.007999 |
| 2 | pasta | escalope | 0.005866 |
| 1 | mushroom cream sauce | escalope | 0.005733 |
| 5 | tomato sauce | ground beef | 0.005333 |
| 8 | pasta | shrimp | 0.005066 |
| 0 | light cream | chicken | 0.004533 |
| 3 | fromage blanc | honey | 0.003333 |
| 6 | light cream | olive oil | 0.0032 |


In [None]:
# Display top frequent itemsets with detailed Eclat-focused analysis
if not resultsinDataFrame.empty:
    print("TOP 10 FREQUENT ITEMSETS BY SUPPORT (Eclat Focus)")
    print("=" * 55)
    
    # Sort by support (Eclat's primary ranking criterion)
    top_itemsets = resultsinDataFrame.nlargest(n=10, columns='Support')
    
    # Detailed analysis for each top itemset
    for idx, (index, itemset) in enumerate(top_itemsets.iterrows(), 1):
        print(f"\n{idx}. Itemset: ['{itemset['Product 1']}', '{itemset['Product 2']}']")
        print(f"   Support: {itemset['Support']:.4f}")
        print(f"   Frequency: {itemset['Support']*100:.2f}% of all transactions")
        print(f"   Absolute Count: ~{itemset['Support']*len(transactions):.0f} transactions")
        
        # Categorize support level
        if itemset['Support'] >= 0.01:
            category = "Very High Frequency"
            business_action = "Core product pairing - prioritize for bundling"
        elif itemset['Support'] >= 0.007:
            category = "High Frequency"
            business_action = "Strong candidate for cross-selling"
        elif itemset['Support'] >= 0.005:
            category = "Moderate Frequency"
            business_action = "Consider for promotional campaigns"
        else:
            category = "Low-Moderate Frequency"
            business_action = "Monitor for seasonal trends"
            
        print(f"   Category: {category}")
        print(f"   Business Action: {business_action}")
    
    print(f"\n" + "="*55)
    print("ECLAT ALGORITHM SUMMARY")
    print(f"Total frequent itemsets found: {len(resultsinDataFrame)}")
    print(f"Support range: {resultsinDataFrame['Support'].min():.4f} - {resultsinDataFrame['Support'].max():.4f}")
    print(f"Average support: {resultsinDataFrame['Support'].mean():.4f}")
    
    # Show the DataFrame
    print(f"\nTop 10 Itemsets (sorted by Support):")
    display(top_itemsets)
    
else:
    print("No frequent itemsets found to analyze.")
    print("Consider lowering the min_support threshold to find more itemsets.")

## Step 5: Eclat Algorithm Insights and Business Applications

### Key Differences: Eclat vs. Apriori

Now that we've completed our analysis, let's understand how the Eclat approach differs from Apriori:

#### Eclat Algorithm Characteristics:
1. **Primary Focus**: Finding frequent itemsets based on support values
2. **Data Structure**: Prefers vertical data format (item → transaction list)
3. **Search Strategy**: Depth-first search through itemset lattice
4. **Efficiency**: Better performance on sparse datasets
5. **Memory Usage**: More memory-efficient for certain dataset types

#### Apriori Algorithm Characteristics:
1. **Primary Focus**: Also finds frequent itemsets by support; in practice it is usually paired with a rule-generation step that adds confidence and lift
2. **Data Structure**: Uses horizontal data format (transaction → item list)
3. **Search Strategy**: Breadth-first search level by level
4. **Rule Emphasis**: Strong focus on rule quality metrics
5. **Comprehensive Output**: Support, confidence, and lift for decision making
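Confidence and lift are both derived from support values, which is why support is the common ground between the two algorithms. A minimal sketch of the standard definitions follows; `rule_metrics` and the input numbers are hypothetical.

```python
def rule_metrics(support_a, support_b, support_ab):
    """Confidence and lift of the rule A -> B, computed from supports only."""
    confidence = support_ab / support_a   # P(B | A)
    lift = confidence / support_b         # P(B | A) / P(B)
    return confidence, lift


# Hypothetical supports: A in 20% of baskets, B in 25%, both together in 10%.
confidence, lift = rule_metrics(0.20, 0.25, 0.10)
print(confidence, lift)
```

A lift above 1 means the two items appear together more often than they would if they were bought independently.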

### Business Applications of Eclat Results:

#### High Support Itemsets (Top Results):
- **Product Placement**: Place frequently bought together items near each other
- **Inventory Management**: Ensure adequate stock levels for item pairs
- **Store Layout**: Design store sections based on item co-occurrence patterns

#### Medium Support Itemsets:
- **Promotional Campaigns**: Create targeted promotions for moderate frequency pairs
- **Seasonal Analysis**: Monitor how support values change over time
- **Customer Segmentation**: Identify different purchasing patterns

### When to Use Eclat vs. Apriori:

#### Choose Eclat When:
- You primarily need frequent itemset discovery
- Working with sparse datasets
- Memory efficiency is important
- Support-based ranking is sufficient
- Quick itemset identification is the goal

#### Choose Apriori When:
- You need comprehensive rule analysis
- Confidence and lift metrics are important
- Building recommendation systems
- Need detailed rule interpretation
- Business requires rule strength assessment

### Technical Implementation Notes:

In this notebook, we used the apyori library (which implements Apriori) but focused on the support metric to simulate Eclat's approach. In a pure Eclat implementation:

1. **Data would be vertically formatted**: Each item maps to its transaction IDs
2. **Intersection operations**: Find common transaction IDs between items
3. **Support calculation**: Count intersections divided by total transactions
4. **Depth-first traversal**: Recursively explore itemset combinations

This approach demonstrates the conceptual differences while providing practical insights for business decision-making.
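The four steps of a pure Eclat implementation listed above can be sketched in plain Python. This is an illustrative version under simple assumptions (everything fits in memory, items are strings); `eclat` and the toy baskets are hypothetical, and the sketch is not the apyori implementation.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)

    # Step 1: vertical format, each item maps to its transaction IDs.
    tidsets = {}
    for tid, basket in enumerate(transactions):
        for item in set(basket):
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tids) pairs, where `tids` are the
        # transactions containing `prefix` plus that item.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            # Step 3: support is the tidset size over the transaction count.
            frequent[itemset] = len(tids) / n
            # Step 2: intersect with the tidsets of the remaining items.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) / n >= min_support:
                    next_candidates.append((other, shared))
            # Step 4: go deeper before moving on to the next sibling.
            if next_candidates:
                extend(itemset, next_candidates)

    singles = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) / n >= min_support),
        key=lambda entry: entry[0],
    )
    extend(frozenset(), singles)
    return frequent


# Hypothetical baskets for illustration.
toy_baskets = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
toy_frequent = eclat(toy_baskets, min_support=0.6)
for itemset, support in sorted(toy_frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Unlike the apyori call used in this notebook, this sketch returns itemsets of every size and applies no confidence or lift filter.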