In [None]:

# Eclat (Equivalence Class Clustering and Bottom-Up Lattice Traversal) is an algorithm used for frequent itemset mining in data mining 
# and association rule learning. It differs from the Apriori algorithm mainly in how it represents the data and how it searches for frequent itemsets.
# Here’s an overview of how Eclat works and its key characteristics:

# Methodology:
# Vertical Data Structure:

# Eclat uses a vertical data format (item -> tidset) rather than the horizontal format (transaction -> items) used by Apriori.
# In this format, each item is associated with its tidset: the set of IDs of the transactions (TIDs) in which it appears.
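# A minimal sketch of the vertical layout in plain Python, using hypothetical toy data (to_vertical and toy are
# illustrative names, not from any library): each item is mapped to the set of IDs of the transactions that contain it.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Hypothetical helper: convert a horizontal database (one item list per
    transaction) into a vertical one (item -> tidset)."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):  # the list position serves as the TID
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)

# Hypothetical toy data: four transactions with TIDs 0..3.
toy = [["bread", "milk"], ["bread", "butter"], ["milk", "butter", "bread"], ["milk"]]
vertical = to_vertical(toy)
print(vertical["bread"])   # {0, 1, 2}
print(vertical["milk"])    # {0, 2, 3}
print(vertical["butter"])  # {1, 2}
```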
# Depth-First Search (DFS):

# Eclat explores itemsets depth-first: it extends one itemset as far as possible before backtracking to the next.
# Unlike Apriori, it does not generate all candidates of one size level by level and rescan the database to count them.
# Instead, it recursively extends itemsets and obtains each new itemset's support from the intersection of tidsets.
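# A minimal recursive sketch of Eclat in plain Python, assuming the minimum support is given as an absolute
# transaction count (min_count; relative support is count / len(transactions)) and items are strings.
# eclat, extend and toy are hypothetical names. Each recursive call handles the itemsets that share one prefix,
# and every extension's support comes from a tidset intersection rather than from a rescan of the transactions.

```python
from collections import defaultdict

def eclat(transactions, min_count):
    """Minimal recursive Eclat sketch.

    Returns {frozenset(itemset): support count} for every itemset that
    occurs in at least `min_count` transactions.
    """
    # Vertical database: item -> tidset (IDs of the transactions containing it).
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, siblings):
        # `siblings` holds (item, tidset of prefix + item) pairs that are already frequent.
        for i, (item, tids) in enumerate(siblings):
            itemset = prefix + (item,)
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the later siblings to get the frequent extensions of `itemset`.
            children = []
            for other, other_tids in siblings[i + 1:]:
                joined = tids & other_tids
                if len(joined) >= min_count:
                    children.append((other, joined))
            # Depth-first: fully explore `itemset` before moving on to the next sibling.
            if children:
                extend(itemset, children)

    # Start from the frequent single items, in a fixed (sorted) order.
    singletons = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend((), singletons)
    return frequent

# Hypothetical toy data: four transactions with TIDs 0..3.
toy = [["bread", "milk"], ["bread", "butter"], ["milk", "butter", "bread"], ["milk"]]
result = eclat(toy, min_count=2)
print(len(result))                           # 5
print(result[frozenset({"bread", "milk"})])  # 2
```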
# Transaction ID Intersection:

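# The tidset of a combined itemset is the intersection of the tidsets of its parts, and its support is the size of that intersection
# (divided by the number of transactions for relative support).
# Eclat keeps combining itemsets whose intersection still meets the minimum support, forming larger and larger frequent itemsets.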
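# A small worked example of tidset intersection, assuming the hypothetical toy tidsets below (four transactions,
# TIDs 0..3). The tidset of the 3-itemset is obtained from the 2-itemset's tidset with one more intersection.

```python
# Hypothetical vertical database for four transactions (TIDs 0..3).
tidsets = {"bread": {0, 1, 2}, "milk": {0, 2, 3}, "butter": {1, 2}}
n_transactions = 4

# Tidset of a 2-itemset: intersect the tidsets of its two items.
bread_milk = tidsets["bread"] & tidsets["milk"]     # {0, 2}
# Tidset of a 3-itemset: reuse the 2-itemset's tidset and intersect once more,
# so the original transactions never have to be rescanned.
bread_milk_butter = bread_milk & tidsets["butter"]  # {2}

# Relative support = |tidset| / number of transactions.
print(len(bread_milk) / n_transactions)         # 0.5
print(len(bread_milk_butter) / n_transactions)  # 0.25
```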
# Lattice Traversal:

# Eclat traverses the lattice of itemsets bottom-up.
# It starts with the frequent itemsets of size 1 and recursively combines them into larger frequent itemsets
# until no more frequent itemsets can be found.
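# The "equivalence class" in Eclat's name refers to grouping itemsets by a shared prefix. A minimal sketch with
# hypothetical helper names and data: frequent k-itemsets (kept as sorted tuples) that share their first k-1 items
# form one class, and only members of the same class are joined into (k+1)-itemset candidates, whose support
# would then be checked by tidset intersection.

```python
from collections import defaultdict

def equivalence_classes(itemsets):
    """Group k-itemsets (sorted tuples) by their common (k-1)-item prefix."""
    classes = defaultdict(list)
    for itemset in sorted(itemsets):
        classes[itemset[:-1]].append(itemset)
    return dict(classes)

def join_class(members):
    """Candidate (k+1)-itemsets from one class: join each member with every later member.
    Members are sorted, so appending the later member's last item keeps each tuple sorted."""
    candidates = []
    for i in range(len(members)):
        for j in range(i + 1, len(members)):
            candidates.append(members[i] + (members[j][-1],))
    return candidates

# Hypothetical frequent 2-itemsets (items kept in sorted order inside each tuple).
frequent_pairs = [("bread", "butter"), ("bread", "milk"), ("bread", "jam"), ("butter", "milk")]
classes = equivalence_classes(frequent_pairs)
print(classes)
# {('bread',): [('bread', 'butter'), ('bread', 'jam'), ('bread', 'milk')], ('butter',): [('butter', 'milk')]}
print(join_class(classes[("bread",)]))
# [('bread', 'butter', 'jam'), ('bread', 'butter', 'milk'), ('bread', 'jam', 'milk')]
```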
# Advantages:
# Efficiency: Eclat is often faster than Apriori. It scans the transactions only once, to build the tidsets, after which
# support counting reduces to set intersections, whereas Apriori rescans the database for every itemset size.
# Its depth-first search can also reduce memory usage and computation time.

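# Memory Usage: Because the search is depth-first, Eclat only needs the tidsets along the current branch of the lattice
# instead of storing every candidate of a level at once, as Apriori does. Tidsets can still grow large for items that appear in many transactions.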

# Implementation Considerations:
# Complexity: Eclat can be slightly more complex to implement than Apriori because of its tidsets and recursive DFS approach.
# However, mature library implementations are available.

# Scalability: Eclat tends to scale well to larger datasets because of its efficient use of memory and computational resources.

# Use Cases:
# Eclat is commonly used in applications where the dataset is characterized by a large number of transactions (e.g., retail sales, web clickstream data) 
# and where memory efficiency and scalability are important considerations.
# In summary, Eclat is a powerful algorithm for frequent itemset mining that leverages a vertical data structure
# and a depth-first search strategy to efficiently discover frequent itemsets in large datasets.
# Its approach differs significantly from Apriori's, offering advantages in efficiency and scalability under certain conditions.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
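dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
# One list of items per transaction (row). Empty cells are read as NaN; skip them so they do not
# become a spurious 'nan' item, and use the actual row count instead of hard-coding 7501 x 20.
transactions = []
for i in range(dataset.shape[0]):
    transactions.append([str(item) for item in dataset.values[i] if pd.notna(item)])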

In [3]:
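from apyori import apriori
# apyori has no Eclat function: Apriori is run here and its pairs are later ranked by support only. min_confidence and min_lift still filter the rules, so some frequent pairs may be missing.
# apyori reads only min_support, min_confidence, min_lift and max_length, so min_length has no effect here.
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)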

In [4]:
results = list(rules)

In [5]:
results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

In [6]:
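def inspect(results):
    # Each RelationRecord has .items, .support and .ordered_statistics; each OrderedStatistic has
    # .items_base (left-hand side) and .items_add (right-hand side). With max_length = 2 each side holds one item.
    # Only the first ordered statistic of each record is used, which is enough to name the pair.
    lhs      = [tuple(result.ordered_statistics[0].items_base)[0] for result in results]
    rhs      = [tuple(result.ordered_statistics[0].items_add)[0] for result in results]
    supports = [result.support for result in results]
    return list(zip(lhs, rhs, supports))
# One row per rule: left-hand product, right-hand product and the support of the pair.
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Product 1', 'Product 2', 'Support'])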

In [7]:
resultsinDataFrame.nlargest(n = 10, columns = 'Support')

   Product 1             Product 2    Support
4  herb & pepper         ground beef  0.015998
7  whole wheat pasta     olive oil    0.007999
2  pasta                 escalope     0.005866
1  mushroom cream sauce  escalope     0.005733
5  tomato sauce          ground beef  0.005333
8  pasta                 shrimp       0.005066
0  light cream           chicken      0.004533
3  fromage blanc         honey        0.003333
6  light cream           olive oil    0.003200
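# apyori only implements Apriori, so the cells above approximate Eclat by keeping Apriori's rules and ranking them by support.
# A minimal alternative sketch (frequent_pairs_by_support is a hypothetical helper, not part of apyori) ranks item pairs by
# support computed directly from tidsets, Eclat-style but limited to pairs, with no confidence or lift filter.

```python
from collections import defaultdict
from itertools import combinations

def frequent_pairs_by_support(transactions, min_support):
    """Return (item_a, item_b, support) for every pair with support >= min_support,
    sorted by decreasing support. Support is computed from tidset intersections."""
    n = len(transactions)
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    # A pair can be no more frequent than either of its items, so only frequent items are paired.
    frequent_items = sorted(item for item, tids in tidsets.items() if len(tids) / n >= min_support)
    pairs = []
    for a, b in combinations(frequent_items, 2):
        support = len(tidsets[a] & tidsets[b]) / n
        if support >= min_support:
            pairs.append((a, b, support))
    # No confidence or lift filter is applied: pairs are ranked by support alone.
    return sorted(pairs, key=lambda row: row[2], reverse=True)

# Hypothetical toy data: four transactions.
toy = [["bread", "milk"], ["bread", "butter"], ["milk", "butter", "bread"], ["milk"]]
print(frequent_pairs_by_support(toy, 0.5))
# [('bread', 'butter', 0.5), ('bread', 'milk', 0.5)]
```

# On the notebook's data this would be called as frequent_pairs_by_support(transactions, 0.003), provided transactions
# contains no 'nan' placeholder items. Because no lift or confidence threshold is applied, it may return more pairs than the table above.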
