# What is Eclat?

Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) is an algorithm for frequent itemset mining. It uses depth-first search to discover the itemsets that occur frequently in a transactional dataset. Here is a step-by-step explanation of how Eclat works, along with a small worked example:

## Step 1: Data Preprocessing

First, we preprocess the transactional data, i.e., convert it into a format suitable for analysis. Eclat works on a vertical layout, in which each item is listed together with the transactions that contain it. We can represent this as a binary matrix where each row represents an item and each column represents a transaction. If an item appears in a transaction, we mark the cell with a '1'; otherwise, we mark it with a '0'.

Consider the following transactional dataset:

| Transaction | Items |
|---|---|
| T1 | A, B |
| T2 | A, C |
| T3 | A, C |
| T4 | A, B, C |
| T5 | B, C |

The binary matrix for the above transactional dataset would look like this:

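| Item | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|
| A | 1 | 1 | 1 | 1 | 0 |
| B | 1 | 0 | 0 | 1 | 1 |
| C | 0 | 1 | 1 | 1 | 1 |

## Step 2: Finding frequent itemsets

In this step, we find all the frequent itemsets in the dataset. The support of an itemset is the number of transactions that contain it, and an itemset is considered frequent if its support is at least a chosen minimum support threshold. Eclat traverses the itemset lattice depth-first and, rather than rescanning the transactions, computes the support of each candidate by intersecting the tidsets (sets of transaction IDs) of its items.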
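Eclat's depth-first search runs on the vertical layout shown in the matrix above, where each item carries its tidset (the set of IDs of the transactions that contain it). Below is a minimal Python sketch of that conversion for the five-transaction example. The names `toy_transactions` and `build_tidsets` are illustrative and are not part of any library.

```python
# Toy dataset from the table above (horizontal layout: transaction ID -> items)
toy_transactions = {
    "T1": {"A", "B"},
    "T2": {"A", "C"},
    "T3": {"A", "C"},
    "T4": {"A", "B", "C"},
    "T5": {"B", "C"},
}


def build_tidsets(transactions):
    """Convert to the vertical layout: item -> set of IDs of transactions containing it."""
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


tidsets = build_tidsets(toy_transactions)

# Print the same item-by-transaction binary matrix as the table above
tids = sorted(toy_transactions)
for item in sorted(tidsets):
    print(item, [1 if tid in tidsets[item] else 0 for tid in tids])
# A [1, 1, 1, 1, 0]
# B [1, 0, 0, 1, 1]
# C [0, 1, 1, 1, 1]
```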

For example, let's set the minimum support to 3. We start by finding all the frequent 1-itemsets (itemsets with a single item), i.e., the items that appear in at least 3 transactions. From the binary matrix, A appears in 4 transactions, B in 3, and C in 4, so all three items meet the threshold. Therefore, the frequent 1-itemsets are {A}, {B}, and {C}.
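Because every item already carries its tidset, the support of a 1-itemset is simply the size of that tidset, and no extra pass over the transactions is needed. Here is a small sketch that assumes an absolute minimum support of 3, with the tidsets copied from the matrix above:

```python
# Tidsets read off the binary matrix (item -> transactions containing it)
tidsets = {
    "A": {"T1", "T2", "T3", "T4"},
    "B": {"T1", "T4", "T5"},
    "C": {"T2", "T3", "T4", "T5"},
}
MIN_SUPPORT = 3  # absolute number of transactions

# The support of a single item is the size of its tidset
supports = {item: len(tids) for item, tids in tidsets.items()}
frequent_1 = sorted(item for item, count in supports.items() if count >= MIN_SUPPORT)

print(supports)    # {'A': 4, 'B': 3, 'C': 4}
print(frequent_1)  # ['A', 'B', 'C']
```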

Next, we generate candidate 2-itemsets by combining the frequent 1-itemsets: {A,B}, {A,C}, and {B,C}. Eclat counts the support of each candidate by intersecting the tidsets of its items instead of rescanning the transactions. For example, {A,B} appears only in T1 and T4 (T5 contains B and C but not A), so its support is 2. Likewise, {A,C} appears in T2, T3, and T4 (support 3), and {B,C} appears in T4 and T5 (support 2). We discard every 2-itemset that does not meet the minimum support threshold, which leaves {A,C} as the only frequent 2-itemset.
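Each candidate pair's support is the size of the intersection of its items' tidsets. For instance, t(A) ∩ t(B) = {T1, T4}. The sketch below evaluates all three candidate pairs this way, assuming the same absolute minimum support of 3 and tidsets taken from the matrix:

```python
from itertools import combinations

# Tidsets read off the binary matrix (item -> transactions containing it)
tidsets = {
    "A": {"T1", "T2", "T3", "T4"},
    "B": {"T1", "T4", "T5"},
    "C": {"T2", "T3", "T4", "T5"},
}
MIN_SUPPORT = 3  # absolute number of transactions

# Candidate 2-itemsets: each pair's tidset is the intersection of its items' tidsets
pair_tidsets = {
    (a, b): tidsets[a] & tidsets[b] for a, b in combinations(sorted(tidsets), 2)
}
for pair, tids in pair_tidsets.items():
    print(pair, sorted(tids), len(tids))
# ('A', 'B') ['T1', 'T4'] 2
# ('A', 'C') ['T2', 'T3', 'T4'] 3
# ('B', 'C') ['T4', 'T5'] 2

frequent_2 = [pair for pair, tids in pair_tidsets.items() if len(tids) >= MIN_SUPPORT]
print(frequent_2)  # [('A', 'C')]
```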

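We continue generating higher-order itemsets until no new frequent itemsets can be found. Eclat extends an itemset only with items from its own equivalence class (the itemsets that share the same prefix), and a 3-itemset can be frequent only if all of its 2-item subsets are frequent. Because {A,C} is the only frequent 2-itemset, no frequent 3-itemset exists; in fact, {A,B,C} appears only in T4. With a minimum support of 3, the frequent itemsets are therefore {A}, {B}, {C}, and {A,C}.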
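The whole procedure fits in a single recursive function. Below is a minimal sketch of the depth-first traversal. It assumes tidsets stored as Python sets and an absolute minimum support, and the function `eclat` and its argument layout are illustrative. Each call handles one equivalence class, meaning all itemsets that share the current prefix.

```python
def eclat(prefix, candidates, min_support, results):
    """Depth-first Eclat over one equivalence class.

    prefix:     tuple of items shared by every itemset in this class
    candidates: list of (item, tidset) pairs; each tidset is that of prefix + (item,)
    results:    dict filled with itemset tuple -> absolute support
    """
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_support:
            continue  # infrequent: no superset can be frequent either
        itemset = prefix + (item,)
        results[itemset] = len(tids)
        # New equivalence class: extend `itemset` with each later candidate,
        # computing the extension's support by tidset intersection
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            joined = tids & other_tids
            if len(joined) >= min_support:
                extensions.append((other, joined))
        if extensions:
            eclat(itemset, extensions, min_support, results)
    return results


# Tidsets read off the binary matrix (item -> transactions containing it)
tidsets = {
    "A": {"T1", "T2", "T3", "T4"},
    "B": {"T1", "T4", "T5"},
    "C": {"T2", "T3", "T4", "T5"},
}
frequent = eclat((), sorted(tidsets.items()), 3, {})
print(frequent)  # {('A',): 4, ('A', 'C'): 3, ('B',): 3, ('C',): 4}
```

Lowering the threshold to 1 (`eclat((), sorted(tidsets.items()), 1, {})`) also reports ('A', 'B', 'C') with support 1, because only T4 contains all three items.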

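## Step 3: Rule Generation

In this step, we generate association rules from the frequent itemsets. An association rule is an implication of the form X -> Y, where X and Y are disjoint itemsets. The rule indicates that if a transaction contains X, it is likely to contain Y as well. Its strength is usually measured by its confidence, support(X ∪ Y) / support(X), which is the fraction of transactions containing X that also contain Y.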

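For example, from the frequent itemset {A,C} we can generate the rules A -> C and C -> A. Both have a confidence of 3/4 = 0.75, because {A,C} appears in 3 transactions while A and C each appear in 4.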
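Rule generation needs only the supports collected during mining, because every subset of a frequent itemset is itself frequent and has already been counted. The following minimal sketch covers the toy example. It assumes supports counted directly from the five transactions ({A}: 4, {B}: 3, {C}: 4, {A, C}: 3) and an illustrative confidence threshold of 0.5. Lift is computed as confidence divided by the relative support of Y.

```python
from itertools import combinations

# Absolute supports of the frequent itemsets, counted from the five transactions
supports = {
    frozenset({"A"}): 4,
    frozenset({"B"}): 3,
    frozenset({"C"}): 4,
    frozenset({"A", "C"}): 3,
}
N_TRANSACTIONS = 5
MIN_CONFIDENCE = 0.5  # illustrative threshold

rules = []
for itemset, support in supports.items():
    if len(itemset) < 2:
        continue
    # Every non-empty proper subset X of the itemset can be the antecedent of X -> Y
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            x = frozenset(antecedent)
            y = itemset - x
            confidence = support / supports[x]                  # support(X ∪ Y) / support(X)
            lift = confidence / (supports[y] / N_TRANSACTIONS)  # confidence / relative support(Y)
            if confidence >= MIN_CONFIDENCE:
                rules.append((sorted(x), sorted(y), confidence, lift))

for x, y, confidence, lift in rules:
    print(x, "->", y, f"confidence={confidence:.2f}", f"lift={lift:.4f}")
# ['A'] -> ['C'] confidence=0.75 lift=0.9375
# ['C'] -> ['A'] confidence=0.75 lift=0.9375
```

A lift below 1, as here, means A and C occur together slightly less often than they would if they were independent. A filter such as `min_lift = 3` would therefore discard both rules.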

# Data Preprocessing:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

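```python
# Load the dataset: one row per transaction, one column per purchased item (no header row)
data = pd.read_csv("C:/Users/thiru/Downloads/Resume/Machine Learning A-Z (Codes and Datasets)/Part 5 - Association Rule Learning/Section 29 - Eclat/Python/Market_Basket_Optimisation.csv", header = None)

# Turn each row into a list of item names, skipping the empty (NaN) cells that pad
# shorter transactions; str(nan) would otherwise add a spurious 'nan' item
transactions = [[str(item) for item in row if pd.notna(item)] for row in data.values]
```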

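# Training the Eclat Model on the whole dataset:

The `apyori` package used below implements Apriori, not Eclat. The notebook approximates Eclat by restricting Apriori to pairs of items (`max_length = 2`) and then ranking those pairs by support alone. Note that the `min_confidence` and `min_lift` thresholds still filter which pairs are returned.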

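```python
from apyori import apriori

# Mine item pairs with support >= 0.3% of transactions, confidence >= 0.2 and lift >= 3.
# apyori's documented keyword arguments are min_support, min_confidence, min_lift and
# max_length, so min_length has no effect here.
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
```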
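Because `apyori` does not provide an Eclat implementation, the frequent pairs can also be computed directly with tidset intersections. Here is a minimal sketch that assumes `transactions` is a list of item lists and `min_support` is a fraction of all transactions, as in the `apriori` call. The function name `eclat_pairs` is hypothetical. Unlike the call above, it applies no confidence or lift filter, so it returns every frequent pair.

```python
from itertools import combinations


def eclat_pairs(transactions, min_support):
    """Return [(item_a, item_b, support)] for all frequent 2-itemsets, highest support first."""
    n = len(transactions)
    min_count = min_support * n
    # Vertical layout: item -> set of indices of transactions containing it
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets.setdefault(item, set()).add(tid)
    # Only frequent items can be part of a frequent pair
    frequent = sorted(item for item, tids in tidsets.items() if len(tids) >= min_count)
    pairs = []
    for a, b in combinations(frequent, 2):
        count = len(tidsets[a] & tidsets[b])  # support count via tidset intersection
        if count >= min_count:
            pairs.append((a, b, count / n))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Small illustrative input (the toy dataset, not the Market_Basket_Optimisation data)
demo = [["A", "B"], ["A", "C"], ["A", "C"], ["A", "B", "C"], ["B", "C"]]
print(eclat_pairs(demo, min_support=0.6))  # [('A', 'C', 0.6)]
```

On the notebook's data, the call would be `eclat_pairs(transactions, min_support = 0.003)`.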

```python
# apriori returns a generator; materialise it into a list of RelationRecords
results = list(rules)
```

```python
results
```

```text
[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0
```

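```python
def inspect(results):
    """Flatten apyori RelationRecords into (product 1, product 2, support) tuples."""
    # For each pair, the first ordered statistic holds the rule items_base -> items_add
    lhs = [tuple(result.ordered_statistics[0].items_base)[0] for result in results]
    rhs = [tuple(result.ordered_statistics[0].items_add)[0] for result in results]
    supports = [result.support for result in results]
    return list(zip(lhs, rhs, supports))

resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Product 1', 'Product 2', 'Support'])
```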

# Displaying the Results Sorted by Descending Support

```python
resultsinDataFrame.nlargest(n = 10, columns = 'Support')
```

| | Product 1 | Product 2 | Support |
|---|---|---|---|
| 4 | herb & pepper | ground beef | 0.015998 |
| 7 | whole wheat pasta | olive oil | 0.007999 |
| 2 | pasta | escalope | 0.005866 |
| 1 | mushroom cream sauce | escalope | 0.005733 |
| 5 | tomato sauce | ground beef | 0.005333 |
| 8 | pasta | shrimp | 0.005066 |
| 0 | light cream | chicken | 0.004533 |
| 3 | fromage blanc | honey | 0.003333 |
| 6 | light cream | olive oil | 0.0032 |
