# ECLAT in Python

ECLAT stands for **Equivalence Class Clustering and Bottom-Up Lattice Traversal**. It is an algorithm for association rule mining, a family of techniques that also covers frequent itemset mining.

Association rule mining and frequent itemset mining are easiest to understand through their application to basket analysis, where the goal is to find out which products shoppers often buy together.
These association rules can then be used, for example, in recommender engines (for online shopping) or to improve the layout of physical stores (for offline shopping).
ECLAT works on a vertical data layout, where each product is mapped to the set of transactions that contain it, and it explores itemsets depth-first, much like a **depth-first search of a graph**.
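To make the vertical layout concrete, here is a minimal sketch in plain Python that turns a few baskets stored in the usual horizontal layout (one list of products per transaction) into a mapping from each product to its set of transaction IDs. The helper name `to_vertical` is hypothetical and is not part of pyECLAT; transaction IDs are simply the row positions.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Map each item to the set of transaction IDs (row positions) containing it."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


# Horizontal layout: one list of products per transaction
baskets = [
    ['beer', 'wine', 'cheese'],
    ['beer', 'potato chips'],
    ['wine', 'cheese'],
]

vertical = to_vertical(baskets)
print(vertical['beer'])    # {0, 1}
print(vertical['cheese'])  # {0, 2}
```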

## How does the ECLAT algorithm work?
* Step 1: List the Transaction ID (TID) set of each product.
* Step 2: Keep only the products whose TID set reaches the minimum support.
* Step 3: Compute the TID set of each product pair. What sets ECLAT apart from the Apriori algorithm is that this is done by intersecting the TID sets of the two products.
* Step 4: Filter out the pairs that do not reach the minimum support.
* Step 5: Keep extending the surviving itemsets (triples, quadruples, and so on) in the same way for as long as new combinations stay above the minimum support.
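The five steps above can be sketched in plain Python. This is a minimal illustration written for this walkthrough, not pyECLAT's implementation: the function name `eclat` and its `min_count` parameter are hypothetical, support is given as an absolute number of transactions, and items are sorted alphabetically only to make the depth-first traversal order deterministic.

```python
from collections import defaultdict


def eclat(transactions, min_count):
    """Minimal ECLAT sketch returning {frozenset(itemset): support count}."""
    # Step 1: vertical layout, mapping each item to its set of transaction IDs
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    # Step 2: keep single items that reach the minimum support count,
    # sorted by name so the traversal order is deterministic
    frequent_items = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )

    results = {}

    def extend(prefix, candidates):
        # Depth-first: record each candidate, then try to grow it further
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            results[itemset] = len(tids)
            # Steps 3-4: intersect TID sets with the items that come later,
            # keeping only extensions that still reach the minimum support
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    suffix.append((other, common))
            # Step 5: recurse while new combinations stay above the threshold
            if suffix:
                extend(itemset, suffix)

    extend(frozenset(), frequent_items)
    return results


baskets = [
    ['beer', 'wine', 'cheese'],
    ['beer', 'potato chips'],
    ['wine', 'cheese'],
    ['beer', 'potato chips'],
]

frequent = eclat(baskets, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With `min_count=2`, this keeps the four single products plus the pairs `{beer, potato chips}` and `{cheese, wine}`; a pair such as `{beer, wine}` is pruned because the two TID sets intersect in only one transaction.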

ECLAT is often faster than Apriori because intersecting sets of transaction IDs is much simpler than scanning every individual transaction for the presence of each candidate pair of products, as Apriori does.
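The difference in how the two approaches count a pair can be illustrated on a toy example. Both helpers below are hypothetical and compute the same support count: `count_by_scan` walks through every transaction, roughly the way Apriori checks a candidate, while `count_by_intersection` intersects two precomputed TID sets, as ECLAT does. The intersection only has to look at the transaction IDs already stored for the two products, not at every transaction.

```python
from collections import defaultdict

baskets = [
    ['beer', 'wine', 'cheese'],
    ['beer', 'potato chips'],
    ['wine', 'cheese'],
    ['beer', 'potato chips'],
]


def count_by_scan(transactions, a, b):
    """Apriori-style: scan every transaction for both items."""
    return sum(1 for items in transactions if a in items and b in items)


# ECLAT-style: build the TID sets once...
tidsets = defaultdict(set)
for tid, items in enumerate(baskets):
    for item in items:
        tidsets[item].add(tid)


def count_by_intersection(tidsets, a, b):
    """...then count a pair by intersecting two TID sets."""
    return len(tidsets[a] & tidsets[b])


print(count_by_scan(baskets, 'beer', 'potato chips'))          # 2
print(count_by_intersection(tidsets, 'beer', 'potato chips'))  # 2
```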

```python
!pip install pyECLAT
```

```
Collecting pyECLAT
  Downloading pyECLAT-1.0.2-py3-none-any.whl (6.3 kB)
Installing collected packages: pyECLAT
Successfully installed pyECLAT-1.0.2
```


```python
import pandas as pd
```

```python
# each transaction is a list of product names (strings)
transactions = [
    ['beer', 'wine', 'cheese'],
    ['beer', 'potato chips'],
    ['eggs', 'flower', 'butter', 'cheese'],
    ['eggs', 'flower', 'butter', 'beer', 'potato chips'],
    ['wine', 'cheese'],
    ['potato chips'],
    ['eggs', 'flower', 'butter', 'wine', 'cheese'],
    ['eggs', 'flower', 'butter', 'beer', 'potato chips'],
    ['wine', 'beer'],
    ['beer', 'potato chips'],
    ['butter', 'eggs'],
    ['beer', 'potato chips'],
    ['flower', 'eggs'],
    ['beer', 'potato chips'],
    ['eggs', 'flower', 'butter', 'wine', 'cheese'],
    ['beer', 'wine', 'potato chips', 'cheese'],
    ['wine', 'cheese'],
    ['beer', 'potato chips'],
    ['wine', 'cheese'],
    ['beer', 'potato chips']
]
```

```python
# convert the list of transactions into a DataFrame: one row per transaction,
# with shorter transactions padded with missing values (None)
data = pd.DataFrame(transactions)
data
```

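First 10 rows of `data` (empty cells are missing values):

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | beer | wine | cheese | | |
| 1 | beer | potato chips | | | |
| 2 | eggs | flower | butter | cheese | |
| 3 | eggs | flower | butter | beer | potato chips |
| 4 | wine | cheese | | | |
| 5 | potato chips | | | | |
| 6 | eggs | flower | butter | wine | cheese |
| 7 | eggs | flower | butter | beer | potato chips |
| 8 | wine | beer | | | |
| 9 | beer | potato chips | | | |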


## Setting the parameters and running the algorithm
* **min_support:** the support threshold. Unlike in the step-by-step description, pyECLAT expects it as a fraction of all transactions (between 0 and 1) rather than as an absolute number of transactions.
* **min_n_products:** the smallest itemset size we are interested in (passed to `fit` as `min_combination`). We are looking for product associations, so we leave out individual (1-item) itemsets: the minimum size is 2.
* **max_length:** the largest itemset size (passed to `fit` as `max_combination`). We are interested in large product associations as well, so we set it to the size of the longest transaction.

```python
# we are looking for itemSETS,
# so we do not want any individual products returned
min_n_products = 2

# we want a minimum support of 7 transactions,
# but pyECLAT expects it as a fraction of all transactions (7/20 = 0.35)
min_support = 7 / len(transactions)

# we put no limit on the size of the itemsets,
# so we set the maximum to the length of the longest transaction
max_length = max(len(x) for x in transactions)
```

```python
from pyECLAT import ECLAT

# create an ECLAT instance on the transaction DataFrame
my_eclat = ECLAT(data=data, verbose=True)

# fit the algorithm with the parameters defined above
rule_indices, rule_supports = my_eclat.fit(min_support=min_support,
                                           min_combination=min_n_products,
                                           max_combination=max_length)
```


```python
print(rule_supports)
```

```
{'potato chips & beer': 0.45, 'cheese & wine': 0.35}
```
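As a sanity check, the two supports reported by pyECLAT can be reproduced by brute force. The sketch below does not use ECLAT at all: it simply counts every product pair in every transaction with `itertools.combinations` (plain counting, chosen for transparency rather than speed) and keeps the pairs whose share of transactions reaches `min_support`. It repeats the notebook's `transactions` list so that it runs on its own, and it labels pairs as alphabetically sorted tuples rather than pyECLAT's `'a & b'` strings.

```python
from itertools import combinations

transactions = [
    ['beer', 'wine', 'cheese'],
    ['beer', 'potato chips'],
    ['eggs', 'flower', 'butter', 'cheese'],
    ['eggs', 'flower', 'butter', 'beer', 'potato chips'],
    ['wine', 'cheese'],
    ['potato chips'],
    ['eggs', 'flower', 'butter', 'wine', 'cheese'],
    ['eggs', 'flower', 'butter', 'beer', 'potato chips'],
    ['wine', 'beer'],
    ['beer', 'potato chips'],
    ['butter', 'eggs'],
    ['beer', 'potato chips'],
    ['flower', 'eggs'],
    ['beer', 'potato chips'],
    ['eggs', 'flower', 'butter', 'wine', 'cheese'],
    ['beer', 'wine', 'potato chips', 'cheese'],
    ['wine', 'cheese'],
    ['beer', 'potato chips'],
    ['wine', 'cheese'],
    ['beer', 'potato chips'],
]

min_support = 7 / len(transactions)

# count how many transactions contain each (sorted) pair of products
pair_counts = {}
for items in transactions:
    for pair in combinations(sorted(set(items)), 2):
        pair_counts[pair] = pair_counts.get(pair, 0) + 1

# support = share of transactions containing the pair
frequent_pairs = {
    pair: count / len(transactions)
    for pair, count in pair_counts.items()
    if count / len(transactions) >= min_support
}
print(frequent_pairs)  # {('cheese', 'wine'): 0.35, ('beer', 'potato chips'): 0.45}
```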


## Interpretation
Within the transactions of our night store, two product combinations reach the minimum support. People often buy **wine and cheese together** (35% of all transactions), and they also often buy **potato chips and beer together** (45% of all transactions). It could therefore be a good idea to place these products near each other so that shoppers can easily get to both. The shop owner could also package them into an attractive offer to boost their sales even further.