## Find association rules in data using the ECLAT algorithm

ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) is a data mining 
algorithm for association rule mining. A typical application is market basket analysis, where the goal is to 
understand which products are often bought together. ECLAT works on data in vertical format, where each 
item is mapped to the transactions that contain it, so data stored in horizontal format (one row per 
transaction) has to be converted before the algorithm is applied.
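
The difference between the two formats can be illustrated with a minimal sketch. The helper name `to_vertical` and the toy list below are assumptions made for illustration (the transactions are the first five rows of the example dataset used in this notebook); this is not how pyECLAT performs the conversion internally.

```python
from collections import defaultdict

# Horizontal format: one list of items per transaction
horizontal = [
    ["milk", "beer", "bread", "butter"],
    ["coffe", "bread", "butter"],
    ["coffe", "bread", "butter"],
    ["milk", "coffe", "bread", "butter"],
    ["beer"],
]


def to_vertical(transactions):
    """Map every item to the set of transaction ids (tidset) that contain it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


# Vertical format: one tidset per item
vertical = to_vertical(horizontal)
for item, tids in sorted(vertical.items()):
    print(item, sorted(tids))
```

In the vertical format, the number of transactions that contain several items at once is simply the size of the intersection of their tidsets.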


In [18]:
pip install pyECLAT


Note: you may need to restart the kernel to use updated packages.


# Importing and exploring the dataset
We will use the built-in dataset available in the pyECLAT module. Let us first import the pyECLAT module and the built-in dataset.

In [19]:
# importing the dataset (Example1 is one of the example datasets in pyECLAT)
from pyECLAT import Example1

# storing the dataset in a variable
dataset = Example1().get()

# printing the dataset
dataset.head()

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | milk | beer | bread | butter |
| 1 | coffe | bread | butter | |
| 2 | coffe | bread | butter | |
| 3 | milk | coffe | bread | butter |
| 4 | beer | | | |


Each row of this dataset represents one customer's purchase at a supermarket. For example, the customer in the first row (index 0) purchased milk, beer, bread, and butter.
Let's print more details about the dataset.

In [20]:
# printing the info
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       10 non-null     object
 1   1       5 non-null      object
 2   2       4 non-null      object
 3   3       2 non-null      object
dtypes: object(4)
memory usage: 448.0+ bytes


# Visualizing the frequent items
To visualize the frequent items, let's load the dataset into the ECLAT class and generate a binary DataFrame:

In [21]:
# importing the ECLAT module
from pyECLAT import ECLAT

# loading transactions DataFrame to ECLAT class
eclat = ECLAT(data=dataset)

# DataFrame of binary values
eclat.df_bin

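|   | milk | coffe | bread | rice | butter | beer | bean |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 2 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 3 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 6 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 9 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |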


In this binary dataset, every row represents a transaction. The columns are the products that may appear in any transaction. Every cell contains one of two possible values:

0 – the product was not included in the transaction

1 – the transaction contains the product
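
To make the encoding concrete, here is a minimal pandas sketch that builds such a 0/1 table from a small horizontal DataFrame. The variable names `raw` and `binary` and the three toy transactions are assumptions for illustration; this is one possible way to do it, not necessarily how pyECLAT produces `df_bin`.

```python
import pandas as pd

# Toy horizontal DataFrame: one row per transaction, missing cells are None
raw = pd.DataFrame([
    ["milk", "beer", "bread", "butter"],
    ["coffe", "bread", "butter", None],
    ["beer", None, None, None],
])

# All distinct items, ignoring the missing cells
items = sorted({item for item in raw.to_numpy().ravel() if pd.notna(item)})

# One column per item: 1 if the item occurs anywhere in the row, else 0
binary = pd.DataFrame(
    {item: raw.eq(item).any(axis=1).astype(int) for item in items}
)
print(binary)
```

Summing a column of `binary` gives the number of transactions that contain the item, and summing a row gives the number of items in that transaction.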

Now we count, for every column in the DataFrame, how many transactions contain the item:

In [22]:
# count items in each column
items_total = eclat.df_bin.astype(int).sum(axis=0)
items_total

milk      2
coffe     3
bread     5
rice      2
butter    5
beer      2
bean      2
dtype: int64

In [23]:
# count items in each row
items_per_transaction = eclat.df_bin.astype(int).sum(axis=1)

items_per_transaction

0    4
1    3
2    3
3    4
4    1
5    1
6    1
7    1
8    2
9    1
dtype: int64

In [24]:
import pandas as pd

# loading the per-item transaction counts into a DataFrame
df = pd.DataFrame({'items': items_total.index, 'transactions': items_total.values})

# sorted copy of the DataFrame for visualization purposes
df_table = df.sort_values("transactions", ascending=False)

# top 5 most popular products/items
df_table.head(5).style.background_gradient(cmap='Blues')

|   | items | transactions |
|---|---|---|
| 2 | bread | 5 |
| 4 | butter | 5 |
| 1 | coffe | 3 |
| 0 | milk | 2 |
| 3 | rice | 2 |


In [25]:
# importing required module
import plotly.express as px

# shared root label so that all items hang off a single root node
df_table["all"] = "Tree Map"

# creating the tree map using plotly
fig = px.treemap(df_table.head(50), path=['all', 'items'], values='transactions',
                 color=df_table["transactions"].head(50), hover_data=['items'],
                 color_continuous_scale='Blues')
# plotting the treemap
fig.show()

# Generating association rules
To generate association rules, we need to define:

Minimum support – the minimum share of transactions that must contain an itemset, given as a fraction between 0 and 1 (for example, 0.05 for 5%)

Minimum combinations – the minimum number of items in an itemset

Maximum combinations – the maximum number of items in an itemset

Note: the higher the value of the maximum combinations, the longer the calculation will take.
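
To see how the three parameters interact, here is a brute-force sketch in plain Python. The tidsets are read off the binary DataFrame shown earlier; the function names `support` and `frequent_itemsets` are assumptions for illustration. It enumerates every combination rather than traversing the lattice the way a full ECLAT implementation would, so treat it as an explanation of the parameters, not as pyECLAT's implementation.

```python
from itertools import combinations

# Vertical representation of the 10 example transactions:
# item -> ids of the transactions that contain it
tidsets = {
    "milk": {0, 3},
    "coffe": {1, 2, 3},
    "bread": {0, 1, 2, 3, 6},
    "rice": {8, 9},
    "butter": {0, 1, 2, 3, 5},
    "beer": {0, 4},
    "bean": {7, 8},
}
n_transactions = 10


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    common = set.intersection(*(tidsets[item] for item in itemset))
    return len(common) / n_transactions


def frequent_itemsets(min_support, min_combination, max_combination):
    """Brute force: test every combination of the allowed sizes."""
    found = {}
    for size in range(min_combination, max_combination + 1):
        for itemset in combinations(tidsets, size):
            value = support(itemset)
            if value >= min_support:
                found[" & ".join(itemset)] = value
    return found


print(support(("bread", "butter")))  # 4 of 10 transactions
print(len(frequent_itemsets(0.05, 2, 4)))
```

With 7 items there are 21 pairs, 35 triples, and 35 four-item combinations to check, which is why a larger maximum combination size increases the running time.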

In [26]:
# an itemset should appear in at least 5% of transactions
min_support = 5/100

# start from combinations of at least 2 items
min_combination = 2

# up to the largest number of items found in a single transaction
max_combination = max(items_per_transaction)

rule_indices, rule_supports = eclat.fit(min_support=min_support,
                                        min_combination=min_combination,
                                        max_combination=max_combination,
                                        separator=' & ',
                                        verbose=True)

Combination 2 by 2


21it [00:00, 126.77it/s]


Combination 3 by 3


35it [00:00, 166.92it/s]


Combination 4 by 4


35it [00:00, 157.37it/s]


In [27]:
import pandas as pd

# collecting the supports into a DataFrame, sorted from most to least frequent
result = pd.DataFrame(rule_supports.items(), columns=['Item', 'Support'])
result.sort_values(by=['Support'], ascending=False)

|   | Item | Support |
|---|---|---|
| 6 | bread & butter | 0.4 |
| 4 | coffe & bread | 0.3 |
| 5 | coffe & butter | 0.3 |
| 15 | coffe & bread & butter | 0.3 |
| 12 | milk & bread & butter | 0.2 |
| 2 | milk & butter | 0.2 |
| 1 | milk & bread | 0.2 |
| 0 | milk & coffe | 0.1 |
| 17 | milk & coffe & bread & butter | 0.1 |
| 16 | bread & butter & beer | 0.1 |
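
The table lists itemset supports only. One common way to read them as directed rules is the confidence measure, confidence(A -> B) = support(A & B) / support(A). The sketch below is an assumption-labeled illustration: the notebook does not compute confidence, the dictionaries are typed in by hand from the outputs shown here (item counts divided by the 10 transactions, plus three pair supports), and the function name `confidence` is hypothetical.

```python
# Single-item supports: item counts from items_total divided by 10 transactions
item_support = {"milk": 0.2, "coffe": 0.3, "bread": 0.5, "butter": 0.5}

# Pair supports copied from the result table
pair_support = {
    ("bread", "butter"): 0.4,
    ("coffe", "bread"): 0.3,
    ("milk", "coffe"): 0.1,
}


def confidence(antecedent, consequent):
    """confidence(A -> B) = support(A & B) / support(A)."""
    pair = (antecedent, consequent)
    joint = pair_support.get(pair, pair_support.get(pair[::-1]))
    if joint is None:
        raise KeyError(f"no support recorded for {antecedent} & {consequent}")
    return joint / item_support[antecedent]


for a, b in [("coffe", "bread"), ("bread", "coffe"), ("bread", "butter")]:
    print(f"{a} -> {b}: {confidence(a, b):.2f}")
```

Confidence is not symmetric: every transaction with coffe also contains bread, while only part of the transactions with bread contain coffe.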
