
# Association Rule Learning: Eclat
Equivalence Class Clustering and bottom-up Lattice Traversal algorithm

--> Apriori works on a horizontal data layout and explores item sets breadth-first, level by level; Eclat works on a vertical layout and explores them depth-first.

--> Eclat computes support by intersecting sets of transaction ids (tidsets).

--> Eclat is used to generate frequent item sets from a transaction database.

![picture](https://d1zx6djv3kb1v7.cloudfront.net/wp-content/media/2020/03/Eclat-Algorithm-1-i2tutorials-284x300.png)

Eclat stores the data in a vertical format, mapping each item to its tidset (left: Apriori's horizontal layout, right: Eclat's vertical layout):

![picture](https://d1zx6djv3kb1v7.cloudfront.net/wp-content/media/2020/03/Eclat-Algorithm-2-i2tutorials-300x225.jpg)
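
To make the vertical format concrete, here is a minimal sketch that converts a few baskets from the horizontal layout to item-to-tidset form. The toy transactions and the helper name `to_vertical` are made up for illustration and do not come from the dataset used later.

```python
from collections import defaultdict

# Hypothetical toy data in horizontal format: transaction id -> items bought.
horizontal = {
    1: ["bread", "butter", "jam"],
    2: ["butter", "coke"],
    3: ["bread", "butter", "milk"],
    4: ["bread", "coke"],
}

def to_vertical(transactions):
    """Map each item to its tidset: the set of transaction ids containing it."""
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

vertical = to_vertical(horizontal)
for item in sorted(vertical):
    print(item, sorted(vertical[item]))
```

The support count of a single item is then just the size of its tidset, for example `len(vertical["bread"])`.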

- Tidsets are used to compute the support of an item set, and they avoid generating candidate subsets that do not exist in the prefix tree.
- In the first call of the function, all single items are passed in along with their tidsets.
- The function is then called recursively; in each call, every item-tidset pair is checked and combined with the other item-tidset pairs.
- This repeats until no candidate item-tidset pairs can be combined.
- At each depth, the tidsets are intersected, with the items taken in lexicographic order.
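
The recursion described in the list above can be sketched roughly as follows. This is an illustrative toy version, not the code used later in this notebook; the function name `eclat`, the `min_count` parameter and the sample tidsets are assumptions made for the example.

```python
def eclat(prefix, items, min_count, frequent):
    """Depth-first search over (item, tidset) pairs that share the same prefix.

    items is a list of (item, tidset) pairs in lexicographic order.
    """
    for i, (item, tids) in enumerate(items):
        if len(tids) < min_count:
            continue  # infrequent, so no superset can be frequent either
        itemset = prefix + (item,)
        frequent[itemset] = len(tids)  # support count = size of the tidset
        # Combine with every later item by intersecting the tidsets.
        extensions = []
        for other, other_tids in items[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_count:
                extensions.append((other, common))
        if extensions:
            eclat(itemset, extensions, min_count, frequent)
    return frequent

# Hypothetical toy data already in vertical format (item -> tidset).
toy_tidsets = {
    "bread": {1, 3, 4},
    "butter": {1, 2, 3},
    "coke": {2, 4},
    "jam": {1},
    "milk": {3},
}

frequent_itemsets = eclat((), sorted(toy_tidsets.items()), min_count=2, frequent={})
for itemset, count in frequent_itemsets.items():
    print(itemset, count)
```

With a minimum count of 2, only `bread`, `butter`, `coke` and the pair `(bread, butter)` survive; the recursion stops as soon as no pair of tidsets has a large enough intersection.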

## Advantages over Apriori 

1. Memory requirements: because Eclat searches depth-first, it typically uses less memory than Apriori.
2. Speed: Eclat is typically faster than Apriori.
3. Number of computations: Eclat does not rescan the data to compute each support value; it intersects tidsets instead.
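
As a small illustration of point 3, the sketch below counts the support of one pair in two ways: by scanning every basket, and by intersecting tidsets that were built once. The baskets and function names are hypothetical, chosen only for this example.

```python
# Hypothetical toy baskets; the list position serves as the transaction id.
toy_baskets = [
    {"bread", "butter", "jam"},
    {"butter", "coke"},
    {"bread", "butter", "milk"},
    {"bread", "coke"},
]

def support_by_scan(itemset, baskets):
    """Horizontal approach: every query walks over all baskets."""
    return sum(1 for basket in baskets if itemset <= basket)

# Vertical approach: build the tidsets once...
toy_index = {}
for tid, basket in enumerate(toy_baskets):
    for item in basket:
        toy_index.setdefault(item, set()).add(tid)

def support_by_tidsets(itemset, index):
    """...then every query is a set intersection, with no rescan of the data."""
    return len(set.intersection(*(index[item] for item in itemset)))

pair = {"bread", "butter"}
scan_count = support_by_scan(pair, toy_baskets)
tidset_count = support_by_tidsets(pair, toy_index)
print(scan_count, tidset_count)
```

Both approaches give the same count; the difference is only in how much of the data has to be touched for each query.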


# References:

--> General Information
- https://www.i2tutorials.com/machine-learning-tutorial/eclat-algorithm/


--> Solved problem to understand the working:
- https://youtu.be/IwbnylEzp0w


--> More depth into prefix tree using graphs and bit vectors
- https://youtu.be/ecPEXnZQok0

# Code

Link to dataset: https://drive.google.com/file/d/16wlKvgyHvsXU96rLd-j2WHrN52thrp7-/view?usp=sharing

In [3]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
!pip install apyori

Collecting apyori
  Downloading https://files.pythonhosted.org/packages/5e/62/5ffde5c473ea4b033490617ec5caa80d59804875ad3c3c57c0976533a21a/apyori-1.1.2.tar.gz
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-cp37-none-any.whl size=5975 sha256=6fdf98c6755b38c42c7ad394245c44582341f8ec0bb7a814b3dad1481da601fd
  Stored in directory: /root/.cache/pip/wheels/5d/92/bb/474bbadbc8c0062b9eb168f69982a0443263f8ab1711a8cad0
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [5]:
# importing libraries 
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing 

In [7]:
#loading the dataset (no header)
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)

#creating a list of transactions from dataframe
transactions = []
#print(len(dataset.index))
# print(len(dataset.columns))

# every item passed to apyori must be a str; empty cells (NaN) only pad short baskets, so skip them
for row in dataset.values:
  transactions.append([str(item) for item in row if pd.notna(item)])



## Training model on Dataset

In [8]:
# assuming we want at least 3 transactions per day over a 7-day week: min_support = 3*7/7501 ≈ 0.0028
# rule of thumb for min_confidence: start at 0.8 and keep halving until you get a desirable number of rules
# rules with a lift below 3 are not that relevant in most cases
# min_length = max_length = 2 --> rules of the form (product A -> product B); the right values depend on the problem

from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.0027, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
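
The support threshold in the comment can be checked with a few lines of arithmetic. The sketch below assumes, as the comment suggests, 7501 transactions and a minimum of 3 × 7 = 21 occurrences; the variable names are illustrative.

```python
n_transactions = 7501      # dataset size quoted in the comment above
min_occurrences = 3 * 7    # 21 transactions in total

exact_threshold = min_occurrences / n_transactions
print(exact_threshold)

# The exact ratio sits just below 0.0028, so a threshold rounded up to 0.0028
# would drop item sets that occur exactly 21 times; 0.0027 keeps them.
keeps_21_at_0_0027 = exact_threshold >= 0.0027
keeps_21_at_0_0028 = exact_threshold >= 0.0028
print(keeps_21_at_0_0027, keeps_21_at_0_0028)
```

This is probably why `min_support = 0.0027` is used rather than 0.0028: the smallest support in the printed results, about 0.0027996, corresponds to exactly 21 transactions.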

## Visualising Results 

## Direct results

In [9]:
ec_results = list(rules)
ec_results 

[RelationRecord(items=frozenset({'chicken', 'extra dark chocolate'}), support=0.0027996267164378083, ordered_statistics=[OrderedStatistic(items_base=frozenset({'extra dark chocolate'}), items_add=frozenset({'chicken'}), confidence=0.23333333333333334, lift=3.8894074074074076)]),
 RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'esca

## Putting results in a pandas DataFrame

In [12]:
def inspect(results):   # confidence and lift are not needed; only support is kept
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    # confidences = [result[2][0][2] for result in results]
    # lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports))
resultsinDataFrame = pd.DataFrame(inspect(ec_results), columns = ['Product A', 'Product B', 'Support'])
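
The positional indexing in `inspect` (`result[2][0][0]` and so on) can be hard to read. Since the printed records show named fields, the same values can presumably be reached by name. The sketch below uses stand-in namedtuples that mirror the field names visible in the output above, so it runs without apyori; `inspect_by_name` is a hypothetical helper, not part of the library.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the printed output above.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

# One record copied from the printed results.
sample_records = [
    RelationRecord(
        items=frozenset({"chicken", "light cream"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"light cream"}),
                items_add=frozenset({"chicken"}),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    )
]

def inspect_by_name(results):
    """Return (Product A, Product B, support) rows using field names."""
    rows = []
    for record in results:
        stat = record.ordered_statistics[0]  # first rule of the record
        product_a = next(iter(stat.items_base))
        product_b = next(iter(stat.items_add))
        rows.append((product_a, product_b, record.support))
    return rows

rows = inspect_by_name(sample_records)
print(rows)
```

Like the positional version, this takes a single item from each side, which is only safe because the rules were restricted to length 2.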

In [13]:
resultsinDataFrame


| | Product A | Product B | Support |
|---|---|---|---|
| 0 | extra dark chocolate | chicken | 0.0028 |
| 1 | light cream | chicken | 0.004533 |
| 2 | mushroom cream sauce | escalope | 0.005733 |
| 3 | pasta | escalope | 0.005866 |
| 4 | fromage blanc | honey | 0.003333 |
| 5 | herb & pepper | ground beef | 0.015998 |
| 6 | tomato sauce | ground beef | 0.005333 |
| 7 | light cream | olive oil | 0.0032 |
| 8 | whole wheat pasta | olive oil | 0.007999 |
| 9 | pasta | shrimp | 0.005066 |


## Sorted Final Results

In [14]:

resultsinDataFrame.nlargest(10, "Support")

| | Product A | Product B | Support |
|---|---|---|---|
| 5 | herb & pepper | ground beef | 0.015998 |
| 8 | whole wheat pasta | olive oil | 0.007999 |
| 3 | pasta | escalope | 0.005866 |
| 2 | mushroom cream sauce | escalope | 0.005733 |
| 6 | tomato sauce | ground beef | 0.005333 |
| 9 | pasta | shrimp | 0.005066 |
| 1 | light cream | chicken | 0.004533 |
| 4 | fromage blanc | honey | 0.003333 |
| 7 | light cream | olive oil | 0.0032 |
| 0 | extra dark chocolate | chicken | 0.0028 |
