# Eclat

Eclat stands for _Equivalence Class Clustering and Bottom-up Lattice Traversal_. While the full name may sound complicated, Eclat is a simpler alternative to Apriori that is generally more efficient and scalable. It uses only support to rank item sets, where the support of a set of items (X, Y, ...) is the fraction of all instances that contain every item in the set.

$$\large \text{support}(X, Y, \ldots) = \frac{\text{number of instances containing } X, Y, \ldots}{\text{total number of instances}} $$

If the support value for a set containing X & Y is 0.75 *(75%)*, then 75% of all instances contain both X and Y.
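
As a quick worked example of the formula, the sketch below computes support on a small hypothetical basket list. The `transactions` data and the `support` helper are assumptions made for illustration and are not part of the notebook's dataset or code.

```python
# Hypothetical toy transactions (not taken from the dataset used later).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

print(support({"bread"}, transactions))          # 0.75 (3 of 4 baskets)
print(support({"bread", "milk"}, transactions))  # 0.5  (2 of 4 baskets)
```

Here `bread` has a support of 0.75 because it appears in three of the four baskets, while the pair `bread` and `milk` appears together in only two.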
<hr>

## Code (Personal Implementation)

__Setting up the Dataset:__

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

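# Note: read_csv treats the first row as column names by default;
# pass header=None if the file has no header row.
dataset = pd.read_csv('market_basket_optimization.csv')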

<hr>

__Powerset Generation Algorithm:__ 
_(Testing With Item Sets of 2)_
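
The cell that follows tests item pairs only. As a rough sketch of how the same idea (intersecting the sets of entry indices in which items appear) might extend to larger item sets, the example below runs a depth-first traversal on hypothetical toy data. The names `toy_transactions`, `tidsets`, `eclat`, and the `min_count` threshold are assumptions for illustration. The notebook's own code applies no minimum-support cutoff.

```python
# Hypothetical toy transactions; the item names are illustrative only.
toy_transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

# Vertical layout: item -> set of transaction ids (tid-set) containing it.
tidsets = {}
for tid, basket in enumerate(toy_transactions):
    for item in basket:
        tidsets.setdefault(item, set()).add(tid)

def eclat(prefix, candidates, min_count, found):
    """Depth-first search: extend `prefix` with each candidate item whose
    tid-set is large enough, intersecting tid-sets as the item set grows."""
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_count:
            continue  # Too rare; any larger set containing it is rarer still.
        itemset = prefix + (item,)
        found[itemset] = len(tids)
        # Only items later in the ordering are tried, so each set is visited once.
        extensions = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
        eclat(itemset, extensions, min_count, found)
    return found

frequent = eclat((), sorted(tidsets.items()), min_count=2, found={})
supports = {itemset: count / len(toy_transactions)
            for itemset, count in frequent.items()}
print(supports)
```

With `min_count=2`, the triple `bread`, `butter`, `milk` is dropped because it occurs in only one basket, while both pairs containing `bread` are kept with a support of 0.5.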

In [2]:
# Total number of entries.
entry_total = len(dataset.index)

# Maps each item to the set of entry indices (rows) in which it appears.
locations = dict()

# - Dataset Preprocessing
# dataset.values is read once here instead of being rebuilt on every iteration.
for index, row in enumerate(dataset.values):
    # Keeps only the item names; empty cells are read as NaN (a float), not str.
    entry = [element for element in row if isinstance(element, str)]

    # Records this entry's index under each item it contains.
    for element in entry:
        locations.setdefault(element, set()).add(index)

# Stores the probability (support) of each item pair occurring.
probabilities = dict()
# Stores all possible items.
items = list(locations.keys())

# - Finding Item Pair Probabilities
for item_index, item in enumerate(items):
    # Pairs each item only with the items after it, so every pair is counted once.
    for pair in items[item_index + 1:]:
        # Support of the pair: entries containing both items, divided by the total number of entries.
        shared = locations[item] & locations[pair]
        probabilities[(item, pair)] = len(shared) / entry_total

# Sorts the pairs from most to least likely to occur together.
probabilities = dict(sorted(probabilities.items(), key=lambda item: item[1], reverse=True))

probabilities

{('mineral water', 'spaghetti'): 0.05973333333333333,
 ('mineral water', 'chocolate'): 0.05266666666666667,
 ('eggs', 'mineral water'): 0.05093333333333333,
 ('mineral water', 'milk'): 0.048,
 ('mineral water', 'ground beef'): 0.040933333333333335,
 ('spaghetti', 'chocolate'): 0.0392,
 ('spaghetti', 'ground beef'): 0.0392,
 ('eggs', 'spaghetti'): 0.036533333333333334,
 ('eggs', 'french fries'): 0.0364,
 ('mineral water', 'frozen vegetables'): 0.03573333333333333,
 ('milk', 'spaghetti'): 0.03546666666666667,
 ('french fries', 'chocolate'): 0.0344,
 ('mineral water', 'french fries'): 0.03373333333333333,
 ('mineral water', 'pancakes'): 0.03373333333333333,
 ('eggs', 'chocolate'): 0.0332,
 ('milk', 'chocolate'): 0.03213333333333333,
 ('mineral water', 'green tea'): 0.030933333333333334,
 ('eggs', 'milk'): 0.0308,
 ('burgers', 'eggs'): 0.0288,
 ('green tea', 'french fries'): 0.028533333333333334,
 ('frozen vegetables', 'spaghetti'): 0.027866666666666668,
 ('french fries', 'spaghetti'): 0.0