## Eclat

* Like a simplified version of Apriori
* In the Eclat model, we only have **support**.
    * We do **not** consider rules, only **sets of products**.

**Support**
* Here 'M' and 'L' do **not** denote single items but **sets** of items.
    * Support measures how often the set occurs in the transactions.
        * For example, how often does a watchlist contain both movies (a set of two)?
<img src='../../resources/association_rule/eclat/support.png' />
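
Below is a minimal Python sketch of this definition. The support of a set is the number of transactions that contain every item of the set, divided by the total number of transactions. The sketch assumes each transaction is a Python set, and the watchlists and movie titles are made up for illustration:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

# Hypothetical watchlists: one set of movies per user
watchlists = [
    {"Movie A", "Movie B", "Movie C"},
    {"Movie A", "Movie B"},
    {"Movie B", "Movie C"},
    {"Movie A", "Movie C"},
]

# Movies A and B appear together in 2 of the 4 watchlists
print(support({"Movie A", "Movie B"}, watchlists))  # 0.5
```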


**Steps:**
1. Set a minimum support
2. Take all the subsets in the transactions whose support is at least the minimum support.
3. Sort these subsets by decreasing support.
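
The following brute-force Python sketch follows these three steps literally. It assumes transactions are Python sets, and the baskets are invented for illustration. Because it scores every candidate set, it only suits small item catalogues:

```python
from itertools import combinations

def frequent_sets(transactions, min_support, max_size=2):
    """Score every candidate set of up to `max_size` items, keep those
    that reach `min_support`, and sort them by decreasing support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    kept = []
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            # Support = share of transactions containing every item of the set
            count = sum(1 for basket in transactions if set(candidate) <= basket)
            support = count / n
            if support >= min_support:  # step 2: keep sets that reach the threshold
                kept.append((frozenset(candidate), support))
    kept.sort(key=lambda pair: pair[1], reverse=True)  # step 3: decreasing support
    return kept

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, support in frequent_sets(baskets, min_support=0.5):
    print(sorted(itemset), support)
```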

<hr />

### Implementing Eclat

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

### Data Preparation

```python
# The CSV has no header row: each row is one transaction (a basket of products)
df = pd.read_csv('data/market_basket_optimisation.csv', header=None)
```

```python
df
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs |
| 2 | chutney |
| 3 | turkey | avocado |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7496 | butter | light mayo | fresh bread |
| 7497 | burgers | frozen vegetables | eggs | french fries | magazines | green tea |
| 7498 | chicken |
| 7499 | escalope | green tea |


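```python
# One list of products per transaction. Shorter baskets are padded with
# empty (NaN) cells in the CSV; skip them so str(NaN) does not become a
# fake product called 'nan'.
transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in df.values
]
```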
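
`transactions` above is in the horizontal format, with one list of items per basket. The Eclat algorithm, as usually described, works on the vertical format instead. For each item it stores the set of transaction IDs that contain it (its tid-list), and the support of an itemset is the size of the intersection of its items' tid-lists divided by the number of transactions. Below is a minimal depth-first sketch in pure Python. It runs on made-up baskets and is not tuned for speed:

```python
from collections import defaultdict

def eclat(transactions, min_support, max_length=2):
    """Depth-first Eclat over vertical tid-lists.

    Returns {frozenset(itemset): support} for every itemset of up to
    `max_length` items whose support is at least `min_support`.
    """
    n = len(transactions)
    # Vertical format: item -> set of IDs of the transactions containing it
    tidlists = defaultdict(set)
    for tid, basket in enumerate(transactions):
        for item in basket:
            tidlists[item].add(tid)

    min_count = min_support * n
    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tids) pairs where tids is the tid-list of prefix + {item}
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids) / n
            if len(itemset) < max_length:
                # Intersect with every later candidate to build the next level
                next_level = []
                for other, other_tids in candidates[i + 1:]:
                    shared = tids & other_tids
                    if len(shared) >= min_count:
                        next_level.append((other, shared))
                extend(itemset, next_level)

    first_level = [(item, tids) for item, tids in sorted(tidlists.items())
                   if len(tids) >= min_count]
    extend(frozenset(), first_level)
    return frequent

baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
pairs = {s: sup for s, sup in eclat(baskets, min_support=0.5).items() if len(s) == 2}
for itemset, sup in sorted(pairs.items(), key=lambda kv: kv[1], reverse=True):
    print(sorted(itemset), sup)
```

Intersecting tid-lists avoids rescanning every basket for each candidate. This is the main difference from Apriori's horizontal counting.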

### Training the Eclat model

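```python
from apyori import apriori

# apyori has no dedicated Eclat function, so we reuse apriori().
# Eclat itself only looks at support, but the confidence and lift
# thresholds are kept to filter for stronger associations.
# min_support: roughly 3 * 7 = 21 of the 7501 transactions, rounded to 0.003.
# max_length=2 keeps sets of two products; raise it for larger sets.
# (apyori does not read min_length; single products are dropped anyway
# because their lift is always 1, below min_lift=3.)
rules = apriori(
    transactions=transactions,
    min_support=round((3 * 7) / 7501, 3),
    min_confidence=0.2,
    min_lift=3,
    min_length=2,
    max_length=2,
)
```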
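
Here is a quick check of the `min_support` arithmetic in plain Python. Rounding 21 / 7501 ≈ 0.0028 up to 0.003 slightly raises the bar. If a set is kept when its support is at least `min_support` (assumed here), a pair needs at least 23 of the 7,501 transactions:

```python
n_transactions = 7501
min_count = 3 * 7                        # 21 transactions
min_support = round(min_count / n_transactions, 3)
print(min_support)                       # 0.003

# Smallest number of transactions whose support reaches the rounded threshold
needed = next(k for k in range(n_transactions + 1)
              if k / n_transactions >= min_support)
print(needed)                            # 23
```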

### Inspecting the Results

```python
# apriori() returns a generator; turn it into a list to inspect the records
results = list(rules)
```

```python
results
```

```
[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0
```

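```python
# Eclat yields sets rather than rules, so there is no real LHS/RHS here:
# "Product 1" and "Product 2" are simply the two items of each pair.
def inspect(results):
    # For each record, take the first rule apyori kept for the pair
    # (items_base -> items_add) and unpack the single item on each side.
    product1 = [tuple(result.ordered_statistics[0].items_base)[0] for result in results]
    product2 = [tuple(result.ordered_statistics[0].items_add)[0] for result in results]
    supports = [result.support for result in results]
    return list(zip(product1, product2, supports))

resultsDataFrame = pd.DataFrame(inspect(results), columns=['Product 1', 'Product 2', 'Support'])
```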
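
`inspect` reads the base and added item of the first rule apyori kept. That means which product lands in 'Product 1' follows the rule's direction. Eclat only cares about the set, so an alternative is to read each record's `items` frozenset and sort it. The sketch below uses a hypothetical stand-in for apyori's `RelationRecord` with only its `items` and `support` fields. It is filled with two records copied from the output above:

```python
from collections import namedtuple

# Hypothetical stand-in for apyori's RelationRecord, keeping only the
# two fields this sketch reads
Record = namedtuple("Record", ["items", "support"])

def inspect_sets(records):
    """Return (product 1, product 2, support) with products in alphabetical order."""
    rows = []
    for record in records:
        first, second = sorted(record.items)  # assumes 2-item sets (max_length=2)
        rows.append((first, second, record.support))
    return rows

records = [
    Record(frozenset({"chicken", "light cream"}), 0.004532728969470737),
    Record(frozenset({"pasta", "escalope"}), 0.005865884548726837),
]
print(inspect_sets(records))
```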

### Results Sorted by Support

```python
resultsDataFrame.sort_values(by=['Support'], ascending=False)
```

| | Product 1 | Product 2 | Support |
|---|---|---|---|
| 4 | herb & pepper | ground beef | 0.015998 |
| 7 | whole wheat pasta | olive oil | 0.007999 |
| 2 | pasta | escalope | 0.005866 |
| 1 | mushroom cream sauce | escalope | 0.005733 |
| 5 | tomato sauce | ground beef | 0.005333 |
| 8 | pasta | shrimp | 0.005066 |
| 0 | light cream | chicken | 0.004533 |
| 3 | fromage blanc | honey | 0.003333 |
| 6 | light cream | olive oil | 0.0032 |
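
`sort_values(..., ascending=False)` ranks every pair. When only the top few are needed, pandas' `DataFrame.nlargest` returns them in the same decreasing order in one call. In the sketch below, a small frame is rebuilt from a few rows of the table above. It is named `table` so it does not overwrite the notebook's `df`:

```python
import pandas as pd

# A few rows copied from the sorted results above
table = pd.DataFrame(
    [
        ("light cream", "chicken", 0.004533),
        ("herb & pepper", "ground beef", 0.015998),
        ("pasta", "escalope", 0.005866),
        ("whole wheat pasta", "olive oil", 0.007999),
    ],
    columns=["Product 1", "Product 2", "Support"],
)

# Three highest-support pairs, largest first
top3 = table.nlargest(3, "Support")
print(top3)
```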
