<a href="https://colab.research.google.com/github/chelynl/Machine_Learning/blob/main/14_Eclat_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Association Analysis: Eclat**
- Eclat stands for Equivalence Class Clustering and bottom-up Lattice Traversal
- Unsupervised; works on a vertical data layout, where each item maps to the set of transactions that contain it
- Simpler than Apriori, and typically faster and more scalable
- However, **use Apriori** for **deep analysis** of your market basket
- Eclat evaluates results by support only and looks at "sets" of items rather than individual items
- Support measures how often a set of two or more items occurs together across all transactions
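
As a sketch of the support definition in the list above (the notation $T$ for the set of transactions, $X$ for an itemset, and $\operatorname{supp}$ is assumed here, not taken from the notebook), the support of an itemset is the fraction of transactions that contain all of its items:

$$
\operatorname{supp}(X) = \frac{|\{\, t \in T : X \subseteq t \,\}|}{|T|}
$$

For a toy example, if 3 of 5 transactions contain both milk and bread, then $\operatorname{supp}(\{\text{milk}, \text{bread}\}) = 3/5 = 0.6$.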

### Eclat Algorithm:
1. Set a minimum support.
2. Keep every subset of items whose support across the transactions is higher than the minimum value.
3. Sort these subsets by decreasing support (strongest combination at the top).
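
The three steps above can be sketched as a small depth-first search over transaction-ID sets. This is a minimal illustration under stated assumptions, not the notebook's implementation: the function name `eclat_sketch` and the toy transactions are hypothetical, and the threshold test uses `>=`, which is a common convention rather than something the steps specify.

```python
def eclat_sketch(transactions, min_support):
    """Return [(itemset, support)] for itemsets of 2+ items, sorted by support."""
    n = len(transactions)

    # Vertical layout: item -> set of ids of the transactions that contain it
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Step 1 and 2 for single items: drop items below the minimum support
    # (assumption: ">=" is used as the threshold test)
    candidates = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) / n >= min_support),
        key=lambda pair: pair[0],
    )

    results = []

    def extend(prefix, prefix_tids, remaining):
        # Depth-first: grow the current prefix with each remaining item
        for i, (item, tids) in enumerate(remaining):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) / n < min_support:
                continue  # no superset of an infrequent set can be frequent
            itemset = prefix + (item,)
            if len(itemset) >= 2:
                results.append((itemset, len(new_tids) / n))
            extend(itemset, new_tids, remaining[i + 1:])

    extend((), set(), candidates)

    # Step 3: strongest combinations first
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Hypothetical toy data, not from the market basket file
toy = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread"],
]
result = eclat_sketch(toy, min_support=0.3)
print(result)
```

With this toy data, only two pairs clear the 0.3 threshold; the pair of bread and eggs occurs in a single transaction and is pruned, so the three-item set is never examined.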

### Advantages over Apriori algorithm:
- Memory requirements: Eclat traverses the itemsets depth-first, so it usually needs less memory than Apriori.
- Speed: Eclat is typically faster than Apriori.
- Number of computations: Eclat does not rescan the data repeatedly to compute individual support values.
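
To illustrate the point about repeated scanning, the sketch below compares two ways of computing the same support value: re-scanning every transaction for each itemset, and building a per-item index of transaction IDs once, then intersecting the sets. The helper names and toy data are hypothetical and only meant to show the idea.

```python
def support_by_scan(itemset, transactions):
    """Horizontal counting: walk over every transaction for each itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def build_tidsets(transactions):
    """Vertical index (item -> transaction ids), built in one pass."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_by_tidsets(itemset, tidsets, n_transactions):
    """Vertical counting: intersect the transaction-id sets of the items."""
    common = set.intersection(*(tidsets.get(item, set()) for item in itemset))
    return len(common) / n_transactions


# Hypothetical toy data
toy = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread"],
]
index = build_tidsets(toy)
scan_value = support_by_scan(("milk", "bread"), toy)
tid_value = support_by_tidsets(("milk", "bread"), index, len(toy))
print(scan_value, tid_value)
```

Both approaches agree on the result; the difference is that the index is built once and reused for every itemset.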

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/chelynl/ML_notes/main/association_rule_learning/Market_Basket_Optimisation.csv?token=AMGO4ME5MM5XMQHQVFFPXXTA2IRSS', header = None )
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


In [None]:
# Build the list of transactions (one list of item names per row),
# skipping NaN padding and trimming stray whitespace around item names
transactions = [[str(item).strip() for item in row if pd.notna(item)] for row in df.values]

In [None]:
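# Get list of unique items, in order of first appearance
items = []

for transaction in transactions:
  for item in transaction:
    # strip before both the membership test and the append,
    # so the stored name matches the name that was checked
    item = item.strip()
    if item not in items:
      items.append(item)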

In [None]:
len(items)

119

In [None]:
# Generate a list of item pairs with relevant support value
# [[(item_a, item_b) , support_value], ...]
# support_value is initialized to 0 for all pairs
eclat = []

for i in range(len(items)):
  for j in range(i+1, len(items)):
    eclat.append([(items[i], items[j]), 0])

In [None]:
# Compute support value for each pair by looking for transactions with both items
for p in eclat:
  for t in transactions:
    # check if both items from pair are in the transaction
    if (p[0][0] in t) and (p[0][1] in t):
      # add 1 to support
      p[1] += 1
  p[1] = p[1]/len(transactions)
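
The nested loop above checks every candidate pair against every transaction. As a hedged alternative sketch, pairs can also be counted in a single pass over the transactions with `collections.Counter` and `itertools.combinations`. The function name `pair_supports` and the toy data are hypothetical. Note two differences from the notebook's approach: pairs that never co-occur are simply absent instead of having support 0, and each pair is stored in alphabetical order rather than in order of first appearance.

```python
from collections import Counter
from itertools import combinations


def pair_supports(transactions):
    """Map each co-occurring item pair to its support, in one pass."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        # sorted(set(t)) drops duplicate items and gives each pair one canonical order
        counts.update(combinations(sorted(set(t)), 2))
    return {pair: count / n for pair, count in counts.items()}


# Hypothetical toy data
toy = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread"],
]
supports = pair_supports(toy)
print(supports)
```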

In [None]:
# Convert eclat to a DataFrame sorted by decreasing support for easier inspection
eclat_df = pd.DataFrame(eclat, columns = ['rule', 'support']).sort_values(by = 'support', ascending = False)
eclat_df

Unnamed: 0,rule,support
1580,"(mineral water, spaghetti)",0.059725
1585,"(mineral water, chocolate)",0.052660
1568,"(mineral water, eggs)",0.050927
1571,"(mineral water, milk)",0.047994
1607,"(mineral water, ground beef)",0.040928
...,...,...
4796,"(rice, dessert wine)",0.000000
5881,"(hot dogs, hand protein bar)",0.000000
641,"(whole weat flour, strong cheese)",0.000000
4797,"(rice, flax seed)",0.000000
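
The table above still contains pairs with zero support, because step 1 of the algorithm (setting a minimum support) has not been applied yet. A minimal sketch of that filter follows, using a few rows copied from the output above in the notebook's `[pair, support]` layout. The threshold of 0.05 and the names `eclat_sample` and `frequent` are assumptions chosen for illustration.

```python
# Sample rows copied from the output above
eclat_sample = [
    [("mineral water", "spaghetti"), 0.059725],
    [("mineral water", "chocolate"), 0.052660],
    [("mineral water", "eggs"), 0.050927],
    [("mineral water", "milk"), 0.047994],
    [("mineral water", "ground beef"), 0.040928],
    [("rice", "dessert wine"), 0.0],
]

min_support = 0.05  # hypothetical threshold, tune for your data

# Keep only pairs at or above the threshold, strongest first
frequent = sorted(
    (row for row in eclat_sample if row[1] >= min_support),
    key=lambda row: row[1],
    reverse=True,
)

for pair, support in frequent:
    print(pair, support)
```

On the DataFrame itself, the same filter can be written with boolean indexing, for example `eclat_df[eclat_df['support'] >= min_support]`.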
