# Eclat
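@dev: This is not a true Eclat implementation. It runs Apriori and then looks only at support. A real Eclat implementation counts support differently (by intersecting transaction-ID sets) and works with support alone, so its output may differ from the confidence- and lift-filtered rules shown here: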

- **Apriori Algorithm**: It works in a two-step process, where it first generates frequent itemsets and then derives association rules from these itemsets. The Apriori algorithm uses a breadth-first search and a tree structure to count the support of itemsets and prunes the tree with the apriori principle, which states that all subsets of a frequent itemset must also be frequent.

- **ECLAT Algorithm**: ECLAT (Equivalence Class Clustering and Bottom-Up Lattice Traversal) uses a depth-first search. Unlike Apriori, it stores the data in a vertical format: for each item, the list of transaction IDs (TIDs) that contain it. The support of an itemset is then counted by intersecting the TID sets of its items, which avoids rescanning the whole dataset.
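
To make the vertical format concrete, here is a minimal pure-Python sketch of a depth-first TID-set intersection. It is not what this notebook runs and not an mlxtend API; the function name `eclat`, its parameters, and the toy transactions are assumptions made for illustration.

```python
from collections import defaultdict


def eclat(transactions, min_support, max_len=None):
    """Return {frozenset(itemset): support} via TID-set intersections (illustrative sketch)."""
    n = len(transactions)
    min_count = min_support * n

    # Vertical format: item -> set of transaction IDs that contain the item
    tidsets = defaultdict(set)
    for tid, tx in enumerate(transactions):
        for item in tx:
            tidsets[item].add(tid)

    result = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids) / n
            if max_len is not None and len(itemset) >= max_len:
                continue
            # Depth-first step: intersect with the remaining items' TID sets
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    next_candidates.append((other, common))
            if next_candidates:
                extend(itemset, next_candidates)

    frequent_items = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), frequent_items)
    return result


# Hypothetical toy data, not taken from the dataset used below
toy = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["bread", "eggs"],
    ["milk", "eggs"],
]
itemsets = eclat(toy, min_support=0.5)
for itemset, support in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Only support is computed here. Confidence and lift, which the Apriori-based cells below rely on, are not part of this sketch.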

TODO: def

### Importing the dataset

In [13]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

**dataset**:
- Each row is a transaction with the products purchased by a customer
- 7,501 transactions in a week

In [14]:
dataset = pd.read_csv("./filez/Market_Basket_Optimisation.csv", header=None, sep=",")
dataset.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


### Data preprocessing

In [15]:
# Convert the DataFrame into a list of lists (transactions), excluding NaN values
transactions = dataset.apply(lambda x: x.dropna().tolist(), axis=1).tolist()
for tx in transactions[:5]:
    print(tx)

['shrimp', 'almonds', 'avocado', 'vegetables mix', 'green grapes', 'whole weat flour', 'yams', 'cottage cheese', 'energy drink', 'tomato juice', 'low fat yogurt', 'green tea', 'honey', 'salad', 'mineral water', 'salmon', 'antioxydant juice', 'frozen smoothie', 'spinach', 'olive oil']
['burgers', 'meatballs', 'eggs']
['chutney']
['turkey', 'avocado']
['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea']


In [16]:
te = TransactionEncoder()

# Transform the data into a one-hot encoded DataFrame
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
df.head()

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,False,True,True,False,True,False,False,False,False,False,...,False,True,False,False,True,False,False,True,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,True,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False


### Training the Eclat model on the dataset

In [17]:
# @dev: See parameters setting in apriori file

# Step 1: Generate frequent itemsets
frequent_itemsets = apriori(df, min_support=0.003, use_colnames=True, max_len=2)

# Step 2: Generate rules, keeping only those with confidence >= 0.2
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)

# Further filter the rules by lift
rules = rules[rules["lift"] >= 3]
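
For reference, the two metrics used as filters above follow the standard definitions: confidence(A → B) = support(A ∪ B) / support(A), and lift(A → B) = confidence(A → B) / support(B). The sketch below computes them from plain support values. The helper names and the input numbers are hypothetical and are not taken from this dataset.

```python
def confidence(support_ab, support_a):
    """Share of transactions containing A that also contain B."""
    return support_ab / support_a


def lift(support_ab, support_a, support_b):
    """How much more often A and B co-occur than if they were independent."""
    return support_ab / (support_a * support_b)


# Hypothetical supports, chosen only for illustration
s_ab, s_a, s_b = 0.02, 0.05, 0.10

conf_value = confidence(s_ab, s_a)   # roughly 0.4, passes the 0.2 threshold
lift_value = lift(s_ab, s_a, s_b)    # roughly 4, passes the lift >= 3 filter
print(conf_value, lift_value)
```

A lift above 1 means the two items appear together more often than expected by chance, so the `lift >= 3` filter keeps only strongly associated pairs.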

### Visualizing the results

In [18]:
# Keep only the columns relevant to an Eclat-style view (itemsets and support)
selected_rules = rules[["antecedents", "consequents", "support"]]

# Sort rules by support, highest first
sorted_rules = selected_rules.sort_values(by="support", ascending=False)

print(sorted_rules)

                antecedents    consequents   support
188         (herb & pepper)  (ground beef)  0.015998
267     (whole wheat pasta)    (olive oil)  0.007999
139                 (pasta)     (escalope)  0.005866
138  (mushroom cream sauce)     (escalope)  0.005733
198          (tomato sauce)  (ground beef)  0.005333
270                 (pasta)       (shrimp)  0.005066
58            (light cream)      (chicken)  0.004533
166         (fromage blanc)        (honey)  0.003333
211           (light cream)    (olive oil)  0.003200


Influence of "herb & pepper" to "ground beef":
- `Antecedents`: ('herb & pepper') - **IF** part of an association rule. It indicates that the item 'herb & pepper' is present in the transactions we are considering.
- `Consequents`: ('ground beef') - **THEN** part of the rule. It indicates that in the transactions where 'herb & pepper' is present, 'ground beef' also tends to be present.
- `Support`: 0.015998 - Proportion of all transactions that contain both 'herb & pepper' and 'ground beef'. This combination of items appears in about 1.6% of all transactions (roughly 120 of the 7,501).
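
As a sanity check on the support value, the count can be recovered from the proportion, and the support of a pair can be counted directly from a list of transactions. This is a small sketch: the helper `pair_support` and the toy transactions are assumptions for illustration, while the figures 0.015998 and 7,501 come from the output above.

```python
def pair_support(transactions, item_a, item_b):
    """Fraction of transactions that contain both items."""
    hits = sum(1 for tx in transactions if item_a in tx and item_b in tx)
    return hits / len(transactions)


# Hypothetical toy transactions, not the real dataset
toy_transactions = [
    ["herb & pepper", "ground beef"],
    ["ground beef", "eggs"],
    ["herb & pepper", "ground beef", "milk"],
    ["milk"],
]
toy_support = pair_support(toy_transactions, "herb & pepper", "ground beef")

# Convert the reported support back to an approximate transaction count
approx_count = round(0.015998 * 7501)
print(toy_support, approx_count)
```

On the real data, the same helper could be applied to the `transactions` list built in the preprocessing step.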