# Eclat Algorithm

## Main Task
> Identify the product pairings that customers are most likely to take up as deals (for example, buy product X and get product Y for free), so that sales and profit can be optimized.

### Data Understanding  

**1.0. What is the domain area of the dataset?**  
The dataset *Market_Basket_Optimisation.csv* contains the items bought by customers of a shopping centre, with one transaction per row.

**2.0. Which data format?**  
The dataset is in *csv* format!  

**2.1. Do the files have headers or another file describing the data?**  
The file does not appear to have a descriptive header row. Its first line is itself a list of items (*shrimp,almonds,avocado,...*), which pandas treats as column names by default when the file is loaded.  

**2.2. Are the data values separated by commas, semicolon, or tabs?**  
The data values are separated by commas!  
Example: 
*burgers,meatballs,eggs*

**3.0 How many features and how many observations does the dataset have?**  
The dataset has:  
* 20 features or columns!
* 7500 observations or rows!  

**4.0 Does it contain numerical features? How many?**  
No

**5.0. Does it contain categorical features?  How many?**  
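Not in the usual sense. Every value is an item name (a string), and a column only marks the item's position within a transaction, so the items themselves can be treated as categories.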

In [5]:
# In case apyori module is not installed on your machine!
# !pip install apyori

In [6]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

### Data Pre-processing

In [7]:
dataset = pd.read_csv("../Dataset/Market_Basket_Optimisation.csv")

In [8]:
dataset.head()

Unnamed: 0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
1,chutney,,,,,,,,,,,,,,,,,,,
2,turkey,avocado,,,,,,,,,,,,,,,,,,
3,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
4,low fat yogurt,,,,,,,,,,,,,,,,,,,


In [9]:
print(f"Number of features in the dataset is {dataset.shape[1]} and the number of observations/rows in the dataset is {dataset.shape[0]}")

Number of features in the dataset is 20 and the number of observations/rows in the dataset is 7500
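
The column names shown above (shrimp, almonds, avocado, ...) are item names, which suggests that the first line of the file is a transaction rather than a header. The following is a minimal sketch of how `header=None` in `pd.read_csv` keeps that first line as data. The small in-memory CSV is invented for illustration and is not the real dataset.

```python
import io
import pandas as pd

# Tiny stand-in for a header-less basket file (hypothetical data)
csv_text = "shrimp,almonds,avocado\nburgers,meatballs,eggs\nchutney,,\n"

# Default behaviour: the first line is consumed as column names
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is kept as a row, columns are numbered 0, 1, 2
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)     # (2, 3)
print(without_header.shape)  # (3, 3)
```

Whether to reload the real file this way is a judgment call. It would add one transaction and shift every row index by one compared with the outputs shown in this notebook.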


In [10]:
## Eclat needs the data as a list of transactions, each one a list of item names, e.g.
## transactions = [['apple', 'almonds'], ['apple'], ['banana', 'apple']]

transactions = []
for row in dataset.values:  # dataset.values is a 2-D array indexed as [row, column]
    # Short transactions are padded with NaN; keep only the real items
    items = [str(item) for item in row if pd.notna(item)]
    transactions.append(items)

In [11]:
transactions[0]

['burgers', 'meatballs', 'eggs']

### Training the Eclat Model on the dataset

The Eclat algorithm is a depth-first search algorithm used for frequent itemset mining. Frequent itemset mining is a technique for finding sets of items that appear together frequently in a dataset.  
This is often used in market basket analysis to find items that are commonly bought together.
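
To make "appear together frequently" concrete, here is a small sketch that counts the support of an itemset, meaning the number of transactions that contain all of its items. The toy transactions and the helper name `support_count` are made up for illustration and are not part of the notebook.

```python
# Toy transactions, invented for illustration only
toy_transactions = [
    ["burgers", "eggs"],
    ["burgers", "eggs", "milk"],
    ["milk"],
    ["burgers", "milk"],
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))

print(support_count({"burgers"}, toy_transactions))          # 3
print(support_count({"burgers", "eggs"}, toy_transactions))  # 2
print(support_count({"eggs", "milk"}, toy_transactions))     # 1
```

With a minimum support of 2, {burgers} and {burgers, eggs} would count as frequent here, while {eggs, milk} would not.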

The Eclat algorithm works on a vertical dataset layout, where each item is stored together with the set of IDs of the transactions that contain it (its TID set).  
The algorithm starts with individual items and extends them to larger itemsets as long as they meet a minimum support threshold.
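
As a rough illustration of the vertical layout, the sketch below maps each item to the set of transaction IDs that contain it and then obtains the support of a 2-itemset by intersecting two of those sets. The toy data is hypothetical.

```python
from collections import defaultdict

# Toy transactions keyed by transaction ID (hypothetical data)
toy = {
    0: ["burgers", "eggs"],
    1: ["burgers", "eggs", "milk"],
    2: ["milk"],
    3: ["burgers", "milk"],
}

# Vertical layout: item -> set of IDs of the transactions containing it
tidsets = defaultdict(set)
for tid, items in toy.items():
    for item in items:
        tidsets[item].add(tid)

print(sorted(tidsets["burgers"]))  # [0, 1, 3]
print(sorted(tidsets["eggs"]))     # [0, 1]

# The support of {burgers, eggs} is the size of the intersection
both = tidsets["burgers"] & tidsets["eggs"]
print(sorted(both), len(both))     # [0, 1] 2
```

The appeal of this layout is that no pass over the original transactions is needed to evaluate a larger itemset. A set intersection is enough.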

Here are the steps we took to implement the Eclat algorithm:

1. Convert Transactions to Dictionary: We first converted the transactions into a dictionary format where the keys are transaction IDs and the values are the items in each transaction.

2. Generate Itemsets: We then generated a list of all individual items along with their transaction identifiers (TIDs). This was done using the dict_to_list function.

3. Find Frequent Itemsets: We used the find_frequent_itemsets function to find the frequent itemsets in the transactions.  
    This function uses the Eclat algorithm to find itemsets that appear in at least a minimum number of transactions (the min_support).

4. Iterate and Intersect: For each frequent item, the Eclat algorithm intersects its TID set with the TID sets of the remaining items to obtain the TID sets of the 2-itemsets, and checks the support of each 2-itemset.  
    If the support is at least the minimum support, the itemset is kept. The kept itemsets are then extended recursively to 3-itemsets, 4-itemsets, etc., until no more itemsets can be generated.

5. Yield Frequent Itemsets: The find_frequent_itemsets function returns a generator that yields the frequent itemsets and their supports. These can be used to identify patterns in the transactions.
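
Step 4 can be sketched in isolation for the 2-itemset level. This is a simplified illustration on hypothetical TID sets, not the recursive implementation used in this notebook: it enumerates every pair with `itertools.combinations` and keeps those whose intersected TID set is large enough.

```python
from itertools import combinations

# Hypothetical TID sets for three items
toy_tidsets = {
    "burgers": {0, 1, 3},
    "eggs": {0, 1},
    "milk": {1, 2, 3},
}
toy_min_support = 2  # absolute number of transactions

frequent_pairs = {}
for (a, a_tids), (b, b_tids) in combinations(sorted(toy_tidsets.items()), 2):
    shared = a_tids & b_tids  # transactions containing both a and b
    if len(shared) >= toy_min_support:
        frequent_pairs[frozenset((a, b))] = len(shared)

for pair, support in frequent_pairs.items():
    print(sorted(pair), support)
# ['burgers', 'eggs'] 2
# ['burgers', 'milk'] 2
```

The pair {eggs, milk} is dropped because the two items share only transaction 1. A depth-first version would go on to extend each surviving pair before moving to the next item.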

In [12]:
from collections import defaultdict

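def eclat(prefix, items, dataset, min_support):
    # items: list of (item, TID set) pairs; dataset is passed along but not used here
    while items:
        i, itids = items.pop()  # remaining item with the smallest TID set
        isupp = len(itids)
        if isupp >= min_support:
            yield (frozenset(prefix + [i]), isupp)
            suffix = []
            for j, jtids in items:
                jtids = jtids & itids  # transactions containing both i and j
                if len(jtids) >= min_support:
                    suffix.append((j, jtids))
            # Depth-first: extend prefix + [i] with the items that stay frequent alongside it
            yield from eclat(prefix + [i], sorted(suffix, key=lambda item: len(item[1]), reverse=True), dataset, min_support)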

def dict_to_list(dataset):
    tid_list = defaultdict(set)
    for tid, items in dataset.items():
        for item in items:
            tid_list[item].add(tid)
    return list(tid_list.items())

def find_frequent_itemsets(dataset, min_support):
    dataset = dict_to_list(dataset)
    dataset.sort(key=lambda item: len(item[1]), reverse=True)
    yield from eclat([], dataset, dataset, min_support)

# Convert your transactions into a dictionary format
transactions_dict = {i: transaction for i, transaction in enumerate(transactions)}

# Define the minimum support as an absolute number of transactions (not a fraction)
min_support = 2

# Find the frequent itemsets
frequent_itemsets = list(find_frequent_itemsets(transactions_dict, min_support))

# Print the frequent itemsets
for itemset, support in frequent_itemsets:
    print(f"Itemset: {set(itemset)}, Support: {support}")

Itemset: {'water spray'}, Support: 3
Itemset: {'water spray', 'shrimp'}, Support: 2
Itemset: {'napkins'}, Support: 5
Itemset: {'napkins', 'herb & pepper'}, Support: 2
Itemset: {'napkins', 'herb & pepper', 'ground beef'}, Support: 2
Itemset: {'napkins', 'grated cheese'}, Support: 2
Itemset: {'napkins', 'low fat yogurt'}, Support: 2
Itemset: {'napkins', 'low fat yogurt', 'spaghetti'}, Support: 2
Itemset: {'napkins', 'ground beef'}, Support: 2
Itemset: {'napkins', 'spaghetti'}, Support: 2
Itemset: {'napkins', 'mineral water'}, Support: 2
Itemset: {'cream'}, Support: 7
Itemset: {'herb & pepper', 'cream'}, Support: 2
Itemset: {'herb & pepper', 'cream', 'pancakes'}, Support: 2
Itemset: {'herb & pepper', 'cream', 'pancakes', 'ground beef'}, Support: 2
Itemset: {'pancakes', 'cream', 'ground beef', 'herb & pepper', 'spaghetti'}, Support: 2
Itemset: {'herb & pepper', 'cream', 'pancakes', 'spaghetti'}, Support: 2
Itemset: {'herb & pepper', 'cream', 'ground beef'}, Support: 2
Itemset: {'herb & pep