<a href="https://colab.research.google.com/github/aleksanderprofic/Machine-Learning/blob/master/AssociationRuleLearning/Eclat/market_basket_optimisation_eclat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Eclat

## Importing the libraries

In [1]:
!pip install apyori



In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [3]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header=None)
dataset.head()

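| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs | | | | | | | | | | | | | | | | | |
| 2 | chutney | | | | | | | | | | | | | | | | | | | |
| 3 | turkey | avocado | | | | | | | | | | | | | | | | | | |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea | | | | | | | | | | | | | | | |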


### Creating a list of transactions, where each transaction is the list of products one customer bought

In [4]:
transactions = [row[row.notnull()].to_list() for _, row in dataset.iterrows()]

## Training the Eclat model on the dataset

In [5]:
from apyori import apriori
# Eclat ranks itemsets by support alone; this notebook runs apyori's apriori and keeps only the support afterwards.
# min_support: minimum fraction of all transactions that must contain the itemset.
#   Here 0.003 ≈ 3 x 7 / 7501: bought 3 times a day, over 7 days, out of the 7501 transactions in a week.
# min_confidence: minimum fraction of the transactions containing product A that also contain product B.
#   A common rule of thumb is to start at 0.8 (product B appears in 80% of the baskets with product A)
#   and halve it when that yields too few results; 0.2 works well here.
# min_lift: a lift of at least 3 is a good starting point.
# max_length=2 restricts the search to pairs of products.
rules = apriori(transactions=transactions, min_support=0.003, min_confidence=0.2, min_lift=3, max_length=2)
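
The three thresholds above are easier to read once the metrics are computed by hand. The following is a minimal sketch on a made-up list of transactions; the `toy` data and the helper names `support`, `confidence` and `lift` are illustrative assumptions, not part of apyori or of the dataset.

```python
# Toy transactions, made up for illustration (not taken from the dataset).
toy = [
    {"pasta", "shrimp"},
    {"pasta", "shrimp", "milk"},
    {"pasta", "milk"},
    {"milk"},
    {"eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(base, add, transactions):
    """Support of base and add together, divided by the support of base."""
    return support(set(base) | set(add), transactions) / support(base, transactions)

def lift(base, add, transactions):
    """Confidence of base -> add, divided by the support of add."""
    return confidence(base, add, transactions) / support(add, transactions)

print(support({"pasta", "shrimp"}, toy))       # 2 of the 5 transactions
print(confidence({"pasta"}, {"shrimp"}, toy))  # 2 of the 3 transactions with pasta
print(lift({"pasta"}, {"shrimp"}, toy))        # confidence relative to how common shrimp is
```

A lift above 1 means the two products appear together more often than they would if they were bought independently, which is why a high `min_lift` keeps only the strongest associations.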

## Visualising the results

### Collecting the results from the output of the apriori function

In [6]:
results = list(rules)

### Organising the results into a pandas DataFrame

In [7]:
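def extract_data_from_results(results):
    """Return [product 1, product 2, support] for every result."""
    extracted_data = []
    for result in results:
        # First rule of this itemset that passed the confidence and lift thresholds
        result_statistics = result.ordered_statistics[0]

        # With max_length=2 and min_lift above 1, each side of the rule holds one product
        product1 = tuple(result_statistics.items_base)[0]
        product2 = tuple(result_statistics.items_add)[0]
        support = result.support

        extracted_data.append([product1, product2, support])
    return extracted_data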

df = pd.DataFrame(extract_data_from_results(results), columns=['Product 1', 'Product 2', 'Support'])

### Displaying the results sorted by descending supports

In [8]:
df.sort_values(by='Support', axis=0, ascending=False, inplace=True, ignore_index=True)
df

| | Product 1 | Product 2 | Support |
|---|---|---|---|
| 0 | herb & pepper | ground beef | 0.015998 |
| 1 | whole wheat pasta | olive oil | 0.007999 |
| 2 | pasta | escalope | 0.005866 |
| 3 | mushroom cream sauce | escalope | 0.005733 |
| 4 | tomato sauce | ground beef | 0.005333 |
| 5 | pasta | shrimp | 0.005066 |
| 6 | light cream | chicken | 0.004533 |
| 7 | fromage blanc | honey | 0.003333 |
| 8 | light cream | olive oil | 0.0032 |


These are the 9 pairs of products that passed the thresholds, ordered from most to least frequently bought together.
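
For comparison, Eclat itself is usually described in terms of a vertical data layout: each item maps to the set of transaction ids that contain it, and the support of a pair is the size of the intersection of two such sets. The sketch below illustrates that idea for pairs only. The function name `eclat_pairs` and the `toy` data are assumptions made for illustration; this is not what apyori does internally, and it filters on support alone, without the confidence and lift thresholds used above.

```python
from itertools import combinations

def eclat_pairs(transactions, min_support):
    """Return {frozenset pair: support} for every pair meeting min_support."""
    n = len(transactions)

    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # An infrequent item cannot be part of a frequent pair, so prune it first.
    frequent = {item: tids for item, tids in tidsets.items()
                if len(tids) / n >= min_support}

    pairs = {}
    for a, b in combinations(sorted(frequent), 2):
        # The pair occurs exactly in the transactions both items share.
        common = frequent[a] & frequent[b]
        if len(common) / n >= min_support:
            pairs[frozenset((a, b))] = len(common) / n
    return pairs

# Toy transactions, made up for illustration (not taken from the dataset).
toy = [["pasta", "shrimp"], ["pasta", "shrimp", "milk"],
       ["pasta", "milk"], ["milk"], ["eggs"]]

for pair, pair_support in sorted(eclat_pairs(toy, 0.4).items(), key=lambda kv: -kv[1]):
    print(sorted(pair), pair_support)
```

Calling `eclat_pairs(transactions, 0.003)` on the list built earlier should return every pair that meets the support threshold, which is a larger set than the table above because no confidence or lift filter is applied.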