### Dataset

The Market Basket Optimisation dataset contains one row per customer transaction recorded over one week; each row lists all the items purchased in that transaction.

The goal is to generate the best recommendations for customers in order to find deals, i.e. if a person buys one product, suggest another product that is likely to be purchased with it (and make an offer on it).

## Eclat

### Import the libraries

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### Data preprocessing

In [8]:
dataset_path = "../../../../datasets/ml_az_course/009_Market_Basket_optimisation.csv"
# the CSV has no header row, so pandas assigns integer column names
df = pd.read_csv(dataset_path, header=None)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,


In [9]:
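transactions = list()

# Build one list of item strings per row of the DataFrame.
# Empty cells are NaN, so str() turns them into the string "nan".
for i in range(df.shape[0]):
    transactions.append([str(df.values[i, j]) for j in range(df.shape[1])])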

In [10]:
transactions[:1]

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil']]
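Because `str()` is applied to every cell, a basket with fewer than 20 items is padded with the literal string `"nan"` (row 1 in the table above has only three real items). The notebook keeps these placeholders. The sketch below shows one way they could be removed before mining; `drop_placeholders` is a hypothetical helper, not part of pandas or `apyori`, and the sample baskets are shortened for illustration.

```python
def drop_placeholders(transactions, placeholder="nan"):
    """Return a copy of the transactions without the placeholder strings."""
    return [
        [item for item in basket if item != placeholder]
        for basket in transactions
    ]


# Shortened sample baskets in the same shape as `transactions`
sample = [
    ["burgers", "meatballs", "eggs", "nan", "nan"],
    ["chutney", "nan", "nan", "nan", "nan"],
]

cleaned = drop_placeholders(sample)
print(cleaned)
```

The number of transactions stays the same, so the support of any set of real items is unchanged by this cleanup.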

### Training the Apriori model on the dataset

In [11]:
from apyori import apriori

In [12]:
rules = apriori(
    transactions=transactions,
    min_support=0.003,
    min_confidence=0.2,
    min_lift=3,
    min_length=2, # number of items per rule (min and max are both 2, so only pairs)
    max_length=2,
)
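The three thresholds above (`min_support`, `min_confidence`, `min_lift`) refer to the standard association-rule metrics. As a worked illustration, the sketch below computes them by hand on a small made-up list of baskets; the toy data and the helper functions `support`, `confidence` and `lift` are assumptions for illustration and are not part of `apyori`.

```python
# Toy baskets, invented for illustration only
toy = [
    {"pasta", "escalope"},
    {"pasta", "escalope"},
    {"pasta"},
    {"pasta"},
    {"shrimp"},
    {"chicken"},
    {"eggs"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(base, add, transactions):
    """Among transactions containing `base`, the fraction that also contain `add`."""
    return support(base | add, transactions) / support(base, transactions)


def lift(base, add, transactions):
    """Confidence of base -> add divided by the overall support of `add`."""
    return confidence(base, add, transactions) / support(add, transactions)


pair_support = support({"pasta", "escalope"}, toy)
pair_confidence = confidence({"pasta"}, {"escalope"}, toy)
pair_lift = lift({"pasta"}, {"escalope"}, toy)
print(pair_support, pair_confidence, pair_lift)
```

Here the pair appears in 2 of 8 baskets (support 0.25), half of the pasta baskets also contain escalope (confidence 0.5), and that is twice the overall share of escalope baskets (lift 2.0).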

In [13]:
rules

<generator object apriori at 0x7f42a893d820>

### Visualising the results

#### Displaying the first results coming directly from the output of the apriori function

In [14]:
results = list(rules)
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

#### Organizing the results into a DataFrame

In [15]:
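def inspect(results):
    """Return a (base item, added item, support) tuple for each RelationRecord."""
    lhs = list()
    rhs = list()
    supports = list()

    for result in results:
        # first OrderedStatistic: (items_base, items_add, confidence, lift)
        data_result = result[2][0]

        # in the results shown above, each side is a one-item frozenset
        lhs.append(tuple(data_result[0])[0])
        rhs.append(tuple(data_result[1])[0])
        supports.append(result[1])  # support of the whole item set

    return list(zip(lhs, rhs, supports))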

#### Displaying the results unsorted

In [16]:
results_df = pd.DataFrame(
    inspect(results), columns=["product_1", "product_2", "support"]
)
results_df

Unnamed: 0,product_1,product_2,support
0,light cream,chicken,0.004533
1,mushroom cream sauce,escalope,0.005733
2,pasta,escalope,0.005866
3,fromage blanc,honey,0.003333
4,herb & pepper,ground beef,0.015998
5,tomato sauce,ground beef,0.005333
6,light cream,olive oil,0.0032
7,whole wheat pasta,olive oil,0.007999
8,pasta,shrimp,0.005066


#### Displaying the results sorted by descending support

In [17]:
results_df.sort_values(by="support", ascending=False)

Unnamed: 0,product_1,product_2,support
4,herb & pepper,ground beef,0.015998
7,whole wheat pasta,olive oil,0.007999
2,pasta,escalope,0.005866
1,mushroom cream sauce,escalope,0.005733
5,tomato sauce,ground beef,0.005333
8,pasta,shrimp,0.005066
0,light cream,chicken,0.004533
3,fromage blanc,honey,0.003333
6,light cream,olive oil,0.0032


**Note**

In this notebook, the Eclat analysis reuses the same `apriori` call as the Apriori analysis. The difference is in the output: to rank the item sets, Eclat uses only the support metric, so confidence and lift are not reported.
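For comparison, the textbook Eclat algorithm obtains support from a vertical layout: each item is mapped to the set of transaction ids that contain it, and the support of a pair is the size of the intersection of two such sets. The sketch below is a minimal version limited to pairs, under that assumption; `eclat_pairs` is a hypothetical helper that is not part of `apyori`, and the toy baskets are invented.

```python
from collections import defaultdict
from itertools import combinations


def eclat_pairs(transactions, min_support):
    """Return (item_a, item_b, support) for frequent pairs, highest support first."""
    n = len(transactions)

    # Vertical layout: item -> set of ids of the transactions containing it
    tidsets = defaultdict(set)
    for tid, basket in enumerate(transactions):
        for item in set(basket):
            tidsets[item].add(tid)

    # A pair can only be frequent if both of its items are frequent
    frequent = {
        item: tids for item, tids in tidsets.items() if len(tids) / n >= min_support
    }

    pairs = []
    for a, b in combinations(sorted(frequent), 2):
        # Support of the pair = share of transactions in both tid sets
        pair_support = len(frequent[a] & frequent[b]) / n
        if pair_support >= min_support:
            pairs.append((a, b, pair_support))

    return sorted(pairs, key=lambda row: row[2], reverse=True)


# Toy baskets, invented for illustration only
toy_baskets = [
    ["pasta", "escalope"],
    ["pasta", "escalope"],
    ["pasta", "shrimp"],
    ["pasta"],
    ["shrimp"],
    ["chicken"],
    ["eggs"],
    ["milk"],
]

frequent_pairs = eclat_pairs(toy_baskets, min_support=0.25)
print(frequent_pairs)
```

Only support is computed and used for ranking, which matches the support-only table shown above.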