# Market Basket Analysis

I will work with data from a Retail company that sells groceries.

The **goal** of this project is to support the company's decision-making by finding the best two-product deals it should offer to its customers, based on transaction data.

The deals will take the form: "Buy *product A* and get *product B* for free".

## Import the Libraries

In [1]:
import numpy as np
import pandas as pd

In [2]:
!pip install apyori



In [3]:
from apyori import apriori

## Get the data

The data contains transaction information.
* Each row lists the products that were bought in a single transaction.
* There are 7,501 transactions in total (one per row, since the file has no header row).

In [12]:
dataset = pd.read_csv('data/Market_Basket.csv',header=None)
dataset.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


## Data Preprocessing

In [13]:
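# Build a list of transactions, one list of product names per row.
# Rows with fewer than 20 products are padded with NaN, so missing cells are skipped
# (otherwise the string 'nan' would be treated as a product).
transactions = []
for i in range(dataset.shape[0]):
    transactions.append([str(item) for item in dataset.iloc[i] if pd.notna(item)])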

## Train the Association Rule Learning model

In [5]:
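# Keep only two-product rules that are frequent enough (support), reliable enough
# (confidence) and much stronger than chance (lift).
rules = apriori(transactions=transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2, max_length=2)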
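
The three thresholds passed to `apriori` above can be easier to read with a small worked example. The sketch below computes support, confidence and lift by hand, using their standard definitions, on a handful of made-up baskets. The toy transactions and the helper names `support`, `confidence` and `lift` are assumptions for illustration only; they are not part of apyori or of the dataset.

```python
# Toy transactions (hypothetical, not taken from Market_Basket.csv)
toy_transactions = [
    {"pasta", "shrimp", "olive oil"},
    {"pasta", "shrimp"},
    {"pasta", "milk"},
    {"milk", "eggs"},
    {"eggs", "olive oil"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(base, add, transactions):
    """Share of transactions containing `base` that also contain `add`."""
    return support(set(base) | set(add), transactions) / support(base, transactions)

def lift(base, add, transactions):
    """Confidence of (base --> add) divided by the support of `add` alone."""
    return confidence(base, add, transactions) / support(add, transactions)

# Metrics for the rule (pasta --> shrimp) on the toy data
pair_support = support({"pasta", "shrimp"}, toy_transactions)
pair_confidence = confidence({"pasta"}, {"shrimp"}, toy_transactions)
pair_lift = lift({"pasta"}, {"shrimp"}, toy_transactions)
print(pair_support, pair_confidence, pair_lift)
```

A lift above 1 means the two products appear together more often than they would if they were bought independently, which is why a high `min_lift` keeps only the strongest pairs.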

## Visualize the results

Display the raw results returned by the `apriori` function:

In [6]:
results = list(rules) # a rule (A --> B) is represented as: A = items_base and B = items_add
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

As we can see, the results are not presented in a readable way.

Thus, for convenience, we will create a DataFrame with them.

In [7]:
def inspect(results):
    lhs         = [tuple(result[2][0][0])[0] for result in results] #get the product from the left-hand side of the rule
    rhs         = [tuple(result[2][0][1])[0] for result in results] #get the product from the right-hand side of the rule
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
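
The `inspect` function reads each record by position (`result[2][0][0]`, and so on) and keeps only the first rule stored in each record. As a possible alternative, the sketch below reads the same values by the field names that appear in the printed output (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`) and emits one row per rule. The two namedtuples are stand-ins defined here so the example runs on its own, and `flatten_rules` is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-ins that mirror the field names visible in the printed output above;
# with apyori installed, the records returned by apriori() would be used instead.
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")

def flatten_rules(records):
    """Return one dict per (items_base --> items_add) rule, read by field name."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "Left Hand Side": ", ".join(sorted(stat.items_base)),
                "Right Hand Side": ", ".join(sorted(stat.items_add)),
                "Support": record.support,
                "Confidence": stat.confidence,
                "Lift": stat.lift,
            })
    return rows

# Example record copied from the first entry of the output above
example = [RelationRecord(
    items=frozenset({"chicken", "light cream"}),
    support=0.004532728969470737,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({"light cream"}),
        items_add=frozenset({"chicken"}),
        confidence=0.29059829059829057,
        lift=4.84395061728395,
    )],
)]
rows = flatten_rules(example)
print(rows)
```

Since the result is a list of dictionaries, `pd.DataFrame(flatten_rules(results))` should give a table with the same columns as the one built by `inspect`.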

In [8]:
resultsinDataFrame

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
0,light cream,chicken,0.004533,0.290598,4.843951
1,mushroom cream sauce,escalope,0.005733,0.300699,3.790833
2,pasta,escalope,0.005866,0.372881,4.700812
3,fromage blanc,honey,0.003333,0.245098,5.164271
4,herb & pepper,ground beef,0.015998,0.32345,3.291994
5,tomato sauce,ground beef,0.005333,0.377358,3.840659
6,light cream,olive oil,0.0032,0.205128,3.11471
7,whole wheat pasta,olive oil,0.007999,0.271493,4.12241
8,pasta,shrimp,0.005066,0.322034,4.506672


Finally, let's display the results sorted by lift in descending order:

In [9]:
resultsinDataFrame.sort_values(by = 'Lift',ascending=False)

Unnamed: 0,Left Hand Side,Right Hand Side,Support,Confidence,Lift
3,fromage blanc,honey,0.003333,0.245098,5.164271
0,light cream,chicken,0.004533,0.290598,4.843951
2,pasta,escalope,0.005866,0.372881,4.700812
8,pasta,shrimp,0.005066,0.322034,4.506672
7,whole wheat pasta,olive oil,0.007999,0.271493,4.12241
5,tomato sauce,ground beef,0.005333,0.377358,3.840659
1,mushroom cream sauce,escalope,0.005733,0.300699,3.790833
4,herb & pepper,ground beef,0.015998,0.32345,3.291994
6,light cream,olive oil,0.0032,0.205128,3.11471


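Overall, the retail company should base its deals on these nine product pairs (one deal per pair), since every pair has a lift above 3, meaning the two products are bought together at least three times more often than would be expected if they were purchased independently.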

E.g., buy *fromage blanc* and get *honey* for free!
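
To make the top row of the sorted table more concrete, the short sketch below rearranges the standard definition of lift (confidence divided by the support of the right-hand product) to recover how often honey is bought overall. The numbers are copied from the table; the variable names are illustrative only.

```python
# Values copied from the top row of the sorted table (fromage blanc --> honey)
confidence = 0.245098   # share of fromage blanc baskets that also contain honey
lift = 5.164271         # confidence divided by the overall share of baskets with honey

# lift = confidence / support(honey)  =>  support(honey) = confidence / lift
honey_support = confidence / lift

print(f"Honey appears in about {honey_support:.1%} of all baskets,")
print(f"but in about {confidence:.1%} of baskets that contain fromage blanc.")
```

In other words, a customer who buys fromage blanc is roughly five times more likely to buy honey than an average customer, which is what makes this pair a natural candidate for a deal.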