## Business problem

You own a food store and want to offer deals to your customers. To do this, you need to identify the strongest association rules between your products.

Each row of the dataset corresponds to one customer transaction and lists all the items purchased in that transaction.

In [29]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data preprocessing

We need to convert this to a list of lists, i.e. one list per transaction (row of data), with all of those lists held in an outer list. This is done by looping over every row and collecting the values from each of its columns.

In [28]:
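# The file has no header row, so every line is read as a transaction.
df = pd.read_csv('Market_Basket_Optimisation.csv', header=None)

# Build one list of item names per transaction. Short rows are padded with
# NaN by pandas; those cells are dropped so 'nan' is not treated as an item.
trans = [
    [str(item) for item in row if pd.notna(item)]
    for row in df.values
]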
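For illustration, the same list-of-lists shape can be built without pandas. The sketch below uses Python's standard `csv` module on a tiny hypothetical sample (the three rows and the names `sample_csv` and `sample_trans` are assumptions, not the real file) and skips empty cells, so short baskets are not padded with placeholder values.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "shrimp,almonds,avocado\n"
    "burgers,eggs\n"
    "chutney\n"
)

# One inner list per transaction; empty cells are skipped.
sample_trans = [
    [item for item in row if item]
    for row in csv.reader(sample_csv)
]

print(sample_trans)
```

Each inner list holds only the items that were actually purchased, which is the shape the model expects.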

## Training the Apriori model

To get our minimum support we need to do a bit of thinking. What could we define as a 'frequently purchased' product? Perhaps a product that has been purchased at least 3 times in one day?

Recall from our theory lesson that the support of an item M is the number of transactions containing M divided by the total number of transactions. So, taking our daily value of 3 and multiplying by 7 (as our data is recorded over a weekly period), our minimum support is

(3 * 7) / 7501 = 0.00279...

We can round this to 0.003.
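The arithmetic above can be checked in a couple of lines. This is only a sketch of the calculation described in the text; the variable names are illustrative.

```python
# Values taken from the text above; the variable names are illustrative.
purchases_per_day = 3
days_recorded = 7
total_transactions = 7501

min_support = purchases_per_day * days_recorded / total_transactions

print(f"{min_support:.5f}")    # exact ratio, to five decimals
print(round(min_support, 3))   # rounded threshold used for the model
```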

For our minimum confidence, we may have to try some different values and see what rules we get. For now, let's use 0.2.

The minimum lift is likewise found through trial and error and experience. A good starting point is 3, which we can adjust later.

The minimum and maximum length set how many items a rule may contain. Because we are looking at which product B was purchased together with product A, we set both to 2.
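To make the three thresholds concrete, here is a minimal pure-Python sketch of the standard definitions of support, confidence and lift. The toy `baskets` and the helper functions are hypothetical and exist only for illustration; they are not how the library computes its results internally.

```python
# Hypothetical toy baskets, used only to illustrate the definitions.
baskets = [
    {"pasta", "escalope"},
    {"pasta", "escalope", "wine"},
    {"pasta", "wine"},
    {"escalope"},
    {"wine"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(base, add):
    """Of the baskets containing `base`, the fraction that also contain `add`."""
    return support(base | add) / support(base)

def lift(base, add):
    """Confidence relative to how often `add` is bought overall."""
    return confidence(base, add) / support(add)

a, b = {"pasta"}, {"escalope"}
print(support(a | b), confidence(a, b), lift(a, b))
```

A lift above 1 means B is bought more often alongside A than it is overall, which is why a fairly high minimum lift filters out weak rules.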

In [37]:
from apyori import apriori

rules = apriori(
    transactions=trans,
    min_support=0.003,
    min_confidence=0.2,
    min_lift=3,
    min_length=2,
    max_length=2,
)

To conclude, every rule we keep involves products that are bought together in at least 0.3% of transactions. When product A (the left-hand side of the rule) is in a basket, product B (the right-hand side) is in the same basket at least 20% of the time, and the rule has a lift of at least 3.

## Displaying the first results

So what does this tell us? Let's look at the first relationship. The base product (A) is light cream and the added product (B) is chicken: if people buy light cream, there is a 29% chance (the confidence) that they will also buy chicken.

This rule appears in 0.45% of transactions (support).
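The support, confidence and lift reported for this rule are tied together by simple ratios, so the individual popularity of each product can be recovered from them. The sketch below assumes the standard definitions (confidence = support(A and B) / support(A), lift = confidence / support(B)) and uses the figures printed for the light cream and chicken rule in the results; the variable names are illustrative.

```python
# Figures copied from the first rule in the results (light cream -> chicken).
rule_support = 0.004532728969470737
rule_confidence = 0.29059829059829057
rule_lift = 4.84395061728395

# confidence = support(A and B) / support(A), rearranged for support(A)
support_light_cream = rule_support / rule_confidence

# lift = confidence / support(B), rearranged for support(B)
support_chicken = rule_confidence / rule_lift

print(f"light cream appears in {support_light_cream:.2%} of transactions")
print(f"chicken appears in {support_chicken:.2%} of transactions")
```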

In [38]:
results = list(rules)
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

## Putting the results into a DataFrame

In [58]:
def inspect(results):
    # Each result is (items, support, ordered_statistics); only the first
    # ordered statistic of each record is used.
    lhs = [tuple(result[2][0][0])[0] for result in results]  # items_base
    rhs = [tuple(result[2][0][1])[0] for result in results]  # items_add
    support = [result[1] for result in results]
    confidence = [result[2][0][2] for result in results]
    lift = [result[2][0][3] for result in results]

    return list(zip(lhs, rhs, support, confidence, lift))

df_results = pd.DataFrame(inspect(results), columns=['Product A', 'Product B', 'Support', 'Confidence', 'Lift'])
# Show the strongest rules (highest lift) first.
df_results.sort_values(by=['Lift'], ascending=False)

| | Product A | Product B | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
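The positional indexing in `inspect` (for example `result[2][0][0]`) can be hard to read. As a sketch of what each index refers to, the example below rebuilds the first rule by hand using stand-in `namedtuple` types whose field names mirror those visible in the printed results. These stand-ins are an assumption made so the example runs on its own; they are not imported from the library.

```python
from collections import namedtuple

# Stand-in types mirroring the field names visible in the printed results;
# defined here only so the sketch runs without the library installed.
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

# First rule from the results, rebuilt by hand.
record = RelationRecord(
    items=frozenset({"chicken", "light cream"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)

stat = record.ordered_statistics[0]   # same as record[2][0]
row = (
    next(iter(stat.items_base)),      # same as tuple(record[2][0][0])[0]
    next(iter(stat.items_add)),       # same as tuple(record[2][0][1])[0]
    record.support,                   # same as record[1]
    stat.confidence,                  # same as record[2][0][2]
    stat.lift,                        # same as record[2][0][3]
)
print(row)
```

Taking only the first element of each set works here because every rule has exactly one product on each side.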
