Market Basket Analysis is a type of frequent itemset mining which analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. The discovery of these associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers.

With Market Basket Analysis, customers' buying patterns are represented as “association rules” of the form A → B (“customers who buy A also tend to buy B”). The interestingness of a rule is usually measured with two metrics, support and confidence; this notebook also filters rules by lift.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Why use the apyori package?


1.   It consists of a single file with no dependencies.
2.   It can be used as an API.
3.   It supports JSON output.



In [2]:
!pip install apyori



The dataset is publicly available on Kaggle:
https://www.kaggle.com/hemanthkumar05/market-basket-optimization

In [3]:
# header=None: the file has no header row, so the first line is read as a transaction
dataset = pd.read_csv("./sample_data/Market_Basket_Optimisation.csv", header=None)
dataset.head()

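|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | shrimp | almonds | avocado | vegetables mix | green grapes | whole weat flour | yams | cottage cheese | energy drink | tomato juice | low fat yogurt | green tea | honey | salad | mineral water | salmon | antioxydant juice | frozen smoothie | spinach | olive oil |
| 1 | burgers | meatballs | eggs | | | | | | | | | | | | | | | | | |
| 2 | chutney | | | | | | | | | | | | | | | | | | | |
| 3 | turkey | avocado | | | | | | | | | | | | | | | | | | |
| 4 | mineral water | milk | energy bar | whole wheat rice | green tea | | | | | | | | | | | | | | | |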


In [4]:
# apyori expects a list of lists: one list of item names per transaction
transcations = []

In [5]:
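# Empty cells are read as NaN, and str(NaN) is 'nan', so shorter baskets
# end up padded with a placeholder 'nan' item.
for i in range(len(dataset)):
    transcations.append([str(dataset.values[i, j]) for j in range(dataset.shape[1])])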

transcations[0]

['shrimp',
 'almonds',
 'avocado',
 'vegetables mix',
 'green grapes',
 'whole weat flour',
 'yams',
 'cottage cheese',
 'energy drink',
 'tomato juice',
 'low fat yogurt',
 'green tea',
 'honey',
 'salad',
 'mineral water',
 'salmon',
 'antioxydant juice',
 'frozen smoothie',
 'spinach',
 'olive oil']
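Because of that `'nan'` padding, most transactions carry a fake item. A minimal sketch of a NaN-free alternative, assuming each row comes from `dataset.values` (an object array whose empty cells are float NaN); the helper name `row_to_items` and the toy rows are hypothetical. Rules mined from the cleaned lists may differ from the counts shown in this notebook.

```python
import math


def row_to_items(row):
    """Return the item names in one wide-format CSV row.

    Missing cells (float NaN or None) are dropped, as are blank strings.
    """
    items = []
    for value in row:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # empty basket slot, not a real product
        name = str(value).strip()
        if name:
            items.append(name)
    return items


# Toy rows standing in for dataset.values[i] (hypothetical data).
rows = [
    ["burgers", "meatballs", "eggs", float("nan"), float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan"), float("nan")],
]
clean_transactions = [row_to_items(row) for row in rows]
print(clean_transactions)  # [['burgers', 'meatballs', 'eggs'], ['chutney']]
```

With the real DataFrame this would be `[row_to_items(row) for row in dataset.values]`.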

In [6]:
print("Total number of transactions:", len(transcations))

Total number of transactions: 7501


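Support: the fraction of all transactions under analysis that contain both A and B, i.e. the rule's relative frequency.

Confidence: the fraction of transactions containing A that also contain B, i.e. an estimate of P(B | A) and a measure of the rule's reliability (think of Amazon's "customers who bought this also bought" recommendations).

Lift: confidence divided by the support of B. A lift of 1 means buying A says nothing about buying B; a lift above 1 means B is bought more often alongside A than it is overall.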
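The three metrics can be computed by hand on a few baskets. This is a from-scratch sketch on hypothetical toy data (not apyori code): support is the share of baskets containing an itemset, confidence is support(A and B together) / support(A), and lift is confidence divided by support(B). `Fraction` keeps the results exact.

```python
from fractions import Fraction


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return Fraction(hits, len(transactions))


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of A -> B divided by the baseline support of B."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical toy baskets.
baskets = [
    ["pasta", "olive oil", "tomatoes"],
    ["pasta", "olive oil"],
    ["milk", "eggs"],
    ["pasta", "eggs"],
    ["olive oil", "salad"],
]
print(support(baskets, {"pasta", "olive oil"}))       # 2/5
print(confidence(baskets, {"pasta"}, {"olive oil"}))  # 2/3
print(lift(baskets, {"pasta"}, {"olive oil"}))        # 10/9
```

Here 2 of 5 baskets contain both items, 2 of the 3 pasta baskets contain olive oil, and olive oil alone appears in 3 of 5 baskets, so the lift is (2/3) / (3/5) = 10/9.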

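For example, the thresholds used below are:

min_support = 0.003: assuming the 7,501 transactions cover one week, an item bought at least 3 times a day appears in about 3 × 7 = 21 transactions, and 21 / 7501 ≈ 0.0028, rounded up to 0.003.

min_confidence = 0.2, i.e. 20%: a rule A → B is kept only if at least 20% of the transactions containing A also contain B.

min_lift = 3: B must be at least 3 times as likely in baskets containing A as in baskets overall.

min_length = 2, i.e. at least 2 different products per rule. Note that apyori's documented keyword arguments are min_support, min_confidence, min_lift and max_length, so min_length does not appear to be read. Single-item results are dropped anyway here, because apyori gives them a lift of 1, below min_lift = 3.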
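The support threshold arithmetic can be checked in a few lines. This assumes, as the `3x7` suggests, that the data spans 7 days and that an item of interest is bought about 3 times a day. It also shows how many baskets an itemset must appear in to clear `min_support = 0.003`.

```python
import math

n_transactions = 7501
purchases_per_day = 3  # assumption behind the "3x7" in the text
days = 7               # assumption: the transactions cover one week

raw_support = purchases_per_day * days / n_transactions
print(round(raw_support, 4))  # 0.0028, rounded up to 0.003 in the notebook

min_support = 0.003
# Smallest number of transactions whose relative frequency reaches min_support.
min_count = math.ceil(min_support * n_transactions)
print(min_count)  # 23
```

So an itemset needs to appear in at least 23 of the 7,501 baskets to survive the support filter.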

In [7]:
from apyori import apriori

# apriori returns a lazy generator of results; list(...) below materializes it
rules = apriori(transcations, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)
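`apriori` hides the search itself. The sketch below illustrates the level-wise Apriori idea from scratch: count candidate itemsets, keep the frequent ones, and build the next level only from frequent subsets, since any superset of an infrequent itemset must also be infrequent. It is not apyori's implementation; the function name `frequent_itemsets` and the toy baskets are hypothetical, and it stops at frequent itemsets rather than generating rules.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {frozenset(items): support} for frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def count(candidates):
        # Support of each candidate = fraction of baskets containing all its items.
        return {c: sum(1 for b in baskets if c <= b) / n for c in candidates}

    # Level 1: every distinct item is a candidate.
    singles = {frozenset([item]) for b in baskets for item in b}
    level = {c: s for c, s in count(singles).items() if s >= min_support}
    result = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: union pairs of frequent (k-1)-itemsets that give exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: s for c, s in count(candidates).items() if s >= min_support}
        result.update(level)
        k += 1
    return result


baskets = [
    ["pasta", "olive oil", "tomatoes"],
    ["pasta", "olive oil"],
    ["milk", "eggs"],
    ["pasta", "eggs"],
    ["olive oil", "salad"],
]
for itemset, s in sorted(frequent_itemsets(baskets, 0.4).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

The pruning step is what keeps Apriori tractable: candidates are only counted once all of their smaller subsets are known to be frequent.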

#### Inspecting the rules

In [8]:
results = list(rules)

In [9]:
clean_results = []
for record in results:
    # Each record has .items, .support and .ordered_statistics; take the first rule direction
    stat = record.ordered_statistics[0]
    clean_results.append(f"RULE: {record.items}   SUPPORT:{record.support}"
                         f"   CONFIDENCE:{stat.confidence}   LIFT:{stat.lift}")
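Taking only `ordered_statistics[0]` keeps one rule direction per itemset, so a printed pair does not say which item is the antecedent. The sketch below prints every direction as `A -> B`, sorted by lift. To stay runnable without apyori, it defines stand-in namedtuples mirroring the field names of apyori's `RelationRecord` (`items`, `support`, `ordered_statistics`) and `OrderedStatistic` (`items_base`, `items_add`, `confidence`, `lift`); with the real `results` list you would call `describe_rules(results)` directly. The records and item names are hypothetical.

```python
from collections import namedtuple

# Stand-ins mirroring the field names of apyori's result records.
RelationRecord = namedtuple("RelationRecord", ("items", "support", "ordered_statistics"))
OrderedStatistic = namedtuple("OrderedStatistic", ("items_base", "items_add", "confidence", "lift"))


def describe_rules(records):
    """Flatten records into (antecedent, consequent, support, confidence, lift), highest lift first."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append((sorted(stat.items_base), sorted(stat.items_add),
                         record.support, stat.confidence, stat.lift))
    return sorted(rows, key=lambda row: row[4], reverse=True)


# Hypothetical records shaped like list(apriori(...)) output.
records = [
    RelationRecord(frozenset({"A", "B"}), 0.01, [
        OrderedStatistic(frozenset({"A"}), frozenset({"B"}), 0.30, 7.5),
        OrderedStatistic(frozenset({"B"}), frozenset({"A"}), 0.25, 7.5),
    ]),
    RelationRecord(frozenset({"C", "D"}), 0.02, [
        OrderedStatistic(frozenset({"D"}), frozenset({"C"}), 0.40, 5.0),
    ]),
]
for base, add, sup, conf, lift in describe_rules(records):
    print(f"{base} -> {add}  support={sup:.4f}  confidence={conf:.3f}  lift={lift:.2f}")
```

For a two-item set both directions share the same lift, since lift is symmetric; only confidence tells them apart.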

In [10]:
print("Number of rules generated : ", len(clean_results))

Number of rules generated :  160


In [11]:
clean_results[7]

"RULE: frozenset({'olive oil', 'whole wheat pasta'})   SUPPORT:0.007998933475536596   CONFIDENCE:0.2714932126696833   LIFT:4.122410097642296"
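The printed numbers can be turned back into basket counts, which makes the rule easier to read. This only combines the definitions of support, confidence and lift with the transaction count of 7,501 printed earlier; the counts are derived from the printed floats, not read from the dataset.

```python
n = 7501
support = 0.007998933475536596   # share of baskets containing both items
confidence = 0.2714932126696833  # P(right-hand side | left-hand side)
lift = 4.122410097642296         # confidence / P(right-hand side)

both = support * n                     # baskets containing both items
antecedent = support / confidence * n  # baskets containing the rule's left-hand side
consequent = confidence / lift * n     # baskets containing the right-hand side
print(round(both), round(antecedent), round(consequent))  # 60 221 494
```

So 60 baskets contain both items; 60 of the 221 baskets holding the left-hand item (about 27%) also hold the right-hand item, which on its own appears in 494 of 7,501 baskets (about 6.6%). Counting items, for instance with `pd.Series(dataset.values.ravel()).value_counts()`, shows which item is which.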

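This rule pairs olive oil with whole wheat pasta: with a lift of about 4.1, the two are bought together roughly four times as often as they would be if purchased independently. That is intuitive, since customers use olive oil when making pasta. (Adding olive oil to boiling pasta water mainly helps keep the water from boiling over; it is not what keeps the noodles from sticking together.)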

There are lots of modern packages and tools for such analysis. I encourage you to explore and apply those packages. Thanks!