Market basket analysis (MBA), also known as association-rule mining, is a useful method of discovering customer
purchasing patterns by extracting associations or co-occurrences from stores' transactional databases (Chen et al., 2005). 
It is a modelling technique based upon the theory that if you buy a certain group of items, 
you are more (or less) likely to buy another group of items.

For example, if you are in an FMCG retail store and you buy a loaf of Bread, you are more likely to buy a packet of Butter at the same time than somebody who didn't buy Bread. As another example, if you are buying a XiaoMi Power Bank in an online store, you are more likely to also buy a carrying case to go with it. Amazon knows this well from
the transaction data of its millions of customers and thus recommends a case to you.


The set of items a customer buys is known as an itemset, and MBA tries to identify relationships among the items that appear together in itemsets. The output of MBA consists of a series of product association rules. From the transaction data extracted from the shopping carts of online retailers or the point-of-sale systems of retail stores, we can use MBA to extract interesting association rules between products.
For example, if customers buy product A they also tend to buy product B.

In this example, if customers buy Bread they also tend to buy Butter. Some people link products with high association to "complementary goods". In Economics 101, a complementary good or service is one that is consumed or used in conjunction with another good or service. Usually, the complementary good has little to no value when consumed alone, but when combined with another good or service, it adds to the overall value of the offering. Take a car and petrol, for example: it would be of little value to buy petrol without owning a car. Complementary goods often have a negative cross-price elasticity of demand (Farnham, 2014).

However, it is worth pointing out that, while complementary goods tend to have high association, not all products with high association are complementary goods. In MBA, we are more interested in product pairs with high association, i.e. products that are frequently purchased together. For example, in a retail store, MBA findings may show that Barbie dolls and candy are frequently purchased together, even though they are not technically complementary goods. In short, complementary goods are fairly obvious and common sense, but MBA seeks to uncover product associations that may not be so obvious and straightforward. In doing so, it attempts to convert abstract consumer tastes and preferences into association rules that are more insightful and actionable from a business perspective.

Case Study
For simplicity we are analyzing only 2 items – Bread and Butter. We want to know if there is any evidence that suggests that buying Bread leads to buying Butter.

Problem Statement: Does the purchase of Bread lead to the purchase of Butter?

Hypothesis: There is significant evidence to show that buying Bread leads to buying Butter.

Bread => Butter

Antecedent => Consequent

Let's take the example of an FMCG retailer that generates 1,000 transactions monthly, of which Bread was purchased in 150 transactions, Butter in 130 transactions, and both together in 50 transactions.

In set theory it can be represented as Bread only – 100, Butter only – 80, Bread and Butter – 50, as shown in the Venn diagram below:

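[Figure: Venn diagram of the transactions, showing Bread only (100), Butter only (80), and Bread and Butter (50)]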
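The region counts follow from subtracting the overlap from each total. A quick check in Python, using only the figures from the case study (the variable names are illustrative):

```python
# Figures from the case study.
total = 1000   # monthly transactions
bread = 150    # transactions containing Bread
butter = 130   # transactions containing Butter
both = 50      # transactions containing both Bread and Butter

bread_only = bread - both          # Bread without Butter
butter_only = butter - both        # Butter without Bread
either = bread + butter - both     # inclusion-exclusion: at least one of the two
neither = total - either           # transactions with neither item

print(bread_only, butter_only, either, neither)  # 100 80 230 770
```

The 770 transactions that contain neither item do not appear inside the circles, but they still count toward the total used in the metrics that follow.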


Analysis and Findings
We can use MBA to extract the association rule between Bread and Butter. There are three metrics or criteria to evaluate the strength or quality of an association rule, which are support, confidence and lift.

1. Support
Support measures the percentage of transactions containing a particular combination of items relative to the total number of transactions. In our example, this is the percentage of transactions where both Bread and Butter are bought together. We calculate this to know whether the combination of items is significant or negligible. Generally, we want a high percentage, i.e. high support, to make sure the relationship is useful. Typically, we set a threshold; for example, we only look at a combination if more than 1% of transactions contain it.

Support (antecedent (Bread) and consequent (Butter)) = Number of transactions having both items / Total transactions

Support (Bread => Butter) = 50 / 1,000 = 5%

Result: The support value of 5% means 5% of all transactions have this combination of Bread and Butter bought together. Since the value is above the threshold of 1%, there is indeed support for this association, and it thus satisfies the first criterion.
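To see how support is counted from raw baskets rather than from summary figures, here is a minimal sketch. The transaction list is synthetic, built only to reproduce the case-study counts; the `support` helper and the empty "other" baskets are assumptions made for illustration.

```python
# Synthetic baskets matching the case study: 50 with both items, 100 with Bread
# only, 80 with Butter only, and 770 placeholder baskets containing neither.
transactions = (
    [{"Bread", "Butter"}] * 50
    + [{"Bread"}] * 100
    + [{"Butter"}] * 80
    + [set()] * 770
)

def support(baskets, itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

min_support = 0.01  # the 1% threshold used in the text
pair_support = support(transactions, {"Bread", "Butter"})

print(pair_support)                 # 0.05
print(pair_support >= min_support)  # True
```

Counting with the subset test `itemset <= basket` works for itemsets of any size, not only pairs.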



2. Confidence
Confidence measures the probability of finding a particular combination of items whenever the antecedent is bought. In probability terms, confidence is the conditional probability of the consequent given the antecedent and is written P (consequent | antecedent). In our example, it is the probability of both Bread and Butter being bought together whenever Bread is bought. Typically, we set a threshold; say we want this combination to occur at least 25% of the time when Bread is bought.

Confidence (antecedent i.e. Bread and consequent i.e. Butter) = P (Consequent (Butter) is bought GIVEN antecedent (Bread) is bought)

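Confidence (Bread => Butter) = Number of transactions having both items / Number of transactions having Bread = 50 / 150 = 33.3%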

Result: The confidence value of 33.3% is above the threshold of 25%, meaning Butter is bought in about one of every three transactions that include Bread. The rule thus satisfies the second criterion.
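Unlike support, confidence depends on which item is treated as the antecedent. The short sketch below uses the case-study counts to compute it in both directions; the `confidence` helper is a name introduced here for illustration.

```python
def confidence(n_both, n_antecedent):
    """Share of antecedent transactions that also contain the consequent."""
    return n_both / n_antecedent

# Case-study counts: 50 transactions with both, 150 with Bread, 130 with Butter.
bread_to_butter = confidence(n_both=50, n_antecedent=150)
butter_to_bread = confidence(n_both=50, n_antecedent=130)

print(f"Bread => Butter: {bread_to_butter:.3f}")  # 0.333
print(f"Butter => Bread: {butter_to_bread:.3f}")  # 0.385
```

Both directions clear the 25% threshold here, but the two values differ because Bread and Butter appear in different numbers of transactions.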

3. Lift
Lift measures how much the purchase of the antecedent influences the purchase of the consequent.
In our example, we want to know whether the purchase of Butter is independent of the purchase of Bread, or whether Butter is bought more often when Bread is bought. In probability terms, we want to know which is higher, P (Butter) or P (Butter | Bread). If the purchase of Butter is influenced by the purchase of Bread, then P (Butter | Bread) will be higher than P (Butter); in other words, the ratio of P (Butter | Bread) to P (Butter) will be greater than 1.

Lift (Bread => Butter) = Confidence (Bread => Butter) / Support (Butter) = 33.3% / 13% = 2.56

Result: The lift value of 2.56 is greater than 1, which shows that the purchase of Butter is not independent of the purchase of Bread. It also means that a customer who buys Bread is 2.56 times as likely to buy Butter as customers in general.
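Another way to read lift is as the ratio of observed co-purchases to the number we would expect if the two items were bought independently. A small sketch with the case-study counts (variable names are illustrative):

```python
total, bread, butter, both = 1000, 150, 130, 50

p_bread = bread / total     # 0.15
p_butter = butter / total   # 0.13
p_both = both / total       # 0.05

# Under independence, P(Bread and Butter) would equal P(Bread) * P(Butter).
expected_both = total * p_bread * p_butter
lift = p_both / (p_bread * p_butter)

print(f"Expected joint purchases if independent: {expected_both:.1f}")  # 19.5
print(f"Observed joint purchases: {both}")                              # 50
print(f"Lift: {lift:.2f}")                                              # 2.56
```

Written this way, the formula is symmetric in the two items, so the lift of Butter => Bread is also 2.56 even though the two confidence values differ.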



Introduction
The goal of market basket analysis is to discover items that are most likely to be purchased together.
Such discoveries are usually made by mining historical transaction data. This discipline is very important for e-commerce websites and retail stores, as it allows them to better organize their layouts.

In this workshop, you will use the customers' purchase history of a grocery store to cluster items that should be placed or bundled together.


Market Basket Analysis of Store Data
Dataset Description
The dataset lists the products bought in each of roughly 7,500 transactions recorded over the course of a week at a French retail store.
We use the apyori library to compute association rules with the Apriori algorithm.

In [1]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py): started
  Building wheel for apyori (setup.py): finished with status 'done'
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5975 sha256=90b6d05d4952c227f882271f030eee281001f3e7dfde4a8468d7191b7cf4577f
  Stored in directory: c:\users\admin\appdata\local\pip\cache\wheels\47\6f\0f\21a86f3679f7ed6bbe4dc6694f86818c5d85c2044bfab0f1e8
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

Read data and Display

In [3]:
store_data = pd.read_csv("C:\\Users\\Admin\\Desktop\\FILES\\AKPROJECTS\\Customer Segmentation\\Market_Basket_Optimisation.csv", header=None)
display(store_data.head())
print(store_data.shape)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


(7501, 20)


Preprocessing on Data

In [4]:

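records = []
# Convert each row into a list of 20 strings, one per column.
# Note: the range starts at 1, so the first row (index 0) is skipped, and
# empty cells (NaN) become the literal string 'nan'.
for i in range(1, 7501):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])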
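Because the loop above turns every empty cell into the string 'nan', that string is treated as an ordinary item, which is why 'nan' shows up in some of the itemsets reported further down. A minimal sketch of an alternative that keeps every row and drops empty cells is shown here, using only the standard library. The `load_transactions` helper and the in-memory sample are assumptions for illustration, and running Apriori on data cleaned this way would give different results from the ones recorded in this notebook.

```python
import csv
import io

# A few rows in the same comma-separated layout as the dataset (illustrative sample;
# the trailing commas stand for empty cells).
sample_csv = """burgers,meatballs,eggs,,
chutney,,,,
turkey,avocado,,,
mineral water,milk,energy bar,whole wheat rice,green tea
"""

def load_transactions(handle):
    """Return one list of item names per non-empty row, skipping empty cells."""
    transactions = []
    for row in csv.reader(handle):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:
            transactions.append(items)
    return transactions

records_clean = load_transactions(io.StringIO(sample_csv))
for basket in records_clean:
    print(basket)
```

With the real file, the same function could be given an open file handle instead of the `io.StringIO` sample.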

In [5]:
print(type(records))

<class 'list'>



Apriori Algorithm
Now it is time to apply the algorithm to the data.
We pass min_support, min_confidence, min_lift, and min_length arguments to set the thresholds for finding rules.

In [7]:

association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)
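With the 7,500 records built above, min_support=0.0045 means an itemset must occur in at least 34 transactions (0.0045 × 7,500 = 33.75). To illustrate the idea behind Apriori, the sketch below runs a level-wise search on a toy dataset: an itemset of size k is only counted if all of its subsets of size k-1 are already frequent. This is a simplified illustration, not apyori's implementation; the `frequent_itemsets` function and the toy baskets are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for basket in baskets for item in basket}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    result = dict(current)

    k = 2
    while current:
        previous = set(current)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        result.update(current)
        k += 1
    return result

toy = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
found = frequent_itemsets(toy, min_support=0.4)
for itemset, value in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

The prune step is what keeps the search tractable: on the toy data, both three-item candidates are discarded without being counted because one of their pairs is infrequent.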

In [8]:
print("There are {} Relation derived.".format(len(association_results)))

There are 48 Relation derived.



Association Rules Derived

In [9]:
for i in range(0, len(association_results)):
    print(association_results[i][0])

frozenset({'light cream', 'chicken'})
frozenset({'mushroom cream sauce', 'escalope'})
frozenset({'pasta', 'escalope'})
frozenset({'herb & pepper', 'ground beef'})
frozenset({'tomato sauce', 'ground beef'})
frozenset({'whole wheat pasta', 'olive oil'})
frozenset({'shrimp', 'pasta'})
frozenset({'light cream', 'chicken', 'nan'})
frozenset({'shrimp', 'chocolate', 'frozen vegetables'})
frozenset({'ground beef', 'spaghetti', 'cooking oil'})
frozenset({'mushroom cream sauce', 'nan', 'escalope'})
frozenset({'pasta', 'nan', 'escalope'})
frozenset({'spaghetti', 'frozen vegetables', 'ground beef'})
frozenset({'milk', 'olive oil', 'frozen vegetables'})
frozenset({'shrimp', 'frozen vegetables', 'mineral water'})
frozenset({'spaghetti', 'olive oil', 'frozen vegetables'})
frozenset({'spaghetti', 'shrimp', 'frozen vegetables'})
frozenset({'spaghetti', 'tomatoes', 'frozen vegetables'})
frozenset({'spaghetti', 'grated cheese', 'ground beef'})
frozenset({'herb & pepper', 'mineral water', 'ground beef'})
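A frozenset has no defined order, so the itemsets printed above do not say which item is the antecedent and which is the consequent. As far as we can tell, apyori 1.1.x stores that direction in each result's ordered_statistics, whose entries carry items_base, items_add, confidence and lift; treat these field names as an assumption and check them against your installed version. The sketch below uses stand-in named tuples with those field names so that it runs without apyori; the `describe` helper is hypothetical, and the example values are copied from the first rule reported in this notebook.

```python
from collections import namedtuple

# Stand-ins mirroring the assumed field names of apyori's result tuples.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

def describe(record):
    """Format every rule in a record as 'antecedent -> consequent (metrics)'."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))
        add = ", ".join(sorted(stat.items_add))
        lines.append(
            f"{base} -> {add} (support={record.support:.4f}, "
            f"confidence={stat.confidence:.3f}, lift={stat.lift:.2f})"
        )
    return lines

example = RelationRecord(
    items=frozenset({"light cream", "chicken"}),
    support=0.004533333333333334,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.2905982905982906,
            lift=4.843304843304844,
        )
    ],
)

for line in describe(example):
    print(line)
```

Reading the antecedent and consequent from named fields avoids relying on the iteration order of the frozenset and on positional indexes.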


Rules Generated

In [10]:
for item in association_results:
    # item[0] is the itemset (a frozenset); its iteration order is arbitrary,
    # so items[0] -> items[1] does not necessarily reflect the rule direction.
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])

    # item[1] is the support of the itemset.
    print("Support: " + str(item[1]))

    # item[2] is the list of ordered statistics; its first entry holds
    # the confidence (index 2) and the lift (index 3).
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("=====================================")

Rule: light cream -> chicken
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
Rule: mushroom cream sauce -> escalope
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
Rule: pasta -> escalope
Support: 0.005866666666666667
Confidence: 0.37288135593220345
Lift: 4.700185158809287
Rule: herb & pepper -> ground beef
Support: 0.016
Confidence: 0.3234501347708895
Lift: 3.2915549671393096
Rule: tomato sauce -> ground beef
Support: 0.005333333333333333
Confidence: 0.37735849056603776
Lift: 3.840147461662528
Rule: whole wheat pasta -> olive oil
Support: 0.008
Confidence: 0.2714932126696833
Lift: 4.130221288078346
Rule: shrimp -> pasta
Support: 0.005066666666666666
Confidence: 0.3220338983050848
Lift: 4.514493901473151
Rule: light cream -> chicken
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
Rule: shrimp -> chocolate
Support: 0.005333333333333333
Confidence: 0.23255813953488372
Lift: 3.