# Practice Notebook: Market Basket Analysis

**Background and Problem Statement**

Care five is a German multinational retail corporation headquartered in Berlin, Germany.
It is the eighth-largest retailer in the world by revenue. It operates a chain of
hypermarkets, grocery stores, and convenience stores which, as of January 2021,
comprises 12,00 stores in over 30 countries.

As a data analyst working for one of the stores, you must perform market basket
analysis to help the store maximize revenue. More specifically, your task is to analyze
transactional data to identify the top 10 products likely to be purchased together.
Given a dataset containing transactional data of products sold in the past week, you will
be required to perform the following:

● Define the business question

● Perform data importation and loading

● Perform data preprocessing

● Find frequent itemsets

● Generate association rules

● Perform metric interpretation and provide recommendations


## Pre-requisites

In [3]:
# Import the required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

### Perform data importation and loading

We will perform market basket analysis to help the store maximize revenue. More specifically, we will analyze transactional data to identify the top 10 products likely to be purchased together.

In [4]:
# Dataset URL (CSV): https://bit.ly/30A2gHO
# ---
# 1. Each row is one line item: a product sold within a transaction,
#    together with its Quantity and Store values.
# 2. A transaction that contains several products spans several rows
#    that share the same Transaction value.
# ---
#

# Step 1: Data loading and processing

Carefive_df = pd.read_csv('https://bit.ly/30A2gHO')
Carefive_df.sample(10)

Unnamed: 0,A,Quantity,Transaction,Store,Product
12200,42200,1,125834,6,Magazine
631,30631,6,94946,5,Magazine
14903,44903,1,133130,1,Wrapping Paper
7928,37928,1,114152,2,Pens
4866,34866,1,106559,5,Bow
9605,39605,1,119069,4,Toothpaste
7996,37996,1,114461,8,Perfume
5507,35507,4,108074,7,Candy Bar
3995,33995,1,104276,3,Toothpaste
6028,36028,1,109361,9,Candy Bar


In [5]:
# Step 1: Data processing 
# ---
# We group the Carefive dataframe by Transaction
# and Product and count the rows for each pair
# ---
Carefive_df2 = Carefive_df.groupby(["Transaction","Product"]).size().reset_index(name="Count")
Carefive_df2.sample(10)

Unnamed: 0,Transaction,Product,Count
8655,129260,Toothpaste,1
8508,128660,Perfume,1
5088,114431,Pens,1
5650,116672,Toothpaste,1
307,94520,Wrapping Paper,1
7240,123365,Toothpaste,1
9456,132722,Toothpaste,2
1535,99965,Pain Reliever,3
1341,99149,Photo Processing,1
4167,110648,Prescription Med,1


In [6]:
# Step 1: Data processing 
# ---
# Then we consolidate the items into one transaction per row 
# with each item one-hot encoded.
# ---
#
Carefive_df3 = (Carefive_df2.groupby(['Transaction', 'Product'])['Count']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Transaction'))

Carefive_df3.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,0.0,4.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93200,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93206,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
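
The reshaping in the two cells above is easier to follow on a small example. The sketch below repeats the same group, count, and unstack steps on a hypothetical three-transaction frame; the `toy` data and the `toy_counts` and `toy_basket` names are made up for illustration and are not part of the store dataset.

```python
import pandas as pd

# Hypothetical line items (not taken from the store dataset).
toy = pd.DataFrame({
    "Transaction": [1, 1, 1, 2, 3, 3],
    "Product": ["Candy Bar", "Candy Bar", "Magazine",
                "Pencils", "Magazine", "Pencils"],
})

# Count the rows for each (Transaction, Product) pair.
toy_counts = (toy.groupby(["Transaction", "Product"])
              .size().reset_index(name="Count"))

# Pivot to one row per transaction and one column per product.
# A product that is missing from a basket shows up as NaN, then 0.
toy_basket = (toy_counts.groupby(["Transaction", "Product"])["Count"]
              .sum().unstack().fillna(0))

print(toy_basket)
```

Each cell of `toy_basket` holds how many rows a product had in that transaction, which is why some of the values in the real table are greater than 1.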


In [7]:
# Step 1: Data processing
# ---
# We then use our custom encoding function to convert 
# all the values to 0 or 1. 
# mlxtend's apriori expects one-hot encoded input (0/1 or True/False).
# ---
# 
def encode_units(x):
    # Any count of 1 or more means the product was in the basket.
    return 1 if x >= 1 else 0

Carefive_df4 = Carefive_df3.applymap(encode_units)

Carefive_df4.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
93197,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
93200,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
93206,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0
93212,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
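
An element-wise function such as `encode_units` is not the only way to binarize the counts. As a minimal sketch, a vectorized comparison gives the same 0/1 result for whole-number counts; the `counts` frame below copies three columns of the first three rows shown earlier, and the names `counts`, `basket_bool`, and `basket_int` are illustrative.

```python
import pandas as pd

# Three columns of the first three transactions from the count table above.
counts = pd.DataFrame(
    {
        "Candy Bar": [4.0, 0.0, 3.0],
        "Magazine": [1.0, 0.0, 1.0],
        "Pencils": [0.0, 1.0, 0.0],
    },
    index=pd.Index([93194, 93197, 93200], name="Transaction"),
)

# Any positive count becomes True; zero stays False.
basket_bool = counts > 0

# Integer view, matching the 0/1 layout of the encoded table.
basket_int = basket_bool.astype(int)

print(basket_int)
```

The boolean frame can be useful in its own right, since recent mlxtend releases may warn when `apriori` receives non-boolean input.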


In [11]:
# Step 2: We generate the frequent itemsets
Carefive_frequent_itemsets = apriori(Carefive_df4, min_support=0.01, use_colnames=True)
Carefive_frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.051591,(Bow)
1,0.175736,(Candy Bar)
2,0.15284,(Greeting Cards)
3,0.231936,(Magazine)
4,0.020071,(Pain Reliever)
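
The `support` column is the share of transactions that contain every product in the itemset, so `min_support=0.01` keeps only itemsets found in at least 1% of transactions. The sketch below computes support by hand on a hypothetical four-transaction basket; `toy_onehot`, `single_support`, and `pair_support` are illustrative names.

```python
import pandas as pd

# Hypothetical one-hot baskets (toy data, four transactions).
toy_onehot = pd.DataFrame({
    "Candy Bar": [1, 0, 1, 1],
    "Magazine": [1, 0, 1, 0],
    "Pencils": [0, 1, 0, 0],
}).astype(bool)

# Support of a single product: the share of transactions containing it.
single_support = toy_onehot.mean()

# Support of an itemset: the share of transactions containing all its products.
pair_support = toy_onehot[["Candy Bar", "Magazine"]].all(axis=1).mean()

print(single_support)
print(pair_support)
```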


In [13]:
# Step 3: Finding the association rules
Carefive_rules = association_rules(Carefive_frequent_itemsets, metric="lift", min_threshold=1)

# Sorting by confidence, highest first
Carefive_rules.sort_values("confidence", ascending = False, inplace = True)

# Previewing the association rules
Carefive_rules.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
46,"(Pencils, Toothpaste)",(Candy Bar),0.022748,0.175736,0.011002,0.48366,2.752198,0.007005,1.596359
22,"(Magazine, Greeting Cards)",(Candy Bar),0.037467,0.175736,0.017247,0.460317,2.61937,0.010662,1.527313
40,"(Magazine, Toothpaste)",(Candy Bar),0.029884,0.175736,0.013232,0.442786,2.51961,0.007981,1.47926
28,"(Greeting Cards, Toothpaste)",(Candy Bar),0.033304,0.175736,0.01457,0.4375,2.48953,0.008718,1.465358
20,"(Candy Bar, Magazine)",(Greeting Cards),0.039994,0.15284,0.017247,0.431227,2.821431,0.011134,1.489452
50,"(Pencils, Magazine)",(Greeting Cards),0.028546,0.15284,0.012043,0.421875,2.760244,0.00768,1.465358
51,"(Pencils, Greeting Cards)",(Magazine),0.029884,0.231936,0.012043,0.402985,1.737486,0.005112,1.286508
21,"(Candy Bar, Greeting Cards)",(Magazine),0.04609,0.231936,0.017247,0.374194,1.61335,0.006557,1.227319
57,"(Magazine, Toothpaste)",(Greeting Cards),0.029884,0.15284,0.011151,0.373134,2.441344,0.006583,1.351422
34,"(Pencils, Magazine)",(Candy Bar),0.028546,0.175736,0.010407,0.364583,2.074609,0.005391,1.297202
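
The metric columns can be derived from the three support values in each row. As a worked check, the sketch below recomputes confidence, lift, leverage, and conviction for the first rule in the table using the standard definitions; the variable names are illustrative, and the inputs are the rounded values printed above.

```python
# Values copied from the first rule in the table above:
# (Pencils, Toothpaste) -> (Candy Bar)
antecedent_support = 0.022748
consequent_support = 0.175736
support = 0.011002

# Confidence: how often the consequent appears when the antecedents do.
confidence = support / antecedent_support

# Lift: confidence relative to the consequent's overall frequency.
lift = confidence / consequent_support

# Leverage: observed joint support minus the support expected
# if both sides were independent.
leverage = support - antecedent_support * consequent_support

# Conviction: how much more often the rule would fail
# if both sides were independent.
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 4),
      round(leverage, 4), round(conviction, 4))
```

The results agree with the table up to the rounding of the inputs. A lift above 1 means the products on both sides of the rule appear together more often than they would if purchased independently.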


**Observation**

The 10 association rules with the highest confidence (antecedents → consequent) are:

1.   (Pencils, Toothpaste) → (Candy Bar)
2.   (Magazine, Greeting Cards) → (Candy Bar)
3.   (Magazine, Toothpaste) → (Candy Bar)
4.   (Greeting Cards, Toothpaste) → (Candy Bar)
5.   (Candy Bar, Magazine) → (Greeting Cards)
6.   (Pencils, Magazine) → (Greeting Cards)
7.   (Pencils, Greeting Cards) → (Magazine)
8.   (Candy Bar, Greeting Cards) → (Magazine)
9.   (Magazine, Toothpaste) → (Greeting Cards)
10.  (Pencils, Magazine) → (Candy Bar)

Products that appear together in these rules should be placed near each other in the stores.
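
Several of the rules above involve the same products with the roles swapped, so one way to turn them into shelf-placement groups is to merge each rule's two sides into a single basket. The sketch below does that in plain Python for the ten rules in the table; the `top_rules` list and the `baskets` dictionary are illustrative names, not part of the original notebook.

```python
# The ten rules from the table above: (antecedents, consequents, support, lift).
top_rules = [
    ({"Pencils", "Toothpaste"}, {"Candy Bar"}, 0.011002, 2.752198),
    ({"Magazine", "Greeting Cards"}, {"Candy Bar"}, 0.017247, 2.61937),
    ({"Magazine", "Toothpaste"}, {"Candy Bar"}, 0.013232, 2.51961),
    ({"Greeting Cards", "Toothpaste"}, {"Candy Bar"}, 0.01457, 2.48953),
    ({"Candy Bar", "Magazine"}, {"Greeting Cards"}, 0.017247, 2.821431),
    ({"Pencils", "Magazine"}, {"Greeting Cards"}, 0.012043, 2.760244),
    ({"Pencils", "Greeting Cards"}, {"Magazine"}, 0.012043, 1.737486),
    ({"Candy Bar", "Greeting Cards"}, {"Magazine"}, 0.017247, 1.61335),
    ({"Magazine", "Toothpaste"}, {"Greeting Cards"}, 0.011151, 2.441344),
    ({"Pencils", "Magazine"}, {"Candy Bar"}, 0.010407, 2.074609),
]

# Merge each rule's two sides into one basket and keep the best lift seen.
baskets = {}
for antecedents, consequents, support, lift in top_rules:
    itemset = frozenset(antecedents | consequents)
    best_lift = max(lift, baskets.get(itemset, (support, 0.0))[1])
    baskets[itemset] = (support, best_lift)

# Rank the distinct baskets by support, most frequent first.
ranked = sorted(baskets.items(), key=lambda kv: kv[1][0], reverse=True)
for itemset, (support, best_lift) in ranked:
    print(sorted(itemset), support, best_lift)
```

On these ten rules the merge leaves seven distinct three-product baskets, with Candy Bar, Greeting Cards, and Magazine the most frequent at a support of about 1.7%.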
