# Market Basket Analysis Project


# Instructions
## Background and Problem Statement
Care five is a German multinational retail corporation headquartered in Berlin, Germany.
It is the eighth-largest retailer in the world by revenue. It operates a chain of
hypermarkets, grocery stores, and convenience stores, which, as of January 2021,
comprised 12,000 stores in over 30 countries.
As a data analyst working for one of the stores, you must perform market basket
analysis to help the store maximize revenue. More specifically, your task is to analyze
transactional data to identify the top 10 products likely to be purchased together.
Given a dataset containing transactional data of products sold in the past week, you will
be required to perform the following:

● Define the business question

● Perform data importation and loading

● Perform data preprocessing

● Find frequent itemsets

● Generate association rules

● Perform metric interpretation and provide recommendations


## Dataset
Study your data carefully before implementing your solution.
Dataset URL = https://bit.ly/30A2gHO

# Defining the Business Question

Based on the past week's transactions, which products are most likely to be purchased together, and how can the store use these associations to maximize revenue?

# Defining the Metric for Success

We will consider the objective achieved if we find association rules between itemsets with a confidence greater than 0.3 and a lift greater than 1.
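To make these criteria concrete, the sketch below computes the three quantities for one rule: support (the fraction of transactions containing an itemset), confidence (support of antecedent and consequent together divided by support of the antecedent) and lift (confidence divided by the consequent's support). The baskets are made up for illustration and are not taken from the dataset; the helper names `support`, `confidence` and `lift` are hypothetical.

```python
# Hypothetical toy baskets, for illustration only (not from the dataset).
baskets = [
    {"Candy Bar", "Magazine"},
    {"Pencils"},
    {"Candy Bar", "Magazine", "Greeting Cards"},
    {"Greeting Cards", "Magazine", "Pencils"},
    {"Toothbrush"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(A union C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's support; > 1 means positive association."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


a, c = {"Candy Bar"}, {"Magazine"}
conf, lft = confidence(a, c, baskets), lift(a, c, baskets)
print(f"confidence={conf:.2f}, lift={lft:.2f}")
# Apply the metric for success: confidence > 0.3 and lift > 1.
print("meets criteria:", conf > 0.3 and lft > 1)
```

A lift above 1 means the consequent is bought more often alongside the antecedent than it is bought overall, which is why it is paired with a minimum confidence in the success criteria.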

# Data Importation and Loading

In [None]:
# Import the required libraries
import pandas as pd
import numpy as np               
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [None]:
df = pd.read_csv("https://bit.ly/30A2gHO")
df.head(10)

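| | A | Quantity | Transaction | Store | Product |
|---|---|---|---|---|---|
| 0 | 30000 | 2 | 93194 | 6 | Magazine |
| 1 | 30001 | 2 | 93194 | 6 | Candy Bar |
| 2 | 30002 | 2 | 93194 | 6 | Candy Bar |
| 3 | 30003 | 2 | 93194 | 6 | Candy Bar |
| 4 | 30004 | 2 | 93194 | 6 | Candy Bar |
| 5 | 30005 | 2 | 93197 | 1 | Pencils |
| 6 | 30006 | 1 | 93200 | 6 | Candy Bar |
| 7 | 30007 | 1 | 93200 | 6 | Candy Bar |
| 8 | 30008 | 1 | 93200 | 6 | Candy Bar |
| 9 | 30009 | 1 | 93200 | 6 | Magazine |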


# Data Preprocessing


In [None]:
#Group the basket dataframe by Transaction & Product
#Display the count of items
# ---
df2 = df.groupby(['Transaction','Product']).size().reset_index(name='Count')
df2.head(10)

| | Transaction | Product | Count |
|---|---|---|---|
| 0 | 93194 | Candy Bar | 4 |
| 1 | 93194 | Magazine | 1 |
| 2 | 93197 | Pencils | 1 |
| 3 | 93200 | Candy Bar | 3 |
| 4 | 93200 | Magazine | 1 |
| 5 | 93206 | Greeting Cards | 1 |
| 6 | 93206 | Magazine | 1 |
| 7 | 93206 | Pencils | 2 |
| 8 | 93212 | Toothbrush | 1 |
| 9 | 93215 | Candy Bar | 2 |


In [None]:
# Consolidate the items into one row per transaction and one column per product
# (cells hold purchase counts, 0 where the product is absent; 0/1 encoding follows)
# ---
#
df3 = (df2.groupby(['Transaction', 'Product'])['Count']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Transaction'))

df3.head(10)

| Transaction | Bow | Candy Bar | Deodorant | Greeting Cards | Magazine | Markers | Pain Reliever | Pencils | Pens | Perfume | Photo Processing | Prescription Med | Shampoo | Soap | Toothbrush | Toothpaste | Wrapping Paper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93194 | 0.0 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93197 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93200 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93206 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93212 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 93215 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93233 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 93239 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93245 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 93248 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [None]:
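# Convert all the values to 0 or 1: 1 if the product appears in the transaction, else 0.
# The Apriori algorithm only accepts 0/1 (or True/False) values.
# ---
# A vectorised comparison replaces an element-wise encode_units() function applied
# with DataFrame.applymap, which is deprecated in recent pandas versions.
df4 = (df3 >= 1).astype(int)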

df4.head()

| Transaction | Bow | Candy Bar | Deodorant | Greeting Cards | Magazine | Markers | Pain Reliever | Pencils | Pens | Perfume | Photo Processing | Prescription Med | Shampoo | Soap | Toothbrush | Toothpaste | Wrapping Paper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 93194 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93197 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93200 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93206 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 93212 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
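The grouping, pivoting and 0/1 encoding steps above amount to turning (Transaction, Product) rows into one set of products per transaction. The plain-Python sketch below reproduces that on the ten sample rows shown by `df.head(10)`; it only illustrates the idea and is not a replacement for the pandas pipeline.

```python
from collections import defaultdict

# (Transaction, Product) pairs copied from the df.head(10) output above.
rows = [
    (93194, "Magazine"), (93194, "Candy Bar"), (93194, "Candy Bar"),
    (93194, "Candy Bar"), (93194, "Candy Bar"), (93197, "Pencils"),
    (93200, "Candy Bar"), (93200, "Candy Bar"), (93200, "Candy Bar"),
    (93200, "Magazine"),
]

# Collapse duplicates: a basket only records whether a product was bought.
baskets = defaultdict(set)
for transaction, product in rows:
    baskets[transaction].add(product)

# One 0/1 column per product, one row per transaction (like df4).
products = sorted({product for _, product in rows})
encoded = {
    t: [int(p in items) for p in products]
    for t, items in sorted(baskets.items())
}

print(products)
for t, flags in encoded.items():
    print(t, flags)
```

Using a set per transaction is what makes repeated purchases (four Candy Bar rows in transaction 93194) collapse to a single 1, matching the `df4` rows above.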


# Find Frequent Itemsets

In [None]:
#Generate itemsets that appear in at least 1% of transactions (min_support=0.01)
shop_frequent_itemsets = apriori(df4, min_support=0.01, use_colnames=True)
shop_frequent_itemsets.head(15)

| | support | itemsets |
|---|---|---|
| 0 | 0.051591 | (Bow) |
| 1 | 0.175736 | (Candy Bar) |
| 2 | 0.15284 | (Greeting Cards) |
| 3 | 0.231936 | (Magazine) |
| 4 | 0.020071 | (Pain Reliever) |
| 5 | 0.135147 | (Pencils) |
| 6 | 0.144068 | (Pens) |
| 7 | 0.082664 | (Perfume) |
| 8 | 0.055456 | (Photo Processing) |
| 9 | 0.014422 | (Prescription Med) |
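`apriori` relies on the fact that every subset of a frequent itemset must itself be frequent, so candidate k-itemsets are only built from frequent (k-1)-itemsets. The minimal pure-Python sketch below illustrates that level-wise search on a small hypothetical list of baskets; it is not mlxtend's implementation, and the function name `apriori_sketch` is made up for this example.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Level-wise Apriori search returning {frozenset(itemset): support}."""
    n = len(baskets)
    baskets = [frozenset(b) for b in baskets]

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep candidates that reach the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical baskets, for illustration only.
baskets = [
    {"Candy Bar", "Magazine"},
    {"Candy Bar", "Magazine", "Greeting Cards"},
    {"Greeting Cards", "Magazine"},
    {"Pencils"},
]
for itemset, s in sorted(apriori_sketch(baskets, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

With `min_support=0.5`, the three-item candidate {Candy Bar, Greeting Cards, Magazine} is pruned without being counted, because its subset {Candy Bar, Greeting Cards} is not frequent.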


# Generate association rules

In [None]:
#Find the association rules with a lift of at least 1
shop_rules = association_rules(shop_frequent_itemsets, metric="lift", min_threshold=1)

# Sort the rules by lift, highest first
shop_rules.sort_values("lift", ascending = False, inplace = True)

#Top 10 rules by lift
shop_rules.head(10)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 19 | (Perfume) | (Toothbrush) | 0.082664 | 0.067648 | 0.017098 | 0.206835 | 3.057514 | 0.011506 | 1.175482 |
| 18 | (Toothbrush) | (Perfume) | 0.067648 | 0.082664 | 0.017098 | 0.252747 | 3.057514 | 0.011506 | 1.227611 |
| 1 | (Bow) | (Toothbrush) | 0.051591 | 0.067648 | 0.01011 | 0.195965 | 2.896843 | 0.00662 | 1.159592 |
| 0 | (Toothbrush) | (Bow) | 0.067648 | 0.051591 | 0.01011 | 0.149451 | 2.896843 | 0.00662 | 1.115055 |
| 25 | (Greeting Cards) | (Candy Bar, Magazine) | 0.15284 | 0.039994 | 0.017247 | 0.11284 | 2.821431 | 0.011134 | 1.082112 |
| 20 | (Candy Bar, Magazine) | (Greeting Cards) | 0.039994 | 0.15284 | 0.017247 | 0.431227 | 2.821431 | 0.011134 | 1.489452 |
| 55 | (Greeting Cards) | (Pencils, Magazine) | 0.15284 | 0.028546 | 0.012043 | 0.078794 | 2.760244 | 0.00768 | 1.054546 |
| 50 | (Pencils, Magazine) | (Greeting Cards) | 0.028546 | 0.15284 | 0.012043 | 0.421875 | 2.760244 | 0.00768 | 1.465358 |
| 47 | (Candy Bar) | (Pencils, Toothpaste) | 0.175736 | 0.022748 | 0.011002 | 0.062606 | 2.752198 | 0.007005 | 1.04252 |
| 46 | (Pencils, Toothpaste) | (Candy Bar) | 0.022748 | 0.175736 | 0.011002 | 0.48366 | 2.752198 | 0.007005 | 1.596359 |
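`association_rules` derives rules from the supports that `apriori` has already computed, so no second pass over the transactions is needed. The sketch below repeats that arithmetic for the Perfume/Toothbrush itemset, using the support values printed in the table above; the helper `rules_for_pair` is hypothetical and only handles two-item itemsets.

```python
def rules_for_pair(item_a, item_b, supports):
    """Return the rules A->B and B->A as (antecedent, consequent, confidence, lift).

    `supports` maps frozensets to support values, as produced by apriori.
    """
    pair_support = supports[frozenset({item_a, item_b})]
    rules = []
    for antecedent, consequent in ((item_a, item_b), (item_b, item_a)):
        ant_sup = supports[frozenset({antecedent})]
        con_sup = supports[frozenset({consequent})]
        confidence = pair_support / ant_sup
        lift = confidence / con_sup
        rules.append((antecedent, consequent, confidence, lift))
    return rules


# Support values copied from the association-rules output above.
supports = {
    frozenset({"Perfume"}): 0.082664,
    frozenset({"Toothbrush"}): 0.067648,
    frozenset({"Perfume", "Toothbrush"}): 0.017098,
}
rules = rules_for_pair("Perfume", "Toothbrush", supports)
for antecedent, consequent, conf, lift in rules:
    print(f"{antecedent} -> {consequent}: confidence={conf:.4f}, lift={lift:.4f}")
```

Confidence depends on the direction of the rule, but lift does not: both directions reduce to the pair's support divided by the product of the two single-item supports. This is why each item combination appears twice in the lift-sorted table (rows 19 and 18, 1 and 0, and so on), so the ten rules shown cover only five distinct item combinations.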


# Metric interpretation 

In [None]:
# Sort the rules by confidence, highest first
shop_rules.sort_values("confidence", ascending = False, inplace = True)

# Preview the top 10 association rules by confidence
shop_rules.head(10)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 46 | (Pencils, Toothpaste) | (Candy Bar) | 0.022748 | 0.175736 | 0.011002 | 0.48366 | 2.752198 | 0.007005 | 1.596359 |
| 22 | (Magazine, Greeting Cards) | (Candy Bar) | 0.037467 | 0.175736 | 0.017247 | 0.460317 | 2.61937 | 0.010662 | 1.527313 |
| 40 | (Magazine, Toothpaste) | (Candy Bar) | 0.029884 | 0.175736 | 0.013232 | 0.442786 | 2.51961 | 0.007981 | 1.47926 |
| 28 | (Greeting Cards, Toothpaste) | (Candy Bar) | 0.033304 | 0.175736 | 0.01457 | 0.4375 | 2.48953 | 0.008718 | 1.465358 |
| 20 | (Candy Bar, Magazine) | (Greeting Cards) | 0.039994 | 0.15284 | 0.017247 | 0.431227 | 2.821431 | 0.011134 | 1.489452 |
| 50 | (Pencils, Magazine) | (Greeting Cards) | 0.028546 | 0.15284 | 0.012043 | 0.421875 | 2.760244 | 0.00768 | 1.465358 |
| 51 | (Pencils, Greeting Cards) | (Magazine) | 0.029884 | 0.231936 | 0.012043 | 0.402985 | 1.737486 | 0.005112 | 1.286508 |
| 21 | (Candy Bar, Greeting Cards) | (Magazine) | 0.04609 | 0.231936 | 0.017247 | 0.374194 | 1.61335 | 0.006557 | 1.227319 |
| 56 | (Magazine, Toothpaste) | (Greeting Cards) | 0.029884 | 0.15284 | 0.011151 | 0.373134 | 2.441344 | 0.006583 | 1.351422 |
| 34 | (Pencils, Magazine) | (Candy Bar) | 0.028546 | 0.175736 | 0.010407 | 0.364583 | 2.074609 | 0.005391 | 1.297202 |
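The metric for success (confidence above 0.3 and lift above 1) is not applied as a filter in the code above: `association_rules` was only thresholded on lift, and confidence was only used for sorting. A minimal sketch of that filter follows, in plain Python over a few (antecedents, consequents, confidence, lift) rows copied from the two rule tables above. With the full `shop_rules` DataFrame, the equivalent pandas expression would be `shop_rules[(shop_rules["confidence"] > 0.3) & (shop_rules["lift"] > 1)]`.

```python
# (antecedents, consequents, confidence, lift) copied from the rule tables above.
rules = [
    (("Pencils", "Toothpaste"), ("Candy Bar",), 0.48366, 2.752198),
    (("Magazine", "Greeting Cards"), ("Candy Bar",), 0.460317, 2.61937),
    (("Pencils", "Magazine"), ("Candy Bar",), 0.364583, 2.074609),
    (("Perfume",), ("Toothbrush",), 0.206835, 3.057514),
    (("Toothbrush",), ("Perfume",), 0.252747, 3.057514),
]

MIN_CONFIDENCE = 0.3  # from the metric for success
MIN_LIFT = 1.0

# Keep only the rules that meet both thresholds, highest confidence first.
selected = sorted(
    (r for r in rules if r[2] > MIN_CONFIDENCE and r[3] > MIN_LIFT),
    key=lambda r: r[2],
    reverse=True,
)
for antecedents, consequents, conf, lift in selected:
    print(f"{antecedents} -> {consequents}: confidence={conf:.3f}, lift={lift:.3f}")
```

On these rows, the Perfume/Toothbrush rules, which have the highest lift in the dataset, are dropped because their confidence is below 0.3.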


# Observation & Recommendation


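**Highest Lift**


* Toothbrush and Perfume have the highest lift (3.057514): they are bought together about three times as often as would be expected if their purchases were independent. They should be placed close to each other. Note that the confidence of these rules (0.21 when Perfume is the antecedent, 0.25 when Toothbrush is) is below the 0.3 threshold set in the metric for success.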
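As a worked check, the lift and confidence for this pair can be recomputed from the supports in the lift-sorted rule table (Toothbrush 0.067648, Perfume 0.082664, both items together 0.017098). The lift differs from 3.057514 only in the last digits because the supports are rounded to six decimals.

$$
\begin{aligned}
\operatorname{lift}(\text{Toothbrush} \Rightarrow \text{Perfume})
&= \frac{\operatorname{supp}(\{\text{Toothbrush}, \text{Perfume}\})}{\operatorname{supp}(\text{Toothbrush}) \times \operatorname{supp}(\text{Perfume})}
= \frac{0.017098}{0.067648 \times 0.082664}
\approx \frac{0.017098}{0.005592}
\approx 3.058 \\
\operatorname{conf}(\text{Toothbrush} \Rightarrow \text{Perfume})
&= \frac{\operatorname{supp}(\{\text{Toothbrush}, \text{Perfume}\})}{\operatorname{supp}(\text{Toothbrush})}
= \frac{0.017098}{0.067648}
\approx 0.253
\end{aligned}
$$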


**Highest confidence levels**

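The rules with the highest confidence all involve Toothpaste, Pencils, Magazine, Greeting Cards and Candy Bar. For example, 48% of transactions containing Pencils and Toothpaste also contain a Candy Bar (confidence 0.48, lift 2.75). Since these items go hand in hand, the store should consider placing them in the same aisle.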