
**Background and Problem Statement**

Care Five is a German multinational retail corporation headquartered in Berlin, Germany. It is the eighth-largest retailer in the world by revenue. It operates a chain of hypermarkets, grocery stores, and convenience stores, which as of January 2021 comprised 1,200 stores in over 30 countries.

As a data analyst working for one of the stores, you must perform market basket analysis to help the store maximize revenue. More specifically, your task is to analyze transactional data to identify the top 10 products likely to be purchased together. Given a dataset containing transactional data of products sold in the past week, you will be required to perform the following:

● Define the business question

● Perform data importation and loading

● Perform data preprocessing

● Find frequent itemsets

● Generate association rules

● Perform metric interpretation and provide recommendation

**Dataset**

Study your data carefully before implementing your solution.

Dataset URL = https://bit.ly/30A2gHO

**Defining the business question**/ **Specifying the Research Question**

As a Data analyst working for one of the stores, you must perform market basket analysis to help the store maximize revenue.

**Defining the Metric for Success**

We will consider the objective met if we find association rules with confidence above 0.3 and lift above 1.
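
To make the success metric concrete, the sketch below computes support, confidence and lift by hand in plain Python. The five baskets and the helper names (`support`, `confidence`, `lift`) are invented for illustration; they are not taken from the store dataset or from mlxtend.

```python
# Toy baskets invented for illustration; not drawn from the store dataset.
baskets = [
    {"Candy Bar", "Magazine"},
    {"Candy Bar", "Magazine", "Pencils"},
    {"Pencils"},
    {"Candy Bar"},
    {"Magazine", "Greeting Cards"},
]

def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's baseline support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

# Rule under test: if Magazine, then Candy Bar.
antecedent, consequent = {"Magazine"}, {"Candy Bar"}
print(round(support(antecedent | consequent, baskets), 3))
print(round(confidence(antecedent, consequent, baskets), 3))
print(round(lift(antecedent, consequent, baskets), 3))
```

A lift above 1 means the consequent appears more often alongside the antecedent than it does across all baskets, which is why lift is paired with the confidence threshold here.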

**Perform data importation and loading**

In [None]:
# Import the required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [None]:
basket_df = pd.read_csv("https://bit.ly/30A2gHO")
basket_df.head()

Unnamed: 0,A,Quantity,Transaction,Store,Product
0,30000,2,93194,6,Magazine
1,30001,2,93194,6,Candy Bar
2,30002,2,93194,6,Candy Bar
3,30003,2,93194,6,Candy Bar
4,30004,2,93194,6,Candy Bar


**Perform data preprocessing**

In [None]:
# We group the basket dataframe by Transaction 
# and Product and display the count of items
# ---
basket_df1 = basket_df.groupby(['Transaction','Product']).size().reset_index(name='Count')
basket_df1.head()

Unnamed: 0,Transaction,Product,Count
0,93194,Candy Bar,4
1,93194,Magazine,1
2,93197,Pencils,1
3,93200,Candy Bar,3
4,93200,Magazine,1


In [None]:
# Then we pivot the counts so that each row is one transaction
# and each column is a product; missing combinations become 0.
# ---
#
basket_df2 = (basket_df1.groupby(['Transaction', 'Product'])['Count']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Transaction'))

basket_df2.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,0.0,4.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93200,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93206,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [None]:
# We then use our custom encoding function to convert 
# all the values to 0 or 1. 
# The Apriori algorithm will only take 0's or 1's.
# ---
# 
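def encode_units(x):
    # Any count of 1 or more means the product is in the basket;
    # a single return avoids falling through to None for other values.
    return 1 if x >= 1 else 0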

basket_df3 = basket_df2.applymap(encode_units)

basket_df3.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
93197,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
93200,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
93206,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0
93212,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
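
The group, pivot and encode steps above could also be collapsed into one call. The following is a minimal sketch using `pd.crosstab` on a few hand-typed line items that mirror the first rows of the preview; the names `line_items` and `basket` are illustrative and not part of the notebook.

```python
import pandas as pd

# A few line items mirroring the first rows of the preview (typed by hand).
line_items = pd.DataFrame({
    "Transaction": [93194, 93194, 93194, 93197, 93200, 93200],
    "Product": ["Magazine", "Candy Bar", "Candy Bar",
                "Pencils", "Candy Bar", "Magazine"],
})

# crosstab counts rows per (Transaction, Product) pair;
# comparing with 0 turns those counts into presence flags.
basket = pd.crosstab(line_items["Transaction"], line_items["Product"]) > 0

# Cast to 0/1 only for display; the flags themselves are booleans.
print(basket.astype(int))
```

This produces a boolean table rather than 0/1 integers. To my knowledge mlxtend's `apriori` accepts boolean input as well, but that is worth confirming against the installed version.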


**Find frequent itemsets**

In [None]:
# Generate the frequent itemsets that appear in at least 1% of transactions
shop_frequent_itemsets = apriori(basket_df3, min_support=0.01, use_colnames=True)
shop_frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.051591,(Bow)
1,0.175736,(Candy Bar)
2,0.15284,(Greeting Cards)
3,0.231936,(Magazine)
4,0.020071,(Pain Reliever)


**Generate association rules**

In [None]:
#Finding the association rules
shop_rules = association_rules(shop_frequent_itemsets, metric="lift", min_threshold=1)

# Sort the rules by confidence, highest first
shop_rules.sort_values("confidence", ascending = False, inplace = True)

# Preview the association rules
shop_rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
46,"(Pencils, Toothpaste)",(Candy Bar),0.022748,0.175736,0.011002,0.48366,2.752198,0.007005,1.596359
22,"(Magazine, Greeting Cards)",(Candy Bar),0.037467,0.175736,0.017247,0.460317,2.61937,0.010662,1.527313
40,"(Magazine, Toothpaste)",(Candy Bar),0.029884,0.175736,0.013232,0.442786,2.51961,0.007981,1.47926
28,"(Greeting Cards, Toothpaste)",(Candy Bar),0.033304,0.175736,0.01457,0.4375,2.48953,0.008718,1.465358
20,"(Candy Bar, Magazine)",(Greeting Cards),0.039994,0.15284,0.017247,0.431227,2.821431,0.011134,1.489452
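
The call above filters only on lift, so the confidence threshold from the success metric still has to be applied separately. One way to do that is sketched below on the five previewed rules, re-entered by hand with itemsets written as plain strings rather than the frozensets mlxtend returns. The names `MIN_CONFIDENCE`, `MIN_LIFT`, `qualifying` and `top_rules` are illustrative.

```python
import pandas as pd

# The five previewed rules, typed in by hand from the output above.
rules = pd.DataFrame({
    "antecedents": [
        "Pencils, Toothpaste",
        "Magazine, Greeting Cards",
        "Magazine, Toothpaste",
        "Greeting Cards, Toothpaste",
        "Candy Bar, Magazine",
    ],
    "consequents": ["Candy Bar", "Candy Bar", "Candy Bar",
                    "Candy Bar", "Greeting Cards"],
    "confidence": [0.483660, 0.460317, 0.442786, 0.437500, 0.431227],
    "lift": [2.752198, 2.619370, 2.519610, 2.489530, 2.821431],
})

# Thresholds taken from the metric for success.
MIN_CONFIDENCE = 0.3
MIN_LIFT = 1.0

# Keep rules that clear both thresholds, then rank the survivors by lift.
qualifying = rules[(rules["confidence"] > MIN_CONFIDENCE) & (rules["lift"] > MIN_LIFT)]
top_rules = qualifying.nlargest(10, "lift")
print(top_rules)
```

On the full `shop_rules` table the same boolean mask should work unchanged, since mlxtend names its columns `confidence` and `lift`.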


**Perform metric interpretation and provide recommendation**

**Observation**

* The output above shows the top 5 rules sorted by confidence. All of them have support above 1% and lift above 1.

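* The first rule, "if Pencils and Toothpaste, then Candy Bar", has a support of 0.011002, meaning about 1.1% of all transactions contain Pencils, Toothpaste and Candy Bar together. Its antecedent support of 0.022748 means about 2.3% of transactions contain both Pencils and Toothpaste. Of those, about 48% also contain a Candy Bar (confidence 0.48366), roughly 2.75 times the baseline Candy Bar rate of 17.6% (lift 2.752198).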

* Therefore, there is evidence that purchases of Toothpaste, Pencils, Magazine, Greeting Cards and Candy Bar go hand in hand. Care Five should consider placing these items next to one another, and store staff should be trained to cross-sell them, since customers are more likely to purchase them together. Both measures could increase the supermarket's revenue.
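
As a sanity check on the interpretation, the metrics of the first rule can be recomputed from its three support values using the standard definitions. This is a sketch for verification only; the variable names are illustrative, and small differences from the table are expected because the inputs are rounded to six decimals.

```python
# Values copied from the first row of the rules table.
antecedent_support = 0.022748  # share of transactions with Pencils and Toothpaste
consequent_support = 0.175736  # share of transactions with Candy Bar
rule_support = 0.011002        # share of transactions with all three items

# Standard definitions of the rule metrics.
confidence = rule_support / antecedent_support
lift = confidence / consequent_support
leverage = rule_support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(confidence, lift, leverage, conviction)
```

The recomputed values agree with the reported confidence, lift, leverage and conviction to about three decimal places.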