<a href="https://colab.research.google.com/github/MbogoriL/market-basket-analysis/blob/main/Market_Basket_Analysis_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Independent Project: Market Basket Analysis


# Defining the Question

**a) Specifying the Data Analysis Question**


Analyze transactional data to identify the top 10 products likely to be purchased together.

**b) Defining the metric of success**

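The analysis succeeds if it identifies the 10 product combinations most likely to be purchased together, ranked by association-rule metrics such as confidence and lift.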


**c) Understanding the context**

Care five is a German multinational retail corporation headquartered in Berlin, Germany.
It is the eighth-largest retailer in the world by revenue. It operates a chain of
hypermarkets, grocery stores, and convenience stores, which as of January 2021
comprised 12,00 stores in over 30 countries. As a data analyst working for one of the stores, you must perform market basket
analysis to help the store maximize revenue.


**d) Recording the experimental design**

*   Perform data importation and loading
*   Perform data preprocessing
*   Find frequent itemsets
*   Generate association rules
*   Perform metric interpretation and provide recommendation

# Data Importation and Loading

**Prerequisites**

In [None]:
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [None]:
#load dataset
stores_df = pd.read_csv('MarketBasketAnalysis.csv')
stores_df.sample(10)

,A,Quantity,Transaction,Store,Product
7234,37234,1,112523,7,Magazine
5597,35597,4,108275,6,Toothbrush
5591,35591,1,108275,6,Magazine
13535,43535,4,129314,10,Photo Processing
14294,44294,1,131453,9,Soap
7566,37566,2,113438,2,Pens
6898,36898,1,111638,10,Shampoo
5309,35309,1,107618,3,Magazine
5893,35893,2,108998,7,Markers
1228,31228,1,96566,1,Bow


In [None]:
stores_df.shape

(15001, 5)

# Data Preprocessing

In [None]:
#dropping A as it contains the numbering of items from the sample dataset
stores_df = stores_df.drop('A', axis=1)
stores_df.head()

,Quantity,Transaction,Store,Product
0,2,93194,6,Magazine
1,2,93194,6,Candy Bar
2,2,93194,6,Candy Bar
3,2,93194,6,Candy Bar
4,2,93194,6,Candy Bar


In [None]:
#group items per transaction
stores_df2 = (stores_df.groupby(['Transaction', 'Store', 'Product'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Transaction'))

stores_df2.head()

Transaction,Store,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,6,0.0,8.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93197,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93200,6,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93206,8,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93212,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0
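
To make the reshaping step easier to follow, here is a minimal sketch of the same `groupby` / `sum` / `unstack` chain applied to a tiny, hypothetical data set. The `toy` frame and its values are invented for illustration and are not taken from `MarketBasketAnalysis.csv`; only the column names mirror the notebook.

```python
import pandas as pd

# Hypothetical toy data in long format: one row per line item.
toy = pd.DataFrame({
    "Transaction": [1, 1, 1, 2],
    "Store": [6, 6, 6, 1],
    "Product": ["Magazine", "Candy Bar", "Candy Bar", "Pencils"],
    "Quantity": [2, 2, 3, 1],
})

basket = (
    toy.groupby(["Transaction", "Store", "Product"])["Quantity"]
    .sum()                      # total quantity per (transaction, store, product)
    .unstack()                  # pivot the Product level into columns
    .reset_index()              # bring Transaction and Store back as columns
    .fillna(0)                  # products absent from a transaction become 0
    .set_index("Transaction")   # one row per transaction
)

print(basket)
```

Repeated line items for the same product are summed (the two Candy Bar rows in transaction 1 become a single cell), and every product not bought in a transaction ends up as 0.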


In [None]:
#one hot encode the store number column
stores_df2 = pd.get_dummies(stores_df2, columns=['Store'])



In [None]:
stores_df2.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,...,Store_1,Store_2,Store_3,Store_4,Store_5,Store_6,Store_7,Store_8,Store_9,Store_10
93194,0.0,8.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,...,0,0,0,0,0,1,0,0,0,0
93197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,1,0,0,0,0,0,0,0,0,0
93200,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0,0,0,0,0,1,0,0,0,0
93206,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,0.0,...,0,0,0,0,0,0,0,1,0,0
93212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0,0,0,1,0,0,0,0,0,0


In [None]:
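# Convert quantities to presence flags: 1 if the item was bought, 0 otherwise
def encode_units(x):
    # A quantity of at least 1 counts as "present"; anything else maps to 0,
    # so the function can never fall through and return None
    return 1 if x >= 1 else 0

# Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
stores_df3 = stores_df2.applymap(encode_units)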

stores_df3.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,...,Store_1,Store_2,Store_3,Store_4,Store_5,Store_6,Store_7,Store_8,Store_9,Store_10
93194,0,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
93197,0,0,0,0,0,0,0,1,0,0,...,1,0,0,0,0,0,0,0,0,0
93200,0,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
93206,0,0,0,1,1,0,0,1,0,0,...,0,0,0,0,0,0,0,1,0,0
93212,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
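
As an aside, the same 0/1 encoding can be done without a per-cell Python function. The sketch below is one possible vectorized alternative, not the notebook's method. It assumes quantities are non-negative whole numbers, so `> 0` and `>= 1` agree. The small `basket` frame is a hand-typed stand-in modelled on the rows shown above.

```python
import pandas as pd

# Hand-typed stand-in for a few columns of the pivoted basket table.
basket = pd.DataFrame(
    {
        "Candy Bar": [8.0, 0.0, 3.0],
        "Magazine": [2.0, 0.0, 1.0],
        "Store_6": [1, 0, 1],
    },
    index=pd.Index([93194, 93197, 93200], name="Transaction"),
)

# Compare the whole frame at once, then cast the booleans to 0/1 integers.
encoded = (basket > 0).astype(int)

print(encoded)
```

The comparison runs on whole columns, which is usually faster than calling a Python function for every cell.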


# Find Frequent Itemsets


In [None]:
# Run apriori on the 0/1-encoded frame (stores_df3), not the raw quantities in stores_df2
stores_frequent_itemsets = apriori(stores_df3, min_support=0.01, use_colnames=True)
stores_frequent_itemsets.head()

,support,itemsets
0,0.051591,(Bow)
1,0.175736,(Candy Bar)
2,0.15284,(Greeting Cards)
3,0.231936,(Magazine)
4,0.020071,(Pain Reliever)
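
The `support` column is the fraction of transactions that contain every item in the itemset. The following pure-Python sketch illustrates that definition and the level-wise idea behind Apriori (only items that are frequent on their own are combined into candidate pairs). The transactions and the 0.4 threshold are invented for illustration; this is not how `mlxtend` is implemented internally.

```python
from itertools import combinations

# Hypothetical transactions, each a set of purchased products.
transactions = [
    {"Magazine", "Candy Bar"},
    {"Pencils"},
    {"Magazine", "Candy Bar"},
    {"Greeting Cards", "Magazine", "Pencils"},
    {"Toothbrush"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

min_support = 0.4  # illustrative threshold, much higher than the notebook's 0.01

# Level 1: keep single items that meet the threshold.
items = sorted(set().union(*transactions))
frequent_items = [i for i in items if support({i}, transactions) >= min_support]

# Level 2: build candidate pairs only from frequent single items.
frequent_pairs = [
    pair for pair in combinations(frequent_items, 2)
    if support(pair, transactions) >= min_support
]

print(frequent_items)
print(frequent_pairs)
```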


In [None]:
# Generate association rules, keeping those with a lift of at least 1
shop_rules = association_rules(stores_frequent_itemsets, metric="lift", min_threshold=1)

# Sort the rules by confidence, highest first
shop_rules.sort_values("confidence", ascending=False, inplace=True)

# Preview the association rules
shop_rules.head()

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
105,"(Toothpaste, Pencils)",(Candy Bar),0.022748,0.175736,0.011002,0.48366,2.752198,0.007005,1.596359
81,"(Greeting Cards, Magazine)",(Candy Bar),0.037467,0.175736,0.017247,0.460317,2.61937,0.010662,1.527313
99,"(Toothpaste, Magazine)",(Candy Bar),0.029884,0.175736,0.013232,0.442786,2.51961,0.007981,1.47926
87,"(Greeting Cards, Toothpaste)",(Candy Bar),0.033304,0.175736,0.01457,0.4375,2.48953,0.008718,1.465358
82,"(Candy Bar, Magazine)",(Greeting Cards),0.039994,0.15284,0.017247,0.431227,2.821431,0.011134,1.489452
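
Since the stated goal is a top-10 list, one way to produce it is to rank the rules and keep the first ten. The sketch below rebuilds the five rows previewed above in a small `rules` frame (values copied from the table) and ranks them by lift. Ranking by lift is an assumption, and confidence or support could be used instead. It keeps three rows only because five are available here; on the full `shop_rules` table, `head(10)` would give the top 10.

```python
import pandas as pd

# The five rules previewed above, re-typed so the example is self-contained.
rules = pd.DataFrame(
    {
        "antecedents": [
            frozenset({"Toothpaste", "Pencils"}),
            frozenset({"Greeting Cards", "Magazine"}),
            frozenset({"Toothpaste", "Magazine"}),
            frozenset({"Greeting Cards", "Toothpaste"}),
            frozenset({"Candy Bar", "Magazine"}),
        ],
        "consequents": [
            frozenset({"Candy Bar"}),
            frozenset({"Candy Bar"}),
            frozenset({"Candy Bar"}),
            frozenset({"Candy Bar"}),
            frozenset({"Greeting Cards"}),
        ],
        "confidence": [0.48366, 0.460317, 0.442786, 0.4375, 0.431227],
        "lift": [2.752198, 2.61937, 2.51961, 2.48953, 2.821431],
    },
    index=[105, 81, 99, 87, 82],
)

# Rank by lift, breaking ties by confidence, and keep the strongest rules.
top_by_lift = rules.sort_values(["lift", "confidence"], ascending=False).head(3)

print(top_by_lift[["antecedents", "consequents", "confidence", "lift"]])
```

Ranking by lift rather than confidence changes the order: the rule with index 82 has the lowest confidence of the five but the highest lift.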


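Among the rules previewed, {Toothpaste, Pencils} → {Candy Bar} has the highest confidence (0.48) and a lift of 2.75: customers who buy toothpaste and pencils are about 2.75 times as likely to also buy a candy bar as customers overall. The rule covers only about 1.1% of transactions, but the association is strong, so these products are good candidates for joint placement or bundled promotion.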
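
For reference, the metrics in the rules table follow from three support values. The short sketch below recomputes confidence, lift, leverage, and conviction for the {Toothpaste, Pencils} → {Candy Bar} rule using the standard definitions and the rounded supports printed in the table, so the results match the table only to about three decimal places. The variable names are illustrative.

```python
# Values copied from the rule {Toothpaste, Pencils} -> {Candy Bar} in the table.
antecedent_support = 0.022748   # share of transactions with Toothpaste and Pencils
consequent_support = 0.175736   # share of transactions with Candy Bar
joint_support = 0.011002        # share of transactions with all three items

# confidence: how often the consequent appears when the antecedent does
confidence = joint_support / antecedent_support

# lift: confidence relative to the consequent's baseline frequency
lift = confidence / consequent_support

# leverage: observed joint support minus what independence would predict
leverage = joint_support - antecedent_support * consequent_support

# conviction: how much more often the rule would fail if the items were independent
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.3f}, lift={lift:.3f}, "
      f"leverage={leverage:.4f}, conviction={conviction:.3f}")
```

A lift above 1 means the items appear together more often than they would if purchases were independent, which is why the rules were filtered with `min_threshold=1` on lift.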