# <font color='#2F4F4F'>Market Basket Analysis with Python - Project</font>

## <font color='#2F4F4F'>Step 1. Business Understanding</font>

### a) Specifying the Research Question

Perform market basket analysis to help a store maximize revenue.

### b) Defining the Metric for Success

A market basket analysis that identifies the top 10 products likely to be purchased together.

### c) Understanding the Context 

Care five is a German multinational retail corporation headquartered in Berlin, Germany.
It is the eighth-largest retailer in the world by revenue. It operates a chain of
hypermarkets, grocery stores, and convenience stores, which, as of January 2021,
comprised 12,00 stores in over 30 countries.
As a data analyst working for one of the stores, you must perform market basket
analysis to help the store maximize revenue. More specifically, your task is to analyze
transactional data to identify the top 10 products likely to be purchased together.

### d) Recording the Experimental Design



1. Define the business question
2. Perform data importation and loading
3. Perform data preprocessing
4. Find frequent itemsets
5. Generate association rules
6. Perform metric interpretation and provide recommendation



## <font color='#2F4F4F'>Step 2. Data Importation</font>

In [1]:
# Import the required libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [5]:
# Loading our data
carebasket_df = pd.read_csv("https://bit.ly/30A2gHO")
carebasket_df.head()

,A,Quantity,Transaction,Store,Product
0,30000,2,93194,6,Magazine
1,30001,2,93194,6,Candy Bar
2,30002,2,93194,6,Candy Bar
3,30003,2,93194,6,Candy Bar
4,30004,2,93194,6,Candy Bar


## <font color='#2F4F4F'>Step 3. Data Pre-Processing</font>

In [3]:
# Data pre-processing
# Group the data by Transaction and Product and count the rows for each pair
carebasket_df2 = carebasket_df.groupby(["Transaction","Product"]).size().reset_index(name="Count")
carebasket_df2.head()

,Transaction,Product,Count
0,93194,Candy Bar,4
1,93194,Magazine,1
2,93197,Pencils,1
3,93200,Candy Bar,3
4,93200,Magazine,1


In [6]:
# ---
# Consolidate the items into one transaction per row, with one column per product
# holding the count of that product in the transaction.
# ---
#
carebasket_df3 = (carebasket_df2.groupby(['Transaction', 'Product'])['Count']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Transaction'))

carebasket_df3.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,0.0,4.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93197,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93200,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93206,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93212,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0


In [7]:
# ---
# We then use our custom encoding function to convert
# all the counts to 0 or 1.
# The Apriori algorithm expects one-hot (0/1 or boolean) input.
# ---
#
def encode_units(x):
    # Any positive count means the product was in the basket
    return 1 if x > 0 else 0

# Note: pandas 2.1+ deprecates applymap in favour of DataFrame.map
carebasket_df4 = carebasket_df3.applymap(encode_units)

carebasket_df4.head()

Transaction,Bow,Candy Bar,Deodorant,Greeting Cards,Magazine,Markers,Pain Reliever,Pencils,Pens,Perfume,Photo Processing,Prescription Med,Shampoo,Soap,Toothbrush,Toothpaste,Wrapping Paper
93194,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
93197,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
93200,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
93206,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0
93212,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
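
The cell-by-cell encoding above can also be expressed as a single vectorized comparison. The following is a minimal sketch, not the notebook's method: `toy_counts` and `toy_encoded` are hypothetical names, and the data is a three-row stand-in with the same layout as `carebasket_df3`. Recent mlxtend releases may warn when the input is not boolean, so a boolean frame is a reasonable alternative to 0/1 integers.

```python
import pandas as pd

# Hypothetical toy data with the same layout as carebasket_df3:
# one row per transaction, one column per product, values are counts.
toy_counts = pd.DataFrame(
    {
        "Candy Bar": [4.0, 0.0, 3.0],
        "Magazine": [1.0, 0.0, 1.0],
        "Pencils": [0.0, 1.0, 0.0],
    },
    index=pd.Index([93194, 93197, 93200], name="Transaction"),
)

# Any positive count becomes True; zero stays False.
toy_encoded = toy_counts > 0

print(toy_encoded)
```

The comparison runs on whole columns at once, so it avoids calling a Python function for every cell.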


## <font color='#2F4F4F'>Step 4. Find frequent itemsets</font>

In [27]:
# Generate the frequent itemsets (minimum support of 1% of transactions)
carebasket_frequent_itemsets = apriori(carebasket_df4, min_support=0.01, use_colnames=True)
carebasket_frequent_itemsets.head()

,support,itemsets
0,0.051591,(Bow)
1,0.175736,(Candy Bar)
2,0.15284,(Greeting Cards)
3,0.231936,(Magazine)
4,0.020071,(Pain Reliever)
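
To make the `support` column concrete: the support of an itemset is the fraction of transactions that contain every item in it. The sketch below computes it by hand in plain Python. The `support` helper is a hypothetical name, and the five baskets are only the first five transactions shown in the encoded preview in Step 3, so the numbers do not match the full-data supports above.

```python
# The first five transactions from the encoded preview, written as sets.
transactions = [
    {"Candy Bar", "Magazine"},                   # 93194
    {"Pencils"},                                 # 93197
    {"Candy Bar", "Magazine"},                   # 93200
    {"Greeting Cards", "Magazine", "Pencils"},   # 93206
    {"Toothbrush"},                              # 93212
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


magazine_support = support({"Magazine"}, transactions)           # 3 of 5 baskets
pair_support = support({"Candy Bar", "Magazine"}, transactions)  # 2 of 5 baskets

print(magazine_support, pair_support)
```

An itemset is "frequent" when this fraction is at least `min_support`, which is the filter `apriori` applies.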


## <font color='#2F4F4F'>Step 5. Generate association rules</font>

In [37]:
# Find the association rules with a lift of at least 2
care_rules = association_rules(carebasket_frequent_itemsets, metric="lift", min_threshold=2)

# Sort the rules by confidence, highest first
care_rules.sort_values("confidence", ascending = False, inplace = True)

# Preview the association rules
care_rules.head()

,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
18,"(Pencils, Toothpaste)",(Candy Bar),0.022748,0.175736,0.011002,0.48366,2.752198,0.007005,1.596359
5,"(Greeting Cards, Magazine)",(Candy Bar),0.037467,0.175736,0.017247,0.460317,2.61937,0.010662,1.527313
15,"(Toothpaste, Magazine)",(Candy Bar),0.029884,0.175736,0.013232,0.442786,2.51961,0.007981,1.47926
9,"(Toothpaste, Greeting Cards)",(Candy Bar),0.033304,0.175736,0.01457,0.4375,2.48953,0.008718,1.465358
4,"(Candy Bar, Magazine)",(Greeting Cards),0.039994,0.15284,0.017247,0.431227,2.821431,0.011134,1.489452
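
The rule metrics in this table are all derived from three support values. As a check, the sketch below recomputes confidence, lift, leverage, and conviction for the first rule, (Pencils, Toothpaste) -> (Candy Bar), from the rounded supports printed above. The variable names are illustrative, and small differences from the table come from rounding in the displayed inputs.

```python
# Supports copied from the first row of the rules table (rounded as displayed).
antecedent_support = 0.022748   # share of baskets with Pencils and Toothpaste
consequent_support = 0.175736   # share of baskets with Candy Bar
rule_support = 0.011002         # share of baskets with all three items

# Confidence: how often the consequent appears when the antecedent does.
confidence = rule_support / antecedent_support

# Lift: confidence relative to the consequent's baseline frequency.
lift = confidence / consequent_support

# Leverage: observed co-occurrence minus what independence would predict.
leverage = rule_support - antecedent_support * consequent_support

# Conviction: ratio of expected to observed frequency of the rule failing.
conviction = (1 - consequent_support) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
```

A lift above 1 means the items appear together more often than they would if purchases were independent.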


## <font color='#2F4F4F'>Step 6. Perform metric interpretation and provide recommendation</font>

Observation:
*   The output above shows the top 5 rules sorted by confidence. All five rules have support above 0.01 (1%), confidence above 40%, and lift above 2.4.
*   Candy bar purchases are lifted roughly 2.5 to 2.75 times by purchases of other items such as toothpaste, magazines, greeting cards, and pencils, while greeting card purchases are lifted 2.8 times by the purchase of a candy bar and a magazine.

Recommendation:
*   The store should consider bundling candy bars, magazines, and greeting cards.
*   It may also consider displaying magazines, greeting cards, and pencils close to the candy bars, as this may further lift their purchases.
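
The success metric in Step 1 asks for the top 10 product combinations, while the preview in Step 5 shows only five rules. The following is a minimal sketch of how the top N rules could be selected by a chosen metric. `top_rules` and `rules_preview` are hypothetical names, and `rules_preview` holds only the five rules shown above as stand-in data; in the notebook the same helper would be applied to `care_rules` with `n=10`.

```python
import pandas as pd

# Stand-in for care_rules: the five rules previewed in Step 5.
rules_preview = pd.DataFrame(
    {
        "antecedents": [
            frozenset({"Pencils", "Toothpaste"}),
            frozenset({"Greeting Cards", "Magazine"}),
            frozenset({"Toothpaste", "Magazine"}),
            frozenset({"Toothpaste", "Greeting Cards"}),
            frozenset({"Candy Bar", "Magazine"}),
        ],
        "consequents": [
            frozenset({"Candy Bar"}),
            frozenset({"Candy Bar"}),
            frozenset({"Candy Bar"}),
            frozenset({"Candy Bar"}),
            frozenset({"Greeting Cards"}),
        ],
        "confidence": [0.48366, 0.460317, 0.442786, 0.4375, 0.431227],
        "lift": [2.752198, 2.61937, 2.51961, 2.48953, 2.821431],
    }
)


def top_rules(rules, n=10, by="lift"):
    """Return the n rules with the largest value of the chosen metric."""
    return rules.nlargest(n, by).reset_index(drop=True)


# Only five rules are available here, so ask for the top three by lift.
top = top_rules(rules_preview, n=3)
print(top[["antecedents", "consequents", "lift"]])
```

Ranking by lift instead of confidence changes the order: on these five rows, the greeting cards rule moves to the top.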





