### **Cross Selling (Market Basket Analysis)**

We perform market basket analysis to identify which items can be bundled together, using the Apriori algorithm.

In [27]:
# import required libraries and packages
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
plt.style.use('ggplot')
import seaborn as sns 

from mlxtend.frequent_patterns import apriori, association_rules

**Read preprocessed and cleaned data** 

In [2]:
sales_df = pd.read_csv('online_sales_final.csv')
sales_df.head()

Unnamed: 0,customer_id,transaction_id,transaction_date,product_sku,product_description,product_category,quantity,avg_price,delivery_charges,coupon_status,month,gender,location,tenure_months,gst,coupon_code,discount_pct,invoice_value,first_purchase_date,customer_type
0,17850,16679,2019-01-01,GGOENEBJ079499,Nest Learning Thermostat 3rd Gen-USA - Stainle...,Nest-USA,1,153.71,6.5,Used,Jan,M,Chicago,12,0.1,ELEC10,0.1,158.6729,2019-01-01,New
1,17850,16680,2019-01-01,GGOENEBJ079499,Nest Learning Thermostat 3rd Gen-USA - Stainle...,Nest-USA,1,153.71,6.5,Used,Jan,M,Chicago,12,0.1,ELEC10,0.1,158.6729,2019-01-01,New
2,17850,16681,2019-01-01,GGOEGFKQ020399,Google Laptop and Cell Phone Stickers,Office,1,2.05,6.5,Used,Jan,M,Chicago,12,0.1,OFF10,0.1,8.5295,2019-01-01,New
3,17850,16682,2019-01-01,GGOEGAAB010516,Google Men's 100% Cotton Short Sleeve Hero Tee...,Apparel,5,17.53,6.5,Not Used,Jan,M,Chicago,12,0.18,SALE10,0.1,99.5843,2019-01-01,New
4,17850,16682,2019-01-01,GGOEGBJL013999,Google Canvas Tote Natural/Navy,Bags,1,16.5,6.5,Used,Jan,M,Chicago,12,0.18,AIO10,0.1,24.023,2019-01-01,New


Let's explore the top selling products 

In [3]:
# top 20 selling products  
sales_df['product_description'].value_counts()[:20]

Nest Learning Thermostat 3rd Gen-USA - Stainless Steel    3511
Nest Cam Outdoor Security Camera - USA                    3328
Nest Cam Indoor Security Camera - USA                     3230
Google Sunglasses                                         1523
Nest Protect Smoke + CO White Battery Alarm-USA           1361
Nest Learning Thermostat 3rd Gen-USA - White              1089
Nest Protect Smoke + CO White Wired Alarm-USA             1065
Google 22 oz Water Bottle                                  902
Nest Thermostat E - USA                                    844
Google Laptop and Cell Phone Stickers                      806
Nest Cam IQ - USA                                          599
Google Men's 100% Cotton Short Sleeve Hero Tee Black       595
Google Twill Cap                                           546
Google Men's 100% Cotton Short Sleeve Hero Tee White       504
Nest Secure Alarm System Starter Pack - USA                498
Google Men's Vintage Badge Tee Black                   

Let's pivot the data so that each row is a transaction and each column is a product. Each cell of the resulting table counts how many times the product appears as a line item in that transaction.

 **A cross-tabulation is created to represent the frequency of products in each transaction.**
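
To make the meaning of the cell values concrete, here is a minimal sketch on a hypothetical three-product toy frame (`toy_sales` and its values are invented for illustration, not taken from the dataset). It suggests that `pd.crosstab` counts rows per transaction and product, and does not sum the `quantity` column.

```python
import pandas as pd

# Hypothetical toy data: transaction 1 lists "Bottle" on two separate rows
toy_sales = pd.DataFrame({
    "transaction_id": [1, 1, 1, 2],
    "product_description": ["Bottle", "Bottle", "Cap", "Cap"],
    "quantity": [3, 1, 1, 2],
})

# Same call pattern as the notebook: rows are transactions, columns are products
toy_cross = pd.crosstab(toy_sales["transaction_id"], toy_sales["product_description"])
print(toy_cross)
```

Transaction 1 gets a count of 2 for "Bottle" (two rows), even though the quantities on those rows add up to 4.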

In [4]:
# Create a cross-tabulation of transactions and products
df_cross = pd.crosstab(sales_df['transaction_id'], sales_df['product_description'])
df_cross.head()

transaction_id,1 oz Hand Sanitizer,20 oz Stainless Steel Insulated Tumbler,22 oz Android Bottle,22 oz YouTube Bottle Infuser,23 oz Wide Mouth Sport Bottle,24 oz YouTube Sergeant Stripe Bottle,25L Classic Rucksack,26 oz Double Wall Insulated Bottle,7" Dog Frisbee,8 pc Android Sticker Sheet,...,YouTube Twill Cap,YouTube Women's Favorite Tee White,YouTube Women's Fleece Hoodie Black,YouTube Women's Racer Back Tank Black,YouTube Women's Short Sleeve Hero Tee Charcoal,YouTube Women's Short Sleeve Tri-blend Badge Tee Charcoal,YouTube Women's Short Sleeve Tri-blend Badge Tee Grey,YouTube Womens 3/4 Sleeve Baseball Raglan White/Black,YouTube Wool Heather Cap Heather/Black,YouTube Youth Short Sleeve Tee Red
16679,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
16680,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
16681,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
16682,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
16684,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


**A custom encoding function is defined to convert item frequencies into binary values (1 if frequency > 0, else 0). This step is essential for inputting data into the Apriori algorithm.**

In [11]:
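# Define a function that binarizes item frequencies
def encode(freq):
    # 1 if the product appears in the transaction at least once, else 0
    return 1 if freq > 0 else 0

# Apply the encoding to every cell
# (pandas >= 2.1 deprecates applymap in favor of DataFrame.map)
basket_ip = df_cross.applymap(encode)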

In [13]:
basket_ip.shape 

(25061, 404)
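
As an alternative to the element-wise function, the same binarization can be sketched as a single vectorized comparison. The example below uses a hypothetical stand-in for `df_cross` (`toy_cross` is invented for illustration). It produces a boolean frame, which is the input type that mlxtend's `apriori` is generally understood to prefer.

```python
import pandas as pd

# Hypothetical stand-in for df_cross: rows are transactions, columns are products
toy_cross = pd.DataFrame(
    {"Bottle": [0, 2, 1], "Cap": [1, 0, 0]},
    index=pd.Index([101, 102, 103], name="transaction_id"),
)

# True where the product appears at least once in the transaction
toy_basket = toy_cross > 0
print(toy_basket)
```

On the real data this would read `basket_ip = df_cross > 0`. The result carries the same information as the 0/1 encoding above, only stored as `bool`.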

#### **Apriori Algorithm**

- The algorithm starts by finding frequent individual items (itemsets of size 1) in the dataset.
- It then iteratively explores larger itemsets by joining pairs of frequent itemsets from the previous step.
- At each iteration, the algorithm prunes itemsets that do not meet the minimum support threshold.
- This process continues until no more frequent itemsets can be generated.
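
The steps above can be sketched in plain Python. This is a simplified illustration on invented toy baskets, not mlxtend's implementation. The function name `apriori_sketch` and the product labels are hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent individual items
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        current = {}
        for c in candidates:
            # Prune: every (k-1)-subset must already be frequent
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1)):
                s = support(c)
                if s >= min_support:
                    current[c] = s
    return frequent

# Hypothetical toy baskets
toy_transactions = [
    ["cam_indoor", "cam_outdoor"],
    ["cam_indoor", "cam_outdoor", "thermostat"],
    ["thermostat", "alarm"],
    ["cam_indoor", "thermostat"],
]
result = apriori_sketch(toy_transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.5, `alarm` is dropped at level 1. The triple is never counted because one of its pairs (`cam_outdoor`, `thermostat`) is not frequent.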

The Apriori algorithm is applied to discover frequent itemsets in the transaction data. The `min_support` parameter is set to 0.001, so an itemset counts as frequent only if it appears in at least 0.1% of all transactions.
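
To see what this threshold means in absolute terms, the following sketch converts it into a transaction count, using the 25,061 rows reported by `basket_ip.shape`. It assumes an itemset is kept when its support is greater than or equal to `min_support`.

```python
import math

n_transactions = 25061   # rows in basket_ip, from the shape shown earlier
min_support = 0.001

# Smallest number of transactions whose share of the total reaches min_support
min_count = math.ceil(min_support * n_transactions)
print(min_count)                              # 26
print(round(min_count / n_transactions, 6))   # 0.001037
```

Under that assumption, an itemset needs to appear in at least 26 transactions. This is consistent with the smallest support values in the rules output, such as 0.001037.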

In [23]:
# Perform Apriori algorithm to find frequent itemsets
freq_itemset = apriori(basket_ip, min_support=0.001, use_colnames=True)
# Generate association rules from the frequent itemsets, filtering on lift
# (the default min_threshold of 0.8 applies because none is passed)
rules = association_rules(freq_itemset, metric='lift')
# Display the first few rules
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(1 oz Hand Sanitizer),(Google 22 oz Water Bottle),0.005147,0.027293,0.001476,0.286822,10.508829,0.001336,1.363904,0.909524
1,(Google 22 oz Water Bottle),(1 oz Hand Sanitizer),0.027293,0.005147,0.001476,0.054094,10.508829,0.001336,1.051745,0.930231
2,(1 oz Hand Sanitizer),(Google Kick Ball),0.005147,0.011213,0.001157,0.224806,20.049353,0.001099,1.275536,0.955039
3,(Google Kick Ball),(1 oz Hand Sanitizer),0.011213,0.005147,0.001157,0.103203,20.049353,0.001099,1.10934,0.960897
4,(1 oz Hand Sanitizer),(Google Laptop and Cell Phone Stickers),0.005147,0.032162,0.001037,0.20155,6.266817,0.000872,1.212147,0.844778


In [29]:
# Sort by support, breaking ties by confidence and then lift, and keep the top 10 rules
top_rules = rules.sort_values(['support', 'confidence', 'lift'], axis=0, ascending=False).head(10)

In [31]:
top_rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
401,(Nest Cam Indoor Security Camera - USA),(Nest Cam Outdoor Security Camera - USA),0.128886,0.132796,0.027653,0.214551,1.615644,0.010537,1.104087,0.43743
400,(Nest Cam Outdoor Security Camera - USA),(Nest Cam Indoor Security Camera - USA),0.132796,0.128886,0.027653,0.208233,1.615644,0.010537,1.100216,0.439403
414,(Nest Protect Smoke + CO White Battery Alarm-USA),(Nest Learning Thermostat 3rd Gen-USA - Stainl...,0.054307,0.140098,0.009018,0.166054,1.185272,0.00141,1.031125,0.165288
415,(Nest Learning Thermostat 3rd Gen-USA - Stainl...,(Nest Protect Smoke + CO White Battery Alarm-USA),0.140098,0.054307,0.009018,0.064369,1.185272,0.00141,1.010754,0.181778
405,(Nest Protect Smoke + CO White Battery Alarm-USA),(Nest Cam Outdoor Security Camera - USA),0.054307,0.132796,0.007661,0.141073,1.062327,0.000449,1.009636,0.062039
404,(Nest Cam Outdoor Security Camera - USA),(Nest Protect Smoke + CO White Battery Alarm-USA),0.132796,0.054307,0.007661,0.057692,1.062327,0.000449,1.003592,0.067654
417,(Nest Protect Smoke + CO White Wired Alarm-USA),(Nest Learning Thermostat 3rd Gen-USA - Stainl...,0.042496,0.140098,0.007222,0.169953,1.2131,0.001269,1.035968,0.183462
416,(Nest Learning Thermostat 3rd Gen-USA - Stainl...,(Nest Protect Smoke + CO White Wired Alarm-USA),0.140098,0.042496,0.007222,0.051552,1.2131,0.001269,1.009548,0.204286
402,(Nest Protect Smoke + CO White Battery Alarm-USA),(Nest Cam Indoor Security Camera - USA),0.054307,0.128886,0.005985,0.110213,0.855124,-0.001014,0.979015,-0.151932
403,(Nest Cam Indoor Security Camera - USA),(Nest Protect Smoke + CO White Battery Alarm-USA),0.128886,0.054307,0.005985,0.04644,0.855124,-0.001014,0.991749,-0.162821
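
The metric columns follow from the three support values in each row. As a quick check, the sketch below recomputes confidence, lift, leverage, and conviction for rule 401 (Nest Cam Indoor → Nest Cam Outdoor) from the supports shown in the table. Small differences from the table are expected because the inputs are rounded to six decimals.

```python
# Support values copied from rule 401 in the table above
antecedent_support = 0.128886   # share of transactions with Nest Cam Indoor
consequent_support = 0.132796   # share of transactions with Nest Cam Outdoor
support = 0.027653              # share of transactions with both

# Standard definitions of the association-rule metrics
confidence = support / antecedent_support
lift = confidence / consequent_support
leverage = support - antecedent_support * consequent_support
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence={confidence:.4f} lift={lift:.4f} "
      f"leverage={leverage:.4f} conviction={conviction:.4f}")
```

A lift above 1 means the two products co-occur more often than they would if purchases were independent. A lift below 1, as in rules 402 and 403, means they co-occur less often.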


#### **Conclusion**

The top 10 association rules were selected by sorting on support, with confidence and lift as tie-breakers. Because every rule appears in both directions, the 10 rules reduce to 5 product pairs, all within the Nest range:

1. **Nest Cam Indoor Security Camera - USA** and **Nest Cam Outdoor Security Camera - USA** (support 0.0277, lift 1.62).
2. **Nest Protect Smoke + CO White Battery Alarm-USA** and **Nest Learning Thermostat 3rd Gen-USA - Stainless Steel** (support 0.0090, lift 1.19).
3. **Nest Protect Smoke + CO White Battery Alarm-USA** and **Nest Cam Outdoor Security Camera - USA** (support 0.0077, lift 1.06).
4. **Nest Protect Smoke + CO White Wired Alarm-USA** and **Nest Learning Thermostat 3rd Gen-USA - Stainless Steel** (support 0.0072, lift 1.21).
5. **Nest Protect Smoke + CO White Battery Alarm-USA** and **Nest Cam Indoor Security Camera - USA** (support 0.0060, lift 0.86).

Pairs 1, 2, and 4 have lift clearly above 1, meaning the products are bought together more often than independent purchases would predict. They are the strongest candidates for bundles or complementary recommendations. Pair 3 is only marginally above 1. Pair 5 has lift below 1: the products co-occur less often than chance would predict, so the pair ranks highly only because both products sell well individually, and it is not a cross-selling opportunity. Businesses can use the positively associated pairs for targeted marketing, product placement, and recommendations.
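
As a sketch of how rules could be screened before making bundling decisions, the example below keeps only rules with lift above 1 and collapses each A → B / B → A pair into a single row. The `toy_rules` frame is a hand-built stand-in with shortened product names and values copied from rules 400 to 403. The lift cutoff of 1 is an assumption for illustration, not a setting used in the analysis.

```python
import pandas as pd

# Hand-built stand-in for the `rules` frame (shortened product names)
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"Cam Indoor"}), frozenset({"Cam Outdoor"}),
                    frozenset({"Protect Battery"}), frozenset({"Cam Indoor"})],
    "consequents": [frozenset({"Cam Outdoor"}), frozenset({"Cam Indoor"}),
                    frozenset({"Cam Indoor"}), frozenset({"Protect Battery"})],
    "support": [0.027653, 0.027653, 0.005985, 0.005985],
    "lift": [1.615644, 1.615644, 0.855124, 0.855124],
})

# Keep only positively associated rules (assumed cutoff: lift > 1)
positive = toy_rules[toy_rules["lift"] > 1].copy()

# An unordered pair identifies a rule regardless of direction
positive["pair"] = pd.Series(
    [a | c for a, c in zip(positive["antecedents"], positive["consequents"])],
    index=positive.index,
    dtype=object,
)

# One row per pair, strongest association first
unique_pairs = positive.drop_duplicates(subset="pair").sort_values("lift", ascending=False)
print(unique_pairs[["pair", "support", "lift"]])
```

Applied to the full `rules` frame, the same pattern would drop pairs such as Nest Protect Battery Alarm with Nest Cam Indoor, whose lift is below 1.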