# Market Basket Analysis

This is my market basket analysis project on an Indian e-commerce dataset. The project looks for hidden patterns in customer purchasing behavior by analyzing which products are frequently bought together in the same order. Market basket analysis is a data analysis technique that identifies frequently occurring itemsets, that is, combinations of products, in order to understand customer buying habits.

You can also get the dataset on Kaggle: [Kaggle E-Commerce Dataset](https://www.kaggle.com/datasets/benroshan/ecommerce-data).
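To make "frequently bought together" concrete, here is a minimal sketch of the three measures usually reported for an association rule (support, confidence, and lift), computed by hand with pandas. The `toy` table and the `support` helper are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy baskets (not from the dataset): one row per order,
# one boolean column per product.
toy = pd.DataFrame(
    {
        "Saree": [True, True, False, True],
        "Stole": [True, False, False, True],
        "Phones": [False, True, True, False],
    }
)


def support(items):
    """Fraction of orders that contain every item in `items`."""
    return toy[list(items)].all(axis=1).mean()


# Rule under test: Saree -> Stole
support_both = support(["Saree", "Stole"])      # share of orders with both items
confidence = support_both / support(["Saree"])  # share of Saree orders that also have Stole
lift = confidence / support(["Stole"])          # confidence relative to Stole's overall share

print(support_both, confidence, lift)
```

A lift above 1 suggests the two items appear together more often than they would if they were bought independently; a lift below 1 suggests the opposite.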

In [57]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

## Load Dataset

In [58]:
df = pd.read_csv('Dataset/Order Details.csv')
df

| | Order ID | Amount | Profit | Quantity | Category | Sub-Category |
|---|---|---|---|---|---|---|
| 0 | B-25601 | 1275.0 | -1148.0 | 7 | Furniture | Bookcases |
| 1 | B-25601 | 66.0 | -12.0 | 5 | Clothing | Stole |
| 2 | B-25601 | 8.0 | -2.0 | 3 | Clothing | Hankerchief |
| 3 | B-25601 | 80.0 | -56.0 | 4 | Electronics | Electronic Games |
| 4 | B-25602 | 168.0 | -111.0 | 2 | Electronics | Phones |
| ... | ... | ... | ... | ... | ... | ... |
| 1495 | B-26099 | 835.0 | 267.0 | 5 | Electronics | Phones |
| 1496 | B-26099 | 2366.0 | 552.0 | 5 | Clothing | Trousers |
| 1497 | B-26100 | 828.0 | 230.0 | 2 | Furniture | Chairs |
| 1498 | B-26100 | 34.0 | 10.0 | 2 | Clothing | T-shirt |


In [59]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Order ID      1500 non-null   object 
 1   Amount        1500 non-null   float64
 2   Profit        1500 non-null   float64
 3   Quantity      1500 non-null   int64  
 4   Category      1500 non-null   object 
 5   Sub-Category  1500 non-null   object 
dtypes: float64(2), int64(1), object(3)
memory usage: 70.4+ KB


In [60]:
df.describe()

| | Amount | Profit | Quantity |
|---|---|---|---|
| count | 1500.0 | 1500.0 | 1500.0 |
| mean | 287.668 | 15.97 | 3.743333 |
| std | 461.050488 | 169.140565 | 2.184942 |
| min | 4.0 | -1981.0 | 1.0 |
| 25% | 45.0 | -9.25 | 2.0 |
| 50% | 118.0 | 9.0 | 3.0 |
| 75% | 322.0 | 38.0 | 5.0 |
| max | 5729.0 | 1698.0 | 14.0 |


In [61]:
df['Sub-Category'].value_counts()

Sub-Category
Saree               210
Hankerchief         198
Stole               192
Phones               83
Bookcases            79
Electronic Games     79
T-shirt              77
Printers             74
Chairs               74
Furnishings          73
Accessories          72
Shirt                69
Skirt                64
Leggings             53
Kurti                47
Trousers             39
Tables               17
Name: count, dtype: int64
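Note that `value_counts()` counts rows (line items), not orders. If an order lists the same sub-category on more than one row, the two numbers differ, and basket analysis works on the per-order figure. The sketch below shows the difference on a small hypothetical table; the `items` data is invented, and only the column names match the dataset.

```python
import pandas as pd

# Hypothetical line items: order B-1 lists Saree on two separate rows.
items = pd.DataFrame(
    {
        "Order ID": ["B-1", "B-1", "B-2", "B-3"],
        "Sub-Category": ["Saree", "Saree", "Saree", "Stole"],
    }
)

# Row-level frequency: counts every line item.
row_counts = items["Sub-Category"].value_counts()

# Order-level frequency: share of distinct orders containing each sub-category.
n_orders = items["Order ID"].nunique()
order_share = items.groupby("Sub-Category")["Order ID"].nunique() / n_orders

print(row_counts)
print(order_share)
```

Here Saree appears on three rows but in only two of the three orders, so its order-level share is 2/3.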

## Create Basket

In [69]:
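# Collect the distinct sub-categories bought in each order
basket_df = df.groupby('Order ID')['Sub-Category'].agg(set).reset_index()
# Join each order's items into a single '|'-separated string
basket_df['Sub-Category'] = basket_df['Sub-Category'].apply(lambda x: '|'.join(x))
# One-hot encode: one boolean column per sub-category, one row per order
basket_df = basket_df['Sub-Category'].str.get_dummies('|').astype('bool')
basket_df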

| | Accessories | Bookcases | Chairs | Electronic Games | Furnishings | Hankerchief | Kurti | Leggings | Phones | Printers | Saree | Shirt | Skirt | Stole | T-shirt | Tables | Trousers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | True | False | True | False | True | False | False | False | False | False | False | False | True | False | False | False |
| 1 | False | False | False | False | False | False | False | False | True | False | True | False | False | False | False | False | False |
| 2 | False | False | True | False | False | True | True | False | False | False | True | False | False | True | False | False | True |
| 3 | False | False | False | False | False | False | False | False | False | False | True | False | False | False | True | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | True | False | False | False | False | False | False | True | True | False | True | True | False | False | False | False | True |
| 496 | False | False | True | True | False | True | False | True | False | True | False | False | False | False | False | False | False |
| 497 | True | False | True | False | False | False | False | True | False | False | True | False | True | False | True | False | False |
| 498 | False | False | False | False | False | True | False | False | True | False | False | False | True | False | False | False | True |
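The basket above is indexed by position, so the `Order ID` is no longer visible. As an alternative sketch (not the approach used in this notebook), `pd.crosstab` can produce the same kind of boolean table in one step while keeping `Order ID` as the index. The `orders` frame below is a hypothetical stand-in for `df` with only the two relevant columns.

```python
import pandas as pd

# Hypothetical stand-in for the order details (only the two columns used here).
orders = pd.DataFrame(
    {
        "Order ID": ["B-1", "B-1", "B-1", "B-2", "B-2"],
        "Sub-Category": ["Saree", "Stole", "Saree", "Phones", "Saree"],
    }
)

# Count rows per (order, sub-category), then reduce counts to presence/absence.
basket_alt = pd.crosstab(orders["Order ID"], orders["Sub-Category"]) > 0

print(basket_alt)
```

Because the comparison `> 0` yields booleans, repeated line items within one order collapse to a single `True`, which matches the effect of `agg(set)` in the cell above.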
