<h2>Market Basket Analysis</h2>

Market Basket Analysis is a data-driven technique used to uncover patterns and relationships within large transactional datasets, particularly in retail and e-commerce. It helps businesses understand which products or items are often purchased together, providing insights for optimizing product placement, marketing strategies, and promotions.

For a business, these co-purchase patterns translate into concrete actions: placing related items near each other, bundling them in promotions, and recommending them together to increase cross-selling. Used well, this can raise revenue and improve customer satisfaction.
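To make "purchased together" concrete, here is a minimal sketch of the three measures commonly used to score an item pair: support, confidence, and lift. The toy transactions and the helper `rule_metrics` are hypothetical and are not taken from the dataset analysed in this notebook; only the column names `BillNo` and `Itemname` mirror it.

```python
import pandas as pd

# Hypothetical toy transactions (made up for illustration only)
toy = pd.DataFrame({
    "BillNo":   [1, 1, 2, 2, 3, 3, 4],
    "Itemname": ["Bread", "Butter", "Bread", "Butter", "Bread", "Milk", "Milk"],
})

# One row per bill, one boolean column per item (True = item is on the bill)
basket = pd.crosstab(toy["BillNo"], toy["Itemname"]).astype(bool)

def rule_metrics(basket, antecedent, consequent):
    """Score the rule antecedent -> consequent (hypothetical helper)."""
    support_a = basket[antecedent].mean()    # share of bills containing the antecedent
    support_c = basket[consequent].mean()    # share of bills containing the consequent
    support_ac = (basket[antecedent] & basket[consequent]).mean()  # share containing both
    confidence = support_ac / support_a      # P(consequent | antecedent)
    lift = confidence / support_c            # > 1 suggests positive association
    return {"support": support_ac, "confidence": confidence, "lift": lift}

metrics = rule_metrics(basket, "Bread", "Butter")
print(metrics)
```

A lift above 1 suggests the two items appear together more often than they would if purchased independently. With only four toy bills the numbers are illustrative, not meaningful.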

In [1]:
# import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [2]:
# read the dataset
data = pd.read_csv("market_basket_dataset.csv")

In [3]:
# check the shape of data
data.shape

(500, 5)

In [4]:
# let's look at the first five records
data.head()

   BillNo  Itemname  Quantity  Price  CustomerID
0    1000    Apples         5   8.30       52299
1    1000    Butter         4   6.06       11752
2    1000      Eggs         4   2.66       16415
3    1000  Potatoes         4   8.10       22889
4    1004   Oranges         2   7.26       52255


In [6]:
# datatypes present
data.dtypes

BillNo          int64
Itemname       object
Quantity        int64
Price         float64
CustomerID      int64
dtype: object

In [7]:
# descriptive statistics
data.describe(include='all').T

            count  unique      top  freq      mean           std      min      25%      50%       75%      max
BillNo      500.0     NaN      NaN   NaN  1247.442    144.483097   1000.0   1120.0   1246.5    1370.0   1497.0
Itemname    500.0    19.0  Bananas  37.0       NaN           NaN      NaN      NaN      NaN       NaN      NaN
Quantity    500.0     NaN      NaN   NaN     2.978      1.426038      1.0      2.0      3.0       4.0      5.0
Price       500.0     NaN      NaN   NaN   5.61766      2.572919     1.04     3.57     5.43      7.92     9.94
CustomerID  500.0     NaN      NaN   NaN   54229.8  25672.122585  10504.0  32823.5  53506.5  76644.25  99162.0


In [8]:
# concise information
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   BillNo      500 non-null    int64  
 1   Itemname    500 non-null    object 
 2   Quantity    500 non-null    int64  
 3   Price       500 non-null    float64
 4   CustomerID  500 non-null    int64  
dtypes: float64(1), int64(3), object(1)
memory usage: 19.7+ KB


In [9]:
# check for duplicate records
data.duplicated().sum()

0
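A full-row duplicate check only flags rows that match in every column. For basket analysis it may also be worth checking whether the same item appears more than once on a single bill. The sketch below assumes that a repeated (`BillNo`, `Itemname`) pair is worth inspecting; the `sample` frame is hypothetical and its values are made up, and only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical sample with the same columns as the dataset; values are made up
sample = pd.DataFrame({
    "BillNo":     [1000, 1000, 1000, 1004],
    "Itemname":   ["Apples", "Butter", "Apples", "Oranges"],
    "Quantity":   [5, 4, 2, 2],
    "Price":      [8.30, 6.06, 8.30, 7.26],
    "CustomerID": [52299, 11752, 52299, 52255],
})

# Full-row duplicates: every column must match
full_row_dupes = sample.duplicated().sum()

# Same item listed more than once on the same bill (keep=False marks all copies)
pair_mask = sample.duplicated(subset=["BillNo", "Itemname"], keep=False)
repeated_pairs = sample[pair_mask]

print(full_row_dupes)
print(repeated_pairs)
```

Whether such repeats should be merged (for example by summing `Quantity`) or kept as-is depends on how the bills were recorded, which the dataset itself does not state.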

In [10]:
# check for null/missing values
data.isnull().sum()

BillNo        0
Itemname      0
Quantity      0
Price         0
CustomerID    0
dtype: int64
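With no duplicates or missing values, the rows can be grouped into baskets, one per bill. The following is a rough sketch of that step, assuming `BillNo` identifies a transaction; it rebuilds the five records shown by `data.head()` as a small `sample` frame so that it runs without the CSV file. The names `sample`, `baskets`, and `basket_sizes` are illustrative.

```python
import pandas as pd

# The first five records from data.head(), reduced to the two columns needed here
sample = pd.DataFrame({
    "BillNo":   [1000, 1000, 1000, 1000, 1004],
    "Itemname": ["Apples", "Butter", "Eggs", "Potatoes", "Oranges"],
})

# Collect the items of each bill into a list: one basket per BillNo
baskets = sample.groupby("BillNo")["Itemname"].apply(list)

# Number of line items per basket
basket_sizes = baskets.apply(len)

print(baskets)
print(basket_sizes)
```

On the full dataset the same grouping would be applied to `data` instead of `sample`.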