# Retail Customer Segmentation and Sales Analysis

## Project Overview
This project analyzes retail transaction data to understand customer purchasing behaviour, identify high-level customer segments, and evaluate how factors such as discounts and promotions affect sales performance, with the aim of producing actionable insights.

## Notebook Scope
In this notebook, we explore cleaned retail transaction, customer, and product-level datasets to uncover spending patterns, customer behaviour, and sales trends. The goal is to generate insights that can support business decision-making.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('default')
sns.set_theme(style='whitegrid')

## Data Loading

In [2]:
from pathlib import Path

data_dir = Path(r"C:\Users\abc\Documents\projects\Retail-Customer_Segmentation-and-Sales-Analysis\data\processed")

df_transactions = pd.read_csv(data_dir / "transactions.csv")
print('Transactions data loaded')

df_customers = pd.read_csv(data_dir / "customers.csv")
print('Customers data loaded')

df_products = pd.read_csv(data_dir / "products.csv")
print('Products data loaded')

Transactions data loaded
Customers data loaded
Products data loaded


## Basic Data Overview
This section provides a high-level overview of the cleaned datasets. The goal is to confirm data structure, key columns, and basic integrity before deeper exploratory analysis.

**Transactions dataset**

The transactions dataset captures transaction-level information such as date, payment method, and total cost, and forms the core of the analysis. Each row represents one transaction.

In [3]:
df_transactions.shape

(5000, 12)

In [4]:
df_transactions['Transaction_date'] = pd.to_datetime(df_transactions['Transaction_date'], errors='coerce')  # unparseable values become NaT
df_transactions['Transaction_time'] = pd.to_datetime(df_transactions['Transaction_time'], format='%H:%M:%S', errors='coerce').dt.time  # keep only the time of day

In [5]:
df_transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 12 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Transaction_ID    5000 non-null   int64         
 1   Customer_ID       5000 non-null   int64         
 2   Transaction_date  5000 non-null   datetime64[ns]
 3   Transaction_time  5000 non-null   object        
 4   Total_Items       5000 non-null   int64         
 5   Total_Cost        5000 non-null   float64       
 6   Payment_Method    5000 non-null   object        
 7   City              5000 non-null   object        
 8   Store_Type        5000 non-null   object        
 9   Discount_Applied  5000 non-null   bool          
 10  Season            5000 non-null   object        
 11  Promotion         5000 non-null   object        
dtypes: bool(1), datetime64[ns](1), float64(1), int64(3), object(6)
memory usage: 434.7+ KB
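
Because `errors='coerce'` converts unparseable values to `NaT` instead of raising an error, the non-null counts of 5000 for `Transaction_date` and `Transaction_time` are what suggest that no values were lost during parsing. A minimal sketch of making that check explicit is shown here on toy data, not the project files; the helper name `count_unparseable` is hypothetical.

```python
import pandas as pd


def count_unparseable(raw: pd.Series, **to_datetime_kwargs) -> int:
    """Count values that were present in `raw` but became NaT after parsing."""
    parsed = pd.to_datetime(raw, errors="coerce", **to_datetime_kwargs)
    # Only count values that were non-null before parsing and null after it
    return int((raw.notna() & parsed.isna()).sum())


# Toy data for illustration only, with one deliberately bad value per column
toy = pd.DataFrame({
    "Transaction_date": ["2022-01-21", "not a date", "2024-03-21"],
    "Transaction_time": ["06:27:29", "13:01:21", "25:61:00"],
})

bad_dates = count_unparseable(toy["Transaction_date"], format="%Y-%m-%d")
bad_times = count_unparseable(toy["Transaction_time"], format="%H:%M:%S")
print(f"Unparseable dates: {bad_dates}, unparseable times: {bad_times}")
```

Running a check like this on the raw string columns before overwriting them gives a count of silently coerced values, rather than relying on the `info()` summary alone.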


In [6]:
df_transactions['Transaction_ID'].nunique()

5000
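
The unique count of 5000 matches the row count returned by `df_transactions.shape`, which supports treating `Transaction_ID` as a one-row-per-transaction key. A small sketch of a reusable form of this check follows, shown on toy data with a deliberate duplicate; `check_unique_key` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def check_unique_key(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return the rows whose `key` value appears more than once (empty if unique)."""
    # keep=False marks every occurrence of a repeated key, not just the later ones
    return df[df[key].duplicated(keep=False)]


# Toy data for illustration only: the second ID is repeated on purpose
toy_transactions = pd.DataFrame({
    "Transaction_ID": [1000000000, 1000000001, 1000000001],
    "Total_Cost": [71.65, 25.93, 25.93],
})

duplicates = check_unique_key(toy_transactions, "Transaction_ID")
print(f"Key is unique: {duplicates.empty}")
print(duplicates)
```

Returning the offending rows, rather than only a count, makes it easier to decide whether duplicates are exact repeats or conflicting records.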

In [7]:
df_transactions.head(3)

| | Transaction_ID | Customer_ID | Transaction_date | Transaction_time | Total_Items | Total_Cost | Payment_Method | City | Store_Type | Discount_Applied | Season | Promotion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000000000 | 1 | 2022-01-21 | 06:27:29 | 3 | 71.65 | Mobile Payment | Los Angeles | Warehouse Club | True | Winter | No Promotion |
| 1 | 1000000001 | 2 | 2023-03-01 | 13:01:21 | 2 | 25.93 | Cash | San Francisco | Specialty Store | True | Fall | BOGO (Buy One Get One) |
| 2 | 1000000002 | 3 | 2024-03-21 | 15:37:04 | 6 | 41.49 | Credit Card | Houston | Department Store | True | Winter | No Promotion |


**Customers dataset**

The customers dataset aggregates customer-level information and will be used to analyze spending behaviour and customer segments. Each row represents one customer.

In [8]:
df_customers.shape

(4830, 7)

In [9]:
df_customers['Customer_ID'].nunique()

4830
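
With 4,830 unique customers against 5,000 transactions, some customers appear to have more than one transaction, so the customer-level columns are aggregates rather than copies of single rows. The sketch below shows one way such a table could be rebuilt from transactions for cross-checking. It assumes `Total_Spend` is the sum of `Total_Cost`, `Avg_Basket_Value` is its mean, and `Total_Transactions` is the number of rows per customer; the notebook does not state these definitions, and the data here is a toy sample.

```python
import pandas as pd

# Toy data for illustration only: customer 1 has two transactions
toy_transactions = pd.DataFrame({
    "Transaction_ID": [1, 2, 3, 4],
    "Customer_ID": [1, 2, 1, 3],
    "Total_Cost": [70.0, 26.0, 30.0, 41.5],
})

# Assumed definitions: spend = sum, basket value = mean, transactions = count
rebuilt = (
    toy_transactions
    .groupby("Customer_ID", as_index=False)
    .agg(
        Total_Spend=("Total_Cost", "sum"),
        Avg_Basket_Value=("Total_Cost", "mean"),
        Total_Transactions=("Transaction_ID", "count"),
    )
)
print(rebuilt)
```

If the assumed definitions hold, merging a table like `rebuilt` with `df_customers` on `Customer_ID` and comparing the three columns would confirm that the customer file is consistent with the transactions file.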

In [10]:
df_customers.head(3)

| | Customer_ID | Customer_Name | Customer_Category | index | Total_Spend | Avg_Basket_Value | Total_Transactions |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Stacey Price | Homemaker | 0 | 71.65 | 71.65 | 1 |
| 1 | 2 | Michelle Carlson | Professional | 1 | 25.93 | 25.93 | 1 |
| 2 | 3 | Lisa Graves | Professional | 2 | 41.49 | 41.49 | 1 |


**Products dataset**

The products dataset lists the individual items in each transaction, enabling product-level and basket analysis. Each row represents one product within a transaction.

In [11]:
df_products.shape

(14911, 3)
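
At 14,911 rows for 5,000 transactions, the products table averages roughly three rows per transaction, assuming every transaction is represented in it. The sketch below derives basket sizes from a table of this shape, using toy rows. Whether these counts should equal `Total_Items` in the transactions table is not stated in the notebook, so the comparison is only an illustrative check.

```python
import pandas as pd

# Toy rows in the same shape as the products and transactions tables
toy_products = pd.DataFrame({
    "Transaction_ID": [1000000000, 1000000000, 1000000000, 1000000001, 1000000001],
    "Product": ["Ketchup", "Shaving Cream", "Light Bulbs", "Ice Cream", "Milk"],
})
toy_transactions = pd.DataFrame({
    "Transaction_ID": [1000000000, 1000000001],
    "Total_Items": [3, 2],
})

# Number of product rows recorded for each transaction
basket_sizes = (
    toy_products.groupby("Transaction_ID")["Product"]
    .count()
    .rename("Product_Rows")
    .reset_index()
)

# Compare the derived basket size with the recorded item count
comparison = toy_transactions.merge(basket_sizes, on="Transaction_ID", how="left")
comparison["Matches"] = comparison["Total_Items"] == comparison["Product_Rows"]
print(comparison)
```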

In [13]:
df_products.head(5)

| | Transaction_ID | Customer_ID | Product |
|---|---|---|---|
| 0 | 1000000000 | 1 | Ketchup |
| 1 | 1000000000 | 1 | Shaving Cream |
| 2 | 1000000000 | 1 | Light Bulbs |
| 3 | 1000000001 | 2 | Ice Cream |
| 4 | 1000000001 | 2 | Milk |


The datasets are well-structured and ready for exploratory analysis. Transactions, customers, and products have been separated to support flexible analysis across different business dimensions.
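
Because the three tables share `Transaction_ID` and `Customer_ID`, analysis across dimensions comes down to joins on those columns. A minimal sketch on toy data follows, assuming the shared columns are reliable join keys; the `validate` argument makes pandas raise an error if the expected many-to-one relationship does not hold.

```python
import pandas as pd

# Toy data for illustration only
toy_transactions = pd.DataFrame({
    "Transaction_ID": [10, 11, 12],
    "Customer_ID": [1, 2, 1],
    "Total_Cost": [70.0, 26.0, 30.0],
})
toy_customers = pd.DataFrame({
    "Customer_ID": [1, 2],
    "Customer_Category": ["Homemaker", "Professional"],
})

# Many transactions map to one customer; validate raises MergeError otherwise
merged = toy_transactions.merge(
    toy_customers, on="Customer_ID", how="left", validate="many_to_one"
)

# Example of a cross-table question: total spend per customer category
spend_by_category = merged.groupby("Customer_Category")["Total_Cost"].sum()
print(spend_by_category)
```

Keeping the tables separate and joining only when a question needs it avoids repeating customer attributes on every transaction or product row.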