
baherend/E-Commerce-Sales-Analysis


E-Commerce-Sales-Analysis

A complete Exploratory Data Analysis (EDA) project on retail sales, featuring Customer Segmentation (Clusters), Market Basket Analysis, and Time-Series visualizations using Python.

📊 Retail E-Commerce Data Analysis & Customer Segmentation

🚀 Project Overview

This project is an end-to-end Exploratory Data Analysis (EDA) of a retail e-commerce dataset. The goal is to turn raw, messy transaction data into actionable business strategies. By analyzing purchasing behavior, I identified peak shopping hours and product affinities (Market Basket Analysis), and clustered customers into distinct groups to support targeted marketing.


🧹 Data Cleaning & Preprocessing

Before diving into the analysis, the dataset required significant cleaning to ensure data integrity:

  • Missing Values: Handled over 135,000 rows missing a CustomerID.
  • Duplicates: Removed 5,268 duplicate transactions to prevent revenue inflation.
  • Returns & Cancellations: Isolated 9,071 return transactions (negative values) so that net revenue could be calculated accurately.
  • Label Standardization: Cleaned the Description column to group product bundles accurately.
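A minimal pandas sketch of these cleaning steps follows. It is not the project's notebook code: the column names (InvoiceNo, Description, Quantity, UnitPrice, CustomerID) are assumptions modelled on the public UCI Online Retail layout, rows with a negative Quantity are assumed to be returns, and dropping rows without a CustomerID is only one possible way to handle them.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (sales, returns) after basic cleaning.

    Assumed columns (modelled on the UCI Online Retail layout):
    InvoiceNo, Description, Quantity, UnitPrice, CustomerID.
    """
    # Drop rows without a CustomerID (one possible way to handle them).
    df = df.dropna(subset=["CustomerID"])
    # Remove exact duplicate rows so revenue is not counted twice.
    df = df.drop_duplicates()
    # Standardize product labels: trim whitespace and unify case.
    df = df.assign(Description=df["Description"].str.strip().str.upper())
    # Line revenue; negative for returns because their Quantity is negative.
    df = df.assign(Revenue=df["Quantity"] * df["UnitPrice"])
    # Split returns (assumed: negative Quantity) from regular sales.
    is_return = df["Quantity"] < 0
    return df[~is_return].copy(), df[is_return].copy()


# Toy data: a duplicated line, a row with no CustomerID, and one return.
raw = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "C3"],
    "Description": [" candle ", " candle ", "t-light holder", "Candle"],
    "Quantity": [2, 2, 1, -1],
    "UnitPrice": [1.5, 1.5, 3.0, 1.5],
    "CustomerID": [100.0, 100.0, None, 100.0],
})
sales, returns = clean_transactions(raw)
# Net revenue = sales revenue plus (negative) return revenue.
net_revenue = sales["Revenue"].sum() + returns["Revenue"].sum()
```

Keeping returns in a separate frame, rather than deleting them, is what allows net revenue to be computed as sales plus returns.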

💡 Top 5 Business Insights

  1. The "Golden Hours": The highest volume of retail orders occurs between 12 PM and 3 PM.
  2. The "Wholesale" Spike: Regular retail orders average around $18, but on Wednesdays at 8 PM the average order value spikes to $724, driven by bulk buyers.
  3. Global Value: The UK provides the highest volume of transactions, but international markets (like the Netherlands) spend significantly more per visit.
  4. Impact of Returns: In specific time slots (e.g., Fridays at 6 PM), the value of returns exceeded new sales, flagging times to investigate possible shipping or quality issues.
  5. Product Synergy: Market Basket Analysis revealed strong relationships between specific items (e.g., T-Light Holders and Candles are almost always bought together).
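Insights 1, 2 and 4 come down to aggregating orders by weekday and hour. The sketch below shows one way to do that in pandas; it is an illustration rather than the notebook's code, and it assumes hypothetical columns InvoiceNo, InvoiceDate (parsed as datetimes) and Revenue (Quantity × UnitPrice, negative for returns).

```python
import pandas as pd


def orders_by_time(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize orders per (Weekday, Hour) slot.

    Assumes columns InvoiceNo, InvoiceDate (datetime64) and Revenue
    (Quantity * UnitPrice, negative for returns).
    """
    # Collapse line items into one row per invoice (one order).
    orders = df.groupby("InvoiceNo").agg(
        InvoiceDate=("InvoiceDate", "first"),
        Revenue=("Revenue", "sum"),
    )
    orders["Weekday"] = orders["InvoiceDate"].dt.day_name()
    orders["Hour"] = orders["InvoiceDate"].dt.hour
    keys = ["Weekday", "Hour"]
    sales = orders[orders["Revenue"] >= 0]
    returns = orders[orders["Revenue"] < 0]

    # Order volume, average order value and total sales for each slot.
    sales_stats = sales.groupby(keys)["Revenue"].agg(
        Orders="size", AvgOrderValue="mean", SalesValue="sum"
    )
    # Returns are stored as negative revenue; flip the sign to compare.
    return_value = (-returns.groupby(keys)["Revenue"].sum()).rename("ReturnValue")

    # Outer-align so slots with only sales or only returns are kept.
    summary = pd.concat([sales_stats, return_value], axis=1)
    summary = summary.fillna({"Orders": 0, "SalesValue": 0.0, "ReturnValue": 0.0})
    # Flag slots where returned value outweighs new sales.
    summary["ReturnsExceedSales"] = summary["ReturnValue"] > summary["SalesValue"]
    return summary


# Toy data (not from the real dataset).
lines = pd.DataFrame({
    "InvoiceNo": ["A", "A", "B", "C", "D"],
    "InvoiceDate": pd.to_datetime([
        "2011-12-02 13:05", "2011-12-02 13:05",  # Friday, 1 PM
        "2011-12-02 13:40",                      # Friday, 1 PM
        "2011-12-02 18:10",                      # Friday, 6 PM (a return)
        "2011-12-07 20:00",                      # Wednesday, 8 PM
    ]),
    "Revenue": [10.0, 8.0, 20.0, -50.0, 400.0],
})
summary = orders_by_time(lines)
# Busiest hour across all weekdays, by number of orders.
peak_hour = summary.groupby(level="Hour")["Orders"].sum().idxmax()
```

Aggregating to one row per invoice first matters: averaging raw line items would understate order values, since a single order usually spans several lines.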

🎯 Customer Segmentation (3 Distinct Clusters)

By plotting purchase frequency against monetary value on a log-scale scatter plot, I segmented the customer base into three distinct tiers:

  • πŸ† Group 1: VIPs (High Spenders) * Behavior: Spend over $2,000 and frequently make large bulk orders.
  • ⭐ Group 2: Loyal Regulars * Behavior: The steady heartbeat of the business, visiting frequently and spending $18–$50 per trip.
  • πŸ›οΈ Group 3: Occasional Shoppers * Behavior: The largest group by volume, mostly consisting of one-time retail buyers.
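The tiers above were read off a scatter plot. As a rough, rule-based stand-in (not the author's method), the sketch below computes per-customer Frequency and Monetary values and labels them with hypothetical thresholds: more than $2,000 total spend for VIPs, and at least two distinct orders for Loyal Regulars. The column names CustomerID, InvoiceNo and Revenue are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical thresholds approximating the tiers described above.
VIP_MIN_SPEND = 2000.0   # total spend above this marks a VIP
REGULAR_MIN_ORDERS = 2   # repeat customers count as Loyal Regulars


def segment_customers(sales: pd.DataFrame) -> pd.DataFrame:
    """Per-customer Frequency/Monetary table with a rule-based Segment label.

    Assumes columns CustomerID, InvoiceNo and Revenue, with returns removed.
    """
    customers = sales.groupby("CustomerID").agg(
        Frequency=("InvoiceNo", "nunique"),  # number of distinct orders
        Monetary=("Revenue", "sum"),         # total spend
    )
    # np.select checks conditions in order, so VIP takes precedence.
    customers["Segment"] = np.select(
        [
            customers["Monetary"] > VIP_MIN_SPEND,
            customers["Frequency"] >= REGULAR_MIN_ORDERS,
        ],
        ["VIP", "Loyal Regular"],
        default="Occasional Shopper",
    )
    return customers


# Toy data (not from the real dataset).
sales = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3],
    "InvoiceNo": ["a", "a", "b", "c", "d", "e"],
    "Revenue": [1500.0, 800.0, 20.0, 30.0, 25.0, 15.0],
})
segments = segment_customers(sales)
```

To reproduce the scatter view, the Frequency and Monetary columns can be plotted with matplotlib, calling `ax.set_xscale("log")` and `ax.set_yscale("log")` so the heavily skewed spend values spread out. Running a clustering algorithm such as k-means on the log-transformed values is a common alternative to fixed thresholds.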

📈 Strategic Action Plan

| Department | Recommended Strategy | Data-Driven Justification |
| --- | --- | --- |
| Marketing | Target VIPs with "Exclusive Previews" and Occasional Shoppers with "10% Welcome Back" coupons. | Personalized, cluster-based targeting should improve conversion rates. |
| Store Layout | Place frequently-bought-together "Product Pairs" next to each other. | Reduces friction and encourages impulse buying. |
| Operations | Increase staff availability daily between 12 PM and 3 PM. | Aligns staffing with the peak order-volume hours to ensure fast service. |
| Sales/Pricing | Launch "Buy Both & Save" bundle deals for the top 5 product pairs. | Encourages customers to increase their average order value. |
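Finding the top product pairs comes down to counting co-occurrences, which the itertools and collections modules listed under Technologies handle well. A minimal sketch, assuming each order has already been reduced to a list of product descriptions (a hypothetical input shape; in pandas it could come from grouping by invoice):

```python
from collections import Counter
from itertools import combinations


def top_product_pairs(baskets: dict[str, list[str]], n: int = 5) -> list[tuple[tuple[str, str], int]]:
    """Count how often each pair of products appears in the same order.

    `baskets` maps an order ID to the product names on that order.
    """
    pair_counts: Counter = Counter()
    for items in baskets.values():
        # Deduplicate and sort so ("A", "B") and ("B", "A") count as one pair.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return pair_counts.most_common(n)


# Toy baskets (not from the real dataset).
baskets = {
    "1": ["T-LIGHT HOLDER", "CANDLE"],
    "2": ["CANDLE", "T-LIGHT HOLDER", "MUG"],
    "3": ["MUG", "TEA TOWEL"],
}
top = top_product_pairs(baskets, n=1)  # [(('CANDLE', 'T-LIGHT HOLDER'), 2)]
```

Raw pair counts favor best-selling items; support, confidence, or lift are the usual next refinements when ranking pairs for bundle deals.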

🛠️ Technologies Used

  • Python (Pandas, NumPy)
  • Data Visualization (Matplotlib, Seaborn)
  • Market Basket Analysis (itertools, collections)

Note: The original dataset was compressed (ZIP) for easy uploading. Extract the CSV file before running the notebook.
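A small sketch of the extraction step using Python's standard zipfile module. The archive name is hypothetical because the README does not give it. For a ZIP that holds a single CSV, pandas.read_csv can also read the archive directly, since it infers zip compression from the .zip extension.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir="."):
    """Extract every member of the archive and return the CSV paths found."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)  # creates dest_dir if it does not exist
        names = archive.namelist()
    return [dest / name for name in names if name.lower().endswith(".csv")]


# Example (hypothetical archive name):
# csv_paths = extract_dataset("dataset.zip")
```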
