A complete Exploratory Data Analysis (EDA) project on retail sales, featuring Customer Segmentation (Clusters), Market Basket Analysis, and Time-Series visualizations using Python.
This project is an end-to-end Exploratory Data Analysis (EDA) on a retail e-commerce dataset. The goal of this analysis is to transform raw, messy transaction data into actionable business strategies. By analyzing purchasing behaviors, I identified peak shopping hours, product affinities (Market Basket Analysis), and clustered customers into distinct groups to drive targeted marketing.
Before diving into the analysis, the dataset required significant cleaning to ensure data integrity:
- Missing Values: Handled over 135,000 rows missing a `CustomerID`.
- Duplicates: Removed 5,268 duplicate transactions to prevent revenue inflation.
- Returns & Cancellations: Isolated 9,071 return transactions (negative values) to calculate true net revenue accurately.
- Label Standardization: Cleaned the `Description` column to group product bundles accurately.
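
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: only `CustomerID` and `Description` are named in this README, so the `Quantity` and `UnitPrice` columns are assumptions, as is the choice to flag rows without a `CustomerID` (via a hypothetical `HasCustomerID` column) rather than drop them.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (sales, returns) after basic cleaning.

    Assumed columns: Description, Quantity, UnitPrice, CustomerID.
    """
    # Drop exact duplicate rows so revenue is not counted twice.
    cleaned = df.drop_duplicates().copy()

    # Standardize product labels: trim whitespace and use one casing.
    cleaned["Description"] = cleaned["Description"].str.strip().str.upper()

    # Assumption: keep rows without a CustomerID but flag them, so they can
    # still count toward revenue while being excluded from segmentation.
    cleaned["HasCustomerID"] = cleaned["CustomerID"].notna()

    # Returns and cancellations appear as negative quantities.
    is_return = cleaned["Quantity"] < 0
    return cleaned[~is_return], cleaned[is_return]


def net_revenue(sales: pd.DataFrame, returns: pd.DataFrame) -> float:
    """Gross sales plus returns (returns contribute a negative amount)."""
    gross = (sales["Quantity"] * sales["UnitPrice"]).sum()
    refunded = (returns["Quantity"] * returns["UnitPrice"]).sum()
    return float(gross + refunded)
```

Keeping returns in a separate frame makes it possible to report gross sales, refunds, and net revenue independently.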
- The "Golden Hours": The highest volume of retail orders occurs between 12 PM and 3 PM.
- The "Wholesale" Spike: While regular retail orders average around $18, there is a massive revenue spike on Wednesdays at 8 PM, averaging $724 per order due to bulk buyers.
- Global Value: The UK provides the highest volume of transactions, but international markets (like the Netherlands) spend significantly more per visit.
- Impact of Returns: On specific days (e.g., Fridays at 6 PM), the value of returns outweighed new sales, highlighting times to investigate shipping or quality issues.
- Product Synergy: Market Basket Analysis revealed strong relationships between specific items (e.g., T-Light Holders and Candles are almost always bought together).
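
For the product-pair finding, one simple way to count co-purchases uses `itertools` and `collections` from the standard library. The sketch below assumes each basket is the list of item descriptions on one invoice; the function name `count_item_pairs` is hypothetical, and the notebook may compute affinities differently.

```python
from collections import Counter
from itertools import combinations


def count_item_pairs(baskets):
    """Count how often each unordered pair of items shares a basket.

    `baskets` is an iterable of item lists, one list per invoice.
    """
    pair_counts = Counter()
    for items in baskets:
        # De-duplicate and sort so (A, B) and (B, A) map to the same key.
        unique_items = sorted(set(items))
        pair_counts.update(combinations(unique_items, 2))
    return pair_counts
```

Calling `count_item_pairs(baskets).most_common(5)` would then give the five most frequent pairs, which is the kind of list a "top 5 product pairs" bundle deal could be built from. Raw counts favor popular items, so a fuller analysis would likely also look at support, confidence, or lift.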
Using a Log-Scale Scatter Plot, I successfully segmented the customer base into three distinct tiers based on their purchasing frequency and monetary value:
- Group 1: VIPs (High Spenders)
  - Behavior: Spend over $2,000 and frequently make large bulk orders.
- Group 2: Loyal Regulars
  - Behavior: The steady heartbeat of the business, visiting frequently and spending $18–$50 per trip.
- Group 3: Occasional Shoppers
  - Behavior: The largest group by volume, mostly consisting of one-time retail buyers.
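
As a rough illustration of how such tiers could be assigned, the sketch below computes frequency and monetary value per customer and applies simple threshold rules. It is not the method used in the notebook, where the groups were read off a log-scale scatter plot. The $2,000 VIP cutoff comes from the description above; the `regular_min_orders` cutoff and the `InvoiceNo`, `Quantity`, and `UnitPrice` column names are assumptions.

```python
import pandas as pd


def segment_customers(
    df: pd.DataFrame,
    vip_spend: float = 2000.0,    # from the tier description above
    regular_min_orders: int = 2,  # assumed cutoff, for illustration only
) -> pd.DataFrame:
    """Label each customer as VIP, Loyal Regular, or Occasional."""
    # Segmentation needs an identifiable customer.
    known = df.dropna(subset=["CustomerID"])
    known = known.assign(Revenue=known["Quantity"] * known["UnitPrice"])

    # Frequency = distinct invoices; Monetary = total spend.
    per_customer = known.groupby("CustomerID").agg(
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Revenue", "sum"),
    )

    # Apply rules from least to most valuable so VIP takes precedence.
    per_customer["Segment"] = "Occasional"
    is_regular = per_customer["Frequency"] >= regular_min_orders
    per_customer.loc[is_regular, "Segment"] = "Loyal Regular"
    is_vip = per_customer["Monetary"] > vip_spend
    per_customer.loc[is_vip, "Segment"] = "VIP"
    return per_customer
```

Plotting `Frequency` against `Monetary` with both axes on a log scale is a reasonable way to check whether fixed thresholds like these actually separate the groups.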
| Department | Recommended Strategy | Data-Driven Justification |
|---|---|---|
| Marketing | Target VIPs with "Exclusive Previews" and Occasionals with "10% Welcome Back" coupons. | Personalized targeting based on clusters maximizes conversion rates. |
| Store Layout | Place frequently bought together "Product Pairs" next to each other. | Reduces friction and encourages impulse buying. |
| Operations | Increase staff availability daily between 12 PM and 3 PM. | Aligns with the peak order volume hours to ensure fast service. |
| Sales/Pricing | Launch "Buy Both & Save" Bundle Deals for the top 5 product pairs. | Encourages customers to increase their average order value. |
- Python (Pandas, NumPy)
- Data Visualization (Matplotlib, Seaborn)
- Market Basket Analysis (itertools, collections)
Note: The original dataset was compressed (ZIP) for easy uploading. Extract the CSV file before running the notebook.