# 🛒 Online Retail Sales Analytics

This notebook walks through data cleaning, exploratory data analysis (EDA), and visualization on a synthetic e-commerce dataset (60k rows).

In [None]:

import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('../data/online_retail_sales.csv', parse_dates=['OrderDate'])
df.head()


## 🔹 Step 1: Data Overview & Cleaning

In [None]:

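# Check missing values (printed explicitly, since a cell only displays its last expression)
print(df.isnull().sum())

# Summary statistics for all columns, one row per column
df.describe(include='all').T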
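
The cell above only inspects the data; it does not change it. A minimal sketch of a cleaning pass follows, assuming the `OrderID`, `CustomerID`, and `Revenue` columns used later in this notebook. The helper name `clean_sales` and the rules it applies (dropping exact duplicates, rows without identifiers, and unparseable revenue values) are illustrative assumptions, not steps documented for this dataset.

```python
import pandas as pd


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the sales data (hypothetical helper)."""
    # Remove rows that are exact duplicates of an earlier row.
    out = df.drop_duplicates()
    # Rows without an order or customer identifier cannot be grouped later.
    out = out.dropna(subset=["OrderID", "CustomerID"])
    # Coerce Revenue to numeric; values that fail to parse become NaN.
    out = out.assign(Revenue=pd.to_numeric(out["Revenue"], errors="coerce"))
    # Drop the rows whose Revenue could not be parsed.
    out = out.dropna(subset=["Revenue"])
    return out.reset_index(drop=True)


# Toy data standing in for the real CSV (values invented for illustration).
toy = pd.DataFrame({
    "OrderID": [1, 1, 2, 3, 4],
    "CustomerID": ["A", "A", "B", None, "C"],
    "Revenue": [10.0, 10.0, "bad", 5.0, 7.5],
})
cleaned = clean_sales(toy)
print(cleaned)
```

On the real data this would be called as `df = clean_sales(df)` before the analysis steps. Whether dropping rows is appropriate depends on what `df.isnull().sum()` reports.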


## 🔹 Step 2: Monthly Revenue Trend

In [None]:

# Sum revenue per calendar month ('M' = month-end; pandas 2.2+ deprecates it in favor of 'ME')
monthly = df.set_index('OrderDate').resample('M')['Revenue'].sum()
monthly.plot(kind='line', marker='o', figsize=(10,5))
plt.title('Monthly Revenue Trend')
plt.xlabel('Month')
plt.ylabel('Revenue')
plt.show()


## 🔹 Step 3: Top 10 Products by Revenue

In [None]:

prod_rev = df.groupby('ProductName')['Revenue'].sum().sort_values(ascending=False).head(10)
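# barh draws the first row at the bottom, so sort ascending to put the top product first
prod_rev.sort_values().plot(kind='barh', figsize=(8,5))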
plt.title('Top 10 Products by Revenue')
plt.xlabel('Revenue')
plt.ylabel('Product')
plt.show()


## 🔹 Step 4: Profit by Category

In [None]:

cat_profit = df.groupby('Category')['Profit'].sum().sort_values(ascending=False)
cat_profit.plot(kind='bar', figsize=(7,5))
plt.title('Total Profit by Category')
plt.xlabel('Category')
plt.ylabel('Profit')
plt.show()
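
Total profit favors categories with large sales volume, so on its own it cannot support a statement about margins such as the one in the summary. A minimal sketch of profit margin per category (total profit divided by total revenue) follows, assuming the `Category`, `Profit`, and `Revenue` columns used in this notebook. `margin_by_category` is a hypothetical helper name and the toy numbers are invented for illustration.

```python
import pandas as pd


def margin_by_category(df: pd.DataFrame) -> pd.Series:
    """Profit margin (total profit / total revenue) per category, highest first."""
    totals = df.groupby("Category")[["Profit", "Revenue"]].sum()
    margin = totals["Profit"] / totals["Revenue"]
    return margin.sort_values(ascending=False)


# Toy data standing in for the real dataset (values invented for illustration).
toy = pd.DataFrame({
    "Category": ["Electronics", "Electronics", "Fashion", "Beauty"],
    "Revenue": [1000.0, 500.0, 200.0, 100.0],
    "Profit": [100.0, 50.0, 80.0, 30.0],
})
margins = margin_by_category(toy)
print(margins)
```

In the toy data, Electronics has the largest total profit but the lowest margin, which is the kind of gap a total-profit chart hides.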


## 🔹 Step 5: Customer Lifetime Value (Top 20 Customers)

In [None]:

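# Historical revenue per customer, used here as a simple proxy for CLV
clv = df.groupby('CustomerID')['Revenue'].sum().sort_values(ascending=False).head(20)
# Sort ascending so the highest-revenue customer appears at the top of the chart
clv.sort_values().plot(kind='barh', figsize=(8,6))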
plt.title('Top 20 Customers by Revenue (CLV)')
plt.xlabel('Revenue')
plt.ylabel('CustomerID')
plt.show()


## 🔹 Step 6: Repeat vs One-Time Customers

In [None]:

order_counts = df.groupby('CustomerID')['OrderID'].nunique()
repeat_share = (order_counts > 1).mean()
plt.bar(['One-time', 'Repeat'], [1-repeat_share, repeat_share])
plt.title('Customer Repeat Rate')
plt.ylabel('Share of Customers')
plt.show()


## 🔹 Step 7: Return Rate by Category

In [None]:

# The mean of IsReturned is the return rate, assuming the column is boolean or 0/1
ret_rate = df.groupby('Category')['IsReturned'].mean().sort_values(ascending=False)
ret_rate.plot(kind='bar', figsize=(7,5))
plt.title('Return Rate by Category')
plt.xlabel('Category')
plt.ylabel('Return Rate')
plt.show()



## 📌 Business Insights Summary
- Revenue peaks in Nov–Dec (festive season)
- The top 20% of customers contribute ~75% of revenue
- Electronics has the highest revenue; Fashion and Beauty have strong margins
- 60% of customers are one-time buyers → retention opportunity
- Fashion has the highest return rate → possible quality/fit issue
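
The customer-concentration and one-time-buyer figures above are not computed by any cell in this notebook. A minimal sketch of how they could be checked follows, assuming the `CustomerID`, `OrderID`, and `Revenue` columns used earlier. The helper names `top_customer_revenue_share` and `one_time_share` are hypothetical, and the toy data is invented, so its output says nothing about the real dataset.

```python
import pandas as pd


def top_customer_revenue_share(df: pd.DataFrame, top_frac: float = 0.2) -> float:
    """Share of total revenue that comes from the top `top_frac` of customers."""
    per_customer = (
        df.groupby("CustomerID")["Revenue"].sum().sort_values(ascending=False)
    )
    # Always include at least one customer, even for very small inputs.
    n_top = max(1, int(round(len(per_customer) * top_frac)))
    return float(per_customer.head(n_top).sum() / per_customer.sum())


def one_time_share(df: pd.DataFrame) -> float:
    """Share of customers who placed exactly one distinct order."""
    order_counts = df.groupby("CustomerID")["OrderID"].nunique()
    return float((order_counts == 1).mean())


# Toy data: five customers, two of whom ordered twice (values invented).
toy = pd.DataFrame({
    "CustomerID": ["A", "A", "B", "B", "C", "D", "E"],
    "OrderID": [1, 2, 3, 4, 5, 6, 7],
    "Revenue": [40.0, 20.0, 10.0, 10.0, 10.0, 5.0, 5.0],
})
share = top_customer_revenue_share(toy, top_frac=0.2)
one_time = one_time_share(toy)
print(f"Top 20% revenue share: {share:.0%}")
print(f"One-time customer share: {one_time:.0%}")
```

On the real data, calling both helpers on `df` gives the numbers to compare against the ~75% and 60% figures in the summary.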
