# 🛍️ E-Commerce Sales Analysis & Customer Segmentation

This notebook performs:
- Exploratory sales analysis
- Product performance insights
- RFM-based customer segmentation

Dataset: Online Retail (500,000+ transactions from a UK-based e-commerce company)


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

# Plot style
plt.style.use("ggplot")

In [None]:
# Load the dataset
df = pd.read_csv("Online Retail.csv", encoding="ISO-8859-1")
df.head()

In [None]:
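# Data cleaning: keep rows with a known customer, drop cancellations
# (invoice numbers starting with 'C'), and remove non-positive quantities or prices
df = df[df['CustomerID'].notnull()]
df = df[~df['InvoiceNo'].astype(str).str.startswith('C')]
# .copy() makes df an independent frame, so the column assignments below
# do not trigger SettingWithCopyWarning
df = df[(df['Quantity'] > 0) & (df['UnitPrice'] > 0)].copy()
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
df['Revenue'] = df['Quantity'] * df['UnitPrice']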
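
The cleaning rules can be checked on a few hand-made rows before running them on the full file. The sketch below is illustrative only: the `toy` frame and the `clean_sales` helper are hypothetical names that are not part of the notebook, and the rows are invented to hit each filter once.

```python
import pandas as pd

# Hypothetical toy rows that reuse the Online Retail column names
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536380", "536381"],
    "CustomerID": [17850.0, 14527.0, None, 15311.0],
    "Quantity": [6, -1, 2, 3],
    "UnitPrice": [2.55, 27.50, 1.25, 0.0],
})

def clean_sales(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the same filters as the cleaning cell and return an independent copy."""
    mask = (
        frame["CustomerID"].notnull()                            # known customer
        & ~frame["InvoiceNo"].astype(str).str.startswith("C")    # not a cancellation
        & (frame["Quantity"] > 0)
        & (frame["UnitPrice"] > 0)
    )
    cleaned = frame.loc[mask].copy()
    cleaned["Revenue"] = cleaned["Quantity"] * cleaned["UnitPrice"]
    return cleaned

toy_clean = clean_sales(toy)
print(f"kept {len(toy_clean)} of {len(toy)} rows")
```

Only the first toy row passes all four filters; the others fail on the cancellation prefix, the missing customer, and the zero price respectively.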

In [None]:
# Total revenue after cleaning, formatted with thousands separators
print(f"Total Revenue: {df['Revenue'].sum():,.2f}")

In [None]:
# Top 10 products
top_products = df.groupby('Description')['Revenue'].sum().sort_values(ascending=False).head(10)
top_products.plot(kind='bar', figsize=(10,5), title='Top 10 Products by Revenue')
plt.ylabel('Revenue')
plt.xlabel('Product')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# Monthly revenue trend
df['Month'] = df['InvoiceDate'].dt.to_period('M')
monthly_rev = df.groupby('Month')['Revenue'].sum()
monthly_rev.plot(figsize=(12,6), marker='o', title='Monthly Revenue')
plt.ylabel('Revenue')
plt.show()

In [None]:
# Revenue by country
country_revenue = df.groupby('Country')['Revenue'].sum().sort_values(ascending=False)
country_revenue.head(10).plot(kind='bar', figsize=(10,5), title='Top Countries by Revenue')
plt.ylabel('Revenue')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

In [None]:
# RFM analysis
snapshot_date = df['InvoiceDate'].max() + dt.timedelta(days=1)

rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'nunique',
    'Revenue': 'sum'
}).reset_index()

rfm.columns = ['CustomerID', 'Recency', 'Frequency', 'Monetary']

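# RFM quartiles (4 = best). Lower recency is better, so its labels are reversed.
rfm['R_Quartile'] = pd.qcut(rfm['Recency'], 4, labels=[4, 3, 2, 1])
# Frequency has many tied values (e.g. one-time buyers), which can give qcut
# duplicate bin edges; ranking with method='first' breaks the ties
rfm['F_Quartile'] = pd.qcut(rfm['Frequency'].rank(method='first'), 4, labels=[1, 2, 3, 4])
rfm['M_Quartile'] = pd.qcut(rfm['Monetary'], 4, labels=[1, 2, 3, 4])
rfm['RFM_Score'] = rfm['R_Quartile'].astype(str) + rfm['F_Quartile'].astype(str) + rfm['M_Quartile'].astype(str)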

rfm.head()
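
Quartile binning with `pd.qcut` can fail when a column is dominated by one value, which is typical for purchase counts. The sketch below uses an invented `freq` series (not taken from the dataset) to show the failure and one common workaround: binning the ranks instead of the raw values.

```python
import pandas as pd

# Hypothetical purchase counts: most customers bought once, so quartile edges repeat
freq = pd.Series([1, 1, 1, 1, 1, 2, 3, 8], name="Frequency")

try:
    pd.qcut(freq, 4, labels=[1, 2, 3, 4])
    direct_ok = True
except ValueError:
    # qcut refuses to build bins when two quantile edges are identical
    direct_ok = False

# rank(method="first") breaks ties by row order, so every value is unique
# and four equally sized bins can be formed
f_quartile = pd.qcut(freq.rank(method="first"), 4, labels=[1, 2, 3, 4])

print("direct qcut succeeded:", direct_ok)
print(f_quartile.value_counts().sort_index())
```

The trade-off is that customers with the same purchase count can land in different quartiles, depending on row order. Passing `duplicates="drop"` to `pd.qcut` is an alternative, but it yields fewer than four bins, so the four labels would no longer fit.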

In [None]:
# Champions and at-risk segments
champions = rfm[rfm['RFM_Score'] == '444']
# Quartile labels are integers, so compare against integers, not strings
at_risk = rfm[(rfm['R_Quartile'] == 1) & (rfm['F_Quartile'].isin([1, 2]))]
print("Champions:", len(champions))
print("At Risk Customers:", len(at_risk))
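
The two segment rules can also be written as a single labelling step, which makes it easier to add further segments later. This is a sketch on invented scores: the `scores` frame and the `Segment` column are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical quartile scores for five customers (4 = best, 1 = worst)
scores = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "R_Quartile": [4, 1, 1, 3, 1],
    "F_Quartile": [4, 1, 2, 2, 4],
    "M_Quartile": [4, 2, 1, 3, 4],
})

# Champions: top quartile on all three dimensions (the '444' score)
is_champion = (scores[["R_Quartile", "F_Quartile", "M_Quartile"]] == 4).all(axis=1)
# At risk: least recent quartile combined with low purchase frequency
is_at_risk = (scores["R_Quartile"] == 1) & scores["F_Quartile"].isin([1, 2])

# np.select applies the first matching condition; everything else is "Other"
scores["Segment"] = np.select(
    [is_champion, is_at_risk], ["Champion", "At Risk"], default="Other"
)
print(scores[["CustomerID", "Segment"]])
```

Customer 5 has not purchased recently but bought often in the past, so the at-risk rule does not apply and the customer falls into "Other".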

In [None]:
# RFM visualizations
plt.figure(figsize=(12,4))

plt.subplot(1, 3, 1)
sns.histplot(rfm['Recency'], bins=20, kde=True)
plt.title("Recency Distribution")

plt.subplot(1, 3, 2)
sns.histplot(rfm['Frequency'], bins=20, kde=True)
plt.title("Frequency Distribution")

plt.subplot(1, 3, 3)
sns.histplot(rfm['Monetary'], bins=20, kde=True)
plt.title("Monetary Distribution")

plt.tight_layout()
plt.show()

## 📌 Conclusion

- UK customers generated ~90% of total revenue.
- A small share of products accounted for the majority of revenue.
- Customer segmentation revealed:
  - 🏆 "Champions" (444 RFM score): recent, high-frequency, high-value customers
  - ⚠️ "At-Risk" customers: no recent purchases and low purchase frequency
- RFM allows for targeted retention and promotional strategies.
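
The country share and product concentration mentioned above can be quantified directly. The sketch below runs on invented rows, so its numbers say nothing about the real dataset; `sales`, `uk_share`, and `n_top` are hypothetical names, and in the notebook the cleaned `df` would take the place of `sales`.

```python
import pandas as pd

# Hypothetical cleaned rows with the same column names as the notebook
sales = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "France", "Germany"],
    "Description": ["A", "B", "A", "C"],
    "Revenue": [60.0, 20.0, 15.0, 5.0],
})

# Share of total revenue per country
country_share = sales.groupby("Country")["Revenue"].sum() / sales["Revenue"].sum()
uk_share = country_share["United Kingdom"]

# Cumulative revenue share of products, from best seller downwards
product_rev = sales.groupby("Description")["Revenue"].sum().sort_values(ascending=False)
cum_share = product_rev.cumsum() / product_rev.sum()

# Number of top products needed to reach at least half of total revenue
n_top = int((cum_share < 0.5).sum()) + 1

print(f"UK share of revenue: {uk_share:.0%}")
print(f"Products needed for 50% of revenue: {n_top} of {len(product_rev)}")
```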
