# RFM & Cohort Analysis — Expanded EDA
This notebook covers exploratory data analysis (EDA), top products, seasonality, ARPU by RFM segment, cohort retention, and business commentary.

**Generated automatically.**

## 1) Load data and quick overview
We load the cleaned dataset and show a few rows and basic dataset metrics.

In [None]:
import pandas as pd
import numpy as np
from datetime import timedelta
import matplotlib.pyplot as plt
pd.options.display.max_columns = 50

df = pd.read_csv('data_cleaned.csv', parse_dates=['InvoiceDate'])
df.head()

### Dataset statistics

In [None]:
print('Number of rows:', len(df))
print('Number of unique customers:', df['CustomerID'].nunique())
print('Date range:', df['InvoiceDate'].min(), 'to', df['InvoiceDate'].max())
print('Total revenue:', df['TotalPrice'].sum())

## 2) RFM Segmentation summary
Load the precomputed RFM summary and show, for each segment, the customer count and the mean Monetary, Recency, and Frequency values.

In [None]:
rfm = pd.read_csv('rfm_summary.csv')
rfm.groupby('Segment').agg({'CustomerID':'nunique','Monetary':'mean','Recency':'mean','Frequency':'mean'}).rename(columns={'CustomerID':'Count'})
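
The notebook loads `rfm_summary.csv` precomputed, so its derivation is not shown here. As a rough sketch of how the Recency, Frequency, and Monetary columns could be built from the transaction table, assuming the column names used elsewhere in this notebook (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `TotalPrice`): the helper `compute_rfm` and the snapshot-date choice are hypothetical, and the rules that map scores to a `Segment` label are not known from this notebook, so they are left out.

```python
import pandas as pd

def compute_rfm(tx: pd.DataFrame, snapshot=None) -> pd.DataFrame:
    """Build a per-customer Recency/Frequency/Monetary table (illustrative sketch)."""
    tx = tx.dropna(subset=["CustomerID"])
    if snapshot is None:
        # Assumption: recency is measured from the day after the last invoice.
        snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)
    rfm = tx.groupby("CustomerID").agg(
        LastPurchase=("InvoiceDate", "max"),   # most recent order date
        Frequency=("InvoiceNo", "nunique"),    # number of distinct invoices
        Monetary=("TotalPrice", "sum"),        # total spend
    ).reset_index()
    rfm["Recency"] = (snapshot - rfm["LastPurchase"]).dt.days
    return rfm[["CustomerID", "Recency", "Frequency", "Monetary"]]

# Toy transactions, made up for illustration only.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C2"],
    "InvoiceDate": pd.to_datetime([
        "2011-01-05", "2011-03-01", "2011-01-20",
        "2011-02-10", "2011-03-09", "2011-03-09",
    ]),
    "TotalPrice": [10.0, 15.0, 40.0, 5.0, 7.5, 2.5],
})
rfm_toy = compute_rfm(toy)
print(rfm_toy)
```

Frequency counts distinct invoices rather than rows, so the two line items on invoice `C2` count as one order.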

### Visualization — Count by RFM Segment

In [None]:
seg = rfm['Segment'].value_counts()
plt.figure(figsize=(8,5))
plt.bar(seg.index, seg.values)
plt.title('Count of Customers by RFM Segment')
plt.xlabel('Segment')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 3) Top Products (by revenue)
Identify the top 10 products contributing to revenue.

In [None]:
# Reuse df from section 1 instead of re-reading the CSV.
revenue = df.groupby(['StockCode', 'Description'])['TotalPrice'].sum()
prod = revenue.sort_values(ascending=False).head(10).reset_index()
prod

### Visualization — Top 10 Products by Revenue

In [None]:
plt.figure(figsize=(10,6))
plt.barh(prod['Description'][::-1], prod['TotalPrice'][::-1])
plt.title('Top 10 Products by Revenue')
plt.xlabel('Revenue')
plt.tight_layout()
plt.show()

## 4) Seasonality: Revenue and Orders over Time
Aggregate revenue and unique invoice counts by calendar month to show seasonal patterns.

In [None]:
# Monthly revenue and unique invoice counts; reuses df from section 1.
# Note: pandas >= 2.2 deprecates the 'M' resample alias in favor of 'ME' (month end).
monthly = (df.set_index('InvoiceDate').resample('M')
             .agg({'TotalPrice': 'sum', 'InvoiceNo': 'nunique'}).reset_index())
monthly['Month'] = monthly['InvoiceDate'].dt.to_period('M').dt.to_timestamp()
monthly.head()

In [None]:
plt.figure(figsize=(10,5))
plt.plot(monthly['Month'], monthly['TotalPrice'])
plt.title('Monthly Revenue')
plt.xlabel('Month')
plt.ylabel('Revenue')
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(10,5))
plt.plot(monthly['Month'], monthly['InvoiceNo'])
plt.title('Monthly Unique Invoices (Orders)')
plt.xlabel('Month')
plt.ylabel('Unique Invoices')
plt.tight_layout()
plt.show()

## 5) ARPU (Average Revenue Per User) by RFM Segment
ARPU here is each segment's total revenue divided by its number of unique customers.

In [None]:
rfm = pd.read_csv('rfm_summary.csv')
# Attach each customer's segment to the transactions loaded in section 1.
# A new variable is used so that df is not overwritten.
# Customers missing from rfm get a NaN Segment and are skipped by groupby.
tx = df.merge(rfm[['CustomerID', 'Segment']], on='CustomerID', how='left')
# ARPU = segment revenue / unique customers in the segment
arpu = tx.groupby('Segment').agg({'CustomerID': 'nunique', 'TotalPrice': 'sum'}).reset_index()
arpu['ARPU'] = arpu['TotalPrice'] / arpu['CustomerID']
arpu = arpu.sort_values('ARPU', ascending=False)
arpu
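
The left merge above leaves `Segment` empty for any customer who does not appear in `rfm_summary.csv`, and `groupby` skips those rows by default, so their revenue silently drops out of the table. A small sketch on made-up data shows one way to keep them visible. The `"Unassigned"` label is a hypothetical placeholder, and `validate="many_to_one"` is an optional guard that raises if the segment table ever lists a customer twice (which would otherwise duplicate rows and inflate revenue).

```python
import pandas as pd

# Toy data, invented for illustration.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 4],
    "TotalPrice": [10.0, 30.0, 20.0, 50.0, 5.0],
})
segments = pd.DataFrame({
    "CustomerID": [1, 2, 3],  # customer 4 has no segment
    "Segment": ["Champions", "At Risk", "Champions"],
})

merged = tx.merge(segments, on="CustomerID", how="left", validate="many_to_one")
# Hypothetical label so unmatched customers are reported rather than dropped.
merged["Segment"] = merged["Segment"].fillna("Unassigned")

arpu_toy = (
    merged.groupby("Segment")
    .agg(Customers=("CustomerID", "nunique"), Revenue=("TotalPrice", "sum"))
    .assign(ARPU=lambda t: t["Revenue"] / t["Customers"])
    .sort_values("ARPU", ascending=False)
)
print(arpu_toy)
```

Whether unmatched customers should be reported, excluded, or investigated is a business decision; the sketch only makes the gap visible.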

In [None]:
plt.figure(figsize=(8,5))
plt.bar(arpu['Segment'], arpu['ARPU'])
plt.title('ARPU by RFM Segment')
plt.xlabel('Segment')
plt.ylabel('ARPU')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 6) Cohort Retention Quick View
Load the cohort retention table and display it as a heatmap.

In [None]:
ret = pd.read_csv('cohort_retention.csv', parse_dates=['CohortMonth']).set_index('CohortMonth')
ret.head()
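
`cohort_retention.csv` is loaded precomputed. For readers who want to see where such a table might come from, here is a minimal sketch, assuming monthly cohorts defined by each customer's first order month and the same column names as the transaction data above. The function name `cohort_retention` is hypothetical, and the actual file may have been produced differently.

```python
import pandas as pd

def cohort_retention(tx: pd.DataFrame) -> pd.DataFrame:
    """Share of each monthly cohort that orders again N months later (sketch)."""
    tx = tx.dropna(subset=["CustomerID"]).copy()
    # Month of each order, as a month-start timestamp.
    tx["OrderMonth"] = tx["InvoiceDate"].dt.to_period("M").dt.to_timestamp()
    # A customer's cohort is the month of their first order.
    tx["CohortMonth"] = tx.groupby("CustomerID")["OrderMonth"].transform("min")
    # Whole months elapsed between the cohort month and the order month.
    tx["CohortIndex"] = (
        (tx["OrderMonth"].dt.year - tx["CohortMonth"].dt.year) * 12
        + (tx["OrderMonth"].dt.month - tx["CohortMonth"].dt.month)
    )
    counts = (
        tx.groupby(["CohortMonth", "CohortIndex"])["CustomerID"]
        .nunique()
        .unstack("CohortIndex")
    )
    # Divide every column by the cohort size (active customers at index 0).
    return counts.div(counts[0], axis=0)

# Toy transactions, invented for illustration.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2, 3, 3],
    "InvoiceDate": pd.to_datetime([
        "2011-01-10", "2011-02-03", "2011-03-15",
        "2011-01-22", "2011-03-02",
        "2011-02-14", "2011-03-30",
    ]),
})
ret_toy = cohort_retention(toy)
print(ret_toy)
```

Cells for months a cohort has not yet reached come out as NaN rather than 0, which distinguishes "not observed yet" from "nobody returned".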

In [None]:
plt.figure(figsize=(10,6))
mat = ret.to_numpy(dtype=float)  # keep NaN so unobserved months render blank, not as 0% retention
plt.imshow(mat, aspect='auto')
plt.title('Cohort Retention Matrix')
plt.ylabel('Cohort Month')
plt.xlabel('Cohort Index (months since first purchase)')
plt.yticks(range(len(ret.index)), [str(d).split(' ')[0] for d in ret.index], fontsize=8)  # date part only
plt.colorbar()
plt.tight_layout()
plt.show()

## 7) Business Insights & Recommended Actions
- Target **Champions** with VIP programs and referral incentives.
- Re-activate **At Risk** and **Needs Attention** segments with email campaigns and time-limited discounts.
- Investigate cohorts with sharp drop-offs in months 2-3, and improve onboarding and the first-30-day experience for them.
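
To make the last recommendation actionable, the cohorts with the steepest early drop-off can be listed programmatically. The sketch below is one possible approach on made-up numbers: the helper `flag_dropoffs`, the choice of comparing cohort index 1 with index 3, and the 50% threshold are all assumptions for illustration, not values taken from this analysis.

```python
import pandas as pd

def flag_dropoffs(ret: pd.DataFrame, start: int = 1, end: int = 3,
                  threshold: float = 0.5) -> pd.Series:
    """Cohorts whose retention falls by more than `threshold` (relative)
    between cohort index `start` and cohort index `end`."""
    drop = 1 - ret[end] / ret[start]
    # NaN drops (cohorts too young to have reached `end`) are never flagged.
    return drop[drop > threshold].sort_values(ascending=False)

# Toy retention table, invented for illustration; columns are cohort indices.
ret_toy = pd.DataFrame(
    {
        0: [1.00, 1.00, 1.00],
        1: [0.40, 0.35, 0.30],
        2: [0.30, 0.30, 0.10],
        3: [0.28, 0.15, float("nan")],
    },
    index=pd.to_datetime(["2011-01-01", "2011-02-01", "2011-03-01"]),
)
flagged = flag_dropoffs(ret_toy)
print(flagged)
```

A table read from CSV will have string column labels, so they may need converting first (for example with `ret.columns = ret.columns.astype(int)`, assuming the columns are named by cohort index).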

---

This notebook is designed for presentation to stakeholders: each section contains short commentary, visualizations, and outputs needed to make business decisions.