# RFM & Cohort Analysis — VS Code Notebook (Full Analysis)

**Purpose:** A full, presentation-ready notebook for VS Code with EDA, RFM segmentation, cohort analysis, visualizations, and business insights. Open this file in VS Code's Jupyter editor and run cells top-to-bottom.

---

## 1. Setup and Imports
Import required libraries and set display options.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import timedelta
pd.options.display.max_columns = 100
print('Libraries loaded')

## 2. Load cleaned data
We use the pre-cleaned `data_cleaned.csv`. If you don't have it, run `rfm_analysis.py` first.

In [None]:
df = pd.read_csv('data_cleaned.csv', parse_dates=['InvoiceDate'])
df.head()
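Later cells assume `df` has the columns `CustomerID`, `InvoiceNo`, `InvoiceDate`, `StockCode`, `Description`, and `TotalPrice`. The sketch below is a minimal guard that fails early if the cleaned file is missing any of them. The helper `check_columns` is hypothetical and not part of `rfm_analysis.py`.

```python
import pandas as pd

# Columns that the cells of this notebook reference
REQUIRED_COLUMNS = ['CustomerID', 'InvoiceNo', 'InvoiceDate', 'StockCode', 'Description', 'TotalPrice']


def check_columns(frame: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Hypothetical helper: return the required columns missing from `frame`, in order."""
    return [col for col in required if col not in frame.columns]


# Stand-in frame for illustration; in the notebook, call check_columns(df)
sample = pd.DataFrame({
    'CustomerID': [1],
    'InvoiceNo': ['INV-1'],
    'InvoiceDate': [pd.Timestamp('2011-01-05')],
})
missing = check_columns(sample)
print(missing)  # ['StockCode', 'Description', 'TotalPrice']
```

An empty list means every column the notebook needs is present.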


### Dataset basic stats

In [None]:
print('Rows:', len(df))
print('Unique customers:', df['CustomerID'].nunique())
print('Date range:', df['InvoiceDate'].min(), 'to', df['InvoiceDate'].max())
print('Total revenue:', df['TotalPrice'].sum())

## 3. RFM Summary
Load the precomputed RFM table (`rfm_summary.csv`) and summarize, per segment, the number of customers and the mean Recency, Frequency, and Monetary values.

In [None]:
rfm = pd.read_csv('rfm_summary.csv')
rfm.groupby('Segment').agg({'CustomerID':'count','Monetary':'mean','Recency':'mean','Frequency':'mean'}).rename(columns={'CustomerID':'Count'})
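`rfm_summary.csv` is produced outside this notebook, so its exact definitions are not shown here. The sketch below assumes a common convention. Recency is the number of days between a customer's last purchase and a snapshot date one day after the latest invoice. Frequency is the number of distinct invoices, and Monetary is total spend. The transactions are invented for illustration.

```python
import pandas as pd
from datetime import timedelta

# Invented toy transactions using the column names of data_cleaned.csv
tx = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 3, 3],
    'InvoiceNo': ['A1', 'A2', 'B1', 'C1', 'C1', 'C2'],
    'InvoiceDate': pd.to_datetime(['2011-01-01', '2011-03-01', '2011-02-15',
                                   '2011-03-10', '2011-03-10', '2011-03-30']),
    'TotalPrice': [20.0, 30.0, 15.0, 10.0, 5.0, 40.0],
})

# Assumed snapshot date: one day after the latest invoice in the data
snapshot = tx['InvoiceDate'].max() + timedelta(days=1)

rfm_sketch = tx.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda d: (snapshot - d.max()).days),  # days since last purchase
    Frequency=('InvoiceNo', 'nunique'),                            # distinct invoices (orders)
    Monetary=('TotalPrice', 'sum'),                                # total spend
).reset_index()
print(rfm_sketch)
```

Counting distinct invoices rather than rows matters because one invoice can contain several line items. Customer 3 above has three rows but two orders.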

### 3.1 Visual: Customer count by RFM segment

In [None]:
seg = rfm['Segment'].value_counts()
plt.figure(figsize=(8,5))
plt.bar(seg.index, seg.values)
plt.title('Count of Customers by RFM Segment')
plt.xlabel('Segment')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
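The `Segment` labels are assigned in `rfm_analysis.py`, not here. One widespread approach is shown below as a hypothetical sketch. It scores each metric by quantile and then maps score combinations to named segments. The thresholds and rules are invented for illustration, and the segment names Champions, At Risk, and Needs Attention only echo those used in this notebook's recommendations. Quartile scores (1 to 4) keep the toy data small, though quintiles (1 to 5) are also common.

```python
import pandas as pd

# Invented RFM metrics for eight customers
m = pd.DataFrame({
    'CustomerID': range(1, 9),
    'Recency':   [5, 12, 200, 30, 45, 240, 70, 365],
    'Frequency': [25, 18, 14, 6, 3, 11, 2, 1],
    'Monetary':  [1200, 800, 650, 240, 90, 500, 60, 25],
})

# Quartile scores 1-4, higher is better; Recency labels are reversed because fewer days is better
m['R'] = pd.qcut(m['Recency'], 4, labels=[4, 3, 2, 1]).astype(int)
# rank(method='first') breaks ties so qcut gets unique bin edges (Frequency often has many ties)
m['F'] = pd.qcut(m['Frequency'].rank(method='first'), 4, labels=[1, 2, 3, 4]).astype(int)
m['M'] = pd.qcut(m['Monetary'], 4, labels=[1, 2, 3, 4]).astype(int)
m['RFM_Score'] = m['R'].astype(str) + m['F'].astype(str) + m['M'].astype(str)


def segment(row) -> str:
    """Hypothetical rule set; real thresholds depend on the business."""
    if row['R'] >= 4 and row['F'] >= 3:
        return 'Champions'        # bought recently and often
    if row['R'] <= 2 and row['F'] >= 3:
        return 'At Risk'          # used to buy often, not seen lately
    if row['R'] >= 3:
        return 'Needs Attention'  # fairly recent but infrequent
    return 'Other'


m['Segment'] = m.apply(segment, axis=1)
print(m[['CustomerID', 'RFM_Score', 'Segment']])
```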

## 4. Top Products
Top products by revenue help prioritize promotions and assortments.

In [None]:
top_products = (
    df.groupby(['StockCode', 'Description'])['TotalPrice']
      .sum()
      .reset_index()
      .sort_values('TotalPrice', ascending=False)
      .head(15)
)
top_products
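A cumulative revenue share per product complements the top-15 table. It shows how concentrated revenue is, and therefore how much of it a short promotion list can cover. The sketch below uses invented numbers. With the notebook's data, the same steps would apply to `df.groupby('StockCode')['TotalPrice'].sum()`.

```python
import pandas as pd

# Invented revenue per product (StockCode -> revenue)
product_revenue = pd.Series({'P1': 500.0, 'P2': 250.0, 'P3': 125.0, 'P4': 75.0, 'P5': 50.0})

ranked = product_revenue.sort_values(ascending=False)
cum_share = ranked.cumsum() / ranked.sum()  # cumulative fraction of total revenue

# Smallest number of top products that together reach 80% of revenue
n_for_80 = int((cum_share < 0.8).sum()) + 1
print(cum_share.round(3).tolist())  # [0.5, 0.75, 0.875, 0.95, 1.0]
print(n_for_80)                     # 3
```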

### 4.1 Visual: Top 10 products by revenue

In [None]:
prod = top_products.head(10)
plt.figure(figsize=(10,6))
plt.barh(prod['Description'][::-1], prod['TotalPrice'][::-1])
plt.title('Top 10 Products by Revenue')
plt.xlabel('Revenue')
plt.tight_layout()
plt.show()

## 5. Seasonality & Trends
Aggregate revenue and the number of unique invoices (orders) by calendar month to reveal seasonal peaks and troughs.

In [None]:
# 'MS' = month-start bins; the older 'M' alias is deprecated in recent pandas versions
monthly = df.set_index('InvoiceDate').resample('MS').agg({'TotalPrice':'sum','InvoiceNo':'nunique'}).reset_index()
monthly['Month'] = monthly['InvoiceDate'].dt.to_period('M').dt.to_timestamp()
monthly.head()

In [None]:
plt.figure(figsize=(10,5))
plt.plot(monthly['Month'], monthly['TotalPrice'])
plt.title('Monthly Revenue')
plt.xlabel('Month')
plt.ylabel('Revenue')
plt.tight_layout()
plt.show()

In [None]:
plt.figure(figsize=(10,5))
plt.plot(monthly['Month'], monthly['InvoiceNo'])
plt.title('Monthly Unique Invoices (Orders)')
plt.xlabel('Month')
plt.ylabel('Unique Invoices')
plt.tight_layout()
plt.show()
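Revenue and order counts can rise together or diverge. Dividing revenue by orders gives average order value (AOV), and `pct_change` gives month-over-month growth. The sketch below uses an invented monthly table with the same column names as `monthly`. The added columns `AOV` and `RevenueMoM` are illustrative names.

```python
import pandas as pd

# Invented monthly totals shaped like the `monthly` table
monthly_demo = pd.DataFrame({
    'Month': pd.to_datetime(['2011-09-01', '2011-10-01', '2011-11-01']),
    'TotalPrice': [10000.0, 12000.0, 18000.0],  # revenue
    'InvoiceNo': [200, 240, 300],               # unique invoices (orders)
})

# Average order value: revenue divided by number of orders in the month
monthly_demo['AOV'] = monthly_demo['TotalPrice'] / monthly_demo['InvoiceNo']
# Month-over-month revenue growth (the first month has no prior month, so NaN)
monthly_demo['RevenueMoM'] = monthly_demo['TotalPrice'].pct_change()
print(monthly_demo)
```

In this toy table, October's growth comes entirely from more orders (AOV unchanged at 50). November's growth comes from both more orders and larger baskets.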

## 6. ARPU (Average Revenue Per User) by RFM Segment
Compute ARPU to understand per-customer revenue contribution by segment.

In [None]:
df_rfm = df.merge(rfm[['CustomerID','Segment']], on='CustomerID', how='left')
# Distinct customers and total revenue per segment; ARPU = revenue / customers
arpu = df_rfm.groupby('Segment').agg(Customers=('CustomerID','nunique'), Revenue=('TotalPrice','sum')).reset_index()
arpu['ARPU'] = arpu['Revenue'] / arpu['Customers']
arpu = arpu.sort_values('ARPU', ascending=False)
arpu

In [None]:
plt.figure(figsize=(8,5))
plt.bar(arpu['Segment'], arpu['ARPU'])
plt.title('ARPU by RFM Segment')
plt.xlabel('Segment')
plt.ylabel('ARPU')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
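For a segment, ARPU is total revenue from that segment's transactions divided by its number of distinct customers. The merge above is a left join, so any transaction whose `CustomerID` is absent from `rfm_summary.csv` gets a missing `Segment`. `groupby` then drops those rows by default. The invented example below shows both effects.

```python
import pandas as pd

# Invented transactions; customer 4 has no RFM segment
tx = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 4],
    'TotalPrice': [100.0, 50.0, 30.0, 20.0, 999.0],
})
segments = pd.DataFrame({'CustomerID': [1, 2, 3], 'Segment': ['Champions', 'At Risk', 'At Risk']})

merged = tx.merge(segments, on='CustomerID', how='left')  # customer 4 gets Segment = NaN
arpu_demo = merged.groupby('Segment').agg(                 # NaN segments are dropped here
    Customers=('CustomerID', 'nunique'),
    Revenue=('TotalPrice', 'sum'),
)
arpu_demo['ARPU'] = arpu_demo['Revenue'] / arpu_demo['Customers']
print(arpu_demo)  # At Risk: 50 / 2 = 25.0; Champions: 150 / 1 = 150.0
print('Unsegmented revenue:', merged.loc[merged['Segment'].isna(), 'TotalPrice'].sum())  # 999.0
```

Checking the unsegmented revenue is a quick way to confirm that the RFM table covers every customer in `df`.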

## 7. Cohort Analysis
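Load the precomputed cohort retention table (`cohort_retention.csv`) and plot it as a heatmap. Each row is an acquisition cohort, and each column is the number of months since first purchase.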

In [None]:
ret = pd.read_csv('cohort_retention.csv', parse_dates=['CohortMonth']).set_index('CohortMonth')
ret.head()
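`cohort_retention.csv` is built outside this notebook. The sketch below assumes a common construction rather than the one in `rfm_analysis.py`. Assign each customer to the month of their first purchase (`CohortMonth`). Count the whole months elapsed for every later purchase (`CohortIndex`, with 0 for the first month). Then count distinct active customers per cell and divide each row by its month-0 size. The purchases are invented.

```python
import pandas as pd

# Invented purchases: three customers first buy in January, one in February
tx = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2, 2, 3, 4, 4],
    'InvoiceDate': pd.to_datetime([
        '2011-01-05', '2011-02-10', '2011-03-15',  # customer 1: Jan, Feb, Mar
        '2011-01-20', '2011-03-02',                # customer 2: Jan, Mar
        '2011-01-28',                              # customer 3: Jan only
        '2011-02-03', '2011-03-09',                # customer 4: Feb, Mar
    ]),
})

# Calendar month of each purchase, and of each customer's first purchase
tx['InvoiceMonth'] = tx['InvoiceDate'].dt.to_period('M').dt.to_timestamp()
tx['CohortMonth'] = tx.groupby('CustomerID')['InvoiceMonth'].transform('min')

# Whole months elapsed since the first purchase (0 = acquisition month)
tx['CohortIndex'] = (
    (tx['InvoiceMonth'].dt.year - tx['CohortMonth'].dt.year) * 12
    + (tx['InvoiceMonth'].dt.month - tx['CohortMonth'].dt.month)
)

# Distinct active customers per cohort (rows) and month offset (columns)
counts = tx.groupby(['CohortMonth', 'CohortIndex'])['CustomerID'].nunique().unstack()

# Retention = active customers / cohort size, where cohort size is the month-0 count
retention = counts.divide(counts[0], axis=0)
print(retention.round(2))
```

The February cohort's missing month-2 cell is a period that has not been observed yet, not a drop to zero retention.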

In [None]:
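plt.figure(figsize=(10,6))
# Mask NaN cells (months a cohort has not reached yet) so they render blank instead of as 0% retention
mat = np.ma.masked_invalid(ret.to_numpy(dtype=float))
plt.imshow(mat, aspect='auto')
plt.title('Cohort Retention Matrix')
plt.ylabel('Cohort Month')
plt.xlabel('Cohort Index (months since first purchase)')
# Label rows as YYYY-MM and columns with the cohort index
plt.yticks(range(len(ret.index)), pd.to_datetime(ret.index).strftime('%Y-%m'), fontsize=8)
plt.xticks(range(len(ret.columns)), ret.columns, fontsize=8)
plt.colorbar(label='Retention')
plt.tight_layout()
plt.show()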

## 8. Actionable Insights & Next Steps
1. Prioritize **Champions** with VIP engagement and referral incentives.
2. Launch win-back campaigns for **At Risk** and **Needs Attention** segments.
3. Investigate cohorts with retention drop-offs at months 2–3 and improve early customer experience.
4. Promote top products and optimize inventory around seasonal peaks.
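For step 3, one way to rank cohorts by early churn is to measure how much retention each loses between month 1 and month 3. The sketch below uses an invented table shaped like `ret`. It selects columns by position because the type of the column labels after `read_csv` is not known here.

```python
import numpy as np
import pandas as pd

# Invented retention table shaped like `ret`: one row per cohort, columns = months since first purchase
ret_demo = pd.DataFrame(
    [[1.0, 0.40, 0.30, 0.28],
     [1.0, 0.35, 0.20, 0.18],
     [1.0, 0.38, 0.33, np.nan]],  # youngest cohort has not reached month 3 yet
    index=pd.to_datetime(['2011-01-01', '2011-02-01', '2011-03-01']),
    columns=[0, 1, 2, 3],
)

# Retention lost between month 1 and month 3, selected by position
drop_1_to_3 = ret_demo.iloc[:, 1] - ret_demo.iloc[:, 3]
print(drop_1_to_3.sort_values(ascending=False))  # NaN (not yet observable) sorts last
```

Here the February cohort loses the most (about 0.17), so its first-months experience would be the first to investigate.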

---

**Export:** Run all cells so the visuals render inline, then use VS Code's notebook Export feature (for example, to PDF or HTML) to produce a report.