# 4. Business Profiling - RFM Segments

**Goal:** Translate the 4 clusters into actionable marketing personas.

**Segments:**
- **Champions:** High Spend, Frequent, Recent
- **Loyal Customers:** Good Spend/Frequency
- **At Risk:** High past value, but haven't returned
- **New / Low Value:** Low Frequency, Recent or Old

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rfm = pd.read_csv('../data/processed/rfm_data.csv', index_col='CustomerID')
rfm_log = np.log1p(rfm)
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm_log)

# Fit Final K=4 Model
km = KMeans(n_clusters=4, random_state=42, n_init=10)
rfm['Cluster'] = km.fit_predict(rfm_scaled)

## 1. Snake Plot (Standardized Profile)

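A snake plot draws each cluster's average standardized Recency, Frequency, and Monetary values as one line, so relative differences between clusters are easy to compare on a common scale.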

In [2]:
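# Snake plot input: standardized (log1p + z-score) values in long format
rfm_std = pd.DataFrame(rfm_scaled, index=rfm.index, columns=rfm_log.columns)
rfm_std['Cluster'] = rfm['Cluster']
rfm_melted = pd.melt(rfm_std.reset_index(), id_vars=['CustomerID', 'Cluster'],
                     value_vars=['Recency', 'Frequency', 'Monetary'],
                     var_name='Metric', value_name='Value')

# One line per cluster; seaborn aggregates to the cluster mean for each metric
sns.lineplot(data=rfm_melted, x='Metric', y='Value', hue='Cluster', palette='tab10')
plt.title('Snake Plot: Standardized RFM Profile by Cluster')
plt.show()

# Cluster means in original units, plus cluster sizes
summary = rfm.groupby('Cluster').mean()
summary['Count'] = rfm['Cluster'].value_counts()
display(summary)

# Note: Cluster IDs are arbitrary labels. We must map them to segments manually using the summary table.
# Example Logic (Adjust based on output):
# Best R, F, M -> Champions
# High Recency (bad: long time since last purchase), High F/M -> At Risk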

| Cluster | Recency | Frequency | Monetary | Count |
|---------|---------|-----------|----------|-------|
| 0 | 18.725864 | 2.090584 | 538.231287 | 839 |
| 1 | 12.112033 | 13.634855 | 8015.424412 | 723 |
| 2 | 70.69738 | 4.076923 | 1791.090873 | 1183 |
| 3 | 184.023839 | 1.318068 | 342.421268 | 1594 |
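
To support the manual mapping, it can help to rank the clusters on each metric before assigning names. The sketch below is one possible aid, not part of the original notebook: it rebuilds the cluster means from the summary table above, ranks each cluster (1 = best), and flags any cluster that ranks first on all three metrics as a Champions candidate, following the "Best R, F, M" rule in the code comments. The names `ranks`, `RankSum`, and `best_on_all` are hypothetical.

```python
import pandas as pd

# Cluster means copied from the summary table above.
summary = pd.DataFrame(
    {
        "Recency": [18.725864, 12.112033, 70.69738, 184.023839],
        "Frequency": [2.090584, 13.634855, 4.076923, 1.318068],
        "Monetary": [538.231287, 8015.424412, 1791.090873, 342.421268],
    },
    index=pd.Index([0, 1, 2, 3], name="Cluster"),
)

# Rank 1 = best. Lower Recency is better; higher Frequency and Monetary are better.
ranks = pd.DataFrame(
    {
        "R_rank": summary["Recency"].rank(ascending=True).astype(int),
        "F_rank": summary["Frequency"].rank(ascending=False).astype(int),
        "M_rank": summary["Monetary"].rank(ascending=False).astype(int),
    }
)
# Hypothetical helper column: a lower sum means a stronger overall profile.
ranks["RankSum"] = ranks.sum(axis=1)

# "Best R, F, M -> Champions": clusters ranked first on all three metrics, if any.
is_best = (ranks[["R_rank", "F_rank", "M_rank"]] == 1).all(axis=1)
best_on_all = ranks[is_best].index.tolist()

print(ranks.sort_values("RankSum"))
print("Champions candidate(s):", best_on_all)
```

The ranks only order the clusters. Deciding which of the remaining clusters is Loyal, At Risk, or New / Low Value is still a judgment call that should be checked against the actual means and cluster sizes.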


## 2. Recommendation Engine

**Strategy Map:**

| Segment | Strategy |
|---------|----------|
| **Champions** | Early access, Referrals, VIP rewards |
| **Loyal** | Upsell to higher tiers, Review requests |
| **At Risk** | Aggressive win-back coupons, Surveys |
| **Lost** | Ignore (save budget) or low-cost automated email |
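
The strategy map can be applied to customers as a simple lookup. The following is a minimal sketch, assuming each customer already carries a `Segment` label that matches the names in the table. The `STRATEGY` dictionary, the `recommend` function, and the example customer IDs are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Strategy map transcribed from the table above.
STRATEGY = {
    "Champions": ["Early access", "Referrals", "VIP rewards"],
    "Loyal": ["Upsell to higher tiers", "Review requests"],
    "At Risk": ["Aggressive win-back coupons", "Surveys"],
    "Lost": ["Ignore (save budget) or low-cost automated email"],
}


def recommend(segment: str) -> list[str]:
    """Return the actions for a segment; unknown segments get no actions."""
    return STRATEGY.get(segment, [])


# Hypothetical customers with an already assigned Segment label.
customers = pd.DataFrame(
    {
        "CustomerID": [1001, 1002, 1003],
        "Segment": ["Champions", "At Risk", "Lost"],
    }
).set_index("CustomerID")

# Attach the recommended actions to each customer.
customers["Actions"] = customers["Segment"].map(recommend)
print(customers)
```

Returning an empty list for unknown segments keeps the lookup from failing if a label is misspelled or a new segment is added before the map is updated.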