## Customer segmentation

Reads: data/mart/customer_segments.parquet
Purpose: prove the segmentation is meaningful and actionable

Must-have sections:

- Segment sizes and share
- Segment profiles (median recency/frequency/monetary per segment)
- Behavioral differentiation plots (boxplots / distributions by segment)
- Premium vs non-premium within segments (if relevant)
- “So what?”: 1–2 recommended actions per segment (growth framing)

Optional output: export a few key plots to reports/figures/ for README.

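Customers are described by demographics (variables about the customer) and behaviour (their contacts with the business). The segments are defined on recency (R), frequency (F) and monetary (M) scores, where a higher R means more recent activity: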

- Top customers: active customers who are either returning shoppers (R >= 3 and F >= 4) or big spenders (R >= 3 and M >= 4).
- At risk: formerly good customers who are now long inactive: (R <= 2 and F >= 3) or (R <= 2 and M >= 3).
- New customers: very recent, potentially good customers (R = 5, F <= 2 and M >= 2).
- Dormant: inactive, low-engagement customers (R = 1, F <= 2 and M <= 2).
- Others: remaining customers, split by premium subscription status.
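
The rules above can be sketched as a small labelling function. This is an illustration, not the pipeline's actual code: the column names `r_score`, `f_score`, `m_score` and `is_premium` are hypothetical, and because a customer with R = 5, F <= 2 and M >= 4 satisfies both the Top and New rules, the sketch assumes the first rule in the list wins.

```python
import numpy as np
import pandas as pd


def assign_segment(scores: pd.DataFrame) -> pd.Series:
    """Label each customer from R/F/M scores and a premium flag.

    Column names (r_score, f_score, m_score, is_premium) are hypothetical.
    """
    r, f, m = scores["r_score"], scores["f_score"], scores["m_score"]

    # Rules in the order they are listed above.
    # Assumption: when two rules match, the one listed first wins.
    rules = [
        ("Top Customers", (r >= 3) & ((f >= 4) | (m >= 4))),
        ("At Risk", (r <= 2) & ((f >= 3) | (m >= 3))),
        ("New Customers", (r == 5) & (f <= 2) & (m >= 2)),
        ("Dormant", (r == 1) & (f <= 2) & (m <= 2)),
    ]

    # Fallback: everyone starts as "Others", split by premium status.
    fallback = np.where(
        scores["is_premium"], "Others - Premium", "Others - Not premium"
    )
    segment = pd.Series(fallback, index=scores.index, name="segment", dtype="object")

    # Apply rules from lowest to highest priority so earlier rules overwrite later ones.
    for label, mask in reversed(rules):
        segment.loc[mask] = label
    return segment


example = pd.DataFrame({
    "r_score": [5, 1, 5, 1, 3, 5],
    "f_score": [5, 4, 1, 1, 2, 1],
    "m_score": [1, 1, 2, 1, 2, 5],
    "is_premium": [False, False, False, False, True, False],
})
print(assign_segment(example).tolist())
```

Making the priority order explicit matters only for the Top/New overlap; the other rule pairs are mutually exclusive on R.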

In [7]:
import sys
from pathlib import Path

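# Walk up from the working directory to the first folder that contains 'src',
# so the notebook runs from any subdirectory of the project.
p = Path.cwd().resolve()
PROJECT_ROOT = next((parent for parent in [p, *p.parents] if (parent / "src").exists()), None)
if PROJECT_ROOT is None:
    raise RuntimeError("Could not find project root (folder containing 'src')")
# Make modules under src/ (such as config) importable
sys.path.insert(0, str(PROJECT_ROOT / "src"))

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from config import MART_DIR

# Segment table read by this notebook
DATA = MART_DIR / 'customer_segments.parquet'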

In [8]:
df = pd.read_parquet(DATA)

In [10]:
# Number of customers in each segment
segments = df['segment'].value_counts().reset_index()
segments.columns = ['segment', 'Count']

fig = px.bar(
    segments, 
    x='segment',
    y='Count',
    title='Distribution by segment')

fig.show()
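
The section plan asks for segment sizes and share, while the bar chart shows counts only. A minimal sketch of adding the share follows; the helper name `segment_sizes` and the `share_pct` column are illustrative, and the toy frame stands in for `df`.

```python
import pandas as pd


def segment_sizes(df: pd.DataFrame, col: str = "segment") -> pd.DataFrame:
    """Return one row per segment with its count and percentage share."""
    counts = df[col].value_counts()  # sorted, largest segment first
    sizes = pd.DataFrame({
        "count": counts,
        "share_pct": (counts / counts.sum() * 100).round(1),
    })
    return sizes.rename_axis(col).reset_index()


# Toy data standing in for the real customer table
toy = pd.DataFrame({"segment": ["Dormant", "At Risk", "Dormant", "Dormant"]})
print(segment_sizes(toy))
```

The resulting frame could be passed to `px.bar` with `y='share_pct'` in place of `y='Count'` to plot shares directly.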

In [12]:
# Variation in demographic features by segment
df.groupby('segment').agg({
    'age': 'mean',
    'income': 'mean',
    'credit_score': 'mean'
}).round(2)

| segment | age | income | credit_score |
|---|---|---|---|
| At Risk | 49.04 | 114433.83 | 575.29 |
| Dormant | 48.86 | 113560.96 | 576.63 |
| New Customers | 48.66 | 115676.59 | 574.98 |
| Others - Not premium | 49.2 | 114723.69 | 575.14 |
| Others - Premium | 48.89 | 114483.95 | 573.9 |
| Top Customers | 48.92 | 114867.65 | 574.55 |
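
In the output above, mean age ranges only from 48.66 to 49.2 across segments. One rough way to put a number on how much a feature separates the segments is the gap between the highest and lowest segment mean, as a percentage of the overall mean. The sketch below assumes that definition; the helper name `relative_spread` is hypothetical and the toy frame stands in for `df`.

```python
import pandas as pd


def relative_spread(df: pd.DataFrame, cols: list, by: str = "segment") -> pd.Series:
    """Range of segment means as a percentage of the overall mean, per column."""
    segment_means = df.groupby(by)[cols].mean()
    spread = segment_means.max() - segment_means.min()
    return (spread / df[cols].mean() * 100).round(2)


# Toy data: 'x' differs between segments, 'y' does not
toy = pd.DataFrame({
    "segment": ["a", "a", "b", "b"],
    "x": [1, 3, 5, 7],
    "y": [10, 10, 10, 10],
})
print(relative_spread(toy, ["x", "y"]))
```

A value near zero suggests the feature does little to distinguish the segments; this is a descriptive check, not a statistical test.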


In [13]:
# Variation in behavioral metrics by segment
df.groupby('segment').agg({
    'frequency': 'mean',
    'recency': 'mean',
    'monetary': 'mean',
    'loyalty_points': 'mean',
    'support_tickets': 'mean'
}).round(2)

| segment | frequency | recency | monetary | loyalty_points | support_tickets |
|---|---|---|---|---|---|
| At Risk | 28.28 | 292.6 | 5623.24 | 2502.39 | 5.0 |
| Dormant | 10.37 | 329.3 | 2075.78 | 2496.78 | 4.93 |
| New Customers | 10.45 | 37.77 | 4026.96 | 2488.71 | 4.99 |
| Others - Not premium | 15.87 | 141.29 | 2718.06 | 2528.3 | 5.08 |
| Others - Premium | 15.99 | 142.15 | 2762.99 | 2498.06 | 4.95 |
| Top Customers | 31.24 | 109.74 | 6185.38 | 2503.07 | 4.99 |
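
The section plan asks for median recency/frequency/monetary per segment, whereas the output above reports means. A minimal sketch of the median profile follows, assuming the same `recency`, `frequency` and `monetary` columns; the helper name `rfm_medians` is illustrative and the toy frame stands in for `df`.

```python
import pandas as pd

RFM_COLS = ["recency", "frequency", "monetary"]


def rfm_medians(df: pd.DataFrame, by: str = "segment") -> pd.DataFrame:
    """Median recency, frequency and monetary value for each segment."""
    return df.groupby(by)[RFM_COLS].median().round(2)


# Toy data: one large spender pulls the mean of segment 'a' far above its median
toy = pd.DataFrame({
    "segment": ["a", "a", "a", "b"],
    "recency": [10, 20, 300, 5],
    "frequency": [1, 2, 3, 5],
    "monetary": [100, 200, 9000, 50],
})
print(rfm_medians(toy))
```

Medians are less sensitive to a few very large spenders than means, which may be why the plan calls for them.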
