# Week 9: Complete Customer Lifetime Value Analysis

## Learning Objectives
This capstone notebook brings together all concepts from the week:
1. Build a complete CLV calculation pipeline
2. Perform comprehensive RFM segmentation
3. Conduct cohort retention analysis
4. Calculate customer value scores
5. Generate actionable business insights
6. Create visualizations for executive presentation

## Business Context
You're presenting a comprehensive customer analytics report to the Lagos e-commerce leadership team. Your analysis will inform:
- Marketing budget allocation
- Customer retention strategies
- Product development priorities
- Revenue forecasting

**Duration:** Full class session

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 7)

print("Libraries loaded successfully!")

## Step 1: Data Loading and Preparation

In [None]:
# Load all datasets
customers = pd.read_csv('../datasets/customers.csv')
orders = pd.read_csv('../datasets/orders.csv')
order_items = pd.read_csv('../datasets/order_items.csv')
products = pd.read_csv('../datasets/products.csv')

print("=== Data Loaded ===")
print(f"Customers: {len(customers):,}")
print(f"Orders: {len(orders):,}")
print(f"Order Items: {len(order_items):,}")
print(f"Products: {len(products):,}")

In [None]:
# Create comprehensive dataset
data = (order_items
        .merge(orders, on='order_id', how='left')
        .merge(customers, on='customer_id', how='left')
        .merge(products, on='product_id', how='left'))

# Line-item value: item price plus its freight charge (one row per order item)
data['total_price'] = data['price'] + data['freight_value']

# Parse timestamps
data['order_purchase_timestamp'] = pd.to_datetime(data['order_purchase_timestamp'])
data['order_date'] = data['order_purchase_timestamp'].dt.date
data['order_month'] = data['order_purchase_timestamp'].dt.to_period('M')
data['order_year'] = data['order_purchase_timestamp'].dt.year

print(f"\nCombined dataset: {len(data):,} rows")
print(f"Date range: {data['order_date'].min()} to {data['order_date'].max()}")
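
A chained left merge can silently duplicate or orphan rows when a key repeats or is missing in a lookup table. The sketch below shows one way to check for this with the `validate` and `indicator` arguments of `DataFrame.merge`. The toy tables and their values are made up for illustration; only the column names follow this notebook.

```python
import pandas as pd

# Toy tables (hypothetical values; column names mirror the notebook)
order_items = pd.DataFrame({
    'order_id': [1, 1, 2, 3],
    'product_id': ['p1', 'p2', 'p1', 'p9'],
    'price': [100.0, 50.0, 100.0, 20.0],
})
products = pd.DataFrame({
    'product_id': ['p1', 'p2'],
    'product_category_name': ['phones', 'cases'],
})

# validate='many_to_one' raises MergeError if product_id repeats in `products`,
# which would otherwise duplicate line items and inflate revenue.
merged = order_items.merge(
    products, on='product_id', how='left',
    validate='many_to_one', indicator=True,
)

# Rows flagged 'left_only' found no match in the right-hand table.
unmatched = merged[merged['_merge'] == 'left_only']
print(f"Rows before: {len(order_items)}, after: {len(merged)}")
print(f"Line items without a product match: {len(unmatched)}")
```

The same two arguments could be added to each `.merge(...)` call in the cell above, assuming `order_id`, `customer_id`, and `product_id` are unique in their lookup tables.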

## Step 2: Customer Lifetime Value Calculation

In [None]:
# Calculate CLV components (one row per customer)
clv_data = data.groupby('customer_id').agg(
    # Revenue metrics
    total_revenue=('total_price', 'sum'),
    
    # Purchase behavior
    total_orders=('order_id', 'nunique'),
    total_items=('order_item_id', 'count'),
    
    # Product diversity
    unique_products=('product_id', 'nunique'),
    unique_categories=('product_category_name', 'nunique'),
    
    # Time-based metrics
    first_purchase=('order_purchase_timestamp', 'min'),
    last_purchase=('order_purchase_timestamp', 'max'),
    
    # Location
    state=('customer_state', 'first'),
    city=('customer_city', 'first')
).round(2)

# Calculate derived metrics
# Average order value is revenue per order. A plain mean of 'total_price'
# would average line items, not orders, because data has one row per item.
clv_data['avg_order_value'] = (clv_data['total_revenue'] / clv_data['total_orders']).round(2)
clv_data['customer_lifetime_days'] = (clv_data['last_purchase'] - clv_data['first_purchase']).dt.days
# Months are approximated as 30 days
clv_data['customer_lifetime_months'] = (clv_data['customer_lifetime_days'] / 30).round(1)
clv_data['avg_items_per_order'] = (clv_data['total_items'] / clv_data['total_orders']).round(2)

print("=== Customer Lifetime Value Summary ===")
print(clv_data.describe())
print("\n=== Top 10 Customers by CLV ===")
print(clv_data.nlargest(10, 'total_revenue')[['total_revenue', 'total_orders', 'avg_order_value', 'city']])
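
Because `data` holds one row per line item, a mean over rows and an average per order are different quantities. A small sketch with made-up numbers (one hypothetical customer, two orders, three items) shows the gap:

```python
import pandas as pd

# Hypothetical line items: order o1 has two items, order o2 has one
items = pd.DataFrame({
    'customer_id': ['c1', 'c1', 'c1'],
    'order_id': ['o1', 'o1', 'o2'],
    'total_price': [100.0, 50.0, 30.0],
})

# Mean over line items: 180 / 3 rows
item_mean = items['total_price'].mean()

# Average order value: sum each order first, then average: 180 / 2 orders
order_totals = items.groupby('order_id')['total_price'].sum()
order_mean = order_totals.mean()

print(f"Mean per line item: {item_mean:.2f}")
print(f"Mean per order:     {order_mean:.2f}")
```

The per-order figure is the one that matches the "Average Order Value" reported in the executive summary, which divides total revenue by the number of orders.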

## Step 3: RFM Segmentation

In [None]:
# Define analysis date
analysis_date = data['order_purchase_timestamp'].max() + pd.Timedelta(days=1)

# Calculate RFM
rfm = data.groupby('customer_id').agg(
    recency=('order_purchase_timestamp', lambda x: (analysis_date - x.max()).days),
    frequency=('order_id', 'nunique'),
    monetary=('total_price', 'sum')
).round(2)

# Create RFM scores (1-4)
# Rank first so tied values cannot collapse quartile edges. With duplicates='drop'
# and four fixed labels, qcut raises ValueError when fewer than four bins remain.
rfm['r_score'] = pd.qcut(rfm['recency'].rank(method='first'), 4, labels=[4, 3, 2, 1])
rfm['f_score'] = pd.qcut(rfm['frequency'].rank(method='first'), 4, labels=[1, 2, 3, 4])
rfm['m_score'] = pd.qcut(rfm['monetary'].rank(method='first'), 4, labels=[1, 2, 3, 4])

# Combined RFM score
rfm['rfm_score'] = rfm['r_score'].astype(str) + rfm['f_score'].astype(str) + rfm['m_score'].astype(str)

print("=== RFM Distribution ===")
print(rfm.describe())
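
Quartile scoring is sensitive to ties, which are common in e-commerce data where most customers have bought only once. The sketch below uses a made-up frequency series to show what happens when `pd.qcut` is applied directly with fixed labels, and how ranking with `method='first'` avoids it.

```python
import pandas as pd

# Hypothetical purchase counts: most customers bought exactly once
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 3], name='frequency')

# Direct qcut: tied values make several quartile edges identical. Dropping the
# duplicates leaves fewer bins than labels, so pandas raises ValueError.
try:
    pd.qcut(freq, 4, labels=[1, 2, 3, 4], duplicates='drop')
    direct_ok = True
except ValueError as err:
    direct_ok = False
    print(f"Direct qcut failed: {err}")

# Ranking with method='first' gives every row a distinct rank, so four
# equal-sized bins always exist.
ranked_scores = pd.qcut(freq.rank(method='first'), 4, labels=[1, 2, 3, 4])
print(ranked_scores.value_counts().sort_index())
```

The trade-off is that customers with identical values can land in different score groups, depending on row order. For a classroom dataset this is usually acceptable, but it is worth stating when presenting the segments.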

In [None]:
# Create business segments
def classify_customer(row):
    r = int(row['r_score'])
    f = int(row['f_score'])
    m = int(row['m_score'])
    
    if r >= 4 and f >= 4 and m >= 4:
        return 'Champions'
    elif r >= 3 and f >= 3 and m >= 3:
        return 'Loyal Customers'
    elif r >= 4 and f <= 2:
        return 'New Customers'
    elif r <= 2 and f >= 3:
        return 'At Risk'
    elif r <= 2 and f <= 2:
        return 'Lost'
    elif m >= 4:
        return 'Big Spenders'
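    elif r >= 3:
        return 'Potential Loyalists'
    else:
        # Unreachable with the rules above: every r <= 2 row is already
        # 'At Risk' or 'Lost', so the r >= 3 branch catches all that remain.
        return 'Need Attention'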

rfm['segment'] = rfm.apply(classify_customer, axis=1)

# Analyze segments
segment_analysis = rfm.groupby('segment').agg(
    customer_count=('recency', 'count'),
    avg_recency=('recency', 'mean'),
    avg_frequency=('frequency', 'mean'),
    avg_monetary=('monetary', 'mean'),
    total_revenue=('monetary', 'sum')
).round(2)

segment_analysis['revenue_pct'] = (segment_analysis['total_revenue'] / segment_analysis['total_revenue'].sum() * 100).round(2)
segment_analysis = segment_analysis.sort_values('total_revenue', ascending=False)

print("=== RFM Segment Analysis ===")
print(segment_analysis)
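
Rule-based segments are order-dependent, so it helps to check which segments the rules can actually produce. The sketch below copies the rules from `classify_customer` into a hypothetical helper, `classify`, that takes plain integers, then runs it over all 64 possible score combinations.

```python
from collections import Counter
from itertools import product

def classify(r, f, m):
    """Same rules as classify_customer, taking integer scores directly."""
    if r >= 4 and f >= 4 and m >= 4:
        return 'Champions'
    elif r >= 3 and f >= 3 and m >= 3:
        return 'Loyal Customers'
    elif r >= 4 and f <= 2:
        return 'New Customers'
    elif r <= 2 and f >= 3:
        return 'At Risk'
    elif r <= 2 and f <= 2:
        return 'Lost'
    elif m >= 4:
        return 'Big Spenders'
    elif r >= 3:
        return 'Potential Loyalists'
    else:
        return 'Need Attention'

# Enumerate every (r, f, m) combination with scores 1 to 4
counts = Counter(classify(r, f, m) for r, f, m in product(range(1, 5), repeat=3))

for segment, n in counts.most_common():
    print(f"{segment:<20} {n:>2} of 64 score combinations")
```

With these rules, 'Need Attention' never appears and 'Big Spenders' covers only two combinations. That is worth knowing before reading too much into the segment table.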

In [None]:
# Visualize segment distribution
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Customer count by segment
segment_analysis['customer_count'].plot(kind='barh', ax=axes[0], color='steelblue')
axes[0].set_title('Customer Count by Segment', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Number of Customers')

# Revenue contribution by segment
segment_analysis['revenue_pct'].plot(kind='barh', ax=axes[1], color='coral')
axes[1].set_title('Revenue Contribution by Segment (%)', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Percentage of Total Revenue')

plt.tight_layout()
plt.show()

## Step 4: Cohort Analysis

In [None]:
# Create cohort based on first purchase month
customer_cohort = data.groupby('customer_id')['order_purchase_timestamp'].min().reset_index()
customer_cohort.columns = ['customer_id', 'cohort_date']
customer_cohort['cohort_month'] = customer_cohort['cohort_date'].dt.to_period('M')

# Merge cohort info with transaction data
data_with_cohort = data.merge(customer_cohort[['customer_id', 'cohort_month']], on='customer_id')

# Calculate months since cohort
data_with_cohort['order_period'] = data_with_cohort['order_purchase_timestamp'].dt.to_period('M')
data_with_cohort['months_since_cohort'] = (data_with_cohort['order_period'] - 
                                            data_with_cohort['cohort_month']).apply(lambda x: x.n)

print("=== Cohort Data Sample ===")
print(data_with_cohort[['customer_id', 'cohort_month', 'order_period', 'months_since_cohort']].head(10))

In [None]:
# Create cohort table
cohort_data = data_with_cohort.groupby(['cohort_month', 'months_since_cohort'])['customer_id'].nunique().reset_index()
cohort_data.columns = ['cohort_month', 'months_since_cohort', 'num_customers']

# Pivot to create retention matrix
cohort_pivot = cohort_data.pivot_table(
    index='cohort_month',
    columns='months_since_cohort',
    values='num_customers'
)

# Calculate retention rates
cohort_size = cohort_pivot.iloc[:, 0]
retention_matrix = cohort_pivot.divide(cohort_size, axis=0) * 100

print("=== Cohort Retention Matrix (%) ===")
print(retention_matrix.round(1))
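
To make the retention matrix easier to read, here is the same calculation on a toy table small enough to check by hand. The three customers and their dates are made up. The month offset is computed with year and month arithmetic, a vectorized alternative to `.apply(lambda x: x.n)`.

```python
import pandas as pd

# Hypothetical orders: a and b first buy in January, c in February
toy = pd.DataFrame({
    'customer_id': ['a', 'a', 'b', 'b', 'c'],
    'order_purchase_timestamp': pd.to_datetime(
        ['2024-01-05', '2024-02-10', '2024-01-20', '2024-03-02', '2024-02-15']),
})

toy['order_period'] = toy['order_purchase_timestamp'].dt.to_period('M')
first_purchase = toy.groupby('customer_id')['order_purchase_timestamp'].transform('min')
toy['cohort_month'] = first_purchase.dt.to_period('M')

# Whole months between the order month and the cohort month
toy['months_since_cohort'] = (
    (toy['order_period'].dt.year - toy['cohort_month'].dt.year) * 12
    + (toy['order_period'].dt.month - toy['cohort_month'].dt.month)
)

# Distinct active customers per cohort and month offset, then divide by month 0
counts = (toy.groupby(['cohort_month', 'months_since_cohort'])['customer_id']
             .nunique()
             .unstack())
retention = counts.div(counts[0], axis=0) * 100
print(retention.round(1))
```

The January cohort starts with two customers, and one of them is active in each of the next two months, so its row reads 100, 50, 50. The February cohort has no later months observed yet, which is why the lower-right part of a retention matrix is empty.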

In [None]:
# Visualize cohort retention
plt.figure(figsize=(14, 8))
sns.heatmap(retention_matrix, annot=True, fmt='.1f', cmap='YlGnBu', vmin=0, vmax=100,
            cbar_kws={'label': 'Retention Rate (%)'})
plt.title('Customer Cohort Retention Analysis', fontsize=16, fontweight='bold')
plt.xlabel('Months Since First Purchase')
plt.ylabel('Cohort Month')
plt.tight_layout()
plt.show()

## Step 5: Geographic Analysis

In [None]:
# Nigerian state performance
nigerian_states = ['LA', 'AB', 'PH', 'KA']
nigerian_data = data[data['customer_state'].isin(nigerian_states)]

state_performance = nigerian_data.groupby('customer_state').agg(
    customers=('customer_id', 'nunique'),
    orders=('order_id', 'nunique'),
    revenue=('total_price', 'sum'),
    avg_order=('total_price', 'mean')
).round(2)

state_performance['orders_per_customer'] = (state_performance['orders'] / 
                                             state_performance['customers']).round(2)

# Map state codes to names
state_names = {'LA': 'Lagos', 'AB': 'Abuja', 'PH': 'Port Harcourt', 'KA': 'Kano'}
state_performance.index = state_performance.index.map(state_names)

print("=== Nigerian State Performance ===")
print(state_performance.sort_values('revenue', ascending=False))

## Step 6: Product Category Insights

In [None]:
# Top categories by revenue
category_performance = data.groupby('product_category_name').agg(
    revenue=('total_price', 'sum'),
    orders=('order_id', 'nunique'),
    customers=('customer_id', 'nunique'),
    avg_price=('total_price', 'mean')
).round(2).sort_values('revenue', ascending=False)

print("=== Top 10 Product Categories ===")
print(category_performance.head(10))

# Visualize
category_performance.head(10)['revenue'].plot(kind='barh', color='green')
plt.title('Top 10 Product Categories by Revenue', fontsize=14, fontweight='bold')
plt.xlabel('Revenue (₦)')
plt.tight_layout()
plt.show()

## Step 7: Executive Summary & Recommendations

In [None]:
# Generate executive summary metrics
total_customers = data['customer_id'].nunique()
total_revenue = data['total_price'].sum()
total_orders = data['order_id'].nunique()
avg_clv = clv_data['total_revenue'].mean()
repeat_rate = (clv_data[clv_data['total_orders'] > 1].shape[0] / total_customers * 100)

print("="*60)
print("EXECUTIVE SUMMARY: CUSTOMER LIFETIME VALUE ANALYSIS")
print("="*60)
print(f"\nTotal Customers: {total_customers:,}")
print(f"Total Revenue: ₦{total_revenue:,.2f}")
print(f"Total Orders: {total_orders:,}")
print(f"Average Customer Lifetime Value: ₦{avg_clv:,.2f}")
print(f"Repeat Purchase Rate: {repeat_rate:.1f}%")
print(f"\nAverage Order Value: ₦{(total_revenue/total_orders):,.2f}")
print(f"Orders per Customer: {(total_orders/total_customers):.2f}")

print("\n" + "="*60)
print("KEY INSIGHTS")
print("="*60)
print(f"\n1. Customer Segments:")
print(f"   - Champions: {segment_analysis.loc['Champions', 'customer_count'] if 'Champions' in segment_analysis.index else 0} customers (highest value)")
print(f"   - At Risk: {segment_analysis.loc['At Risk', 'customer_count'] if 'At Risk' in segment_analysis.index else 0} customers (need retention efforts)")
print(f"   - Lost: {segment_analysis.loc['Lost', 'customer_count'] if 'Lost' in segment_analysis.index else 0} customers (win-back campaigns)")

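print(f"\n2. Geographic Performance:")
# Look up the leader instead of assuming it is Lagos
top_state = state_performance['revenue'].idxmax()
print(f"   - {top_state} leads with ₦{state_performance.loc[top_state, 'revenue']:,.2f} in revenue")
print(f"   - {state_performance['orders_per_customer'].idxmax()} has the highest engagement (orders per customer)")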

print(f"\n3. Product Categories:")
top_cat = category_performance.index[0]
print(f"   - {top_cat} is the top revenue driver")
print(f"   - {len(category_performance)} active categories")

print("\n" + "="*60)
print("RECOMMENDATIONS")
print("="*60)
print("\n1. Focus on Champions & Loyal Customers:")
print("   - Implement VIP loyalty program")
print("   - Exclusive early access to new products")
print("   - Personalized recommendations")

print("\n2. Retain At-Risk Customers:")
print("   - Re-engagement email campaigns")
print("   - Special discount offers")
print("   - Survey to understand pain points")

print("\n3. Win Back Lost Customers:")
print("   - Targeted win-back campaigns")
print("   - Limited-time incentives")
print("   - Product improvements communication")

print("\n4. Geographic Expansion:")
print("   - Strengthen presence in underperforming states")
print("   - Localized marketing campaigns")
print("   - Optimize delivery for all regions")

print("\n5. Product Strategy:")
print("   - Expand top-performing categories")
print("   - Cross-sell complementary products")
print("   - Phase out low-performing categories")
print("\n" + "="*60)
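
The summary above guards each segment lookup with an inline `if ... in segment_analysis.index` check. One possible simplification, sketched here with made-up counts, is to `reindex` the count column once with `fill_value=0`, so that missing segments report zero without repeating the condition.

```python
import pandas as pd

# Hypothetical segment counts; 'At Risk' is deliberately missing
segment_counts = pd.Series({'Champions': 120, 'Lost': 300}, name='customer_count')

# Segments the summary wants to report, whether or not they exist in the data
wanted = ['Champions', 'At Risk', 'Lost']
summary = segment_counts.reindex(wanted, fill_value=0)

for segment, n in summary.items():
    print(f"   - {segment}: {n:,} customers")
```

In the notebook, `segment_analysis['customer_count']` would take the place of `segment_counts`.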


## Conclusion

This comprehensive CLV analysis provides actionable insights for:
1. **Customer Segmentation:** Identify and target high-value segments
2. **Retention Strategies:** Reduce churn through timely interventions
3. **Revenue Optimization:** Focus resources on profitable customer groups
4. **Geographic Expansion:** Understand regional performance patterns
5. **Product Planning:** Align inventory with customer preferences

### Next Steps for Your Business
1. Implement automated RFM scoring for real-time segmentation
2. Set up monthly cohort tracking dashboards
3. Create targeted marketing campaigns for each segment
4. Monitor CLV trends to measure improvement
5. A/B test retention strategies on at-risk customers
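
As a starting point for the first step, the RFM logic from Step 3 could be wrapped in a function that is re-run on each data refresh. This is a minimal sketch, not a production implementation. `score_rfm` and its parameters are hypothetical names, the demo data is made up, and the input is assumed to have the same columns as `data` in this notebook.

```python
import pandas as pd

def score_rfm(transactions, as_of, bins=4):
    """Return recency, frequency, monetary and 1..bins scores per customer."""
    rfm = transactions.groupby('customer_id').agg(
        recency=('order_purchase_timestamp', lambda ts: (as_of - ts.max()).days),
        frequency=('order_id', 'nunique'),
        monetary=('total_price', 'sum'),
    )
    labels = list(range(1, bins + 1))
    # Rank first so ties cannot collapse bin edges; low recency earns the top score
    rfm['r_score'] = pd.qcut(rfm['recency'].rank(method='first'), bins,
                             labels=labels[::-1]).astype(int)
    rfm['f_score'] = pd.qcut(rfm['frequency'].rank(method='first'), bins,
                             labels=labels).astype(int)
    rfm['m_score'] = pd.qcut(rfm['monetary'].rank(method='first'), bins,
                             labels=labels).astype(int)
    return rfm

# Hypothetical demo: four customers with one order each
demo = pd.DataFrame({
    'customer_id': ['a', 'b', 'c', 'd'],
    'order_id': ['o1', 'o2', 'o3', 'o4'],
    'order_purchase_timestamp': pd.to_datetime(
        ['2024-03-01', '2024-02-01', '2024-01-01', '2023-12-01']),
    'total_price': [400.0, 300.0, 200.0, 100.0],
})

scores = score_rfm(demo, as_of=pd.Timestamp('2024-03-02'))
print(scores[['recency', 'r_score', 'f_score', 'm_score']])
```

Passing `as_of` explicitly keeps the scores reproducible: re-running the function on the same data with the same date gives the same result, which helps when comparing segments month to month.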

---
**PORA Academy Cohort 5 - Week 9 Wednesday Python**  
*Customer Lifetime Value Analysis - Complete Pipeline*