# Post 17: CLV-Based Customer Segmentation

## The Problem

The CEO calls a joint meeting with Marketing and Finance.

"We have a problem. Marketing segments customers by behavior. Finance segments by revenue. Sales segments by deal size. No one agrees on who our 'best customers' actually are."

Every team uses different criteria:
- Marketing: "High engagers" (opens every email)
- Finance: "High revenue" (spent $10K+ last year)
- Sales: "Enterprise accounts" (large deal sizes)
- Customer Success: "Active users" (logs in daily)

**The result? Chaos.**
- Marketing sends VIP offers to customers who spent $50 total
- Finance calls low-engagement whales "at risk" when they're perfectly happy
- No unified view of customer value
- Resources spread thin across conflicting priorities

**The real question:** Can we create ONE segmentation that combines behavior AND financial value, so every team focuses on the right customers?

---

## Why This Solution?

Traditional segmentation fails:
- **Behavior-only:** High engagement ≠ high revenue
- **Revenue-only:** Past spend ≠ future value
- **Recency-Frequency-Monetary (RFM):** Static, doesn't predict future behavior
- **Manual bucketing:** Arbitrary thresholds ("$1K+ = VIP"), misses patterns

**Machine Learning (K-Means Clustering) solves this by:**
- Combining behavioral AND financial signals
- Finding natural customer groups (not arbitrary buckets)
- Clustering on predicted lifetime value, not just past spend
- Creating actionable, data-driven segments
- Enabling personalized strategies per segment

**Why K-Means Clustering?**

Unlike the supervised learning covered in previous posts, K-Means is **unsupervised**:
- No labels needed (no "this customer is VIP" training data)
- Discovers hidden patterns in customer data
- Groups similar customers automatically
- Scales to millions of customers

---

## The Solution (PM Perspective)

### What We Built

A unified CLV-based segmentation system that:
1. Calculates customer lifetime value (CLV) for every customer
2. Combines CLV with behavioral signals (engagement, purchase frequency, refund rate)
3. Uses K-Means clustering to find natural customer segments
4. Assigns intuitive names and strategies to each segment
5. Enables cross-functional alignment on "best customers"

### How It Works

**Step 1: Calculate Customer Lifetime Value (CLV)**

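CLV formula (simplified):

`CLV = Avg Order Value × Purchase Frequency (per year) × Expected Lifetime (years)`, adjusted for refunds
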
For each customer:
- **Avg Order Value:** $80 (customer A) vs $200 (customer B)
- **Purchase Frequency:** 3/year vs 12/year
- **Expected Lifetime:** Based on tenure + engagement signals
- **Refund Rate:** Adjust for returns

**Example:**
- Customer A: ($80 × 3 × 2 years) = $480 CLV
- Customer B: ($200 × 12 × 3 years) = $7,200 CLV
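
The simplified formula can be sketched in a few lines of Python. The function name `simple_clv` and the way refunds are netted out (multiplying by `1 - refund_rate`) are assumptions for illustration; the post does not say exactly how the refund adjustment is applied.

```python
def simple_clv(avg_order_value, purchases_per_year, expected_years, refund_rate=0.0):
    """Simplified CLV: order value x yearly frequency x expected lifetime.

    The (1 - refund_rate) factor is an assumed way to net out returns.
    """
    gross = avg_order_value * purchases_per_year * expected_years
    return gross * (1.0 - refund_rate)


# The two customers from the example above (no refunds)
customer_a = simple_clv(80, 3, 2)
customer_b = simple_clv(200, 12, 3)

print(f"Customer A: ${customer_a:,.0f}")
print(f"Customer B: ${customer_b:,.0f}")
```

With the default `refund_rate=0.0` this reproduces the $480 and $7,200 figures from the example.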

**Step 2: Feature Engineering**

Combine CLV with behavioral signals:

| Feature | Why It Matters |
|---------|----------------|
| **CLV** | Financial value |
| **Purchase Frequency** | Loyalty signal |
| **Avg Order Value** | Spending capacity |
| **Tenure** | Relationship length |
| **Email Engagement** | Brand affinity |
| **Product Categories** | Cross-sell potential |
| **Days Since Last Purchase** | Recency/activity |
| **Refund Rate** | Satisfaction signal |
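
These features live on very different scales (CLV in dollars, engagement between 0 and 1), which matters because K-Means groups customers by Euclidean distance. The sketch below uses three made-up customers to show how raw distances are dominated by the dollar column, and how z-scoring each column (the same transformation `StandardScaler` applies by default) puts both signals on a comparable footing.

```python
import numpy as np

# Hypothetical customers: [predicted CLV in dollars, email engagement 0-1]
X = np.array([
    [500.0, 0.90],   # low CLV, highly engaged
    [520.0, 0.10],   # low CLV, barely engaged
    [4000.0, 0.85],  # high CLV, highly engaged
])


def distance(a, b):
    """Euclidean distance, the measure K-Means relies on."""
    return float(np.linalg.norm(a - b))


# Raw distances: the engagement gap between customers 0 and 1 barely registers
raw_0_1 = distance(X[0], X[1])
raw_0_2 = distance(X[0], X[2])

# Z-score each column: subtract the mean, divide by the population std
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
scaled_0_1 = distance(X_scaled[0], X_scaled[1])
scaled_0_2 = distance(X_scaled[0], X_scaled[2])

print(f"raw:    0-1 = {raw_0_1:.2f}, 0-2 = {raw_0_2:.2f}")
print(f"scaled: 0-1 = {scaled_0_1:.2f}, 0-2 = {scaled_0_2:.2f}")
```

Before scaling, the large engagement difference between the first two customers is almost invisible next to the CLV difference; after scaling, both gaps contribute on a similar order of magnitude.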

**Step 3: K-Means Clustering**

Algorithm finds natural groups:
1. Start with random cluster centers
2. Assign each customer to nearest center
3. Recalculate centers based on assignments
4. Repeat until convergence
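
The four steps above can be sketched directly in NumPy. This is a minimal illustration of the standard loop on synthetic data, not the implementation used in the notebook (which relies on scikit-learn's `KMeans`); the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=42):
    """Plain K-Means loop: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # 1. Start with k randomly chosen data points as centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. Assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Recompute each center as the mean of its assigned points
        #    (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # 4. Stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Two well-separated synthetic groups (made-up data, already on one scale)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

labels, centers = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels))
print("centers:\n", centers.round(2))
```

Production code would normally use a library implementation, which adds smarter initialization and multiple restarts.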

**Optimal K (number of clusters):**
- Test K=2 to K=8
- Measure silhouette score (how well-separated are clusters?)
- **Result: K=2 optimal** (highest silhouette score: 0.487)
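
For reference, the standard silhouette coefficient for a single customer $i$ compares $a(i)$, its average distance to the other members of its own cluster, with $b(i)$, its average distance to the members of the nearest other cluster:

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

The reported score is the mean of $s(i)$ over all customers. Values near 1 indicate compact, well-separated clusters; values near 0 indicate overlapping clusters.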

**Step 4: Segment Naming & Strategy**

| Segment | Size | Avg CLV | Total Value | Strategy |
|---------|------|---------|-------------|----------|
| **High Value** | 1,418 (40.5%) | $3,853 | $5.46M | Upsell, loyalty rewards, VIP treatment |
| **Low Value** | 2,082 (59.5%) | $590 | $1.23M | Activation, entry offers, engagement boost |

**Step 5: Cross-Functional Alignment**

Now every team uses the same segmentation:
- **Marketing:** Focus 80% budget on High Value (81% of revenue)
- **Finance:** Forecast revenue by segment, not arbitrary buckets
- **Sales:** Prioritize High Value accounts for upsells
- **Product:** Build features for High Value behavior patterns

---

## Key Insights

### 1. The 80/20 Rule is Real (But It's 81/40)

**81% of revenue comes from 40% of customers.**

Not the classic "80% from 20%", but close enough to guide strategy.

**Action:** Allocate resources proportionally:
- 80% of marketing budget → High Value
- 70% of product development → High Value use cases
- 90% of customer success resources → High Value accounts
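
The concentration figure can be re-derived from the segment table in Step 4. The check below uses only the rounded sizes and average CLVs published there, so the shares are approximate; the dictionary layout is just an illustrative choice.

```python
# Rounded figures from the Step 4 segment table
segments = {
    "High Value": {"customers": 1418, "avg_clv": 3853},
    "Low Value": {"customers": 2082, "avg_clv": 590},
}

total_customers = sum(s["customers"] for s in segments.values())
total_value = sum(s["customers"] * s["avg_clv"] for s in segments.values())

shares = {}
for name, s in segments.items():
    value = s["customers"] * s["avg_clv"]
    shares[name] = {
        "customer_share": s["customers"] / total_customers,
        "value_share": value / total_value,
    }
    print(
        f"{name}: {shares[name]['customer_share']:.1%} of customers, "
        f"${value / 1e6:.2f}M total, {shares[name]['value_share']:.1%} of value"
    )
```

From these rounded inputs, High Value works out to about 40% of customers and a little over 81% of total CLV.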

### 2. High Value ≠ Big Spenders (It's Frequency × Value)

Common myth: "High CLV = customers who spent the most."

**Reality:**
- Customer A: 1 purchase of $5,000 = $5,000 CLV
- Customer B: 12 purchases a year at $200 each, over 3 years = $7,200 CLV

**Customer B is more valuable** (repeat behavior > one-time spend).

**Action:** Don't just chase whales. Nurture frequent, moderate spenders.

### 3. Engagement Predicts Future Value

High Value customers:
- Email engagement: 0.68 avg
- Purchase frequency: 11.2/year
- Product categories: 3.4 avg

Low Value customers:
- Email engagement: 0.34 avg
- Purchase frequency: 3.1/year
- Product categories: 1.8 avg

**Engagement drives frequency. Frequency drives CLV.**

### 4. Tenure Alone Doesn't Equal Value

Long-tenured customers aren't always high-value:
- Some stay for years but buy infrequently (low CLV)
- Some are new but buy frequently (high CLV potential)

**Don't confuse loyalty with value.**

### 5. Low Value Isn't "Bad": It's Opportunity

Low Value segment = $1.23M total value (19% of revenue).

**That's not nothing.**

If you can activate even 10% of Low Value to behave like High Value:
- 208 customers × $3,263 incremental CLV = $678K additional revenue

**Action:** Test activation campaigns, onboarding improvements, engagement triggers.
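
The activation arithmetic is easy to rerun for other scenarios. The helper below (a hypothetical `activation_uplift`, using the segment sizes and average CLVs from the table) assumes every activated customer moves all the way from the Low Value average to the High Value average, which is an optimistic simplification.

```python
def activation_uplift(rate, low_customers=2082, avg_high=3853, avg_low=590):
    """Return (customers activated, incremental CLV) for a given activation rate."""
    activated = int(low_customers * rate)  # whole customers only
    return activated, activated * (avg_high - avg_low)


for rate in (0.05, 0.10, 0.20):
    activated, uplift = activation_uplift(rate)
    print(f"{rate:.0%} activation: {activated} customers, ${uplift:,} incremental CLV")
```

At a 10% rate this reproduces the 208 customers and roughly $678K quoted above.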

---

## Business Impact

### Immediate Value

**For Marketing:**
- Stop wasting budget on low-value customers
- 80% budget allocation → 81% revenue contribution (aligned!)
- Personalized campaigns by segment (not one-size-fits-all)

**For Finance:**
- Forecast revenue by segment, not guesswork
- Understand concentration risk (40% of customers = 81% revenue)
- Allocate customer success resources efficiently

**For Product:**
- Build features High Value customers actually use
- Test new features with High Value first
- Prioritize based on value, not noise

### Quantifiable Impact

CLV-based segmentation typically delivers:
- **15-25% improvement** in marketing ROI (better targeting)
- **10-15% revenue growth** (from focusing on high-value)
- **20-30% cost reduction** (stop serving unprofitable customers)
- **Cross-functional alignment** (everyone uses same segments)

### Real-World Example

**Before Segmentation (Spray and Pray):**
- Marketing budget: $500K/year
- Spread evenly across all 3,500 customers
- Cost per customer: $143
- Revenue impact: Unclear, unmeasured

**After CLV Segmentation:**
- Allocate 80% budget ($400K) → High Value (1,418 customers)
- Cost per High Value customer: $282 (2x investment)
- Allocate 20% budget ($100K) → Low Value activation
- **Result:**
  - High Value retention improves: 95% → 97% (+2 points, about $109K of High Value CLV retained)
  - Low Value activation: 208 customers move to High Value = $678K
  - **Total incremental revenue: $787K on $500K spend = 1.57x return on spend**
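
The numbers in this example can be traced step by step. The sketch below simply replays the arithmetic with the figures quoted in the post; the variable names are illustrative, and the retention gain is assumed to be 2% of the High Value segment's total CLV ($5.46M).

```python
budget = 500_000
customers_total = 3500
high_value_customers = 1418
high_value_total_clv = 5_460_000  # $5.46M from the segment table

# Before: budget spread evenly. After: 80% concentrated on High Value.
even_spend = budget / customers_total
focused_spend = 0.80 * budget / high_value_customers

# Assumed gains: 2 points of retention on High Value CLV, plus activation
retention_gain = 0.02 * high_value_total_clv
activation_gain = 208 * (3853 - 590)
total_gain = retention_gain + activation_gain

print(f"Spend per customer: ${even_spend:.0f} -> ${focused_spend:.0f}")
print(f"Retention gain:  ${retention_gain:,.0f}")
print(f"Activation gain: ${activation_gain:,.0f}")
print(f"Total: ${total_gain:,.0f} ({total_gain / budget:.2f}x spend)")
```

With unrounded inputs the multiple comes out at roughly 1.58x; the 1.57x in the text follows from dividing the rounded $787K by $500K.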

---

## Why This Matters for PMs

**You don't need a data science degree to segment customers intelligently.**

What you need:
1. **The business problem:** Conflicting segmentations create chaos
2. **Why clustering helps:** Finds natural groups, not arbitrary buckets
3. **How to operationalize:** Segment → Strategy → Measure → Iterate
4. **What to measure:** Revenue per segment, activation rates, retention

This is **customer intelligence ML**: aligning teams around unified customer truth.

---

## What's Next?

**Immediate Actions:**
- Roll out CLV segments company-wide (Marketing, Sales, Finance, Product)
- Personalize experiences by segment (VIP treatment for High Value)
- Test activation campaigns for Low Value (can we move them up?)
- Track: segment migration, revenue per segment, engagement lift
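
Segment migration can be tracked with a simple transition table between two snapshots. The sketch below uses made-up customers and hypothetical column names (`segment_prev`, `segment_now`). It compares the named segments rather than raw cluster ids, since K-Means cluster numbers are arbitrary and can change between runs.

```python
import pandas as pd

# Hypothetical segment assignments for the same customers in two snapshots
snapshots = pd.DataFrame({
    "customer_id": ["C1", "C2", "C3", "C4", "C5", "C6"],
    "segment_prev": ["Low Value", "Low Value", "Low Value",
                     "High Value", "High Value", "Low Value"],
    "segment_now": ["Low Value", "High Value", "Low Value",
                    "High Value", "Low Value", "High Value"],
})

# Rows: previous segment, columns: current segment, cells: customer counts
migration = pd.crosstab(snapshots["segment_prev"], snapshots["segment_now"])
print(migration)

# Share of previously Low Value customers that moved up
low_row = migration.loc["Low Value"]
moved_up = low_row["High Value"] / low_row.sum()
print(f"Low -> High migration rate: {moved_up:.0%}")
```

The off-diagonal cells are the ones to watch: Low to High shows activation working, High to Low flags customers slipping.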

**Iterative Improvements:**
- Add more granular segments (Champions, Loyal, Promising, At Risk)
- Dynamic re-segmentation (monthly updates based on behavior)
- Predictive CLV (forecast future value, not just historical)
- Propensity scoring (which Low Value customers can become High Value?)

**Advanced Opportunities:**
- Real-time segmentation (update segments as behavior changes)
- Multi-dimensional clustering (add product affinity, channel preference)
- Causal modeling (does VIP treatment actually increase CLV?)
- Segment-specific products (build for High Value needs)

---

## PM Takeaways

✅ **Start with the pain:** Conflicting segmentations waste resources  
✅ **Use unsupervised learning:** K-Means finds patterns you didn't know existed  
✅ **Combine signals:** Behavior + financial = complete customer view  
✅ **Make it actionable:** Clear strategies per segment, not just labels  
✅ **Measure impact:** Revenue per segment, not just cluster quality

**The goal:** Turn segmentation from art into science, and align the entire company around it.

---



# Post 17: CLV-Based Customer Segmentation
# Complete Python Solution - K-Means Clustering for Unified Segmentation

# ============================================================================
# PART 1: SETUP AND DATA LOADING
# ============================================================================

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score, davies_bouldin_score
import warnings
warnings.filterwarnings('ignore')

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

print("="*70)
print("POST 17: CLV-BASED CUSTOMER SEGMENTATION")
print("K-Means Clustering for Unified Segmentation")
print("="*70)

# Load data
clv_df = pd.read_csv('cdp_clv_segmentation.csv')

print(f"\n📊 Dataset Overview:")
print(f"Total Customers: {len(clv_df):,}")
print(f"Average CLV: ${clv_df['predicted_clv'].mean():,.2f}")
print(f"Median CLV: ${clv_df['predicted_clv'].median():,.2f}")
print(f"CLV Range: ${clv_df['predicted_clv'].min():,.2f} - ${clv_df['predicted_clv'].max():,.2f}")
print(f"\nFirst 5 rows:")
print(clv_df.head())

# ============================================================================
# PART 2: EXPLORATORY DATA ANALYSIS
# ============================================================================

print("\n" + "="*70)
print("EXPLORATORY DATA ANALYSIS")
print("="*70)

print(f"\nBasic Statistics:")
print(clv_df.describe().round(2))

print(f"\nCLV Segment Distribution (Rule-Based):")
print(clv_df['clv_segment'].value_counts())

print(f"\nAverage Metrics by Rule-Based Segment:")
segment_analysis = clv_df.groupby('clv_segment').agg({
    'predicted_clv': ['mean', 'count', 'sum'],
    'purchase_frequency': 'mean',
    'avg_order_value': 'mean',
    'email_engagement_score': 'mean'
}).round(2)
print(segment_analysis)

# ============================================================================
# PART 3: FEATURE PREPARATION FOR CLUSTERING
# ============================================================================

print("\n" + "="*70)
print("FEATURE PREPARATION FOR CLUSTERING")
print("="*70)

# Select features for clustering (mix of behavioral and financial)
cluster_features = [
    'purchase_frequency',
    'avg_order_value',
    'customer_tenure_months',
    'email_engagement_score',
    'product_categories_purchased',
    'days_since_last_purchase',
    'refund_rate',
    'predicted_clv'
]

X = clv_df[cluster_features]

print(f"\nFeatures selected for clustering:")
for i, feature in enumerate(cluster_features, 1):
    print(f"  {i}. {feature}")

print(f"\nFeature Statistics:")
print(X.describe().round(2))

# Standardize features (critical for K-Means distance calculation)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(f"\n✅ Features standardized using StandardScaler")
print(f"Mean of scaled features: {X_scaled.mean(axis=0).mean():.6f}")
print(f"Std of scaled features: {X_scaled.std(axis=0).mean():.6f}")

# ============================================================================
# PART 4: DETERMINE OPTIMAL NUMBER OF CLUSTERS
# ============================================================================

print("\n" + "="*70)
print("FINDING OPTIMAL NUMBER OF CLUSTERS")
print("="*70)

# Test different numbers of clusters (Elbow Method + Silhouette)
inertias = []
silhouette_scores = []
davies_bouldin_scores = []
k_range = range(2, 9)

print(f"\nTesting K from 2 to 8...")
for k in k_range:
    kmeans_temp = KMeans(n_clusters=k, random_state=42, n_init=10)
    cluster_labels_temp = kmeans_temp.fit_predict(X_scaled)
    inertias.append(kmeans_temp.inertia_)
    silhouette_scores.append(silhouette_score(X_scaled, cluster_labels_temp))
    davies_bouldin_scores.append(davies_bouldin_score(X_scaled, cluster_labels_temp))

print(f"\nCluster Evaluation Metrics:")
print(f"{'K':<5} {'Inertia':<15} {'Silhouette':<15} {'Davies-Bouldin':<15}")
print("-" * 50)
for k, inertia, silhouette, db in zip(k_range, inertias, silhouette_scores, davies_bouldin_scores):
    print(f"{k:<5} {inertia:<15.2f} {silhouette:<15.4f} {db:<15.4f}")

# Choose optimal k (highest silhouette score)
optimal_k = list(k_range)[np.argmax(silhouette_scores)]
max_silhouette = max(silhouette_scores)

print(f"\n🎯 Optimal K selected: {optimal_k}")
print(f"   (Highest silhouette score: {max_silhouette:.4f})")

# ============================================================================
# PART 5: TRAIN K-MEANS WITH OPTIMAL K
# ============================================================================

print("\n" + "="*70)
print(f"TRAINING K-MEANS WITH K={optimal_k}")
print("="*70)

kmeans_final = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
cluster_labels = kmeans_final.fit_predict(X_scaled)

# Add cluster labels to dataframe
clv_df['ml_cluster'] = cluster_labels

# Calculate metrics
silhouette_avg = silhouette_score(X_scaled, cluster_labels)
davies_bouldin_avg = davies_bouldin_score(X_scaled, cluster_labels)

print(f"\n✅ K-Means training complete!")
print(f"Number of clusters: {optimal_k}")
print(f"Silhouette Score: {silhouette_avg:.4f} (higher is better)")
print(f"Davies-Bouldin Index: {davies_bouldin_avg:.4f} (lower is better)")

# ============================================================================
# PART 6: CLUSTER ANALYSIS
# ============================================================================

print("\n" + "="*70)
print("CLUSTER CHARACTERISTICS")
print("="*70)

print(f"\nCluster Size Distribution:")
print(clv_df['ml_cluster'].value_counts().sort_index())

# Analyze each cluster
print(f"\nDetailed Cluster Analysis:")
for cluster_id in range(optimal_k):
    cluster_data = clv_df[clv_df['ml_cluster'] == cluster_id]
    
    print(f"\n{'='*60}")
    print(f"CLUSTER {cluster_id} ({len(cluster_data):,} customers)")
    print(f"{'='*60}")
    print(f"Average CLV: ${cluster_data['predicted_clv'].mean():,.2f}")
    print(f"Total CLV: ${cluster_data['predicted_clv'].sum():,.2f}")
    print(f"Purchase Frequency (avg): {cluster_data['purchase_frequency'].mean():.1f}")
    print(f"Avg Order Value: ${cluster_data['avg_order_value'].mean():.2f}")
    print(f"Customer Tenure (avg): {cluster_data['customer_tenure_months'].mean():.1f} months")
    print(f"Email Engagement (avg): {cluster_data['email_engagement_score'].mean():.2f}")
    print(f"Product Categories (avg): {cluster_data['product_categories_purchased'].mean():.1f}")
    print(f"Days Since Last Purchase (avg): {cluster_data['days_since_last_purchase'].mean():.0f}")
    print(f"Refund Rate (avg): {cluster_data['refund_rate'].mean():.1%}")

# ============================================================================
# PART 7: CLUSTER NAMING & INTERPRETATION
# ============================================================================

print("\n" + "="*70)
print("CLUSTER NAMING & BUSINESS INTERPRETATION")
print("="*70)

# Assign intuitive names based on CLV
cluster_clv_avg = clv_df.groupby('ml_cluster')['predicted_clv'].mean().sort_values(ascending=False)
cluster_ranks = cluster_clv_avg.index.tolist()

# Create naming map
cluster_names_map = {}
if optimal_k >= 5:
    names = ["Champions (VIP)", "High Value", "Medium Value", "Developing", "At Risk"]
    for i, cluster_id in enumerate(cluster_ranks[:5]):
        cluster_names_map[cluster_id] = names[i]
    for i, cluster_id in enumerate(cluster_ranks[5:], 5):
        cluster_names_map[cluster_id] = f"Cluster {cluster_id}"
elif optimal_k == 4:
    names = ["Champions", "Loyal", "Promising", "At Risk"]
    for i, cluster_id in enumerate(cluster_ranks):
        cluster_names_map[cluster_id] = names[i]
elif optimal_k == 3:
    names = ["High Value", "Medium Value", "Low Value"]
    for i, cluster_id in enumerate(cluster_ranks):
        cluster_names_map[cluster_id] = names[i]
else:  # optimal_k == 2
    names = ["High Value", "Low Value"]
    for i, cluster_id in enumerate(cluster_ranks):
        cluster_names_map[cluster_id] = names[i]

clv_df['cluster_name'] = clv_df['ml_cluster'].map(cluster_names_map)

print(f"\nCluster Names and Characteristics:")
for cluster_id in sorted(cluster_names_map.keys()):
    cluster_name = cluster_names_map[cluster_id]
    cluster_data = clv_df[clv_df['ml_cluster'] == cluster_id]
    pct_customers = len(cluster_data) / len(clv_df) * 100
    pct_revenue = cluster_data['predicted_clv'].sum() / clv_df['predicted_clv'].sum() * 100
    
    print(f"\n{cluster_name}:")
    print(f"  Size: {len(cluster_data):,} customers ({pct_customers:.1f}%)")
    print(f"  Avg CLV: ${cluster_data['predicted_clv'].mean():,.0f}")
    print(f"  Total Value: ${cluster_data['predicted_clv'].sum():,.0f} ({pct_revenue:.1f}% of revenue)")
    print(f"  Purchase Frequency: {cluster_data['purchase_frequency'].mean():.1f}/year")
    print(f"  Email Engagement: {cluster_data['email_engagement_score'].mean():.2f}")

# ============================================================================
# PART 8: BUSINESS RECOMMENDATIONS
# ============================================================================

print("\n" + "="*70)
print("RECOMMENDED BUSINESS STRATEGIES BY CLUSTER")
print("="*70)

strategies = {
    "Champions (VIP)": "VIP treatment, exclusive access, premium support, early product access",
    "Champions": "VIP treatment, loyalty rewards, exclusive offers, referral incentives",
    "High Value": "Upsell premium products, loyalty rewards, personalized recommendations",
    "Loyal": "Loyalty rewards, personalized offers, referral incentives, VIP access",
    "Medium Value": "Engagement campaigns, cross-sell, feature education, upgrade paths",
    "Promising": "Nurture campaigns, onboarding optimization, engagement boost, feature demos",
    "Developing": "Engagement campaigns, product education, special offers, feature unlocks",
    "At Risk": "Win-back campaigns, deep discounts, re-engagement, customer success outreach",
    "Low Value": "Activation campaigns, entry-level offers, engagement boost, trial extensions"
}

for cluster_id in sorted(cluster_names_map.keys()):
    cluster_name = cluster_names_map[cluster_id]
    cluster_data = clv_df[clv_df['ml_cluster'] == cluster_id]
    strategy = strategies.get(cluster_name, "Targeted engagement campaigns")
    
    print(f"\n{cluster_name}:")
    print(f"  Strategy: {strategy}")
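    # Allocate budget by share of total CLV rather than by headcount,
    # matching the revenue-proportional allocation described in the post
    clv_share = cluster_data['predicted_clv'].sum() / clv_df['predicted_clv'].sum()
    print(f"  Expected Budget Allocation: {clv_share*100:.0f}%")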

# ============================================================================
# PART 9: DIMENSIONALITY REDUCTION (PCA)
# ============================================================================

print("\n" + "="*70)
print("DIMENSIONALITY REDUCTION FOR VISUALIZATION (PCA)")
print("="*70)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

clv_df['pca_1'] = X_pca[:, 0]
clv_df['pca_2'] = X_pca[:, 1]

explained_variance = pca.explained_variance_ratio_
print(f"\nPCA Explained Variance:")
print(f"  PC1: {explained_variance[0]:.2%}")
print(f"  PC2: {explained_variance[1]:.2%}")
print(f"  Total: {explained_variance.sum():.2%}")

# ============================================================================
# PART 10: REVENUE CONCENTRATION ANALYSIS
# ============================================================================

print("\n" + "="*70)
print("REVENUE CONCENTRATION ANALYSIS")
print("="*70)

revenue_by_cluster = clv_df.groupby('cluster_name').agg({
    'customer_id': 'count',
    'predicted_clv': ['mean', 'sum']
}).round(2)
revenue_by_cluster.columns = ['Customer_Count', 'Avg_CLV', 'Total_CLV']
revenue_by_cluster['Pct_Customers'] = (revenue_by_cluster['Customer_Count'] / len(clv_df) * 100).round(1)
revenue_by_cluster['Pct_Revenue'] = (revenue_by_cluster['Total_CLV'] / clv_df['predicted_clv'].sum() * 100).round(1)
revenue_by_cluster = revenue_by_cluster.sort_values('Total_CLV', ascending=False)

print("\n" + revenue_by_cluster.to_string())

# Key insight
top_segment = revenue_by_cluster.index[0]
top_pct_customers = revenue_by_cluster.loc[top_segment, 'Pct_Customers']
top_pct_revenue = revenue_by_cluster.loc[top_segment, 'Pct_Revenue']

print(f"\n🎯 Key Insight:")
print(f"   {top_pct_revenue:.0f}% of revenue comes from {top_pct_customers:.0f}% of customers")
print(f"   ({top_segment} segment)")

# ============================================================================
# PART 11: EXPORT RESULTS
# ============================================================================

print("\n" + "="*70)
print("EXPORT SEGMENTATION RESULTS")
print("="*70)

output_df = clv_df[['customer_id', 'predicted_clv', 'ml_cluster', 'cluster_name',
                     'purchase_frequency', 'avg_order_value', 'email_engagement_score',
                     'customer_tenure_months']].copy()
output_df = output_df.sort_values('predicted_clv', ascending=False)
output_df.to_csv('clv_segment_predictions.csv', index=False)

print(f"\n✅ Segmentation results exported to 'clv_segment_predictions.csv'")
print(f"   ({len(output_df):,} customers with cluster assignments)")

# ============================================================================
# PART 12: FINAL SUMMARY
# ============================================================================

print("\n" + "="*70)
print("✅ COMPLETE SOLUTION SUMMARY")
print("="*70)

print(f"\n📊 Segmentation Results:")
print(f"   Optimal clusters: {optimal_k}")
print(f"   Silhouette score: {silhouette_avg:.4f}")
print(f"   Total customers: {len(clv_df):,}")
print(f"   Total CLV: ${clv_df['predicted_clv'].sum():,.0f}")

print(f"\n💼 Business Impact:")
print(f"   Revenue concentration: {top_pct_revenue:.0f}% from {top_pct_customers:.0f}% of customers")
print(f"   Segments created: {optimal_k}")
print(f"   Actionable strategies: {optimal_k}")

print(f"\n🎯 Next Steps:")
print(f"   1. Roll out segmentation to all teams (Marketing, Finance, Sales, Product)")
print(f"   2. Create segment-specific strategies and campaigns")
print(f"   3. Monitor segment migration (are Low Value moving to High Value?)")
print(f"   4. Measure impact: revenue per segment, acquisition cost per segment")

print("\n" + "="*70)


POST 17: CLV-BASED CUSTOMER SEGMENTATION
K-Means Clustering for Unified Segmentation

📊 Dataset Overview:
Total Customers: 3,500
Average CLV: $1,912.00
Median CLV: $813.18
CLV Range: $23.82 - $41,896.55

First 5 rows:
  customer_id  purchase_frequency  avg_order_value  total_revenue  \
0   CUST00001                   7            82.85         579.95   
1   CUST00002                   2            65.69         131.38   
2   CUST00003                  16            78.61        1257.76   
3   CUST00004                   2           158.93         317.86   
4   CUST00005                   5            27.78         138.90   

   customer_tenure_months  email_engagement_score  \
0                      17                    0.75   
1                       5                    0.29   
2                       9                    0.34   
3                      12                    0.17   
4                       2                    0.60   

   product_categories_purchased  avg_days_bet