# Post 20: Measuring the Impact of Personalization (Causal Inference)

## The Problem

The personalization team is celebrating.

They rolled out dynamic homepage content to roughly half of users, in what was billed as an A/B test. Results are in:
- Control: 16.5% purchase rate
- Treatment: 25.4% purchase rate
- **Relative lift: 54.3%**

The VP of Product wants to scale to 100% of users.

But the CPO asks a hard question:

**"Does personalization CAUSE more purchases—or do engaged users just buy more anyway?"**

This is the difference between **correlation and causation**.

---

## Why This Matters

### The Selection Bias Problem

When you look closer:
- Treatment group average engagement: 0.294
- Control group average engagement: 0.274

**Treatment group is MORE ENGAGED to begin with!**

Why? Because personalization was rolled out to:
- Users who logged in more frequently
- Users with higher past purchase history  
- Power users on desktop

**High engagement users:**
- More likely to receive personalization (selection bias)
- More likely to purchase ANYWAY (confounding)

So that 54.3% lift includes:
1. TRUE causal effect of personalization
2. BIAS from engaged users buying more regardless
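A small simulation shows how this decomposition works: naive difference = true effect + confounding bias. Everything in it is an assumption for illustration, not the post's dataset. Engagement drives both who gets personalization and who buys, and the true effect is fixed at +5 percentage points.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical confounder: engagement score in [0, 1]
engagement = rng.uniform(0, 1, n)

# Biased rollout: more engaged users are more likely to get personalization
treated = rng.uniform(0, 1, n) < 0.3 + 0.4 * engagement

# Purchase probability rises with engagement; treatment adds a known +5pp
TRUE_EFFECT = 0.05
p_purchase = 0.10 + 0.20 * engagement + TRUE_EFFECT * treated
purchased = rng.uniform(0, 1, n) < p_purchase

# The naive comparison mixes the true effect with the engagement gap
naive_diff = purchased[treated].mean() - purchased[~treated].mean()
bias = naive_diff - TRUE_EFFECT

print(f"Naive difference: {naive_diff * 100:.2f} pp")
print(f"True effect:      {TRUE_EFFECT * 100:.2f} pp")
print(f"Confounding bias: {bias * 100:.2f} pp")
```

Under these made-up parameters, treated users average about 0.13 higher engagement, so the naive comparison overstates the effect by roughly 2.7 percentage points.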

**Question: How much of the 54.3% is real?**

---

## The Solution: Causal Inference

### What is Causal Inference?

Causal inference moves you from correlation (two things happen together) to causation (one thing CAUSES the other).

**Techniques:**
- Randomized Controlled Trials (RCTs) - gold standard but expensive
- **Propensity Score Matching** - what we'll use
- Regression Discontinuity
- Difference-in-Differences
- Instrumental Variables

---

## Propensity Score Matching (Step-by-Step)

### Step 1: Identify Confounders

Variables that affect BOTH treatment assignment AND outcome:
- **Engagement score** - engaged users get personalization AND buy more
- **Past purchases** - repeat buyers get personalization AND buy again
- **Account age** - tenured users see personalization AND are loyal
- **Device type** - desktop users get personalization AND convert better

---

### Step 2: Estimate Propensity Scores

**Propensity score = probability of receiving treatment given confounders**

Train logistic regression:
P(Treatment = 1 | engagement, past_purchases, account_age, device)

Each customer gets a score (0-1):
- High score = likely to receive personalization
- Low score = unlikely to receive personalization
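Written out, the logistic model behind this step looks like the following, with $T$ the personalization flag and the $x$ terms the confounders. The two device terms mirror the mobile/desktop dummies used in the notebook; the coefficients $\beta$ are whatever the fit returns.

$$
e(x) = P(T = 1 \mid X = x) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 x_{\text{engagement}} + \beta_2 x_{\text{past\_purchases}} + \beta_3 x_{\text{account\_age}} + \beta_4 x_{\text{mobile}} + \beta_5 x_{\text{desktop}}\right)\right)}
$$

Why one number is enough to match on: among customers with the same $e(x)$, the confounders are balanced between treated and control in expectation. That holds only if every confounder is in $X$ and if $0 < e(x) < 1$, so that both groups exist at every score.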

---

### Step 3: Match Treatment to Control

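For each treatment customer, find the control customer with the closest propensity score. Matching happens on that single score; because it summarizes all four confounders, the matched control should also be similar on:
- Engagement
- Past purchases
- Account age
- Device type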
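At its core, this step is a nearest-neighbor lookup on one number. Below is a sketch on made-up scores using scikit-learn's `NearestNeighbors`, the same class the notebook uses. The caliper (a maximum allowed score gap) is an addition the notebook does not apply, and the 0.05 value is an arbitrary assumption.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up propensity scores for illustration
treated_scores = np.array([0.62, 0.55, 0.90])
control_scores = np.array([0.50, 0.56, 0.60])

# Fit on controls, then find the closest control for each treated customer.
# Controls can be reused (matching with replacement).
nn = NearestNeighbors(n_neighbors=1)
nn.fit(control_scores.reshape(-1, 1))
distances, indices = nn.kneighbors(treated_scores.reshape(-1, 1))

CALIPER = 0.05  # hypothetical: reject matches further apart than this
keep = distances.ravel() <= CALIPER

# (treated index, matched control index) for every accepted pair
matched_pairs = [
    (i, int(c))
    for i, (c, ok) in enumerate(zip(indices.ravel(), keep))
    if ok
]
print(matched_pairs)
```

The third treated customer (score 0.90) has no control within the caliper and is dropped. This is how a caliper enforces common support. The notebook below keeps every match regardless of distance.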

**Result: Balanced groups**

Before matching:
- Treatment engagement: 0.294
- Control engagement: 0.274  
- **Difference: 0.020** (biased!)

After matching:
- Treatment engagement: 0.294
- Control engagement: 0.293  
- **Difference: 0.0003** (balanced!)
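Comparing raw means is a start. A common convention is the standardized mean difference (SMD): the gap in means divided by the pooled standard deviation, with |SMD| < 0.1 often treated as balanced. Here is a minimal sketch, assuming two pandas DataFrames that share the confounder columns. The toy values are made up.

```python
import numpy as np
import pandas as pd

def standardized_mean_diff(treated: pd.DataFrame, control: pd.DataFrame, cols):
    """SMD per column: (mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    out = {}
    for col in cols:
        pooled_sd = np.sqrt((treated[col].var() + control[col].var()) / 2)
        out[col] = (treated[col].mean() - control[col].mean()) / pooled_sd
    return pd.Series(out)

# Made-up groups: engagement differs, past purchases do not
treated = pd.DataFrame({"engagement_score": [0.3, 0.4, 0.5], "past_purchases": [2, 3, 4]})
control = pd.DataFrame({"engagement_score": [0.2, 0.3, 0.4], "past_purchases": [2, 3, 4]})

smd = standardized_mean_diff(treated, control, ["engagement_score", "past_purchases"])
print(smd.round(2))
```

In the notebook, you would run this on `treatment_df`/`control_df` before matching and on `matched_treatment`/`matched_control` after, across all of `confounder_cols`.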

---

### Step 4: Calculate Average Treatment Effect (ATE)

Compare matched groups:
- Control purchase rate: 16.7%
- Treatment purchase rate: 25.4%  
- **ATE: 8.68 percentage points**
- **Relative lift: 52.0%**

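This is the estimated causal effect. It is valid to the extent that the four confounders above capture everything that drove both who got personalization and who bought. Strictly speaking, matching each treated customer to a control estimates the average treatment effect *on the treated* (ATT), meaning the effect for the kind of users who actually received personalization. This post follows common usage and calls it ATE.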
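With 1:1 matched pairs, the effect estimate is the average within-pair difference in outcomes. A bootstrap over pairs gives a rough confidence interval. The sketch below uses made-up 0/1 outcomes. The 2,000 resamples and the seed are arbitrary choices, and resampling pairs ignores the reuse of controls, so treat the interval as approximate.

```python
import numpy as np

def matched_effect(y_treated, y_control, n_boot=2000, seed=0):
    """Mean within-pair difference and a 95% percentile bootstrap interval."""
    y_treated = np.asarray(y_treated, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    diffs = y_treated - y_control            # one difference per matched pair
    estimate = diffs.mean()

    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)     # resample pairs with replacement
    low, high = np.percentile(boot_means, [2.5, 97.5])
    return estimate, low, high

# Made-up outcomes for 8 matched pairs (1 = purchased)
y_t = [1, 0, 1, 1, 0, 0, 1, 0]
y_c = [0, 0, 1, 0, 0, 0, 0, 0]
est, lo, hi = matched_effect(y_t, y_c)
print(f"effect = {est:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```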

---

## Naive vs Causal Comparison

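| Method | Control | Treatment | Absolute difference | Relative lift | Interpretation |
|--------|---------|-----------|---------------------|---------------|----------------|
| **Naive** | 16.5% | 25.4% | 8.93 pp | 54.3% | Includes selection bias |
| **Causal (Matched)** | 16.7% | 25.4% | 8.68 pp | 52.0% | Incremental effect, adjusted for measured confounders |

**Difference: 2.3 points of relative lift (54.3% vs 52.0%), or about 0.25 pp of purchase rate, is due to confounding!**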

---


```python
# Post 20: Measuring Impact of Personalization
# Causal Inference with Propensity Score Matching

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NearestNeighbors
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

print("="*70)
print("POST 20: MEASURING THE IMPACT OF PERSONALIZATION")
print("Causal Inference with Propensity Score Matching")
print("="*70)

# ============================================================================
# LOAD DATA
# ============================================================================

personalization_df = pd.read_csv('cdp_personalization_causal.csv')

print(f"\nDataset Overview:")
print(f"Total Customers: {len(personalization_df):,}")
print(f"Treatment (Personalization): {personalization_df['received_personalization'].sum():,}")
print(f"Control (No Personalization): {(personalization_df['received_personalization'] == 0).sum():,}")

# ============================================================================
# STEP 1: NAIVE ANALYSIS (WRONG - IGNORES CONFOUNDING)
# ============================================================================

print("\n" + "="*70)
print("STEP 1: NAIVE ANALYSIS (CORRELATION, NOT CAUSATION)")
print("="*70)

treatment = personalization_df[personalization_df['received_personalization'] == 1]
control = personalization_df[personalization_df['received_personalization'] == 0]

naive_treatment_rate = treatment['made_purchase'].mean()
naive_control_rate = control['made_purchase'].mean()
naive_lift = (naive_treatment_rate - naive_control_rate) / naive_control_rate

print(f"\nPurchase Rates:")
print(f"  Control: {naive_control_rate*100:.2f}%")
print(f"  Treatment: {naive_treatment_rate*100:.2f}%")
print(f"  Naive Lift: {naive_lift*100:.2f}%")

print(f"\nProblem: Selection Bias!")
print(f"  Treatment group engagement: {treatment['engagement_score'].mean():.3f}")
print(f"  Control group engagement: {control['engagement_score'].mean():.3f}")
print(f"  → Treatment group more engaged (confounded!)")

# ============================================================================
# STEP 2: PROPENSITY SCORE ESTIMATION
# ============================================================================

print("\n" + "="*70)
print("STEP 2: PROPENSITY SCORE MATCHING (CAUSAL INFERENCE)")
print("="*70)

# One-hot encode device type (a customer on any other device gets 0 in both columns)
personalization_df['device_mobile'] = (personalization_df['device_type'] == 'Mobile').astype(int)
personalization_df['device_desktop'] = (personalization_df['device_type'] == 'Desktop').astype(int)

# Features that predict treatment assignment (confounders)
confounder_cols = ['engagement_score', 'past_purchases', 'account_age_days', 
                   'device_mobile', 'device_desktop']

X = personalization_df[confounder_cols]
y_treatment = personalization_df['received_personalization']

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Estimate propensity scores (probability of receiving treatment)
propensity_model = LogisticRegression(max_iter=1000, random_state=42)
propensity_model.fit(X_scaled, y_treatment)

propensity_scores = propensity_model.predict_proba(X_scaled)[:, 1]
personalization_df['propensity_score'] = propensity_scores

print(f"\nPropensity scores estimated!")
print(f"Propensity score range: {propensity_scores.min():.3f} to {propensity_scores.max():.3f}")
print(f"Mean propensity score: {propensity_scores.mean():.3f}")

# ============================================================================
# STEP 3: MATCHING (1:1 NEAREST NEIGHBOR)
# ============================================================================

print("\n" + "="*70)
print("STEP 3: PROPENSITY SCORE MATCHING (1:1)")
print("="*70)

# Separate treatment and control
treatment_df = personalization_df[personalization_df['received_personalization'] == 1].copy()
control_df = personalization_df[personalization_df['received_personalization'] == 0].copy()

# Match each treatment to nearest control by propensity score
treatment_propensities = treatment_df['propensity_score'].values.reshape(-1, 1)
control_propensities = control_df['propensity_score'].values.reshape(-1, 1)

# Find nearest neighbors. This is matching WITH replacement: one control
# customer can be the match for several treated customers. No caliper is
# applied, so `distances` is returned but not used to discard poor matches.
nn = NearestNeighbors(n_neighbors=1, metric='euclidean')
nn.fit(control_propensities)

distances, indices = nn.kneighbors(treatment_propensities)

# Create matched dataset (positional lookup into control_df, same row order as the fit)
matched_treatment = treatment_df.copy()
matched_control = control_df.iloc[indices.flatten()].copy()

print(f"\nMatching complete!")
print(f"Matched pairs: {len(matched_treatment):,}")

# Check balance after matching
print(f"\nBalance Check (After Matching):")
print(f"  Engagement - Treatment: {matched_treatment['engagement_score'].mean():.3f}")
print(f"  Engagement - Control: {matched_control['engagement_score'].mean():.3f}")
print(f"  Difference: {abs(matched_treatment['engagement_score'].mean() - matched_control['engagement_score'].mean()):.4f}")

# ============================================================================
# STEP 4: ESTIMATE CAUSAL EFFECT (ATE)
# ============================================================================

print("\n" + "="*70)
print("STEP 4: AVERAGE TREATMENT EFFECT (ATE)")
print("="*70)

# Purchase rates in matched sample
matched_treatment_rate = matched_treatment['made_purchase'].mean()
matched_control_rate = matched_control['made_purchase'].mean()

# Average Treatment Effect (ATE). Because every treated customer is matched
# to a control, this is strictly the average effect on the treated (ATT).
ate = matched_treatment_rate - matched_control_rate
ate_pct = (ate / matched_control_rate) * 100

print(f"\nCausal Effect (After Matching):")
print(f"  Control purchase rate: {matched_control_rate*100:.2f}%")
print(f"  Treatment purchase rate: {matched_treatment_rate*100:.2f}%")
print(f"  Average Treatment Effect (ATE): {ate*100:.2f} percentage points")
print(f"  Relative Lift: {ate_pct:.2f}%")

# Statistical significance. ttest_ind treats the two groups as independent
# samples; that is an approximation here, since controls are paired with
# treated customers and some controls are reused.
t_stat, p_value = stats.ttest_ind(
    matched_treatment['made_purchase'],
    matched_control['made_purchase']
)

print(f"\nStatistical Significance:")
print(f"  T-statistic: {t_stat:.3f}")
print(f"  P-value: {p_value:.4f}")
print(f"  Significant: {'Yes' if p_value < 0.05 else 'No'} (α=0.05)")

# ============================================================================
# STEP 5: COMPARE NAIVE VS CAUSAL
# ============================================================================

print("\n" + "="*70)
print("STEP 5: NAIVE VS CAUSAL COMPARISON")
print("="*70)

comparison = pd.DataFrame({
    'Method': ['Naive (Biased)', 'Causal (Matched)'],
    'Control_Rate': [naive_control_rate*100, matched_control_rate*100],
    'Treatment_Rate': [naive_treatment_rate*100, matched_treatment_rate*100],
    'Lift_%': [naive_lift*100, ate_pct]
})

print(f"\n{comparison.to_string(index=False)}")

print(f"\nKey Insight:")
print(f"  Naive analysis: {naive_lift*100:.1f}% lift (WRONG - includes selection bias)")
print(f"  Causal analysis: {ate_pct:.1f}% lift (CORRECT - true incremental effect)")
print(f"  Difference: {(naive_lift*100 - ate_pct):.1f} percentage points due to confounding!")

# ============================================================================
# STEP 6: REVENUE IMPACT
# ============================================================================

print("\n" + "="*70)
print("STEP 6: BUSINESS IMPACT (REVENUE)")
print("="*70)

# Revenue analysis
treatment_revenue_per_customer = matched_treatment['revenue'].mean()
control_revenue_per_customer = matched_control['revenue'].mean()
revenue_ate = treatment_revenue_per_customer - control_revenue_per_customer

print(f"\nRevenue Impact:")
print(f"  Control (avg revenue/customer): ${control_revenue_per_customer:.2f}")
print(f"  Treatment (avg revenue/customer): ${treatment_revenue_per_customer:.2f}")
print(f"  Incremental revenue (ATE): ${revenue_ate:.2f}/customer")

# Scale to full customer base
total_customers = 1000000  # 1M customers
personalization_rollout_pct = 0.50  # 50% get personalization

customers_with_personalization = int(total_customers * personalization_rollout_pct)
incremental_revenue_total = customers_with_personalization * revenue_ate

print(f"\nProjected Impact (1M customers, 50% rollout):")
print(f"  Customers with personalization: {customers_with_personalization:,}")
print(f"  Incremental revenue: ${incremental_revenue_total:,.0f}")
# Equals the annual impact only if the 'revenue' column covers a 12-month window
print(f"  Annual impact: ${incremental_revenue_total:,.0f}")

# ============================================================================
# EXPORT RESULTS
# ============================================================================

print("\n" + "="*70)
print("EXPORT RESULTS")
print("="*70)

# Export matched pairs
matched_treatment['match_id'] = range(len(matched_treatment))
matched_control['match_id'] = range(len(matched_control))

output_df = pd.concat([
    matched_treatment[['customer_id', 'propensity_score', 'engagement_score', 
                      'received_personalization', 'made_purchase', 'revenue', 'match_id']],
    matched_control[['customer_id', 'propensity_score', 'engagement_score',
                    'received_personalization', 'made_purchase', 'revenue', 'match_id']]
])

output_df.to_csv('personalization_causal_analysis.csv', index=False)

print(f"\nResults exported to 'personalization_causal_analysis.csv'")

# ============================================================================
# FINAL SUMMARY
# ============================================================================

print("\n" + "="*70)
print("COMPLETE CAUSAL INFERENCE SUMMARY")
print("="*70)

print(f"\nCausal Effect:")
print(f"   Average Treatment Effect: {ate*100:.2f} percentage points")
print(f"   Relative Lift: {ate_pct:.1f}%")
print(f"   P-value: {p_value:.4f} ({'significant' if p_value < 0.05 else 'not significant'})")

print(f"\nBusiness Impact:")
print(f"   Incremental revenue per customer: ${revenue_ate:.2f}")
print(f"   Total incremental revenue (1M customers, 50% rollout): ${incremental_revenue_total:,.0f}")

print(f"\nRecommendation:")
print(f"   TRUE causal lift: {ate_pct:.1f}% (not {naive_lift*100:.1f}%)")
print(f"   Personalization has significant incremental impact")
print(f"   Recommend full rollout to remaining 50% of customers")

print("\n" + "="*70)
print("POST 20 COMPLETE - CAUSAL INFERENCE!")
print("="*70)
```

```text
POST 20: MEASURING THE IMPACT OF PERSONALIZATION
Causal Inference with Propensity Score Matching

Dataset Overview:
Total Customers: 10,000
Treatment (Personalization): 5,520
Control (No Personalization): 4,480

STEP 1: NAIVE ANALYSIS (CORRELATION, NOT CAUSATION)

Purchase Rates:
  Control: 16.45%
  Treatment: 25.38%
  Naive Lift: 54.28%

Problem: Selection Bias!
  Treatment group engagement: 0.294
  Control group engagement: 0.274
  → Treatment group more engaged (confounded!)

STEP 2: PROPENSITY SCORE MATCHING (CAUSAL INFERENCE)

Propensity scores estimated!
Propensity score range: 0.469 to 0.666
Mean propensity score: 0.552

STEP 3: PROPENSITY SCORE MATCHING (1:1)

Matching complete!
Matched pairs: 5,520

Balance Check (After Matching):
  Engagement - Treatment: 0.294
  Engagement - Control: 0.293
  Difference: 0.0003

STEP 4: AVERAGE TREATMENT EFFECT (ATE)

Causal Effect (After Matching):
  Control purchase rate: 16.70%
  Treatment purchase rate: 25.38%
  Average Treatment Effect (ATE): 8.68 percentage points
```


## Key Insights

### 1. Correlation ≠ Causation

Naive analysis: **54.3% lift**  
Causal analysis: **52.0% lift**

**About 2.3 points of that lift were not personalization; they were engaged users buying more anyway.**

---

### 2. Selection Bias is Real

When treatment isn't truly random:
- Self-selection (users opt in)
- Biased rollout (power users first)
- Geographic targeting (rich regions first)

**Always check if groups are balanced on confounders.**

---

### 3. Propensity Matching Isolates True Effect

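By matching similar users:
- Remove confounding from the variables you measured
- Isolate the treatment effect
- Measure incremental impact

**This mimics what an RCT does, at a fraction of the cost, but only for measured confounders. Randomization also balances the ones you never recorded.**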

---

### 4. Statistical Significance Matters

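P-value: < 0.0001 (highly significant)

**Personalization has a real, measurable, statistically significant effect.**

Not just noise. Not just chance. Keep in mind what a small p-value does not rule out: an unmeasured confounder would bias the estimate no matter how significant it looks.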
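For 0/1 outcomes like `made_purchase`, a two-proportion z-test is the textbook alternative to the t-test used in the notebook. On samples this large, the two give nearly the same answer. Below is a standard-library sketch; the counts in the example are made up, not the post's data.

```python
import math

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled variance)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts: 250/1000 treated vs 170/1000 control purchased
z, p = two_proportion_ztest(250, 1000, 170, 1000)
print(f"z = {z:.2f}, p = {p:.2e}")
```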

---

### 5. Business Impact is Measurable

**Incremental revenue per customer: $9.67**

Scale to 1M customers (50% rollout):
- 500,000 customers with personalization
- **About $4.8M incremental revenue** (annual, if the `revenue` column covers a year)

This is a number you can plan around: it is not inflated by the engagement bias.
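The projection is plain arithmetic: customers who receive personalization times incremental revenue per customer. The sketch below reproduces the 50% figure and, as an assumed extension, the 100% rollout the VP is asking about. Because the matched estimate describes customers like those who already received personalization, the 100% line assumes the remaining customers respond the same way. That assumption may be optimistic if they are less engaged.

```python
def incremental_revenue(total_customers, rollout_pct, revenue_per_customer):
    """Customers receiving personalization x incremental revenue per customer."""
    return total_customers * rollout_pct * revenue_per_customer

REVENUE_ATE = 9.67           # per-customer effect from the matched analysis
TOTAL_CUSTOMERS = 1_000_000

base = incremental_revenue(TOTAL_CUSTOMERS, 0.50, REVENUE_ATE)
print(f"50% rollout:  ${base:,.0f}")

# Assumption: the untreated half responds like the treated half
full = incremental_revenue(TOTAL_CUSTOMERS, 1.00, REVENUE_ATE)
print(f"100% rollout: ${full:,.0f}")
```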

---

## Why This Matters for PMs

### You Can't Always Run RCTs

Sometimes:
- Can't randomize (legacy systems, rollout constraints)
- Selection bias already happened (personalization went to engaged users)
- Want to measure historical impact (can't re-run experiment)

**Causal inference techniques let you measure TRUE impact anyway.**

---

### Investors Care About Causation

CFO: "You say personalization drove 54% lift. Prove it."

**Without causal inference:**
"Well, treatment group bought 54% more..."
"But they were already more engaged..."
"So maybe it's not all personalization..."

**With causal inference:**
"After controlling for engagement, past purchases, account age, and device type, personalization caused an estimated 52% lift. P-value < 0.0001. Incremental revenue: about $4.8M."

**Which pitch would you fund?**

---

### Wrong Analysis = Wrong Decisions

**Scenario 1: Over-estimate impact**
- Think personalization is a 54% lift (naive)
- TRUE effect is 52%
- Build revenue projections on the naive number
- Miss revenue targets

**Scenario 2: Under-estimate impact**
- Think personalization has no effect (confounders masked it)
- TRUE effect is significant
- Don't invest, lose $4.8M opportunity

**Causal inference = right decisions.**

---

## When to Use Causal Inference

### Use Cases

1. **A/B tests with selection bias** (like this post)
2. **Historical impact measurement** (can't re-run experiment)
3. **Observational data** (no randomization)
4. **Policy evaluation** (did new feature cause retention?)
5. **Marketing attribution** (did campaign cause sales?)

### Techniques by Scenario

| Scenario | Best Technique |
|----------|----------------|
| Treatment assignment biased by measured variables | Propensity Score Matching |
| Treatment assigned by a cutoff (score, date, threshold) | Regression Discontinuity |
| Before/after data for both treated and untreated groups | Difference-in-Differences |
| A variable shifts treatment but affects the outcome only through it (natural experiment) | Instrumental Variables |
| True randomization | Simple A/B test (no adjustment needed) |
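Of the techniques in the table, difference-in-differences is the easiest to show as plain arithmetic. It takes the treated group's before/after change minus the comparison group's change, which nets out any trend both groups share. It relies on the parallel-trends assumption: absent treatment, both groups would have moved together. The numbers below are made up for illustration.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """(Change in treated group) minus (change in comparison group)."""
    return (treated_after - treated_before) - (control_after - control_before)

# Hypothetical purchase rates (%) before and after a feature launch
effect = diff_in_diff(treated_before=15.0, treated_after=22.0,
                      control_before=14.0, control_after=17.0)
print(f"DiD estimate: {effect:.1f} pp")
```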

---

## What's Next?

### Immediate Actions
- Apply propensity matching to past A/B tests
- Re-analyze with causal lens (true impact?)
- Update revenue projections with TRUE causal effects
- Educate stakeholders on correlation vs causation

### Advanced Techniques
- Uplift modeling (who benefits MOST from treatment?)
- Heterogeneous treatment effects (does personalization work differently for segments?)
- Causal forests (ML + causation)
- Continuous treatment (not binary—dose-response curves)
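A first step toward heterogeneous treatment effects needs no new model: compute the matched-pair effect within each segment. The sketch below assumes a pairs table with one row per matched pair, where the segment is taken from the treated customer. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical matched pairs: treated outcome, matched control outcome, segment
pairs = pd.DataFrame({
    "device_type":      ["Mobile", "Mobile", "Mobile", "Desktop", "Desktop", "Desktop"],
    "treated_purchase": [1, 0, 1, 1, 1, 0],
    "control_purchase": [0, 0, 1, 0, 0, 0],
})

# Within-pair difference, then average per segment
pairs["diff"] = pairs["treated_purchase"] - pairs["control_purchase"]
effect_by_segment = pairs.groupby("device_type")["diff"].agg(["mean", "count"])
print(effect_by_segment)
```

Small segments give noisy estimates. Causal forests generalize this idea by learning the segments instead of fixing them in advance.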

---

## PM Takeaways

✅ **Correlation ≠ Causation** (always check for confounders)  
✅ **Selection bias distorts impact** (here it inflated the naive lift)  
✅ **Propensity matching removes bias from measured confounders** (balance groups, then measure the effect)  
✅ **Statistical significance matters** (p < 0.05 = unlikely to be chance alone)  
✅ **Causal inference = better decisions** (right investments, right expectations)

**The goal:** Move from "things happened together" to "this caused that."

---


## The Final Word

You've completed 20 posts on Product Management meets Machine Learning.

**Post 20 taught you the most important lesson:**

**Not all "data-driven" decisions are equally valid.**

- Some are correlation (things happen together)
- Some are causation (one thing causes another)

**Only causation tells you what will happen if you ACT.**

Congrats on finishing the series. You're now equipped to:
- Frame product problems as ML problems
- Build solutions with real business impact
- Measure TRUE incremental value (not inflated estimates)

Welcome to the world of ML-powered, causally-informed product management.

---
