# Marketing Use Cases for Causal Inference

This notebook works through marketing use cases on synthetic data where the true treatment effect is known, comparing naive treated-versus-untreated differences with doubly robust (AIPW) estimates.

## Use Cases Covered

1. **Email Marketing Campaign Effectiveness**
2. **Price Promotion Impact Analysis**
3. **Customer Loyalty Program Evaluation**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

# Import causal inference components
from causal_inference.core.base import TreatmentData, OutcomeData, CovariateData
from causal_inference.estimators.g_computation import GComputationEstimator
from causal_inference.estimators.ipw import IPWEstimator
from causal_inference.estimators.aipw import AIPWEstimator
from causal_inference.data.synthetic import SyntheticDataGenerator
from causal_inference.diagnostics.balance import check_covariate_balance
from causal_inference.diagnostics.reporting import generate_diagnostic_report

# Set random seed and plotting style
np.random.seed(42)
plt.style.use('default')
sns.set_palette("husl")

## Use Case 1: Email Marketing Campaign Effectiveness

**Business Question**: Does sending promotional emails increase customer purchases?

**Challenge**: Customers aren't randomly assigned to receive emails. Targeting is based on past behavior, demographics, and engagement, all of which also affect purchases.
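
The targeting problem can be stated precisely. Writing $Y(1)$ and $Y(0)$ for a customer's purchase amount with and without the email, and $T$ for whether the email was sent (notation introduced here for illustration, not taken from the notebook's library), the naive difference in means splits into a causal part and a selection part:

$$
\begin{aligned}
E[Y \mid T=1] - E[Y \mid T=0]
  &= E[Y(1) \mid T=1] - E[Y(0) \mid T=0] \\
  &= \underbrace{E[Y(1) - Y(0) \mid T=1]}_{\text{effect on emailed customers}}
   + \underbrace{E[Y(0) \mid T=1] - E[Y(0) \mid T=0]}_{\text{selection bias}}
\end{aligned}
$$

When targeting favours customers who would spend more anyway, the second term is positive and the naive comparison overstates the email's effect.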

In [None]:
# Generate realistic email marketing data
np.random.seed(123)
n_customers = 2000

# Customer characteristics
age = np.random.uniform(18, 70, n_customers)
income = np.random.lognormal(10.5, 0.5, n_customers)  # Log-normal income distribution
past_purchases = np.random.poisson(3, n_customers)  # Historic purchase count
email_engagement = np.random.beta(2, 5, n_customers)  # Past email open rates
days_since_last_purchase = np.random.exponential(30, n_customers)

# Email targeting is based on customer characteristics (confounding!)
email_propensity = (
    -2.5 +  # Base propensity
    0.02 * age +  # Older customers more likely to get emails
    0.00001 * income +  # Higher income customers
    0.3 * past_purchases +  # Frequent buyers
    2.0 * email_engagement +  # High engagement
    -0.01 * days_since_last_purchase  # Recent customers
)

email_prob = 1 / (1 + np.exp(-email_propensity))
received_email = np.random.binomial(1, email_prob, n_customers)

# Purchase outcome depends on both customer characteristics AND email
purchase_amount = (
    50 +  # Base purchase amount
    1.5 * age +  # Age effect
    0.002 * income +  # Income effect
    15 * past_purchases +  # Loyalty effect
    100 * email_engagement +  # Engagement effect
    -0.5 * days_since_last_purchase +  # Recency effect
    25 * received_email +  # TRUE CAUSAL EFFECT of email
    np.random.normal(0, 20, n_customers)  # Noise
)

purchase_amount = np.maximum(purchase_amount, 0)  # Non-negative purchases

# Create data objects
email_data = {
    'treatment': TreatmentData(values=received_email, treatment_type="binary"),
    'outcome': OutcomeData(values=purchase_amount, outcome_type="continuous"),
    'covariates': CovariateData(values=pd.DataFrame({
        'age': age,
        'income': income,
        'past_purchases': past_purchases,
        'email_engagement': email_engagement,
        'days_since_last_purchase': days_since_last_purchase
    }))
}

print("Email Marketing Dataset:")
print(f"- {n_customers} customers")
print(f"- {np.sum(received_email)} received emails ({np.mean(received_email):.1%})")
print("- True email effect: $25")
print(f"- Mean purchase amount: ${np.mean(purchase_amount):.2f}")

# Naive comparison (biased!)
naive_diff = np.mean(purchase_amount[received_email == 1]) - np.mean(purchase_amount[received_email == 0])
print(f"- Naive difference: ${naive_diff:.2f} (biased due to confounding)")

In [None]:
# Analyze email marketing with causal inference
print("=== Email Marketing Causal Analysis ===")

# Use random forest models for flexibility
aipw_email = AIPWEstimator(
    outcome_model=RandomForestRegressor(n_estimators=100, random_state=42),
    propensity_model=RandomForestClassifier(n_estimators=100, random_state=42),
    bootstrap_samples=100
)

aipw_email.fit(email_data['treatment'], email_data['outcome'], email_data['covariates'])
email_effect = aipw_email.estimate_ate()

print(f"\nCausal Effect Results:")
print(f"- Estimated ATE: ${email_effect.ate:.2f}")
print(f"- 95% CI: [${email_effect.confidence_interval[0]:.2f}, ${email_effect.confidence_interval[1]:.2f}]")
print(f"- True effect: $25.00")
print(f"- Bias corrected: ${naive_diff - email_effect.ate:.2f}")

# Business implications (illustrative: assumes one campaign per month and that
# the estimated ATE applies to every emailed customer)
total_customers = 50000  # Scale up
current_email_rate = 0.3
customers_emailed = total_customers * current_email_rate
monthly_lift = customers_emailed * email_effect.ate
annual_lift = monthly_lift * 12

print(f"\nBusiness Impact (scaled to {total_customers:,} customers):")
print(f"- Monthly revenue lift: ${monthly_lift:,.0f}")
print(f"- Annual revenue lift: ${annual_lift:,.0f}")

# Check balance
balance_result = check_covariate_balance(email_data['treatment'], email_data['covariates'])
print(f"\nCovariate Balance: {'Good' if balance_result['overall_balance'] else 'Poor - confounding present'}")
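
The `AIPWEstimator` used above comes from the notebook's own `causal_inference` package, whose internals are not shown here. As a rough guide to what an augmented inverse probability weighting estimator computes, the sketch below writes out the textbook formula with scikit-learn models. The function name `aipw_ate`, the `clip` parameter, and the toy data are assumptions made for illustration; the library may work differently (for example by cross-fitting or bootstrapping).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


def aipw_ate(X, t, y, clip=0.01):
    """Textbook AIPW estimate of the ATE (illustrative, no cross-fitting)."""
    # Propensity model: P(T=1 | X), clipped away from 0 and 1 for stable weights
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1 - clip)

    # Outcome models fitted separately on treated and untreated units
    mu1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)
    mu0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)

    # Outcome-model contrast plus an inverse-probability-weighted residual correction
    psi = (mu1 - mu0
           + t * (y - mu1) / e
           - (1 - t) * (y - mu0) / (1 - e))
    return psi.mean()


# Toy data with one confounder and a known effect of 25
# (hypothetical, not the notebook's email dataset)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 50 + 30 * x + 25 * t + rng.normal(0, 20, n)
X = x.reshape(-1, 1)

naive = y[t == 1].mean() - y[t == 0].mean()
ate = aipw_ate(X, t, y)
print(f"naive: {naive:.1f}, AIPW: {ate:.1f}, true: 25.0")
```

The estimate stays consistent if either the propensity model or the outcome model is correctly specified, which is why the approach is called doubly robust. Flexible learners such as random forests usually call for cross-fitting, which this sketch omits.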

## Use Case 2: Price Promotion Impact Analysis

**Business Question**: What's the true incremental impact of price discounts on sales?

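**Challenge**: Promotions are typically used on slow-moving products or during low-demand periods, so promoted items would sell less than average even without a discount and a naive comparison understates the lift.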
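
Confounding does not always inflate the naive estimate. The sketch below is a toy simulation, separate from the notebook's dataset, in which a single hypothetical confounder (`slowness`) makes a product both more likely to be promoted and less likely to sell. All names and coefficients are made up for illustration. Regression adjustment via ordinary least squares stands in for the notebook's AIPW estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# One confounder: "slowness" raises the chance of promotion and lowers sales
slowness = rng.normal(size=n)
promoted = rng.binomial(1, 1 / (1 + np.exp(-1.5 * slowness)))
true_effect = 2.0  # hypothetical lift, in units
sales = 20 - 3 * slowness + true_effect * promoted + rng.normal(size=n)

# Naive comparison: difference in mean sales, ignoring the confounder
naive = sales[promoted == 1].mean() - sales[promoted == 0].mean()

# Regression adjustment: OLS of sales on [1, promoted, slowness]
design = np.column_stack([np.ones(n), promoted, slowness])
coef, *_ = np.linalg.lstsq(design, sales, rcond=None)
adjusted = coef[1]

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, true: {true_effect:.2f}")
```

In this setup the naive difference comes out negative even though the promotion helps, while the adjusted coefficient lands close to the true value.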

In [None]:
# Generate price promotion data
np.random.seed(456)
n_products = 1500

# Product characteristics
base_price = np.random.uniform(10, 100, n_products)
product_age = np.random.uniform(0, 365, n_products)  # Days since launch
category_popularity = np.random.normal(0, 1, n_products)  # Standardized category appeal
inventory_level = np.random.exponential(50, n_products)  # Units in stock (higher = more overstocked)
seasonality = np.sin(2 * np.pi * np.arange(n_products) / 365)  # Seasonal component (product index stands in for day of year)

# Promotions are more likely for overstocked, older, less popular products
promotion_propensity = (
    -1.0 +  # Base propensity
    -0.02 * base_price +  # Cheaper products promoted more
    0.005 * product_age +  # Older products
    -0.5 * category_popularity +  # Less popular categories
    0.01 * inventory_level +  # Overstocked items
    -0.5 * seasonality  # Out-of-season items
)

promotion_prob = 1 / (1 + np.exp(-promotion_propensity))
has_promotion = np.random.binomial(1, promotion_prob, n_products)

# Sales depend on product characteristics AND promotion
log_sales = (
    3.0 +  # Base log sales
    -0.02 * base_price +  # Price sensitivity
    -0.001 * product_age +  # Newer products sell better
    0.8 * category_popularity +  # Popular categories
    -0.005 * inventory_level +  # Overstocked = lower demand
    0.3 * seasonality +  # Seasonal effects
    0.4 * has_promotion +  # TRUE CAUSAL EFFECT: +0.4 on log sales, i.e. exp(0.4) - 1, about a 49% lift
    np.random.normal(0, 0.3, n_products)  # Noise
)

sales_units = np.exp(log_sales)  # Convert to actual sales

# Create promotion data
promotion_data = {
    'treatment': TreatmentData(values=has_promotion, treatment_type="binary"),
    'outcome': OutcomeData(values=sales_units, outcome_type="continuous"),
    'covariates': CovariateData(values=pd.DataFrame({
        'base_price': base_price,
        'product_age': product_age,
        'category_popularity': category_popularity,
        'inventory_level': inventory_level,
        'seasonality': seasonality
    }))
}

print("Price Promotion Dataset:")
print(f"- {n_products} products")
print(f"- {np.sum(has_promotion)} on promotion ({np.mean(has_promotion):.1%})")
print("- True promotion effect: +0.4 on log sales (about a 49% sales lift)")

# Naive comparison
naive_ratio = np.mean(sales_units[has_promotion == 1]) / np.mean(sales_units[has_promotion == 0])
print(f"- Naive sales ratio: {naive_ratio:.2f} (biased - promotions target low-selling products)")

In [None]:
# Analyze promotion impact with causal inference
print("=== Price Promotion Causal Analysis ===")

aipw_promotion = AIPWEstimator(
    outcome_model=RandomForestRegressor(n_estimators=100, random_state=42),
    propensity_model=RandomForestClassifier(n_estimators=100, random_state=42),
    bootstrap_samples=100
)

aipw_promotion.fit(promotion_data['treatment'], promotion_data['outcome'], promotion_data['covariates'])
promotion_effect = aipw_promotion.estimate_ate()

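# Convert to percentage effect.
# The untreated mean is only a rough baseline: untreated products differ
# systematically from promoted ones, and the ideal denominator is E[Y(0)] over all products.
baseline_sales = np.mean(sales_units[has_promotion == 0])
percentage_lift = (promotion_effect.ate / baseline_sales) * 100
true_lift = (np.exp(0.4) - 1) * 100  # 0.4 on the log scale is about a 49% lift

print("\nCausal Effect Results:")
print(f"- Estimated ATE: {promotion_effect.ate:.2f} additional units")
print(f"- Percentage lift: {percentage_lift:.1f}%")
print(f"- True effect: {true_lift:.1f}% sales lift")
print(f"- 95% CI: [{(promotion_effect.confidence_interval[0]/baseline_sales)*100:.1f}%, {(promotion_effect.confidence_interval[1]/baseline_sales)*100:.1f}%]")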

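# Calculate ROI on revenue (illustrative: ignores product margin)
avg_price = np.mean(base_price)
discount_rate = 0.20  # 20% discount
discounted_price = avg_price * (1 - discount_rate)
# Cost: the discount is also given on units that would have sold anyway
cost_per_promotion = baseline_sales * avg_price * discount_rate
# Benefit: incremental units are sold at the discounted price
revenue_per_promotion = promotion_effect.ate * discounted_price
roi = (revenue_per_promotion - cost_per_promotion) / cost_per_promotion

print("\nPromotion ROI Analysis:")
print(f"- Discount given on baseline units: ${cost_per_promotion:.2f} per product")
print(f"- Revenue from incremental units: ${revenue_per_promotion:.2f} per product")
print(f"- ROI: {roi:.1%}")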

if roi > 0:
    print(f"- Recommendation: Promotions are profitable - continue strategy")
else:
    print(f"- Recommendation: Promotions are unprofitable - reconsider strategy")
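
The promotion effect in this use case is specified on the log scale (the `0.4 * has_promotion` term in `log_sales`). A coefficient on the log scale is a multiplier on the original scale, so it is not the same as a 40% lift. The short check below converts it; the variable names are chosen here for illustration.

```python
import numpy as np

log_coefficient = 0.4  # promotion term in the simulation's log-sales equation

# A shift of b on the log scale multiplies sales by exp(b)
multiplier = np.exp(log_coefficient)
percent_lift = (multiplier - 1) * 100

print(f"multiplier: {multiplier:.3f}, lift: {percent_lift:.1f}%")
```

The lift works out to roughly 49%. Reading a log coefficient directly as a percentage is a reasonable shortcut only when the coefficient is small.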

## Use Case 3: Customer Loyalty Program Evaluation

**Business Question**: Do loyalty programs increase customer lifetime value?

**Challenge**: High-value customers are more likely to join loyalty programs, so members would outspend non-members even if the program did nothing.
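
One common way to see this kind of selection before fitting any model is the standardized mean difference (SMD) of each covariate between members and non-members. The helper below is a minimal sketch written for this note. It is not the implementation behind the library's `check_covariate_balance`, and the toy columns `spend_score` and `noise` are hypothetical.

```python
import numpy as np
import pandas as pd


def standardized_mean_differences(covariates: pd.DataFrame, treatment: np.ndarray) -> pd.Series:
    """SMD per covariate: difference in group means over the pooled standard deviation."""
    treated = covariates[treatment == 1]
    control = covariates[treatment == 0]
    pooled_sd = np.sqrt((treated.var() + control.var()) / 2)
    return (treated.mean() - control.mean()) / pooled_sd


# Hypothetical toy data: enrollment depends on spend_score but not on noise
rng = np.random.default_rng(2)
n = 4000
toy = pd.DataFrame({
    "spend_score": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
enrolled = rng.binomial(1, 1 / (1 + np.exp(-1.2 * toy["spend_score"].to_numpy())))

smd = standardized_mean_differences(toy, enrolled)
print(smd.round(2))
```

An absolute SMD above about 0.1 is a widely used rule of thumb for meaningful imbalance. Here `spend_score` should exceed it by a wide margin, while `noise` should stay near zero.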

In [None]:
# Generate loyalty program data
np.random.seed(789)
n_customers = 3000

# Customer characteristics
customer_age = np.random.uniform(25, 65, n_customers)
income_segment = np.random.choice([1, 2, 3], n_customers, p=[0.3, 0.5, 0.2])  # Low, Mid, High
frequency_segment = np.random.choice([1, 2, 3], n_customers, p=[0.4, 0.4, 0.2])  # Low, Mid, High
years_as_customer = np.random.exponential(2, n_customers)
channel_preference = np.random.choice([0, 1], n_customers, p=[0.6, 0.4])  # Online vs Store

# Loyalty program enrollment (selection bias!)
loyalty_propensity = (
    -3.0 +  # Base propensity
    0.03 * customer_age +  # Older customers more likely
    0.8 * income_segment +  # Higher income
    1.2 * frequency_segment +  # Higher frequency customers
    0.3 * years_as_customer +  # Long-term customers
    0.5 * channel_preference  # Store shoppers more likely
)

loyalty_prob = 1 / (1 + np.exp(-loyalty_propensity))
in_loyalty_program = np.random.binomial(1, loyalty_prob, n_customers)

# Annual spend depends on characteristics AND loyalty program
annual_spend = (
    1000 +  # Base spend
    20 * customer_age +  # Age effect
    800 * income_segment +  # Income effect
    600 * frequency_segment +  # Frequency effect
    100 * years_as_customer +  # Tenure effect
    200 * channel_preference +  # Channel effect
    300 * in_loyalty_program +  # TRUE CAUSAL EFFECT
    np.random.normal(0, 400, n_customers)  # Noise
)

annual_spend = np.maximum(annual_spend, 100)  # Minimum spend

# Create loyalty data
loyalty_data = {
    'treatment': TreatmentData(values=in_loyalty_program, treatment_type="binary"),
    'outcome': OutcomeData(values=annual_spend, outcome_type="continuous"),
    'covariates': CovariateData(values=pd.DataFrame({
        'customer_age': customer_age,
        'income_segment': income_segment,
        'frequency_segment': frequency_segment,
        'years_as_customer': years_as_customer,
        'channel_preference': channel_preference
    }))
}

print(f"Loyalty Program Dataset:")
print(f"- {n_customers} customers")
print(f"- {np.sum(in_loyalty_program)} in loyalty program ({np.mean(in_loyalty_program):.1%})")
print(f"- True loyalty effect: $300 additional annual spend")

# Naive comparison
naive_diff = np.mean(annual_spend[in_loyalty_program == 1]) - np.mean(annual_spend[in_loyalty_program == 0])
print(f"- Naive difference: ${naive_diff:.0f} (biased - high-value customers join more)")

In [None]:
# Analyze loyalty program with causal inference
print("=== Loyalty Program Causal Analysis ===")

aipw_loyalty = AIPWEstimator(
    outcome_model=RandomForestRegressor(n_estimators=100, random_state=42),
    propensity_model=RandomForestClassifier(n_estimators=100, random_state=42),
    bootstrap_samples=100
)

aipw_loyalty.fit(loyalty_data['treatment'], loyalty_data['outcome'], loyalty_data['covariates'])
loyalty_effect = aipw_loyalty.estimate_ate()

print(f"\nCausal Effect Results:")
print(f"- Estimated ATE: ${loyalty_effect.ate:.0f} additional annual spend")
print(f"- 95% CI: [${loyalty_effect.confidence_interval[0]:.0f}, ${loyalty_effect.confidence_interval[1]:.0f}]")
print(f"- True effect: $300")
print(f"- Selection bias: ${naive_diff - loyalty_effect.ate:.0f}")

# Program economics
program_cost_per_member = 50  # Annual cost to run program per member
net_value = loyalty_effect.ate - program_cost_per_member
roi_loyalty = net_value / program_cost_per_member

current_members = 50000
total_net_value = current_members * net_value

print(f"\nProgram Economics:")
print(f"- Program cost per member: ${program_cost_per_member}/year")
print(f"- Net value per member: ${net_value:.0f}/year")
print(f"- ROI: {roi_loyalty:.1%}")
print(f"- Total net value ({current_members:,} members): ${total_net_value:,.0f}/year")

if net_value > 0:
    print(f"- Recommendation: Loyalty program is profitable - consider expansion")
else:
    print(f"- Recommendation: Loyalty program is unprofitable - needs restructuring")

# Visualize the effect
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
# Show bias in naive analysis
methods = ['Naive\nComparison', 'Causal\nInference', 'True\nEffect']
effects = [naive_diff, loyalty_effect.ate, 300]
colors = ['red', 'blue', 'green']

bars = plt.bar(methods, effects, color=colors, alpha=0.7)
plt.title('Loyalty Program Effect: Naive vs Causal')
plt.ylabel('Additional Annual Spend ($)')

# Add value labels
for bar, effect in zip(bars, effects):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 10, 
             f'${effect:.0f}', ha='center', va='bottom')

plt.subplot(1, 2, 2)
# Show distribution of annual spend by loyalty status
loyalty_spend = annual_spend[in_loyalty_program == 1]
non_loyalty_spend = annual_spend[in_loyalty_program == 0]

plt.hist(non_loyalty_spend, bins=30, alpha=0.7, label='Non-Loyalty', density=True)
plt.hist(loyalty_spend, bins=30, alpha=0.7, label='Loyalty', density=True)
plt.xlabel('Annual Spend ($)')
plt.ylabel('Density')
plt.title('Spend Distribution by Loyalty Status')
plt.legend()

plt.tight_layout()
plt.show()

## Summary: Key Marketing Insights

### Email Marketing
- **Finding**: Emails have a positive causal effect on purchase amount (simulated at $25 per customer)
- **Key Learning**: Naive analysis overestimates impact due to targeting bias
- **Action**: Continue email campaigns but account for true incremental lift

### Price Promotions  
- **Finding**: Promotions drive a sales lift (simulated at about 49%) even though they target slow products
- **Key Learning**: Targeting slow products makes the naive comparison understate the lift, so ROI should be computed from the causal estimate
- **Action**: Expand promotions only where the causal ROI is positive

### Loyalty Programs
- **Finding**: Programs genuinely increase customer value beyond selection effects
- **Key Learning**: High-value customers are more likely to join, which creates a large upward selection bias in the naive analysis
- **Action**: Invest in program expansion if the causal ROI is positive

### Methodological Insights
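1. **Selection bias is common** in marketing data, because treatments are targeted rather than randomized
2. **Naive comparisons** can overestimate or underestimate effects, depending on who gets targeted
3. **Causal inference** recovers the true business impact only when the drivers of targeting are observed and adjusted for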
4. **ROI calculations** must use causal effects, not correlations
5. **Business decisions** improve when based on causal evidence
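
The point about ROI can be made concrete with the loyalty program's numbers. The sketch below plugs two different effect sizes into the same ROI formula used in Use Case 3. The `roi` helper and the naive figure of $1,200 are hypothetical, chosen only to show how an inflated effect propagates into the decision metric.

```python
def roi(effect_per_member: float, cost_per_member: float) -> float:
    """Return on investment: net incremental value divided by cost."""
    return (effect_per_member - cost_per_member) / cost_per_member


cost = 50.0            # annual program cost per member, as in Use Case 3
true_effect = 300.0    # simulated causal effect from Use Case 3
naive_effect = 1200.0  # hypothetical naive difference, for illustration only

print(f"ROI from causal effect: {roi(true_effect, cost):.0%}")
print(f"ROI from naive effect:  {roi(naive_effect, cost):.0%}")
```

Both figures are positive in this example, but the naive input overstates the return several times over, which matters when programs compete for the same budget.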

In [None]:
# Summary comparison across use cases
# naive_diff was overwritten in Use Case 3, so recompute the email value here
email_naive_diff = (
    np.mean(purchase_amount[received_email == 1])
    - np.mean(purchase_amount[received_email == 0])
)
use_cases_summary = pd.DataFrame({
    'Use Case': ['Email Marketing', 'Price Promotions', 'Loyalty Program'],
    'Naive Effect': [f'${email_naive_diff:.0f}', f'{(naive_ratio-1)*100:.1f}%', f'${naive_diff:.0f}'],
    'Causal Effect': [f'${email_effect.ate:.0f}', f'{percentage_lift:.1f}%', f'${loyalty_effect.ate:.0f}'],
    'True Effect': ['$25', '49%', '$300'],  # exp(0.4) - 1 is about 49%
    'Bias Direction': ['Overestimate', 'Underestimate', 'Overestimate'],
    'Business Decision': ['Continue', 'Expand', 'Invest']
})

print("=== Marketing Use Cases Summary ===")
print(use_cases_summary.to_string(index=False))

print("\n=== Key Takeaways ===")
print("1. Targeted (non-random) interventions are prone to selection bias")
print("2. Causal inference adjusts for observed confounders")
print("3. True effects guide optimal resource allocation")
print("4. ROI calculations require causal estimates")
print("5. Business strategy improves with causal evidence")