# Experimentation & Causal Inference

Quantify the incremental impact of lifecycle campaigns while correcting for selection bias.

## Business Question
- Campaign owners need to know how much *incremental* revenue the treatment drives beyond natural demand.
- Targeting favors high-LTV customers, so naive comparisons overstate lift.
- We estimate both naive and causal effects, surface balance diagnostics, and outline actions for marketing leadership.

## Data & Methodology
- `data/marketing_interventions.csv` stores per-customer features, assignment, and pre/post revenue windows.
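- The pipeline computes three estimates: a naive difference in means, the average treatment effect on the treated (ATT) via propensity-score matching, and the average treatment effect (ATE) via inverse-probability weighting (IPW).
- Balance checks verify that the covariates (`frequency`, `recency`, `T`, `monetary_value`, `pre_period_revenue`, `segment_value`) are comparable between treated and control customers after matching, measured as standardized mean differences (SMD).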
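
To make the three estimators concrete, here is a minimal sketch on synthetic data. It is not the project pipeline: the function names (`naive_diff_in_means`, `matched_att`, `ipw_ate`) are hypothetical, the data are simulated, and the true propensity score is assumed known, whereas in practice it would be estimated from the covariates. Matching here is 1-nearest-neighbor on the score, with replacement.

```python
import numpy as np

def naive_diff_in_means(y, t):
    """Unadjusted difference in mean outcome, treated minus control."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=bool)
    return float(y[t].mean() - y[~t].mean())

def matched_att(y, t, score):
    """ATT via 1-nearest-neighbor matching on the propensity score (with replacement)."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=bool)
    score = np.asarray(score, dtype=float)
    y_t, s_t = y[t], score[t]
    y_c, s_c = y[~t], score[~t]
    # For each treated unit, find the control unit with the closest score
    nearest = np.abs(s_t[:, None] - s_c[None, :]).argmin(axis=1)
    return float(np.mean(y_t - y_c[nearest]))

def ipw_ate(y, t, score):
    """ATE via normalized (Hajek) inverse-probability weighting."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    score = np.asarray(score, dtype=float)
    w1 = t / score              # weights for treated units
    w0 = (1 - t) / (1 - score)  # weights for control units
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))

# Synthetic example: x drives both targeting and revenue, so it confounds the comparison
rng = np.random.default_rng(0)
n = 4_000
x = rng.normal(size=n)
score = 1 / (1 + np.exp(-x))   # assumed-known propensity score
t = rng.binomial(1, score)
true_effect = 5.0
y = 4 * x + true_effect * t + rng.normal(size=n)

naive = naive_diff_in_means(y, t)
att = matched_att(y, t, score)
ate = ipw_ate(y, t, score)
print(f"naive={naive:.2f}  matched ATT={att:.2f}  IPW ATE={ate:.2f}  truth={true_effect}")
```

Because treatment is more likely for customers with high `x`, the naive estimate absorbs part of the effect of `x`, while the matched and weighted estimates should land near the simulated effect.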

In [None]:
from pathlib import Path
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style='whitegrid')

In [None]:
# Resolve the project root whether the notebook runs from the repo root or from notebooks/
PROJECT_ROOT = Path.cwd().resolve()
if PROJECT_ROOT.name == 'notebooks':
    PROJECT_ROOT = PROJECT_ROOT.parent
data_dir = PROJECT_ROOT / 'data'
reports_dir = PROJECT_ROOT / 'reports'
interventions = pd.read_csv(data_dir / 'marketing_interventions.csv', parse_dates=['date'])
causal_results = pd.read_csv(data_dir / 'causal_results.csv')
balance_table = pd.read_csv(data_dir / 'causal_balance_table.csv')

In [None]:
summary = (
    interventions.groupby('treatment')
    .agg(customers=('customer_id', 'count'),
         avg_pre=('pre_period_revenue', 'mean'),
         avg_post=('post_period_revenue', 'mean'),
         avg_freq=('frequency', 'mean'))
)
summary.index = summary.index.map({0: 'Control', 1: 'Treatment'})
summary.round(2)

In [None]:
# Show each estimate with its confidence interval; round(2) only affects numeric columns
causal_results[['metric', 'estimate', 'ci_low', 'ci_high']].round(2)
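
The notebook does not state how `ci_low` and `ci_high` were produced. One common option is a percentile bootstrap, sketched below for the difference in means. This is an illustration under that assumption, not the pipeline's method; `bootstrap_diff_ci` and the toy revenue values are hypothetical.

```python
import numpy as np

def bootstrap_diff_ci(y, t, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap CI for the difference in means."""
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=bool)
    rng = np.random.default_rng(seed)
    treated, control = y[t], y[~t]
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each arm with replacement, keeping arm sizes fixed
        boot_t = rng.choice(treated, size=treated.size, replace=True)
        boot_c = rng.choice(control, size=control.size, replace=True)
        diffs[b] = boot_t.mean() - boot_c.mean()
    low, high = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(treated.mean() - control.mean()), float(low), float(high)

# Toy data: four treated and four control customers
estimate, ci_low, ci_high = bootstrap_diff_ci(
    y=[10, 12, 14, 16, 1, 3, 5, 7],
    t=[1, 1, 1, 1, 0, 0, 0, 0],
)
print(f"estimate={estimate:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

A wide interval, like the one reported for the ATT, means the data are compatible with a broad range of effect sizes, so the point estimate should be read with that uncertainty in mind.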

In [None]:
plt.figure(figsize=(8, 4))
plot_data = balance_table.copy()
plot_data['covariate'] = plot_data['covariate'].str.replace('_', ' ', regex=False)
sns.barplot(data=plot_data, x='smd', y='covariate', hue='stage', palette=['#f28e2c', '#4e79a7'])
plt.axvline(0, color='black', linewidth=1)
# Dashed guides mark the conventional |SMD| <= 0.1 balance threshold
plt.axvline(0.1, color='gray', linestyle='--', linewidth=1)
plt.axvline(-0.1, color='gray', linestyle='--', linewidth=1)
plt.title('Standardized Mean Differences (Pre vs Post Matching)')
plt.xlabel('Standardized Mean Difference')
plt.ylabel('Covariate')
plt.tight_layout()
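
The `smd` column is read from a precomputed table, so its exact formula is not visible here. A common definition divides the difference in group means by the pooled standard deviation. The sketch below assumes that definition; `standardized_mean_difference` is a hypothetical helper, not a function from the project.

```python
import numpy as np

def standardized_mean_difference(treated, control):
    """(mean_treated - mean_control) / sqrt((var_treated + var_control) / 2)."""
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    if pooled_sd == 0:
        # No variation in either group: the SMD is undefined
        return float('nan')
    return float((treated.mean() - control.mean()) / pooled_sd)

smd = standardized_mean_difference([2, 4, 6], [1, 3, 5])
print(smd)               # 0.5
print(abs(smd) <= 0.1)   # False: this toy covariate would be flagged as imbalanced
```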

In [None]:
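# Per-customer change in revenue (post minus pre); a raw before/after difference, not a causal estimate
interventions['lift'] = interventions['post_period_revenue'] - interventions['pre_period_revenue']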
plt.figure(figsize=(8, 4))
sns.kdeplot(data=interventions, x='lift', hue='treatment', fill=True, common_norm=False, palette={0: '#bab0ac', 1: '#59a14f'})
plt.title('Lift Distribution: Treated vs Control')
plt.xlabel('Post - Pre Revenue')
plt.ylabel('Density')
plt.tight_layout()

## Interpretation for Marketing Leaders
- Naive lift (~$75) overstates incremental revenue because targeting favored existing high-value customers.
- The propensity-matched ATT (~$15, with a wide confidence interval) is a more realistic expectation for the treated audience; the IPW ATE (~-$2) suggests a full-population rollout would roughly break even.
- Balance diagnostics fall within ±0.1 SMD after matching, indicating covariates are aligned and bias is reduced.
- Recommendation: continue running controlled tests for high-LTV segments, but gate broader rollouts on improved creative or targeting, and integrate these causal lift estimates into finance forecast decks.