# Quickstart Example: Email Campaign Causal Analysis

This notebook demonstrates the basic usage of the causal inference library with a synthetic marketing dataset.

In [None]:
# Import required libraries
import numpy as np
import pandas as pd

from causal_inference.core import CovariateData, OutcomeData, TreatmentData
from causal_inference.estimators import GComputation

## Step 1: Generate Synthetic Marketing Data

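We'll create a synthetic dataset that mimics real marketing campaign data. Higher-income customers are both more likely to receive the email and more likely to convert, so income confounds the relationship between the campaign and conversion.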

In [None]:
# Set random seed for reproducibility
np.random.seed(42)
n = 1000

# Customer characteristics
age = np.random.normal(40, 15, n)
income = np.random.normal(50000, 20000, n)
previous_purchases = np.random.poisson(3, n)

# Email campaign assignment (treatment)
# Higher income customers more likely to be targeted
email_prob = 0.3 + 0.3 * (income > 50000)
email_campaign = np.random.binomial(1, email_prob, n)

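# Conversion outcome
# Both email and customer characteristics affect conversion
conversion_prob = (
    0.1  # baseline conversion
    + 0.15 * email_campaign  # email effect (true effect by construction)
    + 0.000002 * income  # income effect (+0.10 at an income of 50,000)
    + 0.02 * previous_purchases  # loyalty effect
)
# Keep probabilities in [0, 1]; np.random.binomial raises ValueError otherwise
conversion_prob = np.clip(conversion_prob, 0, 1)
conversion = np.random.binomial(1, conversion_prob, n)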

# Create DataFrame
data = pd.DataFrame(
    {
        "age": age,
        "income": income,
        "previous_purchases": previous_purchases,
        "email_campaign": email_campaign,
        "conversion": conversion,
    }
)

print(f"Data shape: {data.shape}")
print(f"Conversion rate: {data['conversion'].mean():.3f}")
print(f"Email campaign rate: {data['email_campaign'].mean():.3f}")
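
Before estimating anything, it can help to check how unbalanced the covariates are between emailed and non-emailed customers. The sketch below computes a standardized mean difference (SMD) on a freshly simulated sample that follows the same assignment rule as the cell above. The helper name `standardized_mean_difference` and the `sim_` variables are illustrative and not part of the library. A common rule of thumb treats an absolute SMD above roughly 0.1 as a sign of imbalance.

```python
import numpy as np


def standardized_mean_difference(x, treated):
    """Difference in group means divided by the pooled standard deviation."""
    x1, x0 = x[treated == 1], x[treated == 0]
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return (x1.mean() - x0.mean()) / pooled_sd


# Fresh illustrative sample drawn with the same assignment rule as above
rng = np.random.default_rng(0)
sim_income = rng.normal(50000, 20000, 5000)
sim_purchases = rng.poisson(3, 5000)
sim_email = rng.binomial(1, 0.3 + 0.3 * (sim_income > 50000))

smd_income = standardized_mean_difference(sim_income, sim_email)
smd_purchases = standardized_mean_difference(sim_purchases, sim_email)
print(f"SMD income: {smd_income:.3f}")
print(f"SMD previous_purchases: {smd_purchases:.3f}")
```

On the notebook's own data the same helper could be called as `standardized_mean_difference(data["income"].to_numpy(), data["email_campaign"].to_numpy())`. Income should show clear imbalance because it drives targeting, while previous purchases should not.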

## Step 2: Define the Causal Problem

We need to specify:
- **Treatment**: Email campaign (binary)
- **Outcome**: Conversion (binary)
- **Confounders**: Variables that affect both treatment and outcome

In [None]:
# Define treatment: email campaign
treatment = TreatmentData(
    values=data["email_campaign"], name="email_campaign", treatment_type="binary"
)

# Define outcome: conversion
outcome = OutcomeData(
    values=data["conversion"], name="conversion", outcome_type="binary"
)

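# Define covariates to adjust for. In this simulation only income is a true
# confounder (it drives both targeting and conversion); previous_purchases
# affects the outcome only, and age affects neither.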
covariates = CovariateData(
    values=data[["age", "income", "previous_purchases"]],
    names=["age", "income", "previous_purchases"],
)
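
A quick sanity check on the inputs can catch coding problems before they reach the estimator. The helper below, `check_binary`, is a hypothetical utility written for this example. It is not part of the library, which may well perform its own validation.

```python
import numpy as np


def check_binary(values, name):
    """Return values as an int array; raise ValueError unless all entries are 0 or 1."""
    arr = np.asarray(values, dtype=float)
    if np.isnan(arr).any():
        raise ValueError(f"{name} contains missing values")
    unexpected = sorted(float(v) for v in set(np.unique(arr)) - {0.0, 1.0})
    if unexpected:
        raise ValueError(f"{name} must be coded 0/1; found {unexpected}")
    return arr.astype(int)


# Example with made-up values
print(check_binary([0, 1, 1, 0], "email_campaign"))
```

In the notebook this could be applied to `data["email_campaign"]` and `data["conversion"]` before wrapping them in `TreatmentData` and `OutcomeData`.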

## Step 3: Estimate Causal Effect

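We'll use G-computation (standardization) to estimate the Average Treatment Effect. G-computation fits a model of the outcome given treatment and covariates, predicts each customer's outcome with the treatment set to 1 and then to 0, and averages the difference between the two predictions.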

In [None]:
# Initialize G-computation estimator
estimator = GComputation()

# Estimate the Average Treatment Effect (ATE)
effect = estimator.estimate_ate(
    treatment=treatment, outcome=outcome, covariates=covariates
)

# Display results
print("📊 Causal Analysis Results")
print("=" * 30)
print(f"Average Treatment Effect: {effect.ate:.4f}")
print(f"95% Confidence Interval: [{effect.ci_lower:.4f}, {effect.ci_upper:.4f}]")
print(f"Standard Error: {effect.se:.4f}")
print(f"P-value: {effect.p_value:.4f}")

# Interpretation
if effect.p_value < 0.05:
    print("✅ Statistically significant effect detected!")
else:
    print("❌ No statistically significant effect found.")
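
To make the idea behind G-computation concrete, here is a minimal hand-rolled sketch in NumPy. It is not the library's implementation, whose internals are not shown in this notebook. It assumes a linear outcome model fit by least squares, a single covariate (income in thousands), and separately simulated data with a true email effect of 0.15.

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative data, not the notebook's dataset
m = 20000

income_k = rng.normal(50, 20, m)  # income in thousands (assumed rescaling)
sent = rng.binomial(1, 0.3 + 0.3 * (income_k > 50))
# Assumed outcome model with a true email effect of 0.15
prob = np.clip(0.1 + 0.15 * sent + 0.002 * income_k, 0, 1)
converted = rng.binomial(1, prob).astype(float)


def design(treatment, covariate):
    """Columns: intercept, treatment, covariate, treatment x covariate."""
    return np.column_stack(
        [np.ones_like(covariate), treatment, covariate, treatment * covariate]
    )


# 1. Fit an outcome model E[Y | T, X] (here: least squares on a linear model)
beta, *_ = np.linalg.lstsq(design(sent, income_k), converted, rcond=None)

# 2. Predict every customer's outcome under "all emailed" and "none emailed"
y1 = design(np.ones(m), income_k) @ beta
y0 = design(np.zeros(m), income_k) @ beta

# 3. Average the difference over the observed covariate distribution
ate_gcomp = float(np.mean(y1 - y0))
naive = float(converted[sent == 1].mean() - converted[sent == 0].mean())
print(f"G-computation ATE: {ate_gcomp:.3f}")
print(f"Naive difference:  {naive:.3f}")
```

A linear probability model keeps the example dependency-free. For a binary outcome, a logistic outcome model would be a common alternative.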

## Step 4: Compare with Naive Analysis

Let's compare our causal estimate with a simple difference in means.

In [None]:
# Naive comparison (biased due to confounding)
treated_mean = data[data["email_campaign"] == 1]["conversion"].mean()
control_mean = data[data["email_campaign"] == 0]["conversion"].mean()
naive_diff = treated_mean - control_mean

print("📈 Comparison of Methods")
print("=" * 30)
print(f"Naive difference: {naive_diff:.4f}")
print(f"Causal effect (G-computation): {effect.ate:.4f}")
print(f"Difference: {abs(naive_diff - effect.ate):.4f}")

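# 0.01 (one percentage point) is an arbitrary illustrative threshold, not a statistical test
if abs(naive_diff - effect.ate) > 0.01:
    print("⚠️  Naive and adjusted estimates differ by more than 1 point - confounding is likely at work")
else:
    print("✅ Results are similar - little evidence of confounding in this example")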
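
A fixed cutoff cannot tell a real gap from sampling noise. One way to judge the gap between the naive and adjusted estimates is to bootstrap it. The sketch below does so on separately simulated data with a single binary confounder, using simple stratification as the adjusted estimator rather than the library's `GComputation`. All names and the data-generating numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # illustrative data, not the notebook's dataset
m = 5000
high_income = rng.binomial(1, 0.5, m)
sent = rng.binomial(1, 0.3 + 0.3 * high_income)
converted = rng.binomial(1, 0.1 + 0.15 * sent + 0.1 * high_income)


def diff_in_means(t, y):
    """Unadjusted difference in mean outcome between treated and control."""
    return y[t == 1].mean() - y[t == 0].mean()


def stratified_diff(t, y, s):
    """Within-stratum differences averaged with population stratum weights."""
    total = 0.0
    for level in (0, 1):
        mask = s == level
        total += mask.mean() * diff_in_means(t[mask], y[mask])
    return total


# Resample customers with replacement and recompute the gap each time
gaps = []
for _ in range(500):
    idx = rng.integers(0, m, m)
    t, y, s = sent[idx], converted[idx], high_income[idx]
    gaps.append(diff_in_means(t, y) - stratified_diff(t, y, s))

gap_lower, gap_upper = np.percentile(gaps, [2.5, 97.5])
print(f"95% bootstrap interval for the gap: [{gap_lower:.4f}, {gap_upper:.4f}]")
```

If the interval excludes zero, the naive and adjusted estimates differ by more than resampling noise would explain.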

## Step 5: Interpretation

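The Average Treatment Effect (ATE) is the expected difference in conversion probability between emailing every customer and emailing none. In this simulation the email term contributes +0.15 to the conversion probability, which gives a reference point for judging the estimate.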

In [None]:
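# Practical interpretation
# Baseline is the observed conversion rate among non-emailed customers
# (control_mean from Step 4); it is unadjusted, so treat the lift as approximate
baseline_conversion_rate = control_mean
relative_lift = effect.ate / baseline_conversion_rate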

print("📋 Practical Interpretation")
print("=" * 30)
print(f"Baseline conversion rate: {baseline_conversion_rate:.1%}")
print(f"Email campaign increases conversion by: {effect.ate:.1%} (absolute)")
print(f"Relative lift: {relative_lift:.1%}")

# Business impact calculation
if effect.p_value < 0.05:
    customers_per_month = 10000  # Example
    additional_conversions = customers_per_month * effect.ate
    print(
        f"\n💰 Business Impact (if we email {customers_per_month:,} customers/month):"
    )
    print(f"Additional conversions per month: {additional_conversions:.0f}")
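
A point estimate alone can overstate how precisely the business impact is known. Because the impact is a linear rescaling of the effect, the confidence bounds can be scaled the same way. The helper `monthly_impact` below is hypothetical, and the numbers passed to it are made up for illustration. They are not output from the analysis above.

```python
def monthly_impact(ate, ci_lower, ci_upper, customers_per_month):
    """Scale an absolute effect and its interval bounds to conversions per month."""
    return tuple(customers_per_month * v for v in (ate, ci_lower, ci_upper))


# Illustrative inputs only
point, low, high = monthly_impact(0.15, 0.10, 0.20, 10_000)
print(f"Additional conversions per month: {point:.0f} (range {low:.0f} to {high:.0f})")
```

In the notebook the call would be `monthly_impact(effect.ate, effect.ci_lower, effect.ci_upper, customers_per_month)`.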

## Next Steps

This was a basic example. In practice, you would:

1. **Run diagnostics** to check assumptions
2. **Compare multiple methods** (IPW, AIPW, etc.)
3. **Perform sensitivity analysis** for unmeasured confounding
4. **Estimate heterogeneous effects** for different customer segments
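
As a taste of the first two items, here is a minimal sketch of inverse probability weighting (IPW) with a simple overlap diagnostic. It uses plain NumPy on separately simulated data with one binary confounder, and it estimates propensity scores from stratum frequencies. It does not use the library's estimators, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)  # illustrative data, not the notebook's dataset
m = 20000
high_income = rng.binomial(1, 0.5, m)
sent = rng.binomial(1, 0.3 + 0.3 * high_income)
converted = rng.binomial(1, 0.1 + 0.15 * sent + 0.1 * high_income)

# Propensity score: P(email = 1 | stratum), estimated by stratum frequencies
propensity = np.where(
    high_income == 1,
    sent[high_income == 1].mean(),
    sent[high_income == 0].mean(),
)

# Overlap diagnostic: scores near 0 or 1 would make the weights unstable
print(f"Propensity range: [{propensity.min():.2f}, {propensity.max():.2f}]")

# Normalized (Hajek) inverse probability weights for each arm
w1 = sent / propensity
w0 = (1 - sent) / (1 - propensity)
ate_ipw = float(
    np.sum(w1 * converted) / np.sum(w1) - np.sum(w0 * converted) / np.sum(w0)
)
print(f"IPW estimate of the ATE: {ate_ipw:.3f}")
```

Agreement between an outcome-model estimate and a weighting estimate is reassuring, since the two rely on different modeling assumptions.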

See the other notebooks and documentation for more advanced examples!