# Causal Inference: A/B Test Validation

This notebook demonstrates how to validate attribution results against ground truth from A/B tests or experiments.

## The Gold Standard

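A/B tests provide **causal ground truth**: because exposure to the treatment (a channel) is randomized, the difference in outcomes between the treated and control groups measures the channel's actual effect rather than a correlation.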

We can compare our attribution model's estimates against experimental results.
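
As a minimal sketch of where such ground truth comes from, assume a holdout test in which one channel is shown to a randomized treatment group and withheld from a control group. The helper `ab_test_lift` and the counts below are hypothetical; the estimate is the difference in conversion rates with a normal-approximation 95% confidence interval.

```python
import math


def ab_test_lift(conv_treat: int, n_treat: int, conv_ctrl: int, n_ctrl: int, z: float = 1.96):
    """Hypothetical helper: absolute lift (treatment rate minus control rate) with a normal-approximation CI."""
    p_t = conv_treat / n_treat
    p_c = conv_ctrl / n_ctrl
    lift = p_t - p_c
    # Unpooled standard error of the difference between two independent proportions
    se = math.sqrt(p_t * (1 - p_t) / n_treat + p_c * (1 - p_c) / n_ctrl)
    return lift, (lift - z * se, lift + z * se)


# Illustrative numbers: 5,000 users exposed to Email vs. 5,000 randomly held out
lift, (lo, hi) = ab_test_lift(conv_treat=600, n_treat=5000, conv_ctrl=500, n_ctrl=5000)
print(f"Lift: {lift:+.2%}  95% CI: ({lo:+.2%}, {hi:+.2%})")
```

The lift is an absolute difference in conversion rates. The attribution values in this notebook behave as shares of credit (the Step 2 values sum to 100%), so experimental lifts would first need to be put on the same scale, for example each channel's lift divided by the sum of lifts (an assumption of this sketch), before a direct comparison.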

In [None]:
import sys
sys.path.insert(0, '../src')

import numpy as np
from datetime import datetime

# Import validation suite
from validation import (
    generate_synthetic_ground_truth,
    compare_to_ground_truth,
    run_validation_report
)

print("✓ Imports successful")
print("\n📊 This notebook covers:")
print("  1. Generate synthetic A/B test data")
print("  2. Compare attribution vs ground truth")
print("  3. Validate model accuracy")
print("  4. Interpret results")

## Step 1: Generate Synthetic Ground Truth
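
The internals of `generate_synthetic_ground_truth` are not shown in this notebook. The sketch below is a hypothetical stand-in (`simulate_journeys`, its parameters, and the `touchpoints` key are illustrative, not the library's API) that shows the core idea: fix per-channel causal effects up front, simulate conversions from them, and the correct attribution is known by construction. It does not model the `noise_level` parameter.

```python
import random


def simulate_journeys(effects, n_journeys=1000, base_rate=0.02, p_touch=0.4, seed=0):
    """Hypothetical generator: each touched channel adds a fixed, known amount to conversion probability."""
    rng = random.Random(seed)
    journeys = []
    for _ in range(n_journeys):
        # Each channel is touched independently with the same probability p_touch
        touchpoints = [ch for ch in effects if rng.random() < p_touch]
        # Additive effects; base_rate + sum(effects) < 1 here, so the clip never binds
        p_convert = min(1.0, base_rate + sum(effects[ch] for ch in touchpoints))
        journeys.append({"touchpoints": touchpoints, "converted": rng.random() < p_convert})
    # With equal touch rates, each channel's share of incremental conversions is proportional to its effect
    total = sum(effects.values())
    true_shares = {ch: e / total for ch, e in effects.items()}
    return journeys, true_shares


lifts = {"Search": 0.035, "Email": 0.025, "Direct": 0.020, "Social": 0.012, "Display": 0.008}
journeys_demo, shares_demo = simulate_journeys(lifts, n_journeys=5000, seed=42)
print({ch: round(s, 2) for ch, s in shares_demo.items()})
```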

In [None]:
# Generate synthetic data with known ground truth
journeys, true_effects = generate_synthetic_ground_truth(
    n_journeys=10000,
    noise_level=0.1,
    seed=42
)

print("✓ Generated synthetic experiment data")
n_conversions = sum(1 for j in journeys if j['converted'])
print("\n📊 Dataset:")
print(f"   Journeys: {len(journeys):,}")
print(f"   Conversions: {n_conversions:,}")
print(f"   Conversion rate: {n_conversions / len(journeys):.1%}")

print("\n🎯 Ground Truth Causal Effects:")
for ch, effect in sorted(true_effects.items(), key=lambda x: -x[1]):
    print(f"   {ch}: {effect:.1%}")

## Step 2: Simulate Attribution Output
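
The cell in this step hand-codes `model_output`. An alternative way to get a realistic "imperfect model", sketched here under the assumption that model error is random relative noise, is to perturb the true shares and renormalize them so they still sum to 1. `perturb_shares` is a hypothetical helper, and the `truth` values are copied from the `True:` comments in the cell below.

```python
import random


def perturb_shares(true_shares, rel_noise=0.1, seed=0):
    """Hypothetical stand-in for a model: scale each true share by (1 + Gaussian noise), then renormalize."""
    rng = random.Random(seed)
    noisy = {ch: max(0.0, s * (1 + rng.gauss(0, rel_noise))) for ch, s in true_shares.items()}
    total = sum(noisy.values())
    return {ch: v / total for ch, v in noisy.items()}


truth = {"Search": 0.35, "Email": 0.25, "Direct": 0.20, "Social": 0.12, "Display": 0.08}
simulated = perturb_shares(truth, rel_noise=0.1, seed=7)
for ch in truth:
    print(f"{ch:<10} true {truth[ch]:.1%}  simulated {simulated[ch]:.1%}")
```

Renormalizing keeps the output on the same "share of credit" scale as the ground truth, so the Step 3 metrics stay comparable.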

In [None]:
# Simulate attribution model output (with some error to be realistic)
# This represents what the Markov-Shapley model would produce

model_output = {
    'Search': 0.38,   # True: 0.35 (slight overestimate)
    'Email': 0.22,    # True: 0.25 (slight underestimate)
    'Direct': 0.21,   # True: 0.20 (slight overestimate)
    'Social': 0.11,   # True: 0.12 (close)
    'Display': 0.08   # True: 0.08 (exact)
}

model_cis = {
    'Search': (0.32, 0.44),
    'Email': (0.18, 0.28),
    'Direct': (0.16, 0.26),
    'Social': (0.07, 0.15),
    'Display': (0.04, 0.12)
}

print("✓ Attribution model output ready")
print("\n📈 Model Estimates vs Ground Truth:")
print("-" * 50)
print(f"{'Channel':<10} {'Model':>8} {'True':>8} {'Error':>10}")
print("-" * 50)
for ch in sorted(true_effects.keys()):
    model = model_output[ch]
    truth = true_effects[ch]
    error = model - truth
    # Column widths match the header row above
    print(f"{ch:<10} {model:>8.1%} {truth:>8.1%} {error:>+10.1%}")
print("-" * 50)

## Step 3: Validate Against Ground Truth
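
The `result` fields are computed inside `compare_to_ground_truth`, whose implementation is not shown. Assuming `rank_correlation` is Spearman's ρ and `magnitude_error` is the mean absolute percentage error, as the printed labels suggest, both metrics can be sketched in a few lines. The helpers below are hypothetical, and the values are copied from `model_output` and the `True:` comments in Step 2.

```python
def ranks(values):
    """Rank values from 1 (largest) to n; assumes no ties."""
    order = sorted(values, key=lambda v: -v)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    # Closed form, valid only without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def mape(estimates, truth):
    # Mean of |estimate - truth| / truth over channels; undefined if any true effect is 0
    return sum(abs(estimates[ch] - truth[ch]) / truth[ch] for ch in truth) / len(truth)


model = {"Search": 0.38, "Email": 0.22, "Direct": 0.21, "Social": 0.11, "Display": 0.08}
truth = {"Search": 0.35, "Email": 0.25, "Direct": 0.20, "Social": 0.12, "Display": 0.08}
channels = list(truth)
rho = spearman_rho([model[c] for c in channels], [truth[c] for c in channels])
print(f"Spearman rho: {rho:.3f}")        # 1.000: all five channels ranked in the true order
print(f"MAPE: {mape(model, truth):.1%}")  # 6.8%
```

Because the ranking is perfect, ρ is 1.0 even though every channel except Display is off by one to three points; this is why the notebook reports a magnitude metric alongside the rank metric. For data with ties, `scipy.stats.spearmanr` handles average ranks correctly.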

In [None]:
# Run validation against ground truth
result = compare_to_ground_truth(model_output, true_effects, model_cis)

print("="*60)
print("VALIDATION RESULTS")
print("="*60)

print(f"\nüìä Model Performance Metrics:")
print(f"   Rank Correlation (Spearman's œÅ): {result.rank_correlation:.3f}")
print(f"   Mean Absolute Percentage Error: {result.magnitude_error:.1%}")
print(f"   Top Channel Match: {'‚úì YES' if result.top_channel_match else '‚úó NO'}")
print(f"   Top-3 Overlap: {result.top_k_overlap:.0%}")
print(f"   Confidence Calibration: {result.confidence_calibration:.3f}")

print("\n" + "="*60)
print("INTERPRETATION")
print("="*60)

if result.rank_correlation > 0.8:
    print("\n✅ Strong ranking agreement with ground truth")
elif result.rank_correlation > 0.5:
    print("\n⚠️ Moderate ranking agreement")
else:
    print("\n❌ Weak ranking - model may need tuning")

if result.magnitude_error < 0.15:
    print("✅ Low magnitude error - accurate attribution values")
elif result.magnitude_error < 0.30:
    print("⚠️ Moderate magnitude error")
else:
    print("❌ High magnitude error - significant bias detected")

if result.top_channel_match:
    print("✅ Correctly identified the top-performing channel")
else:
    print("❌ Failed to identify the top-performing channel")

## Step 4: Full Validation Report
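
`run_validation_report` also receives the raw `journeys`, so it presumably reports more than the sketch below. This hypothetical `text_report` only shows how the remaining Step 3 metrics could be defined: top-k overlap, and one plausible reading of "confidence calibration" as the share of channels whose true value falls inside the model's interval. Values are copied from Step 2.

```python
def top_k_overlap(estimates, truth, k=3):
    """Fraction of the true top-k channels that also appear in the model's top-k."""
    def top(d):
        return set(sorted(d, key=d.get, reverse=True)[:k])
    return len(top(estimates) & top(truth)) / k


def ci_coverage(cis, truth):
    """Fraction of channels whose true effect lies inside the model's interval."""
    return sum(lo <= truth[ch] <= hi for ch, (lo, hi) in cis.items()) / len(cis)


def text_report(estimates, truth, cis, k=3):
    top_model = max(estimates, key=estimates.get)
    top_true = max(truth, key=truth.get)
    lines = [
        "VALIDATION REPORT (sketch)",
        f"Top channel: model={top_model}, truth={top_true}, match={top_model == top_true}",
        f"Top-{k} overlap: {top_k_overlap(estimates, truth, k):.0%}",
        f"CI coverage: {ci_coverage(cis, truth):.0%} of channels",
    ]
    return "\n".join(lines)


model = {"Search": 0.38, "Email": 0.22, "Direct": 0.21, "Social": 0.11, "Display": 0.08}
truth = {"Search": 0.35, "Email": 0.25, "Direct": 0.20, "Social": 0.12, "Display": 0.08}
cis = {"Search": (0.32, 0.44), "Email": (0.18, 0.28), "Direct": (0.16, 0.26),
       "Social": (0.07, 0.15), "Display": (0.04, 0.12)}
print(text_report(model, truth, cis))
```

With only five channels, coverage is a coarse signal: each miss moves it by 20 percentage points, so it says more when aggregated over many simulated datasets.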

In [None]:
# Generate comprehensive validation report
report = run_validation_report(model_output, journeys, true_effects, model_cis)

print(report)

## Key Takeaways

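1. **Correlation vs Causation**: Observational attribution models learn from correlations in journey data,
   so a channel that merely co-occurs with conversions can still receive credit. A/B tests reveal actual causal effects.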

2. **Validation is Essential**: Always validate against ground truth when possible.

3. **Limitations**: Observational attribution cannot replace experiments.
   Use attribution for exploration, experiments for confirmation.
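
A small simulation makes takeaway 1 concrete. The setup is entirely illustrative (the intent model, the rates, and the `simulate` helper are assumptions): a retargeting-style channel is shown mostly to high-intent users and has no causal effect on conversion.

```python
import random


def simulate(randomized, n=100_000, seed=1):
    """Return (exposed, unexposed) conversion rates for a channel with zero causal effect."""
    rng = random.Random(seed)
    conv = {True: 0, False: 0}
    count = {True: 0, False: 0}
    for _ in range(n):
        high_intent = rng.random() < 0.3
        if randomized:
            exposed = rng.random() < 0.5  # coin flip, independent of intent
        else:
            exposed = rng.random() < (0.8 if high_intent else 0.2)  # targeting follows intent
        # Conversion depends only on intent; exposure has no causal effect
        converted = rng.random() < (0.10 if high_intent else 0.01)
        count[exposed] += 1
        conv[exposed] += converted
    return conv[True] / count[True], conv[False] / count[False]


obs_rates = simulate(randomized=False)
rand_rates = simulate(randomized=True)
print(f"Observational: exposed {obs_rates[0]:.2%} vs unexposed {obs_rates[1]:.2%}")
print(f"Randomized:    exposed {rand_rates[0]:.2%} vs unexposed {rand_rates[1]:.2%}")
```

Observationally, exposed users convert roughly 3.5 times as often as unexposed ones, so a correlation-based model would credit the channel; under randomization the gap disappears, matching its true (zero) effect.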

## When to Use Each Approach

| Scenario | Best Approach |
|----------|---------------|
| A/B test available | Use experimental results |
| No experiment | Use attribution model |
| Both available | Compare for validation |
| Strategic planning | Attribution + sensitivity analysis |
| Tactical optimization | A/B tests |
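
For the "Attribution + sensitivity analysis" row, one simple check, sketched here under a crude assumption that each channel's true share is equally likely anywhere inside its interval, is to redraw every channel's value from its interval many times and count how often the model's top channel stays on top. `top_channel_stability` is a hypothetical helper; the values are copied from `model_output` and `model_cis` in Step 2.

```python
import random


def top_channel_stability(estimates, cis, n_draws=10_000, seed=0):
    """Share of draws in which the model's top channel stays on top when each
    channel's value is redrawn uniformly from its interval (a crude assumption)."""
    rng = random.Random(seed)
    top = max(estimates, key=estimates.get)
    hits = 0
    for _ in range(n_draws):
        draw = {ch: rng.uniform(lo, hi) for ch, (lo, hi) in cis.items()}
        hits += max(draw, key=draw.get) == top
    return hits / n_draws


model = {"Search": 0.38, "Email": 0.22, "Direct": 0.21, "Social": 0.11, "Display": 0.08}
cis = {"Search": (0.32, 0.44), "Email": (0.18, 0.28), "Direct": (0.16, 0.26),
       "Social": (0.07, 0.15), "Display": (0.04, 0.12)}
print(f"Search stays on top in {top_channel_stability(model, cis):.0%} of draws")
```

Here the Search interval lies entirely above every other interval, so the top channel never changes. Overlapping intervals, such as Email's and Direct's, are where this kind of check becomes informative for planning decisions.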

---

**Next:** See `../llm_analysis/03_ai_interpretation.ipynb` for AI-powered analysis of these results.