# Time Series Counterfactual Analysis

**Goal**: Analyze historical "what-if" scenarios, i.e. what would have happened if we had intervened earlier.

This notebook demonstrates:
1. Historical counterfactual analysis ("What if we intervened 30 days ago?")
2. Time series intervention impact assessment
3. Optimal intervention timing analysis
4. Comparing actual outcomes vs. counterfactual predictions

## 1. Setup and Data Loading

In [None]:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Load the retail dataset used as the historical record in later cells
df = pd.read_csv('data/retail_data.csv')
print(f"Loaded {len(df)} rows of retail data")
df.head()

## 2. Define Causal Graph

In [None]:
# Simplified causal structure for time series analysis
nodes = ['marketing_spend', 'price_discount', 'staff_count', 
         'foot_traffic', 'conversion_rate', 'customer_satisfaction', 'sales']

edges = [
    ('marketing_spend', 'foot_traffic'),
    ('price_discount', 'conversion_rate'),
    ('staff_count', 'customer_satisfaction'),
    ('foot_traffic', 'sales'),
    ('conversion_rate', 'sales'),
    ('customer_satisfaction', 'sales')
]

# Create adjacency matrix (rows = parents, columns = children; 1 marks an edge)
adj_matrix = pd.DataFrame(0, index=nodes, columns=nodes)
for parent, child in edges:
    adj_matrix.loc[parent, child] = 1

print("Causal Graph (Time Series):")
print(adj_matrix)
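
The edge list above only describes a valid causal graph if it is acyclic and `sales` sits downstream of every driver. As a quick sanity check, here is a minimal sketch using the standard library's `graphlib` (Python 3.9+). The helper name `topological_order` is hypothetical and not part of `ht_categ`; the node and edge lists are repeated so the cell runs on its own.

```python
from graphlib import TopologicalSorter, CycleError

nodes = ['marketing_spend', 'price_discount', 'staff_count',
         'foot_traffic', 'conversion_rate', 'customer_satisfaction', 'sales']
edges = [
    ('marketing_spend', 'foot_traffic'),
    ('price_discount', 'conversion_rate'),
    ('staff_count', 'customer_satisfaction'),
    ('foot_traffic', 'sales'),
    ('conversion_rate', 'sales'),
    ('customer_satisfaction', 'sales'),
]

def topological_order(nodes, edges):
    """Return nodes in parent-before-child order; raise CycleError on a cycle."""
    parents = {node: set() for node in nodes}
    for parent, child in edges:
        if parent not in parents or child not in parents:
            raise ValueError(f"Edge ({parent!r}, {child!r}) uses an undeclared node")
        parents[child].add(parent)
    # static_order() raises CycleError if the graph is not a DAG
    return list(TopologicalSorter(parents).static_order())

order = topological_order(nodes, edges)
print("Topological order:", order)

# A feedback edge from the outcome to a driver makes the graph cyclic
try:
    topological_order(nodes, edges + [('sales', 'marketing_spend')])
    has_cycle = False
except CycleError:
    has_cycle = True
print("Cycle detected with feedback edge:", has_cycle)
```

Running this before training catches typos in node names and accidental feedback loops early.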

## 3. Train Causal Model

In [None]:
from ht_categ import HT, HTConfig

# Train HT model
config = HTConfig(graph=adj_matrix, model_type='XGBoost')
ht_model = HT(config)
ht_model.train(df)

print("✓ Causal model trained")
print(f"\nModel Quality (R²):")
for node, metrics in ht_model.model_metrics.items():
    if 'r2_score' in metrics:
        print(f"  {node}: {metrics['r2_score']:.3f}")

## 4. Historical Counterfactual: "What if we intervened 30 days ago?"

In [None]:
from intervention_search.visualization.timeseries_intervention import TimeSeriesInterventionAnalyzer

# Initialize time series analyzer
ts_analyzer = TimeSeriesInterventionAnalyzer(
    causal_graph=ht_model.graph,
    ht_model=ht_model
)

# Simulate historical intervention
# Scenario: "What if we increased marketing spend by 25%, 30 days ago?"
intervention_spec = {
    'marketing_spend': 25.0  # +25% increase
}

historical_result = ts_analyzer.simulate_historical_intervention(
    intervention=intervention_spec,
    days_ago=30,
    outcome_variable='sales',
    actual_data=df  # Pass actual historical data for comparison
)

print("\n" + "="*70)
print("HISTORICAL COUNTERFACTUAL ANALYSIS")
print("="*70)
print(f"\nScenario: Increase marketing_spend by +25%, 30 days ago")
print(f"\nActual Average Sales (last 30 days): {historical_result.get('actual_avg', 0):.2f}")
print(f"Counterfactual Sales (if intervened): {historical_result.get('counterfactual_avg', 0):.2f}")
print(f"Missed Opportunity: {historical_result.get('missed_opportunity_pct', 0):+.1f}%")
print(f"\nTotal Potential Gain: ${historical_result.get('total_gain', 0):,.2f}")
print("="*70)
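
The summary figures printed above are related in a simple way. The sketch below shows one plausible set of definitions. It assumes `missed_opportunity_pct` is the relative lift of the counterfactual average over the actual average, and `total_gain` is the summed daily difference. The internals of `TimeSeriesInterventionAnalyzer` are not shown in this notebook, so treat these formulas and the helper `summarize_counterfactual` as assumptions. The sales figures are made up for illustration.

```python
from statistics import mean

def summarize_counterfactual(actual, counterfactual):
    """Compare two equally long daily series (hypothetical helper)."""
    if len(actual) != len(counterfactual):
        raise ValueError("Both series must cover the same days")
    actual_avg = mean(actual)
    counterfactual_avg = mean(counterfactual)
    if actual_avg == 0:
        raise ValueError("Relative lift is undefined when actual average is zero")
    return {
        'actual_avg': actual_avg,
        'counterfactual_avg': counterfactual_avg,
        # Relative lift of the counterfactual over what actually happened
        'missed_opportunity_pct': (counterfactual_avg - actual_avg) / actual_avg * 100,
        # Cumulative difference over the whole window
        'total_gain': sum(c - a for a, c in zip(actual, counterfactual)),
    }

# Illustrative daily sales, not taken from retail_data.csv
actual_sales = [100.0, 110.0, 90.0, 100.0]
counterfactual_sales = [110.0, 121.0, 99.0, 110.0]

summary = summarize_counterfactual(actual_sales, counterfactual_sales)
print(f"Missed Opportunity: {summary['missed_opportunity_pct']:+.1f}%")
print(f"Total Potential Gain: ${summary['total_gain']:,.2f}")
```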

## 5. Visualize Counterfactual vs. Actual

In [None]:
# Plot comparison
ts_analyzer.plot_intervention_comparison(
    intervention=intervention_spec,
    outcome_variable='sales',
    days_ago=30,
    actual_data=df,
    show_confidence_bands=True
)

plt.title('Counterfactual Analysis: Marketing Spend +25% (30 Days Ago)')
plt.xlabel('Days Since Intervention')
plt.ylabel('Sales ($)')
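# Relabel the legend; assumes the analyzer draws actual, counterfactual, then the CI band, in that order
plt.legend(['Actual', 'Counterfactual (if intervened)', '90% CI'])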
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 6. Multi-Scenario Analysis: Compare Different Intervention Timings

In [None]:
# Test different intervention timings
timings = [7, 14, 21, 30, 60, 90]  # days ago

timing_results = []

for days_ago in timings:
    result = ts_analyzer.simulate_historical_intervention(
        intervention={'marketing_spend': 25.0},
        days_ago=days_ago,
        outcome_variable='sales',
        actual_data=df
    )
    
    timing_results.append({
        'days_ago': days_ago,
        'gain_pct': result.get('missed_opportunity_pct', 0),
        'total_gain': result.get('total_gain', 0)
    })

# Display results
timing_df = pd.DataFrame(timing_results)
print("\n" + "="*70)
print("INTERVENTION TIMING ANALYSIS")
print("="*70)
print("\nOptimal Intervention Timing:")
print(timing_df.sort_values('total_gain', ascending=False))

# Plot
plt.figure(figsize=(10, 5))
plt.bar(timing_df['days_ago'], timing_df['total_gain'], color='steelblue', alpha=0.7)
plt.xlabel('Days Ago (Intervention Timing)')
plt.ylabel('Total Potential Gain ($)')
plt.title('Intervention Value by Timing')
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

# total_gain sums over the whole window, so longer lookbacks tend to rank first
print(f"\n✅ Optimal timing: {timing_df.loc[timing_df['total_gain'].idxmax(), 'days_ago']:.0f} days ago")
print("="*70)
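
`total_gain` accumulates over the whole lookback window, so a 90-day scenario can outrank a 7-day one almost by construction, simply because it covers more days. One way to compare timings on equal footing is to divide by the window length. This is a minimal sketch with made-up numbers, assuming `total_gain` is a sum over `days_ago` days. The `gain_per_day` field is an addition for illustration, not something the analyzer returns.

```python
# Illustrative results in the same shape as timing_results above
timing_results = [
    {'days_ago': 7, 'total_gain': 1400.0},
    {'days_ago': 30, 'total_gain': 4500.0},
    {'days_ago': 90, 'total_gain': 9000.0},
]

# Normalize cumulative gain by the number of days it was accumulated over
for row in timing_results:
    row['gain_per_day'] = row['total_gain'] / row['days_ago']

best_total = max(timing_results, key=lambda r: r['total_gain'])
best_daily = max(timing_results, key=lambda r: r['gain_per_day'])

for row in timing_results:
    print(f"{row['days_ago']:>3} days ago: "
          f"total ${row['total_gain']:,.0f}, per day ${row['gain_per_day']:,.0f}")
print(f"Highest total gain: {best_total['days_ago']} days ago")
print(f"Highest gain per day: {best_daily['days_ago']} days ago")
```

If the two rankings disagree, as they do in this toy example, the total-gain winner mostly reflects the window length rather than a stronger daily effect.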

## 7. Forward-Looking Intervention Planning

In [None]:
from intervention_search import InterventionSearch

# Find optimal intervention for FUTURE
searcher = InterventionSearch(
    graph=ht_model.graph,
    ht_model=ht_model,
    n_simulations=1000
)

results = searcher.find_interventions(
    target_outcome='sales',
    target_change=20.0,  # +20% increase
    tolerance=3.0,
    confidence_level=0.90,
    verbose=True
)

best = results['best_intervention']

print("\n" + "="*70)
print("FORWARD-LOOKING RECOMMENDATION")
print("="*70)
print(f"\nRecommended Action NOW:")
print(f"  Variable: {best['nodes'][0]}")
print(f"  Change: {list(best['required_pct_changes'].values())[0]:+.1f}%")
print(f"\nExpected Impact (over next 30 days):")
print(f"  Sales Increase: {best['actual_effect']:+.1f}%")
print(f"  90% Confidence Interval: [{best['ci_90'][0]:+.1f}%, {best['ci_90'][1]:+.1f}%]")
print(f"  Confidence: {best['confidence']:.0%}")
print(f"  Quality Grade: {best['quality']['quality_grade']}")
print("="*70)
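
For readers wondering where an interval like `ci_90` could come from: with `n_simulations=1000` Monte Carlo draws, a common approach is to take the 5th and 95th percentiles of the simulated effects. Whether `InterventionSearch` does exactly this is an assumption. The sketch below uses synthetic draws and only the standard library.

```python
import random
from statistics import mean, quantiles

# Synthetic stand-in for simulated sales effects (in percent); not model output
rng = random.Random(42)
simulated_effects = [rng.gauss(20.0, 3.0) for _ in range(1000)]

# quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%
cuts = quantiles(simulated_effects, n=20)
ci_90 = (cuts[0], cuts[-1])

# Share of draws that fall inside the interval; should be close to 0.90
coverage = sum(ci_90[0] <= x <= ci_90[1] for x in simulated_effects) / len(simulated_effects)

print(f"Mean effect: {mean(simulated_effects):+.1f}%")
print(f"90% interval: [{ci_90[0]:+.1f}%, {ci_90[1]:+.1f}%]")
print(f"Empirical coverage: {coverage:.0%}")
```

With NumPy, `np.percentile(simulated_effects, [5, 95])` gives nearly the same bounds.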

## 8. Export Counterfactual Data for Reporting

In [None]:
# Export detailed counterfactual data
counterfactual_data = ts_analyzer.export_counterfactual_data(
    intervention={'marketing_spend': 25.0},
    days_ago=30,
    outcome_variable='sales',
    actual_data=df,
    format='csv'
)

# Save to file (assumes the export returns a pandas DataFrame)
output_path = 'counterfactual_analysis_results.csv'
counterfactual_data.to_csv(output_path, index=False)
print(f"✓ Counterfactual analysis exported to: {output_path}")
print(f"\nPreview:")
print(counterfactual_data.head())

## Summary

This notebook demonstrated:

**Historical Analysis:**
- ✅ Counterfactual predictions ("what if we intervened N days ago?")
- ✅ Quantified missed opportunities
- ✅ Optimal intervention timing analysis

**Forward Planning:**
- ✅ Future intervention recommendations
- ✅ Confidence intervals for time series predictions
- ✅ Quality-gated recommendations

**Business Value:**
- Learn from historical missed opportunities
- Optimize intervention timing
- Make data-driven decisions with uncertainty quantification
- Export results for executive reporting