# Notebook 03: Difference-in-Differences & Event Study Analysis

This notebook applies causal inference methods to the Amazon AI review summary rollout:

1. **Difference-in-Differences (DiD)**: Estimate the average treatment effect on the treated (ATT)
2. **Event Study**: Examine dynamic (week-by-week) treatment effects
3. **Parallel Trends Test**: Test the key identifying assumption behind DiD

**Treatment Date:** 2023-08-14 (Amazon AI review summary rollout)

**Treatment Definition:** Products with `rating_number >= threshold` (default: 50) are treated
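A minimal sketch of how the treatment indicators could be built, assuming a product-week panel with `parent_asin`, `week_start` and `rating_number` columns. The helper name `add_treatment_columns` is hypothetical. The `post` rule (a week counts as post-treatment if `week_start` is on or after the rollout date) is also an assumption, and Notebook 01 may construct these columns differently.

```python
import pandas as pd

# Values copied from the notebook header; the project itself reads them from
# src.config (AI_ROLLOUT_DATE, TREATMENT_THRESHOLD).
AI_ROLLOUT_DATE = pd.Timestamp("2023-08-14")
TREATMENT_THRESHOLD = 50


def add_treatment_columns(panel: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add `treated`, `post` and `treated_post` columns.

    Assumes `rating_number` is fixed per product and `week_start` is a
    datetime column holding the first day of each week.
    """
    out = panel.copy()
    # Products with a missing rating count fall into the control group.
    out["treated"] = (out["rating_number"].fillna(0) >= TREATMENT_THRESHOLD).astype(int)
    # Weeks starting on or after the rollout date are post-treatment.
    out["post"] = (out["week_start"] >= AI_ROLLOUT_DATE).astype(int)
    # The DiD regressor is the product of the two indicators.
    out["treated_post"] = out["treated"] * out["post"]
    return out


toy = pd.DataFrame({
    "parent_asin": ["A", "A", "B", "B"],
    "week_start": pd.to_datetime(["2023-08-07", "2023-08-14"] * 2),
    "rating_number": [120, 120, 10, 10],
})
toy = add_treatment_columns(toy)
print(toy[["parent_asin", "week_start", "treated", "post", "treated_post"]])
```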

**Prerequisites:** Run Notebook 01 first to build the panel.

In [None]:
# Standard imports
import sys
from pathlib import Path

# Add the project root (parent of the notebooks directory) to sys.path so the src package is importable
sys.path.insert(0, str(Path.cwd().parent))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Project imports
from src import config
from src.io_utils import load_panel
from src.analysis_did import (
    run_all_did_regressions,
    run_all_event_studies,
    format_did_table,
    format_parallel_trends_summary,
)
from src.plots import (
    plot_timeseries_by_treatment,
    plot_event_study,
)

# Settings
pd.set_option('display.max_columns', 50)
pd.set_option('display.float_format', '{:.4f}'.format)

## Load Data

In [None]:
# Load panel
paths = config.get_output_paths(config.ACTIVE_CATEGORY)
panel_df = load_panel(paths['panel_parquet'])

print(f"Panel shape: {panel_df.shape}")
print(f"\nAI Rollout Date: {config.AI_ROLLOUT_DATE}")
print(f"Treatment Threshold: rating_number >= {config.TREATMENT_THRESHOLD}")

In [None]:
# Treatment group summary
print("\nTreatment Group Summary:")
print("-" * 40)

treat_summary = panel_df.groupby('treated').agg({
    'parent_asin': 'nunique',
    'ReviewCount': 'mean',
    'rating_number': 'mean',
}).round(2)
treat_summary.columns = ['N_Products', 'Avg_ReviewCount', 'Avg_RatingNumber']
print(treat_summary)

In [None]:
# Pre/Post period summary
print("\nPre/Post Period Summary:")
print("-" * 40)

period_summary = panel_df.groupby('post').agg({
    'week_start': ['min', 'max', 'nunique'],
    'ReviewCount': 'mean',
}).round(2)
print(period_summary)

## 1. Visual Pre-Trends Check

Before running DiD, visually inspect whether treatment and control groups follow parallel trends in the pre-period.
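The plots can be complemented by a single number. The sketch below fits a line to the weekly treated-minus-control gap in group means over the pre-period. What must be parallel is the trend, not the level, so a constant gap is fine and a trending gap is not. It assumes `week_start` and `treated` columns as in the panel, uses the hypothetical helper name `pretrend_gap_slope`, and weights every pre-period week equally. It is an informal diagnostic, not a substitute for the formal test in Section 4.

```python
import numpy as np
import pandas as pd


def pretrend_gap_slope(panel: pd.DataFrame, y_col: str, rollout: pd.Timestamp) -> float:
    """Weekly slope of the (treated - control) gap in group means, pre-period only.

    A slope near zero is consistent with parallel pre-trends; a clearly
    non-zero slope means the groups were already diverging before rollout.
    """
    pre = panel[panel["week_start"] < rollout]
    # Rows = weeks, columns = treated (0/1), values = mean outcome.
    means = pre.groupby(["week_start", "treated"])[y_col].mean().unstack("treated")
    gap = (means[1] - means[0]).to_numpy()
    # Regress the gap on a 0, 1, 2, ... week index; slope is in outcome units per week.
    slope, _intercept = np.polyfit(np.arange(len(gap)), gap, deg=1)
    return float(slope)


# Toy panel: 8 pre-rollout weeks plus 4 post-rollout weeks, one row per group-week.
rollout = pd.Timestamp("2023-08-14")
weeks = pd.date_range("2023-06-19", periods=12, freq="W-MON")
rows = []
for i, w in enumerate(weeks):
    control = 1.0 + 0.05 * i
    rows.append({"week_start": w, "treated": 0, "parallel": control, "diverging": control})
    rows.append({"week_start": w, "treated": 1, "parallel": control + 0.5,
                 "diverging": control + 0.5 + 0.1 * i})
toy = pd.DataFrame(rows)

print(pretrend_gap_slope(toy, "parallel", rollout))   # close to 0
print(pretrend_gap_slope(toy, "diverging", rollout))  # close to 0.1
```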

In [None]:
# Review count by treatment group
plot_timeseries_by_treatment(
    panel_df,
    y_col='logReviewCount',
    title='Log Review Count by Treatment Group',
    ylabel='log(1 + ReviewCount)',
)
plt.show()

In [None]:
# Verified share by treatment group
plot_timeseries_by_treatment(
    panel_df,
    y_col='VerifiedShare',
    title='Verified Purchase Share by Treatment Group',
    ylabel='Share',
)
plt.show()

In [None]:
# Average helpful votes by treatment group
plot_timeseries_by_treatment(
    panel_df,
    y_col='AvgHelpful',
    title='Average Helpful Votes by Treatment Group',
    ylabel='Helpful Votes',
)
plt.show()

In [None]:
# Average review length by treatment group
plot_timeseries_by_treatment(
    panel_df,
    y_col='AvgLen',
    title='Average Review Length by Treatment Group',
    ylabel='Characters',
)
plt.show()

## 2. Difference-in-Differences Analysis

Run DiD regression:

$$Y_{it} = \delta \cdot (\text{Treated}_i \times \text{Post}_t) + \alpha_i + \gamma_t + \varepsilon_{it}$$

Where:
- $\delta$ = DiD estimate (average treatment effect on the treated)
- $\alpha_i$ = product fixed effects (absorb time-invariant product differences)
- $\gamma_t$ = week fixed effects (absorb common time shocks)
- Main effects are not estimated separately: $\alpha_i$ absorbs $\text{Treated}_i$ and $\gamma_t$ absorbs $\text{Post}_t$
- Standard errors clustered by product
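To make the specification concrete, here is a minimal numpy/pandas sketch of the same two-way fixed-effects estimator. It is not the project's `run_all_did_regressions`, which likely wraps a regression package. The hypothetical function `twfe_did` assumes a balanced panel (every product observed every week), where subtracting product means and week means and adding back the grand mean removes $\alpha_i$ and $\gamma_t$ exactly. Its clustered SE omits the finite-sample corrections that regression packages usually apply.

```python
import numpy as np
import pandas as pd


def twfe_did(panel, y_col, d_col="treated_post", unit="parent_asin", time="week_start"):
    """DiD coefficient from a two-way fixed-effects regression, with product-clustered SE.

    Sketch for a *balanced* panel only (double demeaning is exact there).
    """
    df = panel[[unit, time, y_col, d_col]].copy()
    demeaned = {}
    for col in (y_col, d_col):
        unit_mean = df.groupby(unit)[col].transform("mean")
        time_mean = df.groupby(time)[col].transform("mean")
        demeaned[col] = (df[col] - unit_mean - time_mean + df[col].mean()).to_numpy()
    x, y = demeaned[d_col], demeaned[y_col]
    delta = (x @ y) / (x @ x)
    resid = y - delta * x
    # Cluster-robust variance: square the summed score x*e within each product.
    scores = pd.Series(x * resid).groupby(df[unit].to_numpy()).sum()
    var = float((scores ** 2).sum()) / float(x @ x) ** 2
    return float(delta), float(np.sqrt(var))


# Simulated balanced panel: 4 products x 6 weeks, true effect 0.3.
rng = np.random.default_rng(0)
weeks = pd.date_range("2023-07-24", periods=6, freq="W-MON")
is_treated = {"A": 1, "B": 1, "C": 0, "D": 0}
rows = []
for p_idx, (asin, treated) in enumerate(is_treated.items()):
    for t_idx, w in enumerate(weeks):
        post = int(w >= pd.Timestamp("2023-08-14"))
        d = treated * post
        y = p_idx + 0.1 * t_idx + 0.3 * d + rng.normal(scale=0.05)
        rows.append({"parent_asin": asin, "week_start": w, "treated": treated,
                     "post": post, "treated_post": d, "y": y})
sim = pd.DataFrame(rows)

delta, se = twfe_did(sim, "y")
print(f"delta = {delta:.3f}, clustered SE = {se:.3f}")
```

On a balanced panel with a single treatment date, this coefficient equals the classic 2x2 difference of group-period means, which is a useful sanity check. Four clusters are far too few for a trustworthy clustered SE; the toy only checks the mechanics.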

In [None]:
# Run DiD regressions for all outcomes
did_results_df, did_full_results = run_all_did_regressions(
    panel_df,
    outcomes=config.PRIMARY_OUTCOMES,
)

In [None]:
# Display formatted DiD results
if not did_results_df.empty:
    print(format_did_table(did_results_df))

In [None]:
# Save DiD results
if not did_results_df.empty:
    did_results_df.to_csv(paths['did_results'], index=False)
    print(f"\nSaved to {paths['did_results']}")

In [None]:
# Detailed results for one outcome
if 'logReviewCount' in did_full_results:
    results = did_full_results['logReviewCount']['model_results']
    print("\nDetailed DiD Results: logReviewCount")
    print("=" * 60)
    print(results.summary)

### DiD Interpretation

The DiD coefficient ($\delta$) represents the **average treatment effect on the treated (ATT)**:

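- **Positive $\delta$**: AI summaries increased the outcome for treated products, relative to the control-group trend
- **Negative $\delta$**: AI summaries decreased the outcome for treated products, relative to the control-group trend
- **Not statistically significant**: the data cannot distinguish the effect from zero; this is not evidence that the effect is zero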

**Causal interpretation requires:**
1. Parallel trends assumption holds (treatment and control would have followed the same trend absent treatment)
2. No anticipation effects
3. Treatment timing is correctly specified
4. SUTVA (no spillovers between products)
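For log outcomes such as `logReviewCount` (plotted as log(1 + ReviewCount)), $\delta$ is on the log scale. A shift of $\delta$ in log(1 + Y) multiplies (1 + Y) by $e^{\delta}$, so the implied change is $100(e^{\delta} - 1)$ percent; for outcomes usually much larger than 1 this is close to the percentage change in Y itself. A small helper (hypothetical name `log_coef_to_pct`, normal-approximation CI with z = 1.96 assumed) could be:

```python
import math


def log_coef_to_pct(coef, se=None, z=1.96):
    """Approximate % change implied by a DiD coefficient on a log(1 + Y) outcome.

    The CI ends are transformed from the log-scale normal interval, so the
    resulting interval is asymmetric around the point estimate.
    """
    point = 100 * (math.exp(coef) - 1)
    if se is None:
        return point, None
    lo = 100 * (math.exp(coef - z * se) - 1)
    hi = 100 * (math.exp(coef + z * se) - 1)
    return point, (lo, hi)


# Illustrative inputs only, not estimates from this notebook.
point, ci = log_coef_to_pct(0.10, se=0.03)
print(f"{point:.1f}% (95% CI {ci[0]:.1f}% to {ci[1]:.1f}%)")
```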

## 3. Event Study Analysis

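The event study estimates a separate effect for each week relative to the rollout. This shows how the effect evolves over time and exposes the pre-period coefficients as a check on parallel trends: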

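$$Y_{it} = \sum_{k \neq -1} \gamma_k \cdot (\text{Treated}_i \times \mathbf{1}[t - t_0 = k]) + \alpha_i + \lambda_t + \varepsilon_{it}$$

Where:
- $t_0$ = the treatment week, so $k = t - t_0$ counts weeks relative to treatment ($k = 0$ is the treatment week)
- $\gamma_k$ = effect at event time $k$, relative to the omitted reference week $k = -1$
- $\alpha_i$ = product fixed effects; $\lambda_t$ = week fixed effects (the $\gamma_t$ of the DiD equation, renamed so it does not clash with $\gamma_k$)
- Pre-period coefficients ($\gamma_k$ for $k < -1$) test parallel trends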
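The regressors in this equation can be built directly from the panel. A sketch under stated assumptions: weekly rows aligned to the rollout date, an illustrative default window of (-8, 8) (the project uses `config.EVENT_STUDY_WINDOW`, whose value is not shown here), and rows outside the window dropped rather than binned. `add_event_time_dummies` is a hypothetical name; `run_all_event_studies` may bin end points or name columns differently.

```python
import pandas as pd


def add_event_time_dummies(panel, rollout, window=(-8, 8), ref=-1):
    """Hypothetical helper: add `rel_week` and Treated x 1[rel_week == k] dummies.

    Assumes week_start values fall a whole number of weeks from `rollout`,
    keeps only rows inside the window (no end-point binning), and omits the
    reference week `ref`, so every coefficient is relative to k = ref.
    """
    out = panel.copy()
    # Event time k: whole weeks between the row's week and the rollout week.
    out["rel_week"] = (out["week_start"] - rollout).dt.days // 7
    lo, hi = window
    out = out[out["rel_week"].between(lo, hi)].copy()
    for k in range(lo, hi + 1):
        if k == ref:
            continue
        # Formula-friendly names: treat_m2 for k = -2, treat_p0 for k = 0.
        name = f"treat_m{-k}" if k < 0 else f"treat_p{k}"
        out[name] = ((out["rel_week"] == k) & (out["treated"] == 1)).astype(int)
    return out


rollout = pd.Timestamp("2023-08-14")
weeks = pd.date_range("2023-07-24", periods=6, freq="W-MON")
toy = pd.DataFrame({
    "parent_asin": ["A"] * 6 + ["B"] * 6,
    "week_start": list(weeks) * 2,
    "treated": [1] * 6 + [0] * 6,
})
es = add_event_time_dummies(toy, rollout, window=(-3, 2))
print(es.filter(like="treat_").sum())
```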

In [None]:
# Run event studies
es_coef_dfs, es_full_results = run_all_event_studies(
    panel_df,
    outcomes=config.PRIMARY_OUTCOMES,
    window=config.EVENT_STUDY_WINDOW,
)

In [None]:
# Display event study coefficients
for outcome, coef_df in es_coef_dfs.items():
    print(f"\n{outcome} Event Study Coefficients:")
    print(coef_df.round(4).to_string(index=False))

### Event Study Plots

In [None]:
# Plot event studies
for outcome, coef_df in es_coef_dfs.items():
    save_path = config.FIGURES_DIR / f"event_study_{outcome}_{config.ACTIVE_CATEGORY}.png"
    
    plot_event_study(
        coef_df,
        outcome=outcome,
        title=f'Event Study: {outcome}',
        ylabel=f'Effect on {outcome}',
        save_path=save_path,
    )
    plt.show()

## 4. Parallel Trends Test

Formal test of whether pre-period coefficients are jointly zero:

$$H_0: \gamma_{-8} = \gamma_{-7} = \ldots = \gamma_{-2} = 0$$

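Rejecting $H_0$ suggests the parallel trends assumption may be violated. Failing to reject is weaker evidence in its favor: with few pre-periods or noisy estimates, the test may lack the power to detect a real pre-trend.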
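The statistic is a standard Wald test. A minimal sketch of the computation, given the pre-period coefficients and the matching block of their covariance matrix; the dictionary keys mirror those printed in the cells that follow, but this is an illustration, not the project's implementation, which may use an F reference distribution with cluster-based degrees of freedom instead of chi-squared. `joint_pretrend_wald` is a hypothetical name, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy import stats


def joint_pretrend_wald(coefs, vcov):
    """Wald test of H0: all supplied pre-period event-study coefficients are zero.

    W = b' V^{-1} b is compared with a chi-squared(q) distribution; f_stat = W / q
    is the same statistic on the F scale (its p-value would need a denominator
    df, which is not computed here).
    """
    b = np.asarray(coefs, dtype=float)
    V = np.asarray(vcov, dtype=float)
    q = b.size
    # Solve V z = b instead of inverting V explicitly.
    wald = float(b @ np.linalg.solve(V, b))
    p_value = float(stats.chi2.sf(wald, df=q))
    return {"wald_stat": wald, "df": q, "f_stat": wald / q,
            "p_value": p_value, "null_rejected": p_value < 0.05}


# Illustrative numbers only: two pre-period coefficients with independent errors.
res = joint_pretrend_wald([0.1, -0.2], np.diag([0.01, 0.04]))
print(res)
```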

In [None]:
# Print parallel trends test results
if es_full_results:
    print(format_parallel_trends_summary(es_full_results))

In [None]:
# Detailed parallel trends results
print("\nDetailed Parallel Trends Test Results:")
print("=" * 60)

for outcome, result in es_full_results.items():
    pt_test = result.get('parallel_trends_test')
    if pt_test:
        print(f"\n{outcome}:")
        print(f"  Wald statistic: {pt_test['wald_stat']:.4f}")
        print(f"  Degrees of freedom: {pt_test['df']}")
        print(f"  F-statistic: {pt_test['f_stat']:.4f}")
        print(f"  p-value: {pt_test['p_value']:.4f}")
        print(f"  Tested periods: {pt_test['tested_dummies']}")
        
        if pt_test['null_rejected']:
            print("  >>> WARNING: Parallel trends rejected at 5% level")
        else:
            print("  >>> Parallel trends NOT rejected at 5% level")

## 5. Robustness: Alternative Treatment Thresholds

Re-estimate the `logReviewCount` DiD under alternative `rating_number` thresholds to check how sensitive the result is to the treatment definition.
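Before re-estimating, it helps to see how each threshold splits the sample. The sketch below (hypothetical `threshold_group_sizes`) applies the same `fillna(0)` rule as the loop that follows, but on one row per product, so the panel is not copied per threshold. It assumes `rating_number` is constant within a product, and `min_group` is an arbitrary illustrative cutoff.

```python
import pandas as pd


def threshold_group_sizes(panel, thresholds, min_group=30):
    """Treated/control product counts for each candidate `rating_number` threshold.

    Assumes `rating_number` is constant within a product; `min_group` flags
    thresholds that leave too few products on one side.
    """
    per_product = panel.drop_duplicates("parent_asin")["rating_number"].fillna(0)
    rows = []
    for thr in thresholds:
        n_treated = int((per_product >= thr).sum())
        n_control = len(per_product) - n_treated
        rows.append({
            "threshold": thr,
            "n_treated": n_treated,
            "n_control": n_control,
            "thin_group": min(n_treated, n_control) < min_group,
        })
    return pd.DataFrame(rows)


toy = pd.DataFrame({
    "parent_asin": [a for a in "ABCDE" for _ in range(2)],
    "rating_number": [10, 10, 30, 30, 60, 60, 150, 150, None, None],
})
sizes = threshold_group_sizes(toy, [25, 50, 100, 200], min_group=2)
print(sizes)
```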

In [None]:
# Test different thresholds
from src.analysis_fe import prepare_panel_for_regression
from src.analysis_did import run_did_regression

thresholds = [25, 50, 100, 200]
robustness_results = []

for threshold in thresholds:
    # Create new treatment variable
    panel_copy = panel_df.copy()
    panel_copy['treated_alt'] = (panel_copy['rating_number'].fillna(0) >= threshold).astype(int)
    panel_copy['treated_post_alt'] = panel_copy['treated_alt'] * panel_copy['post']
    
    # Count treated products
    n_treated = panel_copy[panel_copy['treated_alt'] == 1]['parent_asin'].nunique()
    n_control = panel_copy[panel_copy['treated_alt'] == 0]['parent_asin'].nunique()
    
    # Run DiD for logReviewCount
    indexed = prepare_panel_for_regression(panel_copy)
    
    try:
        result = run_did_regression(
            indexed,
            outcome='logReviewCount',
            interaction_col='treated_post_alt',
        )
        
        if result:
            robustness_results.append({
                'threshold': threshold,
                'n_treated': n_treated,
                'n_control': n_control,
                'did_coef': result['did_coef'],
                'did_se': result['did_se'],
                'did_pval': result['did_pval'],
            })
    except Exception as e:
        print(f"  Threshold {threshold}: Error - {e}")

# Display robustness results
if robustness_results:
    rob_df = pd.DataFrame(robustness_results)
    print("\nRobustness: DiD Coefficient by Treatment Threshold")
    print("Outcome: logReviewCount")
    print("=" * 70)
    print(rob_df.to_string(index=False))

## 6. Save Event Study Results

In [None]:
# Save coefficient tables
for outcome, coef_df in es_coef_dfs.items():
    es_path = config.TABLES_DIR / f"event_study_coefs_{outcome}_{config.ACTIVE_CATEGORY}.csv"
    coef_df.to_csv(es_path, index=False)
    print(f"Saved {outcome} to {es_path}")

## Summary and Interpretation

### Key Findings

Summarize the key findings from the DiD and event study analysis:

1. **DiD Estimates**: [Describe coefficient signs, magnitudes, and significance]

2. **Event Study**: [Describe the pattern of pre-period and post-period coefficients]

3. **Parallel Trends**: [Report whether parallel trends test passes/fails]

### Caveats

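- **Treatment proxy**: We use a `rating_number` threshold (a count of ratings, not reviews) to proxy for AI summary eligibility. Amazon's actual eligibility rule is not observed, so treatment status may be misclassified for some products.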
- **Limited post-period**: Only ~4-6 weeks of post-treatment data available.
- **Category-specific**: Results may not generalize beyond diaper products.
- **Potential confounders**: Other Amazon changes around this time could affect results.

## Next Steps

1. Run full analysis pipeline: `python scripts/run_analysis.py`
2. Review generated memo: `outputs/memo_diapers.md`
3. Explore robustness with different treatment definitions
4. Consider additional product categories