# 02 Difference-in-Differences + Event Study

TWFE DiD and event studies with synthetic adoption and diagnostics.


## Table of Contents
- [Synthetic adoption + treatment](#synthetic-adoption-treatment)
- [TWFE DiD](#twfe-did)
- [Event study (leads/lags)](#event-study-leads-lags)
- [Diagnostics: pre-trends + placebo](#diagnostics-pre-trends-placebo)
- [Checkpoint (Self-Check)](#checkpoint-self-check)
- [Solutions (Reference)](#solutions-reference)


## Why This Notebook Matters
Causal notebooks focus on **identification**: what would have to be true for a coefficient to represent a causal effect.
You will practice:
- building a county-year panel,
- fixed effects (TWFE),
- clustered standard errors,
- DiD + event studies,
- IV/2SLS.


## Prerequisites (Quick Self-Check)
- Completed Part 02 (regression + robust SE).
- Basic familiarity with panels (same unit over time) and the idea of identification assumptions.

## What You Will Produce
- (no file output; learning/analysis notebook)

## Success Criteria
- You can explain what you built and why each step exists.
- You can run your work end-to-end without undefined variables.

## Common Pitfalls
- Running cells top-to-bottom without reading the instructions.
- Leaving `...` placeholders in code cells.
- Treating regression output as causal without stating identification assumptions.
- Using non-clustered SE when shocks are correlated within groups (e.g., states).

## Quick Fixes (When You Get Stuck)
- If you see `ModuleNotFoundError`, re-run the bootstrap cell and restart the kernel; make sure `PROJECT_ROOT` is the repo root.
- If a `data/processed/*` file is missing, either run the matching build script (see guide) or use the notebook’s `data/sample/*` fallback.
- If results look “too good,” suspect leakage; re-check shifts, rolling windows, and time splits.
- If a model errors, check dtypes (`astype(float)`) and missingness (`dropna()` on required columns).

## Matching Guide
- `docs/guides/07_causal/02_difference_in_differences_event_study.md`



## How To Use This Notebook
- Work section-by-section; don’t skip the markdown.
- Most code cells are incomplete on purpose: replace TODOs and `...`, then run.
- After each section, write 2–4 sentences answering the interpretation prompts (what changed, why it matters).
- Prefer `data/processed/*` if you have built the real datasets; otherwise use the bundled `data/sample/*` fallbacks.
- Use the **Checkpoint (Self-Check)** section to catch mistakes early.
- Use **Solutions (Reference)** only to unblock yourself; then re-implement without looking.
- Use the matching guide (`docs/guides/07_causal/02_difference_in_differences_event_study.md`) for the math, assumptions, and deeper context.



<a id="environment-bootstrap"></a>
## Environment Bootstrap
Run this cell first. It makes the repo importable and defines common directories.



In [None]:
from __future__ import annotations

from pathlib import Path
import sys


def find_repo_root(start: Path) -> Path:
    p = start
    for _ in range(8):
        if (p / 'src').exists() and (p / 'docs').exists():
            return p
        p = p.parent
    raise RuntimeError('Could not find repo root. Start Jupyter from the repo root.')


PROJECT_ROOT = find_repo_root(Path.cwd())
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

DATA_DIR = PROJECT_ROOT / 'data'
RAW_DIR = DATA_DIR / 'raw'
PROCESSED_DIR = DATA_DIR / 'processed'
SAMPLE_DIR = DATA_DIR / 'sample'

PROJECT_ROOT



## Goal
Practice DiD and event studies using:
- a real county-year outcome (poverty rate)
- a **synthetic**, deterministic adoption schedule by state
- a **semi-synthetic** outcome with a known injected treatment effect

This is a method exercise, not a real policy evaluation.



## Primer: Panel + IV regression with `linearmodels` (FE, clustered SE, 2SLS)

This repo uses:
- `statsmodels` for classic OLS inference patterns, and
- `linearmodels` for **panel fixed effects** and **instrumental variables** (IV/2SLS).

The goal of this primer is to make you productive quickly (with the *minimum* theory needed to use the tools correctly). Deep math lives in the guides.

### Why `linearmodels`?

`linearmodels` provides clean APIs for:
- `PanelOLS`: fixed effects / TWFE
- `IV2SLS`: two-stage least squares

and it handles some panel-specific details (like absorbing FE) more naturally than `statsmodels`.

### Panel data shape (the #1 requirement)

Most panel estimators expect a **MultiIndex**:
- level 0: entity (e.g., county `fips`)
- level 1: time (e.g., `year`)

```python
# df has columns: fips, year, y, x1, x2, state, ...
df = df.copy()
df["fips"] = df["fips"].astype(str).str.zfill(5)  # restore leading zeros if fips was parsed as int
df["year"] = df["year"].astype(int)
df = df.set_index(["fips", "year"]).sort_index()
```

**Expected output / sanity check**
- `df.index.nlevels == 2`
- `df.index.is_monotonic_increasing` is `True`
- no duplicate index pairs: `df.index.duplicated().any()` is `False`
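
The three checks above can be bundled into one helper. This is a minimal sketch on made-up toy data; `check_panel_index` is a hypothetical name, not a function from this repo.

```python
import pandas as pd


def check_panel_index(panel: pd.DataFrame) -> None:
    """Raise AssertionError unless the frame is a sorted, unique (entity, time) panel."""
    assert panel.index.nlevels == 2, "expected a 2-level MultiIndex (entity, time)"
    assert panel.index.is_monotonic_increasing, "index is not sorted; call sort_index()"
    assert not panel.index.duplicated().any(), "duplicate (entity, time) pairs found"


# Toy data (values invented for illustration only).
toy = pd.DataFrame({
    "fips": ["01001", "01001", "01003", "01003"],
    "year": [2019, 2018, 2018, 2019],
    "y": [0.15, 0.16, 0.12, 0.11],
})
toy = toy.set_index(["fips", "year"]).sort_index()

check_panel_index(toy)  # passes silently when the panel shape is valid
print(toy.index.nlevels, toy.index.is_monotonic_increasing)
```

Running a check like this right after `set_index(...).sort_index()` catches duplicate county-year rows before they reach the estimator.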

### TWFE model (PanelOLS)

Econometric form:

$$
Y_{it} = X_{it}'\beta + \alpha_i + \gamma_t + \varepsilon_{it}
$$

In code:

```python
from linearmodels.panel import PanelOLS
import statsmodels.api as sm

y = df["y"].astype(float)
X = df[["x1", "x2"]].astype(float)
X = sm.add_constant(X, has_constant="add")

res = PanelOLS(y, X, entity_effects=True, time_effects=True).fit(cov_type="robust")
print(res.summary)
```
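
To see what the fixed effects are doing, it can help to reproduce the slope by hand. The sketch below simulates a balanced toy panel (all numbers are invented) and recovers $\beta$ by subtracting unit means and year means before a plain least-squares fit. It assumes a balanced panel, where this two-way demeaning is exact; it is an illustration of the idea, not how `PanelOLS` is implemented.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_units, n_years, beta_true = 30, 6, 0.5

# Balanced toy panel: every unit is observed in every year.
idx = pd.MultiIndex.from_product(
    [range(n_units), range(2015, 2015 + n_years)], names=["unit", "year"]
)
alpha = np.repeat(rng.normal(size=n_units), n_years)  # unit effects (alpha_i)
gamma = np.tile(rng.normal(size=n_years), n_units)    # year effects (gamma_t)
x = rng.normal(size=len(idx)) + 0.5 * alpha           # x is correlated with alpha_i
y = beta_true * x + alpha + gamma + 0.1 * rng.normal(size=len(idx))
toy = pd.DataFrame({"x": x, "y": y}, index=idx)


def two_way_demean(s: pd.Series) -> pd.Series:
    """Subtract unit and year means, then add back the grand mean."""
    unit_mean = s.groupby(level="unit").transform("mean")
    year_mean = s.groupby(level="year").transform("mean")
    return s - unit_mean - year_mean + s.mean()


x_dd = two_way_demean(toy["x"])
y_dd = two_way_demean(toy["y"])

# Single-regressor OLS slope on the demeaned data.
beta_hat = float((x_dd * y_dd).sum() / (x_dd ** 2).sum())
print(round(beta_hat, 3))
```

The estimate should land close to `beta_true` even though `x` is correlated with the unit effects, because the demeaning removes anything constant within a unit or within a year.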

### Clustered SE (common in applied panel/DiD work)

If errors are correlated within clusters (e.g., state-level shocks), use clustered SE:

```python
clusters = df["state"]  # must align row-for-row with y/X index

res_cl = PanelOLS(y, X, entity_effects=True, time_effects=True).fit(
  cov_type="clustered",
  clusters=clusters,
)
```

**Expected output / sanity check**
- clustered SE are often larger than robust SE (not guaranteed, but common)
- always report the number of clusters: `clusters.nunique()`
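
The first bullet can be checked on simulated data. The sketch below hand-rolls a cluster-robust "sandwich" covariance with NumPy, with no small-sample correction, so the numbers will not match a library's output exactly. `ols_cluster_se` is a hypothetical helper written for this illustration.

```python
import numpy as np


def ols_cluster_se(X, y, groups):
    """OLS coefficients plus cluster-robust SEs (plain sandwich, no small-sample correction)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]  # sum of x_i * u_i within cluster g
        meat += np.outer(score, score)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(1)
n_clusters, per_cluster = 40, 25
groups = np.repeat(np.arange(n_clusters), per_cluster)
x = np.repeat(rng.normal(size=n_clusters), per_cluster)      # regressor varies by cluster only
shock = np.repeat(rng.normal(size=n_clusters), per_cluster)  # shared within-cluster shock
y = 1.0 + 2.0 * x + shock + rng.normal(size=groups.size)
X = np.column_stack([np.ones_like(x), x])

beta, se_cluster = ols_cluster_se(X, y, groups)
# Treating every observation as its own cluster gives the heteroskedasticity-robust (HC0) SE.
_, se_robust = ols_cluster_se(X, y, np.arange(groups.size))

print("slope:", round(beta[1], 3))
print("clustered SE:", round(se_cluster[1], 3), "robust SE:", round(se_robust[1], 3))
```

When both the regressor and the shock are constant within a cluster, as in this toy design, the clustered SE is noticeably larger than the robust one. This mirrors a state-level policy evaluated on county-level data.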

### IV / 2SLS (IV2SLS)

Structural equation (endogeneity motivation):
$$
Y = \beta X + W'\delta + u, \quad \mathrm{Cov}(X,u)\neq 0
$$

In code (one endogenous regressor):

```python
from linearmodels.iv import IV2SLS
import statsmodels.api as sm

y = df["y"].astype(float)
endog = df[["x_endog"]].astype(float)
exog = sm.add_constant(df[["x_exog1", "x_exog2"]].astype(float), has_constant="add")
instr = df[["z1", "z2"]].astype(float)

res_iv = IV2SLS(y, exog, endog, instr).fit(cov_type="robust")
print(res_iv.summary)
```

**Expected output / sanity check**
- `res_iv.params` contains coefficients for exog + endogenous variables
- `res_iv.first_stage` (if printed) shows instrument relevance diagnostics
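
The "two stages" in 2SLS can also be written out by hand. The sketch below uses simulated data (all coefficients are invented) with one instrument and one endogenous regressor. It is meant to show the mechanics only: the standard errors from a naive second-stage OLS are not valid, which is one reason to use `IV2SLS` in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)                  # instrument: affects x, unrelated to u
u = rng.normal(size=n)                  # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)    # endogenous regressor (depends on u)
y = 1.0 + 2.0 * x - 1.5 * u + rng.normal(size=n)  # true slope on x is 2.0

const = np.ones(n)


def ols(X, target):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


# Naive OLS: biased because x is correlated with the omitted u.
beta_ols = ols(np.column_stack([const, x]), y)

# Stage 1: project x on the instrument (plus constant).
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)

# Stage 2: regress y on the fitted values from stage 1.
beta_2sls = ols(np.column_stack([const, x_hat]), y)

print("OLS slope: ", round(beta_ols[1], 3))
print("2SLS slope:", round(beta_2sls[1], 3))
```

In this simulation the OLS slope is pulled well below 2.0 by the confounder, while the 2SLS slope should land near the true value.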

### Common pitfalls (and quick fixes)

- **MultiIndex mismatch:** if `clusters` is not aligned to the same index as `y/X`, you’ll get errors or wrong results.
  - Fix: construct clusters from the same `df` after indexing/sorting.
- **Non-numeric dtypes:** string or object columns in `X` raise errors or produce misleading results.
  - Fix: `astype(float)` on model columns.
- **Missing data:** panels often have missing rows after merges/transforms.
  - Fix: build a modeling table with `.dropna()` for required columns.
- **Too few clusters:** cluster-robust inference is fragile with very small cluster counts.
  - Fix: treat p-values as fragile; report cluster count; consider alternative designs.
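
Several of these fixes can be applied in one pass by building a single modeling table and deriving `y`, `X`, and `clusters` from it. The sketch below uses a made-up frame; the column names follow the primer's placeholders (`y`, `x1`, `state`) and are assumptions, not the repo's real schema.

```python
import numpy as np
import pandas as pd

# Toy input with two typical problems: numbers stored as strings and missing values.
raw = pd.DataFrame({
    "fips": ["01001", "01001", "01003", "01003", "02013", "02013"],
    "year": [2018, 2019, 2018, 2019, 2018, 2019],
    "state": ["01", "01", "01", "01", "02", "02"],
    "y": ["0.16", "0.15", "0.12", np.nan, "0.09", "0.10"],
    "x1": [1.0, 1.2, 0.8, 0.9, np.nan, 1.1],
})

required = ["y", "x1", "state"]
model_df = (
    raw.assign(y=pd.to_numeric(raw["y"]))  # strings -> floats
    .dropna(subset=required)               # keep only complete rows
    .set_index(["fips", "year"])
    .sort_index()
)

# Derive everything from the same table so the indexes line up row for row.
y = model_df["y"].astype(float)
X = model_df[["x1"]].astype(float)
clusters = model_df["state"]

assert X.index.equals(y.index) and clusters.index.equals(y.index)
n_clusters = clusters.nunique()
print(len(model_df), "rows,", n_clusters, "clusters")
```

Reporting `n_clusters` alongside the estimates makes the "too few clusters" caveat visible to the reader.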


<a id="synthetic-adoption-treatment"></a>
## Synthetic adoption + treatment

### Background
A real DiD design needs a real policy change and careful context.
Here we use a **synthetic adoption schedule** so you can focus on mechanics:
- how to build treatment indicators,
- how to think about identification (parallel trends),
- and how to diagnose pre-trends.

We also create a **semi-synthetic outcome** by injecting a known post-treatment effect into a real outcome.
That gives you a ground truth target for checking the estimator.

### What you should see
- `treated` equals 1 only for treated states in post-adoption years.
- `poverty_rate_semi` differs from `poverty_rate_real` by about `true_effect` when treated.
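
Both expectations can be verified mechanically. The sketch below rebuilds the same columns on a tiny made-up panel (two states, four years) and asserts each property. The gap equals `true_effect` exactly unless the `clip(0, 1)` binds, which also presumes the rate is stored as a proportion rather than a percentage.

```python
import numpy as np
import pandas as pd

# Toy panel (values invented): state "01" adopts in 2018, state "02" never adopts.
toy = pd.DataFrame({
    "state": ["01"] * 4 + ["02"] * 4,
    "year": [2016, 2017, 2018, 2019] * 2,
    "poverty_rate": [0.20, 0.19, 0.18, 0.18, 0.15, 0.15, 0.14, 0.14],
})
adopt = {"01": 2018}
true_effect = -0.02

toy["adopt_year"] = toy["state"].map(adopt)                   # NaN for never-treated
toy["ever_treated"] = toy["adopt_year"].notna().astype(int)
toy["post"] = (toy["year"] >= toy["adopt_year"]).astype(int)  # comparisons with NaN are False
toy["treated"] = toy["ever_treated"] * toy["post"]
toy["poverty_rate_semi"] = (toy["poverty_rate"] + true_effect * toy["treated"]).clip(0, 1)

on = toy["treated"] == 1
gap = toy["poverty_rate_semi"] - toy["poverty_rate"]

# Check 1: treated == 1 only for the adopting state in post-adoption years.
assert (toy.loc[on, "state"] == "01").all()
assert (toy.loc[on, "year"] >= 2018).all()

# Check 2: the semi-synthetic outcome differs by true_effect only where treated == 1.
assert np.allclose(gap[on], true_effect)
assert np.allclose(gap[~on], 0.0)

print(toy.pivot(index="state", columns="year", values="treated"))
```

The pivot at the end gives a quick visual of the treatment pattern: a block of ones in the adopting state's post-adoption years and zeros everywhere else.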

### Interpretation prompts
- In one sentence, define the causal question this notebook is pretending to answer.
- What assumption would be needed for the TWFE DiD coefficient to be causal on the real outcome?

### Goal
Define a deterministic adoption year by state and build:
- `treated_it`
- `poverty_rate_semi` (known post-treatment effect)



### Your Turn: Load panel and create synthetic adoption


In [None]:
import numpy as np
import pandas as pd

path = PROCESSED_DIR / 'census_county_panel.csv'
if path.exists():
    df = pd.read_csv(path)
else:
    df = pd.read_csv(SAMPLE_DIR / 'census_county_panel_sample.csv')

df['fips'] = df['fips'].astype(str).str.zfill(5)  # restore leading zeros lost if read as int
df['year'] = df['year'].astype(int)
df['state'] = df['state'].astype(str).str.zfill(2)

states = sorted(df['state'].unique())
# Deterministic adoption schedule (edit if you want):
adopt = {states[0]: 2018, states[1]: 2020}  # remaining states are never-treated

df['adopt_year'] = df['state'].map(adopt)
df['ever_treated'] = df['adopt_year'].notna().astype(int)
df['post'] = ((df['year'] >= df['adopt_year']).fillna(False)).astype(int)
df['treated'] = df['ever_treated'] * df['post']

true_effect = -0.02
df['poverty_rate_real'] = df['poverty_rate'].astype(float)
df['poverty_rate_semi'] = (df['poverty_rate_real'] + true_effect * df['treated']).clip(0, 1)

df[['state', 'year', 'treated', 'poverty_rate_real', 'poverty_rate_semi']].head()



<a id="twfe-did"></a>
## TWFE DiD

### Background
The simplest multi-period DiD estimator is a TWFE regression with a treatment indicator.
Under parallel trends (and related assumptions), the coefficient on `treated` can be interpreted as an average treatment effect.
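
For intuition, the two-group, two-period case reduces to simple arithmetic on four means. The numbers below are invented for illustration.

```python
import pandas as pd

# Group-by-period mean outcomes (made-up values).
toy = pd.DataFrame({
    "group": ["treated", "treated", "control", "control"],
    "period": ["pre", "post", "pre", "post"],
    "y": [0.20, 0.17, 0.15, 0.14],
})
means = toy.pivot(index="group", columns="period", values="y")

change_treated = means.loc["treated", "post"] - means.loc["treated", "pre"]  # about -0.03
change_control = means.loc["control", "post"] - means.loc["control", "pre"]  # about -0.01

# DiD: the treated group's change minus the control group's change.
did = float(change_treated - change_control)
print(round(did, 4))  # prints -0.02
```

The control group's change stands in for what would have happened to the treated group without treatment, which is exactly what the parallel trends assumption asserts.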

### What you should see
- On the semi-synthetic outcome, the estimated `treated` coefficient should be in the neighborhood of `true_effect`.
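- Standard errors should be clustered by state, because treatment is assigned at the state level and shocks are likely correlated within states. If the data contain only a handful of states, treat the resulting p-values as fragile.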

### Interpretation prompts
- Compare the estimate to `true_effect`. Is it close? If not, why might it differ (small sample, noise, design)?
- Write the parallel trends assumption in words for this setting.

### Goal
Estimate the effect of treatment with TWFE DiD:
- county FE
- year FE
- clustered SE by state (common)



### Your Turn: Fit TWFE DiD


In [None]:
from src.causal import fit_twfe_panel_ols

# Panel index
df = df.set_index(['fips', 'year'], drop=False).sort_index()

# TODO: Fit DiD on semi-synthetic outcome
res_did = fit_twfe_panel_ols(
    df,
    y_col='poverty_rate_semi',
    x_cols=['treated'],
    entity_effects=True,
    time_effects=True,
    cluster_col='state',
)

res_did.params



<a id="event-study-leads-lags"></a>
## Event study (leads/lags)

### Background
An event study replaces a single post indicator with a set of lead/lag indicators.
This lets you:
- visualize dynamics after adoption, and
- test for pre-trends using lead coefficients.
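
One design choice is what to do with treated observations outside the event window. If dummies are built only for `k` inside the window, treated rows further out have all dummies equal to 0 and are absorbed into the omitted reference group together with the base period. A common alternative, shown here as an assumption and not as something this notebook requires, is to bin the endpoints so the first and last dummies mean "`k` at or beyond the edge". The toy frame below is made up.

```python
import numpy as np
import pandas as pd

# Toy panel: unit "a" adopts in 2017, unit "b" is never treated.
toy = pd.DataFrame({
    "unit": np.repeat(["a", "b"], 10),
    "year": list(range(2012, 2022)) * 2,
    "adopt_year": np.repeat([2017.0, np.nan], 10),
})
toy["event_time"] = toy["year"] - toy["adopt_year"]  # NaN for never-treated

lo, hi, base = -3, 3, -1
# Bin the endpoints: k <= -3 is coded as -3 and k >= 3 as 3. NaN stays NaN.
k_binned = toy["event_time"].clip(lower=lo, upper=hi)

event_cols = []
for k in range(lo, hi + 1):
    if k == base:
        continue  # omitted base period: coefficients are measured relative to k = -1
    col = f"event_{k}"
    toy[col] = (k_binned == k).astype(int)  # NaN == k is False, so never-treated rows get 0
    event_cols.append(col)

print(toy.loc[toy["unit"] == "a", ["year", "event_time"] + event_cols])
```

With binning, every treated row other than the base period has exactly one dummy switched on, so the reference group contains only `k = -1` and the never-treated units.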

### What you should see
- lead coefficients (k<0) near 0 on the semi-synthetic outcome.
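- post coefficients (k>=0) around the injected effect. If the panel extends beyond the ±3 window, treated observations outside it fall into the omitted reference group, which can pull the estimates away from `true_effect`.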

### Interpretation prompts
- Which lead coefficients would worry you most, and why?
- Explain what the base period means (why one event-time dummy is omitted).

### Goal
Estimate dynamic effects around adoption and inspect pre-trends.



### Your Turn: Build leads/lags and fit


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

df_es = df.reset_index(drop=True).copy()
df_es['event_time'] = df_es['year'] - df_es['adopt_year']

window = list(range(-3, 4))
base = -1
event_cols = []
for k in window:
    if k == base:
        continue
    col = f'event_{k}'
    df_es[col] = ((df_es['ever_treated'] == 1) & (df_es['event_time'] == k)).astype(int)
    event_cols.append(col)

df_es = df_es.set_index(['fips', 'year'], drop=False).sort_index()

res_es = fit_twfe_panel_ols(
    df_es,
    y_col='poverty_rate_semi',
    x_cols=event_cols,
    entity_effects=True,
    time_effects=True,
    cluster_col='state',
)

coefs = res_es.params.filter(like='event_')
ses = res_es.std_errors.filter(like='event_')
out = coefs.to_frame('coef').join(ses.to_frame('se'))
out['k'] = out.index.str.replace('event_', '').astype(int)
out = out.sort_values('k')

# TODO: Plot coefficient path with 95% CI
plt.errorbar(out['k'], out['coef'], yerr=1.96*out['se'], fmt='o-')
plt.axhline(0, color='gray', linestyle='--')
plt.axvline(base, color='gray', linestyle=':')
plt.xlabel('Event time (years relative to adoption)')
plt.ylabel('Effect')
plt.title('Event study (semi-synthetic)')
plt.show()



<a id="diagnostics-pre-trends-placebo"></a>
## Diagnostics: pre-trends + placebo

### Background
DiD is only as credible as its diagnostics.
In real research, this is where most of the work lives:
- are treated and control trending similarly before treatment?
- are results robust to reasonable specification changes?
- do placebo tests behave as expected?

### What you should see
- a short diagnostic result (table/plot) and a written interpretation.

### Interpretation prompts
- If the placebo finds a large effect, what does that suggest about the design?
- Why is the real outcome analysis explicitly **not** a real policy evaluation here?

### Goal
Run at least one falsification / diagnostic.

Suggestions:
- Pre-trends: are lead coefficients near 0?
- Placebo: shift adoption years earlier for treated states.
- Re-run on the real outcome (`poverty_rate_real`) and reflect on why it is not causal.
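
As a rough template for the placebo suggestion, the sketch below works on a simulated panel with a single adoption date, so a simple 2x2 DiD is appropriate. It drops all truly treated years, pretends adoption happened two years earlier, and checks that the "effect" is near zero. All names and numbers are invented for illustration. In the notebook you would adapt the idea to `df` and re-fit with the TWFE helper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
years = np.arange(2012, 2022)
units = np.arange(200)
real_adopt, true_effect = 2018, -0.02

panel = pd.DataFrame([(u, t) for u in units for t in years], columns=["unit", "year"])
panel["ever_treated"] = (panel["unit"] < 100).astype(int)
panel["post_real"] = (panel["year"] >= real_adopt).astype(int)
panel["treated"] = panel["ever_treated"] * panel["post_real"]

unit_fe = rng.normal(0.15, 0.03, size=units.size)
year_fe = -0.002 * (years - years[0])  # common trend, so parallel trends holds by construction
panel["y"] = (
    unit_fe[panel["unit"].to_numpy()]
    + year_fe[(panel["year"] - years[0]).to_numpy()]
    + true_effect * panel["treated"]
    + rng.normal(0, 0.005, size=len(panel))
)


def did_2x2(df: pd.DataFrame, post_col: str) -> float:
    """(treated post - treated pre) - (control post - control pre) from group means."""
    m = df.groupby(["ever_treated", post_col])["y"].mean()
    return float((m.loc[(1, 1)] - m.loc[(1, 0)]) - (m.loc[(0, 1)] - m.loc[(0, 0)]))


est_real = did_2x2(panel, "post_real")

# Placebo: keep only pre-adoption years and pretend adoption happened 2 years earlier.
pre = panel[panel["year"] < real_adopt].copy()
pre["post_fake"] = (pre["year"] >= real_adopt - 2).astype(int)
est_placebo = did_2x2(pre, "post_fake")

print("real:", round(est_real, 4), "placebo:", round(est_placebo, 4))
```

A placebo estimate far from zero would suggest the treated and control groups were already diverging before adoption, which undermines parallel trends.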



### Your Turn: One diagnostic


In [None]:
# TODO: Implement one diagnostic and summarize what you found.
...



### Note: Staggered treatment timing and recent DiD literature

The TWFE DiD estimator used here recovers a clean average effect when adoption happens at a single date, or when treatment effects are homogeneous across cohorts and over time. Recent econometrics research has shown that standard TWFE can produce biased estimates under **staggered adoption** with **heterogeneous treatment effects**:

- **Callaway & Sant'Anna (2021)**: proposes group-time ATTs that avoid "forbidden comparisons" (using already-treated units as controls).
- **Sun & Abraham (2021)**: shows TWFE event studies can be contaminated by treatment effect heterogeneity across cohorts.
- **de Chaisemartin & D'Haultfoeuille (2020)**: demonstrates when TWFE gives a weighted average with potentially negative weights.

**Practical implication**: In this notebook's synthetic setup (only 2 adoption years and a constant injected effect), the issue is limited. But in real research with many adoption dates, consider using robust DiD estimators (e.g., `did` package in R, or the `pydid` Python package).

**Key takeaway**: Always check whether your setting involves staggered adoption and heterogeneous effects. If so, standard TWFE may not give you what you think.
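
To make the idea of "clean" comparisons concrete, the sketch below computes one DiD per adoption cohort, using only never-treated units as controls. It is a simplified illustration in the spirit of group-time ATTs on noise-free toy data, not an implementation of any of the estimators cited above. The cohort labels and effect sizes are made up.

```python
import numpy as np
import pandas as pd

years = np.arange(2014, 2023)
adopt_years = {"A": 2018.0, "B": 2020.0, "C": np.nan}  # unit C is never treated
effects = {"A": -0.02, "B": -0.04}                      # heterogeneous across cohorts

rows = []
for unit, g in adopt_years.items():
    for t in years:
        trend = 0.15 - 0.001 * (t - years[0])     # common trend for all units
        d = int(not np.isnan(g) and t >= g)       # treatment indicator
        rows.append({"unit": unit, "year": int(t), "adopt_year": g,
                     "y": trend + effects.get(unit, 0.0) * d})
panel = pd.DataFrame(rows)


def pre_post_change(df: pd.DataFrame, g: float) -> float:
    """Mean outcome in years >= g minus mean outcome in years < g."""
    return float(df.loc[df["year"] >= g, "y"].mean() - df.loc[df["year"] < g, "y"].mean())


never = panel[panel["adopt_year"].isna()]
att = {}
for g in sorted(panel["adopt_year"].dropna().unique()):
    cohort = panel[panel["adopt_year"] == g]
    # Compare each cohort only with never-treated units over the same calendar split.
    att[int(g)] = pre_post_change(cohort, g) - pre_post_change(never, g)

print({g: round(v, 4) for g, v in att.items()})
```

Each cohort's own effect is recovered separately, so no already-treated unit is ever used as a control. How to aggregate the cohort-level estimates is a separate choice that the cited papers discuss.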

<a id="checkpoint-self-check"></a>
## Checkpoint (Self-Check)
Run a few asserts and write 2-3 sentences summarizing what you verified.



In [None]:
import pandas as pd

# Expected output: (see notebook front matter)
# TODO: If you created a panel DataFrame, verify the indexing + core columns.
# Example (adjust variable names):
# assert isinstance(panel.index, pd.MultiIndex)
# assert panel.index.names[:2] == ['fips', 'year']
# assert panel['year'].astype(int).between(1900, 2100).all()
# assert panel['fips'].astype(str).str.len().eq(5).all()
#
# TODO: Write 2-3 sentences:
# - What is the identification assumption for your causal estimate?
# - What diagnostic/falsification did you run?
...



## Extensions (Optional)
- Try one additional variant beyond the main path (different features, different split, different model).
- Write down what improved, what got worse, and your hypothesis for why.



## Reflection
- What did you assume implicitly (about timing, availability, stationarity, or costs)?
- If you had to ship this model, what would you monitor?



<a id="solutions-reference"></a>
## Solutions (Reference)

Try the TODOs first. Use these only to unblock yourself or to compare approaches.

<details><summary>Solution: Synthetic adoption + treatment</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 02_difference_in_differences_event_study — Synthetic adoption + treatment
import numpy as np
import pandas as pd

df = pd.read_csv(SAMPLE_DIR / 'census_county_panel_sample.csv')
df['fips'] = df['fips'].astype(str).str.zfill(5)  # restore leading zeros lost if read as int
df['year'] = df['year'].astype(int)
df['state'] = df['state'].astype(str).str.zfill(2)

states = sorted(df['state'].unique())
# Deterministic synthetic adoption schedule
adopt = {states[0]: 2018, states[1]: 2020}  # remaining states are never-treated

df['adopt_year'] = df['state'].map(adopt)
df['ever_treated'] = df['adopt_year'].notna().astype(int)
df['post'] = ((df['year'] >= df['adopt_year']).fillna(False)).astype(int)
df['treated'] = df['ever_treated'] * df['post']

# Semi-synthetic outcome: add a known post effect.
true_effect = -0.02
df['poverty_rate_real'] = df['poverty_rate'].astype(float)
df['poverty_rate_semi'] = (df['poverty_rate_real'] + true_effect * df['treated']).clip(0, 1)

df[['state', 'year', 'treated', 'poverty_rate_real', 'poverty_rate_semi']].head()
```

</details>

<details><summary>Solution: TWFE DiD</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 02_difference_in_differences_event_study — TWFE DiD
from src.causal import fit_twfe_panel_ols

df = df.set_index(['fips', 'year'], drop=False).sort_index()

res = fit_twfe_panel_ols(
    df,
    y_col='poverty_rate_semi',
    x_cols=['treated'],
    entity_effects=True,
    time_effects=True,
    cluster_col='state',
)
res.params
```

</details>

<details><summary>Solution: Event study (leads/lags)</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 02_difference_in_differences_event_study — Event study (leads/lags)
import numpy as np

df_es = df.reset_index(drop=True).copy()
df_es['event_time'] = df_es['year'] - df_es['adopt_year']

window = list(range(-3, 4))
base = -1
event_cols = []
for k in window:
    if k == base:
        continue
    col = f'event_{k}'
    df_es[col] = ((df_es['ever_treated'] == 1) & (df_es['event_time'] == k)).astype(int)
    event_cols.append(col)

df_es = df_es.set_index(['fips', 'year'], drop=False).sort_index()

res_es = fit_twfe_panel_ols(
    df_es,
    y_col='poverty_rate_semi',
    x_cols=event_cols,
    entity_effects=True,
    time_effects=True,
    cluster_col='state',
)

coefs = res_es.params.filter(like='event_')
ses = res_es.std_errors.filter(like='event_')
out = (coefs.to_frame('coef').join(ses.to_frame('se')))
out
```

</details>

<details><summary>Solution: Diagnostics: pre-trends + placebo</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 02_difference_in_differences_event_study — Diagnostics: pre-trends + placebo
# Pre-trends: inspect lead coefficients (event_-3, event_-2).
# Placebo: shift adoption earlier and confirm estimated effect shrinks toward 0.
```

</details>

