# Difference-in-Differences
In Difference-in-Differences (DiD) specifications, we compare changes in outcomes over
time between treated and untreated units to isolate the causal effect of a treatment.
The key identifying assumption is that in the absence of treatment, the treated group 
would have experienced the same outcome trend as the control group. This is known as 
the **parallel trends** assumption, which formally states that:
$$
    \mathbb{E}[ Y_{i,t_1}(0) - Y_{i,t_0}(0) \mid \text{treated}]
    =
    \mathbb{E}[ Y_{i,t_1}(0) - Y_{i,t_0}(0) \mid \text{control}]
$$
for any two periods $t_0$ and $t_1$.

This means that if the treated group had not received treatment, its expected change in
outcomes over time would have mirrored that of the control group. Graphically, this
implies that the untreated potential outcomes for both groups would follow parallel
paths.
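
As a minimal sketch of what this assumption buys us, the snippet below computes a two-group, two-period DiD estimate from group means. The numbers and the column names (`group`, `period`, `y`) are made up for illustration and are not taken from any dataset used here. Under parallel trends, the control group's change stands in for the change the treated group would have experienced without treatment.

```python
import pandas as pd

# Hypothetical mean outcomes by group and period (illustrative numbers only)
toy = pd.DataFrame({
    "group":  ["treated", "treated", "control", "control"],
    "period": [0, 1, 0, 1],
    "y":      [10.0, 15.0, 8.0, 10.0],
})

# Rows: group, columns: period
means = toy.pivot(index="group", columns="period", values="y")

# Change over time within each group
change = means[1] - means[0]

# DiD: treated change minus control change (the assumed counterfactual trend)
did = change["treated"] - change["control"]
print(did)
```

The treated group's raw change mixes the treatment effect with the common trend; subtracting the control group's change removes the trend only if the two groups would have moved in parallel.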

This model also assumes that once treatment begins, treated units remain treated. In
other words, a treated unit cannot opt out of treatment. This is a realistic assumption
for settings like state-level policies, but it may not hold in scientific experiments,
where participants can drop out or lie about taking a drug.

## Two-way fixed effects
Two-way fixed effects (2FE) are a common extension of the DiD model. The name sounds
fancy, but it only means that we add unit-level ($\alpha_i$) and time-level
($\lambda_t$) fixed effects to the base specification. The unit effects absorb
time-invariant differences across units, and the time effects absorb shocks common to
all units in a given period.

## 1. Basic Model
We start off by assuming that the treatment effect is homogeneous across all treated
units and all post-treatment periods.
$$
    Y_{it} = \alpha_i + \lambda_t + \delta D_{it} + \varepsilon_{it}
$$

- $Y_{it}$: outcome for unit $i$ at time $t$
- $\alpha_i$: unit FE
- $\lambda_t$: time FE
- $D_{it}$: Binary treatment indicator (1 if $i$ is treated in period $t$, 0 otherwise)
- $\delta$: Average treatment effect on the treated

This specification imposes a strong assumption: the treatment effect is immediate,
constant over time, and identical across units. It does not allow for anticipation,
lagged effects, or treatment effect heterogeneity.
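
To make the specification concrete, here is a hedged sketch on simulated data (not the notebook's dataset). It assumes a balanced two-period panel in which half of the units are treated in the second period. All names (`panel`, `twoway_demean`, `true_delta`) are hypothetical. In this special case, the coefficient on $D_{it}$ after removing unit and period means coincides with the simple difference in mean changes between groups.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_units, true_delta = 200, 2.0  # hypothetical sample size and effect

# Simulated balanced panel: two periods, first half of the units treated in t=1
panel = pd.DataFrame({
    "i": np.repeat(np.arange(n_units), 2),
    "t": np.tile([0, 1], n_units),
})
panel["treated"] = (panel["i"] < n_units // 2).astype(int)
panel["D"] = panel["treated"] * panel["t"]               # D_it
alpha = rng.normal(size=n_units)[panel["i"].to_numpy()]  # unit effects
lam = 0.5 * panel["t"]                                   # time effects
noise = rng.normal(scale=0.1, size=len(panel))
panel["y"] = alpha + lam + true_delta * panel["D"] + noise


def twoway_demean(col):
    """Remove unit and period means (exact for balanced panels)."""
    s = panel[col]
    unit_mean = s.groupby(panel["i"]).transform("mean")
    time_mean = s.groupby(panel["t"]).transform("mean")
    return s - unit_mean - time_mean + s.mean()


y_dm, d_dm = twoway_demean("y"), twoway_demean("D")
delta_fe = (d_dm * y_dm).sum() / (d_dm ** 2).sum()  # OLS slope on demeaned data

# 2x2 difference in mean changes
m = panel.groupby(["treated", "t"])["y"].mean().unstack()
delta_did = (m.loc[1, 1] - m.loc[1, 0]) - (m.loc[0, 1] - m.loc[0, 0])

print(round(delta_fe, 3), round(delta_did, 3))
```

The double-demeaning shortcut is only exact for balanced panels; with missing observations, a dedicated panel estimator is the safer route.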

## 2. Heterogeneous Treatment Effects
Suppose treatment begins at time $t^* \in \{0, 1, ..., T\}$. Then, the model
$$
    Y_{it} = \alpha_i + \lambda_t + \sum_{\substack{s = 0 \\ s \ne t^* - 1}}^T
    \delta_s \mathbf{1}\{t = s\} D_i + \varepsilon_{it}
$$
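allows for treatment effects that are heterogeneous over time. Here $D_i$ indicates
whether unit $i$ belongs to the treatment group, and period $t^* - 1$ is omitted as the
reference period.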

- This model allows us to test the parallel trends assumption in pre-treatment periods!
- What happens if some units begin treatment at different periods?
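
The regressors of this model are period dummies interacted with treatment-group membership, leaving out the period just before treatment. A minimal sketch of how they could be built, assuming a hypothetical toy panel with $T = 5$, $t^* = 3$, and made-up column names (`D_i`, `D_x_t0`, ...):

```python
import pandas as pd

t_star = 3             # hypothetical first treated period
periods = range(0, 6)  # hypothetical periods 0..5

# Toy panel: 4 units observed in every period; units 0 and 1 are treated
panel = pd.DataFrame(
    [(i, t) for i in range(4) for t in periods], columns=["i", "t"]
)
panel["D_i"] = (panel["i"] < 2).astype(int)

# One regressor per period s != t* - 1: 1{t = s} * D_i
for s in periods:
    if s == t_star - 1:
        continue  # omitted reference period
    panel[f"D_x_t{s}"] = ((panel["t"] == s) & (panel["D_i"] == 1)).astype(int)

interaction_cols = [c for c in panel.columns if c.startswith("D_x_t")]
print(interaction_cols)
```

The coefficients on the columns for $s < t^* - 1$ are the ones to inspect when checking pre-treatment trends: if parallel trends holds, they should be close to zero.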

## 3. Event Study
Model 2 is written in calendar time $t$, which works well when every unit in the
treatment group begins treatment in the same period. However, if at least one treated
unit begins treatment in a different period than the rest, then no single $t^*$
applies to all units.

To solve this technical issue, we re-index time periods so that they are measured in
**periods away from treatment**. In other words, if $i$ began treatment at time $t_i$,
we set $G_i = t_i$ and define:
$$k = t - G_i$$
and
$$
    Y_{it} = \alpha_i + \lambda_t +
    \sum_{\substack{k = k_{\min} \\ k \ne -1}}^{k_{\max}}
    \beta_k \mathbf{1}\{t - G_i = k\} + \varepsilon_{it}
$$

Note that $G_i$ is not defined for control units! We have two options:
1. Exclude control units
2. Include all observations, but let $k = \text{NaN}$ for control units
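
The second option can be sketched as follows, assuming a hypothetical toy panel with staggered adoption in which `G` holds each unit's first treated period and is missing for the control unit. Because comparisons with `NaN` evaluate to `False`, control units end up with zeros in every event-time dummy and only help identify the fixed effects.

```python
import numpy as np
import pandas as pd

# Toy panel: units 0 and 1 start treatment in periods 2 and 3; unit 2 is a control
panel = pd.DataFrame(
    [(i, t) for i in range(3) for t in range(5)], columns=["i", "t"]
)
panel["G"] = panel["i"].map({0: 2, 1: 3, 2: np.nan})

# Event time: periods away from treatment (NaN for control units)
panel["k"] = panel["t"] - panel["G"]

# One dummy per event time, omitting the reference period k = -1
for k in sorted(panel["k"].dropna().unique()):
    if k == -1:
        continue
    panel[f"k_{int(k)}"] = (panel["k"] == k).astype(int)

event_cols = [c for c in panel.columns if c.startswith("k_")]
print(event_cols)
```

Building the dummies by hand keeps the treatment of missing event times explicit, which is the point of option 2.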

---

Imports

In [None]:
import os
import pandas as pd
from linearmodels.panel import PanelOLS

Load data and set indexes

In [None]:
# Load data
df = pd.read_csv(os.path.join('..', 'data', 'card-krueger.csv'))

# Set idx
df = df.set_index(['i', 't'])

# Get dummies from encoded columns
df = pd.get_dummies(data=df, columns=['chain', 'meals'], dtype=int, drop_first=False)

# Masks to keep full panel
mask = df['type_1'].eq(1)  # Answered 2nd interview
mask_ft = ~df['empft'].isna().groupby(level=0).any()  # Full empft panel
mask_pt = ~df['emppt'].isna().groupby(level=0).any()  # Full emppt panel
mask_mg = ~df['nmgrs'].isna().groupby(level=0).any()  # Full nmgrs panel

# Declare D_{it}
df['Dit'] = df['state'] * df.index.get_level_values(1)  # 1=(NJ & t=1), 0=otherwise

1. $E(Y_{it}) = \alpha_i + \lambda_t + \delta D_{it}$

In [None]:
# Declare model
m0 = PanelOLS(
    dependent=df.loc[mask & mask_ft, 'empft'],
    exog=df.loc[mask & mask_ft, 'Dit'],
    entity_effects=True,
    time_effects=True
)

# Fit model, clustering standard errors by unit (entity)
res0 = m0.fit(cov_type='clustered', cluster_entity=True)

# View results
print(res0.summary)

2. $E(Y_{it}) = \alpha_i + \lambda_t + \gamma X_{it} + \delta D_{it}$

In [None]:
# Declare controls
X1 = ['open', 'hrsopen', 'Dit']  # Why not pa_1, pa_2, etc.?

# Declare model
m1 = PanelOLS(
    dependent=df.loc[mask & mask_ft, 'empft'],
    exog=df.loc[mask & mask_ft, X1],
    entity_effects=True,
    time_effects=True
)

# Fit model, clustering standard errors by unit (entity)
res1 = m1.fit(cov_type='clustered', cluster_entity=True)

# View results
print(res1.summary)

3. $E(Y_{it}) = \alpha_i + \lambda_t + \gamma \tilde{X}_{it} + \delta D_{it}$

In [None]:
# Declare controls
X2 = ['open', 'hrsopen', 'pfry', 'psoda', 'pentree', 'nregs', 'Dit']
mask_X2 = ~df[X2].isna().any(axis=1).groupby(level=0).any()  # Full panel for all X2 columns

# Declare model
m2 = PanelOLS(
    dependent=df.loc[mask & mask_ft & mask_X2, 'empft'],
    exog=df.loc[mask & mask_ft & mask_X2, X2],
    entity_effects=True,
    time_effects=True
)

# Fit model, clustering standard errors by unit (entity)
res2 = m2.fit(cov_type='clustered', cluster_entity=True)

# View results
print(res2.summary)