# CPPGA3254 - Econometrics I
PPGECO — 2nd Semester, 2025 <br>
University of Brasília (UnB) <br>
Prof. Dr. Daniel Cajueiro <br>
Student: Matheus Fiuza de Alencastro — 251100926 <br>

This code is provided to satisfy the requirements of the assigned exercise "Exercício Empírico do Curso de Econometria I (2025)" and constitutes an attempt to replicate the analysis by D. Mark Anderson, Yang Liang, and Joseph J. Sabia on mandatory seatbelt laws and traffic fatalities, published in the Journal of Applied Econometrics.

> Anderson, D. M., Liang, Y., & Sabia, J. J. (2024). Mandatory seatbelt laws and traffic fatalities: A reassessment. Journal of Applied Econometrics, 39(3), 513–521. https://doi.org/10.1002/jae.3026

All data and replication code are available from the authors at: https://journaldata.zbw.eu/dataset/b5f4c0f6-dd9e-4cdc-bea4-04a7efdb50dd/resource/98aa0a14-b449-46b4-8cac-23a1ef0094fd


## Intro

The referenced article is a reassessment of Cohen and Einav (2003), which found that mandatory seatbelt laws were associated with a 4–6% reduction in traffic fatalities among motor vehicle occupants. In this reassessment, the authors replicate the estimates of the original study, which applied a two-way fixed effects (TWFE) model, while expanding the time window of analysis. Additionally, they employ event-study models and alternative estimators to address potential biases inherent in TWFE. Their results are consistent with the findings of Cohen and Einav (2003), although the authors propose some reconsiderations.

The main objective of this exercise is to replicate the TWFE estimator robust to heterogeneous treatment effects, as proposed by Sun and Abraham (2021) and explored by Anderson et al. (2024). Because the original authors obtained their results in Stata, the primary contribution of this exercise is the translation and implementation of their analysis in Python. All information regarding package versions and dependencies is available in the pyproject.toml file.

## Replication Code

In [1]:
# Package Imports

import pandas as pd
import statsmodels.formula.api as smf
import numpy as np

# Data Loading

data = pd.read_csv('data/master_1983to2019.csv')


### Cohorts and Treatment Identification

For the TWFE specification robust to heterogeneous treatment effects, we estimate one parameter for each interaction between cohort and relative time. To create the cohorts, we group the states that passed primary seatbelt laws (PSLs) in the same year. For this estimation, we consider as treated only those states whose first seatbelt law was a primary law. The control group consists of the states that never passed a PSL. Switchers (i.e., states that first approved a secondary seatbelt law and later moved to a primary law) are discarded from the sample. This specification follows Anderson et al. (2024).

In [2]:
# Cohorts and Treatment Identification

# Identify first year of primary law
psl_years = data[data['primary'] == 1].groupby('fips')['year'].min().reset_index() # fips is the state code
psl_years.rename(columns={'year':'cohort_year'}, inplace=True)
data = data.merge(psl_years, on='fips', how='left')

# Identify first year of secondary law
ssl_years = data[data['secondary'] == 1].groupby('fips')['year'].min().reset_index()
ssl_years.rename(columns={'year':'ssl_start'}, inplace=True)
data = data.merge(ssl_years, on='fips', how='left')

# Define treated
data['is_treated'] = np.where(
    (data['cohort_year'].notna()) & (data['ssl_start'].isna()), 1, 0
) # PSL and NO SSL

# Define control
data['is_control'] = np.where(
    (data['cohort_year'].isna()), 1, 0
)

# Filter out Switchers
df_reg = data[(data['is_treated'] == 1) | (data['is_control'] == 1)].copy()
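
To make the three groups concrete, the same classification rule can be tried on a small made-up panel. This is only an illustrative sketch: the `toy` data and the `group` column are hypothetical and not part of the replication data, while the column names `fips`, `year`, `primary`, and `secondary` mirror those used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: three states observed over four years.
toy = pd.DataFrame({
    "fips": [1] * 4 + [2] * 4 + [3] * 4,
    "year": [1990, 1991, 1992, 1993] * 3,
    "primary":   [0, 0, 1, 1,   0, 0, 0, 1,   0, 0, 0, 0],
    "secondary": [0, 0, 0, 0,   0, 1, 1, 0,   0, 1, 1, 1],
})

# First year in which each law is observed, one value per state.
first_psl = toy.loc[toy["primary"] == 1].groupby("fips")["year"].min().rename("cohort_year")
first_ssl = toy.loc[toy["secondary"] == 1].groupby("fips")["year"].min().rename("ssl_start")
toy = toy.join(first_psl, on="fips").join(first_ssl, on="fips")

# Same rule as above: treated = primary law and never a secondary law,
# control = never a primary law, everything else = switcher.
toy["group"] = np.select(
    [toy["cohort_year"].notna() & toy["ssl_start"].isna(), toy["cohort_year"].isna()],
    ["treated", "control"],
    default="switcher",
)

groups = {int(k): v for k, v in toy.groupby("fips")["group"].first().items()}
print(groups)
```

In this toy panel, state 1 is treated, state 2 is a switcher, and state 3 (secondary law only) stays in the control group, which is why `secondary` still appears among the regression controls.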


### Time Identification

For the model, time must be defined relative to treatment. Also, following the article's specification, relative time is binned at the endpoints: all periods 4 or more years before treatment share a single dummy (relative time -4), and all periods 4 or more years after treatment share another (relative time +4).

In [3]:
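# Years elapsed since the primary law was adopted (NaN for never-treated control states)
df_reg['rel_year'] = df_reg['year'] - df_reg['cohort_year']
# Bin the endpoints: -4 means "4 or more years before", +4 means "4 or more years after"
df_reg['rel_year_binned'] = df_reg['rel_year'].clip(lower=-4, upper=4)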
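
As a quick illustration of the endpoint binning, the snippet below applies the same `clip` call to a handful of made-up relative years (the values are hypothetical). It also shows that missing relative time, as for never-treated states, stays missing and never matches a relative-time value.

```python
import numpy as np
import pandas as pd

# Hypothetical relative years; NaN stands for a never-treated state.
rel_year = pd.Series([-7, -4, -1, 0, 3, 9, np.nan])

# Values below -4 collapse to -4, values above +4 collapse to +4.
binned = rel_year.clip(lower=-4, upper=4)
print(binned.tolist())

# Equality comparisons with NaN are False, so control rows get 0 in every dummy.
is_plus4 = binned == 4
print(is_plus4.tolist())
```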

### Variables and Formula

The model robust to heterogeneous treatment effects requires one dummy for each cohort-by-relative-year interaction. Note that the reference period (t = -1) is skipped to avoid perfect multicollinearity.
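
In equation form, the regression built in this notebook can be sketched as follows. The notation is an assumption introduced here for illustration: $y_{st}$ is the outcome `adj_occrate` for state $s$ in year $t$, $E_s$ is the adoption cohort (`cohort_year`), $\ell_{st}$ is the binned relative time, $X_{st}$ collects the control variables, and $\alpha_s$ and $\lambda_t$ are state and year fixed effects.

$$
y_{st} = \alpha_s + \lambda_t + \sum_{e \in \mathcal{E}} \; \sum_{\substack{\ell = -4 \\ \ell \neq -1}}^{4} \delta_{e,\ell} \, \mathbf{1}\{E_s = e\} \, \mathbf{1}\{\ell_{st} = \ell\} + X_{st}'\beta + \varepsilon_{st},
\qquad
\ell_{st} = \max\{-4, \min\{4, \, t - E_s\}\}
$$

Each $\delta_{e,\ell}$ is a cohort-specific effect at relative time $\ell$, measured against the omitted period $\ell = -1$; never-treated states have all interaction dummies equal to zero.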

In [4]:
interaction_vars = []

cohorts = df_reg[df_reg['is_treated']==1]['cohort_year'].unique()
cohorts.sort()

for cohort in cohorts:
    for rel_t in range(-4, 5): # Range -4 to +4
        if rel_t == -1: continue # Skip reference period (t = -1)
        
        col_name = f'att_C{int(cohort)}_T{rel_t}'.replace("-", "m")
        
        # Create Dummy: 1 if unit is in this cohort AND relative time matches
        df_reg[col_name] = np.where(
            (df_reg['cohort_year'] == cohort) & (df_reg['rel_year_binned'] == rel_t), 
            1, 0
        )
        interaction_vars.append(col_name)

control_vars = [
    'secondary', 'ln_pctblack', 'ln_pcthisp', 'ln_medinc', 'ln_age', 
    'ln_gastax', 'ln_violentcrimerate', 'ln_propcrimerate', 
    'ln_unemprate', 'ln_ruralvmt', 'ln_urbanvmt', 'speed65',
    'speed70', 'mlda21', 'bac8', 'ln_ruralden', 'ln_urbanden'
]

formula = f"adj_occrate ~ {' + '.join(interaction_vars)} + {' + '.join(control_vars)} + C(fips) + C(year)"
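
One practical caveat: if a cohort adopted its law close to the first or last sample year, some cohort-by-relative-time cells may contain no observations, leaving an all-zero column in the design matrix. A minimal sketch of a check is given below; the helper `nonempty_columns` and the toy column names are hypothetical and only illustrate the idea.

```python
import pandas as pd

def nonempty_columns(frame, columns):
    """Return the subset of `columns` whose dummy equals 1 at least once."""
    return [col for col in columns if frame[col].sum() > 0]

# Hypothetical dummies for a cohort that is never observed 4+ years before treatment.
toy = pd.DataFrame({
    "att_C1984_Tm4": [0, 0, 0],
    "att_C1984_T0":  [1, 0, 0],
    "att_C1984_T4":  [0, 1, 1],
})

kept = nonempty_columns(toy, list(toy.columns))
dropped = sorted(set(toy.columns) - set(kept))
print(kept, dropped)
```

Applied to the notebook's own objects, the same helper could be called on the regression frame and the list of interaction names before the formula string is assembled.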

### Regression

Standard errors are clustered by state (fips), also following Anderson et al. (2024).

In [5]:
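# Drop incomplete rows first so the cluster groups align with the estimation sample
df_model = df_reg.dropna(subset=['adj_occrate'] + interaction_vars + control_vars)
model = smf.ols(formula, data=df_model).fit(
    cov_type='cluster',
    cov_kwds={'groups': df_model['fips']}  # cluster at the state level
)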
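
The regression returns one coefficient per cohort and relative period. Sun and Abraham (2021) summarize these as a weighted average across cohorts for each relative period. The sketch below shows one way this aggregation could be done; it is not the authors' Stata routine. The function `aggregate_by_relative_time` is hypothetical, the weights are assumed to be each cohort's share of observations in the corresponding cell, and the standard errors treat those weights as fixed. The toy coefficients and covariance matrix are made up.

```python
import re
import numpy as np
import pandas as pd

def aggregate_by_relative_time(params, cov, cell_counts):
    """Weighted average of cohort-specific coefficients for each relative period.

    params      : Series of coefficients labelled like 'att_C1990_Tm2'
    cov         : DataFrame covariance matrix with the same labels
    cell_counts : Series with the number of observations in each cell
    """
    pattern = re.compile(r"^att_C(\d+)_T(m?\d+)$")
    rows = []
    for name in params.index:
        match = pattern.match(name)
        if match:  # skip controls, fixed effects, and the intercept
            rows.append((name, int(match.group(2).replace("m", "-"))))
    terms = pd.DataFrame(rows, columns=["name", "rel_t"])

    out = {}
    for rel_t, grp in terms.groupby("rel_t"):
        names = grp["name"].tolist()
        w = cell_counts[names].to_numpy(dtype=float)
        w = w / w.sum()  # assumed weights: cohort shares within the relative period
        estimate = float(w @ params[names].to_numpy())
        variance = float(w @ cov.loc[names, names].to_numpy() @ w)
        out[rel_t] = {"estimate": estimate, "se": np.sqrt(variance)}
    return pd.DataFrame(out).T

# Made-up inputs, only to show the mechanics.
params = pd.Series({"att_C1990_T0": -0.10, "att_C2000_T0": -0.20,
                    "att_C1990_Tm2": 0.02, "Intercept": 1.0})
cov = pd.DataFrame(np.eye(4) * 0.01, index=params.index, columns=params.index)
counts = pd.Series({"att_C1990_T0": 3, "att_C2000_T0": 1, "att_C1990_Tm2": 3})

result = aggregate_by_relative_time(params, cov, counts)
print(result)
```

With the fitted model, `model.params` and `model.cov_params()` could be passed as the first two arguments, and the column sums of the interaction dummies in the estimation sample as the cell counts.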

## References

Anderson, D. M., Liang, Y., & Sabia, J. J. (2024). Mandatory seatbelt laws and traffic fatalities: A reassessment. Journal of Applied Econometrics, 39(3), 513–521. https://doi.org/10.1002/jae.3026

Cohen, A., & Einav, L. (2003). The effects of mandatory seat belt laws on driving behavior and traffic fatalities. Review of Economics and Statistics, 85(4), 828–843. https://doi.org/10.1162/003465303772815754

Facure, M. (n.d.). The Diff-in-Diff Saga. In Causal Inference for the Brave and True. Accessed December 2025. https://matheusfacure.github.io/python-causality-handbook/24-The-Diff-in-Diff-Saga.html

Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics, 225(2), 175–199. https://doi.org/10.1016/j.jeconom.2020.09.006


