# Case study based on Card and Krueger (1993), which estimates the causal effect of an increase in the state minimum wage on employment.

* On April 1, 1992, New Jersey raised its state minimum wage from 4.25 USD to 5.05 USD, while the minimum wage in Pennsylvania stayed at 4.25 USD.
* Employment in fast-food restaurants (total number of employees per restaurant) in PA (`state` = 0) and NJ (`state` = 1) was recorded in February 1992, before the increase, and in November 1992, after it.
* After removing rows with missing values, 384 restaurants remain.
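The cleaning step itself is not part of this notebook. A minimal sketch of what it could look like, assuming the raw file has the same three columns and that a restaurant is dropped whenever either wave is missing (the values below are invented for illustration, not taken from the survey). Keeping only restaurants observed in both waves leaves a balanced panel, which the before/after comparison relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: same columns as employment.csv, invented values
raw = pd.DataFrame({
    'state': [0, 0, 1, 1, 1],
    'total_emp_feb': [30.0, np.nan, 20.0, 15.0, 18.0],
    'total_emp_nov': [28.0, 12.0, np.nan, 17.5, 21.0],
})

# Drop restaurants with a missing count in either wave (balanced panel)
clean = raw.dropna(subset=['total_emp_feb', 'total_emp_nov']).reset_index(drop=True)

print(len(clean))  # 3
```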

Original paper: Card, D., & Krueger, A. B. (1993). Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania. https://davidcard.berkeley.edu/papers/njmin-aer.pdf

## 1. Import libraries and data

In [1]:
import pandas as pd
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols

# Suppress all warnings for the rest of the session
import warnings
warnings.filterwarnings('ignore')
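A global `filterwarnings('ignore')` also hides useful messages, such as pandas' `SettingWithCopyWarning` when a new column is assigned to a slice of a DataFrame. A narrower option, sketched here with the standard-library `warnings` module, is to silence warnings only inside a `with` block; the function `noisy` is a hypothetical stand-in for any call that warns.

```python
import warnings

def noisy():
    # Hypothetical stand-in for a call that emits a warning
    warnings.warn('demo warning', UserWarning)
    return 42

# Warnings are ignored only inside this block; the previous filters are restored on exit
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    result = noisy()

# Without the 'ignore' filter the warning is emitted again; record it to show that
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    noisy()

print(result, len(caught))  # 42 1
```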

In [2]:
# Read data
df = pd.read_csv('employment.csv')

df.head()

|   | state | total_emp_feb | total_emp_nov |
|--:|------:|--------------:|--------------:|
| 0 | 0 | 40.5 | 24.0 |
| 1 | 0 | 13.75 | 11.5 |
| 2 | 0 | 8.5 | 10.5 |
| 3 | 0 | 34.0 | 20.0 |
| 4 | 0 | 24.0 | 35.5 |


In [3]:
# Another function for exploring the data: column types and non-null counts

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 384 entries, 0 to 383
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   state          384 non-null    int64  
 1   total_emp_feb  384 non-null    float64
 2   total_emp_nov  384 non-null    float64
dtypes: float64(2), int64(1)
memory usage: 9.1 KB


## 2. Descriptive Analysis

In [4]:
# What is the mean number of employees per restaurant in each state?

df.groupby('state').mean()

| state | total_emp_feb | total_emp_nov |
|------:|--------------:|--------------:|
| 0 | 23.38 | 21.096667 |
| 1 | 20.430583 | 20.897249 |


In [5]:
# Check by extracting the mean of each group explicitly
# state 0 = PA (control group), state 1 = NJ (treatment group)

means = df.groupby('state').mean()  # compute the group means once

mean_emp_pa_before = means.loc[0, 'total_emp_feb']
mean_emp_pa_after = means.loc[0, 'total_emp_nov']
mean_emp_nj_before = means.loc[1, 'total_emp_feb']
mean_emp_nj_after = means.loc[1, 'total_emp_nov']

print(f'mean PA employment before: {mean_emp_pa_before:.2f}')
print(f'mean PA employment after: {mean_emp_pa_after:.2f}')
print(f'mean NJ employment before: {mean_emp_nj_before:.2f}')
print(f'mean NJ employment after: {mean_emp_nj_after:.2f}')

mean PA employment before: 23.38
mean PA employment after: 21.10
mean NJ employment before: 20.43
mean NJ employment after: 20.90
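The DiD in the next cell is plain arithmetic on these four cell means. A minimal sketch as a standalone function (the name `diff_in_diff` is ours, not a library function), fed with the unrounded group means from the `groupby` output above:

```python
def diff_in_diff(control_before, control_after, treated_before, treated_after):
    """Change in the treated group's mean minus change in the control group's mean."""
    control_change = control_after - control_before
    treated_change = treated_after - treated_before
    return treated_change - control_change

# Group means from df.groupby('state').mean(): state 0 = PA (control), state 1 = NJ (treated)
did = diff_in_diff(
    control_before=23.38, control_after=21.096667,
    treated_before=20.430583, treated_after=20.897249,
)
print(f'{did:.2f}')  # 2.75
```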


In [6]:
# Difference-in-differences (DiD): change in NJ minus change in PA
pa_diff = mean_emp_pa_after - mean_emp_pa_before
nj_diff = mean_emp_nj_after - mean_emp_nj_before
did = nj_diff - pa_diff

print(f'DID in mean employment is {did:.2f}')

DID in mean employment is 2.75
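In equation form, writing $\bar{Y}_{s,t}$ for mean employment per restaurant in state $s$ and month $t$, the estimate above is

$$
\widehat{\mathrm{DiD}} = \left(\bar{Y}_{NJ,Nov} - \bar{Y}_{NJ,Feb}\right) - \left(\bar{Y}_{PA,Nov} - \bar{Y}_{PA,Feb}\right) = (20.90 - 20.43) - (21.10 - 23.38) \approx 2.75
$$

The regression below targets the same quantity. With $D_i = 1$ for NJ restaurants, $T_t = 1$ for November, and the interaction $D_i T_t$, the model is

$$
Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 T_t + \beta_3 D_i T_t + \varepsilon_{it}
$$

Because it has one parameter per cell of the 2×2 design, OLS reproduces the four cell means exactly, so

$$
\hat\beta_0 = \bar{Y}_{PA,Feb}, \quad \hat\beta_1 = \bar{Y}_{NJ,Feb} - \bar{Y}_{PA,Feb}, \quad \hat\beta_2 = \bar{Y}_{PA,Nov} - \bar{Y}_{PA,Feb}, \quad \hat\beta_3 = \widehat{\mathrm{DiD}}
$$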


## 3. Implementing the Regression

In [7]:
# D: group indicator, 0 = control group (PA), 1 = treatment group (NJ)
# T: period indicator, 0 = before the treatment (minimum-wage increase), 1 = after
# DT: interaction term D * T

# Data before the treatment (February 1992)
df_before = df[['total_emp_feb', 'state']].copy()  # .copy() avoids SettingWithCopyWarning
df_before['T'] = 0
df_before.columns = ['total_emp', 'D', 'T']  # rename columns

df_before

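|     | total_emp | D | T |
|----:|----------:|--:|--:|
| 0 | 40.50 | 0 | 0 |
| 1 | 13.75 | 0 | 0 |
| 2 | 8.50 | 0 | 0 |
| 3 | 34.00 | 0 | 0 |
| 4 | 24.00 | 0 | 0 |
| ... | ... | ... | ... |
| 379 | 9.00 | 1 | 0 |
| 380 | 9.75 | 1 | 0 |
| 381 | 24.50 | 1 | 0 |
| 382 | 14.00 | 1 | 0 |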


In [8]:
# Data after the treatment (November 1992)
df_after = df[['total_emp_nov', 'state']].copy()  # .copy() avoids SettingWithCopyWarning
df_after['T'] = 1
df_after.columns = ['total_emp', 'D', 'T']

df_after

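|     | total_emp | D | T |
|----:|----------:|--:|--:|
| 0 | 24.00 | 0 | 1 |
| 1 | 11.50 | 0 | 1 |
| 2 | 10.50 | 0 | 1 |
| 3 | 20.00 | 0 | 1 |
| 4 | 35.50 | 0 | 1 |
| ... | ... | ... | ... |
| 379 | 23.75 | 1 | 1 |
| 380 | 17.50 | 1 | 1 |
| 381 | 20.50 | 1 | 1 |
| 382 | 20.50 | 1 | 1 |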


In [9]:
# Stack both periods into one long DataFrame (2 x 384 = 768 rows)
df_reg = pd.concat([df_before, df_after])

# Create the interaction term
df_reg['DT'] = df_reg['D'] * df_reg['T']

df_reg

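|     | total_emp | D | T | DT |
|----:|----------:|--:|--:|---:|
| 0 | 40.50 | 0 | 0 | 0 |
| 1 | 13.75 | 0 | 0 | 0 |
| 2 | 8.50 | 0 | 0 | 0 |
| 3 | 34.00 | 0 | 0 | 0 |
| 4 | 24.00 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... |
| 379 | 23.75 | 1 | 1 | 1 |
| 380 | 17.50 | 1 | 1 | 1 |
| 381 | 20.50 | 1 | 1 | 1 |
| 382 | 20.50 | 1 | 1 | 1 |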
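Splitting the wide table into two slices and concatenating them is a wide-to-long reshape. An alternative sketch uses `pd.melt` and keeps an explicit restaurant identifier; the small frame copies four restaurants from the outputs shown above (index 0 and 1 in PA, 379 and 380 in NJ), so it illustrates the mechanics only.

```python
import pandas as pd

# Four restaurants copied from the displayed outputs (0, 1 are PA; 379, 380 are NJ)
wide = pd.DataFrame(
    {'state': [0, 0, 1, 1],
     'total_emp_feb': [40.5, 13.75, 9.0, 9.75],
     'total_emp_nov': [24.0, 11.5, 23.75, 17.5]},
    index=[0, 1, 379, 380],
)

# Wide -> long: one row per restaurant and period
df_long = (
    wide.rename_axis('restaurant').reset_index()
        .melt(id_vars=['restaurant', 'state'],
              value_vars=['total_emp_feb', 'total_emp_nov'],
              var_name='period', value_name='total_emp')
)

# Same indicators as the manual construction: D = group, T = period, DT = interaction
df_long['D'] = df_long['state']
df_long['T'] = (df_long['period'] == 'total_emp_nov').astype(int)
df_long['DT'] = df_long['D'] * df_long['T']
df_long = df_long[['restaurant', 'total_emp', 'D', 'T', 'DT']]

print(df_long)
```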


In [10]:
# Regression, approach 1: array interface (sm.OLS with an explicitly added constant)

Y = df_reg["total_emp"]
X = df_reg[['D', 'T', 'DT']]
X = sm.add_constant(X)
est = sm.OLS(Y,X).fit()
print(est.summary())

                            OLS Regression Results                            
Dep. Variable:              total_emp   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.004
Method:                 Least Squares   F-statistic:                     1.947
Date:                Thu, 20 Apr 2023   Prob (F-statistic):              0.121
Time:                        19:33:49   Log-Likelihood:                -2817.6
No. Observations:                 768   AIC:                             5643.
Df Residuals:                     764   BIC:                             5662.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         23.3800      1.098     21.288      0.0
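The coefficient rows after `const` are cut off in the output above. In this saturated 2×2 design every coefficient is a contrast of cell means: `const` is the PA February mean (23.38, as shown), and the group means computed earlier imply `D` ≈ 20.43 − 23.38 ≈ −2.95, `T` ≈ 21.10 − 23.38 ≈ −2.28 and `DT` ≈ 2.75, the DiD from cell 6. A sketch that checks this identity with NumPy least squares on randomly generated synthetic data (not the survey data; the coefficients used to simulate it are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical number of restaurants per state

# Synthetic long-format data: first 2n rows control (D=0), last 2n rows treated (D=1)
D = np.repeat([0, 1], 2 * n)
T = np.tile(np.repeat([0, 1], n), 2)  # within each state: n rows before, n rows after
y = 20 + 1.5 * D - 2.0 * T + 2.5 * D * T + rng.normal(0, 5, size=4 * n)

# Design matrix: constant, D, T and the interaction DT
X = np.column_stack([np.ones_like(y), D, T, D * T])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(d, t):
    # Mean outcome in the cell with group d and period t
    return y[(D == d) & (T == t)].mean()

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(beta.round(3), round(did, 3))  # last coefficient equals the DiD of cell means
```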

In [11]:
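# Regression, approach 2: formula interface (statsmodels.formula.api.ols)
# Store the fitted model under a new name so the imported `ols` function is not overwritten
model = ols('total_emp ~ D + T + DT', data=df_reg).fit()
print(model.summary())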

                            OLS Regression Results                            
Dep. Variable:              total_emp   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.004
Method:                 Least Squares   F-statistic:                     1.947
Date:                Thu, 20 Apr 2023   Prob (F-statistic):              0.121
Time:                        19:33:49   Log-Likelihood:                -2817.6
No. Observations:                 768   AIC:                             5643.
Df Residuals:                     764   BIC:                             5662.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     23.3800      1.098     21.288      0.0
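Because every restaurant is observed in both waves, the same point estimate can also be obtained with one row per restaurant: regress the change in employment (November minus February) on a constant and the state dummy, and the slope is the difference in mean changes, i.e. the DiD. The point estimates coincide; the standard errors generally differ, since the pooled regression treats a restaurant's two observations as independent. A NumPy sketch on four restaurants copied from the outputs above (rows 0 and 1 in PA, 379 and 380 in NJ), so the resulting number is illustrative only:

```python
import numpy as np

# Four restaurants copied from the displayed outputs
state = np.array([0, 0, 1, 1])
emp_feb = np.array([40.5, 13.75, 9.0, 9.75])
emp_nov = np.array([24.0, 11.5, 23.75, 17.5])

# First-difference regression: change in employment on a constant and the state dummy
delta = emp_nov - emp_feb
X_fd = np.column_stack([np.ones_like(delta), state])
beta_fd, *_ = np.linalg.lstsq(X_fd, delta, rcond=None)

# Pooled DiD regression on the stacked data (one row per restaurant and period)
y = np.concatenate([emp_feb, emp_nov])
D = np.concatenate([state, state])
T = np.repeat([0, 1], len(state))
X_did = np.column_stack([np.ones_like(y), D, T, D * T])
beta_did, *_ = np.linalg.lstsq(X_did, y, rcond=None)

print(round(beta_fd[1], 3), round(beta_did[3], 3))  # 20.625 20.625
```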