This notebook was prepared for a PhD course in Econometric Methods to apply the Difference-in-Differences (DID) methodology.

In their article 'Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania', Card and Krueger (1994) investigate how minimum wage legislation affects employment. On April 1, 1992, New Jersey (NJ) raised the state minimum wage from 4.25 USD to 5.05 USD, while the minimum wage in Pennsylvania (PA) stayed at 4.25 USD. Data about employment in fast-food restaurants in NJ and PA were collected in February 1992 and November 1992.

Their analysis compares the change in employment at fast-food restaurants in NJ before and after the increase with the change over the same period at restaurants in PA, which serve as a control group because the minimum wage there did not change. We apply this methodology to the authors' original dataset.

The authors measured employment in full-time equivalent (FTE) terms: the number of full-time employees, plus the number of managers, plus half the number of part-time workers. The analysis below was carried out in a Jupyter Notebook.

In [1]:
import pandas as pd
# column names of the raw file, in file order
columns = ['SHEET','CHAIN','CO_OWNED','STATE','SOUTHJ','CENTRALJ','NORTHJ','PA1','PA2','SHORE',
           'NCALLS','EMPFT','EMPPT','NMGRS','WAGE_ST','INCTIME','FIRSTINC','BONUS','PCTAFF','MEALS',
           'OPEN','HRSOPEN','PSODA','PFRY','PENTREE','NREGS','NREGS11','TYPE2','STATUS2','DATE2',
           'NCALLS2','EMPFT2','EMPPT2','NMGRS2','WAGE_ST2','INCTIME2','FIRSTIN2','SPECIAL2','MEALS2',
           'OPEN2R','HRSOPEN2','PSODA2','PFRY2','PENTREE2','NREGS2','NREGS112']
# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated in recent pandas versions)
df = pd.read_table('Card and Krueger DataSet/public.dat', sep=r'\s+', names=columns)
df = df.set_index('SHEET')
# Keep only the restaurants that answered the second round of interviews (STATUS2 == 1).
df = df[df['STATUS2']==1]

# This leaves a total of 399 observations as panel data.
# Missing values are coded as '.' in the raw file; here they are replaced with 0.
df.replace('.',0, inplace=True)
for col in df.columns:
    df[col] = pd.to_numeric(df[col])
    
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 399 entries, 46 to 428
Data columns (total 45 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   CHAIN     399 non-null    int64  
 1   CO_OWNED  399 non-null    int64  
 2   STATE     399 non-null    int64  
 3   SOUTHJ    399 non-null    int64  
 4   CENTRALJ  399 non-null    int64  
 5   NORTHJ    399 non-null    int64  
 6   PA1       399 non-null    int64  
 7   PA2       399 non-null    int64  
 8   SHORE     399 non-null    int64  
 9   NCALLS    399 non-null    int64  
 10  EMPFT     399 non-null    float64
 11  EMPPT     399 non-null    float64
 12  NMGRS     399 non-null    float64
 13  WAGE_ST   399 non-null    float64
 14  INCTIME   399 non-null    float64
 15  FIRSTINC  399 non-null    float64
 16  BONUS     399 non-null    int64  
 17  PCTAFF    399 non-null    float64
 18  MEALS     399 non-null    int64  
 19  OPEN      399 non-null    float64
 20  HRSOPEN   399 non-null    float64
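
The cell above replaces the missing-value marker `.` with 0, so a restaurant that did not report a figure is counted as if the figure were zero. A minimal sketch of one alternative, treating `.` as missing (`NaN`) so that it is skipped when averaging, is given below. The toy series is hypothetical and its values are not taken from the dataset:

```python
import pandas as pd

# Hypothetical raw column as read from a whitespace-delimited file:
# '.' marks a missing answer (toy values, not from the Card and Krueger data).
raw = pd.Series(['10', '.', '20'])

# Approach used in the notebook: '.' becomes 0 and enters the mean as a real value.
as_zero = pd.to_numeric(raw.replace('.', 0))

# Alternative: coerce non-numeric entries to NaN, which mean() skips by default.
as_nan = pd.to_numeric(raw, errors='coerce')

print(as_zero.mean())  # 10.0
print(as_nan.mean())   # 15.0
```

Which treatment is appropriate depends on what a missing answer means in the survey; the two choices can give different group means and therefore a different DID estimate.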

In [2]:
# Following the authors, a part-time employee counts as 0.5 FTE; managers count as full-time
df['Tot Emp Feb']=df['EMPPT']*.5 + df['EMPFT'] + df['NMGRS']
df['Tot Emp Nov']=df['EMPPT2']*.5 + df['EMPFT2'] + df['NMGRS2']

In [3]:
df[['STATE','Tot Emp Feb','Tot Emp Nov']].groupby('STATE').mean()

| STATE | Tot Emp Feb | Tot Emp Nov |
|-------|-------------|-------------|
| 0     | 23.25       | 20.958333   |
| 1     | 20.344393   | 21.179907   |


In [4]:
# check by calculating the mean for each group directly
# 0 PA control group, 1 NJ treatment group

# compute the group means once and look them up by label
means = df[['STATE','Tot Emp Feb','Tot Emp Nov']].groupby('STATE').mean()
mean_emp_pa_before = means.loc[0, 'Tot Emp Feb']
mean_emp_pa_after = means.loc[0, 'Tot Emp Nov']
mean_emp_nj_before = means.loc[1, 'Tot Emp Feb']
mean_emp_nj_after = means.loc[1, 'Tot Emp Nov']

print(f'mean PA employment before: {mean_emp_pa_before:.2f}')
print(f'mean PA employment after: {mean_emp_pa_after:.2f}')
print(f'mean NJ employment before: {mean_emp_nj_before:.2f}')
print(f'mean NJ employment after: {mean_emp_nj_after:.2f}')

pa_diff = mean_emp_pa_after - mean_emp_pa_before
nj_diff = mean_emp_nj_after - mean_emp_nj_before
did = nj_diff - pa_diff

print(f'DID in mean employment is {did:.2f}')

mean PA employment before: 23.25
mean PA employment after: 20.96
mean NJ employment before: 20.34
mean NJ employment after: 21.18
DID in mean employment is 3.13


Contrary to the prediction of the standard competitive labour market model, average FTE employment in NJ increased after the minimum wage was raised. In PA, average FTE employment decreased even though the minimum wage remained the same. However, the sample is not equally split: there are more observations in NJ than in PA, so the PA means are estimated from fewer restaurants and are less precise. Besides this, the DID estimate relies on the assumption that, without the wage increase, employment in NJ restaurants would have followed the same trend as employment in PA restaurants.
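
The role of that assumption can be made explicit with the group means reported above. The sketch below builds the NJ counterfactual (what NJ employment would have been had it changed by the same amount as PA) and recovers the DID as the gap between observed and counterfactual employment. The variable names are illustrative:

```python
# Group means reported above (PA = control, NJ = treatment)
pa_before, pa_after = 23.25, 20.958333
nj_before, nj_after = 20.344393, 21.179907

# Parallel-trends assumption: without the wage increase, NJ would have
# changed by the same amount as PA.
nj_counterfactual = nj_before + (pa_after - pa_before)

# The DID estimate is the gap between observed and counterfactual NJ employment.
did = nj_after - nj_counterfactual

print(f'NJ counterfactual after: {nj_counterfactual:.2f}')  # 18.05
print(f'DID: {did:.2f}')  # 3.13
```

If the assumption fails, for example because the two states faced different demand shocks during 1992, the counterfactual is wrong and so is the estimate.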

The same DID result can be obtained via regression, which allows adding control variables if needed:

$y = \beta_0 + \beta_1 g + \beta_2 t + \beta_3 (g \cdot t) + \varepsilon$

- g is 0 for the control group and 1 for the treatment group
- t is 0 for before and 1 for after

Inserting the values of g and t into the equation gives the table below, which shows that the coefficient $\beta_3$ of the interaction of g and t is the DID estimate:

|              | Control Group (g=0) | Treatment Group (g=1)                   |                 |
|--------------|---------------------|-----------------------------------------|-----------------|
| Before (t=0) | $\beta_0$           | $\beta_0 + \beta_1$                     |                 |
| After (t=1)  | $\beta_0 + \beta_2$ | $\beta_0 + \beta_1 + \beta_2 + \beta_3$ |                 |
| Difference   | $\beta_2$           | $\beta_2 + \beta_3$                     | $\beta_3$ (DID) |
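
The equivalence in the table can be checked numerically. The following sketch uses synthetic data (made up for illustration, with unequal group sizes) and plain NumPy least squares rather than the notebook's data. It compares the estimated $\beta_3$ with the DID computed from the four cell means:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2x2 design (made-up data): 50 control and 80 treated units, two periods each
g = np.repeat([0, 0, 1, 1], [50, 50, 80, 80])
t = np.repeat([0, 1, 0, 1], [50, 50, 80, 80])
y = 20 + 3 * g - 2 * t + 1.5 * g * t + rng.normal(0, 5, size=g.size)

# OLS with design matrix [1, g, t, g*t]
X = np.column_stack([np.ones_like(g), g, t, g * t]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# DID computed directly from the four cell means
def cell_mean(gv, tv):
    return y[(g == gv) & (t == tv)].mean()

did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(np.isclose(beta[3], did_means))  # True
```

Because the model has one parameter per cell of the 2×2 table, the fitted values reproduce the cell means exactly, so the two numbers agree up to floating-point error regardless of the group sizes.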


In [5]:
# group g: 0 control group (PA), 1 treatment group (NJ)
# t: 0 before treatment (min wage raise), 1 after treatment
# gt: interaction of g * t

# data before the treatment (.copy() avoids pandas' SettingWithCopyWarning)
df_before = df[['Tot Emp Feb', 'STATE']].copy()
df_before['t'] = 0
df_before.columns = ['Tot_Emp', 'g', 't']
df_before = df_before.reset_index(drop=True)

# data after the treatment
df_after = df[['Tot Emp Nov', 'STATE']].copy()
df_after['t'] = 1
df_after.columns = ['Tot_Emp', 'g', 't']
df_after = df_after.reset_index(drop=True)

# data for regression: stack both periods (one row per restaurant and period)
df_reg = pd.concat([df_before, df_after])

# create the interaction
df_reg['gt'] = df_reg.g * df_reg.t

df_reg


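|     | Tot_Emp | g   | t   | gt  |
|-----|---------|-----|-----|-----|
| 0   | 40.50   | 0   | 0   | 0   |
| 1   | 13.75   | 0   | 0   | 0   |
| 2   | 8.50    | 0   | 0   | 0   |
| 3   | 34.00   | 0   | 0   | 0   |
| 4   | 24.00   | 0   | 0   | 0   |
| ... | ...     | ... | ... | ... |
| 394 | 23.75   | 1   | 1   | 1   |
| 395 | 17.50   | 1   | 1   | 1   |
| 396 | 20.50   | 1   | 1   | 1   |
| 397 | 20.50   | 1   | 1   | 1   |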


Stacking the two periods gives a total of 798 observations (399 × 2). We then estimated the model by Ordinary Least Squares, with heteroscedasticity- and autocorrelation-consistent (HAC) standard errors using 1 lag and no small-sample correction, and obtained the following output:

In [6]:
# regression via statsmodels
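# the DID coefficient (gt) is not significant

from statsmodels.formula.api import ols
# store the fitted model under its own name so the imported `ols` function is not overwritten
model = ols('Tot_Emp ~ g + t + gt', data=df_reg).fit(cov_type='HAC',cov_kwds={'maxlags':1})
print(model.summary())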

                            OLS Regression Results                            
Dep. Variable:                Tot_Emp   R-squared:                       0.008
Model:                            OLS   Adj. R-squared:                  0.004
Method:                 Least Squares   F-statistic:                     1.417
Date:                Tue, 23 May 2023   Prob (F-statistic):              0.236
Time:                        13:40:29   Log-Likelihood:                -2917.6
No. Observations:                 798   AIC:                             5843.
Df Residuals:                     794   BIC:                             5862.
Df Model:                           3                                         
Covariance Type:                  HAC                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     23.2500      1.393     16.686      0.0

The coefficient of the interaction term is the DID estimate: after the minimum wage increase, average employment per restaurant in NJ rose by 3.13 FTE relative to PA. However, its p-value is not significant at the 5% level, so we do not reject the null hypothesis that the coefficient is equal to zero, and the estimated increase may be due to random variation alone. Even at the 10% significance level we do not reject the null hypothesis; in particular, there is no evidence that the coefficient is negative.
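
The decision rule behind this kind of test can be sketched as follows, assuming the normal approximation that underlies the `z` and `P>|z|` columns of the summary. The function names and the input values are hypothetical and are not taken from the regression output:

```python
import math

def two_sided_p_value(coef, std_err):
    """Two-sided p-value for H0: coefficient = 0, using the normal approximation."""
    z = coef / std_err
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

def decide(p_value, alpha=0.05):
    """Reject H0 only when the p-value is below the significance level."""
    return 'reject H0' if p_value < alpha else 'do not reject H0'

# Hypothetical inputs for illustration only (z = 1.5)
p = two_sided_p_value(coef=3.0, std_err=2.0)
print(f'p = {p:.3f}: {decide(p)}')  # p = 0.134: do not reject H0
```

A p-value above the chosen significance level means the data are compatible with a zero coefficient. It does not show that the coefficient is zero.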

The authors of the original paper did not use HAC robust standard errors. Nevertheless, our results are consistent with theirs: there is no evidence that the rise in the NJ minimum wage reduced employment at fast-food restaurants in the state, since the estimated coefficient is positive, although not statistically different from zero. Moreover, the authors found an increase in the prices of fast-food meals in NJ relative to PA, suggesting that much of the burden of the minimum-wage rise was passed on to consumers. Furthermore, the employment of low-wage workers rose in NJ compared to PA, which might reflect a shift of workers from PA to NJ in search of better wages, although the analysis here does not test that explanation.
