  <h2>Earned Income Tax Credit and Labor Force Participation</h2>
  <ul>
    <li>The EITC was expanded in 1994, raising the credit available to single women with children.</li>
    <li>Designed for low- to moderate-income individuals and couples.</li>
    <li>Expected to encourage labor force participation, since the credit is only available to people with earned income.</li>
  </ul>

  <h3>Research Question: Does the EITC incentivize employment?</h3>

  <h3>Methodology - Econometrics:</h3>
  <ul>
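    <li>Models the relationship between a discrete (here binary: works or does not work) outcome and a set of independent variables.</li>
    <li>Estimates the impact of a variable (X) on an outcome (Y).</li>
    <li>Difference-in-differences design: compares the change in employment for mothers (treatment group) after 1993 with the change for individuals without children (control group).</li>
    <li>Placebo test: repeats the analysis on pre-1994 data with a fake 1993 start date, where no effect should appear.</li>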
  </ul>

  <h3>Comparison with Linear Regression:</h3>
  <ul>
    <li><strong>Linear Regression:</strong>
      <ul>
        <li>Suitable for continuous outcome variables.</li>
        <li>Fits a straight line.</li>
        <li>Assumes normally distributed errors.</li>
      </ul>
    </li>
    <li><strong>Logistic Regression:</strong>
      <ul>
        <li>Designed for binary (discrete) outcome variables.</li>
        <li>Fits an S-shaped curve that keeps predicted probabilities between 0 and 1.</li>
        <li>Assumes a binomial (Bernoulli) distribution for the outcome.</li>
      </ul>
    </li>
  </ul>
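The "straight line vs. curve" contrast can be made concrete with a small sketch. The intercept and slope below are hypothetical values chosen only for illustration; the point is that the linear index is unbounded, while passing it through the logistic function keeps every prediction strictly between 0 and 1.

```python
import math

def logistic(z: float) -> float:
    """Logistic (sigmoid) function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept and slope, chosen only to illustrate the two shapes.
b0, b1 = -1.0, 0.5

for x in [-10, 0, 2, 10]:
    z = b0 + b1 * x  # linear index: a straight line, unbounded in both directions
    print(f"x={x:>3}  linear index={z:>5.1f}  logistic probability={logistic(z):.3f}")
```

Here the linear index runs from -6.0 to 4.0, values a linear probability model would have to report as "probabilities", while the logistic transform maps them into the unit interval.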

In [13]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

In [4]:
df = pd.read_stata('csv/eitc.dta')

df.head()

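| | state | year | urate | children | nonwhite | finc | earn | age | ed | work | unearn |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 1991.0 | 7.6 | 0 | 1 | 18714.394273 | 18714.394273 | 26 | 10 | 1 | 0.0 |
| 1 | 12.0 | 1991.0 | 7.2 | 1 | 0 | 4838.568282 | 471.365639 | 22 | 9 | 1 | 4.367203 |
| 2 | 13.0 | 1991.0 | 6.4 | 2 | 0 | 8178.193833 | 0.0 | 33 | 11 | 0 | 8.178194 |
| 3 | 14.0 | 1991.0 | 9.1 | 0 | 1 | 9369.570485 | 0.0 | 43 | 11 | 0 | 9.36957 |
| 4 | 15.0 | 1991.0 | 8.6 | 3 | 1 | 14706.60793 | 14706.60793 | 23 | 7 | 1 | 0.0 |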


In [8]:
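# post93 = 1 for years after 1993 (when the expanded credit applies)
df['post93'] = np.where(df['year'] > 1993, 1, 0)
# mom = 1 if the individual has at least one child (treatment group)
df['mom'] = np.where(df['children'] > 0, 1, 0)
# Interaction term: its coefficient is the difference-in-differences estimate
df['mom_post93'] = df['post93'] * df['mom']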

df.head()

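| | state | year | urate | children | nonwhite | finc | earn | age | ed | work | unearn | post93 | mom | mom_post93 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 1991.0 | 7.6 | 0 | 1 | 18714.394273 | 18714.394273 | 26 | 10 | 1 | 0.0 | 0 | 0 | 0 |
| 1 | 12.0 | 1991.0 | 7.2 | 1 | 0 | 4838.568282 | 471.365639 | 22 | 9 | 1 | 4.367203 | 0 | 1 | 0 |
| 2 | 13.0 | 1991.0 | 6.4 | 2 | 0 | 8178.193833 | 0.0 | 33 | 11 | 0 | 8.178194 | 0 | 1 | 0 |
| 3 | 14.0 | 1991.0 | 9.1 | 0 | 1 | 9369.570485 | 0.0 | 43 | 11 | 0 | 9.36957 | 0 | 0 | 0 |
| 4 | 15.0 | 1991.0 | 8.6 | 3 | 1 | 14706.60793 | 14706.60793 | 23 | 7 | 1 | 0.0 | 0 | 1 | 0 |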


### Model 1: Difference-in-differences logit
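Written out, Model 1 is a difference-in-differences specification estimated with a logit link, where $\Lambda$ is the logistic function and the regressors are the dummies created above (`mom_post93` is the product $\text{mom}_i \times \text{post93}_i$):

$$
\Pr(\text{work}_i = 1) = \Lambda\left(\beta_0 + \beta_1\,\text{post93}_i + \beta_2\,\text{mom}_i + \beta_3\,\text{mom}_i \times \text{post93}_i\right),
\qquad \Lambda(z) = \frac{1}{1 + e^{-z}}
$$

Plugging the four mom/period combinations into the log-odds index shows that the interaction coefficient is the difference-in-differences on the log-odds scale:

$$
\beta_3 = \left[\operatorname{logit} p_{\text{mom, post}} - \operatorname{logit} p_{\text{mom, pre}}\right] - \left[\operatorname{logit} p_{\text{no kids, post}} - \operatorname{logit} p_{\text{no kids, pre}}\right]
$$

since the mothers' change is $\beta_1 + \beta_3$ and the comparison group's change is $\beta_1$.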

In [21]:
# Isolate the outcome (Y) and regressors (X)
Y = df.loc[:, 'work'].values
X = df.loc[:, ['post93', 'mom', 'mom_post93']].values

# Add an intercept column and fit a logit by maximum likelihood
X_cons = sm.add_constant(X)
model1 = sm.Logit(Y, X_cons).fit()
model1.summary(yname='Work',
               xname=['intercept', 'After 1993', 'Is mom', 'mom after 1993'],
               title='Impact of tax credit on employment - model1')

Optimization terminated successfully.
         Current function value: 0.686491
         Iterations 4


| Item | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Work | No. Observations: | 13746 |
| Model: | Logit | Df Residuals: | 13742 |
| Method: | MLE | Df Model: | 3 |
| Date: | Wed, 20 Dec 2023 | Pseudo R-squ.: | 0.009118 |
| Time: | 21:23:53 | Log-Likelihood: | -9436.5 |
| converged: | True | LL-Null: | -9523.3 |
| Covariance Type: | nonrobust | LLR p-value: | 2.058e-37 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 0.3042 | 0.036 | 8.443 | 0.000 | 0.234 | 0.375 |
| After 1993 | -0.0085 | 0.053 | -0.161 | 0.872 | -0.112 | 0.095 |
| Is mom | -0.5212 | 0.047 | -10.985 | 0.000 | -0.614 | -0.428 |
| mom after 1993 | 0.1885 | 0.070 | 2.708 | 0.007 | 0.052 | 0.325 |
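Logit coefficients are in log-odds, so the `mom after 1993` estimate is not directly a change in the probability of working. A minimal sketch that converts the Model 1 point estimates (copied from the table above) into predicted probabilities for the four mom/period cells, then takes the difference-in-differences on the probability scale. Because Model 1 has one parameter per cell (a saturated 2×2 design), these fitted probabilities should match the raw employment rate in each cell, up to rounding of the coefficients. The cell labels are illustrative names, not columns in the data.

```python
import math

def logistic(z: float) -> float:
    """Inverse logit: converts a log-odds index into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Point estimates copied from the Model 1 summary table.
b0, b_post, b_mom, b_inter = 0.3042, -0.0085, -0.5212, 0.1885

# Predicted probability of working in each group/period cell.
p = {
    ("no kids", "pre"): logistic(b0),
    ("no kids", "post"): logistic(b0 + b_post),
    ("mom", "pre"): logistic(b0 + b_mom),
    ("mom", "post"): logistic(b0 + b_post + b_mom + b_inter),
}

# Difference-in-differences on the probability scale:
# (change for mothers) minus (change for the comparison group).
did_prob = (p[("mom", "post")] - p[("mom", "pre")]) - (
    p[("no kids", "post")] - p[("no kids", "pre")]
)

# Exponentiating the interaction coefficient gives a ratio of odds ratios.
odds_ratio_inter = math.exp(b_inter)

for cell, prob in p.items():
    print(cell, round(prob, 3))
print(f"DiD in probability: {did_prob:.3f}")
print(f"exp(interaction coef): {odds_ratio_inter:.3f}")
```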


### Model 2: Adding race/ethnicity and education controls

In [22]:
# Isolate the outcome (Y) and regressors (X), now with controls
Y = df.loc[:, 'work'].values
X = df.loc[:, ['post93', 'mom', 'mom_post93', 'nonwhite', 'ed']].values

# Add an intercept column and fit a logit by maximum likelihood
X_cons = sm.add_constant(X)
model2 = sm.Logit(Y, X_cons).fit()
model2.summary(yname='Work',
               xname=['intercept', 'After 1993', 'Is mom', 'mom after 1993',
                      'Hispanic or Black', 'Years of Education'],
               title='Impact of tax credit on employment - model2')

Optimization terminated successfully.
         Current function value: 0.680664
         Iterations 4


| Item | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Work | No. Observations: | 13746 |
| Model: | Logit | Df Residuals: | 13740 |
| Method: | MLE | Df Model: | 5 |
| Date: | Wed, 20 Dec 2023 | Pseudo R-squ.: | 0.01753 |
| Time: | 21:28:22 | Log-Likelihood: | -9356.4 |
| converged: | True | LL-Null: | -9523.3 |
| Covariance Type: | nonrobust | LLR p-value: | 5.205e-70 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | -0.1687 | 0.071 | -2.367 | 0.018 | -0.308 | -0.029 |
| After 1993 | -0.0046 | 0.053 | -0.086 | 0.932 | -0.108 | 0.099 |
| Is mom | -0.5287 | 0.048 | -10.986 | 0.000 | -0.623 | -0.434 |
| mom after 1993 | 0.1973 | 0.070 | 2.817 | 0.005 | 0.060 | 0.335 |
| Hispanic or Black | -0.2199 | 0.036 | -6.129 | 0.000 | -0.290 | -0.150 |
| Years of Education | 0.0687 | 0.007 | 10.270 | 0.000 | 0.056 | 0.082 |
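Exponentiated logit coefficients are odds ratios, which are often easier to read than log-odds. A small sketch using the Model 2 point estimates and 95% bounds copied from the table above; because `exp` is monotonic, exponentiating the interval endpoints gives the interval for the odds ratio. With covariates in the model, the probability-scale effect varies with the covariate values, so the odds-ratio reading is the simpler summary here.

```python
import math

# (coef, lower 95% bound, upper 95% bound) copied from the Model 2 summary.
model2_estimates = {
    "mom after 1993": (0.1973, 0.060, 0.335),
    "Hispanic or Black": (-0.2199, -0.290, -0.150),
    "Years of Education": (0.0687, 0.056, 0.082),
}

# exp(coef) is the multiplicative change in the odds of working for a
# one-unit change in the regressor, holding the other regressors fixed.
odds_ratios = {
    name: tuple(math.exp(v) for v in (coef, lo, hi))
    for name, (coef, lo, hi) in model2_estimates.items()
}

for name, (or_point, or_lo, or_hi) in odds_ratios.items():
    print(f"{name:<20} OR={or_point:.3f}  95% CI=({or_lo:.3f}, {or_hi:.3f})")
```

An odds ratio above 1 (education, the interaction) means higher odds of working; below 1 (the `nonwhite` indicator) means lower odds.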


### Placebo Test

In [25]:
# Placebo test: pretend the policy started in 1993 instead of 1994
df['post92'] = np.where(df['year'] > 1992, 1, 0)
df['mom_post92'] = df['post92'] * df['mom']

# Keep only pre-reform years, so any "effect" found here cannot be the real EITC expansion
df_placebo = df[df['year'] < 1994]

# Isolate the outcome (Y) and regressors (X)
Y_placebo = df_placebo.loc[:, 'work'].values
X_placebo = df_placebo.loc[:, ['post92', 'mom', 'mom_post92']].values

# Add an intercept and fit the same logit specification as Model 1
X_placebo = sm.add_constant(X_placebo)
model_placebo = sm.Logit(Y_placebo, X_placebo).fit()
model_placebo.summary(yname='Work',
                      xname=['intercept', 'After 1992', 'Is mom', 'mom after 1992'],
                      title='Impact of tax credit on employment - model Placebo')

Optimization terminated successfully.
         Current function value: 0.684872
         Iterations 4


| Item | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Work | No. Observations: | 7401 |
| Model: | Logit | Df Residuals: | 7397 |
| Method: | MLE | Df Model: | 3 |
| Date: | Wed, 20 Dec 2023 | Pseudo R-squ.: | 0.01193 |
| Time: | 21:37:25 | Log-Likelihood: | -5068.7 |
| converged: | True | LL-Null: | -5130.0 |
| Covariance Type: | nonrobust | LLR p-value: | 2.29e-26 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| intercept | 0.3124 | 0.044 | 7.154 | 0.000 | 0.227 | 0.398 |
| After 1992 | -0.0259 | 0.077 | -0.335 | 0.737 | -0.177 | 0.126 |
| Is mom | -0.5138 | 0.057 | -8.950 | 0.000 | -0.626 | -0.401 |
| mom after 1992 | -0.0239 | 0.102 | -0.234 | 0.815 | -0.224 | 0.176 |
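The placebo interaction is far from significant, while the real post-1993 interaction in Model 1 is significant at the 5% level; that is the pattern expected if the estimated effect is not just a pre-existing trend difference between mothers and the comparison group. A sketch that recomputes both Wald z-tests from the coefficients and standard errors copied from the summaries, using the standard normal approximation behind the `P>|z|` column. The recomputed z for Model 1 differs slightly from the table (about 2.69 vs. 2.708) because the printed standard errors are rounded.

```python
import math
from typing import Tuple

def wald_z_test(coef: float, se: float) -> Tuple[float, float]:
    """Return the z statistic and two-sided p-value for H0: coefficient = 0,
    using the standard normal approximation for an MLE coefficient."""
    z = coef / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Interaction estimates and standard errors copied from the summary tables.
z_main, p_main = wald_z_test(0.1885, 0.070)         # Model 1: mom after 1993
z_placebo, p_placebo = wald_z_test(-0.0239, 0.102)  # Placebo: mom after 1992

print(f"Model 1 interaction: z={z_main:.2f}, p={p_main:.3f}")
print(f"Placebo interaction: z={z_placebo:.2f}, p={p_placebo:.3f}")
```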
