# Explorations

We show how to reproduce key results from

* Carneiro, P., Heckman, J. J., and Vytlacil, E. (2011). Estimating Marginal Returns to Education. *American Economic Review*, 101(6), 2754-2781.

using the **grmpy** package. The authors provide their own replication material on the AER website.

## Data Preparation

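We do not have access to the exact dataset because the geographic information is confidential. However, we can create a mock dataset by merging the two datasets, `basicvariables.dta` and `localvariables.dta`, at random: rows are paired by position rather than by individual identifier.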

In [16]:
import pandas as pd

In [21]:
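# Load the basic and the local variables
basic = pd.read_stata('basicvariables.dta')
local = pd.read_stata('localvariables.dta')

# Combine the two datasets side by side; rows are aligned on the index
df = pd.concat([basic, local], axis=1)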

# Delete redundant columns
for key_ in ['newid', 'caseid']:
    del df[key_]

# Add squared terms
for key_ in ['mhgc', 'cafqt', 'avurate', 'numsibs', 'lavlocwage17']:    
    str_ = key_ + 'sq'
    df[str_] = df[key_]**2

df.to_pickle('aer-replication-mock.pkl')

df.head()

| | urban14 | numsibs | mhgc | school | d57 | d58 | d59 | d60 | d61 | d62 | ... | tuit4c | lwage5 | lurate | lwage5_17 | lurate_17 | mhgcsq | cafqtsq | avuratesq | numsibssq | lavlocwage17sq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 12.0 | 4.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 21.351599 | 10.423455 | 6.9 | 9.999799 | 6.4 | 144.0 | 2.621313 | 44.029644 | 1.0 | 102.447792 |
| 1 | 1.0 | 1.0 | 12.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 21.351599 | 10.194988 | 5.0 | 10.013474 | 9.2 | 144.0 | 0.016474 | 43.389843 | 1.0 | 108.51152 |
| 2 | 1.0 | 4.0 | 12.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 20.034687 | 10.226495 | 5.9 | 10.040937 | 5.9 | 144.0 | 0.275769 | 44.029644 | 16.0 | 96.331223 |
| 3 | 1.0 | 1.0 | 12.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 18.422222 | 10.176557 | 6.9 | 10.021727 | 6.8 | 144.0 | 0.907398 | 44.029644 | 1.0 | 102.447792 |
| 4 | 1.0 | 2.0 | 12.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 14.816019 | 10.176557 | 6.9 | 9.977937 | 8.4 | 144.0 | 1.290971 | 44.029644 | 4.0 | 102.447792 |
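
The cell above pairs the two files row by row in whatever order they are stored. If an explicit and reproducible random pairing is wanted, one possible approach is to shuffle one of the frames with a fixed seed before concatenating. The sketch below is only an illustration: the helper name `merge_at_random`, the `seed` argument, and the tiny example frames are assumptions of ours and are not part of the replication material.

```python
import pandas as pd


def merge_at_random(basic, local, seed=0):
    """Pair each row of `basic` with a randomly drawn row of `local`."""
    if len(basic) != len(local):
        raise ValueError("both datasets need the same number of rows")
    # Shuffle the local variables, then discard the old index so that
    # pd.concat aligns rows by position instead of by the original labels.
    shuffled = local.sample(frac=1, random_state=seed).reset_index(drop=True)
    return pd.concat([basic.reset_index(drop=True), shuffled], axis=1)


# Tiny made-up frames standing in for the two .dta files.
basic = pd.DataFrame({"mhgc": [12.0, 10.0, 16.0], "numsibs": [1.0, 4.0, 2.0]})
local = pd.DataFrame({"avurate": [6.6, 7.1, 5.9], "lavlocwage17": [10.1, 10.4, 9.8]})

mock = merge_at_random(basic, local, seed=123)
```

Resetting the index after shuffling matters here: without it, `pd.concat` would realign the shuffled rows on their original labels and silently undo the shuffle.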


## Estimation

What needs to be done?

* Create an initialization file that reflects the estimation setup in the paper.
* Run the estimation using the **grmpy** package.
* Reproduce Figure 1 from the original paper. Even though we only use a mock dataset, our results will be frighteningly close. Thus, most of the results appear to be driven by the observable cost shifters and not by the instruments.
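
To give a sense of what the last step produces, the sketch below evaluates a marginal treatment effect (MTE) curve under the parametric normal version of the generalized Roy model, which is, as far as we understand it, the kind of object shown in Figure 1. It assumes the variance of the choice unobservable is normalized to one, so that MTE(x, u_D) = x(β1 − β0) + (σ_1V − σ_0V) Φ⁻¹(u_D). The function name and all parameter values are made up for illustration; they are neither estimates from the paper nor part of the **grmpy** API.

```python
from statistics import NormalDist


def mte_normal(x_beta_diff, cov1v, cov0v, u_grid):
    """Evaluate x(b1 - b0) + (cov1v - cov0v) * Phi^{-1}(u_D) on a grid, with Var(V) = 1."""
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    return [x_beta_diff + (cov1v - cov0v) * inv_cdf(u) for u in u_grid]


# Purely illustrative parameter values, not estimates from the paper.
u_grid = [i / 100 for i in range(1, 100)]  # u_D from 0.01 to 0.99
curve = mte_normal(x_beta_diff=0.10, cov1v=-0.05, cov0v=0.02, u_grid=u_grid)
```

With these made-up values the curve slopes downward in u_D, meaning that individuals with a lower unobserved resistance to treatment have higher returns. Plotting `curve` against `u_grid` gives the shape of the figure to compare against.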