# Explorations

We show how to reproduce key results from

* Carneiro, P., Heckman, J. J., and Vytlacil, E. (2011). Estimating Marginal Returns to Education. *American Economic Review*, 101(6), 2754-2781.

using the **grmpy** package. The authors provide their own replication material on the AER website.

## Data Preparation

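We do not have access to the exact dataset because the geographic information is confidential. However, the basic and the local variables are available as two separate files (`basicvariables.dta` and `localvariables.dta`), so we can create a mock dataset by merging the two at random.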

In [None]:
import pandas as pd
import grmpy

In [None]:
# Load the two files and place them side by side (rows are paired by position)
basic = pd.read_stata('basicvariables.dta')
local = pd.read_stata('localvariables.dta')
df = pd.concat([basic, local], axis=1)

# Delete redundant columns
for key_ in ['newid', 'caseid']:
    del df[key_]

# Add squared terms
for key_ in ['mhgc', 'cafqt', 'avurate', 'lurate_17', 'numsibs', 'lavlocwage17']:
    df[key_ + 'sq'] = df[key_] ** 2

# Add interaction terms
for j in ['pub4', 'lwage5_17', 'lurate_17', 'tuit4c']:
    for i in ['cafqt', 'mhgc', 'numsibs']:
        df[j + i] = df[j] * df[i]

df.to_pickle('aer-replication-mock.pkl')
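
The cell above pairs rows by their position in the two files. As a minimal sketch of how the random pairing could be made explicit and reproducible, the toy example below shuffles one frame with a fixed seed before concatenating. The column values and the seed are made up for illustration, and this is not necessarily how the mock dataset above was meant to be built.

```python
import pandas as pd

# Toy stand-ins for the two files; all values here are invented.
basic = pd.DataFrame({"newid": [1, 2, 3, 4], "wage": [10.0, 12.5, 9.0, 11.0]})
local = pd.DataFrame({"caseid": [7, 8, 9, 10], "tuit4c": [1.2, 0.8, 1.5, 1.0]})

# Shuffle the rows of one frame with a fixed seed (an arbitrary choice), then
# drop the old index so that pd.concat pairs rows by position, not by label.
local_shuffled = local.sample(frac=1, random_state=123).reset_index(drop=True)

mock = pd.concat([basic.reset_index(drop=True), local_shuffled], axis=1)
print(mock)
```

Resetting the index matters here: `pd.concat` with `axis=1` aligns on index labels, so without it the shuffle would be undone by the alignment.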


## Estimation

What needs to be done?

* Create an initialization file that reflects the estimation setup in the paper.
* Run the estimation using the **grmpy** package.
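
Since the initialization file refers to variables by name, it may help to confirm beforehand that the dataset contains every column the file lists. The sketch below shows one way to do this on a toy frame. The helper `report_column_problems` is hypothetical and not part of **grmpy**, and the list of required names would have to be copied from your own initialization file.

```python
import pandas as pd


def report_column_problems(df, required):
    """Return the required columns missing from df and those containing NaNs."""
    missing = [col for col in required if col not in df.columns]
    with_nans = [
        col for col in required if col in df.columns and df[col].isna().any()
    ]
    return missing, with_nans


# Toy frame standing in for the mock dataset; all values here are invented.
toy = pd.DataFrame({"cafqt": [0.1, None, 0.3], "mhgc": [12, 10, 16]})

missing, with_nans = report_column_problems(toy, ["cafqt", "mhgc", "numsibs"])
print("missing:", missing)
print("with NaNs:", with_nans)
```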


In [None]:
grmpy.estimate('replication.grmpy.ini')

* Reproduce Figure 1 from the original paper. Even though we only use a mock dataset, our results will be strikingly close to the original ones. This suggests that most of the results are driven by the observable cost shifters rather than the instruments.

## Further Remarks

* This is an opportunity to experience the package from the user's perspective. So, please implement (or at least write down) all improvements to the package that cross your mind while using it. This includes, in particular, updates to the documentation.
