# Mincer returns

In [1]:
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

pd.options.display.float_format = '${:,.2f}'.format

## Simulation

We start by simulating a dataset based on the accounting identity model.
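
For reference, the log-earnings equation that the simulation code implements can be written out as follows. This is a transcription of the terms in `log_observed_earnings`, with symbols mirroring the parameter names in the code. The disturbance symbol $\varepsilon$ is notation introduced here for the normal draw with standard deviation 0.1. Because that draw is symmetric around zero, the sign with which it enters does not matter.

```latex
\ln Y(s, x) = \ln P_0 - \kappa + \rho_s s
  + \left( \rho_0 \kappa + \frac{\rho_0 \kappa}{2T} + \frac{\kappa}{T} \right) x
  - \frac{\rho_0 \kappa}{2T} x^2 + \varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, 0.1^2)
```

Here $s$ is years of schooling and $x$ is years of experience, so a regression of log earnings on $s$, $x$, and $x^2$ should recover the four bracketed coefficients.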

In [2]:
P_0 = 239.15215950404396
kappa = 1.0
rho_0 = 0.075
rho_s = 0.1250
T = 55
num_agents = 1000



def log_observed_earnings(s, x):
    """Simulate log earnings directly from the accounting-identity model."""
    rslt = np.log(P_0) - kappa
    rslt += rho_s * s
    rslt += (rho_0 * kappa + (rho_0 * kappa) / (2 * T) + kappa / T) * x
    rslt -= (rho_0 * kappa / (2 * T)) * (x ** 2)
    # Add a mean-zero normal disturbance; its sign is immaterial by symmetry.
    rslt += np.random.normal(scale=0.1)

    return rslt

data = []
for i in range(num_agents):
    s = np.random.choice(range(10, 16))
    x = np.random.choice(range(1, T))
    y = log_observed_earnings(s, x)
    age = s + x + 6
    
    data.append([i, age, np.exp(y), s, x])

Now we are ready to store the dataset.

In [3]:
columns = ['Identifier', 'Age', 'Earnings', 'Schooling', 'Experience']
df = pd.DataFrame(data, columns=columns)
df.set_index('Identifier', inplace=True)
df.to_pickle('data.mincer.pkl')

## Estimation

We can now load our simulated dataset and run the conventional Mincer regression.

In [4]:
df = pd.read_pickle('data.mincer.pkl')
df.head()

Identifier,Age,Earnings,Schooling,Experience
0,55,"$3,866.34",13,36
1,27,$776.89,11,10
2,65,"$7,997.17",15,44
3,70,"$7,506.51",13,51
4,58,"$4,405.83",10,42


Now we can run the baseline regression.

In [5]:
formula = 'np.log(Earnings) ~ Schooling + Experience + np.square(Experience)'
model = smf.ols(formula=formula, data=df)
model.fit().summary()

Dep. Variable:,np.log(Earnings),R-squared:,0.987
Model:,OLS,Adj. R-squared:,0.987
Method:,Least Squares,F-statistic:,24990.0
Date:,"Fri, 29 Jun 2018",Prob (F-statistic):,0.0
Time:,11:48:59,Log-Likelihood:,874.72
No. Observations:,1000,AIC:,-1741.0
Df Residuals:,996,BIC:,-1722.0
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,4.4797,0.025,176.650,0.000,4.430,4.529
Schooling,0.1251,0.002,69.042,0.000,0.122,0.129
Experience,0.0920,0.001,108.370,0.000,0.090,0.094
np.square(Experience),-0.0007,1.5e-05,-44.821,0.000,-0.001,-0.001

Omnibus:,0.21,Durbin-Watson:,1.866
Prob(Omnibus):,0.9,Jarque-Bera (JB):,0.294
Skew:,-0.009,Prob(JB):,0.863
Kurtosis:,2.918,Cond. No.,10500.0


The simulation parameters are chosen so that these estimates line up rather closely with the estimated coefficients reported in Table 2 for Whites in 1940.
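
As a rough cross-check on the regression table, the coefficients implied by the simulation parameters can be computed directly from the accounting identity. The sketch below uses only the standard library. The helper name `implied_coefficients` is hypothetical and not part of the original notebook, and the parameter values are copied from the simulation cell.

```python
import math

# Parameter values copied from the simulation cell.
P_0 = 239.15215950404396
kappa = 1.0
rho_0 = 0.075
rho_s = 0.1250
T = 55


def implied_coefficients(P_0, kappa, rho_0, rho_s, T):
    """Return the population coefficients of the simulated log-earnings equation.

    Hypothetical helper for illustration: it mirrors the terms in
    log_observed_earnings, leaving out the random disturbance.
    """
    return {
        'Intercept': math.log(P_0) - kappa,
        'Schooling': rho_s,
        'Experience': rho_0 * kappa + (rho_0 * kappa) / (2 * T) + kappa / T,
        'np.square(Experience)': -(rho_0 * kappa) / (2 * T),
    }


coefs = implied_coefficients(P_0, kappa, rho_0, rho_s, T)
for name, value in coefs.items():
    print(f'{name:<24}{value: .4f}')
```

The dictionary keys match the row labels of the regression summary, so the values can be compared line by line with the `coef` column.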