# Implementing a model with observed state space components

This notebook shows how to add observable characteristics of an individual to the state space. A potential source of unobserved heterogeneity in the models of Keane and Wolpin (1994) and Keane and Wolpin (1997) is that individual ability is not observed. The authors try to mitigate its influence with a finite mixture model with four different types, as the years of schooling at the start of the model horizon are potentially not exogenous. If we had data on ability, we could probably shift some of the explanatory power of types to an ability covariate. Furthermore, if type probabilities depend on the ability level, types become more economically interpretable.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import respy as rp

## Parameters, options and data

As we have no ability measure in the original data of Keane and Wolpin (1997), we assume that the initial years of schooling serve as a five-point ability scale. We shift the measure so that it starts at zero, which makes it more suitable for the model.

In [2]:
params, options, df = rp.get_example_model("kw_97_base")

In [3]:
# We have to fill the NaNs in the initial period where lagged choices are unknown.
df.Lagged_Choice = df.Lagged_Choice.fillna("edu")

In [4]:
df["Ability"] = (
    df.groupby("Identifier").Experience_Edu.transform("first")
    .subtract(7)
    .astype(np.uint8)
)
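
To make the transformation concrete, here is a minimal sketch on a toy panel. The toy data frame and its values are made up for illustration; only the column names `Identifier` and `Experience_Edu` and the shift by seven come from the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: two individuals observed for three periods each.
toy = pd.DataFrame(
    {
        "Identifier": [0, 0, 0, 1, 1, 1],
        "Period": [0, 1, 2, 0, 1, 2],
        "Experience_Edu": [10, 11, 12, 7, 7, 8],
    }
)

# Broadcast each individual's first observed schooling level to all of
# their rows, then shift it so that seven years of schooling map to zero.
toy["Ability"] = (
    toy.groupby("Identifier")["Experience_Edu"]
    .transform("first")
    .subtract(7)
    .astype(np.uint8)
)

ability = toy["Ability"].tolist()
print(ability)
```

The ability level is constant within an individual because it is taken from the first row of each group. Note that the cast to an unsigned integer type assumes that nobody starts with fewer than seven years of schooling, since an unsigned type cannot represent negative values.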

Furthermore, we include covariates of our new ability measure in the parameter specification and define the covariates in the options. For simplicity, we treat our ability measure similarly to IQ, which was originally used to determine basic mental capabilities. Our single covariate is thus an indicator for having an ability level higher than zero. Still, we keep the five-point scale instead of a simpler two-point scale to show the impact on the size of the state space.

The new covariate enters the wage component of working alternatives and the non-pecuniary component of non-working alternatives positively. 

In [5]:
# Add ability parameters to wage components.
for category in ["wage_a", "wage_b", "wage_mil"]:
    params.loc[(category, "at_least_one_ability"), :] = [
        0.1, np.nan, np.nan, "return to having at least ability level one"
    ]

# Add ability parameters to non-pecuniary components.
for category in ["nonpec_edu", "nonpec_home"]:
    params.loc[(category, "at_least_one_ability"), :] = [
        2000, np.nan, np.nan, "return to having at least ability level one"
    ]

# Add ability parameters to type probabilities.
for category in ["type_2", "type_3", "type_4"]:
    params.loc[(category, "at_least_one_ability"), :] = [
        0.1, np.nan, np.nan, "return to having at least ability level one"
    ]

# Define the probability for ability levels for the simulation.
for name, val in zip(
    [f"level_{i}" for i in range(5)], [0.00981, 0.0431, 0.201, 0.6702, 0.0759]
):
    params.loc[("ability", name), :] = [
        val, np.nan, np.nan, "Probability of having the specified ability level"
    ]
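
Since the shares above are rounded, it can be worth checking how far their sum is from one before simulating. The following is a minimal sketch of such a sanity check. The tolerance and the rescaling step are assumptions made for illustration; whether respy rescales the shares internally is not covered here.

```python
import math

# Shares copied from the parameter specification above.
shares = [0.00981, 0.0431, 0.201, 0.6702, 0.0759]

total = sum(shares)
print(f"sum of shares: {total:.5f}")

# Hypothetical tolerance: the rounded shares need not add up to exactly one.
assert math.isclose(total, 1.0, abs_tol=1e-4)

# One option is to rescale the shares so that they sum to one.
normalized = [share / total for share in shares]
```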

As the ``"name"`` column in the parameter dataframe is matched to covariates, we have to define ``"at_least_one_ability"``. The string under ``options["covariates"]["at_least_one_ability"]`` is evaluated using [pandas.DataFrame.eval()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.eval.html). Under ``options["observables"]["ability"]`` we store the number of ability levels, which is five.

In [6]:
options["covariates"]["at_least_one_ability"] = "ability >= 1"
options["observables"] = {"ability": 5}
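
To illustrate what the covariate formula yields, here is a small sketch that evaluates the same string with `pandas.DataFrame.eval()` on a toy data frame. The `states` frame is hypothetical and only stands in for the states of the model.

```python
import pandas as pd

# Hypothetical states: one row per ability level on the five-point scale.
states = pd.DataFrame({"ability": [0, 1, 2, 3, 4]})

# Evaluate the covariate formula as a string expression on the data frame.
states["at_least_one_ability"] = states.eval("ability >= 1")

flags = states["at_least_one_ability"].tolist()
print(flags)
```

Only the lowest ability level is mapped to `False`, so the parameters named `"at_least_one_ability"` apply to levels one through four alike.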

In [7]:
# For simplification we restrict the model to 11 periods.
options["n_periods"] = 11

Here, we solve the model.

In [8]:
state_space = rp.solve(params, options)

Here, we calculate the likelihood value of the data.

In [9]:
criterion = rp.get_crit_func(params, options, df)
crit_val = criterion(params)



In [10]:
crit_val

-52.03291393696251

Here, we simulate a new data set given the proportions of ability levels specified in ``params`` under ``"ability"``.

In [11]:
simulate = rp.get_simulate_func(params, options)
df = simulate(params)
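
As a rough intuition for how the shares translate into a simulated sample, the sketch below draws ability levels with the standard library and compares empirical to specified shares. This is not respy's implementation; the seed, sample size, and variable names are assumptions made for illustration.

```python
import random
from collections import Counter

# Shares copied from the parameter specification above.
shares = [0.00981, 0.0431, 0.201, 0.6702, 0.0759]
levels = list(range(5))

# Hypothetical seed and sample size.
rng = random.Random(0)
draws = rng.choices(levels, weights=shares, k=10_000)

# Compare the empirical share of each level with the specified share.
counts = Counter(draws)
empirical = [counts[level] / len(draws) for level in levels]
for level, (spec, emp) in enumerate(zip(shares, empirical)):
    print(f"level_{level}: specified {spec:.4f}, simulated {emp:.4f}")
```

The same comparison can be made on the simulated data by taking one row per individual and tabulating the `Ability` column.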

In [12]:
df.loc[:, :"Type"].head(20)

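| | Identifier | Period | Choice | Wage | Experience_A | Experience_B | Experience_Mil | Experience_Edu | Lagged_Choice | Ability | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | edu | NaN | 0 | 0 | 0 | 7 | edu | 3.0 | 3 |
| 1 | 0 | 1 | edu | NaN | 0 | 0 | 0 | 8 | edu | 3.0 | 3 |
| 2 | 0 | 2 | edu | NaN | 0 | 0 | 0 | 9 | edu | 3.0 | 3 |
| 3 | 0 | 3 | edu | NaN | 0 | 0 | 0 | 10 | edu | 3.0 | 3 |
| 4 | 0 | 4 | edu | NaN | 0 | 0 | 0 | 11 | edu | 3.0 | 3 |
| 5 | 0 | 5 | edu | NaN | 0 | 0 | 0 | 12 | edu | 3.0 | 3 |
| 6 | 0 | 6 | edu | NaN | 0 | 0 | 0 | 13 | edu | 3.0 | 3 |
| 7 | 0 | 7 | edu | NaN | 0 | 0 | 0 | 14 | edu | 3.0 | 3 |
| 8 | 0 | 8 | edu | NaN | 0 | 0 | 0 | 15 | edu | 3.0 | 3 |
| 9 | 0 | 9 | edu | NaN | 0 | 0 | 0 | 16 | edu | 3.0 | 3 |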


## References

> Keane, M. P. and Wolpin, K. I. (1994). [The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence](https://doi.org/10.2307/2109768). *Federal Reserve Bank of Minneapolis*, No. 181.
>
> Keane, M. P. and Wolpin, K. I. (1997). [The Career Decisions of Young Men](https://doi.org/10.1086/262080). *Journal of Political Economy*, 105(3): 473-522.