# Tutorial

We now illustrate the basic capabilities of the ``respy`` package in a simple tutorial. The tutorial covers the following topics:

1. [How to specify a model - parameters and options.](#How-to-specify-a-model---parameters-and-options.)
2. [How to solve a model.](#How-to-solve-a-model.)
3. [How to simulate a data set.](#How-to-simulate-a-data-set.)
4. [How to estimate model parameters.](#How-to-estimate-model-parameters.)

After the [installation](installation.rst), the convention is to import ``respy`` as follows to get access to all exposed functions.

In [2]:
import respy as rp

## How to specify a model - parameters and options.

To define a model, ``respy`` relies on two objects: parameters and options. Parameters are changed by the optimization, whereas options stay fixed. Only some parameters and options are explained here. For a complete overview of parameters and options, see the section [model specification](../software/model-specification.rst).

As an example, we turn to the first model of Keane and Wolpin (1994). The following function returns the parameters and options and the original dataset used by the authors. Right now, we do not need the data.

In [3]:
params, options, _ = rp.get_example_model("kw_94_one")

In [4]:
_, df = rp.simulate(params, options)

In [5]:
crit_func = rp.get_crit_func(params, options, df)

As said before, options are everything not affected by the optimization. The value under ``"n_periods"`` defines the number of periods in the model, that is, the time frame in which individuals can make their decisions.

Choices which are available to individuals within the model are specified under ``"choices"``. An empty dictionary as a value signals that this choice relies on default values generated inside of ``respy`` or does not need information at all. Under ``"edu"`` we can see which options are available for each choice.

- ``"max"`` defines the maximum amount of experience which can be accumulated by individuals using this choice.
- ``"start"`` is a list and contains different initial experience levels of this choice. Here, individuals start with 10 years of experience.
- ``"lagged"`` defines the share of individuals with this choice and initial experience level who have ``"edu"`` as the initial lagged choice when entering the model.
- ``"share"`` influences the probability of being a particular type.

The rest of the options include seeds, the number of draws used to simulate the flow utilities in each period, and other model settings. Furthermore, a lot of logic is hidden from the user to offer a comprehensible starting point. For a deeper understanding of the mechanics, the user is referred to the section [model specification](../software/model-specification.rst).
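As a small illustration of the structure described above, the following sketch builds a plain dictionary with a subset of the option values printed later in this tutorial and reads the entries for ``"edu"``. The helper ``describe_choice`` is hypothetical and not part of ``respy``.

```python
# A subset of the option values printed later in this tutorial.
options = {
    "choices": {"edu": {"max": 20, "start": [10], "lagged": [1], "share": [1]}},
    "n_periods": 40,
    "simulation_agents": 1000,
}


def describe_choice(name, spec):
    """Summarize one entry of options["choices"] (hypothetical helper)."""
    return (
        f"{name}: initial experience levels {spec['start']}, "
        f"maximum experience {spec['max']}"
    )


summary = describe_choice("edu", options["choices"]["edu"])
print(summary)
print("number of periods:", options["n_periods"])
```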

The parameter specification includes all parameters of the model which are affected by the optimization routine.

In [6]:
params.head()

| category | name | value | fixed | lower | upper | comment |
|---|---|---|---|---|---|---|
| delta | delta | 0.95 | False | 0.7 | 1.0 | discount factor |
| wage_a | constant | 9.21 | False | NaN | NaN | skill rental price if the base skill endowment... |
| wage_a | exp_edu | 0.038 | False | NaN | NaN | linear return to an additional year of schooli... |
| wage_a | exp_a | 0.033 | False | NaN | NaN | return to experience, same sector, linear (wage) |
| wage_a | exp_a_square | -0.05 | False | NaN | NaN | return to experience, same sector, quadratic (... |


## How to solve a model.

The solution of the model is the value of each state within the model, assuming optimal decisions in this and all future periods. In ``respy``, the solution is represented by the :class:`~respy.state_space.StateSpace`, a class which contains the model solution. Solving the model combines the following steps:

- Building the state space.
- Solving the model by finding the optimal decision in each state via backward induction.

The ``state_space`` is normally an internal construct which is hidden from the user. To give a better understanding of how the model works, it is exposed to the user here.

In [7]:
state_space = rp.solve(params, options)
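To make the two steps concrete, here is a minimal sketch of backward induction on a toy model with two choices. It is not the implementation used by ``respy``: the function ``solve_toy_model``, the payoffs, and the state definition (period and work experience) are assumptions made up for illustration, and there are no random shocks.

```python
def solve_toy_model(n_periods=3, delta=0.95):
    """Solve a toy two-choice model by backward induction (illustrative only)."""

    def flow_utility(choice, experience):
        # Made-up payoffs: working pays more with experience, home pays a constant.
        if choice == "work":
            return 1.0 + 0.5 * experience
        return 1.2

    # Step 1: build the state space, here all reachable (period, experience) pairs.
    states = [(t, x) for t in range(n_periods) for x in range(t + 1)]

    # Step 2: start in the last period and move backwards in time.
    values, policy = {}, {}
    for period, experience in sorted(states, reverse=True):
        candidates = {}
        for choice in ("work", "home"):
            next_experience = experience + 1 if choice == "work" else experience
            # After the final period, the continuation value is zero.
            continuation = values.get((period + 1, next_experience), 0.0)
            candidates[choice] = flow_utility(choice, experience) + delta * continuation
        best = max(candidates, key=candidates.get)
        values[(period, experience)] = candidates[best]
        policy[(period, experience)] = best
    return values, policy


values, policy = solve_toy_model()
print(policy[(0, 0)], values[(0, 0)])
```

Each state's value depends only on values of states in the next period, which is why the loop can fill the dictionary in a single backward pass.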

## How to simulate a data set.

Simulating a data set is as easy as passing model inputs to :func:`~respy.simulate.simulate`. The return values are the ``state_space`` of the model and ``df``, a data set of simulated agents as a :class:`pandas.DataFrame`. The number of simulated agents can be set in the options under ``"simulation_agents"``.

In [8]:
state_space, df = rp.simulate(params, options)
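The idea of a seeded simulation can be sketched with a toy example. This is not how ``respy`` simulates agents: the decision rule below is myopic and made up, and the function ``simulate_toy_agents`` and its column names are hypothetical. It only shows how a fixed seed yields a reproducible panel with one row per agent and period.

```python
import random


def simulate_toy_agents(n_agents=5, n_periods=3, seed=132):
    """Simulate a toy panel of choices with a fixed seed (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for identifier in range(n_agents):
        experience = 0
        for period in range(n_periods):
            # Made-up rule: work if the shocked wage offer beats the home payoff.
            wage_offer = 1.0 + 0.5 * experience + rng.gauss(0.0, 1.0)
            choice = "work" if wage_offer > 1.2 else "home"
            rows.append(
                {
                    "identifier": identifier,
                    "period": period,
                    "experience": experience,
                    "choice": choice,
                }
            )
            if choice == "work":
                experience += 1
    return rows


rows = simulate_toy_agents()
same_rows = simulate_toy_agents()
print(len(rows), rows == same_rows)
```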

## How to estimate model parameters.

To estimate model parameters via maximum likelihood, ``respy`` relies on [``estimagic``](https://github.com/OpenSourceEconomics/estimagic), an open-source tool to estimate structural models and more. First, we load the model again, because now we need the data.

In [9]:
params, options, df = rp.get_example_model("kw_94_one")

We make some small adjustments to the options.

In [10]:
options['estimation_tau'] = 50
options['estimation_draws'] = 2000

The dataset is almost in the correct format for the estimation. Note that lagged choices are missing for the first period. As a preliminary fix, we assume that all individuals were in school in the previous period, which is not unreasonable for individuals at age 16.

In [11]:
df.Lagged_Choice = df.Lagged_Choice.fillna("edu")

Second, we need the criterion function of the model, which returns the likelihood value of the data given the model. Before building it, we take a look at the adjusted options.

In [12]:
options

{'choices': {'edu': {'max': 20, 'start': [10], 'lagged': [1], 'share': [1]}},
 'estimation_draws': 2000,
 'estimation_seed': 500,
 'estimation_tau': 50,
 'interpolation_points': -1,
 'n_periods': 40,
 'simulation_agents': 1000,
 'simulation_seed': 132,
 'solution_draws': 500,
 'solution_seed': 456}

In [13]:
crit_func = rp.get_crit_func(params, options, df)



The criterion function takes the parameters as an input and returns the likelihood value.

In [14]:
crit_val = crit_func(params)
crit_val

-599.6587249586594
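The pattern of a function that closes over the data and maps parameters to a likelihood value can be sketched as follows. This toy example is an assumption for illustration and unrelated to the likelihood that ``respy`` computes: ``get_toy_crit_func`` is a hypothetical helper which returns the log likelihood of a few made-up log wages under a normal distribution.

```python
import math


def get_toy_crit_func(observed_log_wages):
    """Return a function mapping parameters to a log-likelihood value."""

    def crit_func(params):
        mean, sd = params["mean"], params["sd"]
        log_like = 0.0
        for wage in observed_log_wages:
            z = (wage - mean) / sd
            # Log density of a normal distribution evaluated at the observation.
            log_like += -0.5 * z ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)
        return log_like

    return crit_func


data = [9.1, 9.3, 9.2, 9.4, 9.0]  # made-up observations
toy_crit_func = get_toy_crit_func(data)
good = toy_crit_func({"mean": 9.2, "sd": 0.15})
bad = toy_crit_func({"mean": 8.0, "sd": 0.15})
print(good > bad)
```

Parameters closer to the data-generating values give a higher criterion value, which is what an optimizer exploits when it maximizes the function.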

To estimate the model parameters, import :func:`~estimagic.optimization.optimize.maximize` from ``estimagic``. We maximize because the criterion function returns a likelihood value.

In [15]:
import numpy as np
from estimagic.optimization.optimize import maximize

For ``estimagic``, we pass constraints on the parameters as a list of dictionaries, where each dictionary is one constraint. A constraint has two components. First, ``"loc"`` tells ``estimagic`` which parameters to constrain; it is an index location which is passed to `df.loc`. Second, ``"type"`` defines the kind of constraint. Here, we require that the shock parameters are valid standard deviations and correlations, and we fix the discount factor ``delta`` at 0.95.

In [16]:
constr = [{"loc": "shocks", "type": "sdcorr"}, {"loc": "delta", "type": "fixed", "value": 0.95}]
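To see what "valid" means for the shock parameters, the sketch below builds a covariance matrix from standard deviations and a correlation matrix and checks whether it is positive definite by attempting a Cholesky factorization. This is only an illustration of the requirement, not the way ``estimagic`` enforces the constraint; both helper functions and all numbers are made up.

```python
import math


def build_covariance(sds, corr):
    """Combine standard deviations and a correlation matrix into a covariance."""
    n = len(sds)
    return [[sds[i] * sds[j] * corr[i][j] for j in range(n)] for i in range(n)]


def is_positive_definite(matrix):
    """Attempt a Cholesky factorization; it succeeds only if the matrix is positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


valid = is_positive_definite(
    build_covariance([0.2, 0.25], [[1.0, 0.5], [0.5, 1.0]])
)
# These pairwise correlations cannot hold jointly, so the check fails.
invalid = is_positive_definite(
    build_covariance(
        [1.0, 1.0, 1.0],
        [[1.0, 0.9, 0.9], [0.9, 1.0, -0.9], [0.9, -0.9, 1.0]],
    )
)
print(valid, invalid)
```

Without such a constraint, an optimizer could propose shock parameters for which no joint distribution exists.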

Optionally, we can add a column ``"group"`` which is identical to the category level of the index. The estimagic dashboard will then contain one parameter convergence plot per group instead of plotting all parameters in the same figure. Since ``respy`` has many parameters, this makes the plots much more readable. We also replace missing lower and upper bounds with negative and positive infinity.

In [17]:
params["group"] = params.index.get_level_values("category")
# Replace missing bounds with infinite ones; assign back instead of filling in place.
params["lower"] = params["lower"].fillna(-np.inf)
params["upper"] = params["upper"].fillna(np.inf)
params

| category | name | value | fixed | lower | upper | comment | group |
|---|---|---|---|---|---|---|---|
| delta | delta | 0.95 | False | 0.7 | 1.0 | discount factor | delta |
| wage_a | constant | 9.21 | False | -inf | inf | skill rental price if the base skill endowment... | wage_a |
| wage_a | exp_edu | 0.038 | False | -inf | inf | linear return to an additional year of schooli... | wage_a |
| wage_a | exp_a | 0.033 | False | -inf | inf | return to experience, same sector, linear (wage) | wage_a |
| wage_a | exp_a_square | -0.05 | False | -inf | inf | return to experience, same sector, quadratic (... | wage_a |
| wage_a | exp_b | 0.0 | False | -inf | inf | return to experience, other civilian sector, l... | wage_a |
| wage_a | exp_b_square | 0.0 | False | -inf | inf | return to experience, other civilian sector, q... | wage_a |
| wage_a | hs_graduate | 0.0 | False | -inf | inf | skill premium of having finished high school (... | wage_a |
| wage_a | co_graduate | 0.0 | False | -inf | inf | skill premium of having finished college (wage) | wage_a |
| wage_a | period | 0.0 | False | -inf | inf | linear age effect (wage) | wage_a |


In [None]:
maximize(crit_func, params, "nlopt_newuoa_bound", db_options={"rollover": 200}, constraints=constr, dashboard=True)

Bokeh app running at: http://localhost:37183/

## References

> Keane, M. P. and Wolpin, K. I. (1994). [The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence](https://www.minneapolisfed.org/research/staff-reports/the-solution-and-estimation-of-discrete-choice-dynamic-programming-models-by-simulation-and-interpolation-monte-carlo-evidence). *Federal Reserve Bank of Minneapolis*, No. 181.