# Tutorial

We now illustrate the basic capabilities of the ``respy`` package in a simple tutorial. The tutorial covers the following topics:

1. [How to specify a model - parameters and options.](#How-to-specify-a-model---parameters-and-options.)
2. [How to solve a model.](#How-to-solve-a-model.)
3. [How to simulate a data set.](#How-to-simulate-a-data-set.)
4. [How to estimate model parameters.](#How-to-estimate-model-parameters.)

After the [installation](installation.rst), the convention is to import ``respy`` as follows to get access to all exposed functions.

```python
import respy as rp
```

## How to specify a model - parameters and options.

In order to define a model, ``respy`` relies on two objects: parameters and options. Parameters are changed by the optimizer during estimation, whereas options stay fixed. Only some parameters and options are explained here. For a complete overview of parameters and options, see the section [model specification](../software/model-specification.rst).

As an example, we turn to the first model of Keane and Wolpin (1994). The following function returns the parameters, the options, and the original data set used by the authors. Passing ``with_data=False`` prevents the function from returning the data set.

```python
params, options = rp.get_example_model("kw_94_one", with_data=False)
```

As said before, options comprise everything not affected by the optimization. The value under ``"n_periods"`` defines the number of periods in the model, that is, the time frame in which individuals make their decisions.

The remaining options include seeds, the numbers of draws used to simulate the flow utilities in each period, and other model settings. Furthermore, a lot of logic is hidden from the user to offer a comprehensible starting point. For a deeper understanding of the mechanics, the user is referred to the section [model specification](../software/model-specification.rst).

```python
options
```

```
{'estimation_draws': 200,
 'estimation_seed': 500,
 'estimation_tau': 500,
 'interpolation_points': -1,
 'n_periods': 40,
 'simulation_agents': 1000,
 'simulation_seed': 132,
 'solution_draws': 500,
 'solution_seed': 1,
 'core_state_space_filters': ["period > 0 and exp_{i} == period and lagged_choice_1 != '{i}'",
  "period > 0 and exp_a + exp_b + exp_edu == period and lagged_choice_1 == '{j}'",
  "period > 0 and lagged_choice_1 == 'edu' and exp_edu == 0",
  "lagged_choice_1 == '{k}' and exp_{k} == 0",
  "period == 0 and lagged_choice_1 == '{k}'"],
 'covariates': {'constant': '1',
  'exp_a_square': 'exp_a ** 2',
  'exp_b_square': 'exp_b ** 2',
  'at_least_twelve_exp_edu': 'exp_edu >= 12',
  'not_edu_last_period': "lagged_choice_1 != 'edu'",
  'edu_ten': 'exp_edu == 10'}}
```
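The ``"covariates"`` and ``"core_state_space_filters"`` entries are expressions over state variables. To make them more concrete, the following sketch evaluates some of them for single, made-up states with Python's built-in ``eval``. This is only an illustration and not how ``respy`` works internally (it presumably evaluates the expressions on a table of all states). The state variables and their values are assumptions, a filter is read as marking an impossible state that should be dropped, and the filters containing the ``{i}``, ``{j}`` and ``{k}`` placeholders, which are presumably filled in with choice names, are skipped.

```python
# Covariate expressions copied from the options above.
COVARIATES = {
    "constant": "1",
    "exp_a_square": "exp_a ** 2",
    "exp_b_square": "exp_b ** 2",
    "at_least_twelve_exp_edu": "exp_edu >= 12",
    "not_edu_last_period": "lagged_choice_1 != 'edu'",
    "edu_ten": "exp_edu == 10",
}

# The only core state space filter above without placeholders.
FILTER = "period > 0 and lagged_choice_1 == 'edu' and exp_edu == 0"


def evaluate(expression, state):
    # No builtins are exposed; names in the expression resolve to state variables only.
    return eval(expression, {"__builtins__": {}}, dict(state))


def compute_covariates(state):
    return {name: evaluate(expr, state) for name, expr in COVARIATES.items()}


# Hypothetical states: the second one chose school last period without any schooling
# experience, which the filter flags as impossible.
possible_state = {"period": 3, "exp_a": 2, "exp_b": 0, "exp_edu": 11, "lagged_choice_1": "edu"}
impossible_state = {"period": 2, "exp_a": 2, "exp_b": 0, "exp_edu": 0, "lagged_choice_1": "edu"}

print(compute_covariates(possible_state))
print(evaluate(FILTER, possible_state), evaluate(FILTER, impossible_state))
```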

The parameter specification contains all parameters of the model that are affected by the optimization routine, as well as some that are kept fixed during estimation.

```python
params
```

| category | name | value | comment |
|---|---|---|---|
| delta | delta | 0.95 | discount factor |
| wage_a | constant | 9.21 | log of rental price |
| wage_a | exp_edu | 0.038 | return to an additional year of schooling |
| wage_a | exp_a | 0.033 | return to same sector experience |
| wage_a | exp_a_square | -0.0005 | return to same sector, quadratic experience |
| wage_a | exp_b | 0.0 | return to other sector experience |
| wage_a | exp_b_square | 0.0 | return to other sector, quadratic experience |
| wage_b | constant | 8.48 | log of rental price |
| wage_b | exp_edu | 0.07 | return to an additional year of schooling |
| wage_b | exp_b | 0.067 | return to same sector experience |
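To see how the ``wage_a`` rows fit together, here is a worked example. It assumes, as the comments "log of rental price" and "return to ..." suggest, that the deterministic part of the log wage is a linear combination of the covariates with these coefficients, with ``exp_a_square`` equal to ``exp_a ** 2`` as defined in the options. The wage shock is ignored, and the experience values are made up.

```python
import math

# Coefficients for sector "a", copied from the params table above.
WAGE_A = {
    "constant": 9.21,
    "exp_edu": 0.038,
    "exp_a": 0.033,
    "exp_a_square": -0.0005,
    "exp_b": 0.0,
    "exp_b_square": 0.0,
}


def wage_covariates(exp_a, exp_b, exp_edu):
    # Mirrors the covariate definitions in options["covariates"].
    return {
        "constant": 1,
        "exp_edu": exp_edu,
        "exp_a": exp_a,
        "exp_a_square": exp_a ** 2,
        "exp_b": exp_b,
        "exp_b_square": exp_b ** 2,
    }


def log_wage(coefficients, exp_a, exp_b, exp_edu):
    """Deterministic log wage: sum of coefficient times covariate (assumed linear index)."""
    x = wage_covariates(exp_a, exp_b, exp_edu)
    return sum(coefficients[name] * x[name] for name in coefficients)


# 9.21 + 0.038 * 10 + 0.033 * 5 - 0.0005 * 25 = 9.7425
lw = log_wage(WAGE_A, exp_a=5, exp_b=0, exp_edu=10)
print(lw, math.exp(lw))  # the wage itself is roughly 17,000
```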


## How to solve a model.

The solution of the model is the value of each state, assuming optimal decisions in the current and all future periods. In ``respy``, the solution is represented by the :class:`~respy.state_space.StateSpace`, a class that holds the model solution. Computing the solution involves the following steps:

- Building the state space.
- Solving the model by finding the optimal decision in each state via backward induction.
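Both steps can be illustrated with a toy model that is unrelated to ``respy``'s actual implementation: two hypothetical choices, ``"work"`` (which adds one year of experience) and ``"home"``, made-up utilities, normally distributed shocks, and the discount factor 0.95 from the table above. Because the shocks are not known in advance, the expected value of the best choice in each state (often called Emax) is approximated by averaging over random draws, in the spirit of the ``solution_draws`` option.

```python
import random

DELTA = 0.95   # discount factor, value taken from the params table
N_PERIODS = 5  # toy horizon (the example model uses 40)
N_DRAWS = 200  # Monte Carlo draws per period for the Emax integral
CHOICES = ("work", "home")


def flow_utility(choice, exp, shock):
    # Made-up utilities: working pays more with experience, staying home is constant.
    if choice == "work":
        return 1.0 + 0.1 * exp + shock
    return 0.5 + shock


def next_exp(choice, exp):
    return exp + 1 if choice == "work" else exp


def solve_toy_model(seed=1):
    rng = random.Random(seed)
    # One set of (work, home) shock draws per period, reused for every state in that period.
    draws = [
        [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(N_DRAWS)]
        for _ in range(N_PERIODS)
    ]
    emax = {}  # (period, exp) -> expected value of acting optimally from this state on
    # Backward induction: enumerate the states of each period, starting with the last one.
    for period in reversed(range(N_PERIODS)):
        for exp in range(period + 1):  # experience cannot exceed the number of past periods
            total = 0.0
            for shocks in draws[period]:
                values = []
                for choice, shock in zip(CHOICES, shocks):
                    continuation = 0.0
                    if period + 1 < N_PERIODS:
                        continuation = emax[(period + 1, next_exp(choice, exp))]
                    values.append(flow_utility(choice, exp, shock) + DELTA * continuation)
                total += max(values)
            emax[(period, exp)] = total / N_DRAWS
    return emax


emax = solve_toy_model()
print(emax[(0, 0)])  # expected lifetime value at the start of the toy model
```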

The ``state_space`` is normally an internal construct hidden from the user. To help you understand how the model works and to support debugging, :func:`~respy.solve.solve` exposes it.

```python
state_space = rp.solve(params, options)
```

## How to simulate a data set.

The first step for simulating data is to build the simulation function. Then, pass the parameter vector to the function and retrieve a simulated data set.

```python
simulate = rp.get_simulate_func(params, options)
```

```python
df = simulate(params)
```

The step of building a simulate function may seem odd at first, but it has two advantages. First, the function can easily be used for estimation with the method of simulated moments: an optimizer passes a new parameter vector to the function and receives simulated data for matching moments. Second, while building the function, auxiliary components are created and attached to ``simulate()`` via :func:`functools.partial`, so repeated calls are much faster. However, to simulate a model that differs in more than its parameter values, you need to build a new ``simulate()`` function.
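The builder pattern can be sketched in a few lines of plain Python. Everything here is hypothetical (``make_simulate_func``, the ``"constant"`` and ``"slope"`` parameters, the toy outcome); only the two option names are borrowed from the options above. The expensive setup, drawing the shocks, runs once, and :func:`functools.partial` binds its result, so each call with a new parameter vector only redoes the cheap part. The last function shows how an optimizer could use such a function for the method of simulated moments with a single moment, the mean.

```python
import functools
import random


def _draw_shocks(options):
    # Hypothetical "expensive" setup step: done once per model, not per parameter vector.
    rng = random.Random(options["simulation_seed"])
    return [rng.gauss(0.0, 1.0) for _ in range(options["simulation_agents"])]


def _simulate(params, shocks):
    # Toy outcome for each simulated agent; only this part depends on params.
    return [params["constant"] + params["slope"] * shock for shock in shocks]


def make_simulate_func(options):
    """Return simulate(params) with the pre-computed shocks bound via functools.partial."""
    shocks = _draw_shocks(options)
    return functools.partial(_simulate, shocks=shocks)


def msm_criterion(params, simulate, observed_mean):
    # Method of simulated moments with one moment: squared distance of the means.
    simulated = simulate(params)
    return (sum(simulated) / len(simulated) - observed_mean) ** 2


toy_simulate = make_simulate_func({"simulation_seed": 132, "simulation_agents": 1000})
toy_data = toy_simulate({"constant": 1.0, "slope": 0.5})
```

Because the shocks are fixed inside the partial, two calls with the same parameters return identical data, which keeps a moment-based criterion smooth in the parameters.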

## How to estimate model parameters.

To estimate model parameters via maximum likelihood, ``respy`` relies on [``estimagic``](https://github.com/OpenSourceEconomics/estimagic), an open-source tool to estimate structural models and more. First, we load the model again, because now we need the data. In fact, this data set is the same as the one simulated above.

```python
params, options, df = rp.get_example_model("kw_94_one")
```

Second, we need the criterion function of the model, which returns the log-likelihood of the data given the model.

```python
crit_func = rp.get_crit_func(params, options, df)
```

The criterion function takes the parameters as input and returns the log-likelihood. Depending on your computer, one evaluation takes 1-3 seconds. During the optimization, the criterion is evaluated for several thousand different parameter vectors.
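The pattern behind a criterion function, data bound once and parameters passed on every call, can be sketched with a hypothetical binary logit model. This is not ``respy``'s likelihood, which is based on the full dynamic model; ``get_toy_crit_func``, the parameters ``"a"`` and ``"b"`` and the four observations are made up. It also shows why such values are negative: the log-likelihood of discrete choices is a sum of logarithms of probabilities, each below one.

```python
import math


def get_toy_crit_func(data):
    """Return a function mapping params to the log-likelihood of `data` (hypothetical logit)."""
    data = list(data)  # bound once, similar to the data set passed to rp.get_crit_func

    def toy_crit_func(params):
        log_likelihood = 0.0
        for x, choice in data:
            # Probability of choice 1 under a binary logit with index a + b * x.
            p_one = 1.0 / (1.0 + math.exp(-(params["a"] + params["b"] * x)))
            log_likelihood += math.log(p_one if choice == 1 else 1.0 - p_one)
        return log_likelihood

    return toy_crit_func


toy_crit = get_toy_crit_func([(0, 0), (1, 0), (2, 1), (3, 1)])
print(toy_crit({"a": 0.0, "b": 0.0}))   # equals 4 * log(0.5)
print(toy_crit({"a": -3.0, "b": 2.0}))  # higher, because these parameters fit the data better
```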

```python
%time crit_val = crit_func(params)
crit_val
```

```
Wall time: 1.89 s

-381.5728797655344
```

To estimate the model parameters, import :func:`~estimagic.optimization.optimize.maximize` from ``estimagic``.

```python
import numpy as np
from estimagic.optimization.optimize import maximize
```

For ``estimagic``, constraints on the parameters are passed as a list of dictionaries, where each dictionary is one constraint. A constraint has two components. First, it tells ``estimagic`` which parameters to constrain, via an index location that is passed to ``df.loc`` or ``df.query``. Second, it defines the type of the constraint. The first constraint requires the shock parameters to be valid standard deviations and correlations. The last three constraints ensure that all individuals start with ten years of schooling, that they were in school in the period before the model starts, and that they can accumulate at most ten additional years of schooling.
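One way an optimizer can honor a ``"fixed"`` constraint is to remove the selected parameters from the vector it searches over and to reinsert their start values before each criterion call. The sketch below shows this idea with plain dictionaries; it is not ``estimagic``'s implementation. The parameter vector is hypothetical: the names and values under ``initial_exp_edu`` and ``maximum_exp`` are made up, and selecting by category only mimics ``df.loc`` on the first index level.

```python
# Hypothetical parameter vector keyed by (category, name), mimicking the params index.
# The entries for "initial_exp_edu" and "maximum_exp" use made-up names and values.
TOY_PARAMS = {
    ("delta", "delta"): 0.95,
    ("wage_a", "constant"): 9.21,
    ("wage_a", "exp_edu"): 0.038,
    ("initial_exp_edu", "edu"): 10.0,
    ("maximum_exp", "edu"): 20.0,
}

TOY_CONSTRAINTS = [
    {"loc": "initial_exp_edu", "type": "fixed"},
    {"loc": "maximum_exp", "type": "fixed"},
]


def select(params, loc):
    # Like df.loc[loc] on the first index level: every entry whose category equals loc.
    return [key for key in params if key[0] == loc]


def split_free_and_fixed(params, constraints):
    fixed_keys = set()
    for constraint in constraints:
        if constraint["type"] == "fixed":
            fixed_keys.update(select(params, constraint["loc"]))
    free = {key: value for key, value in params.items() if key not in fixed_keys}
    fixed = {key: params[key] for key in fixed_keys}
    return free, fixed


def rebuild(free, fixed):
    # The criterion always receives the full vector; fixed entries keep their start values.
    return {**free, **fixed}


free, fixed = split_free_and_fixed(TOY_PARAMS, TOY_CONSTRAINTS)
print(sorted(free))  # only these entries would be varied by the optimizer
```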

```python
constr = rp.get_parameter_constraints("kw_94_one")
constr
```

```
[{'loc': 'shocks_sdcorr', 'type': 'sdcorr'},
 {'loc': 'lagged_choice_1_edu', 'type': 'fixed'},
 {'loc': 'initial_exp_edu', 'type': 'fixed'},
 {'loc': 'maximum_exp', 'type': 'fixed'}]
```
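What "valid standard deviations and correlations" means for the ``shocks_sdcorr`` parameters can be checked directly: the standard deviations must be positive, the correlations must lie strictly between -1 and 1, and the implied covariance matrix must be positive definite, which holds exactly when a Cholesky decomposition succeeds. The sketch below assumes a hypothetical layout with the correlations stored in a dictionary keyed by ``(i, j)`` with ``i > j``; the actual layout in ``params`` may differ.

```python
import math


def sdcorr_to_cov(sds, corrs):
    """Build a covariance matrix; corrs maps (i, j) with i > j to a correlation (default 0)."""
    n = len(sds)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = sds[i] ** 2
        for j in range(i):
            cov[i][j] = cov[j][i] = corrs.get((i, j), 0.0) * sds[i] * sds[j]
    return cov


def is_positive_definite(matrix):
    """Attempt a Cholesky decomposition; it succeeds iff the symmetric matrix is positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial_sum = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diagonal = matrix[i][i] - partial_sum
                if diagonal <= 0:
                    return False
                lower[i][j] = math.sqrt(diagonal)
            else:
                lower[i][j] = (matrix[i][j] - partial_sum) / lower[j][j]
    return True


def is_valid_sdcorr(sds, corrs):
    if any(sd <= 0 for sd in sds):
        return False
    if any(not -1.0 < c < 1.0 for c in corrs.values()):
        return False
    return is_positive_definite(sdcorr_to_cov(sds, corrs))


# Each correlation is in (-1, 1), but together they are inconsistent.
print(is_valid_sdcorr([1.0, 1.0, 1.0], {(1, 0): 0.9, (2, 0): 0.9, (2, 1): -0.9}))  # False
```

The example shows why checking each correlation separately is not enough: shocks 1 and 2 are both strongly positively correlated with shock 0, so they cannot be strongly negatively correlated with each other.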

Optionally, we can add a column ``"group"`` that is identical to the ``category`` level of the index. The ``estimagic`` dashboard will then contain one parameter convergence plot per group instead of plotting all parameters in the same figure. Since ``respy`` models have many parameters, this makes the plots much more readable.

```python
params["group"] = params.index.get_level_values("category")
```

```python
results, params = maximize(
    crit_func,
    params,
    "scipy_L-BFGS-B",
    db_options={"rollover": 200},
    algo_options={"maxfun": 1},
    constraints=constr,
    dashboard=True,
)
```



You do not have to worry about warnings raised during the estimation. The optimizer might try parameter values that are so extreme that they lead to infinite values or NaNs.

The estimation runs for approximately 90 evaluations and then stops because of ``"maxfun"``.
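In SciPy's L-BFGS-B, ``maxfun`` is the maximum number of function evaluations. The effect of such a budget can be illustrated with a deliberately simple stand-in: a greedy coordinate search (not L-BFGS-B, and not how ``estimagic`` counts evaluations) that stops as soon as its hypothetical ``maxfun`` budget is used up, together with a wrapper that counts how often the criterion is called. All names here are made up for illustration.

```python
import functools


def count_evaluations(func):
    """Wrap a criterion so that every call increments `wrapper.n_evaluations`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.n_evaluations += 1
        return func(*args, **kwargs)

    wrapper.n_evaluations = 0
    return wrapper


def maximize_with_budget(criterion, start, step=0.5, maxfun=50):
    """Greedy coordinate search; returns (x, best value, evaluations used)."""
    x = list(start)
    best = criterion(x)
    n_evals = 1
    while n_evals < maxfun and step > 1e-6:
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                if n_evals >= maxfun:
                    return x, best, n_evals  # budget exhausted: stop early
                candidate = list(x)
                candidate[i] += direction * step
                value = criterion(candidate)
                n_evals += 1
                if value > best:
                    x, best, improved = candidate, value, True
        if not improved:
            step /= 2  # refine the search once no move improves the criterion
    return x, best, n_evals


@count_evaluations
def toy_criterion(x):
    # Concave toy objective with its maximum at (1, -2).
    return -(x[0] - 1.0) ** 2 - (x[1] + 2.0) ** 2


x_best, value_best, used = maximize_with_budget(toy_criterion, [0.0, 0.0], maxfun=10)
print(x_best, value_best, used, toy_criterion.n_evaluations)
```

With a small budget the search stops long before it reaches the maximum, which is the intended behavior when ``maxfun`` is set low for a quick demonstration.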

Inspect the results by looking at the returned parameter vector ``params`` or at the detailed information in the ``results`` dictionary.

## References

> Keane, M. P. and Wolpin, K. I. (1994). [The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence](https://doi.org/10.2307/2109768). *The Review of Economics and Statistics*, 76(4): 648-672.