# Tutorial

We now illustrate the basic capabilities of the ``respy`` package in a simple tutorial. The tutorial covers the following topics:

1. [How to specify a model - parameters and options.](#How-to-specify-a-model---parameters-and-options.)
2. [How to solve a model.](#How-to-solve-a-model.)
3. [How to simulate a data set.](#How-to-simulate-a-data-set.)
4. [How to estimate model parameters.](#How-to-estimate-model-parameters.)

After the [installation](installation.rst), the convention is to import ``respy`` as follows to get access to all exposed functions.

In [1]:
import respy as rp

## How to specify a model - parameters and options.

To define a model, ``respy`` relies on two objects: parameters and options. The difference between the two is that parameters are changed by the optimization whereas options stay fixed. Only some parameters and options are explained here. For a complete overview of parameters and options, visit the section [model specification](../software/model-specification.rst).

As an example, we turn to the first model of Keane and Wolpin (1994). The following function returns the parameters, the options, and the original data set used by the authors. Passing ``with_data=False`` prevents the function from returning a data set.

In [2]:
params, options = rp.get_example_model("kw_94_one", with_data=False)

As said before, options are everything not affected by the optimization. The value under ``options["n_periods"]`` defines the number of periods in the model, that is, the time frame in which individuals can make their decisions.

Choices which are available to individuals within the model are specified under ``options["choices"]``. An empty dictionary as a value signals that this choice relies on default values generated inside of ``respy`` or does not need information at all. Under ``"edu"`` we can see which options are available for each choice.

- ``"max"`` defines the maximum amount of experience which can be accumulated by individuals using this choice.
- ``"start"`` is a list and contains different initial experience levels of this choice. Here, individuals start with 10 years of experience.
- ``"lagged"`` defines the share of individuals with this choice and initial experience level who have ``"edu"`` as the initial lagged choice when entering the model.
- ``"share"`` influences the probability of being a particular type.

The rest of the options include seeds, the numbers of draws used to simulate the flow utilities in each period, and other model settings. Furthermore, a lot of logic is hidden from the user to offer a comprehensible starting point. For a deeper understanding of the mechanics, the user is referred to the section [model specification](../software/model-specification.rst).

In [3]:
options

{'choices': {'edu': {'max': 20, 'start': [10], 'lagged': [1], 'share': [1]}},
 'estimation_draws': 200,
 'estimation_seed': 500,
 'estimation_tau': 500,
 'interpolation_points': -1,
 'n_periods': 40,
 'simulation_agents': 1000,
 'simulation_seed': 132,
 'solution_draws': 500,
 'solution_seed': 456}
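Because the options are an ordinary nested dictionary, they can be inspected and changed with standard dictionary operations. The following is a minimal sketch that uses a hand-copied literal of the output above instead of calling ``respy``; the name ``small_options`` and the reduced values are assumptions chosen for illustration only.

```python
import copy

# Hand-copied from the output above so the sketch runs without respy installed.
options = {
    "choices": {"edu": {"max": 20, "start": [10], "lagged": [1], "share": [1]}},
    "estimation_draws": 200,
    "estimation_seed": 500,
    "estimation_tau": 500,
    "interpolation_points": -1,
    "n_periods": 40,
    "simulation_agents": 1000,
    "simulation_seed": 132,
    "solution_draws": 500,
    "solution_seed": 456,
}

# Deep-copy first: the dictionary is nested, so a shallow copy would still
# share the inner "choices" dictionary with the original.
small_options = copy.deepcopy(options)
small_options["n_periods"] = 5            # hypothetical shorter horizon
small_options["simulation_agents"] = 100  # hypothetical smaller sample

# Nested entries are reached key by key.
edu = small_options["choices"]["edu"]
print(edu["start"], edu["max"])
```

Working on a deep copy keeps the original specification untouched, which is convenient when comparing several variants of the same model.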

The parameter specification includes all parameters of the model which are affected by the optimization routine.

In [4]:
params.head()

| category | name | value | lower | upper | comment |
|---|---|---|---|---|---|
| delta | delta | 0.95 | 0.7 | 1.0 | discount factor |
| wage_a | constant | 9.21 | NaN | NaN | log of rental price if the base skill endowmen... |
| wage_a | exp_edu | 0.038 | NaN | NaN | linear return to an additional year of schooli... |
| wage_a | exp_a | 0.033 | NaN | NaN | return to experience, same sector, linear (wage) |
| wage_a | exp_a_square | -0.05 | NaN | NaN | return to experience, same sector, quadratic (... |


## How to solve a model.

The solution of the model is the value of each state, assuming optimal decisions in the current and all future periods. In ``respy``, the solution is represented by [StateSpace()](../_generated/respy.state_space.StateSpace.rst#respy.state_space.StateSpace), a class that holds the model solution. Solving the model combines the following steps:

- Building the state space.
- Solving the model by finding the optimal decision in each state via backward induction.

The ``state_space`` is an internal construct that is normally hidden from the user. To give a better understanding of how the model works and to help with debugging, [solve()](../_generated/respy.solve.solve.rst#respy.solve.solve) returns the state space to the user.

In [None]:
state_space = rp.solve(params, options)
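To make the idea of backward induction concrete, here is a toy sketch in plain Python. It is not the Keane and Wolpin model and not how ``respy`` implements the solution: the payoffs in ``flow_utility``, the three-period horizon, and the deterministic setting are all assumptions made for illustration. Only the discount factor of 0.95 is taken from the parameters above.

```python
# Toy finite-horizon problem: in each period an agent either works or goes to
# school. The state is the number of completed years of schooling.
N_PERIODS = 3
MAX_EDU = 2
DELTA = 0.95  # discount factor


def flow_utility(edu, choice):
    """Hypothetical per-period payoffs, chosen only for illustration."""
    if choice == "work":
        return 1.0 + 1.0 * edu  # schooling raises the wage
    return 0.0                  # schooling pays nothing today


def solve_by_backward_induction():
    # values[t][edu] is the value of state `edu` in period `t`. The extra
    # row at index N_PERIODS is the terminal continuation value of zero.
    values = [[0.0] * (MAX_EDU + 1) for _ in range(N_PERIODS + 1)]
    policy = [[None] * (MAX_EDU + 1) for _ in range(N_PERIODS)]

    # Start in the last period and move backwards in time.
    for t in reversed(range(N_PERIODS)):
        for edu in range(MAX_EDU + 1):
            candidates = {
                "work": flow_utility(edu, "work") + DELTA * values[t + 1][edu]
            }
            if edu < MAX_EDU:  # schooling is capped, like "max" in the options
                candidates["school"] = (
                    flow_utility(edu, "school") + DELTA * values[t + 1][edu + 1]
                )
            best = max(candidates, key=candidates.get)
            values[t][edu] = candidates[best]
            policy[t][edu] = best
    return values, policy


values, policy = solve_by_backward_induction()
print(policy[0][0], policy[N_PERIODS - 1][0])
```

In this toy setting an agent without schooling invests in education in the first period, when there is still time to reap the returns, and works in the last period.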

## How to simulate a data set.

The first step for simulating data is to build the simulation function. Then, pass the parameter vector to the function and retrieve a simulated data set.

In [None]:
simulate = rp.get_simulate_func(params, options)

In [None]:
df = simulate(params)

The step of building a simulate function may seem odd at first, but it has two advantages. First, this function can easily be used for estimation with the simulated method of moments: an optimizer passes a new parameter vector to the function and receives data for matching moments. Second, while building the function, auxiliary components are created and attached to ``simulate()`` via [functools.partial()](https://docs.python.org/3.7/library/functools.html#functools.partial). Thus, repeated calls are much faster. However, to simulate a different model, that is, one where more than the parameter values has changed, you need to build a new ``simulate()`` function.
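The pattern of precomputing components once and binding them with ``functools.partial()`` can be sketched as follows. All function names and the toy "simulation" are hypothetical stand-ins and do not reflect the internals of ``respy``.

```python
import functools


def _expensive_setup(options):
    """Stand-in for work that depends only on the model structure."""
    return {"grid": list(range(options["n_periods"]))}


def _simulate(params, components):
    """Hypothetical simulator: cheap once `components` is available."""
    return [params["delta"] ** t for t in components["grid"]]


def build_simulate_func(options):
    """Run the setup once and bind its result to the simulator."""
    components = _expensive_setup(options)
    return functools.partial(_simulate, components=components)


simulate = build_simulate_func({"n_periods": 3})
path_a = simulate({"delta": 0.95})  # reuses the precomputed components
path_b = simulate({"delta": 0.90})  # only the parameter values changed
print(path_a, path_b)
```

Changing ``n_periods`` alters the precomputed components, so in this sketch a new function has to be built with ``build_simulate_func()``, mirroring the restriction described above.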

## How to estimate model parameters.

To estimate model parameters via maximum likelihood, ``respy`` relies on [estimagic](https://github.com/OpenSourceEconomics/estimagic), an open-source tool to estimate structural models and more. First, we load the model again, because now we need the data.

In [None]:
params, options, df = rp.get_example_model("kw_94_one")

We make some small adjustments to the options. ``options["estimation_tau"]`` is a smoothing parameter for the case that choice probabilities become zero. ``options["estimation_draws"]`` defines the number of draws for the Monte Carlo simulation of choice probabilities.

In [None]:
options['estimation_tau'] = 50
options['estimation_draws'] = 2000
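To see why smoothing helps, consider the following sketch. It compares a crude frequency simulator, which counts how often each alternative has the highest simulated value, with a logit-smoothed version that replaces the hard maximum by a softmax with temperature ``tau``. The logit kernel is a common smoothing choice; that it matches the internals of ``respy`` exactly is an assumption, and the function name, the standard normal shocks, and all numbers are made up for illustration.

```python
import math
import random


def simulate_choice_probs(mean_values, n_draws, tau=None, seed=0):
    """Monte Carlo choice probabilities for alternatives with normal shocks.

    With tau=None, count how often each alternative has the highest value.
    With a positive tau, average a softmax over value / tau instead, so that
    every probability stays strictly positive.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(mean_values)
    for _ in range(n_draws):
        values = [m + rng.gauss(0.0, 1.0) for m in mean_values]
        if tau is None:
            totals[values.index(max(values))] += 1.0
        else:
            top = max(values)  # subtract the maximum for numerical stability
            weights = [math.exp((v - top) / tau) for v in values]
            norm = sum(weights)
            for i, weight in enumerate(weights):
                totals[i] += weight / norm
    return [total / n_draws for total in totals]


means = [0.0, 10.0]  # the first alternative is almost never the best one
crude = simulate_choice_probs(means, n_draws=200)
smooth = simulate_choice_probs(means, n_draws=200, tau=0.5)
print(crude, smooth)
```

A probability of exactly zero would make the log likelihood undefined, whereas the smoothed probabilities are small but positive. A smaller ``tau`` stays closer to the crude simulator, and a larger number of draws reduces the simulation noise.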

Second, we need the criterion function of the model which returns the likelihood value of the data given the model.

In [None]:
crit_func = rp.get_crit_func(params, options, df)

The criterion function takes the parameters as an input and returns the likelihood value.

In [None]:
crit_val = crit_func(params)
crit_val

To estimate the model parameters, import ``maximize()`` from ``estimagic``.

In [None]:
import numpy as np
from estimagic.optimization.optimize import maximize

``estimagic`` expects constraints on the parameters as a list of dictionaries, where each dictionary is one constraint. A constraint has two main components. First, we need to tell ``estimagic`` which parameters we want to constrain. This is achieved by specifying an index location which will be passed to `df.loc`. Second, we define the type of the constraint. Here, we require the shock parameters to be valid standard deviations and correlations, and we fix the discount factor ``delta`` at 0.95.

In [None]:
constr = [{"loc": "shocks", "type": "sdcorr"}, {"loc": "delta", "type": "fixed", "value": 0.95}]
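What "valid standard deviations and correlations" means can be illustrated with a small check in plain Python: the parameters are valid if the covariance matrix built from them is positive definite. This is a sketch of the mathematical condition only, not of how ``estimagic`` enforces it. The helper names, the row-by-row ordering of the correlations, and the example numbers are assumptions.

```python
import math


def sdcorr_to_cov(sds, corrs):
    """Build a covariance matrix from standard deviations and the
    lower-triangular correlations, listed row by row (assumed ordering)."""
    n = len(sds)
    corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    k = 0
    for i in range(n):
        for j in range(i):
            corr[i][j] = corr[j][i] = corrs[k]
            k += 1
    return [[corr[i][j] * sds[i] * sds[j] for j in range(n)] for i in range(n)]


def is_positive_definite(matrix):
    """Attempt a Cholesky factorization; it succeeds only if the symmetric
    input matrix is positive definite."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial_sum = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial_sum
                if pivot <= 0.0:
                    return False
                chol[i][j] = math.sqrt(pivot)
            else:
                chol[i][j] = (matrix[i][j] - partial_sum) / chol[j][j]
    return True


ok = is_positive_definite(sdcorr_to_cov([0.2, 0.25], [0.5]))
bad = is_positive_definite(sdcorr_to_cov([0.2, 0.25], [1.2]))  # |corr| > 1
print(ok, bad)
```

An unconstrained optimizer could easily propose values like those in the second call, which is why the constraint is needed during estimation.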

Optionally, we can add a column ``params["group"]`` which is identical to the ``category`` level of the index. The estimagic dashboard will then contain one parameter convergence plot per group instead of plotting all parameters in the same figure. Since ``respy`` has quite a few parameters, this makes the plots much more readable. In the same cell, we replace missing bounds with infinite values so that every parameter has explicit lower and upper bounds.

In [None]:
params["group"] = params.index.get_level_values('category')
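# Assign the result back instead of using inplace=True on a selected column,
# which is unreliable in recent pandas versions.
params["lower"] = params["lower"].fillna(-np.inf)
params["upper"] = params["upper"].fillna(np.inf)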
params

In [None]:
results, params = maximize(
    crit_func,
    params,
    "scipy_L-BFGS-B",
    db_options={"rollover": 200},
    algo_options={"maxfun": 1},
    constraints=constr,
    dashboard=True,
)

Look at the results by choosing the parameter vector or detailed information in the ``results`` dictionary.

In [None]:
results.keys()

In [None]:
results["fun"]

In [None]:
params

## References

> Keane, M. P. and Wolpin, K. I. (1994). [The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence](https://doi.org/10.2307/2109768). *Federal Reserve Bank of Minneapolis*, No. 181.