# How to set up your own EKW-model


A model in respy is defined by two objects:

1. `params` is a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) that holds the structural parameters of the model.  

    *Note:* Parameters in the `params` DataFrame do not have to be estimable. For example, the parameters of a shock distribution shape the model but may be set exogenously.
    

2. The object `options` specifies settings for the model solution and further restrictions on the model structure, for example the number of periods, the type of numerical integration, or infeasible states.


---
## `params`

The object `params` is a [multi-indexed pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html).
The first index level, `category`, identifies the parameter group, and the second level, `name`, identifies a specific parameter within the group. As a concrete example, we load the specification of the basic Robinson Crusoe economy.


In [2]:
import respy as rp

In [10]:
params, options = rp.get_example_model("robinson_crusoe_basic", with_data=False)

In [11]:
params

| category | name | value |
|---|---|---|
| delta | delta | 0.95 |
| wage_fishing | exp_fishing | 0.1 |
| nonpec_fishing | constant | -1.0 |
| nonpec_hammock | constant | 2.5 |
| nonpec_hammock | not_fishing_last_period | -1.0 |
| shocks_sdcorr | sd_fishing | 1.0 |
| shocks_sdcorr | sd_hammock | 1.0 |
| shocks_sdcorr | corr_hammock_fishing | -0.2 |
| lagged_choice_1_hammock | constant | 1.0 |
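To make the two index levels concrete, the following minimal sketch builds a small frame with the same `category`/`name` layout by hand using plain pandas. The rows and values are made up for illustration and are not taken from any respy example model.

```python
import pandas as pd

# Hypothetical rows in the (category, name, value) layout described above.
rows = [
    ("delta", "delta", 0.95),
    ("nonpec_b", "constant", 1.5),
    ("wage_a", "constant", 9.21),
    ("wage_a", "exp_a", 0.033),
]

toy_params = pd.DataFrame(rows, columns=["category", "name", "value"]).set_index(
    ["category", "name"]
)

# Selecting by category returns the whole parameter group, indexed by name.
wage_a = toy_params.loc["wage_a"]["value"]

# Selecting by (category, name) returns a single parameter value.
delta = toy_params.loc[("delta", "delta"), "value"]
```

Building the frame from a list of tuples keeps each parameter on one line, which makes it easy to see which group it belongs to.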


### The discount factor

The first parameter in every specification is the discount factor, abbreviated as `delta`. It controls how utilities are aggregated over time periods.

In [4]:
params.loc[("delta", slice(None)),]

| category | name | value |
|---|---|---|
| delta | delta | 0.95 |
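As a rough illustration of what aggregating utilities over time means, the sketch below applies standard exponential discounting to a short stream of per-period utilities. The function name and the utility values are hypothetical; respy's own solution works with value functions and is not reproduced here.

```python
def discounted_sum(utilities, delta):
    """Aggregate per-period utilities u_0, u_1, ... as sum_t delta**t * u_t."""
    return sum(delta**t * u for t, u in enumerate(utilities))


# Three periods with a utility of 1.0 each and the discount factor from above.
total = discounted_sum([1.0, 1.0, 1.0], delta=0.95)
```

With `delta` below one, later periods receive less weight than earlier ones.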



### The choice rewards

The choice rewards are central to defining choices. Each choice can have two categories of reward parameters, named `wage_{choice}` and `nonpec_{choice}` in the `category` index level. `wage` and `nonpec` refer to the two components of the utility function in Eckstein-Keane-Wolpin models. In a nutshell, the utility of a working alternative is the sum of a wage and a nonpecuniary component. The nonpecuniary reward $N$ is the dot product of parameters $\beta^N$ and covariates $x^N$. The wage $W$ is the exponential of the dot product of parameters $\beta^W$ and covariates $x^W$ plus a normally distributed shock $\epsilon$.

$$
    U = W + N = \exp\{x^W\beta^W + \epsilon\} + x^N\beta^N
$$

For non-working alternatives, $W = 0$ and the normally distributed shock is added to the nonpecuniary reward.

$$
    U = N = x^N\beta^N + \epsilon
$$
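The two formulas can be written out in a few lines of plain Python. This is only a sketch of the equations above: the helper names, parameter names, and numbers are illustrative assumptions, and the shock is passed in as a fixed value instead of being drawn.

```python
import math


def dot(params, covariates):
    """Dot product of parameters and covariates matched by name."""
    return sum(params[name] * covariates[name] for name in params)


def utility_working(beta_w, x_w, beta_n, x_n, eps):
    """U = exp(x^W beta^W + eps) + x^N beta^N."""
    wage = math.exp(dot(beta_w, x_w) + eps)
    return wage + dot(beta_n, x_n)


def utility_non_working(beta_n, x_n, eps):
    """U = x^N beta^N + eps, since W = 0."""
    return dot(beta_n, x_n) + eps


# Working alternative: the shock enters inside the exponential.
u_work = utility_working({"exp": 0.1}, {"exp": 2}, {"constant": -1.0}, {"constant": 1}, eps=0.0)

# Non-working alternative: the shock is added to the nonpecuniary reward.
u_rest = utility_non_working(
    {"constant": 2.5, "penalty": -1.0}, {"constant": 1, "penalty": 1}, eps=0.3
)
```

Note that the same shock shifts a wage multiplicatively but shifts a non-working utility additively.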

Going back to the reward groups, `wage_{choice}` and `nonpec_{choice}` contain the parameters $\beta$ for the dot product of the respective utility component.

Let us take a look at the reward parameters for `fishing` and focus on `wage_fishing` first. The group contains a single parameter whose value is 0.1. How is the parameter $\beta$ matched with the correct covariate $x$? Through the `name` index level of the parameter. Here, `exp_fishing` is the name of a column in an internal DataFrame which contains each individual's experience in fishing. After the dot product is computed, an internally drawn shock is added and the sum is exponentiated to obtain the wage component of the choice `fishing`.

In [5]:
params.loc[("wage_fishing", slice(None)),]

| category | name | value |
|---|---|---|
| wage_fishing | exp_fishing | 0.1 |
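The matching of parameter names to covariate columns can be sketched with pandas and NumPy. The `states` table and the shock values below are hypothetical stand-ins for what respy keeps internally; only the parameter name and value come from the output above.

```python
import numpy as np
import pandas as pd

# Parameter group from above: one coefficient named after its covariate.
wage_fishing = pd.Series({"exp_fishing": 0.1})

# Hypothetical state data; the column name must match the parameter name.
states = pd.DataFrame({"period": [0, 1, 5], "exp_fishing": [0, 1, 5]})

# Hypothetical shocks, fixed here instead of drawn.
shocks = np.array([0.0, 0.2, -0.1])

# Select the covariate columns by parameter name and take the dot product.
index = states[wage_fishing.index].to_numpy() @ wage_fishing.to_numpy()

# Add the shock and exponentiate to get the wage component.
wages = np.exp(index + shocks)
```

Because the columns are selected by `wage_fishing.index`, adding another parameter to the group only requires that a column of the same name exists.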


`exp_{choice}` is a special variable in the sense that its name is predefined within respy. If experience can be accumulated by choosing a certain action, you can always refer to it with `exp_{choice}`. At the same time, whether a choice allows experience accumulation or not is inferred from whether a covariate with the name `exp_{choice}` is used. Also, if a choice has a wage, it automatically allows for experience accumulation.
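The naming rules in this paragraph can be illustrated with a small helper that reads the `(category, name)` pairs of a parameter index. `infer_choices` is a hypothetical function written for this guide, not respy's implementation, and it covers only the rules stated here.

```python
def infer_choices(index):
    """Infer choices, working choices, and choices with experience from names."""
    choices, working, exp_candidates = set(), set(), set()
    for category, name in index:
        if category.startswith("wage_"):
            choice = category[len("wage_"):]
            choices.add(choice)
            working.add(choice)
        elif category.startswith("nonpec_"):
            choices.add(category[len("nonpec_"):])
        if name.startswith("exp_"):
            exp_candidates.add(name[len("exp_"):])
    # Keep only exp_{choice} names that refer to an actual choice, and add all
    # working choices because a wage implies experience accumulation.
    with_experience = (exp_candidates & choices) | working
    return choices, working, with_experience


index = [
    ("delta", "delta"),
    ("wage_fishing", "exp_fishing"),
    ("nonpec_fishing", "constant"),
    ("nonpec_hammock", "constant"),
    ("nonpec_hammock", "not_fishing_last_period"),
]
choices, working, with_experience = infer_choices(index)
```

For the Robinson Crusoe index, only `fishing` is a working alternative and only `fishing` accumulates experience.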

Let's move on to the nonpecuniary reward of `fishing`. It also has a single parameter, which is named `constant` and has the value -1. As the name suggests, the covariate `constant` has the value one. Is `constant` also an automatically generated internal variable? No, it is not. The user has to define this covariate, as explained later in [formulas of covariates](#The-formulas-of-covariates).

In [6]:
params.loc[("nonpec_fishing", slice(None)),]

| category | name | value |
|---|---|---|
| nonpec_fishing | constant | -1.0 |


Let us go over the choice rewards for the second choice, `hammock`. Since the choice has no `wage_hammock` entry, it is a non-working alternative with only a nonpecuniary utility component. Since no parameter uses `exp_hammock`, the choice does not allow for experience accumulation. There are two parameters. The first also uses the covariate `constant` and has the value 2.5. The second, `not_fishing_last_period`, is explained in detail later. For now, it is sufficient to know that Robinson receives a penalty of -1 if he was not fishing in the previous period.

In [7]:
params.loc[("nonpec_hammock", slice(None)),]

| category | name | value |
|---|---|---|
| nonpec_hammock | constant | 2.5 |
| nonpec_hammock | not_fishing_last_period | -1.0 |


### The shock distribution

So far, we have only said that $\epsilon$ in the utility function is drawn internally. The shocks to the utility functions follow a joint normal distribution $\mathcal{N}(0, \Sigma)$ and are independent over time. There are three ways to specify $\Sigma$.

For all options, imagine a matrix with as many rows and columns as there are choices. The choices have the following order.

1. All working alternatives alphabetically sorted.
2. All non-working alternatives with experience accumulation alphabetically sorted.
3. All remaining alternatives alphabetically sorted.
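The ordering rule can be sketched as a small sorting helper. `order_choices` and the choice names in the second call are illustrative assumptions, not part of respy's API.

```python
def order_choices(choices, working, with_experience):
    """Order choices: working, then non-working with experience, then the rest."""
    first = sorted(c for c in choices if c in working)
    second = sorted(c for c in choices if c in with_experience and c not in working)
    third = sorted(c for c in choices if c not in working and c not in with_experience)
    return first + second + third


# Robinson Crusoe: fishing pays a wage, hammock does neither.
crusoe = order_choices({"hammock", "fishing"}, {"fishing"}, {"fishing"})

# A hypothetical four-choice model with two occupations, schooling, and home.
larger = order_choices({"home", "edu", "b", "a"}, {"a", "b"}, {"a", "b", "edu"})
```

The position of a choice in this list is the row and column it occupies in the shock matrix.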

Because covariance matrices are symmetric, it is sufficient to specify the lower triangle of the matrix.

The first option is to specify the lower triangle of the standard deviation/correlation matrix of $\Sigma$ under the index category `shocks_sdcorr`. The first parameters in this category are the standard deviations, named `sd_{choice}`. They are followed by the correlations, ordered by rows and named `corr_{choice_2}_{choice_1}` and so on.

In [8]:
params.loc[("shocks_sdcorr", slice(None)),]

| category | name | value |
|---|---|---|
| shocks_sdcorr | sd_fishing | 1.0 |
| shocks_sdcorr | sd_hammock | 1.0 |
| shocks_sdcorr | corr_hammock_fishing | -0.2 |
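To see how these three numbers determine $\Sigma$, the following NumPy sketch rebuilds the covariance matrix from standard deviations and row-ordered correlations. `sdcorr_to_cov` is a hypothetical helper for illustration and not a respy function.

```python
import numpy as np


def sdcorr_to_cov(sds, corrs):
    """Build a covariance matrix from sds and row-ordered lower-triangle corrs."""
    sds = np.asarray(sds, dtype=float)
    n = len(sds)
    corr = np.eye(n)
    # np.tril_indices walks the strict lower triangle row by row.
    corr[np.tril_indices(n, k=-1)] = corrs
    # Mirror the lower triangle to obtain the full symmetric matrix.
    corr = corr + np.tril(corr, k=-1).T
    return np.outer(sds, sds) * corr


# sd_fishing, sd_hammock, and corr_hammock_fishing from the table above.
cov = sdcorr_to_cov([1.0, 1.0], [-0.2])
```

With unit standard deviations, the covariance between the two shocks equals their correlation of -0.2.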


The second option is to specify the variance-covariance matrix. The parameters are ordered by appearance in the lower triangle. Variances are named `var_{choice}` and covariances `cov_{choice_2}_{choice_1}` and so forth.

The third option is the Cholesky factor of the variance-covariance matrix, again ordered by appearance in the lower triangle. The labels are either `chol_{choice}` or `chol_{choice_2}_{choice_1}` and so forth.
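For comparison, the sketch below expresses the same $\Sigma$ as in the `shocks_sdcorr` example in the other two parameterizations. It assumes that "by appearance in the lower triangle" means reading the triangle row by row. `label_lower_triangle` is a hypothetical helper, and the index categories for these two options are not shown because they are not given in this guide.

```python
import numpy as np


def label_lower_triangle(matrix, choices, diag, offdiag):
    """Name the lower-triangle entries row by row, e.g. var_x and cov_y_x."""
    labels = {}
    for i, row_choice in enumerate(choices):
        for j, col_choice in enumerate(choices[: i + 1]):
            if i == j:
                name = f"{diag}_{row_choice}"
            else:
                name = f"{offdiag}_{row_choice}_{col_choice}"
            labels[name] = float(matrix[i, j])
    return labels


choices = ["fishing", "hammock"]
cov = np.array([[1.0, -0.2], [-0.2, 1.0]])

# Lower-triangular factor with chol @ chol.T == cov.
chol = np.linalg.cholesky(cov)

cov_params = label_lower_triangle(cov, choices, "var", "cov")
chol_params = label_lower_triangle(chol, choices, "chol", "chol")
```

All three parameterizations describe the same matrix, so the choice between them is a matter of convenience.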

### Previous choices

The how-to guide on [initial conditions](how_to_specify_the_initial_conditions.ipynb) explains this feature in more detail.

In [10]:
params.loc[("lagged_choice_1_hammock", slice(None)),]

| category | name | value |
|---|---|---|
| lagged_choice_1_hammock | constant | 1.0 |


## `options`

### The formulas of covariates

In the subsection on the [parameterization of the choice rewards](#The-choice-rewards), two covariates were discussed, `constant` and `not_fishing_last_period`, which are not defined internally. Instead, the user has to provide the information needed to compute them. In respy, this is done with [pandas.eval](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html), which takes a formula and a DataFrame and computes the result.

For the variable named `constant`, `pandas.eval` returns 1 for every individual.

The covariate `not_fishing_last_period` is a boolean variable which is true if Robinson was not fishing in the previous period. The formula takes the internal name `lagged_choice_1` and compares it to a choice name. `lagged_choice_1` was introduced in the subsection on [previous choices](#Previous-choices).

In [11]:
options["covariates"]

{'constant': '1', 'not_fishing_last_period': "lagged_choice_1 != 'fishing'"}

The order in which you specify the covariates is not important.
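The evaluation of these formulas can be reproduced with plain pandas. In the sketch below, the `states` table is a hypothetical stand-in for respy's internal data, and `DataFrame.eval` is used as the DataFrame-bound front end of `pandas.eval`. The last line combines the covariates with the `nonpec_hammock` parameters shown earlier.

```python
import pandas as pd

covariates = {
    "constant": "1",
    "not_fishing_last_period": "lagged_choice_1 != 'fishing'",
}

# Hypothetical state data with the internally named lagged choice.
states = pd.DataFrame({"lagged_choice_1": ["fishing", "hammock", "hammock"]})

# Evaluate each formula against the columns of the state table.
for name, formula in covariates.items():
    states[name] = states.eval(formula, engine="python")

# Nonpecuniary reward of hammock: 2.5 * constant - 1.0 * not_fishing_last_period.
nonpec_hammock = 2.5 * states["constant"] - 1.0 * states["not_fishing_last_period"]
```

The formula for `constant` evaluates to a scalar, which pandas broadcasts to every row on assignment.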