<div class="alert alert-block alert-info">
 <h1> How to set up your own Eckstein-Keane-Wolpin model in respy </h1></div>


In this notebook we explore the interface of respy in greater detail. Throughout, we use the basic specification of the model established by [Keane and Wolpin (1994)](https://www.jstor.org/stable/2109768?seq=1#metadata_info_tab_contents), henceforth KW(94), in their seminal paper. This specification ships with respy as an example model.

In general, a model in respy is defined by two objects:

1. Parameters of the model reside in `params`, which is a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). It contains the structural parameters of the model.  

    *Note:* Parameters in the `params` DataFrame do not have to be estimable. For example, the parameters of a specified shock distribution may be part of the model but set exogenously.
    

2. The object `options` specifies settings for the model solution and further restrictions on the model structure. Examples include the number of periods, the type of numerical integration, and infeasible states. 

As a first step, we will load the KW(94) basic specification from the `respy` example models.   
(Maybe: add location where to specify your own model).


In [11]:
import respy as rp

In [12]:
params, options = rp.get_example_model("kw_94_one", with_data=False)

<div class="alert alert-block alert-info">
 <h2> Params </h2></div>

The object `params` is a [multi-indexed pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html).

- The first index `category` describes the parameter group, e.g. (non-)pecuniary return equations, shock distribution etc.
- The second index `name` specifies the particular components of the model features, e.g. the return to state variables, particular distributional parameters etc.
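
As a minimal sketch of this two-level index, the following cell builds a small DataFrame by hand with plain pandas. It does not use respy; the name `sketch_params` and the selection of rows are illustrative assumptions, with values copied from the KW(94) example.

```python
import pandas as pd

# Hand-written subset of the KW(94) parameters (illustrative, not loaded from respy).
rows = [
    ("delta", "delta", 0.95, "discount factor"),
    ("wage_a", "constant", 9.21, "log of rental price"),
    ("wage_a", "exp_edu", 0.038, "return to an additional year of schooling"),
    ("wage_a", "exp_a", 0.033, "return to same sector experience"),
]
sketch_params = pd.DataFrame(
    rows, columns=["category", "name", "value", "comment"]
).set_index(["category", "name"])

# Select a whole parameter group by the first index level ...
wage_a_block = sketch_params.loc["wage_a"]
# ... or a single parameter by the full (category, name) key.
return_to_schooling = sketch_params.loc[("wage_a", "exp_edu"), "value"]

print(wage_a_block)
print(return_to_schooling)
```

Selecting by `category` returns a DataFrame indexed by `name` only, which is convenient when working with one reward equation at a time.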

In the case of the KW(94) model, individuals can choose between four alternatives:
1. Working in occupation a
2. Working in occupation b
3. Getting education
4. Staying at home. 

The corresponding `params` DataFrame is illustrated in the next cell. We will discuss each type of `category` chronologically in order of appearance.

**Note:** The order of entries in `category` has no further implications.

In [3]:
params

| category | name | value | comment |
|:---------|:-----|------:|:--------|
| delta | delta | 0.95 | discount factor |
| wage_a | constant | 9.21 | log of rental price |
| wage_a | exp_edu | 0.038 | return to an additional year of schooling |
| wage_a | exp_a | 0.033 | return to same sector experience |
| wage_a | exp_a_square | -0.0005 | return to same sector, quadratic experience |
| wage_a | exp_b | 0.0 | return to other sector experience |
| wage_a | exp_b_square | 0.0 | return to other sector, quadratic experience |
| wage_b | constant | 8.48 | log of rental price |
| wage_b | exp_edu | 0.07 | return to an additional year of schooling |
| wage_b | exp_b | 0.067 | return to same sector experience |


### Discount Factor

In [4]:
params.loc[("delta", slice(None)),]

| category | name | value | comment |
|:---------|:-----|------:|:--------|
| delta | delta | 0.95 | discount factor |


In `respy`, the discount factor has a pre-defined and immutable name: `delta`. (! Add link AM Explanation !)

## Choice Rewards

Recalling the explanation of EKW models (! Add link AM Explanation !), structural models consist of two building blocks: states and choices. In general, choices can have two types of rewards: 
- **pecuniary rewards**, i.e. wages, with corresponding `category`: `wage_{choice}`
- **non-pecuniary rewards**, e.g. the intrinsic value of education, with corresponding `category`: `nonpec_{choice}`

**<span style="text-decoration:underline">Example</span>**

In the KW(94) case the **choices with pecuniary rewards** are 
- working in occupation a: {choice} = a, hence `wage_a`
- working in occupation b: {choice} = b, hence `wage_b`

The **choices with non-pecuniary rewards** are
- education: {choice} = edu, hence `nonpec_edu`
- staying at home: {choice} = home, hence `nonpec_home`


### A pecuniary reward

Since the structure within rewards is similar, we will focus on the `category` `wage_a` for further exposition of a pecuniary reward.

In [5]:
params.loc[("wage_a", slice(None)),]

| category | name | value | comment |
|:---------|:-----|------:|:--------|
| wage_a | constant | 9.21 | log of rental price |
| wage_a | exp_edu | 0.038 | return to an additional year of schooling |
| wage_a | exp_a | 0.033 | return to same sector experience |
| wage_a | exp_a_square | -0.0005 | return to same sector, quadratic experience |
| wage_a | exp_b | 0.0 | return to other sector experience |
| wage_a | exp_b_square | 0.0 | return to other sector, quadratic experience |


The pecuniary reward associated with working in occupation a, `wage_a`, is determined by state-specific returns. The index `name` collects all covariates, and `value` captures the associated return. A core feature of `respy` is that it facilitates a one-to-one mapping between the theoretical model equation and its computational implementation.

**<span style="text-decoration:underline">Example</span>**

KW(94) assume that `wage_a` for individual $i$ at time $t$ is determined as the product of the skill price in that occupation, $r_{a}$, and the individual's skill level, captured by an exponential function. Formally, the pecuniary reward is given by

$$ 
    R_{it}(a) = W_{it}(a) = r_{a} \exp \Big( \beta_{a0} + \beta_{a1} h_{it} + \beta_{a2} k_{it}(a) + \beta_{a3} k_{it}(a)^2 + \beta_{a4} k_{it}(b) + \beta_{a5} k_{it}(b)^2 + \epsilon_{it}(a) \Big),
$$

where $h_{it}$ captures schooling measured in periods, and $k_{it}(o)$ captures the cumulated work experience in occupation $o \in \{a, b \}$ at period $t$. We follow KW(94) and set $r_{a} \equiv 1$. The state variables and returns are mapped to the entries in `category` `wage_a` according to the following table:


|    Covariate   	|    `name`    	||    Return     	|  `value`  	|
|:-------------:	|:------------:	||:------------:	|:---------:	|
|      $1$      	|   constant   	|| $\beta_{a0}$ 	|  $9.2100$ 	|
|    $h_{it}$   	|    exp_edu   	|| $\beta_{a1}$ 	|  $0.0380$ 	|
|  $k_{it}(a)$  	|     exp_a    	|| $\beta_{a2}$ 	|  $0.0330$ 	|
| $k_{it}(a)^2$ 	| exp_a_square 	|| $\beta_{a3}$ 	| $-0.0005$ 	|
|  $k_{it}(b)$  	|     exp_b    	|| $\beta_{a4}$ 	|  $0.0000$ 	|
| $k_{it}(b)^2$ 	| exp_b_square 	|| $\beta_{a5}$ 	|  $0.0000$ 	|


Since the skill price is set to one, we can imagine the log wage, net of the shock $\epsilon_{it}(a)$, to be written as

$$
 \log(\text{wage}_a) = 9.2100 \cdot \text{constant} + 0.0380 \cdot \text{exp_edu} + 0.0330 \cdot \text{exp_a} -0.0005 \cdot \text{exp_a_square} + 0.0000 \cdot \text{exp_b} + 0.0000 \cdot \text{exp_b_square}.
$$



**Note:** The choice-specific shock to the skill level in occupation a is denoted $\epsilon_{it}(a)$. We will explain how to include those idiosyncratic, serially uncorrelated shocks when we arrive at the discussion of `category` `shocks_sdcorr`.
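
To make the mapping concrete, here is a small sketch that evaluates the wage equation for occupation a in plain Python. It is an illustration under stated assumptions, not respy's internal code: the function `wage_a` and the example state are hypothetical, the skill price is set to one, and the shock defaults to zero.

```python
import math

# Returns copied from the `wage_a` entries of `params`.
wage_a_returns = {
    "constant": 9.21,
    "exp_edu": 0.038,
    "exp_a": 0.033,
    "exp_a_square": -0.0005,
    "exp_b": 0.0,
    "exp_b_square": 0.0,
}


def wage_a(exp_edu, exp_a, exp_b, shock=0.0):
    """Evaluate the wage in occupation a with the skill price set to one."""
    covariates = {
        "constant": 1,
        "exp_edu": exp_edu,
        "exp_a": exp_a,
        "exp_a_square": exp_a ** 2,
        "exp_b": exp_b,
        "exp_b_square": exp_b ** 2,
    }
    # Linear index: sum over covariate times return, as in the table.
    log_wage = sum(wage_a_returns[name] * covariates[name] for name in wage_a_returns)
    return math.exp(log_wage + shock)


# Hypothetical state: ten periods of schooling, five periods in occupation a.
example_wage = wage_a(exp_edu=10, exp_a=5, exp_b=0)
print(round(example_wage, 2))
```

For this state the linear index is $9.21 + 0.38 + 0.165 - 0.0125 = 9.7425$, and the wage is the exponential of that value.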

<div class="alert alert-block alert-warning">
  
<b>Modeling accumulation of experience - the special prefix "exp_{choice}"</b></div>

The covariate name `exp_{choice}` is pre-defined within respy. In the framework of EKW models, experience can be accumulated by repeatedly making a certain choice. To account for such accumulation effects, e.g. the accumulation of occupation-specific experience, the covariate has to be named `exp_{choice}`. Conversely, in a model implemented with `respy`, a covariate of the form `exp_{choice}` indicates that the accumulation of experience is modeled.


**<span style="text-decoration:underline">Example</span>**

In the KW(94) specification, the covariate `exp_a` allows for the accumulation of experience obtained in occupation a, while the covariate `exp_edu` indicates that experience from choosing education in a period can be accumulated. However, inspecting the `params` DataFrame shows that there is no covariate with the `exp_` prefix for home. Hence, there is no accumulation when an individual decides to stay at home.
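
The bookkeeping behind this example can be sketched in a few lines of plain Python. This is only an illustration: the helper `accumulate_experience` is hypothetical, and the assumption that each period spent in a choice adds exactly one unit of experience is ours, not taken from respy's implementation.

```python
# Choices for which an `exp_{choice}` covariate exists in the KW(94) specification.
CHOICES_WITH_EXPERIENCE = ("a", "b", "edu")


def accumulate_experience(state, choice):
    """Return the experience stocks after one period spent in `choice`."""
    next_state = dict(state)
    if choice in CHOICES_WITH_EXPERIENCE:
        # Assumption: one period in a choice adds one unit of experience.
        next_state[f"exp_{choice}"] += 1
    return next_state


# Hypothetical starting point and sequence of choices.
state = {"exp_a": 0, "exp_b": 0, "exp_edu": 10}
for choice in ["edu", "a", "a", "home"]:
    state = accumulate_experience(state, choice)

print(state)
```

The period spent at home leaves all experience stocks unchanged, which mirrors the absence of an `exp_home` covariate.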


???? Ask Tobi: Also, if a choice has a wage, it automatically allows for experience accumulation ????

### A non-pecuniary reward

Non-pecuniary rewards differ from pecuniary rewards in their functional form: the covariates enter linearly rather than through an exponential function. For expositional purposes, we focus on the non-pecuniary reward for education, `category` `nonpec_edu`. 

In [13]:
params.loc[("nonpec_edu", slice(None)),]

| category | name | value | comment |
|:---------|:-----|------:|:--------|
| nonpec_edu | constant | 0.0 | constant reward for choosing education |
| nonpec_edu | at_least_twelve_exp_edu | 0.0 | reward for going to college (tuition, etc.) |
| nonpec_edu | not_edu_last_period | -4000.0 | reward for going back to school |


The non-pecuniary reward associated with education, `nonpec_edu` is determined by state-specific returns. The index `name` collects all covariates, and `value` captures the associated return. 

**<span style="text-decoration:underline">Example</span>**

In the basic specification of KW(94), the reward for education depends linearly on an indicator for having completed at least twelve periods of education and on an indicator for not having chosen education in the previous period, which penalizes re-enrollment. Formally, the non-pecuniary reward is given by:

$$
    R_{it}(edu) = \beta_{e0} + \beta_{e1} {\bf{I}}(h_{it} \geq 12) + \beta_{e2}(1 - {\bf{I}}(d_{i,t-1} = edu)) + \epsilon_{it}(edu),
$$

where $h_{it}$ denotes the periods of education, and $d_{i,t-1}$ reflects the alternative chosen in period $t-1$. Hence, $(1 - {\bf{I}}(d_{i,t-1} = edu))$ indicates that education was not chosen in period $t-1$. The sign of each return is carried by its entry in `value`, so a penalty appears as a negative `value`. The following table represents the mapping from the economic formulation into the computational implementation:

|                 Covariate                 	|          `name`         	|    Return    	|  `value`  	|
|:-----------------------------------------:	|:-----------------------:	|:------------:	|:---------:	|
|                    $1$                    	|         constant        	| $\beta_{e0}$ 	|   $0.0$   	|
|        ${\bf{I}}(h_{it} \geq 12)$         	| at_least_twelve_exp_edu 	| $\beta_{e1}$ 	|   $0.0$   	|
|   $(1 - {\bf{I}}(d_{i,t-1} = edu))$       	|   not_edu_last_period   	| $\beta_{e2}$ 	| $-4000.0$ 	|


We can imagine the equation to be written as

$$
 \text{nonpec}_{\text{edu}} = 0.0 \cdot \text{constant} + 0.0 \cdot \text{at_least_twelve_exp_edu} - 4000.0 \cdot \text{not_edu_last_period}.
$$
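
As a hedged illustration of this linear structure, the sketch below evaluates the non-pecuniary reward for education in plain Python. The function `nonpec_edu` and the example states are hypothetical; the covariate definitions follow the formulas stored in `options["covariates"]`, and the shock defaults to zero.

```python
# Returns copied from the `nonpec_edu` entries of `params`.
nonpec_edu_returns = {
    "constant": 0.0,
    "at_least_twelve_exp_edu": 0.0,
    "not_edu_last_period": -4000.0,
}


def nonpec_edu(exp_edu, lagged_choice, shock=0.0):
    """Evaluate the non-pecuniary reward of choosing education."""
    covariates = {
        "constant": 1,
        "at_least_twelve_exp_edu": exp_edu >= 12,
        "not_edu_last_period": lagged_choice != "edu",
    }
    # Booleans act as 0/1 indicators in the linear index.
    reward = sum(
        nonpec_edu_returns[name] * covariates[name] for name in nonpec_edu_returns
    )
    return reward + shock


# Returning to school after a period in occupation a triggers the penalty ...
reward_after_work = nonpec_edu(exp_edu=12, lagged_choice="a")
# ... while staying in education does not.
reward_continuing = nonpec_edu(exp_edu=12, lagged_choice="edu")
print(reward_after_work, reward_continuing)
```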




????? Tobi-Question: Why do we specify the value as negative one? Is there a rule behind it ?????

### Specification Shock Distribution

For each choice reward, idiosyncratic and serially uncorrelated shocks alter the respective return. Those alternative-specific shocks are specified jointly in `category` `shocks_sdcorr`.

In [17]:
params.loc[("shocks_sdcorr", slice(None)),]

| category | name | value | comment |
|:---------|:-----|------:|:--------|
| shocks_sdcorr | sd_a | 0.2 | Element 1,1 of standard-deviation/correlation ... |
| shocks_sdcorr | sd_b | 0.25 | Element 2,2 of standard-deviation/correlation ... |
| shocks_sdcorr | sd_edu | 1500.0 | Element 3,3 of standard-deviation/correlation ... |
| shocks_sdcorr | sd_home | 1500.0 | Element 4,4 of standard-deviation/correlation ... |
| shocks_sdcorr | corr_b_a | 0.0 | Element 2,1 of standard-deviation/correlation ... |
| shocks_sdcorr | corr_edu_a | 0.0 | Element 3,1 of standard-deviation/correlation ... |
| shocks_sdcorr | corr_edu_b | 0.0 | Element 3,2 of standard-deviation/correlation ... |
| shocks_sdcorr | corr_home_a | 0.0 | Element 4,1 of standard-deviation/correlation ... |
| shocks_sdcorr | corr_home_b | 0.0 | Element 4,2 of standard-deviation/correlation ... |
| shocks_sdcorr | corr_home_edu | 0.0 | Element 4,3 of standard-deviation/correlation ... |


Shocks are **assumed to follow a multivariate normal distribution** with zero mean and covariance matrix $\Sigma$. The **dimensionality** of the symmetric covariance matrix equals the number of modeled choices. The specification of $\Sigma$ is at the discretion of the user. Because of the symmetry of covariance matrices, it is sufficient to specify the lower triangular matrix. However, it is mandatory to follow the order prescribed by `respy`. 

???? Tobi-question: Is this order mandatory ????

<div class="alert alert-block alert-warning">
  
<b> Specification of the shock distribution - the order matters </b></div>

First, the **diagonal elements (standard deviations)** are specified via `sd_{choice}` according to the order: *???? Need to rewrite this: What is the correct meaning for "working alternatives" ????*

1. All working alternatives alphabetically sorted.
2. All non-working alternatives with experience accumulation alphabetically sorted.
3. All remaining alternatives alphabetically sorted.

Second, the **off-diagonal elements (correlations)** are specified **by rows**. 

???? Tobi-question: Is this order really mandatory. The notation is a little bit confusing. For the elements (x,y), x denotes the column and y the row ???? 

???? Could we shortly speak about this on Zoom ???? 

Instead of standard deviations and correlations, a second option is to specify the variance-covariance matrix. The parameters are ordered by appearance in the lower triangular. Variances have the name `var_{choice}` and covariances `cov_{choice_2}_{choice_1}` and so forth.

The third option is the Cholesky factor of the variance-covariance matrix ordered by appearance in the lower triangular. The labels are either `chol_{choice}` or `chol_{choice_2}_{choice_1}` and so forth.
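
The relation between the three parameterizations can be sketched with NumPy, using the KW(94) values of `shocks_sdcorr` shown above. This is a minimal illustration under the assumption that correlations are listed row by row in the lower triangular; it does not call respy, and all variable names are ours.

```python
import numpy as np

choices = ["a", "b", "edu", "home"]

# Standard deviations (diagonal) and correlations (lower triangular, by rows).
sds = np.array([0.2, 0.25, 1500.0, 1500.0])
corrs = np.zeros(6)  # corr_b_a, corr_edu_a, corr_edu_b, corr_home_a, ...

n = len(choices)
rows, cols = np.tril_indices(n, k=-1)  # row-by-row positions below the diagonal
labels = [f"corr_{choices[r]}_{choices[c]}" for r, c in zip(rows, cols)]

# Fill a symmetric correlation matrix from the flat vector.
corr = np.eye(n)
corr[rows, cols] = corrs
corr[cols, rows] = corrs

# Variance-covariance matrix and its lower-triangular Cholesky factor.
cov = np.outer(sds, sds) * corr
chol = np.linalg.cholesky(cov)

print(labels)
print(cov)
print(chol)
```

Because all correlations are zero in this specification, the covariance matrix is diagonal with the squared standard deviations, and the Cholesky factor is diagonal with the standard deviations themselves.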

**<span style="text-decoration:underline">Example</span>**

Add the KW(94) example.



### Previous choices

The how-to guide on [initial conditions](how_to_specify_the_initial_conditions.ipynb) explains this feature in more detail.

???? Could you add the general intention of this? For more details the link is great. But there are not basis intuitions provided ????

In [18]:
params.loc[("lagged_choice_1_edu", slice(None)),]

| category | name | value | comment |
|:---------|:-----|------:|:--------|
| lagged_choice_1_edu | edu_ten | 1.0 | Probability that the first lagged choice is ed... |


### Utility

???? Tobi-question: Could you shortly how utility is aggregated? In particular, how pecuniary and non-pecuniary rewards enter, and how utility is aggregated across periods ????

<div class="alert alert-block alert-info">
<h2> Options </h2></div>
 
The object `options` is a [dictionary](https://docs.python.org/3/tutorial/datastructures.html) and allows you to **tailor the estimation, simulation, and solution procedure** of the specified model. In the following, we describe the most important `options` and provide references. Some more sophisticated concepts are outlined in this notebook.

In [24]:
list(options.keys())

['estimation_draws',
 'estimation_seed',
 'estimation_tau',
 'interpolation_points',
 'n_periods',
 'simulation_agents',
 'simulation_seed',
 'solution_draws',
 'solution_seed',
 'monte_carlo_sequence',
 'core_state_space_filters',
 'covariates']

Although it may be a little bit arduous, we should shortly explain each of the options, and provide a link to a how-to-tutorial where it is used. 

???? We could shortly meet - I (RS) can populate the table, but we should agree on wording ????

|                 `option` 	|                                                                 Explanation                                                                	|             Domain            	|                                                  Example and Application                                                  	|
|-------------------------:	|:------------------------------------------------------------------------------------------------------------------------------------------:	|:-----------------------------:	|:-------------------------------------------------------------------------------------------------------------------------:	|
|          estimation_draw 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|          estimation_seed 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|           estimation_tau 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|     interpolation_points 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|                n_periods 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|        simulation_agents 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|          simulation_seed 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|           solution_draws 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|            solution_seed 	|                                                                                                                                            	|                               	|                                                                                                                           	|
|     monte_carlo_sequence 	| Sequence used to draw the points for the (quasi-)Monte Carlo <br>integration in the calculation of the value function. 	| ["halton", "sobol", "random"] 	| [Tutorial Numerical Integration Methods](https://respy.readthedocs.io/en/latest/how_to_guides/numerical_integration.html) 	|
| core_state_space_filters 	|                                                                                                                                            	|                               	|                                                                                                                           	|
| covariates               	|                                                                                                                                            	|                               	|                                                                                                                           	|

### The formulas of covariates



In [25]:
options["covariates"]

{'constant': '1',
 'exp_a_square': 'exp_a ** 2',
 'exp_b_square': 'exp_b ** 2',
 'at_least_twelve_exp_edu': 'exp_edu >= 12',
 'not_edu_last_period': "lagged_choice_1 != 'edu'",
 'edu_ten': 'exp_edu == 10'}
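
Each entry maps a covariate name to a formula written in terms of state variables. One way to get a feeling for these formulas is to evaluate them on a small table of states with `pandas.DataFrame.eval`. The sketch below is an illustration only: the `states` table is hypothetical, and whether respy evaluates the formulas in exactly this way is an assumption.

```python
import pandas as pd

# Formulas copied from `options["covariates"]`.
covariates = {
    "constant": "1",
    "exp_a_square": "exp_a ** 2",
    "exp_b_square": "exp_b ** 2",
    "at_least_twelve_exp_edu": "exp_edu >= 12",
    "not_edu_last_period": "lagged_choice_1 != 'edu'",
    "edu_ten": "exp_edu == 10",
}

# Two hypothetical states, one per row.
states = pd.DataFrame(
    {
        "exp_a": [0, 3],
        "exp_b": [0, 1],
        "exp_edu": [10, 12],
        "lagged_choice_1": ["edu", "a"],
    }
)

# Evaluate every formula and store the result under the covariate's name.
for name, formula in covariates.items():
    states[name] = states.eval(formula, engine="python")

print(states)
```

The covariate names created here are exactly the labels that appear in the `name` level of `params`, which is what links a formula in `options` to a return in `params`.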