# Introduction

This notebook is a tutorial on using transition matrices for exogenous processes in respy. The corresponding function in respy is `parse_transition_matrix_for_exogenous_processes`. It translates a transition matrix into entries for the `params` and `options` specifications used in respy.

In [1]:
from respy import parse_transition_matrix_for_exogenous_processes
import pandas as pd

This notebook contains three showcases: an exogenous process with probabilities depending *only* on the current state of the process, a process depending on a combination of general state variables and the current state of the process, and a process depending only on general state variables and not on the current state of the process.

The case of an exogenous process with constant probabilities across states is straightforward and is explained in the general tutorial on exogenous processes. What distinguishes this tutorial is the use of transition matrices. The general tutorial also covers the specification of more complex exogenous processes in terms of logit coefficients instead of probabilities.

To fully understand this tutorial, it is advisable to study the general tutorial on exogenous processes first.

# Transition matrices

As a first step, the three transition matrices are specified. Throughout this tutorial, the exogenous process used for demonstration is a `health_shock`. We assume the process has two states: `healthy` and `sick`.

In [2]:
process_name = "health_shock"
process_states = ["healthy", "sick"]

Now the three transition matrices are defined:

In [3]:
# First, the process depends only on the current state of the exogenous process.
df_only_process = pd.DataFrame(
    columns=process_states,
    index=["sick", "healthy"],
    data=[[0.8, 0.2],
         [0.6, 0.4]],
)
df_only_process

,healthy,sick
sick,0.8,0.2
healthy,0.6,0.4


In [4]:
# Second, the process depends only on a "general" state variable, here old and young.
df_only_state = pd.DataFrame(
    columns=process_states,
    index=["old", "young"],
    data=[[0.8, 0.2],
         [0.6, 0.4]],
)
df_only_state

,healthy,sick
old,0.8,0.2
young,0.6,0.4


In [5]:
# Third, the process depends on the current state of the exogenous process and a "general" state variable.
df_state_and_process = pd.DataFrame(
    columns=process_states,
    index=["sick_and_old", "sick_and_young", "healthy_and_old", "healthy_and_young"],
    data=[[0.8, 0.2],
          [0.6, 0.4],
          [0.3, 0.7],
          [0.4, 0.6]],
)
df_state_and_process

,healthy,sick
sick_and_old,0.8,0.2
sick_and_young,0.6,0.4
healthy_and_old,0.3,0.7
healthy_and_young,0.4,0.6


All transition matrices have the two outcomes of the exogenous process as columns. This is the most important convention for these transition matrix DataFrames. Because exactly one outcome must be realized, each row of the matrix has to sum to 1. The row labels differ from process to process and describe the states on which the process is conditioned.
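The row-sum convention can be checked before parsing. The following is a minimal sketch using plain dictionaries so that it runs without respy or pandas; `check_transition_matrix` is a hypothetical helper written for this illustration and is not part of respy.

```python
import math


def check_transition_matrix(rows, columns, tol=1e-9):
    """Raise ValueError if a row is not a valid probability distribution.

    Hypothetical helper for illustration only, not part of respy.
    `rows` maps each row label to its list of probabilities, one per column.
    """
    for label, probs in rows.items():
        if len(probs) != len(columns):
            raise ValueError(f"Row '{label}' needs {len(columns)} entries.")
        if any(p < 0 or p > 1 for p in probs):
            raise ValueError(f"Row '{label}' has entries outside [0, 1].")
        if not math.isclose(sum(probs), 1.0, abs_tol=tol):
            raise ValueError(f"Row '{label}' does not sum to 1.")


columns = ["healthy", "sick"]
rows = {
    "sick_and_old": [0.8, 0.2],
    "sick_and_young": [0.6, 0.4],
    "healthy_and_old": [0.3, 0.7],
    "healthy_and_young": [0.4, 0.6],
}

# Passes silently because every row is a probability distribution.
check_transition_matrix(rows, columns)
```

For a DataFrame such as `df_state_and_process`, the same row sums are available through `df_state_and_process.sum(axis=1)`.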

# Parsing

Now the differences in parsing for each process are demonstrated.

In [6]:
params_state_and_process, covariates_state_and_process = parse_transition_matrix_for_exogenous_processes(df_state_and_process, process_name)
params_only_state, covariates_only_state = parse_transition_matrix_for_exogenous_processes(df_only_state, process_name)
params_only_process, covariates_only_process = parse_transition_matrix_for_exogenous_processes(df_only_process, process_name)

## params

First, the `params` objects.

In [7]:
params_state_and_process

category,name,value
exogenous_process_health_shock_healthy,sick_and_old,-0.223144
exogenous_process_health_shock_healthy,sick_and_young,-0.510826
exogenous_process_health_shock_healthy,healthy_and_old,-1.20397
exogenous_process_health_shock_healthy,healthy_and_young,-0.916291
exogenous_process_health_shock_sick,sick_and_old,-1.60944
exogenous_process_health_shock_sick,sick_and_young,-0.916291
exogenous_process_health_shock_sick,healthy_and_old,-0.356675
exogenous_process_health_shock_sick,healthy_and_young,-0.510826


In [8]:
params_only_state

category,name,value
exogenous_process_health_shock_healthy,old,-0.223144
exogenous_process_health_shock_healthy,young,-0.510826
exogenous_process_health_shock_sick,old,-1.60944
exogenous_process_health_shock_sick,young,-0.916291


In [9]:
params_only_process

category,name,value
exogenous_process_health_shock_healthy,sick,-0.223144
exogenous_process_health_shock_healthy,healthy,-0.510826
exogenous_process_health_shock_sick,sick,-1.60944
exogenous_process_health_shock_sick,healthy,-0.916291


The index entries in `category` are the same for all three processes. They consist of the keyword `exogenous_process`, the name of the process, and one state of the process, joined by underscores. The index entries in `name` hold the conditioning states, which were the row labels of the transition matrix DataFrame. The value of each entry is simply the log of the corresponding probability.
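The values and labels shown above can be reproduced by hand. The sketch below rebuilds the entries of `params_only_state` with `math.log` and plain dictionaries; it is an illustration of the naming scheme and the log transformation, not respy's implementation. It also assumes that the coefficients enter a softmax (multinomial logit), in which case log probabilities map back to the original probabilities because each row sums to 1.

```python
import math

process_name = "health_shock"
outcomes = ["healthy", "sick"]

# Transition matrix of the "only state" showcase, row label -> outcome -> probability.
transition = {
    "old": {"healthy": 0.8, "sick": 0.2},
    "young": {"healthy": 0.6, "sick": 0.4},
}

# Build (category, name) -> log probability, mirroring the params index.
params = {}
for outcome in outcomes:
    category = f"exogenous_process_{process_name}_{outcome}"
    for row_label, probs in transition.items():
        params[(category, row_label)] = math.log(probs[outcome])


def softmax(values):
    """Map coefficients to probabilities (assumed logit link)."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Coefficients for the row "old", in the order of `outcomes`.
coefficients_old = [
    params[(f"exogenous_process_{process_name}_{outcome}", "old")]
    for outcome in outcomes
]
recovered_old = softmax(coefficients_old)  # approximately [0.8, 0.2]
```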

## Covariates

The second returned object is a dictionary containing covariate specifications. To use it directly, the `covariates` dictionary in `options` needs to be updated with it. Note that the returned dictionaries are generally not ready to use and serve more as templates.

In [10]:
covariates_state_and_process

{'sick_and_old': 'health_shock == sick & ?',
 'sick_and_young': 'health_shock == sick & ?',
 'healthy_and_old': 'health_shock == healthy & ?',
 'healthy_and_young': 'health_shock == healthy & ?'}

In [11]:
covariates_only_state

{'old': '?', 'young': '?'}

In [12]:
covariates_only_process

{'sick': 'health_shock == sick', 'healthy': 'health_shock == healthy'}

So the only case in which the dictionary can be used directly is when the process depends only on its own current state. The parsing function recognizes a state of the exogenous process as a keyword and writes it as a logical condition in the dictionary. If a row label contains another word, or no state of the exogenous process at all, the parsing function produces a `?` to indicate that further information is needed.
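To make the rule concrete, here is a hypothetical re-implementation that reproduces the three dictionaries shown above. It assumes that row labels are joined with `_and_`, which holds for the labels in this notebook; `template_condition` is an illustrative name, and respy's actual parsing logic may differ.

```python
def template_condition(label, process_name, process_states):
    """Build a template condition for one row label.

    Hypothetical sketch, not respy's implementation. Each part of the label
    that is a state of the process becomes a logical condition; every other
    part becomes the placeholder '?'.
    """
    conditions = []
    for part in label.split("_and_"):
        if part in process_states:
            conditions.append(f"{process_name} == {part}")
        else:
            conditions.append("?")
    return " & ".join(conditions)


process_name = "health_shock"
process_states = ["healthy", "sick"]

labels_state_and_process = [
    "sick_and_old",
    "sick_and_young",
    "healthy_and_old",
    "healthy_and_young",
]
template_state_and_process = {
    label: template_condition(label, process_name, process_states)
    for label in labels_state_and_process
}
template_only_state = {
    label: template_condition(label, process_name, process_states)
    for label in ["old", "young"]
}
template_only_process = {
    label: template_condition(label, process_name, process_states)
    for label in ["sick", "healthy"]
}
```

Every `?` in these templates still has to be replaced by hand with a covariate that is defined in `options`.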

The following example shows how the placeholders could be filled in.

In [13]:
covariates = covariates_state_and_process.copy()
covariates['sick_and_old'] = 'health_shock == sick & old'
covariates['sick_and_young'] = 'health_shock == sick & young'
covariates['healthy_and_old'] = 'health_shock == healthy & old'
covariates['healthy_and_young'] = 'health_shock == healthy & young'
covariates

{'sick_and_old': 'health_shock == sick & old',
 'sick_and_young': 'health_shock == sick & young',
 'healthy_and_old': 'health_shock == healthy & old',
 'healthy_and_young': 'health_shock == healthy & young'}

In [14]:
# Now let's define the covariate entry in options. It has to include the definitions of young and old.
options = {}
options["covariates"] = {"age": "16 + period",
                        "old": "age > 50",
                        "young": "age <= 50"}
options

{'covariates': {'age': '16 + period', 'old': 'age > 50', 'young': 'age <= 50'}}

In [15]:
# Now merge both dictionaries
options["covariates"] = {**options["covariates"], **covariates}
options

{'covariates': {'age': '16 + period',
  'old': 'age > 50',
  'young': 'age <= 50',
  'sick_and_old': 'health_shock == sick & old',
  'sick_and_young': 'health_shock == sick & young',
  'healthy_and_old': 'health_shock == healthy & old',
  'healthy_and_young': 'health_shock == healthy & young'}}
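Before using the merged `options`, it can help to confirm that no template placeholder survived. The following is a minimal sketch; `unresolved_covariates` is a hypothetical helper written for this illustration and is not part of respy.

```python
def unresolved_covariates(covariates):
    """Return the names whose formula still contains the '?' placeholder."""
    return [name for name, formula in covariates.items() if "?" in formula]


# Template as returned by the parsing function for the combined showcase.
template = {
    "sick_and_old": "health_shock == sick & ?",
    "sick_and_young": "health_shock == sick & ?",
    "healthy_and_old": "health_shock == healthy & ?",
    "healthy_and_young": "health_shock == healthy & ?",
}

# Covariates after the placeholders were filled in and merged into options.
filled = {
    "age": "16 + period",
    "old": "age > 50",
    "young": "age <= 50",
    "sick_and_old": "health_shock == sick & old",
    "sick_and_young": "health_shock == sick & young",
    "healthy_and_old": "health_shock == healthy & old",
    "healthy_and_young": "health_shock == healthy & young",
}

still_open_in_template = unresolved_covariates(template)  # all four labels
still_open_in_filled = unresolved_covariates(filled)  # empty list
```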