## Exogenous Processes
This notebook contains prototypes for the implementation of exogenous processes.
I think the easiest approach is to implement the processes on a new branch that departs from the current main.
The only part that we can take over one-to-one is the model specification and parsing.
The rest serves as good inspiration but has to be adapted substantially due to the new state space structure. 

In [2]:
import itertools
from functools import wraps

import numba as nb
import numpy as np
import pandas as pd
from scipy import special

import respy as rp

from respy.config import COVARIATES_DOT_PRODUCT_DTYPE
from respy.parallelization import parallelize_across_dense_dimensions
from respy.shared import create_dense_state_space_columns
from respy.pre_processing.model_processing import process_params_and_options

## Model Processing
- I would just take over the model processing from the old PR.
- We have to copy all the functions that deal with exogenous processes to the new branch.
- That is probably the least error-prone way, and it allows us to spot potential improvements.

In [18]:
# Load model.
params, options = rp.get_example_model("robinson_crusoe_extended", with_data=False)

# Extend with observable characteristic.
params.loc[("observable_health_well", "probability"), "value"] = 0.9
params.loc[("observable_health_sick", "probability"), "value"] = 0.1
params.loc[("observable_ability_good", "probability"), "value"] = 0.9
params.loc[("observable_ability_bad", "probability"), "value"] = 0.1
params.loc[("observable_ability_horrible", "probability"), "value"] = 0.1



# Create internal specification objects.
optim_paras, options = process_params_and_options(params, options)



In [24]:
optim_paras["observables"].keys()

dict_keys(['ability', 'health'])

In [19]:
# Add exog processes
sp = rp.state_space.create_state_space_class(optim_paras, options)

In [22]:
sp.dense_covariates_to_dense_index


DictType[UniTuple(int64 x 2),int64]<iv=None>({(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 3, (2, 0): 4, (2, 1): 5})

## Implementation
I think we face a memory-speed trade-off here after all.
Either we directly create the DataFrame for all potential dense indices or we keep the different processes separated.
The former could be faster while the latter uses much less memory. (Just a hunch, though.) We should try both anyway!

Important features to ensure:
- We need to be certain that the dense grid always has the fixed dense variables in the leading positions and the exogenous processes in the last ones!
`dense_covariate = (observable_position, process_position)`
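
The ordering requirement can be sketched with plain Python. This is only an illustration under assumptions: the level counts are made up, `health` is treated here as a hypothetical exogenous process, and `dense_column_to_position` is a name introduced for this sketch rather than an existing respy attribute.

```python
import itertools

# Hypothetical level counts: one fixed observable with three levels and one
# exogenous process with two levels.
n_levels_static = {"ability": 3}
n_levels_exog = {"health": 2}

# Fixed dense variables come first, exogenous processes last.
dense_columns = [*n_levels_static, *n_levels_exog]
levels = [range(n) for n in (*n_levels_static.values(), *n_levels_exog.values())]

# Enumerate the dense grid and assign a running index to every point.
dense_covariates_to_dense_index = {
    covariates: index
    for index, covariates in enumerate(itertools.product(*levels))
}

# Position of every dense variable in the dense vector (hypothetical helper dict).
dense_column_to_position = {name: pos for pos, name in enumerate(dense_columns)}
```

Because `itertools.product` varies the last factor fastest, all levels of the exogenous process for a given fixed covariate end up at neighbouring dense indices.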

In [None]:
# We need an efficient way to map dense covariates to dense indices!
# We need a dict indicating position in dense vector  

In [None]:
# Option 1: build a DataFrame for all potential locations on the dense grid.
@parallelize_across_dense_dimensions
def compute_process_specific_transition_probabilities(
    states,
    core_key,
    dense_covariates_to_dense_index,
    dense_index_and_core_key_to_dense_key,
    optim_paras,
):
    exogenous_processes = optim_paras["exogenous_processes"]

    # How does the accounting work here again? Would that actually work?
    # We also still need to add types. Rethink parsing to an extent?
    static_dense_columns = list(optim_paras["observables"])

    # Static dense covariates are constant within a block of states, so the
    # first row is enough.
    static_dense = tuple(int(x) for x in states[static_dense_columns].iloc[0])

    dense_columns = create_dense_state_space_columns(optim_paras)

    # Assumes that every process maps its levels to parameters, so the length
    # of the mapping is the number of levels.
    levels_of_processes = [
        range(len(levels)) for levels in exogenous_processes.values()
    ]
    comb_exog_procs = itertools.product(*levels_of_processes)

    # Needs to be created in here since that is dense-period-choice-core specific.
    dense_index_to_exogenous = {
        dense_covariates_to_dense_index[(*static_dense, *exog)]: exog
        for exog in comb_exog_procs
    }
    dense_key_to_exogenous = {
        dense_index_and_core_key_to_dense_key[(core_key, key)]: value
        for key, value in dense_index_to_exogenous.items()
    }

    # Compute the probabilities for every exogenous process.
    probabilities = []
    for exog_proc in exogenous_processes:

        # Create the dot product of covariates and parameters.
        x_betas = []
        for params in exogenous_processes[exog_proc].values():
            x_beta = np.dot(
                states[params.index].to_numpy(dtype=COVARIATES_DOT_PRODUCT_DTYPE),
                params.to_numpy(),
            )
            x_betas.append(x_beta)

        probs = special.softmax(np.column_stack(x_betas), axis=1)
        probabilities.append(probs)

    # Prepare the full DataFrame: one column per reachable dense key, holding
    # the product of the process-specific probabilities.
    df = pd.DataFrame(index=states.index)
    for dense_key, exog in dense_key_to_exogenous.items():
        df[dense_key] = np.prod(
            [probabilities[proc][:, val] for proc, val in enumerate(exog)], axis=0
        )

    # We can maybe directly dump that dataset?

    return df
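
To make the memory-speed trade-off concrete, here is a toy comparison of the two storage layouts. It is a sketch under assumptions: the `x_beta` values are random, the process names `health` and `weather` are made up, the processes are treated as independent, and the hand-written `softmax` merely stands in for `scipy.special.softmax` so that the snippet needs nothing but NumPy.

```python
import itertools

import numpy as np


def softmax(x, axis):
    """Numerically stable softmax, a stand-in for ``scipy.special.softmax``."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
n_states = 4

# Made-up x_beta arrays with shape (n_states, n_levels), one per process.
x_betas = {
    "health": rng.normal(size=(n_states, 2)),
    "weather": rng.normal(size=(n_states, 3)),
}

# Separated storage: one probability array per process.
separate = {name: softmax(x, axis=1) for name, x in x_betas.items()}

# Joint storage: one column per combination of levels. Under independence the
# joint probability is the product of the process-specific probabilities.
joint = {}
level_ranges = [range(p.shape[1]) for p in separate.values()]
for combination in itertools.product(*level_ranges):
    columns = [p[:, level] for p, level in zip(separate.values(), combination)]
    joint[combination] = np.prod(columns, axis=0)

# Separated storage grows with the sum of the levels, joint storage with
# their product.
n_columns_separate = sum(p.shape[1] for p in separate.values())
n_columns_joint = len(joint)
```

With two and three levels the difference is small (five versus six columns), but the joint layout grows multiplicatively with every additional process.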

## Position of creation
One question is whether we want to create the transition probabilities at runtime or create them beforehand and store them on disk. I slightly prefer the latter. Although it is slightly slower, I think it allows for more flexibility down the road.
Especially when we want to allow for fixed processes, it is crucial to create the transitions beforehand.
That is why I propose the following changes to the functions that dump and load states.

In [None]:
def _create_file_name_from_complex_index(topic, complex_):
    """Create a file name from a complex index."""
    choice = "".join([str(int(x)) for x in complex_[1]])
    if len(complex_) == 3:
        file_name = f"{topic}_{complex_[0]}_{choice}_{complex_[2]}.parquet"
    elif len(complex_) == 2:
        file_name = f"{topic}_{complex_[0]}_{choice}.parquet"
    else:
        raise NotImplementedError

    return file_name
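
For illustration, the naming scheme above yields file names like the ones below. The helper `create_file_name` is a hypothetical, condensed restatement of the function so that the snippet runs on its own, and the complex indices are made up.

```python
def create_file_name(topic, complex_):
    """Condensed restatement of the naming scheme (hypothetical helper)."""
    if len(complex_) not in (2, 3):
        raise NotImplementedError
    # The choice set is a tuple of booleans and becomes a string of 0s and 1s.
    choice = "".join(str(int(x)) for x in complex_[1])
    parts = [topic, str(complex_[0]), choice, *map(str, complex_[2:])]
    return "_".join(parts) + ".parquet"


# Period 0, three choices of which the second is unavailable, dense index 2.
with_dense = create_file_name("transition", (0, (True, False, True), 2))

# Period 1, two available choices, no dense dimension.
without_dense = create_file_name("states", (1, (True, True)))
```

Here `with_dense` is `"transition_0_101_2.parquet"` and `without_dense` is `"states_1_11.parquet"`, so the topic prefix keeps transition files apart from the state files in the same cache directory.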


In [None]:
def dump_states(container, topic, complex_, options):
    """Dump states."""
    file_name = _create_file_name_from_complex_index(topic, complex_)
    container.to_parquet(
        options["cache_path"] / file_name, compression=options["cache_compression"],
    )


def load_states(topic, complex_, options):
    """Load states."""
    file_name = _create_file_name_from_complex_index(topic, complex_)
    directory = options["cache_path"]
    return pd.read_parquet(directory / file_name)


## We need a way to efficiently combine values!
I propose using another decorator to do the weighting. I think that avoids a large mess in the solve module and does justice to the abstract extension of the model.
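
The weighting step itself can be sketched on toy arrays. All numbers and keys below are made up, and the sketch assumes that each future dense key comes with one transition probability per state and a continuation value array of shape `(n_states, n_choices)`.

```python
import numpy as np

# Toy continuation values for two future dense keys (3 states, 2 choices).
continuation_values = {
    0: np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
    1: np.array([[2.0, 4.0], [6.0, 8.0], [10.0, 12.0]]),
}

# Probability of moving to each future dense key, one entry per state.
transition_probabilities = {
    0: np.array([0.5, 0.9, 0.0]),
    1: np.array([0.5, 0.1, 1.0]),
}

# Weight every state's row by its transition probability and sum over the
# future dense keys. The new axis broadcasts the weights across choices.
weighted = sum(
    transition_probabilities[key][:, np.newaxis] * continuation_values[key]
    for key in continuation_values
)
```

The result keeps the shape `(n_states, n_choices)`, so the downstream code in the solve module would not need to know whether any weighting took place.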

In [None]:
# TODO: Think of a less specific name. Something like 
def weight_dense_cores(func):
    """Wrapper around get continuation values."""
    @wraps(func)
    def decorator_weight_dense_cores(state_space, dense_key_to_complex, *args):
        is_exogenous = "exogenous_processes" in state_space.optim_paras
        continuation_values = func(state_space, dense_key_to_complex, *args)
        if is_exogenous:
            # Will probably have to become a numba typed dict.
            weighted_continuation_values = {}
            for dense_key in dense_key_to_complex:
                complex_ = dense_key_to_complex[dense_key]
                # Assumes that the state space exposes the options that are
                # needed to locate the cache.
                transition_df = load_states(
                    "transition", complex_, state_space.options
                )
                # TODO: Find a more elegant way than list comprehension
                # Weight each future dense key's continuation values row by
                # row with the probability of reaching that key.
                weighted_columns = [
                    transition_df[ftr_key].to_numpy()[:, np.newaxis]
                    * continuation_values[ftr_key]
                    for ftr_key in transition_df.columns
                ]
                weighted_continuation_values[dense_key] = np.sum(
                    weighted_columns, axis=0
                )
            return weighted_continuation_values
        else:
            return continuation_values
    return decorator_weight_dense_cores

In [25]:
@weight_dense_cores
@parallelize_across_dense_dimensions
@nb.njit
def _get_continuation_values(
    core_indices,
    dense_complex_index,
    child_indices,
    core_index_and_dense_vector_to_dense_index,
    expected_value_functions,
):
    """Get continuation values from child states.

    The continuation values are the discounted expected value functions from child
    states. This method allows to retrieve continuation values that were obtained in the
    model solution. In particular the function assigns continuation values to state
    choice combinations by using the child indices created in
    :func:`_collect_child_indices`.

    Returns
    -------
    continuation_values : numpy.ndarray
        Array with shape ``(n_states, n_choices)``. Maps core_key and choice into
        continuation value.

    """
    if len(dense_complex_index) == 3:
        period, choice_set, dense_idx = dense_complex_index
    elif len(dense_complex_index) == 2:
        period, choice_set = dense_complex_index
        dense_idx = 0

    n_choices = sum_over_numba_boolean_unituple(choice_set)

    n_states = core_indices.shape[0]

    continuation_values = np.zeros((len(core_indices), n_choices))
    for i in range(n_states):
        for j in range(n_choices):
            core_idx, row_idx = child_indices[i, j]
            idx = (core_idx, dense_idx)
            dense_choice = core_index_and_dense_vector_to_dense_index[idx]

            continuation_values[i, j] = expected_value_functions[dense_choice][row_idx]

    return continuation_values


