# Method of Simulated Moments (MSM)

In [1]:
import pandas as pd 
import respy as rp

from method_of_simulated_moments import get_msm_func
from method_of_simulated_moments import get_diag_weighting_matrix
from method_of_simulated_moments import get_flat_moments

This notebook contains a step-by-step tutorial on estimation by the method of simulated moments (MSM) using respy.

Respy can construct an MSM function using `get_msm_func`. The function requires the following arguments:

* params (pandas.DataFrame)
* options (dict)
* calc_moments (callable, list, dict)
* replace_nans (callable, list, dict)
* empirical_moments (pandas.DataFrame, pandas.Series, list, dict)
* weighting_matrix (numpy.ndarray)
* return_scalar (bool)

`get_msm_func` returns a function in which all arguments except *params* are held fixed. This function can then be passed directly to an optimizer for estimation.
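Holding every argument except *params* fixed is essentially a closure, or partial function application. The sketch below illustrates only that idea with a toy normal "model"; `simulate_toy`, `toy_moments`, `criterion` and `make_criterion` are hypothetical names, and this is not how respy implements `get_msm_func`.

```python
import functools

import numpy as np


def simulate_toy(params, n_obs, seed):
    """Hypothetical simulator: normal draws whose mean is params[0]."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=params[0], scale=1.0, size=n_obs)


def toy_moments(data):
    """Two moments: sample mean and sample standard deviation."""
    return np.array([data.mean(), data.std()])


def criterion(params, n_obs, seed, empirical_moments, weighting_matrix):
    # Simulate with the candidate params and compare simulated to empirical moments.
    errors = toy_moments(simulate_toy(params, n_obs, seed)) - empirical_moments
    return errors @ weighting_matrix @ errors


def make_criterion(n_obs, seed, empirical_moments, weighting_matrix):
    # Fix everything except params, so the result is a function of params alone.
    return functools.partial(
        criterion,
        n_obs=n_obs,
        seed=seed,
        empirical_moments=empirical_moments,
        weighting_matrix=weighting_matrix,
    )


true_params = np.array([2.0])
emp = toy_moments(simulate_toy(true_params, n_obs=1_000, seed=0))
loss = make_criterion(1_000, 0, emp, np.eye(2))
# loss(true_params) -> 0.0, because simulator, seed and data are identical
```

Because `loss` takes only the parameter vector, it has the shape most optimizers expect for an objective function.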

## Introductory Example

The following section discusses all the arguments in detail using an example model.

### Arguments

#### The *params* and *options* Arguments

The first step in MSM estimation is to simulate data from a specified model. Respy simulates data using a parameter vector *params*, which is the object of interest for estimation, and a set of *options* that define the underlying model.

Respy provides a number of example models. For this tutorial, we use the parameterization from Keane and Wolpin (1994).

In [None]:
params, options, df_emp = rp.get_example_model("kw_94_one")

In [None]:
params

In [None]:
options

#### The *calc_moments* Argument

The *calc_moments* argument is the function used to calculate moments from the simulated data. It can also be a list or dictionary of functions if different sets of moments should be calculated by different functions.

In this case, we calculate two sets of moments: choice frequencies and the mean and standard deviation of wages. The moments are saved to a pandas.DataFrame with time periods as the index and the moments as columns.

In [None]:
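def calc_moments(df):
    # Share of agents choosing each alternative, per period.
    choices = df.groupby("Period").Choice.value_counts(normalize=True).unstack()
    # Mean and standard deviation of observed wages, per period.
    wages = df.groupby("Period")["Wage"].describe()[["mean", "std"]]

    # One row per period; choice shares and wage moments as columns.
    return pd.concat([choices, wages], axis=1)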
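To see the layout this function produces without running a full respy simulation, the sketch below applies the same logic to a tiny hand-made panel. The toy data (choice labels `"a"`, `"b"`, `"edu"`, `"home"` and the wages) are invented for illustration; only the column names `Period`, `Choice` and `Wage` follow the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: 2 periods, 4 agents; NaN wage means no wage observed.
toy = pd.DataFrame(
    {
        "Period": [0, 0, 0, 0, 1, 1, 1, 1],
        "Choice": ["a", "a", "edu", "home", "a", "b", "b", "edu"],
        "Wage": [10.0, 12.0, np.nan, np.nan, 11.0, 20.0, 22.0, np.nan],
    }
)


def calc_moments(df):
    # Same logic as above: choice shares and wage mean/std per period.
    choices = df.groupby("Period").Choice.value_counts(normalize=True).unstack()
    wages = df.groupby("Period")["Wage"].describe()[["mean", "std"]]
    return pd.concat([choices, wages], axis=1)


moments = calc_moments(toy)
# Columns: a, b, edu, home, mean, std; one row per period.
# Choice "b" never occurs in period 0, so that cell is NaN.
```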

#### The *replace_nans* Argument

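Next, we define *replace_nans*, a function or list of functions that specifies how missing values in the moments are handled. Missing moments arise, for example, when a choice is never made in a period or when too few wages are observed in a period to compute a standard deviation. Here, we simply replace them with zero.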

In [None]:
def fill_nans_zero(df):
    return df.fillna(0)
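The following toy example, with invented data, shows both sources of missing moments mentioned above and how replacing them with zero removes every NaN. It reuses the moment logic of `calc_moments` inline so that the block runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "edu" is never chosen in period 0, "b" never in period 1,
# and only one wage is observed in period 1, so its standard deviation is NaN.
toy = pd.DataFrame(
    {
        "Period": [0, 0, 1, 1],
        "Choice": ["a", "b", "a", "edu"],
        "Wage": [10.0, 14.0, 12.0, np.nan],
    }
)

choices = toy.groupby("Period").Choice.value_counts(normalize=True).unstack()
wages = toy.groupby("Period")["Wage"].describe()[["mean", "std"]]
moments = pd.concat([choices, wages], axis=1)


def fill_nans_zero(df):
    # Replace every missing moment with 0.
    return df.fillna(0)


filled = fill_nans_zero(moments)
```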

#### The *empirical_moments* Argument

The empirical moments are the moments calculated from the observed data, which the simulated moments should match. The *empirical_moments* argument requires a pandas.DataFrame or pandas.Series as input. Alternatively, users can pass lists or dictionaries containing DataFrames or Series as items. *calc_moments*, *replace_nans* and *empirical_moments* must correspond to each other, i.e., *calc_moments* should output moments with the same structure as *empirical_moments*.

For this example, we calculate the empirical moments in the same way as the simulated moments, so we can be sure that this condition is fulfilled.

In [None]:
empirical_moments = calc_moments(df_emp)
empirical_moments = fill_nans_zero(empirical_moments)

In [None]:
empirical_moments.head()
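When the empirical moments are computed differently from the simulated ones, a quick structural comparison can catch mismatches early. `same_structure` below is a hypothetical helper, not part of respy; it only checks that index and columns agree, which is one reasonable reading of "the same structure".

```python
import pandas as pd


def same_structure(simulated, empirical):
    """Hypothetical check: do two moment tables share index (and columns)?"""
    if isinstance(empirical, pd.DataFrame):
        return (
            isinstance(simulated, pd.DataFrame)
            and simulated.index.equals(empirical.index)
            and simulated.columns.equals(empirical.columns)
        )
    # For a Series, only the index has to match.
    return isinstance(simulated, pd.Series) and simulated.index.equals(empirical.index)


emp = pd.DataFrame({"a": [0.5, 0.4], "mean": [11.0, 12.0]}, index=[0, 1])
sim_ok = pd.DataFrame({"a": [0.6, 0.3], "mean": [10.0, 13.0]}, index=[0, 1])
sim_bad = pd.DataFrame({"a": [0.6], "mean": [10.0]}, index=[0])  # period 1 missing
```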

#### The *weighting_matrix* Argument

For MSM estimation, the user has to define a weighting matrix. `get_diag_weighting_matrix` creates a diagonal weighting matrix whose dimensions match the vector of moments used for estimation. It requires the *empirical_moments* that are also passed to `get_msm_func` and, optionally, a set of weights with the same structure as *empirical_moments*. If no weights are specified, the function returns the identity matrix.
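The sketch below illustrates what such a diagonal matrix looks like. It is a hypothetical re-implementation, not respy's code: it assumes the moments and weights have already been flattened into pandas Series sharing the same labels, whereas `get_diag_weighting_matrix` accepts the same (possibly nested) structure as *empirical_moments*. The labels and weights are invented.

```python
import numpy as np
import pandas as pd


def diag_weighting_matrix_sketch(flat_moments, flat_weights=None):
    """Hypothetical illustration: one diagonal entry per moment."""
    if flat_weights is None:
        # No weights given: every moment gets weight 1 (identity matrix).
        return np.eye(len(flat_moments))
    # Put the weights in the same order as the moments before building the matrix.
    aligned = flat_weights.reindex(flat_moments.index)
    if aligned.isna().any():
        raise ValueError("Every moment needs a weight.")
    return np.diag(aligned.to_numpy())


moments = pd.Series([0.5, 0.3, 11.0], index=["a_0", "b_0", "mean_0"])
weights = pd.Series([1.0, 1.0, 0.01], index=["a_0", "b_0", "mean_0"])

W_identity = diag_weighting_matrix_sketch(moments)
W = diag_weighting_matrix_sketch(moments, weights)
```

Aligning by label rather than position guards against weights listed in a different order than the moments.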

In [None]:
weighting_matrix = get_diag_weighting_matrix(empirical_moments)

In [None]:
pd.DataFrame(weighting_matrix)

If the user prefers to compute a weighting matrix manually, the respy function `get_flat_moments` may be of use. It returns the empirical moments as an indexed pandas.Series, which is the form in which they are passed to the loss function.
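The exact index that `get_flat_moments` produces is not documented here. As a rough illustration of the idea only, the hypothetical helper below turns a single period-by-moment DataFrame into one labeled vector, which is what the rows and columns of a manually built weighting matrix have to line up with.

```python
import pandas as pd


def flatten_moments_sketch(moments):
    """Hypothetical stand-in: one entry per (period, moment) pair."""
    if isinstance(moments, pd.Series):
        return moments
    # Row-major ravel matches the (row, column) order of from_product.
    index = pd.MultiIndex.from_product([moments.index, moments.columns])
    return pd.Series(moments.to_numpy().ravel(), index=index)


moments = pd.DataFrame(
    {"a": [0.5, 0.4], "mean": [11.0, 12.0]},
    index=pd.Index([0, 1], name="Period"),
)
flat = flatten_moments_sketch(moments)
# flat holds 0.5, 11.0, 0.4, 12.0 labeled (0, "a"), (0, "mean"), (1, "a"), (1, "mean")
```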

In [None]:
flat_empirical_moments = get_flat_moments(empirical_moments)
flat_empirical_moments

#### The *return_scalar* Argument

The *return_scalar* argument determines the output of the MSM function. If *return_scalar* is set to **False**, the function returns the vector of moment errors; if it is set to **True**, it returns the weighted sum of squared moment errors, i.e., the quadratic form of the moment error vector with the weighting matrix.
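For a moment error vector `e` and weighting matrix `W`, the scalar is `e.T @ W @ e`. The numbers below are invented, and the sign convention (simulated minus empirical) is an assumption; it does not affect the scalar, because flipping the sign of `e` leaves the quadratic form unchanged. With a diagonal `W`, the result reduces to a weighted sum of squared errors.

```python
import numpy as np

simulated = np.array([0.55, 0.30, 11.5])
empirical = np.array([0.50, 0.35, 11.0])
W = np.diag([1.0, 1.0, 0.5])

errors = simulated - empirical   # moment error vector (sign convention assumed)
scalar = errors @ W @ errors     # about 0.13

# With a diagonal W this equals the weighted sum of squared errors.
weighted_sum = np.sum(np.diag(W) * errors**2)
```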

### MSM Function
We can now construct the MSM function with `get_msm_func`. Passing all arguments to `get_msm_func` returns a function that holds everything except *params* fixed and can thus be passed directly to an optimizer. Because the empirical moments in this example are computed from data simulated with the same model (*params* and *options*), the function returns 0 when evaluated at the true parameter vector.

In [None]:
msm = get_msm_func(
    params=params,
    options=options,
    calc_moments=calc_moments,
    replace_nans=fill_nans_zero,
    empirical_moments=empirical_moments,
    weighting_matrix=weighting_matrix,
    return_scalar=True,
)

msm(params)

Using a different parameter vector will result in a value different from 0.

In [None]:
params_sim = params.copy()
params_sim.loc['delta', 'value'] = 0.8

In [None]:
msm(params_sim)

If we set *return_scalar* to **False**, the function will return the vector of moment errors instead.

In [None]:
msm_vector = get_msm_func(
    params=params_sim,
    options=options,
    calc_moments=calc_moments,
    replace_nans=fill_nans_zero,
    empirical_moments=empirical_moments,
    weighting_matrix=weighting_matrix,
    return_scalar=False,
)

moment_errors = msm_vector(params_sim)
moment_errors

## Inputs as Lists or Dictionaries

In the example above, we used single elements for all inputs, i.e., one function to calculate moments, one function to replace missing moments, and a single pandas.DataFrame holding all sets of moments. This works well for the example at hand because the inputs are relatively simple, but other applications may require more flexibility. `get_msm_func` therefore also accepts lists and dictionaries as inputs. This way, different sets of moments can be stored separately, and different replacement functions can be used for different moments.

For the sake of this example, we add another set of moments to the estimation: in addition to the choice frequencies and the wage distribution, we include the final education of agents. Here, the index is the educational experience agents have accumulated by the last period (period 39), and the moments are the relative frequencies of each level of experience. Since this set of moments is not indexed by period, it cannot be stored in one DataFrame with the other moments. We therefore give each set of moments its own function and store the functions in a list. The choice frequencies and the wage distribution are stored in pandas.DataFrames with multiple columns, whereas the final education is a pandas.Series.

Instead of lists, the functions and moments may also be stored in dictionaries. **Dictionaries are sorted by key** before being passed to the loss function. Using dictionaries therefore has the advantage that the user does not need to store items in a particular order, as is necessary with lists, where inputs are matched by position. For the same reason, however, mixing lists and dictionaries as inputs is not recommended.
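A small illustration of why sorting by key makes insertion order irrelevant while positional matching does not. Strings stand in for the moment functions and moment tables, and the dictionary names are hypothetical; this does not reproduce respy's internals.

```python
# Hypothetical stand-ins: strings instead of real functions and DataFrames.
calc_moments_dict = {"wages": "calc_wage_distr", "choices": "calc_choice_freq"}
empirical_moments_dict = {"choices": "emp_choices", "wages": "emp_wages"}

# Matching by position follows insertion order, which differs between the two.
by_position = list(zip(calc_moments_dict.values(), empirical_moments_dict.values()))

# Matching by sorted key pairs each function with the moments of the same name.
by_key = [
    (calc_moments_dict[key], empirical_moments_dict[key])
    for key in sorted(calc_moments_dict)
]
```

Here `by_position` wrongly pairs `calc_wage_distr` with `emp_choices`, while `by_key` pairs every function with its own moments. Mixing a list with a dictionary would combine positional and key-sorted order, which is why it is best avoided.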

In [None]:
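def calc_choice_freq(df):
    # Share of agents choosing each alternative, per period.
    return df.groupby("Period").Choice.value_counts(normalize=True).unstack()


def calc_wage_distr(df):
    # Mean and standard deviation of observed wages, per period.
    return df.groupby("Period")["Wage"].describe()[["mean", "std"]]


def calc_final_edu(df):
    # Distribution of accumulated education experience in the last period.
    last_period = df[df.Period == df.Period.max()]
    return last_period.Experience_Edu.value_counts(normalize=True, sort=False)


calc_moments = [calc_choice_freq, calc_wage_distr, calc_final_edu]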
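Applied to a tiny invented panel, the final-education function returns a Series indexed by years of education rather than by period, which is why it cannot simply be added as columns to the period-indexed moments. The toy data are hypothetical; only the column names follow the functions above.

```python
import pandas as pd


def calc_final_edu(df):
    # Same logic as above: education distribution in the last period only.
    last_period = df[df.Period == df.Period.max()]
    return last_period.Experience_Edu.value_counts(normalize=True, sort=False)


# Hypothetical panel: 3 agents observed in periods 0 and 1.
toy = pd.DataFrame(
    {
        "Period": [0, 0, 0, 1, 1, 1],
        "Experience_Edu": [10, 10, 12, 10, 12, 12],
    }
)

final_edu = calc_final_edu(toy)
# Index: education levels 10 and 12; values: shares 1/3 and 2/3 in period 1.
```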

We could also specify a separate replacement function for each set of moments and store them in a list, just like *calc_moments*. Here, however, we use the same replacement function for all moments and thus only need to specify one. Respy automatically applies this function to all sets of moments.

Note that this only works if exactly one replacement function is given. Otherwise, *replace_nans* must be a list of the same length as *calc_moments*, with each replacement function at the same position as the moment function it corresponds to.
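The matching rule just described can be written down as a short sketch. `broadcast_replace_nans` is a hypothetical helper that mirrors the rule as stated in the text; it is not respy's code.

```python
def broadcast_replace_nans(replace_nans, calc_moments):
    """Hypothetical helper: pair one replacement function with each moment function."""
    if callable(replace_nans):
        replace_nans = [replace_nans]
    if len(replace_nans) == 1:
        # A single function is reused for every set of moments.
        return list(replace_nans) * len(calc_moments)
    if len(replace_nans) != len(calc_moments):
        raise ValueError(
            "replace_nans must contain one function or one per moment function."
        )
    # Otherwise functions are matched by position.
    return list(replace_nans)
```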

In [None]:
def fill_nans_zero(df):
    return df.fillna(0)

replace_nans = [fill_nans_zero]

We now calculate the *empirical_moments*, which are stored in a list as well. The *weighting_matrix* is calculated as before.

In [None]:
params, options, df_emp = rp.get_example_model("kw_94_one")
empirical_moments = [calc_choice_freq(df_emp), calc_wage_distr(df_emp), calc_final_edu(df_emp)]

empirical_moments = [fill_nans_zero(df) for df in empirical_moments]

In [None]:
weighting_matrix = get_diag_weighting_matrix(empirical_moments)

Finally, we can construct the msm function from the defined inputs.

In [None]:
msm = get_msm_func(
    params=params,
    options=options,
    calc_moments=calc_moments,
    replace_nans=replace_nans,
    empirical_moments=empirical_moments,
    weighting_matrix=weighting_matrix,
    return_scalar=True,
)

msm(params)

When evaluated at *params_sim*, the value deviates slightly from the one in the introductory example because we added another set of moments.

In [None]:
msm(params_sim)

In [None]:
msm_vector = get_msm_func(
    params=params,
    options=options,
    calc_moments=calc_moments,
    replace_nans=replace_nans,
    empirical_moments=empirical_moments,
    weighting_matrix=weighting_matrix,
    return_scalar=False,
)
moment_errors = msm_vector(params_sim)
moment_errors

## References

> Keane, M. P. and Wolpin, K. I. (1994). [The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence](https://doi.org/10.2307/2109768). *The Review of Economics and Statistics*, 76(4): 648-672.
