# Skillmodels Quickstart

In [1]:
from skillmodels.likelihood_function import get_maximization_inputs
from skillmodels.config import TEST_DIR
import pandas as pd
import numpy as np
import yaml
from time import time
from estimagic import maximize

## Loading Model Specification and Data

Model specifications are Python dictionaries that can be saved in yaml or json files. For the moment, just assume you know how to write a model specification and have a skillmodels-compatible dataset. Both are explained in different tutorials.
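
As a minimal sketch of what "a dictionary saved in a file" means, the snippet below writes a toy dictionary to a json file and reads it back. The keys `factors` and `n_periods` are hypothetical placeholders chosen for illustration; they are not the actual skillmodels specification schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical toy dictionary; NOT the real skillmodels specification schema.
toy_spec = {"factors": ["fac1", "fac2"], "n_periods": 2}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_model.json"
    # Save the dictionary as json ...
    path.write_text(json.dumps(toy_spec, indent=2))
    # ... and load it back into a plain Python dictionary.
    loaded = json.loads(path.read_text())

print(loaded == toy_spec)
```

The yaml case works the same way, with `yaml.load` and `yaml.dump` in place of the json functions.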

Next we load the model specification and the dataset. 

In [2]:
with open(TEST_DIR / "model2.yaml") as y:
    model_dict = yaml.load(y, Loader=yaml.SafeLoader)

In [3]:
data = pd.read_stata(TEST_DIR / "model2_simulated_data.dta")
data.set_index(["caseid", "period"], inplace=True)

## Getting the inputs for ``estimagic.maximize``

Skillmodels basically has just one public function, ``get_maximization_inputs``. When called with a model specification and a dataset, it returns a dictionary with everything you need to maximize the likelihood function using estimagic. 

By everything you need I mean everything model-specific. You should still use the optional arguments of ``maximize`` to tune the optimization.

In [4]:
max_inputs = get_maximization_inputs(model_dict, data)

## Filling the Params Template

Often you can greatly reduce estimation time by choosing good start parameters. What counts as good start parameters depends strongly on the model specification, the scaling of your variables and the normalizations you make. 

If you have great difficulty picking good start values, you probably want to think again about the interpretability of your model parameters and possibly change the normalizations and scaling of your measurements. 

As a rule of thumb: if all measurements are standardized, all fixed loadings are 1 and all fixed intercepts are 0, then 1 is a good start value for all free loadings and 0 is a good start value for all free intercepts. 

Measurement and shock standard deviations are better started slightly larger than you would expect them to be. 
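
The rule of thumb for loadings and intercepts could be applied to a template roughly as follows. This is only a sketch on a toy DataFrame: the category name `loadings` and the idea that intercepts are the `constant` rows of `controls` are assumptions made for illustration, so check them against your own ``params_template``.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a params template. The "loadings" category name is an assumption.
index = pd.MultiIndex.from_tuples(
    [
        ("controls", 0, "y1", "constant"),
        ("controls", 0, "y1", "x1"),
        ("loadings", 0, "y1", "fac1"),
        ("loadings", 0, "y2", "fac1"),
    ],
    names=["category", "period", "name1", "name2"],
)
toy_template = pd.DataFrame({"value": np.nan, "lower": -np.inf}, index=index)

start = toy_template.copy()
category = start.index.get_level_values("category")
name2 = start.index.get_level_values("name2")

# Free loadings start at 1 (assumed category name).
start.loc[category == "loadings", "value"] = 1.0
# Free intercepts start at 0 (assumed to be the "constant" rows of "controls").
start.loc[(category == "controls") & (name2 == "constant"), "value"] = 0.0

print(start)
```

Rows that the rule of thumb does not cover stay ``NaN`` and still need to be filled by hand.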

Below I just load start parameters for the CHS example model that I filled out manually. 

In [5]:
params_template = max_inputs["params_template"]
params_template.head()

category,period,name1,name2,value,lower
controls,0,y1,constant,,-inf
controls,0,y1,x1,,-inf
controls,0,y2,constant,,-inf
controls,0,y2,x1,,-inf
controls,0,y3,constant,,-inf


In [34]:
index_cols = ["category", "period", "name1", "name2"]
chs_path = TEST_DIR / "regression_vault" / "chs_results.csv"
chs_values = pd.read_csv(chs_path)
chs_values.set_index(index_cols, inplace=True)
chs_values = chs_values[["chs_value", "good_start_value", "bad_start_value"]]
chs_values.head()

category,period,name1,name2,chs_value,good_start_value,bad_start_value
controls,0,y1,constant,1.001618,1.0,0.0
controls,0,y1,x1,1.005455,1.0,0.0
controls,0,y2,constant,1.031439,1.0,0.0
controls,0,y2,x1,0.975992,1.0,0.0
controls,0,y3,constant,0.994091,1.0,0.0


In [35]:
params = params_template.copy()
params["value"] = chs_values["chs_value"]
params.head()

category,period,name1,name2,value,lower
controls,0,y1,constant,1.001618,-inf
controls,0,y1,x1,1.005455,-inf
controls,0,y2,constant,1.031439,-inf
controls,0,y2,x1,0.975992,-inf
controls,0,y3,constant,0.994091,-inf


## Timing compilation speed

Skillmodels uses jax to just-in-time compile the numerical code and to get a gradient of the likelihood function by automatic differentiation. 

This has the downside that it does not work on Windows and that there is a relatively long compile time before the first evaluation of the model. If you just want 

In [8]:
debug_loglike = max_inputs["debug_loglike"]
loglike = max_inputs["loglike"]
gradient = max_inputs["gradient"]
loglike_and_gradient = max_inputs["loglike_and_gradient"]

In [9]:
start = time()
debug_loglike_value = debug_loglike(params)
print(time() - start)
debug_loglike_value

9.000244617462158


{'value': -270698.20833821566,
 'contributions': array([-76.03610715, -59.00404373, -68.70897926, ..., -64.07606563,
        -72.37184836, -66.84698267])}

In [10]:
start = time()
loglike_value = loglike(params)
print(time() - start)
loglike_value

59.465219020843506


{'contributions': array([-76.03610715, -59.00404373, -68.70897926, ..., -64.07606563,
        -72.37184836, -66.84698267]), 'value': -270698.2083382156}

In [11]:
start = time()
gradient_value = gradient(params)
print(time() - start)

1028.6215641498566


In [12]:
start = time()
loglike_and_gradient_value = loglike_and_gradient(params)
print(time() - start)

0.604588508605957


In [13]:
%timeit loglike_and_gradient(params)

593 ms ± 4.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## Insights

The ``debug_loglike`` takes several seconds for one evaluation (about 9 seconds in the run above). The first and all subsequent evaluations have the same speed. 

You should use this if you want to quickly find out if your start parameters produce a valid likelihood or want to debug why it does not. As the name suggests, you can step through the ``debug_loglike`` with a debugger. 

The jitted ``loglike`` takes long for the first evaluation (on my machine about a minute), but under 200 milliseconds for all subsequent evaluations. 
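
To separate the one-off compile cost from the steady-state cost, one option is to time the first call and a few later calls separately. The helper ``time_calls`` and the class ``SlowFirstCall`` below are hypothetical names written for this sketch; ``SlowFirstCall`` only imitates a jitted function by sleeping on its first call.

```python
from time import perf_counter, sleep


def time_calls(func, *args, repeats=3):
    """Return (seconds of first call, mean seconds of the later calls)."""
    start = perf_counter()
    func(*args)
    first = perf_counter() - start

    later = []
    for _ in range(repeats):
        start = perf_counter()
        func(*args)
        later.append(perf_counter() - start)
    return first, sum(later) / len(later)


class SlowFirstCall:
    """Hypothetical stand-in for a jitted function: slow once, fast afterwards."""

    def __init__(self):
        self.compiled = False

    def __call__(self, x):
        if not self.compiled:
            sleep(0.05)  # pretend to compile on the first call
            self.compiled = True
        return x * 2


first, later = time_calls(SlowFirstCall(), 1.0)
print(f"first call: {first:.3f}s, later calls: {later:.6f}s")
```

With the real functions you would call something like ``time_calls(loglike, params)`` instead of the stand-in.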

The ``gradient`` or ``loglike_and_gradient`` (whichever you call first) will take about 15 minutes to compile (about 17 minutes in the run above), but under 700 milliseconds for all subsequent evaluations. In particular, evaluating just the gradient and evaluating ``loglike_and_gradient`` take the same time! If you have an optimizer that always evaluates gradient and criterion simultaneously, use it!
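
One plausible intuition for why the combined function costs no more than the gradient alone is that computing the gradient already requires the intermediate quantities of the likelihood. The hand-written toy below illustrates that idea for a normal log likelihood with unit variance; it is not a statement about how skillmodels or jax work internally, and ``toy_loglike_and_gradient`` is a hypothetical name.

```python
import math


def toy_loglike_and_gradient(mu, data):
    """Normal log likelihood with unit variance and its derivative w.r.t. mu."""
    # The residuals are needed for the value AND for the gradient.
    residuals = [x - mu for x in data]
    value = -0.5 * sum(r**2 for r in residuals) - 0.5 * len(data) * math.log(2 * math.pi)
    gradient = sum(residuals)
    return value, gradient


value, gradient = toy_loglike_and_gradient(0.0, [1.0, -1.0])
print(value, gradient)
```

Calling separate value and gradient functions would compute the residuals twice; the combined function computes them once.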

## A few additional constraints

To get the same values as CHS we will have to do a little more work. The reason is that on top of the many constraints skillmodels generates automatically from the model specification, CHS impose three more constraints:

1. All but the self productivity parameter in the linear transition equation are fixed to zero.
2. The initial mean of the states is not estimated but assumed to be zero.
3. The anchoring parameters (intercepts, control variables, loadings and SDs of measurement error) are pairwise equal across periods.

Fortunately, estimagic makes it easy to express such constraints:

In [22]:
constraints = max_inputs["constraints"]

additional_constraints = [
    {"query": "category == 'transition' & name1 == 'fac2' & name2 != 'fac2'",
     "type": "fixed", "value": 0},
    {"loc": "initial_states", "type": "fixed", "value": 0},
    {"queries": [f"period == {i} & name1 == 'Q1_fac1'" for i in range(7)], 
     "type": "pairwise_equality"}
]


In [23]:
constraints = constraints + additional_constraints
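
Before handing a ``query`` constraint to estimagic, it can help to check which rows the query string selects. The sketch below runs the first query from above on a toy DataFrame with ``DataFrame.query``, assuming estimagic interprets the string in the same way; the toy values are made up.

```python
import pandas as pd

# Toy params with the same index levels as the real params DataFrame.
index = pd.MultiIndex.from_tuples(
    [
        ("transition", 0, "fac1", "fac1"),
        ("transition", 0, "fac2", "fac1"),
        ("transition", 0, "fac2", "fac2"),
        ("transition", 0, "fac2", "fac3"),
        ("controls", 0, "y1", "constant"),
    ],
    names=["category", "period", "name1", "name2"],
)
toy_params = pd.DataFrame({"value": [0.5, 0.1, 0.6, 0.2, 1.0]}, index=index)

# Same query string as in the first additional constraint.
query = "category == 'transition' & name1 == 'fac2' & name2 != 'fac2'"
selected = toy_params.query(query)

# Only the cross effects in the fac2 equation are selected.
print(selected.index.get_level_values("name2").tolist())
```

Running the same check on the real ``params`` shows exactly which parameters a constraint would fix.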

## Generating a group column for better dashboard output

In [40]:
from estimagic.optimization.process_constraints import process_constraints

# pp holds the processed params; "_internal_free" flags the free parameters
pc, pp = process_constraints(constraints, params)

# Start from the category; for controls, use the name of the control variable
params["group"] = params.index.get_level_values("category")
params.loc["controls", "group"] = params.loc["controls"].index.get_level_values("name2")

# Append the period, e.g. "constant-0"
params["group"] = params["group"].astype(str) + "_" + params.index.get_level_values("period").astype(str)
params["group"] = params["group"].str.replace("_", "-")
params["group"] = params["group"].astype("O")
# Parameters that are not free get no group
params.loc[~pp["_internal_free"], "group"] = None
params

category,period,name1,name2,value,lower,group
controls,0,y1,constant,1.001618,-inf,constant-0
controls,0,y1,x1,1.005455,-inf,x1-0
controls,0,y2,constant,1.031439,-inf,constant-0
controls,0,y2,x1,0.975992,-inf,x1-0
controls,0,y3,constant,0.994091,-inf,constant-0
...,...,...,...,...,...,...
transition,6,fac1,phi,-0.407018,-inf,
transition,6,fac2,fac1,0.000000,-inf,
transition,6,fac2,fac2,0.608871,-inf,
transition,6,fac2,fac3,0.000000,-inf,


## Estimating the model

In [49]:
params["value"] = chs_values["bad_start_value"]
loglike(params)

{'contributions': array([-77.34433453, -74.99310014, -76.35249581, ..., -75.51475901,
        -80.42696344, -74.94717614]), 'value': -303496.05590529623}

In [51]:
res = maximize(
    criterion=loglike,
    params=params,
    algorithm="scipy_lbfgsb",
    criterion_and_derivative=loglike_and_gradient,
    logging="model2.db",
    constraints=constraints,
    log_options={"if_exists": "replace"}
)

In [53]:
res["message"]

b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'