# Variance decomposition into the signal and the measurement error

This notebook shows how to decompose the variance of each skill measurement into a signal component and a measurement-error component. The calculations follow Section 4.2.2, "The Empirical Importance of Measurement Error", of the CHS paper (*Cunha et al. 2010, 907*).

In [1]:
import numpy as np
import pandas as pd
from skillmodels.config import TEST_DIR
import yaml
from skillmodels.likelihood_function import get_maximization_inputs
from skillmodels.simulate_data import simulate_dataset
from skillmodels.variance_decomposition import create_dataset_with_variance_decomposition

In [2]:
# Load the model specification
with open(TEST_DIR / "model2.yaml") as y:
    model_dict = yaml.load(y, Loader=yaml.FullLoader)

# Load the parameters and index them by category, period, name1 and name2
params = pd.read_csv(TEST_DIR / "regression_vault" / "one_stage_anchoring.csv")
params = params.set_index(["category", "period", "name1", "name2"])

# Load the simulated data and index it by individual and period
data = pd.read_stata(TEST_DIR / "model2_simulated_data.dta")
data = data.set_index(["caseid", "period"])

In [3]:
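# Build the likelihood inputs and evaluate the debug likelihood at params
max_inputs = get_maximization_inputs(model_dict, data)
debug_loglike = max_inputs["debug_loglike"]
debug_data = debug_loglike(params)
# Keep the filtered states and state ranges returned by the debug likelihood
filtered_states = debug_data["filtered_states"]
state_ranges = debug_data["state_ranges"]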


The following formula from the CHS paper (*Cunha et al. 2010, 907*) is used to decompose the variance:

$$
\begin{equation}
Var(Z_{1,C,t,j}) = \alpha^2_{1,C,t,j} \cdot Var(\ln\theta_{C,t}) + Var(\epsilon_{1,C,t,j})
\end{equation}
$$

The fractions of the variance due to measurement error and due to the signal are:

$$
\begin{equation}
s^\epsilon_{1,C,t,j}=\frac{Var(\epsilon_{1,C,t,j})}{\alpha^2_{1,C,t,j} \cdot Var(\ln\theta_{C,t}) + Var(\epsilon_{1,C,t,j})}
\end{equation}
$$

$$
\begin{equation}
s^\theta_{1,C,t,j}=\frac{\alpha^2_{1,C,t,j} \cdot Var(\ln\theta_{C,t})}{\alpha^2_{1,C,t,j} \cdot Var(\ln\theta_{C,t}) + Var(\epsilon_{1,C,t,j})}
\end{equation}
$$



where:
* $Var(\epsilon_{1,C,t,j})$ is the variance of the measurement error (**meas_sds^2**, the squared `meas_sds` entries of `params`)
* $Var(\ln\theta_{C,t})$ is the factor variance (column **variance of factor** in the output)
* $\alpha_{1,C,t,j}$ is the factor loading (column **loadings** in the output)
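
The two fractions can be checked by hand. The following is a minimal sketch of the formulas above, not the implementation used by `create_dataset_with_variance_decomposition`; the helper name `variance_shares` and its arguments are hypothetical. The input values are those reported for measurement `y1` of `fac1` in period 0 in the output of the next cell.

```python
def variance_shares(loading, factor_var, meas_sd):
    """Split Var(Z) = loading**2 * factor_var + meas_sd**2 into two shares.

    Hypothetical helper for illustration; not part of skillmodels.
    Returns (share due to measurement error, share due to the signal).
    """
    signal_var = loading**2 * factor_var  # alpha^2 * Var(ln theta)
    error_var = meas_sd**2  # Var(epsilon)
    total_var = signal_var + error_var  # Var(Z)
    return error_var / total_var, signal_var / total_var


# Values of measurement y1 (fac1, period 0) taken from the output table.
share_error, share_signal = variance_shares(
    loading=1.0, factor_var=0.07808, meas_sd=0.868389
)
print(round(share_error, 3), round(share_signal, 3))
```

The two shares sum to one by construction and come out at roughly 0.906 and 0.094, in line with the first row of the table. The sketch does not guard against a total variance of zero.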


In [4]:
create_dataset_with_variance_decomposition(filtered_states, params).head()

| period | name1 | name2 | loadings | variance of factor | meas_sds | fraction due to meas error | fraction due to factor var |
|---|---|---|---|---|---|---|---|
| 0 | y1 | fac1 | 1.0 | 0.07808 | 0.868389 | 0.906174 | 0.093826 |
| 0 | y2 | fac1 | 0.890828 | 0.07808 | 1.189555 | 0.958049 | 0.041951 |
| 0 | y3 | fac1 | 1.418478 | 0.07808 | 1.111846 | 0.887244 | 0.112756 |
| 0 | Q1_fac1 | fac1 | 1.187757 | 0.07808 | 0.721278 | 0.825263 | 0.174737 |
| 0 | y4 | fac2 | 1.0 | 0.054363 | 0.764474 | 0.914896 | 0.085104 |


Changing the measurement error changes the variance decomposition. Two cases are presented below, with the measurement standard deviation (**meas_sds**) set to **0** and **10** respectively. In the first case all of the variance is attributed to the factor, while in the second case almost all of it is attributed to measurement error.

In [5]:
# Set every measurement standard deviation to zero (modifies params in place)
params.loc["meas_sds"] = 0
create_dataset_with_variance_decomposition(filtered_states, params).head()

| period | name1 | name2 | loadings | variance of factor | meas_sds | fraction due to meas error | fraction due to factor var |
|---|---|---|---|---|---|---|---|
| 0 | y1 | fac1 | 1.0 | 0.07808 | 0.0 | 0.0 | 1.0 |
| 0 | y2 | fac1 | 0.890828 | 0.07808 | 0.0 | 0.0 | 1.0 |
| 0 | y3 | fac1 | 1.418478 | 0.07808 | 0.0 | 0.0 | 1.0 |
| 0 | Q1_fac1 | fac1 | 1.187757 | 0.07808 | 0.0 | 0.0 | 1.0 |
| 0 | y4 | fac2 | 1.0 | 0.054363 | 0.0 | 0.0 | 1.0 |


In [6]:
# Set every measurement standard deviation to ten (modifies params in place)
params.loc["meas_sds"] = 10
create_dataset_with_variance_decomposition(filtered_states, params).head()

| period | name1 | name2 | loadings | variance of factor | meas_sds | fraction due to meas error | fraction due to factor var |
|---|---|---|---|---|---|---|---|
| 0 | y1 | fac1 | 1.0 | 0.07808 | 10.0 | 0.99922 | 0.00078 |
| 0 | y2 | fac1 | 0.890828 | 0.07808 | 10.0 | 0.999381 | 0.000619 |
| 0 | y3 | fac1 | 1.418478 | 0.07808 | 10.0 | 0.998431 | 0.001569 |
| 0 | Q1_fac1 | fac1 | 1.187757 | 0.07808 | 10.0 | 0.9989 | 0.0011 |
| 0 | y4 | fac2 | 1.0 | 0.054363 | 10.0 | 0.999457 | 0.000543 |
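
To see how the decomposition moves between these two extremes, the following sketch evaluates the error share for a few measurement standard deviations while holding the loading and the factor variance of `y1` (`fac1`, period 0) fixed at the values shown above. It applies the decomposition formula directly with NumPy and does not call skillmodels; the intermediate values 0.5 and 1.0 are illustrative choices, not taken from the notebook.

```python
import numpy as np

# Loading and factor variance of measurement y1 (fac1, period 0) from the tables above.
loading = 1.0
factor_var = 0.07808

# Measurement standard deviations to try; 0 and 10 are the two cases shown above,
# 0.5 and 1.0 are illustrative intermediate values.
meas_sds = np.array([0.0, 0.5, 1.0, 10.0])

signal_var = loading**2 * factor_var  # alpha^2 * Var(ln theta)
error_share = meas_sds**2 / (signal_var + meas_sds**2)  # s^epsilon
signal_share = 1.0 - error_share  # s^theta

for sd, err, sig in zip(meas_sds, error_share, signal_share):
    print(f"meas_sd={sd:5.1f}  error share={err:.5f}  signal share={sig:.5f}")
```

With a fixed signal variance the error share rises monotonically in `meas_sds`, from exactly zero at `meas_sds = 0` to about 0.99922 at `meas_sds = 10`, which agrees with the `y1` rows of the two tables.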
