# Detecting Identification Failure in Moment Condition Models

This tutorial shows how to run an identification check for the method of simulated moments (MSM) in estimagic. To obtain MSM estimates, you need at least as many moments as parameters. If there are fewer moments than parameters, the model is said to be underidentified. Even when this order condition holds, identification can still fail, for example when some moments carry overlapping information (are not orthogonal), so that they do not pin down every parameter separately.
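The two requirements above (enough moments, and moments that react differently to each parameter) can be illustrated with a small numerical check. This is only a sketch: `check_order_and_rank` and both example matrices are hypothetical and not part of estimagic. Each matrix stands for the derivative of the moments with respect to the parameters.

```python
import numpy as np


def check_order_and_rank(jacobian):
    """Return (order_ok, rank_ok) for a moments-by-parameters Jacobian."""
    n_moments, n_params = jacobian.shape
    # Order condition: at least as many moments as parameters.
    order_ok = n_moments >= n_params
    # Rank condition: the columns must be linearly independent.
    rank_ok = bool(np.linalg.matrix_rank(jacobian) == n_params)
    return order_ok, rank_ok


# Hypothetical Jacobian: 3 moments, 2 parameters, independent columns.
identified = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Hypothetical Jacobian with two identical columns: both parameters move
# every moment in the same way, so only their sum can be recovered.
not_identified = np.array([[1.0, 1.0], [2.0, 2.0], [0.5, 0.5]])

print(check_order_and_rank(identified))      # (True, True)
print(check_order_and_rank(not_identified))  # (True, False)
```

The second case satisfies the order condition but still fails: counting moments alone is not enough.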

In this tutorial, we use a simple linear regression model in which two of the regressors are correlated. The example is set up so that an identification problem arises, which the check is then used to detect.

Throughout the tutorial, we perform the testing procedure described in Forneron, J. J. (2019). 

## Outline of the testing procedure
1. Uniform Sampling on Level Sets
2. Linear Approximations and the quasi-Jacobian Matrix
3. Test procedure


## Example: Estimate the parameters of a regression model

The model we consider here is a simple regression model with two explanatory variables (plus a constant). The goal is to estimate the intercept, the slope coefficients, and the standard deviation of the error term from a simulated data set.


### Model:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon, \text{ where } \epsilon \sim N(0, \sigma^2)$$

We aim to estimate $\beta_0, \beta_1, \beta_2$ and $\sigma$ (labeled `intercept`, `slope1`, `slope2` and `sd` in the code).

In [1]:
import numpy as np
import pandas as pd
import math

import estimagic as em

rng = np.random.default_rng(seed=0)

## 1. Simulate data

In [2]:
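def simulate_data(params, n_draws, rng, correlation=0.7):
    # Draw two zero-mean, unit-variance regressors with the given correlation.
    mu = np.array([0.0, 0.0])
    var_cov = np.array(
        [
            [1, correlation],
            [correlation, 1],
        ]
    )
    x = rng.multivariate_normal(mu, var_cov, size=n_draws)
    x1 = x[:, 0]
    x2 = x[:, 1]
    e = rng.normal(0, params.loc["sd", "value"], size=n_draws)
    # Note: as written, slope2 is added as a constant and is not multiplied by
    # x2, so intercept and slope2 enter y only through their sum. The outputs
    # shown in this tutorial are consistent with this specification.
    y = (
        params.loc["intercept", "value"]
        + params.loc["slope1", "value"] * x1
        + params.loc["slope2", "value"]
        + e
    )
    return pd.DataFrame({"y": y, "x1": x1, "x2": x2})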

In [3]:
true_params = pd.DataFrame(
    data=[[2, -np.inf], [-1, -np.inf], [-1, -np.inf], [1, 1e-10]],
    columns=["value", "lower_bound"],
    index=["intercept", "slope1", "slope2", "sd"],
)

data = simulate_data(true_params, n_draws=1000, rng=rng)

In [4]:
data

|     | y | x1 | x2 |
|---|---|---|---|
| 0 | 1.484008 | -0.064754 | -0.167082 |
| 1 | 1.128824 | -0.631068 | -0.549813 |
| 2 | -0.211518 | 0.353818 | 0.633908 |
| 3 | 0.968726 | -1.569032 | -0.835426 |
| 4 | -1.819241 | 1.138907 | 0.158716 |
| ... | ... | ... | ... |
| 995 | -0.135039 | -0.394427 | 0.654440 |
| 996 | 3.625212 | -1.629129 | -1.578598 |
| 997 | 0.785314 | -1.442909 | -0.861470 |
| 998 | -2.289846 | 1.294319 | -0.269446 |


## 2. Calculate Moments

In [5]:
def calculate_moments(sample):
    moments = {
        "y_mean": sample["y"].mean(),
        "x1_mean": sample["x1"].mean(),
        "x2_mean": sample["x2"].mean(),
        "yx1_mean": (sample["y"] * sample["x1"]).mean(),
        "yx2_mean": (sample["y"] * sample["x2"]).mean(),
        "y_sqrd_mean": (sample["y"] ** 2).mean(),
        "x1_sqrd_mean": (sample["x1"] ** 2).mean(),
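        # Note: this uses x1, so it duplicates x1_sqrd_mean (the outputs below
        # reflect this); the second moment of x2 would use sample["x2"].
        "x2_sqrd_mean": (sample["x1"] ** 2).mean(),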
    }
    return pd.Series(moments)

In [6]:
empirical_moments = calculate_moments(data)
empirical_moments

y_mean          0.916422
x1_mean         0.037716
x2_mean         0.017491
yx1_mean       -0.938064
yx2_mean       -0.695191
y_sqrd_mean     2.736050
x1_sqrd_mean    1.014383
x2_sqrd_mean    1.014383
dtype: float64

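The covariance matrix of the empirical moments is needed for the estimation and the identification check below. It is computed with ``get_moments_cov``, which mainly calls estimagic's bootstrap function. See our [bootstrap_tutorial](../../how_to_guides/inference/how_to_do_bootstrap_inference.ipynb) for background information.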
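To make the idea concrete, a nonparametric bootstrap of a moments covariance matrix can be sketched by hand as follows. This is not estimagic's implementation. `bootstrap_moments_cov`, the toy data, and `toy_moments` are hypothetical names used for illustration only.

```python
import numpy as np
import pandas as pd


def bootstrap_moments_cov(data, calculate_moments, n_draws=500, seed=0):
    """Covariance of the moments across bootstrap resamples of the rows."""
    rng = np.random.default_rng(seed)
    n_obs = len(data)
    draws = []
    for _ in range(n_draws):
        # Resample row positions with replacement and recompute the moments.
        idx = rng.integers(0, n_obs, size=n_obs)
        draws.append(calculate_moments(data.iloc[idx]))
    # One row per bootstrap draw, one column per moment.
    return pd.DataFrame(draws).cov()


def toy_moments(sample):
    return pd.Series(
        {"y_mean": sample["y"].mean(), "y_sqrd_mean": (sample["y"] ** 2).mean()}
    )


toy_data = pd.DataFrame({"y": np.random.default_rng(1).normal(size=200)})
toy_cov = bootstrap_moments_cov(toy_data, toy_moments)
print(toy_cov)
```

The result is a symmetric matrix with one row and one column per moment, which is the shape that the estimation step expects.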



## 3. Define a function to calculate simulated moments

In a real-world application, this is the step that would take most of the time. In our very simple example, however, all the work is already done by the functions defined above and by numpy.

In [7]:
def simulate_moments(params, n_draws=10_000, seed=0):
    rng = np.random.default_rng(seed)
    sim_data = simulate_data(params, n_draws, rng)
    sim_moments = calculate_moments(sim_data)

    return sim_moments
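Note that `simulate_moments` fixes the seed by default. One reason this is useful, sketched below with a hypothetical `noisy_mean` function, is that reusing the same random draws makes the simulated moments a deterministic function of the parameters, so changes in the output reflect changes in the parameters only.

```python
import numpy as np


def noisy_mean(mu, n_draws=1_000, seed=None):
    """Simulated mean of N(mu, 1) draws (hypothetical stand-in for a moment)."""
    rng = np.random.default_rng(seed)
    return rng.normal(mu, 1.0, size=n_draws).mean()


# Same seed: repeated evaluations at the same parameter agree exactly.
same_seed = [noisy_mean(0.5, seed=0) for _ in range(2)]

# Same seed, different parameter: the difference reflects the parameter
# change only, because the underlying random draws are reused.
shift = noisy_mean(0.6, seed=0) - noisy_mean(0.5, seed=0)

print(same_seed[0] == same_seed[1])
print(shift)
```

Without a fixed seed, the criterion seen by the optimizer would change from call to call even at the same parameter values.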

In [8]:
simulate_moments(true_params)

y_mean          1.009276
x1_mean        -0.006568
x2_mean        -0.003578
yx1_mean       -0.977183
yx2_mean       -0.683988
y_sqrd_mean     2.976694
x1_sqrd_mean    0.981403
x2_sqrd_mean    0.981403
dtype: float64
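
As a sanity check, these simulated values can be compared with the population moments implied by `simulate_data` as it is written above. The derivation is a sketch under the assumptions visible in the code: $x_1$ and $x_2$ have mean zero, unit variance and correlation $\rho$, $\epsilon$ is independent of both, and the `slope2` term is added without being multiplied by $x_2$.

$$
\begin{aligned}
E[y] &= \beta_0 + \beta_2, \\
E[y x_1] &= \beta_1, \\
E[y x_2] &= \rho \beta_1, \\
E[y^2] &= (\beta_0 + \beta_2)^2 + \beta_1^2 + \sigma^2 .
\end{aligned}
$$

At the true values $(\beta_0, \beta_1, \beta_2, \sigma) = (2, -1, -1, 1)$ with $\rho = 0.7$, this gives $1$, $-1$, $-0.7$ and $3$, close to the simulated `y_mean`, `yx1_mean`, `yx2_mean` and `y_sqrd_mean`. In these expressions $\beta_0$ and $\beta_2$ appear only through their sum, so no moment in the list can separate the two.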

In [9]:
moments_cov = em.get_moments_cov(
    data, calculate_moments, bootstrap_kwargs={"n_draws": 5_000, "seed": 0}
)

moments_cov

|     | y_mean | x1_mean | x2_mean | yx1_mean | yx2_mean | y_sqrd_mean | x1_sqrd_mean | x2_sqrd_mean |
|---|---|---|---|---|---|---|---|---|
| y_mean | 0.0019 | -0.000915 | -0.000664 | -0.000545 | -0.000228 | 0.003379 | -0.00022 | -0.00022 |
| x1_mean | -0.000915 | 0.00099 | 0.000729 | 0.000813 | 0.00051 | -0.001564 | 0.000114 | 0.000114 |
| x2_mean | -0.000664 | 0.000729 | 0.001028 | 0.0005 | 0.000724 | -0.001004 | 0.000171 | 0.000171 |
| yx1_mean | -0.000545 | 0.000813 | 0.0005 | 0.003613 | 0.002616 | -0.00466 | -0.001869 | -0.001869 |
| yx2_mean | -0.000228 | 0.00051 | 0.000724 | 0.002616 | 0.003223 | -0.003065 | -0.001376 | -0.001376 |
| y_sqrd_mean | 0.003379 | -0.001564 | -0.001004 | -0.00466 | -0.003065 | 0.012983 | 0.001532 | 0.001532 |
| x1_sqrd_mean | -0.00022 | 0.000114 | 0.000171 | -0.001869 | -0.001376 | 0.001532 | 0.002004 | 0.002004 |
| x2_sqrd_mean | -0.00022 | 0.000114 | 0.000171 | -0.001869 | -0.001376 | 0.001532 | 0.002004 | 0.002004 |


## 4. Estimation with ``estimate_msm``

In [10]:
start_params = true_params.assign(value=[100, 100, 100, 100])

res = em.estimate_msm(
    simulate_moments,
    empirical_moments,
    moments_cov,
    start_params,
    optimize_options={"algorithm": "scipy_lbfgsb"},
)

res.summary()  # check that the standard errors contain no NaN

|     | value | standard_error | ci_lower | ci_upper | p_value | free | stars |
|---|---|---|---|---|---|---|---|
| intercept | 0.453675 | 5484072.0 | -10748580.0 | 10748580.0 | 0.9999999 | True | |
| slope1 | -0.980684 | 0.06418333 | -1.106481 | -0.8548871 | 1.048703e-52 | True | *** |
| slope2 | 0.453675 | 1713761.0 | -3358908.0 | 3358909.0 | 0.9999998 | True | |
| sd | 0.987262 | 0.08738111 | 0.8159981 | 1.158526 | 1.337153e-29 | True | *** |
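
The very large standard errors for `intercept` and `slope2` are what one would expect when the estimation criterion is flat in some direction. The sketch below illustrates this with a weighted distance between simulated and empirical moments. The diagonal weighting, the function `msm_criterion`, and the toy moment function are assumptions made for illustration, not a description of estimagic's internals.

```python
import numpy as np


def msm_criterion(params, simulate_moments, empirical_moments, moments_cov):
    """Weighted squared distance between simulated and empirical moments."""
    deviation = simulate_moments(params) - empirical_moments
    # Assumed weighting: inverse of the diagonal of the moments covariance.
    weights = np.diag(1.0 / np.diag(moments_cov))
    return float(deviation @ weights @ deviation)


def toy_simulate_moments(params):
    # Hypothetical moments that depend on the two parameters only via a + b.
    a, b = params
    return np.array([a + b, (a + b) ** 2])


toy_empirical = np.array([1.0, 1.0])
toy_cov = np.diag([0.01, 0.04])

# Two very different parameter vectors with the same sum fit equally well.
crit_a = msm_criterion(np.array([0.5, 0.5]), toy_simulate_moments, toy_empirical, toy_cov)
crit_b = msm_criterion(np.array([2.0, -1.0]), toy_simulate_moments, toy_empirical, toy_cov)
# A parameter vector with a different sum fits worse.
crit_c = msm_criterion(np.array([1.0, 1.0]), toy_simulate_moments, toy_empirical, toy_cov)

print(crit_a, crit_b, crit_c)  # 0.0 0.0 325.0
```

Because the criterion cannot distinguish between parameter vectors along the flat direction, the data carry no information about the individual values, only about their sum.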


## 4. Identification Check

For more background, see Forneron, J. J. (2019).



### 4.1 Uniform Sampling on Level Sets
The computation of the quasi-Jacobian requires uniform draws over the level set.

The *direct approach* draws parameter values uniformly over the parameter space and assigns each draw a weight based on the bandwidth. The weighted sample is then uniformly distributed on the level set. The draws can be random or quasi-random, using quasi-Monte Carlo sequences such as the Sobol or Halton sequence. The main drawback of this approach is that the effective sample size can be very small, especially when the dimension of the parameter vector is moderately large.
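
The direct approach and its drawback can be sketched with a hand-written Halton sequence. Everything in this block is illustrative: the uniform (0/1) kernel weights, the toy criterion, and the function names are assumptions and do not reproduce what `sampling_level_sets` does internally.

```python
import numpy as np


def van_der_corput(n_points, base):
    """First n_points of the van der Corput sequence in the given base."""
    seq = np.zeros(n_points)
    for i in range(n_points):
        k, f = i + 1, 1.0
        while k > 0:
            f /= base
            seq[i] += f * (k % base)
            k //= base
    return seq


def halton(n_points, dim):
    """Halton points in the unit cube (supports up to 6 dimensions here)."""
    primes = [2, 3, 5, 7, 11, 13][:dim]
    return np.column_stack([van_der_corput(n_points, p) for p in primes])


def effective_sample_size(weights):
    total = weights.sum()
    return 0.0 if total == 0 else total**2 / (weights**2).sum()


def direct_level_set_draws(criterion, lower, upper, n_points, bandwidth):
    # Quasi-uniform draws over the box [lower, upper].
    draws = lower + halton(n_points, len(lower)) * (upper - lower)
    values = np.array([criterion(d) for d in draws])
    # Assumed uniform kernel: weight 1 inside the level set, 0 outside.
    weights = (values <= bandwidth).astype(float)
    return draws[weights > 0], effective_sample_size(weights)


# Toy criterion: squared distance from the origin, in 1, 2 and 4 dimensions.
ess_by_dim = {}
for dim in (1, 2, 4):
    _, ess = direct_level_set_draws(
        lambda d: float(d @ d), -np.ones(dim), np.ones(dim), 2000, 0.25
    )
    ess_by_dim[dim] = ess
print(ess_by_dim)
```

With the same number of draws and the same bandwidth, the effective sample size falls quickly as the dimension grows, which is the motivation for the adaptive approach.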

The *adaptive sampling approach* based on Population Monte Carlo aims to preserve the simplicity of importance sampling while constructing a sequence of proposal distributions with a higher acceptance rate. It includes several tuning parameters for flexibility.

This step is carried out by the ``sampling_level_sets`` function. Its output consists of the selected draws and the simulated moments for those draws.

In [11]:
from identification_check import sampling_level_sets

# Number of observations in the data; needed for the default bandwidth.
n = data.shape[0]
grid_sub, moms_sub = sampling_level_sets(
    simulate_moments=simulate_moments,
    msm_res=res,
    moments_cov=moments_cov,
    draws=10000,
    bandwidth=math.sqrt(2 * math.log(math.log(n)) / n),
    weights="diagonal",
    sampling="sobol",
)

grid_sub, moms_sub



(array([[ 2.12960082e+06, -1.01707067e+00, -2.12989041e+06,
          9.19440608e-01]]),
 array([[-2.89584567e+02, -6.56805122e-03, -3.57764133e-03,
          9.13828912e-01,  3.42874722e-01,  8.38610603e+04,
          9.81403169e-01,  9.81403169e-01]]))

### 4.2 Linear Approximations and the quasi-Jacobian Matrix
The central idea behind the identification check is that the quasi-Jacobian provides the best linear approximation of the sample moment function over the region of the parameter space where the moments are close to zero. To find this approximation, a sup-norm ($l_{\infty}$-norm) loss is used, which minimizes the largest deviation from the linear approximation. The ``calculate_quasi_jacobian`` function computes the quasi-Jacobian matrix, that is, the slope of a linear approximation of the moments on an estimate of the identified set, together with the variance.
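
A sup-norm linear fit can be written as a small linear program. The sketch below does this for a single scalar function with `scipy.optimize.linprog`. It only illustrates the loss function. `sup_norm_linear_fit` is a hypothetical helper and not the code behind ``calculate_quasi_jacobian``.

```python
import numpy as np
from scipy.optimize import linprog


def sup_norm_linear_fit(x, y):
    """Fit y ~ a + x @ b by minimizing the largest absolute deviation."""
    n_obs, n_params = x.shape
    design = np.column_stack([np.ones(n_obs), x])
    # Decision variables: (a, b_1, ..., b_k, t). The objective is t.
    cost = np.zeros(n_params + 2)
    cost[-1] = 1.0
    minus_t = -np.ones((n_obs, 1))
    a_ub = np.vstack(
        [
            np.hstack([design, minus_t]),   #  (a + x b) - t <= y
            np.hstack([-design, minus_t]),  # -(a + x b) - t <= -y
        ]
    )
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * (n_params + 1) + [(0, None)]
    res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0], res.x[1:-1], res.x[-1]


# An exactly linear function is recovered with zero maximal deviation.
rng = np.random.default_rng(0)
x_lin = rng.uniform(-1, 1, size=(50, 2))
y_lin = 1.0 + x_lin @ np.array([2.0, -3.0])
intercept, slopes, max_dev = sup_norm_linear_fit(x_lin, y_lin)

# For y = x**2 on [-1, 1], the best sup-norm line is flat at 0.5.
x_sq = np.linspace(-1, 1, 21).reshape(-1, 1)
a_sq, b_sq, dev_sq = sup_norm_linear_fit(x_sq, x_sq[:, 0] ** 2)
print(intercept, slopes, max_dev, a_sq, b_sq, dev_sq)
```

Unlike least squares, this loss is driven entirely by the worst-fitting points, so a small maximal deviation means the linear approximation is good everywhere on the sampled region.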

In [12]:
from identification_check import calculate_quasi_jacobian
# Returns the quasi-Jacobian together with the inverse square root of the
# variance matrix (not the variance itself).
Bn, phi = calculate_quasi_jacobian(grid_sub, moms_sub, len(res.params["value"]))
Bn, phi



(array([[-6.54341344e-05, -1.49202883e-09, -8.34654712e-10,
          2.06791346e-07,  7.75686291e-08,  1.89491082e-02,
          2.21739592e-07,  2.21739592e-07],
        [ 3.60176638e+00,  1.43663148e-04,  2.91733640e-05,
         -1.13726248e-02, -4.19968710e-03, -1.04303562e+03,
         -1.22159733e-02, -1.22159733e-02],
        [ 6.54272893e-05,  1.49189158e-09,  8.34568110e-10,
         -2.06170530e-07, -7.74328371e-08, -1.89471767e-02,
         -2.21726787e-07, -2.21726787e-07],
        [-3.88500164e+00, -3.52711663e-05,  4.01862992e-06,
          1.21392658e-02,  4.65610658e-03,  1.12504454e+03,
          1.31097384e-02,  1.31097384e-02]]),
 array([[-3.64774532e+00, -3.45145149e-05,  3.35512485e-06,
          1.15971602e-02,  4.20865668e-03,  1.05636135e+03,
          1.24544603e-02,  1.24544603e-02],
        [-6.54341344e-05, -1.49202883e-09, -8.34654712e-10,
          2.06791346e-07,  7.75686291e-08,  1.89491082e-02,
          2.21739592e-07,  2.21739592e-07],
        [ 3.60

### 4.3 Test procedure

The identification category selection (ICS) procedure is based on the quasi-Jacobian and its singular values. It evaluates the number of nuisance parameters that are potentially weakly identified. The roles of the normalized quasi-Jacobian and the cutoff value are analogous to those in the ICS procedure of Andrews and Cheng (2012) and the subsequent literature.

The procedure is implemented in ``category_selection``, whose output consists of the singular values of the normalized quasi-Jacobian, the cutoff, and the number of identified parameters.

In [13]:
from identification_check import category_selection
category_selection(
    moments_cov,
    len(res.params["value"]),
    Bn,
    phi,
    cutoff=math.sqrt(2 * math.log(n) / n),
)

(array([4.50244813e+07, 5.14947753e-04, 1.83550438e-09, 2.07951693e-14]),
 0.11753940002383997,
 1)
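
The last element of this output can be reproduced from the first two. The sketch below assumes that the number of identified parameters is the count of singular values above the cutoff. This rule is inferred from the reported numbers and is not taken from the implementation of ``category_selection``.

```python
import math

import numpy as np

n_obs = 1000  # number of observations in the simulated data set

# Singular values as reported in the output above.
singular_values = np.array(
    [4.50244813e07, 5.14947753e-04, 1.83550438e-09, 2.07951693e-14]
)

# Cutoff as passed to category_selection in the call above.
cutoff = math.sqrt(2 * math.log(n_obs) / n_obs)

# Assumed rule: a direction counts as identified if its singular value
# exceeds the cutoff.
n_identified = int((singular_values > cutoff).sum())
print(cutoff, n_identified)
```

Only one of the four singular values lies above the cutoff, which matches the reported value of 1.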

### 4.4 ``check_msm_identification``

All of the steps described above are bundled in the ``check_msm_identification`` function:

In [15]:
from identification_check import check_msm_identification

check_msm_identification(
    simulate_moments,
    res,
    moments_cov,
    10000,
    n_obs=data.shape[0],
)



(array([[ 1.74063835e-03, -3.95759549e-09, -2.16109639e-09,
         -1.19862649e-05, -6.61478796e-06,  5.06181541e+00,
          5.87460157e-07,  5.87460157e-07],
        [-4.80287322e+01,  1.33603134e-04,  9.41384381e-05,
          3.30708073e-01,  1.82504812e-01, -1.39668665e+05,
         -1.62309070e-02, -1.62309070e-02],
        [-1.74542515e-03,  3.92688912e-09,  2.12517955e-09,
          1.20189456e-05,  6.63361957e-06, -5.07573555e+00,
         -5.88983745e-07, -5.88983745e-07],
        [ 4.37603064e+01, -8.37655368e-05, -3.87053661e-05,
         -3.01334028e-01, -1.66308881e-01,  1.27255979e+05,
          1.47703984e-02,  1.47703984e-02]]),
 array([6.95640673e+11, 6.89869192e-05, 2.39633529e-05, 8.19518777e-10]),
 0.11753940002383997,
 1)