# Identification Task 

## What we need

We need a function called check_msm_identification that is easy to use and performs the identification check described in [this paper](https://arxiv.org/pdf/1907.13093.pdf). The different variants (e.g. different methods of sampling uniformly from likelihood level sets) can be selected via the optional arguments of check_msm_identification. The output will either be a dictionary (if it is a small set of outputs that every user will want) or a results object similar to the result of estimate_ml (if there are many different test statistics).

## Task 1: Planning

- Write down which model-specific inputs a user has to supply in order to do an identification check. The names should be aligned with estimate_ml where possible. It will definitely be a likelihood function and a result of estimate_msm, but there might be more.
- Write down which kinds of outputs a user will get, what they mean and how they should be visualized in a paper (plots, tables, ...). 
- Write docstrings for check_msm_identification before you actually implement it.
- Adjust our [simple example](https://estimagic.readthedocs.io/en/stable/getting_started/estimation/first_msm_estimation_with_estimagic.html) such that it has a second variable that can be arbitrarily correlated with x (i.e. add an identification problem)
- Start to write a tutorial in a notebook that shows how the new function will be used and what the outputs mean
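
One possible way to build the identification problem mentioned in the list above is sketched here. It assumes a linear model with two regressors; the helper `simulate_correlated_data` and its argument `rho` are hypothetical names chosen for illustration and are not part of estimagic. As `rho` approaches 1, the two slope coefficients can no longer be separated.

```python
import numpy as np


def simulate_correlated_data(n_obs, rho, params, seed=0):
    """Simulate y = b0 + b1 * x1 + b2 * x2 + e with corr(x1, x2) = rho.

    Hypothetical helper for illustration only; names are assumptions.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n_obs)
    noise = rng.normal(size=n_obs)
    # x2 has unit variance and population correlation rho with x1
    x2 = rho * x1 + np.sqrt(1 - rho**2) * noise
    e = rng.normal(size=n_obs)
    b0, b1, b2 = params
    y = b0 + b1 * x1 + b2 * x2 + e
    return y, x1, x2


# params is a 1d numpy array, as assumed in the remarks
params = np.array([1.0, 2.0, -1.0])
y, x1, x2 = simulate_correlated_data(n_obs=5_000, rho=0.95, params=params)
sample_corr = np.corrcoef(x1, x2)[0, 1]
```

Setting `rho=1.0` makes `x2` an exact copy of `x1`, which gives the extreme case where only the sum of the two slopes is identified.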

## Remarks

- For now, you can assume that the model parameters (params) are a 1d numpy array. We will talk about making this more flexible later.
- The idea behind writing the documentation first is that it lets you focus completely on a user-friendly interface and a high-level understanding. Also, we will probably ask for changes after you show us your proposed interface. If you had already implemented it, you would have to change it.

In [None]:
# task 1 - interface - inputs and outputs
def check_msm_identification(
    simulate_moments,
    params,
    *,
    draws,
    sampling,
    simulate_moments_kwargs=None,
    objective=None,
    weights="diagonal",
    bandwidth=None,
    kernel=None,
    cutoff=None,
    significance=0.05,
    H0=None,
    reparametrization=None,
    logging=False,
    log_options=None,
):
    """Detect identification failure in moment condition models.

    Performs the identification check described in Forneron, J. J. (2019). The check
    is based on the quasi-Jacobian matrix, computed as the slope of a linear
    approximation of the moments on an estimate of the identified set. It is
    asymptotically singular when local or global identification fails and equivalent
    to the usual Jacobian matrix, which has full rank, when the model is point and
    locally identified.

    Args:
        simulate_moments (callable): Function that takes as inputs model parameters,
            data and potentially other keyword arguments and returns a pytree with
            simulated moments. If the function returns a dict containing the key
            "simulated_moments", we only use the value corresponding to that key.
            Other entries are stored in the log database if you use logging.
        simulate_moments_kwargs (dict): Additional keyword arguments for
            simulate_moments, for example data on the dependent and independent
            variables from the model specification.
        objective (callable): The GMM objective function. By default, it is based
            on the L2 norm.
        params (pytree): A pytree containing the estimated parameters of the model.
            Pytrees can be a numpy array, a pandas Series, a DataFrame with a "value"
            column, a float and any kind of (nested) dictionary or list containing
            these elements.
        weights (str): One of "diagonal" (default), "identity" or "optimal". Note
            that "optimal" refers to the asymptotically optimal weighting matrix and
            is often not a good choice due to large finite sample bias.
        bandwidth (float): Bandwidth required for the calculation of the
            quasi-Jacobian matrix. By default, it is calculated as
            sqrt(2 * log(log(n)) / n).
        kernel (callable): Kernel required for the calculation of the quasi-Jacobian
            matrix. By default, it is the uniform kernel K(U), i.e. the indicator
            function for |U| <= 1.
        cutoff (float): Cutoff required for the identification category selection.
            By default, it is calculated as sqrt(2 * log(n) / n).
        draws (int): The number of draws for sampling on level sets. It should be
            sufficiently large.
        sampling (str): Method for sampling uniformly from likelihood level sets.
            One of "sobol" or "halton" (direct approach based on the respective
            sequence) or "population_mc" (adaptive sampling).
        significance (float): The significance level. The default is 0.05 (5%).
        H0 (dict): Null hypothesis, required for subvector inference. For example,
            b10 = 0.
        reparametrization: To be specified.
        logging (pathlib.Path, str or False): Path to an sqlite3 file (which
            typically has the file extension .db). If the file does not exist, it
            will be created. The dashboard can only be used when logging is used.
        log_options (dict): Additional keyword arguments to configure the logging.

    Returns:
        dict: The estimated quasi-Jacobian, singular values, identification category,
            subvector inference test output and confidence set.
    """


    # 1. quasi-Jacobian Matrix
    # 1.1 set the integration grid and evaluate the moments on the grid, select draws on the level set
    # 1.2 compute the intercept and slope
    # 1.3 compute the variance


    # 2. Identification Category Selection
    # 2.1 compute singular values
    # 2.2 number of values greater than cutoff



    # 3. Subvector Inference
    # 3.1 test statistic
    # 3.2 hypothesis, confidence set
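
As a rough sketch of the defaults named in the docstring and of step 2 in the outline, the following assumes that an estimate of the quasi-Jacobian is already available as a 2d numpy array. All function names are hypothetical, and the formulas are copied from the docstring rather than checked against the paper.

```python
import numpy as np


def default_bandwidth(n_obs):
    # sqrt(2 * log(log(n)) / n); needs n_obs > e so that log(log(n)) > 0
    return np.sqrt(2 * np.log(np.log(n_obs)) / n_obs)


def default_cutoff(n_obs):
    # sqrt(2 * log(n) / n)
    return np.sqrt(2 * np.log(n_obs) / n_obs)


def uniform_kernel(u):
    # Indicator function for |u| <= 1, evaluated elementwise
    return (np.abs(u) <= 1).astype(float)


def count_singular_values_above(quasi_jacobian, cutoff):
    # Step 2.1: singular values; step 2.2: how many exceed the cutoff
    singular_values = np.linalg.svd(quasi_jacobian, compute_uv=False)
    n_above = int(np.sum(singular_values > cutoff))
    return singular_values, n_above


n_obs = 1_000
# Illustrative matrix with two nearly collinear columns (made-up numbers)
near_singular = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-6], [0.5, 0.5]])
singular_values, n_above = count_singular_values_above(
    near_singular, default_cutoff(n_obs)
)
```

In this made-up example only one singular value exceeds the cutoff although there are two parameters, which is the kind of pattern step 2 is meant to flag.
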

estimate_msm
https://estimagic.readthedocs.io/en/stable/reference_guides/index.html