# MSQ: Necessary Structures

This notebook outlines and creates the necessary code structures to implement the MSQ.

NOTE: this code is written for Python 2.7.x, but every effort is made to use Python 3-compatible syntax.

In [1]:
# Import relevant libraries
from __future__ import division, print_function
import numpy as np

The following is a very compact description of the method. See the authors' paper for a much more in-depth discussion. 

## Main Method Objects

The key idea is that we define two objects: 

- a set of *quantiles* of some distribution, of size $s$; denote this $\mathbf{q}$;
- a set of *functions of quantiles*, which map a set of quantiles to a vector of reals; denote this $\Phi$.

This skips a lot of details. For example:

- The above set of quantiles and functions of quantiles should be thought of as being applied to a *vector* of random variables of length $J$.
- For each random variable $j$, the functions of quantiles produce a vector of length $M$. Thus the total number of elements in the final vector of reals is $JM$.
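As a minimal sketch of these two objects, the code below computes $\mathbf{q}$ for each of $J$ random variables and applies one possible $\Phi$. The helper names `quantiles` and `functions_of_quantiles`, the quantile levels, and the particular choice of functions (median and interquartile range, so $M = 2$) are illustrative assumptions, not the choices made in the authors' paper.

```python
from __future__ import division, print_function
import numpy as np


def quantiles(x, levels):
    """Quantiles of each column of x at the given levels in (0, 1).

    x has shape (N, J): N draws of J random variables.
    The result has shape (s, J), where s = len(levels).
    """
    x = np.asarray(x, dtype=float)
    return np.percentile(x, 100.0 * np.asarray(levels), axis=0)


def functions_of_quantiles(q):
    """Hypothetical Phi: maps s = 3 quantiles (25%, 50%, 75%) to M = 2 reals.

    Returns an array of shape (J, M) holding the median and the
    interquartile range of each random variable.
    """
    q25, q50, q75 = q
    return np.column_stack([q50, q75 - q25])


levels = [0.25, 0.50, 0.75]          # s = 3 (assumed for illustration)
rng = np.random.RandomState(0)
x = rng.normal(size=(1000, 2))       # N = 1000 draws, J = 2 variables

q = quantiles(x, levels)             # shape (s, J)
phi = functions_of_quantiles(q)      # shape (J, M)
print(q.shape, phi.shape, phi.size)  # phi.size equals J * M
```

Each row of `phi` plays the role of one $\Phi_j$, so the array holds $JM$ reals in total.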

That said, I will use the minimal notation needed to describe the process, and use an example to explore and illustrate the details.

Assume we have two things: 

1. A set of M vectors of empirical realizations of the data-generating process (DGP) we are trying to fit, i.e. empirical distributions drawn from the true, unknown DGP
    - call these the "empirical" values
    - denote empirical quantiles of these data $\hat{\mathbf{q}}$
    - denote empirical functions of these quantiles $\hat{\Phi}_j$
2. A parameterized, simulation-based DGP, from which we can simulate draws conditional on a parameter $\theta$
    - call these the "theoretical" values
    - denote theoretical quantiles of these data $\mathbf{q}_{\theta}$
    - denote theoretical functions of these quantiles $\Phi_{\theta, j}$
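The distinction between "empirical" and "theoretical" values can be sketched as follows. Everything here is a stand-in chosen for illustration: the simulator `simulate` (a zero-mean normal with scale $\theta$), the quantile levels in `LEVELS`, and the function `phi` are hypothetical, and the "empirical" sample is itself simulated only because no real data are at hand in this sketch.

```python
from __future__ import division, print_function
import numpy as np

# Quantile levels used for both data sets (assumed for illustration).
LEVELS = np.array([0.05, 0.25, 0.50, 0.75, 0.95])


def simulate(theta, n, seed):
    """Hypothetical DGP: n draws from a zero-mean normal with scale theta."""
    rng = np.random.RandomState(seed)
    return theta * rng.normal(size=n)


def phi(x):
    """Hypothetical functions of quantiles for one random variable.

    Returns the interquartile range and the 5%-95% range.
    """
    q = np.percentile(x, 100.0 * LEVELS)
    return np.array([q[3] - q[1], q[4] - q[0]])


# "Empirical" values: in practice x_hat would be observed data.
x_hat = simulate(theta=2.0, n=5000, seed=1)
phi_hat = phi(x_hat)

# "Theoretical" values: simulated conditional on a candidate theta.
phi_theta = phi(simulate(theta=1.0, n=5000, seed=2))

print(phi_hat, phi_theta)
```

The same `phi` is applied to both samples, so any gap between `phi_hat` and `phi_theta` reflects the candidate $\theta$ (plus sampling noise) rather than a difference in how the summaries are computed.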


We will explore this in more detail in the example below. 

To fit the theoretical model to the empirical data, choose the parameter $\theta$ to minimize the following quadratic objective function:

$$
\hat{\theta} = \underset{\theta \in \Theta}{\textrm{argmin}} \; \left(\hat{\mathbf{\Phi}} -  \mathbf{\Phi}_{\theta}\right)^{\textrm{T}} \mathbf{W}_{\theta} \left(\hat{\mathbf{\Phi}} -  \mathbf{\Phi}_{\theta}\right).
$$

Here $\mathbf{W}_{\theta}$ is a symmetric positive definite weighting matrix.
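The quadratic form above is straightforward to evaluate with NumPy. The sketch below assumes the stacked vectors are already available as one-dimensional arrays; the function name `msq_objective`, the numerical values, and the identity weighting matrix are placeholders chosen for illustration.

```python
from __future__ import division, print_function
import numpy as np


def msq_objective(phi_hat, phi_theta, W):
    """Evaluate (phi_hat - phi_theta)^T W (phi_hat - phi_theta)."""
    d = np.asarray(phi_hat, dtype=float) - np.asarray(phi_theta, dtype=float)
    return float(d.dot(W).dot(d))


# Placeholder stacked vectors of length J * M = 4.
phi_hat = np.array([0.0, 1.3, 0.1, 2.0])
phi_theta = np.array([0.1, 1.0, 0.0, 2.5])

# Identity weighting: the objective reduces to the sum of squared gaps.
W = np.eye(4)

loss = msq_objective(phi_hat, phi_theta, W)
print(loss)
```

In an estimation loop, `phi_theta` (and possibly `W`) would be recomputed for each candidate $\theta$ and the resulting value passed to a numerical minimizer.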

In addition, the bold $\mathbf{\Phi}$ values are the "stacked" vectors, formed by concatenating $\Phi_j$ over the $J$ random variables:

$$
\mathbf{\Phi} = \left(\Phi_{1}^{\textrm{T}}, \Phi_{2}^{\textrm{T}}, \ldots, \Phi_{j}^{\textrm{T}}, \ldots, \Phi_{J}^{\textrm{T}} \right)^{\textrm{T}}
$$


for both $\hat{\mathbf{\Phi}}$ and $\mathbf{\Phi}_{\theta}$.
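In code, stacking amounts to concatenating the per-variable vectors in order. The values below are arbitrary placeholders with $J = 3$ and $M = 2$, and the variable names are illustrative.

```python
from __future__ import division, print_function
import numpy as np

# One length-M vector Phi_j per random variable (placeholder values).
phi_per_variable = [
    np.array([0.0, 1.3]),    # Phi_1
    np.array([0.1, 2.0]),    # Phi_2
    np.array([-0.2, 0.9]),   # Phi_3
]

J = len(phi_per_variable)
M = len(phi_per_variable[0])

# Stacked vector (Phi_1^T, Phi_2^T, Phi_3^T)^T of length J * M.
phi_stacked = np.concatenate(phi_per_variable)
print(phi_stacked.shape)
```

The empirical and theoretical vectors must be stacked in the same order so that their difference compares like with like.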

## Illustration By Example

We'll use the example of the $\alpha$-stable distribution to demonstrate the estimation method.