## Gaussian Mixture and Leptokurtotic Assets

A mixture of Gaussian distributions is an intuitive way to create 
distributions with skew and kurtosis by probabilistically combining 
two or more normal distributions.
Such mixtures can capture leptokurtosis (fat tails) and nonzero skewness.

We illustrate how a structured zero-mean GM(2) can synthesize Gaussian
risk-equivalence for leptokurtotic assets (with "fat tail" returns).

*Dependencies:*

- Repository: https://github.com/rsvp/fecon235
- Python: matplotlib, pandas

*CHANGE LOG*

    2017-05-03  First rough draft.

In [1]:
from fecon235.fecon235 import *

In [2]:
#  PREAMBLE-p6.15.1223 :: Settings and system details
from __future__ import absolute_import, print_function
system.specs()
pwd = system.getpwd()   # present working directory as variable.
print(" ::  $pwd:", pwd)
#  If a module is modified, automatically reload it:
%load_ext autoreload
%autoreload 2
#       Use 0 to disable this feature.

#  Notebook DISPLAY options:
#      Represent pandas DataFrames as text; not HTML representation:
import pandas as pd
pd.set_option( 'display.notebook_repr_html', False )
from IPython.display import HTML # useful for snippets
#  e.g. HTML('<iframe src=http://en.mobile.wikipedia.org/?useformat=mobile width=700 height=350></iframe>')
from IPython.display import Image 
#  e.g. Image(filename='holt-winters-equations.png', embed=True) # url= also works
from IPython.display import YouTubeVideo
#  e.g. YouTubeVideo('1j_HxD4iLn8', start='43', width=600, height=400)
from IPython.core import page
get_ipython().set_hook('show_in_pager', page.as_hook(page.display_page), 0)
#  Or equivalently in config file: "InteractiveShell.display_page = True", 
#  which will display results in secondary notebook pager frame in a cell.

#  Generate PLOTS inside notebook, "inline" generates static png:
%matplotlib inline   
#          "notebook" argument allows interactive zoom and resize.

 ::  Python 2.7.13
 ::  IPython 5.1.0
 ::  jupyter_core 4.2.1
 ::  notebook 4.1.0
 ::  matplotlib 1.5.1
 ::  numpy 1.11.0
 ::  pandas 0.19.2
 ::  pandas_datareader 0.2.1
 ::  Repository: fecon235 v5.17.0221 devMixture
 ::  Timestamp: 2017-05-03, 08:50:47 UTC
 ::  $pwd: /media/yaya/virt15h/virt/dbx/Dropbox/ipy/fecon235/nb


## Gaussian mixture in general and its four moments

We build upon the cumulative distribution function (cdf) $\Phi$ of the standard Gaussian $N(0, 1)$.

For $0\leq p_i \leq 1$ such that $\sum_{i=1}^n{p_i} = 1$, 
with component means $\mu_i$ and standard deviations $\sigma_i > 0$,
we define the **Gaussian mixture GM(n)** by the following cdf 
and the associated probability density function (pdf):

$$\begin{aligned}
F_n(x) & = \sum_{i=1}^{n}{p_i}\,\Phi\left(\frac{x-\mu_i}{\sigma_i}\right) \\
f_n(x) & = \sum_{i=1}^{n}{p_i}\,\phi (x;\mu_i,\sigma_i^2) \\
\phi (x;\mu_i,\sigma_i^2) & = \frac{1}{\sqrt{2\pi}\,\sigma_i}e^{-{(x-\mu_i)^2}/{(2\sigma_i^2)}}
\end{aligned}$$

The *intuitive idea* boils down to having n jars, each containing real numbers
which are normally distributed, specified only by their mean and variance. 
We then randomly pick $x$ from the i-th jar with probability $p_i$.
This is essentially how we will later simulate GM(n).
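The jar idea can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `simulate_gm`, its arguments, and the example parameters are assumptions made here and are not part of fecon235.

```python
import numpy as np

def simulate_gm(n_draws, probs, means, sigmas, seed=0):
    """Draw n_draws samples from GM(n) using the 'jar' scheme (hypothetical helper)."""
    probs = np.asarray(probs, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    rng = np.random.RandomState(seed)
    # Step 1: for every draw, choose jar i with probability p_i.
    jars = rng.choice(len(probs), size=n_draws, p=probs)
    # Step 2: draw x from the normal distribution belonging to the chosen jar.
    return rng.normal(loc=means[jars], scale=sigmas[jars])

# Example with arbitrary illustrative parameters: two zero-mean jars.
samples = simulate_gm(100000, probs=[0.7, 0.3], means=[0.0, 0.0], sigmas=[1.0, 2.0])
```

Choosing the jar first and then sampling from it keeps the simulation faithful to the definition of the mixture pdf, rather than averaging the random variables themselves (which would just give another Gaussian).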

**Proposition 1**: The moments of a Gaussian mixture GM(n) can be derived as follows, see Wang (2015):

$$\begin{aligned}
\mu &= \sum_{i=1}^{n}{p_i}\mu_i \\
\sigma^2 &= \sum_{i=1}^{n}{p_i}(\sigma_i^2+\mu_i^2)-\mu^2 \\
\text{skewness} =: s &= \frac{1}{\sigma^3}\sum_{i=1}^{n}{p_i}(\mu_i-\mu)[3\sigma_i^2+(\mu_i-\mu)^2] \\
\text{kurtosis} =: \kappa &= \frac{1}{\sigma^4}\sum_{i=1}^{n}{p_i}[3\sigma_i^4+6(\mu_i-\mu)^2\sigma_i^2+(\mu_i-\mu)^4]
\end{aligned}$$

For the standard Gaussian distribution, $s=0$ and $\kappa=3$.
The importance of Proposition 1 is that a mixture GM(n) can have
skewness and kurtosis which differ from these Gaussian values.
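The formulas of Proposition 1 translate directly into code. The following is a minimal sketch; `gm_moments` is a hypothetical helper name, and the two-component example uses arbitrary parameters chosen only for illustration.

```python
from __future__ import print_function
import numpy as np

def gm_moments(probs, means, sigmas):
    """Return (mean, variance, skewness, kurtosis) of GM(n) per Proposition 1."""
    p = np.asarray(probs, dtype=float)
    m = np.asarray(means, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    mu = np.sum(p * m)
    var = np.sum(p * (s**2 + m**2)) - mu**2
    d = m - mu                      # component means relative to the mixture mean
    skew = np.sum(p * d * (3 * s**2 + d**2)) / var**1.5
    kurt = np.sum(p * (3 * s**4 + 6 * d**2 * s**2 + d**4)) / var**2
    return mu, var, skew, kurt

# A single standard Gaussian is the degenerate case GM(1).
print(gm_moments([1.0], [0.0], [1.0]))

# Illustrative two-component mixture with equal weights and zero means.
print(gm_moments([0.5, 0.5], [0.0, 0.0], [1.0, 2.0]))
```

The first call recovers $s=0$ and $\kappa=3$, while the second shows that mixing two zero-mean Gaussians with different variances already pushes $\kappa$ above 3.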

Returns on financial assets are said to be "non-Gaussian."
They are known to be "fat-tailed", or more technically,
***leptokurtotic***: $\kappa > 3$.
This is where our mixture model can be helpful for risk management.

## Moments of the zero-mean GM(2)

We focus our attention on a mixture built with two Gaussians,
and examine its *even* moments, $\sigma^2$ and $\kappa$.

**Corollary 1**: For $p+q=1$ and $\mu_1=\mu_2=0$, 
the moments of this Gaussian mixture, which we shall call **zero-mean GM(2)**,
are given by:

$$\begin{aligned}
\mu &= 0 \\
\sigma^2 &= p\sigma_1^2 + q\sigma_2^2 \\
\text{skewness} =: s &= 0 \\
\text{kurtosis} =: \kappa &= \frac{3}{\sigma^4}[p\sigma_1^4 + q\sigma_2^4]
\end{aligned}$$

**Corollary 2:** Combining the equations for even moments,
the following is a necessary condition for the **zero-mean GM(2)**: 

$$\rm K := \frac{\kappa}{3} = \frac{p\sigma_1^4 + q\sigma_2^4}{(p\sigma_1^2 + q\sigma_2^2)^2}$$

Note: Our defined $\rm K$ should not be confused with
"*excess kurtosis*" := $\kappa-3$
(which is sometimes erroneously reported as the kurtosis statistic).

## Structured zero-mean GM(2)

Given a data set $\{x\}$, we can compute its variance and kurtosis.
Thus, $\sigma$ and $\rm K$ are observable.
Corollary 2 says that GM(2) parameters can indeed synthesize $\rm K$,
but its messiness suggests that we impose further structure for clarity.
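As a rough sketch of how $\sigma$ and $\rm K$ might be observed from data, the snippet below uses population (biased) sample moments. The helper name `observed_sigma_and_K` is hypothetical, and the choice of biased estimators is an assumption made here for simplicity.

```python
from __future__ import print_function
import numpy as np

def observed_sigma_and_K(x):
    """Estimate sigma and K = kurtosis/3 from data using population moments."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    var = np.mean(dev**2)               # second central moment
    kappa = np.mean(dev**4) / var**2    # kurtosis (not excess kurtosis)
    return np.sqrt(var), kappa / 3.0

# For Gaussian data K should be close to 1, up to sampling error.
x = np.random.RandomState(0).standard_normal(100000)
sigma_hat, K_hat = observed_sigma_and_K(x)
print(sigma_hat, K_hat)
```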

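A sensible requirement is the strict ordering
$\sigma_1 < \sigma < \sigma_2$, which is consistent with the equation for $\sigma^2$:
for $0<p<1$, $\sigma^2$ is a weighted average of $\sigma_1^2$ and $\sigma_2^2$.
We can impose this ordering through constrained constants $a$ and $b$.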

**Proposition 2**: For zero-mean GM(2), setting $0<a<1<b$ such that
$\sigma_1 = a\sigma$ and $b\sigma = \sigma_2$ yields

$$ p = \frac{\rm K - b^4}{a^4 - b^4}$$

*Proof*: Substitute the two required equalities into the kurtosis equation
of Corollary 1. Also note that by construction: $q=1-p$. $\square$
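
Spelled out, the substitution in the proof runs as follows. The closing remark on admissible values of $\rm K$ is an added observation which follows from the algebra, not part of the original statement.

$$\begin{aligned}
{\rm K}\,\sigma^4 &= p\sigma_1^4 + q\sigma_2^4 = p\,a^4\sigma^4 + (1-p)\,b^4\sigma^4 \\
{\rm K} &= p\,a^4 + (1-p)\,b^4 \\
{\rm K} - b^4 &= p\,(a^4 - b^4) \\
p &= \frac{{\rm K} - b^4}{a^4 - b^4}
\end{aligned}$$

Because $a^4 - b^4 < 0$, the weight satisfies $0 < p < 1$ exactly when $a^4 < {\rm K} < b^4$.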

## Numerical application in finance

Suppose we are examining the returns of a financial asset XYZ
and find evidence of a leptokurtotic distribution where $\kappa=4.05$.
Clearly this is non-Gaussian (though kurtosis estimates
are known to be unstable because they depend on fourth powers).
The annualized volatility was observed to be 20%, 
so $\sigma=0.20$ and we wish to use the GM(2) model
to manage risk.

We establish two synthetic assets on our books to represent XYZ:

- XYZ1 with volatility $a\sigma$ where $a=0.8979$
- XYZ2 with volatility $b\sigma$ where $b=2.0000$

The idea here is that XYZ1 models XYZ behaviour
during normal trading, while XYZ2 models XYZ in an
extremely volatile environment.

We can still use our structured zero-mean GM(2) with
non-zero mean returns because the mean is an *odd* moment
merely providing an additive shift -- without changing
the *shape* of the distribution. The pressing issue was 
actually getting suitable probability weights $p$ and $q$:

$$\begin{aligned}
p &= \frac{(4.05/3)-2^4}{0.8979^4-2^4} = \frac{1.35-16}{0.65-16} = 0.9544
\\ q &= 1-p = 0.0456
\end{aligned}$$

Note that if kurtosis should increase, then upon recomputation, 
$q$ increases because more draws should come from the riskier jar.
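
The arithmetic for $p$ and $q$ can be reproduced with a short sketch of Proposition 2. The function name `gm2_weights` is hypothetical. The final line also evaluates $pa^2 + qb^2$, which the variance equation of Corollary 1 would require to equal 1 for the mixture to reproduce $\sigma^2$ exactly; it is included only as a diagnostic for the chosen $a$ and $b$.

```python
from __future__ import print_function

def gm2_weights(kurtosis, a, b):
    """Return (p, q) for the structured zero-mean GM(2) per Proposition 2."""
    K = kurtosis / 3.0
    p = (K - b**4) / (a**4 - b**4)
    return p, 1.0 - p

a, b = 0.8979, 2.0
p, q = gm2_weights(4.05, a, b)
print("p =", round(p, 4), " q =", round(q, 4))

# Diagnostic (an addition here): mixture variance as a multiple of sigma^2.
var_ratio = p * a**2 + q * b**2
print("p*a^2 + q*b^2 =", var_ratio)
```

Re-running `gm2_weights` with a larger kurtosis and the same $a$ and $b$ gives a larger $q$, in line with the note above.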

Synthetic assets XYZ1 and XYZ2 are Gaussian assets, so we are
free to use the optimal linear mean-variance framework,
yet the leptokurtotic risks are approximately covered.

If \$100,000 of XYZ were on our books at 20% volatility, we use the
probability weights to replace it with: 

- \$95,440 of XYZ1 at 17.96% volatility, and
- \$4,560 of XYZ2 at 40% volatility.
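
The replacement position above amounts to splitting the notional by the probability weights and scaling the volatility by $a$ and $b$. A minimal sketch, where `split_position` is a hypothetical helper and the inputs are the rounded figures from the text:

```python
from __future__ import print_function

def split_position(notional, sigma, p, a, b):
    """Split a position into two synthetic Gaussian legs: (name, amount, volatility)."""
    return [
        ("XYZ1", p * notional, a * sigma),          # normal trading regime
        ("XYZ2", (1.0 - p) * notional, b * sigma),  # extremely volatile regime
    ]

legs = split_position(100000.0, sigma=0.20, p=0.9544, a=0.8979, b=2.0)
for name, amount, vol in legs:
    print(name, round(amount, 2), round(100 * vol, 2), "% vol")
```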

## Test code

In [5]:
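#  Gaussian Mixture
#  http://www.nehalemlabs.net/prototype/blog/2014/04/03/quick-introduction-to-gaussian-mixture-models-with-python/

import matplotlib.pyplot as plt
from sklearn import mixture

def fit_samples(samples):
    '''Fit a two-component Gaussian mixture to 2-D samples, then plot the clusters.'''
    #  samples is expected to be a numpy array of shape (n_samples, 2).
    #  mixture.GMM was deprecated in scikit-learn 0.18 and removed in later
    #  releases; GaussianMixture is its replacement.
    gmix = mixture.GaussianMixture(n_components=2, covariance_type='full')
    gmix.fit(samples)
    print(gmix.means_)
    #  Color each point by its predicted component: red for 0, green for 1.
    colors = ['r' if i==0 else 'g' for i in gmix.predict(samples)]
    ax = plt.gca()
    ax.scatter(samples[:,0], samples[:,1], c=colors, alpha=0.8)
    plt.show()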

## References

Wang, Jin (2015), [Multivariate Mixtures of Normal Distributions: Properties, Random Vector Generation, Fitting, and as Models of Market Daily Changes](https://www.researchgate.net/profile/Jin_Wang40/publication/274695842_Multivariate_Mixtures_of_Normal_Distributions_Properties_Random_Vector_Generation_Fitting_and_as_Models_of_Market_Daily_Changes/links/5525314c0cf22e181e73ee4f.pdf), Journal on Computing, Vol. 27, No. 2, Spring 2015, pp. 193–203. Cf. earlier version, Wang (2000), [Modeling and Generating Daily Changes in Market Variables Using A Multivariate Mixture of Normal Distributions](http://ww2.valdosta.edu/~jwang/paper/MixNormal.pdf).

Thanks to @onlyvixx for the corrected [skew equation](http://onlyvix.blogspot.com/2011/05/mixture-of-normal-formulas-for-skew-and.html).