# Example Usage for `mix_gamma_vi`

In [16]:
from mix_gamma_vi import mix_gamma_vi
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

## Generate Dataset

Generate 10,000 samples from a mixture of two gamma distributions and call the resulting tensor `x`.

In [17]:
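N = 10000             # number of samples
pi_true = [0.5, 0.5]  # mixture weights
a_true  = [20,  80 ]  # shape (concentration) of each component
B_true  = [20,  40 ]  # rate of each component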

mix_gamma = tfp.distributions.MixtureSameFamily(
    mixture_distribution=tfp.distributions.Categorical(probs=pi_true),
    components_distribution=tfp.distributions.Gamma(concentration=a_true, rate=B_true))

x = mix_gamma.sample(N)
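
To spell out the generative process, here is a minimal NumPy sketch of the same two-stage sampling scheme: draw a component label, then draw from that component's gamma distribution. It is not part of `mix_gamma_vi`, and the names `rng`, `labels`, and `x_np` are illustrative only. NumPy's `gamma` takes a scale rather than a rate, so the rate is inverted.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

N = 10000
pi_true = np.array([0.5, 0.5])   # mixture weights
a_true = np.array([20.0, 80.0])  # shapes
B_true = np.array([20.0, 40.0])  # rates

# Stage 1: pick a mixture component for every observation.
labels = rng.choice(len(pi_true), size=N, p=pi_true)

# Stage 2: sample from the chosen component (scale = 1 / rate).
x_np = rng.gamma(shape=a_true[labels], scale=1.0 / B_true[labels])

# The mixture mean is sum_k pi_k * a_k / B_k.
mixture_mean = np.sum(pi_true * a_true / B_true)
print(mixture_mean, x_np.mean())
```

The theoretical mixture mean here is 0.5 × 1 + 0.5 × 2 = 1.5, and the sample mean should land close to it.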

## Variational Inference Under the Shape-Mean Parameterisation (Recommended)

The default parameterisation for `mix_gamma_vi` is the shape-mean parameterisation, under which the variational approximations to the posterior are

\begin{align*}
q^*(\mathbf{\pi}) &= \mathrm{Dirichlet} \left( \zeta_1, ..., \zeta_K \right) ,  \\
q^*(\alpha_k) &= \mathcal{N}(\hat{\alpha}_k, \sigma_k^2) ,  \\
q^* (\mu_k) &=  \operatorname{Inv-Gamma} \left( \gamma_k, \lambda_k \right) .  
\end{align*}

The product of these factors approximates the joint posterior:

\begin{align*}
p(\mathbf{\pi}, \mathbf{\alpha}, \mathbf{\mu} \mid \mathbf{x}) &\approx q^*(\mathbf{\pi}) \prod_{k=1}^K q^*(\alpha_k) q^*(\mu_k).
\end{align*}

In [18]:
# Fit a model
fit = mix_gamma_vi(x, 2)

# Get the fitted distribution
distribution = fit.distribution()

# Get the means of the parameters under the fitted posterior
distribution.mean()

{'pi': <tf.Tensor: id=2891, shape=(1, 2), dtype=float32, numpy=array([[0.5057488, 0.4942512]], dtype=float32)>,
 'mu': <tf.Tensor: id=2898, shape=(1, 2), dtype=float32, numpy=array([[1.0018914, 1.9988744]], dtype=float32)>,
 'alpha': <tf.Tensor: id=2902, shape=(1, 2), dtype=float32, numpy=array([[20.098001, 79.798294]], dtype=float32)>}

In [19]:
# Get the posterior standard deviations
distribution.stddev()

{'pi': <tf.Tensor: id=2912, shape=(1, 2), dtype=float32, numpy=array([[0.00499892, 0.00499892]], dtype=float32)>,
 'mu': <tf.Tensor: id=2925, shape=(1, 2), dtype=float32, numpy=array([[0.00314254, 0.00318285]], dtype=float32)>,
 'alpha': <tf.Tensor: id=2929, shape=(1, 2), dtype=float32, numpy=array([[0.3996685, 1.6052226]], dtype=float32)>}
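
As a rough check on this fit, the sketch below measures how far each true parameter lies from its posterior mean, in units of the posterior standard deviation. It relies on the fact that a gamma distribution with shape `a` and rate `B` has mean `a / B`, so the true component means are 1 and 2. The posterior summaries are copied by hand from the output above, and the variable names are illustrative.

```python
import numpy as np

# True values used to simulate x; the mean of Gamma(shape=a, rate=B) is a / B.
a_true = np.array([20.0, 80.0])
B_true = np.array([20.0, 40.0])
mu_true = a_true / B_true  # component means: 1 and 2

# Posterior means and standard deviations copied from the output above.
mu_mean = np.array([1.0018914, 1.9988744])
mu_sd = np.array([0.00314254, 0.00318285])
alpha_mean = np.array([20.098001, 79.798294])
alpha_sd = np.array([0.3996685, 1.6052226])

# Distance of each true value from the posterior mean, in posterior sds.
z_mu = (mu_mean - mu_true) / mu_sd
z_alpha = (alpha_mean - a_true) / alpha_sd
print(z_mu, z_alpha)
```

In this run, every true value lies within one posterior standard deviation of the corresponding posterior mean.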

## Variational Inference Under the Shape-Rate Parameterisation (Not Recommended)

The traditional parameterisation of the gamma distribution is the shape-rate parameterisation, which this package also supports (although it is not recommended). In this case, the variational approximations to the posterior are

\begin{align*}
q^*(\mathbf{\pi}) &= \mathrm{Dirichlet} \left( \zeta_1, ..., \zeta_K \right) ,  \\
q^*(\alpha_k) &= \mathcal{N}(\hat{\alpha}_k, \sigma_k^2) , \\
q^* (\beta_k) &=  \operatorname{Gamma} \left( \gamma_k, \lambda_k \right) .
\end{align*}

The product of these factors approximates the joint posterior:

\begin{align*}
p(\mathbf{\pi}, \mathbf{\alpha}, \mathbf{\beta} \mid \mathbf{x}) &\approx q^*(\mathbf{\pi}) \prod_{k=1}^K q^*(\alpha_k) q^*(\beta_k) .
\end{align*}

In [20]:
# Fit a model
fit = mix_gamma_vi(x, 2, parameterisation="shape-rate")

# Get the fitted distribution
distribution = fit.distribution()

# Get the means of the parameters under the fitted posterior
distribution.mean()

{'pi': <tf.Tensor: id=2940, shape=(1, 2), dtype=float64, numpy=array([[0.50572743, 0.49427257]])>,
 'beta': <tf.Tensor: id=2947, shape=(1, 2), dtype=float64, numpy=array([[0.05012314, 0.02520653]])>,
 'alpha': <tf.Tensor: id=2951, shape=(1, 2), dtype=float64, numpy=array([[19.99132836, 79.29992946]])>}

In [21]:
# Get the posterior standard deviations
distribution.stddev()

{'pi': <tf.Tensor: id=2961, shape=(1, 2), dtype=float64, numpy=array([[0.00499892, 0.00499892]])>,
 'beta': <tf.Tensor: id=2974, shape=(1, 2), dtype=float64, numpy=array([[1.57648090e-04, 4.02626111e-05]])>,
 'alpha': <tf.Tensor: id=2978, shape=(1, 2), dtype=float64, numpy=array([[0.06243915, 0.12553762]])>}

The posterior standard deviation of $\mathbf{\alpha}$ is much lower under the shape-rate parameterisation (about 0.06 and 0.13) than under the shape-mean parameterisation (about 0.40 and 1.61). The shape-mean parameterisation produces a posterior approximation that is much closer to that of a Gibbs sampler (the baseline) than the shape-rate parameterisation does. For this reason, we recommend the shape-mean parameterisation.
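
The gap between the two fits can be quantified directly from the printed summaries. The sketch below copies the posterior means and standard deviations of $\mathbf{\alpha}$ from the outputs above, then computes the ratio of the standard deviations and the distance of the true shapes from each posterior mean. The variable names are illustrative, and this is a check on one simulated data set, not a substitute for the Gibbs sampler comparison.

```python
import numpy as np

a_true = np.array([20.0, 80.0])  # shapes used to simulate x

# Posterior summaries for alpha, copied from the outputs above.
alpha_mean_sm = np.array([20.098001, 79.798294])      # shape-mean fit
alpha_sd_sm = np.array([0.3996685, 1.6052226])
alpha_mean_sr = np.array([19.99132836, 79.29992946])  # shape-rate fit
alpha_sd_sr = np.array([0.06243915, 0.12553762])

# How much wider is the shape-mean posterior for alpha?
sd_ratio = alpha_sd_sm / alpha_sd_sr

# Distance of the true shapes from each posterior mean, in posterior sds.
z_sm = (alpha_mean_sm - a_true) / alpha_sd_sm
z_sr = (alpha_mean_sr - a_true) / alpha_sd_sr
print(sd_ratio, z_sm, z_sr)
```

The shape-mean standard deviations are roughly 6 and 13 times larger. Under the shape-rate fit, the true shape of the second component (80) sits more than five posterior standard deviations from the posterior mean. This is consistent with that approximation being too narrow, although a single run cannot establish it on its own.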