```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import pymc3 as pm
import arviz as az
```

## Numerical Diagnostics

We will discuss three numerical diagnostics available in ArviZ:

* Effective Sample Size (ESS)
* $\hat R$ (R hat)
* Monte Carlo standard error (MCSE)

To help us understand these diagnostics we are going to create two _synthetic posteriors_. The first one is a sample from a uniform distribution. We generate it using SciPy and call it `good_chains`. This is an example of a "good" sample because its values are independent and identically distributed (iid), which is ideally what we want when approximating the posterior. The second one is called `bad_chains`, and it represents a poor sample from the posterior. `bad_chains` is a poor _sample_ for two reasons:

* Values are not independent. On the contrary, they are highly correlated: given any number at any position in the sequence, we can compute the exact sequence of numbers both before and after it. High correlation is the opposite of independence.
* Values are not identically distributed. As you will see, we create an array with two rows (one per chain), the first one with numbers from 0 to 0.5 and the second one with numbers from 0.5 to 1.

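```python
# 2 chains x 500 draws of iid Uniform(0, 1) values
good_chains = stats.uniform.rvs(0, 1, size=(2, 500))
# 2 chains x 500 draws of a deterministic ramp: chain 1 covers 0 to ~0.5, chain 2 covers ~0.5 to 1
bad_chains = np.linspace(0, 1, 1000).reshape(2, -1)
```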
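Both properties listed above can be checked directly. The sketch below rebuilds the two arrays (with a fixed seed, only for reproducibility), treats each row as a chain, which is how ArviZ interprets a 2-D array, and prints each chain's mean and lag-1 autocorrelation. The helper `lag1_autocorr` is hypothetical, written for illustration, and not part of ArviZ. For `bad_chains` we expect very different chain means and autocorrelations close to 1; for `good_chains`, similar means and autocorrelations close to 0.

```python
import numpy as np
from scipy import stats

# Rebuild both synthetic posteriors; the fixed seed only makes the run reproducible
good_demo = stats.uniform.rvs(0, 1, size=(2, 500), random_state=42)
bad_demo = np.linspace(0, 1, 1000).reshape(2, -1)


def lag1_autocorr(chain):
    """Sample autocorrelation of a 1-D chain at lag 1."""
    centered = chain - chain.mean()
    return np.dot(centered[:-1], centered[1:]) / np.dot(centered, centered)


summary = {}
for name, chains in [("good", good_demo), ("bad", bad_demo)]:
    chain_means = chains.mean(axis=1)                    # one mean per chain (row)
    lag1 = np.array([lag1_autocorr(c) for c in chains])  # one value per chain
    summary[name] = (chain_means, lag1)
    print(f"{name}: chain means {np.round(chain_means, 3)}, "
          f"lag-1 autocorrelation {np.round(lag1, 3)}")
```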

## Effective Sample Size (ESS)

When using sampling methods like MCMC, it is common to wonder whether a particular sample is large enough to compute what we want. Answering in terms of the raw number of samples is generally not a good idea, mainly because samples from MCMC methods are autocorrelated, and autocorrelation decreases the actual amount of information contained in a sample. A better idea is to estimate the **effective sample size**: the size of an iid sample that would carry the same amount of information (for example, the same precision when estimating the mean) as our autocorrelated sample.
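To make the definition concrete, here is a minimal sketch of a single-chain ESS estimator, assuming the textbook formula $\text{ESS} = N / (1 + 2\sum_{t \ge 1} \rho_t)$, where $\rho_t$ is the autocorrelation at lag $t$. The function names are hypothetical, and truncating the sum at the first negative autocorrelation is a simplification: ArviZ's `az.ess` combines several chains and uses a more careful truncation rule, so its numbers will differ. An AR(1) process stands in for autocorrelated MCMC output; for such a process the ESS should be close to $N(1-\phi)/(1+\phi)$.

```python
import numpy as np


def autocorr(x):
    """Sample autocorrelation of a 1-D array for lags 0..n-1, computed via FFT."""
    n = len(x)
    centered = x - x.mean()
    # Zero-pad to length 2n so the FFT gives linear (not circular) correlations
    f = np.fft.rfft(centered, n=2 * n)
    acov = np.fft.irfft(f * np.conjugate(f), n=2 * n)[:n]
    return acov / acov[0]


def ess_single_chain(x):
    """Simplified ESS: N / (1 + 2 * sum of autocorrelations before the first negative one)."""
    x = np.asarray(x, dtype=float)
    rho = autocorr(x)
    rho_sum = 0.0
    for t in range(1, len(x)):
        if rho[t] < 0:
            break
        rho_sum += rho[t]
    return len(x) / (1 + 2 * rho_sum)


rng = np.random.default_rng(0)
n, phi = 20_000, 0.9

iid = rng.normal(size=n)

# Stationary AR(1) chain: each draw depends strongly on the previous one
ar1 = np.empty(n)
ar1[0] = rng.normal(scale=1 / np.sqrt(1 - phi**2))
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + rng.normal()

ess_iid = ess_single_chain(iid)
ess_ar1 = ess_single_chain(ar1)
print(f"iid:   ESS ≈ {ess_iid:.0f} of {n}")
print(f"AR(1): ESS ≈ {ess_ar1:.0f} of {n} (theory ≈ {n * (1 - phi) / (1 + phi):.0f})")
```

Because the retained autocorrelations are never negative, this simplified estimator cannot return an ESS above $N$, unlike ArviZ's estimator.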

With ArviZ we can compute it using `az.ess(⋅)`:

```python
az.ess(good_chains), az.ess(bad_chains)
```

(1103.1612083905097, 2.284600376742084)

This tells us that, even though we have 1000 samples in both cases, `bad_chains` is roughly equivalent to an iid sample of size $\approx 2$, while `good_chains` is equivalent to an iid sample of size close to 1000. If you regenerate `good_chains`, you will see that the effective sample size changes from sample to sample. For MCMC samples, which are generally autocorrelated, the ESS will typically be lower than the number of samples $N$. Notice, however, that the ESS can in fact be larger than $N$! When using NUTS as a sampler, this can happen for parameters whose posterior distribution is close to Gaussian and which are almost independent of other parameters.
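Why the ESS can exceed $N$ follows from the definition: what matters is how precisely the mean is estimated. The sketch below estimates the ESS of a single chain by brute force, simulating many independent AR(1) chains (a hypothetical stand-in for MCMC output, not NUTS itself) and dividing the variance of the draws by the variance of the chain means. With negative autocorrelation ($\phi < 0$) the chain means vary less than iid means would, so the ESS comes out larger than $N$.

```python
import numpy as np


def simulate_ar1(phi, n_draws, n_chains, rng):
    """Simulate independent stationary AR(1) chains; returns shape (n_chains, n_draws)."""
    x = np.empty((n_chains, n_draws))
    x[:, 0] = rng.normal(scale=1 / np.sqrt(1 - phi**2), size=n_chains)
    for t in range(1, n_draws):
        x[:, t] = phi * x[:, t - 1] + rng.normal(size=n_chains)
    return x


rng = np.random.default_rng(1)
n_draws, n_chains = 1000, 2000
ess_by_phi = {}
for phi in (0.0, 0.9, -0.5):
    chains = simulate_ar1(phi, n_draws, n_chains, rng)
    # ESS = number of iid draws giving the same variance of the mean as one chain
    # of n_draws correlated draws: Var(draws) / Var(chain means)
    ess = chains.var() / chains.mean(axis=1).var()
    ess_by_phi[phi] = ess
    theory = n_draws * (1 - phi) / (1 + phi)  # asymptotic AR(1) result
    print(f"phi = {phi:+.1f}: ESS ≈ {ess:.0f} (theory ≈ {theory:.0f}) for N = {n_draws}")
```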

As a general rule of thumb, we recommend an ESS greater than 400.

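We can also compute the effective sample size using `az.summary(⋅)`, which reports several ESS variants (`ess_mean`, `ess_sd`, `ess_bulk`, `ess_tail`) along with other statistics: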

```python
az.summary(good_chains)
```

|   | mean | sd | hpd_3% | hpd_97% | mcse_mean | mcse_sd | ess_mean | ess_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| x | 0.513 | 0.286 | 0.067 | 0.993 | 0.009 | 0.006 | 1113.0 | 1105.0 | 1103.0 | 958.0 | 1.0 |


## Effective Sample Size in depth

## $\hat R$ (aka R hat)


Under very general conditions, MCMC methods have theoretical guarantees that they will get the right answer irrespective of the starting point. Unfortunately, these guarantees only hold for infinite samples. One way to get a useful estimate of convergence for finite samples is to run more than one chain, starting from very different points, and check whether the chains _look similar_ to each other. $\hat R$ is a formalization of this idea. In its most basic conception, we want to compare the _within-chain_ variance to the _between-chain_ variance. Ideally they should be the same. Thus, if we compute the ratio $\frac{\text{between-chain variance}}{\text{within-chain variance}}$ we should get approximately 1, while values noticeably larger than 1 indicate that the chains do not agree.

Conceptually, $\hat R$ can be interpreted as the overestimation of variance due to MCMC finite sampling. If you continued sampling indefinitely, the scale (standard deviation) of your estimate could be reduced by a factor of about $\hat R$; this is why $\hat R$ is also known as the _potential scale reduction factor_.

In practice, values of $\hat R \lessapprox 1.01$ are considered safe.
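The basic version of this idea can be sketched in a few lines of NumPy. Given $m$ chains with $n$ draws each, chain variances $s_j^2$, chain means $\bar\theta_j$ and overall mean $\bar\theta$, the classic Gelman-Rubin statistic is

$$W = \frac{1}{m}\sum_{j=1}^{m} s_j^2, \qquad B = \frac{n}{m-1}\sum_{j=1}^{m}\left(\bar\theta_j - \bar\theta\right)^2, \qquad \hat V = \frac{n-1}{n}W + \frac{1}{n}B, \qquad \hat R = \sqrt{\frac{\hat V}{W}}.$$

The helpers `basic_rhat` and `split_rhat` are hypothetical, written for illustration; `split_rhat` also splits each chain in half, which exposes trends within a chain. Note that `az.rhat` defaults to a rank-normalized split $\hat R$, so its values will not match this sketch exactly, although both clearly flag `bad_chains`.

```python
import numpy as np


def basic_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    w = chains.var(axis=1, ddof=1).mean()    # W: mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)  # B: between-chain variance
    var_hat = (n - 1) / n * w + b / n        # pooled estimate of the total variance
    return np.sqrt(var_hat / w)


def split_rhat(chains):
    """Split every chain into two halves (dropping the middle draw if n is odd), then apply basic_rhat."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    halves = np.concatenate([chains[:, :half], chains[:, -half:]], axis=0)
    return basic_rhat(halves)


rng = np.random.default_rng(42)
good_demo = rng.uniform(0, 1, size=(2, 500))       # same distribution as good_chains
bad_demo = np.linspace(0, 1, 1000).reshape(2, -1)  # same construction as bad_chains

for name, chains in [("good", good_demo), ("bad", bad_demo)]:
    print(f"{name}: basic R-hat = {basic_rhat(chains):.3f}, "
          f"split R-hat = {split_rhat(chains):.3f}")
```

Splitting makes the within-chain trend of `bad_demo` visible, so its split $\hat R$ is even larger than the basic one.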

With ArviZ we can compute it using `az.rhat(⋅)`:

```python
az.rhat(good_chains), az.rhat(bad_chains)
```

(0.999163853551514, 3.0393728260009483)

## $\hat R$ in depth

## Monte Carlo standard error (MCSE)

When using MCMC methods we introduce an additional layer of uncertainty due to finite sampling; we quantify it with the Monte Carlo standard error (MCSE). The MCSE takes into account that the samples are not truly independent of each other. If we want to report the value of an estimated parameter to the second decimal place, we need to be sure the MCSE is below the second decimal; otherwise we will be wrongly reporting a higher precision than we really have. We should check the MCSE only once we are sure $\hat R$ is low enough and the ESS is high enough; otherwise the MCSE is of no use.
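One common way to estimate the MCSE of a mean without computing autocorrelations is the batch-means method: cut the chain into consecutive batches and use the spread of the batch means. The sketch below assumes this method and a hypothetical helper `mcse_batch_means`; ArviZ's `az.mcse` uses its own ESS-based estimator instead, so values will differ. It compares the naive standard error $s/\sqrt{N}$, which pretends the draws are independent, with the batch-means MCSE for iid draws and for an autocorrelated AR(1) chain, and converts the MCSE back into an implied ESS via $\text{MCSE} = s/\sqrt{\text{ESS}}$.

```python
import numpy as np


def mcse_batch_means(x, n_batches=20):
    """Monte Carlo standard error of the mean using non-overlapping batch means."""
    x = np.asarray(x, dtype=float)
    batch_size = len(x) // n_batches
    # Drop leftover draws so that every batch has the same size
    batches = x[: batch_size * n_batches].reshape(n_batches, batch_size)
    return batches.mean(axis=1).std(ddof=1) / np.sqrt(n_batches)


rng = np.random.default_rng(2)
n, phi = 20_000, 0.9

iid = rng.normal(size=n)
ar1 = np.empty(n)
ar1[0] = rng.normal(scale=1 / np.sqrt(1 - phi**2))
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + rng.normal()

results = {}
for name, x in [("iid", iid), ("AR(1)", ar1)]:
    sd = x.std(ddof=1)
    naive = sd / np.sqrt(len(x))    # pretends the draws are independent
    mcse = mcse_batch_means(x)
    implied_ess = (sd / mcse) ** 2  # from MCSE = sd / sqrt(ESS)
    results[name] = (naive, mcse, implied_ess)
    print(f"{name:6s} naive SE = {naive:.4f}, batch-means MCSE = {mcse:.4f}, "
          f"implied ESS ≈ {implied_ess:.0f}")
```

For the autocorrelated chain the naive standard error is several times too small, which is exactly the overstated precision the paragraph above warns about.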

With ArviZ we can compute it using `az.mcse(⋅)`:

```python
az.mcse(good_chains), az.mcse(bad_chains)
```

(array([0.00856715]), array([0.19772141]))
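The reporting rule described for the MCSE can be turned into a tiny helper. `supported_decimals` is a hypothetical rule-of-thumb function, not part of ArviZ: it assumes we only report a decimal place when the MCSE is below one unit of that place. Applied to the MCSE values printed above, it allows two decimals for `good_chains` and none for `bad_chains`, although for `bad_chains` the MCSE should not be trusted anyway, given its $\hat R$ and ESS.

```python
def supported_decimals(mcse, max_decimals=10):
    """Decimal places we can report, counting a place only if MCSE < 10**-places (rule of thumb)."""
    if mcse <= 0:
        raise ValueError("mcse must be positive")
    decimals = 0
    while decimals < max_decimals and mcse < 10 ** -(decimals + 1):
        decimals += 1
    return decimals


print(supported_decimals(0.00856715))  # good_chains → 2
print(supported_decimals(0.19772141))  # bad_chains → 0
# Mean of good_chains from the az.summary table, rounded accordingly
print(round(0.513, supported_decimals(0.00856715)))  # → 0.51
```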

## MCSE in depth