# TALENT Course 11
## Learning from Data: Bayesian Methods and Machine Learning
### York, UK, June 10-28, 2019 

## Exercise: Write your own Metropolis-Hastings MCMC sampler

This exercise is intended for those who want to become familiar with object-oriented programming in Python. The purpose of the exercise is to develop your own Metropolis, or Metropolis-Hastings, sampler by building on an abstract but general MCMC superclass. Given this MCMC class, it is easy to define a subclass that contains the actual sampling algorithm.
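
For reference, the standard Metropolis-Hastings acceptance probability for a move from the current state $\theta$ to a proposed state $\theta'$, drawn from a proposal density $q(\theta' \mid \theta)$, is

$$
\alpha(\theta \to \theta') = \min\left[1,\ \frac{p(\theta')\, q(\theta \mid \theta')}{p(\theta)\, q(\theta' \mid \theta)}\right].
$$

For a symmetric proposal, such as a Gaussian centred on the current state, $q(\theta' \mid \theta) = q(\theta \mid \theta')$ and this reduces to the Metropolis rule $\alpha = \min\left[1,\ p(\theta')/p(\theta)\right]$. In practice one works with logarithms: draw $u \sim \mathcal{U}(0,1)$ and accept the proposal if $\ln u < \ln p(\theta') - \ln p(\theta)$.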

The general framework is already provided as a Python package in the `sampler` subdirectory.

### Tasks
* The sampler should be implemented by modifying the `mhsampler.py` file.
* This file already contains the framework and builds on the abstract MCMC class defined in `mcmc.py`. This abstract class implements various helper functions and ensures that the sampler is compatible with the `emcee` package.
* More specifically, the method `sample` should be written such that:
    * the chain object is extended by the number of requested iterations: 
    ```python
    self.chain_object.extend(...)
    ```
    * the Metropolis, or Metropolis-Hastings, acceptance probability is computed
    * a proposed state is either accepted or rejected; accordingly, it should call either
    ```python
    self.chain_object.accept(...)
    ```
    with the new state, or
    ```python
    self.chain_object.reject
    ```
    to store another copy of the current position.
    * for each iteration it should yield `pos`, `lnprob` and `rstate` (these variables are described in the docstring). Hint:
    ```python
    yield pos, lnprob, np.random.get_state()
    ```

* Test your sampler, for example with the code below, which samples a two-dimensional Gaussian distribution.
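
The tasks above can be illustrated outside the provided framework. The following is a minimal, self-contained sketch that assumes nothing about the actual MCMC class in `mcmc.py`: the names `ToyMCMC` and `ToyMetropolis`, and the plain NumPy array that stands in for `chain_object`, are hypothetical. It shows the object-oriented pattern (an abstract base class whose `sample` generator is implemented by a subclass) and the accept/reject logic for a symmetric Gaussian proposal. In `MHSAMPLER` you would instead use `self.chain_object.extend(...)`, `self.chain_object.accept(...)` and `self.chain_object.reject` as described in the tasks.

```python
import numpy as np
from abc import ABC, abstractmethod


class ToyMCMC(ABC):
    """Hypothetical minimal base class (not the course's MCMC class)."""

    def __init__(self, dim, lnprobfn, args=()):
        self.dim = dim
        self.lnprobfn = lnprobfn
        self.args = args
        # A plain array stands in for the framework's chain object
        self.chain = np.empty((0, dim))
        self.naccepted = 0

    def get_lnprob(self, p):
        return self.lnprobfn(p, *self.args)

    @abstractmethod
    def sample(self, p0, iterations=1):
        """Subclasses must yield (pos, lnprob, rstate) once per iteration."""

    def run_mcmc(self, p0, n):
        # Exhaust the generator and return the last (pos, lnprob, rstate)
        result = None
        for result in self.sample(p0, iterations=n):
            pass
        return result

    @property
    def acceptance_fraction(self):
        return self.naccepted / max(len(self.chain), 1)


class ToyMetropolis(ToyMCMC):
    """Metropolis sampler with a symmetric Gaussian proposal of covariance ``cov``."""

    def __init__(self, cov, dim, lnprobfn, args=()):
        super().__init__(dim, lnprobfn, args)
        self.cov = np.asarray(cov)

    def sample(self, p0, iterations=1):
        p = np.array(p0, dtype=float)
        lnprob = self.get_lnprob(p)
        start = len(self.chain)
        # Grow the storage once by the number of requested iterations
        self.chain = np.concatenate([self.chain, np.empty((iterations, self.dim))])
        for i in range(iterations):
            p_new = np.random.multivariate_normal(p, self.cov)
            lnprob_new = self.get_lnprob(p_new)
            # Symmetric proposal density: the Hastings correction cancels
            if np.log(np.random.rand()) < lnprob_new - lnprob:
                p, lnprob = p_new, lnprob_new  # accept: move to the proposed state
                self.naccepted += 1
            # On rejection p is unchanged, so the current position is stored again
            self.chain[start + i] = p
            yield p, lnprob, np.random.get_state()


# Usage: a standard normal target in two dimensions
np.random.seed(42)
toy = ToyMetropolis(0.5 * np.eye(2), 2, lambda x: -0.5 * np.dot(x, x))
toy_pos, toy_lnp, toy_state = toy.run_mcmc(np.zeros(2), 5000)
print(toy.chain.shape, toy.acceptance_fraction)
```

Making `sample` a generator lets the caller decide how many iterations to consume, and `run_mcmc` then becomes a thin loop over it, which mirrors the interface used by `emcee`.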

### Import of modules

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Optional: seaborn only makes the plots look nicer
import seaborn as sns
sns.set()
sns.set_context("talk")

In [None]:
# Modules needed for this exercise
import emcee
import corner

In [None]:
# Import your own sampler
from sampler.mhsampler import MHSAMPLER

### Sampling a multivariate Gaussian

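Below, a correlated two-dimensional Gaussian is sampled with three different samplers. If everything works correctly, the array *samples* will contain a series of draws from the posterior for each sampler. These draws are plotted using a so-called corner plot (to be discussed in much more detail during the course).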

In [None]:
# Number of dimensions
ndim = 2

# Multivariate, correlated Gaussian (random position and correlation)
means = 1. - 2. * np.random.rand(ndim)
cov = 0.5 - np.random.rand(ndim ** 2).reshape((ndim, ndim))
cov = np.triu(cov)
cov += cov.T - np.diag(cov.diagonal())
# Square the symmetric matrix to obtain a valid (positive semidefinite) covariance matrix
cov = np.dot(cov, cov)
inv_cov = np.linalg.inv(cov)

# The array of mean positions and the inverted covariance matrix should be input as positional
# arguments to the log-posterior function
lnprobfn_args = [means, inv_cov]
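
Squaring a symmetric matrix guarantees a valid covariance: for symmetric $A$, $x^\top A A x = \lVert A x \rVert^2 \ge 0$, so $A^2$ is symmetric positive semidefinite (and positive definite whenever $A$ is nonsingular, which is needed for `np.linalg.inv`). A quick numerical check, as a sketch; the helper `is_valid_covariance` is hypothetical and the matrix `a` is rebuilt under a new name so that the notebook's `cov` is not overwritten:

```python
import numpy as np


def is_valid_covariance(c, tol=1e-12):
    """Return True if c is symmetric with non-negative eigenvalues (up to tol)."""
    c = np.asarray(c)
    symmetric = np.allclose(c, c.T)
    # eigvalsh assumes a symmetric (Hermitian) matrix and returns real eigenvalues
    eigenvalues = np.linalg.eigvalsh(c)
    return bool(symmetric and eigenvalues.min() >= -tol)


# Rebuild a random symmetric matrix the same way as in the cell above
n = 2
a = 0.5 - np.random.rand(n ** 2).reshape((n, n))
a = np.triu(a)
a += a.T - np.diag(a.diagonal())
print(is_valid_covariance(np.dot(a, a)))
```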

In [None]:
def lnprob(q, mu, icov):
    """
    Return the unnormalized log-probability at position ``q`` of a multivariate
    Gaussian with mean array ``mu`` and inverse covariance matrix ``icov``.
    """
    diff = q - mu
    return -np.dot(diff, np.dot(icov, diff)) / 2.0
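
`lnprob` omits the normalization constant $-\tfrac{1}{2}\ln\det(2\pi\Sigma)$. MCMC does not need it, because only differences of log-probabilities enter the acceptance test. A sketch checking this against `scipy.stats.multivariate_normal`, assuming SciPy is installed (it is not otherwise used in this notebook); the mean, covariance and test points below are arbitrary illustrative values:

```python
import numpy as np
from scipy.stats import multivariate_normal


def lnprob_unnormalized(q, mu, icov):
    """Same formula as lnprob: -0.5 * (q - mu)^T icov (q - mu)."""
    diff = q - mu
    return -np.dot(diff, np.dot(icov, diff)) / 2.0


mu = np.array([0.3, -0.2])
sigma = np.array([[1.0, 0.4], [0.4, 0.5]])
icov = np.linalg.inv(sigma)

x1, x2 = np.array([0.1, 0.2]), np.array([-1.0, 0.7])
# Differences of log-densities agree, because the normalization constant cancels
d_ours = lnprob_unnormalized(x1, mu, icov) - lnprob_unnormalized(x2, mu, icov)
d_scipy = multivariate_normal.logpdf(x1, mu, sigma) - multivariate_normal.logpdf(x2, mu, sigma)
print(np.isclose(d_ours, d_scipy))
```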

In [None]:
# We will test three different samplers:
# 1. The emcee built-in MH sampler (emcee.MHSampler is only available in emcee 2.x)
# 2. The emcee affine-invariant ensemble sampler
# 3. Your own MH sampler
def init_sampler(MCMC_sampler, ndim, nsamples, lnprobfn, lnprobfn_args, nwalkers=1):
    """
    Initialize an MCMC sampler. Returns a sampler object, an array of starting positions,
    the number of steps and the number of walkers.
    """
    # Covariance of the Gaussian proposal used by the MH samplers
    stepsize = 0.5
    cov = stepsize * np.eye(ndim)
    p0 = np.random.rand(ndim)
    nsteps = nsamples
    if MCMC_sampler == 'emcee MHSampler':
        sampler = emcee.MHSampler(cov, ndim, lnprobfn, args=lnprobfn_args)
        nwalkers = 1
    elif MCMC_sampler == 'my MHSampler':
        sampler = MHSAMPLER(cov, ndim, lnprobfn, args=lnprobfn_args)
        nwalkers = 1
    elif MCMC_sampler == 'EMCEE':
        # emcee ensemble sampler: one random starting position per walker
        p0 = [np.random.rand(ndim) for i in range(nwalkers)]
        print('initializing EMCEE sampler with {0} walkers'.format(nwalkers))
        # Keep the total number of samples fixed: each walker takes nsamples // nwalkers steps
        nsteps = nsamples // nwalkers
        sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprobfn, args=lnprobfn_args)
    else:
        raise ValueError('Unknown MCMC sampler: {0}'.format(MCMC_sampler))
    return sampler, p0, nsteps, nwalkers

In [None]:
# Sampling setup
nburnin = 1000
nsamples = 100000

# nwalkers is only used by the emcee ensemble sampler
nwalkers = 10  # each walker then takes nsteps = nsamples // nwalkers steps

In [None]:
for MCMC_sampler in ['emcee MHSampler', 'my MHSampler', 'EMCEE']:
    print("MCMC sampler: {0}".format(MCMC_sampler))
    _sampler, _p0, _nsteps, _nwalkers = init_sampler(MCMC_sampler, ndim, nsamples, lnprob,
                                                     lnprobfn_args, nwalkers=nwalkers)
    
    # Sample the posterior distribution

    # Burn-in
    print('Performing {0} burn-in iterations.'.format(nburnin))
    pos, prob, state = _sampler.run_mcmc(_p0, nburnin)
    # Discard the burn-in samples stored in the sampler
    _sampler.reset()

    # Perform iterations, starting at the final position from the burn-in.
    print('{1} walkers performing {0} iterations each.'.format(_nsteps, _nwalkers))
    %time _sampler.run_mcmc(pos, _nsteps)
    print("done")

    print("Mean acceptance fraction: {0:.3f}".format(np.mean(_sampler.acceptance_fraction)))

    samples = _sampler.flatchain
    
    # make a corner plot with the posterior distribution
    fig = corner.corner(samples, truths=means,quantiles=[0.16, 0.5, 0.84],
                       show_titles=True, title_kwargs={"fontsize": 12})
    plt.show()
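
Besides the corner plots, a quick quantitative check is to compare the sample mean and covariance with the true `means` and `cov`. The helper `moment_errors` below is a hypothetical sketch; the example uses independent draws as a stand-in for an MCMC chain. For real MCMC output the errors are larger than for independent draws, because successive samples are correlated and the effective number of independent samples is smaller than `len(samples)`.

```python
import numpy as np


def moment_errors(samples, mean_true, cov_true):
    """Return the max absolute errors of the sample mean and sample covariance."""
    samples = np.asarray(samples)
    mean_err = np.max(np.abs(samples.mean(axis=0) - mean_true))
    # rowvar=False: each row is one draw, each column one parameter
    cov_err = np.max(np.abs(np.cov(samples, rowvar=False) - cov_true))
    return mean_err, cov_err


# Example with independent draws standing in for an MCMC chain
rng = np.random.default_rng(1)
true_mean = np.array([0.5, -0.5])
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
draws = rng.multivariate_normal(true_mean, true_cov, size=20000)
print(moment_errors(draws, true_mean, true_cov))
```

Inside the sampling loop, one could call `moment_errors(samples, means, cov)` right after `samples` is computed and compare the three samplers.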
