# Bayesian inference: MCMC hands-on

## VI. Implementing your own MCMC

As we have seen, the standard Bayesian approach to model fitting involves sampling the posterior, usually via a variant of Markov Chain Monte Carlo (MCMC). Although many sophisticated MCMC samplers exist, the simplest algorithm (Metropolis-Hastings) is straightforward to code.

Here we'll walk through creating our own Metropolis-Hastings sampler from scratch, in order to better understand exactly what is going on under the hood.

In [None]:
# As usual, we start with some imports
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy
#from scipy import stats

### VI.1 Metropolis-Hastings Procedure

Recall the Metropolis-Hastings procedure:

1. Define a posterior $p(\theta~|~D, I)$
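2. Define a *proposal density* $p(\theta_{i + 1}~|~\theta_i)$, which must be symmetric, $p(\theta_{i + 1}~|~\theta_i) = p(\theta_i~|~\theta_{i + 1})$, for the acceptance ratio in step 4 to take the simple form used here, but is otherwise unconstrained (a Gaussian centered on $\theta_i$ is the usual choice).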
3. Choose a starting point $\theta_0$
4. Repeat the following:

   1. Given $\theta_i$, draw a new $\theta_{i + 1}$ from the proposal distribution
   
   2. Compute the *acceptance ratio*
      $$
      a = \frac{p(\theta_{i + 1}~|~D,I)}{p(\theta_i~|~D,I)}
      $$
   
   3. If $a \ge 1$, the proposal is more likely: accept the draw and add $\theta_{i + 1}$ to the chain.
   
   4. If $a < 1$, then accept the point with probability $a$: this can be done by drawing a uniform random number $r$ between 0 and 1 and accepting the point if $r < a$. If the point is accepted, add $\theta_{i + 1}$ to the chain. If not, add $\theta_i$ to the chain *again*.
   
The goal is to produce a "chain", i.e. a list of $\theta$ values, where each $\theta$ is a vector of parameters for your model.
Here we'll write a simple Metropolis-Hastings sampler in Python.

Note that the ``np.random.randn()`` function will be useful: it returns a pseudorandom value drawn from a standard normal distribution (i.e. mean of zero and variance of 1).
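
The accept/reject rule in step 4 can be sketched as a single update. This is only an illustration under stated assumptions, not the sampler you will write below: the helper name `metropolis_step`, the one-dimensional standard-normal target `ln_target`, the step size, and the seed are all made up for the example. It works with log-probabilities, so the test $r < a$ becomes $\ln r < \ln a$.

```python
import numpy as np

def metropolis_step(ln_p, theta, ln_p_theta, stepsize, rng):
    """One Metropolis update with a symmetric Gaussian proposal (hypothetical helper)."""
    # Symmetric proposal: Gaussian centered on the current point
    proposal = theta + stepsize * rng.randn(*np.shape(theta))
    ln_p_proposal = ln_p(proposal)
    # ln a = ln p(proposal) - ln p(current); accept when ln r < ln a.
    # If a >= 1 this always holds, because ln r < 0 for r in [0, 1).
    ln_a = ln_p_proposal - ln_p_theta
    if np.log(rng.rand()) < ln_a:
        return proposal, ln_p_proposal
    # Rejected: the current point is repeated in the chain
    return theta, ln_p_theta

def ln_target(theta):
    # Toy target for the demo: standard normal, up to an additive constant
    return -0.5 * np.sum(theta**2)

rng = np.random.RandomState(0)
theta = np.array([3.0])            # deliberately poor starting point
ln_p_theta = ln_target(theta)
chain = np.empty((5000, 1))
for i in range(len(chain)):
    theta, ln_p_theta = metropolis_step(ln_target, theta, ln_p_theta, 1.0, rng)
    chain[i] = theta

# After discarding the first 500 steps, mean and std should be roughly 0 and 1
print(chain[500:].mean(), chain[500:].std())
```

Passing the current log-probability along with the current point avoids evaluating the target twice per step.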

### VI.2 Implementation

The cells below describe the steps that you will implement yourself.

*Data:* We'll use data drawn from a straight-line model.

In [None]:
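def make_data(intercept, slope, N=20, dy=2, rseed=42):
    # Seeded generator so that everyone works with the same data set
    rand = np.random.RandomState(rseed)
    x = 100 * rand.rand(N)        # N points drawn uniformly from [0, 100)
    y = intercept + slope * x     # noiseless straight line
    y += dy * rand.randn(N)       # add Gaussian noise with standard deviation dy
    return x, y, dy * np.ones_like(x)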

theta_true = (2, 0.5)
x, y, dy = make_data(*theta_true)

First, plot the data to see what we're looking at (use a ``plt.errorbar()`` plot with the provided data).
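
One possible way to make this plot is sketched below. The block regenerates the data so that it runs on its own; the marker style, colors, and axis labels are arbitrary choices, not part of the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

def make_data(intercept, slope, N=20, dy=2, rseed=42):
    # Same recipe as the data cell above, repeated so this sketch is self-contained
    rand = np.random.RandomState(rseed)
    x = 100 * rand.rand(N)
    y = intercept + slope * x + dy * rand.randn(N)
    return x, y, dy * np.ones_like(x)

x, y, dy = make_data(2, 0.5)

fig, ax = plt.subplots()
# fmt="o" draws markers only, with no line joining the points
ax.errorbar(x, y, yerr=dy, fmt="o", color="k", ecolor="gray", capsize=2)
ax.set_xlabel("x")
ax.set_ylabel("y")
```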

Our model will also be a line, as we have done in the previous lecture. 

In [None]:
def model(theta, x):
    # the `theta` argument is a list of parameter values, e.g., theta = [intercept, slope] for a line
    # (the same order as theta_true above)
    return 
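
If you get stuck, a minimal sketch of the line model could look like this. It assumes the parameter order `theta = (intercept, slope)`, matching `theta_true`; that ordering is a convention of this sketch, and any consistent choice works.

```python
import numpy as np

def model(theta, x):
    # Assumed parameter order: theta = (intercept, slope)
    intercept, slope = theta
    # np.asarray lets the function accept lists as well as arrays
    return intercept + slope * np.asarray(x)

y_line = model((2, 0.5), [0, 2, 4])
```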

We'll start with the assumption that the measurement errors are independent and Gaussian, so that the likelihood is simply a product of Gaussians, one per data point. We'll also assume that the reported uncertainties are correct, and that there are no uncertainties on the `x` data. We need to define a function that will evaluate the (ln)likelihood of the data, given a particular choice of your model parameters. A good way to structure this function is as follows:

In [None]:
def ln_likelihood(theta, x, y, dy):
    # we will pass the parameters (theta) to the model function
    # the other arguments are the data
    return
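
Under the assumptions above (independent Gaussian errors of known size `dy`, no error on `x`), the log-likelihood is

$$
\ln L = -\frac{1}{2} \sum_i \left[ \ln\left(2\pi\, dy_i^2\right) + \frac{\left(y_i - m(x_i; \theta)\right)^2}{dy_i^2} \right],
$$

where $m(x; \theta)$ is the model. The following is one possible sketch. The linear `model` with `theta = (intercept, slope)` is repeated here as an assumption so that the block runs on its own, and the `*_demo` arrays are toy values for illustration only.

```python
import numpy as np

def model(theta, x):
    # Assumed parameter order: theta = (intercept, slope)
    intercept, slope = theta
    return intercept + slope * np.asarray(x)

def ln_likelihood(theta, x, y, dy):
    # Residuals between the data and the model prediction
    resid = np.asarray(y) - model(theta, x)
    dy = np.asarray(dy)
    # Sum of independent Gaussian log-densities, including the normalization term
    return -0.5 * np.sum(np.log(2 * np.pi * dy**2) + (resid / dy)**2)

# Toy check: noiseless data should favor the parameters that generated them
x_demo = np.array([0.0, 1.0, 2.0])
y_demo = model((2, 0.5), x_demo)
dy_demo = np.ones_like(x_demo)
ln_L_true = ln_likelihood((2, 0.5), x_demo, y_demo, dy_demo)
ln_L_off = ln_likelihood((2, 0.6), x_demo, y_demo, dy_demo)
```

Working with the logarithm avoids numerical underflow: a product of many small probabilities quickly becomes too small to represent, whereas the sum of their logs stays well behaved.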

What about priors? Remember your prior only depends on the model parameters, but be careful about what kind of prior you are specifying for each parameter. Do we need to properly normalize the probabilities?

In [None]:
def ln_prior(theta):
    return
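
As a starting point, here is a sketch of a flat (uniform) prior inside a wide box. The bounds `INTERCEPT_BOUNDS` and `SLOPE_BOUNDS` are hypothetical values picked for illustration, and `theta = (intercept, slope)` is assumed. Returning `-np.inf` outside the box gives those points zero probability.

```python
import numpy as np

# Hypothetical bounds, chosen wide enough that they should not drive the fit
INTERCEPT_BOUNDS = (-100.0, 100.0)
SLOPE_BOUNDS = (-10.0, 10.0)

def ln_prior(theta):
    # Assumed parameter order: theta = (intercept, slope)
    intercept, slope = theta
    inside = (INTERCEPT_BOUNDS[0] < intercept < INTERCEPT_BOUNDS[1]
              and SLOPE_BOUNDS[0] < slope < SLOPE_BOUNDS[1])
    # Constant inside the box (the normalization is dropped), zero probability outside
    return 0.0 if inside else -np.inf

ln_p_inside = ln_prior((2, 0.5))
ln_p_outside = ln_prior((1e6, 0.5))
```

The constant is dropped because the Metropolis acceptance ratio only involves ratios of posterior values, so a constant factor cancels. Note also that a prior that is flat in the slope is not the only option: it gives steep lines a lot of weight. One alternative sometimes used for a line fit is $p(\mathrm{slope}) \propto (1 + \mathrm{slope}^2)^{-3/2}$, which would add `-1.5 * np.log(1 + slope**2)` to the log-prior.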

Now we can define a function that evaluates the (ln)posterior probability, which is just the sum of the ln prior and ln likelihood:

In [None]:
def ln_posterior(theta, x, y, dy):
    return ln_prior(theta) + ln_likelihood(theta, x, y, dy)

Now write a function to run a Metropolis-Hastings MCMC sampler. Ford (2005) includes a great step-by-step walkthrough of the Metropolis-Hastings algorithm, and we'll base our code on that. Fill in the steps mentioned in the comments below:

In [None]:
def run_mcmc(ln_posterior, nsteps, theta0, stepsize, args=()):
    """
    Run a Markov Chain Monte Carlo
    
    Parameters
    ----------
    ln_posterior: callable
        our function to compute the posterior
    nsteps: int
        the number of steps in the chain
    theta0: list
        the starting guess for parameters theta
    stepsize: float
        a parameter controlling the size of the random step
        e.g. it could be the width of the Gaussian distribution
    args: tuple (optional)
        additional arguments (data) passed to ln_posterior
    """
    # Create the array of size (nsteps, ndims) to hold the chain; ndims is the number of params
    # Initialize the first row of this with theta0
    
    # Create the array of size nsteps to hold the ln-posterior for each point
    # Initialize the first entry of this with the ln-posterior at theta0
    
    # Loop for nsteps
    for i in range(1, nsteps):
        # Randomly draw a new theta from the proposal distribution.
        # for example, you can take a normally distributed step using
        # the np.random.normal() function
        
        # Calculate the probability for the new state
        
        # Compare it to the probability of the old state
        # Using the acceptance probability function
        # (remember that you've computed the log probability, not the probability!)
        
        # Choose a random number r between 0 and 1 to compare with p_accept
        
        # If p_accept>1 or p_accept>r, accept the step
            # Save the position to the i^th row of the chain
            
            # Save the probability to the i^th entry of the array
            
        # Else, do not accept the step
            # Set the position and probability equal to the previous values
            
    # Return the chain and probabilities
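
For comparison with your own attempt, here is one way the skeleton above could be filled in. It is a sketch under a few assumptions rather than a reference solution: the proposal is an isotropic Gaussian step, `stepsize` may be a scalar or an array with one entry per parameter, and the `rseed` argument is an addition that is not part of the signature given above. The toy target `ln_gauss` is only there to exercise the function.

```python
import numpy as np

def run_mcmc(ln_posterior, nsteps, theta0, stepsize, args=(), rseed=0):
    """Metropolis sampler with a Gaussian proposal (illustrative sketch)."""
    rng = np.random.RandomState(rseed)
    theta0 = np.atleast_1d(np.asarray(theta0, dtype=float))
    ndim = theta0.size

    # Storage for the chain and for the ln-posterior at each point
    chain = np.empty((nsteps, ndim))
    ln_probs = np.empty(nsteps)
    chain[0] = theta0
    ln_probs[0] = ln_posterior(theta0, *args)

    for i in range(1, nsteps):
        # Draw a proposal from a Gaussian centered on the current point
        proposal = chain[i - 1] + stepsize * rng.randn(ndim)
        ln_p_new = ln_posterior(proposal, *args)

        # Log of the acceptance ratio: difference of log-posteriors
        ln_a = ln_p_new - ln_probs[i - 1]

        if np.log(rng.rand()) < ln_a:
            # Accept: store the proposed point
            chain[i] = proposal
            ln_probs[i] = ln_p_new
        else:
            # Reject: repeat the previous point
            chain[i] = chain[i - 1]
            ln_probs[i] = ln_probs[i - 1]

    return chain, ln_probs

def ln_gauss(theta):
    # Toy target: 2D standard normal, up to an additive constant
    return -0.5 * np.sum(theta**2)

test_chain, test_ln_probs = run_mcmc(ln_gauss, 5000, [3.0, -3.0], 1.0)
print(test_chain[1000:].mean(axis=0))
```

Comparing `ln r` with `ln a` covers both branches of the acceptance rule at once, and it never exponentiates a large log-probability difference.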

Now run the MCMC code on the data provided.
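
A compact end-to-end run might look like the sketch below. Everything is repeated so that the block runs on its own. The flat prior bounds, the starting point `[0.0, 1.0]`, the per-parameter step sizes `[0.5, 0.01]`, the number of steps, and the burn-in of 5000 are all guesses made for illustration; you will likely want to tune them and watch how the acceptance fraction responds.

```python
import numpy as np

def make_data(intercept, slope, N=20, dy=2, rseed=42):
    rand = np.random.RandomState(rseed)
    x = 100 * rand.rand(N)
    y = intercept + slope * x + dy * rand.randn(N)
    return x, y, dy * np.ones_like(x)

def model(theta, x):
    intercept, slope = theta  # assumed parameter order
    return intercept + slope * x

def ln_likelihood(theta, x, y, dy):
    return -0.5 * np.sum(np.log(2 * np.pi * dy**2)
                         + ((y - model(theta, x)) / dy)**2)

def ln_prior(theta):
    # Flat prior inside a wide box (hypothetical bounds)
    return 0.0 if abs(theta[0]) < 100 and abs(theta[1]) < 10 else -np.inf

def ln_posterior(theta, x, y, dy):
    return ln_prior(theta) + ln_likelihood(theta, x, y, dy)

def run_mcmc(ln_posterior, nsteps, theta0, stepsize, args=(), rseed=1):
    rng = np.random.RandomState(rseed)
    theta0 = np.asarray(theta0, dtype=float)
    chain = np.empty((nsteps, theta0.size))
    ln_probs = np.empty(nsteps)
    chain[0] = theta0
    ln_probs[0] = ln_posterior(theta0, *args)
    for i in range(1, nsteps):
        proposal = chain[i - 1] + stepsize * rng.randn(theta0.size)
        ln_p_new = ln_posterior(proposal, *args)
        if np.log(rng.rand()) < ln_p_new - ln_probs[i - 1]:
            chain[i], ln_probs[i] = proposal, ln_p_new
        else:
            chain[i], ln_probs[i] = chain[i - 1], ln_probs[i - 1]
    return chain, ln_probs

x, y, dy = make_data(2, 0.5)

# The two parameters have very different scales, hence one step size each
stepsize = np.array([0.5, 0.01])
chain, ln_probs = run_mcmc(ln_posterior, 20000, [0.0, 1.0], stepsize,
                           args=(x, y, dy))

# Fraction of steps where the chain actually moved
accept_frac = np.mean(np.any(chain[1:] != chain[:-1], axis=1))
posterior_mean = chain[5000:].mean(axis=0)
print("acceptance fraction:", accept_frac)
print("posterior mean (intercept, slope):", posterior_mean)
```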

Plot the position of the walker as a function of step number for each of the parameters. Are the chains converged? 
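
A trace plot can be sketched as follows. To keep the block self-contained it uses a *synthetic stand-in* for a chain (a decaying transient plus noise), not real sampler output; replace `chain` with the array returned by your own `run_mcmc`. The labels assume the order (intercept, slope).

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a chain of shape (nsteps, 2): it starts far from the
# bulk of the distribution and relaxes toward it, mimicking a burn-in phase
rng = np.random.RandomState(0)
nsteps = 2000
center = np.array([2.0, 0.5])
start = np.array([0.0, 1.0])
steps = np.arange(nsteps)[:, None]
noise = rng.randn(nsteps, 2) * np.array([0.9, 0.015])
chain = center + (start - center) * np.exp(-steps / 100.0) + noise

fig, axes = plt.subplots(2, 1, sharex=True, figsize=(8, 5))
for ax, samples, label in zip(axes, chain.T, ["intercept", "slope"]):
    # One panel per parameter: parameter value against step number
    ax.plot(samples, lw=0.5, color="k")
    ax.set_ylabel(label)
axes[-1].set_xlabel("step number")
```

A chain that has settled should look like stationary noise around a constant level, with no visible drift. An initial transient like the one in this stand-in is the part to discard as burn-in.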

Make histograms of the samples for each parameter. Should you include all of the samples? 
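
The effect of discarding the early part of the chain can be illustrated numerically. The sketch below again uses a synthetic stand-in chain rather than real sampler output, and the burn-in length `nburn = 500` is an arbitrary choice made by looking at where the transient ends.

```python
import numpy as np

# Synthetic stand-in for a chain with an initial transient (not sampler output)
rng = np.random.RandomState(0)
nsteps = 2000
center = np.array([2.0, 0.5])
start = np.array([0.0, 1.0])
steps = np.arange(nsteps)[:, None]
noise = rng.randn(nsteps, 2) * np.array([0.9, 0.015])
chain = center + (start - center) * np.exp(-steps / 100.0) + noise

# Discard the burn-in phase before building histograms or summaries
nburn = 500
samples = chain[nburn:]

# Histogram of the slope samples; plt.hist(samples[:, 1], bins=30) draws the same thing
counts, edges = np.histogram(samples[:, 1], bins=30)

# The burn-in samples bias the mean toward the starting point
print("slope mean, all steps:    ", chain[:, 1].mean())
print("slope mean, after burn-in:", samples[:, 1].mean())
```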

In [None]:
# To plot a 2D histogram, consider using plt.hist2d(); you can also use corner, as last time

Report your constraints on the model parameters.  
This is the number for the abstract of the paper (or for the text of your thesis): the challenge is to figure out how to accurately summarize a multi-dimensional posterior (which is **the result** in Bayesianism) with a few numbers (which is what readers want to see as they skim the paper/thesis/report).

What numbers should you use?
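
One common convention, though not the only defensible one, is to quote the median of each marginal distribution with the 16th and 84th percentiles as a credible interval (approximately $\pm 1\sigma$ for a Gaussian). The sketch below assumes this convention. The helper name `summarize` is hypothetical, and the `samples` array is a synthetic stand-in drawn from a correlated Gaussian, not real sampler output.

```python
import numpy as np

def summarize(samples):
    """Return one row (median, minus, plus) per parameter (hypothetical helper)."""
    lo, med, hi = np.percentile(samples, [16, 50, 84], axis=0)
    return np.column_stack([med, med - lo, hi - med])

# Synthetic stand-in for post-burn-in samples of (intercept, slope)
rng = np.random.RandomState(1)
cov = [[0.81, -0.012], [-0.012, 0.000225]]
samples = rng.multivariate_normal([2.0, 0.5], cov, size=5000)

summary = summarize(samples)
for name, (med, minus, plus) in zip(["intercept", "slope"], summary):
    print(f"{name} = {med:.3f} +{plus:.3f} -{minus:.3f}")
```

Keep in mind that one-dimensional summaries like these hide the correlation between intercept and slope, which is one reason the full two-dimensional posterior remains the actual result.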

## XX References:

- **Chapter 5** (Sections 5.1, 5.2, 5.3, 5.8) of the book <a class="anchor" id="book"></a> *Statistics, Data Mining, and Machine Learning in Astronomy* by Z. Ivezic et al., Princeton Series in Modern Astronomy.

- This notebook includes a large fraction of the material that J. VanderPlas presented at the "Bayesian Methods in Astronomy" workshop, held at the 227th meeting of the American Astronomical Society. The full repository with that material can be found on GitHub: http://github.com/jakevdp/AAS227Workshop

- About the variety of approaches to MCMC: Allison and Dunkley 2013: [Comparison of sampling techniques for Bayesian parameter estimation](https://arxiv.org/abs/1308.2675). See also [How to Be a Bayesian in Python](http://jakevdp.github.io/blog/2014/06/14/frequentism-and-bayesianism-4-bayesian-in-python/). 

- Andreon 2011 [Understanding better (some) astronomical data using Bayesian methods](https://arxiv.org/abs/1112.3652)