Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your collaborators below:

In [None]:
COLLABORATORS = ""

---

In [None]:
%matplotlib inline
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt
import math
from ipywidgets import interact, interactive
from IPython.display import display

# Bayesian statistics on a continuous probability distribution

In this notebook, we use what we have learned about continuous probability densities and Bayesian probability theory to model our beliefs about the <i>bias</i> of a coin: that is, the probability $\theta$ that the coin will land heads.

Recall that in order to update our beliefs, we follow Bayes' rule:

$$ P(h|d)=\frac{P(d|h)\cdot P(h)}{P(d)}$$

Here, our hypothesis is about $\theta$, the (continuous-valued) <i>bias</i> of the coin, based on our prior beliefs about the behaviour of coins, and the outcome of coin flips we subsequently observe. Bayes' rule therefore becomes:

$$ p(\theta|d)=\frac{p(d|\theta)\cdot p(\theta)}{p(d)}$$

We seek the value of $\theta$ that <i>maximizes</i> this <i>posterior probability density</i>:
$$\underset{\theta}{\operatorname{argmax}} \, p(\theta|d)$$

By examining the equation, we see that the denominator, $p(d)$, does not depend on $\theta$, so our candidate should maximize the product of the likelihood function, $p(d|\theta)$, and our prior, $p(\theta)$. Because it maximizes the posterior, this estimator is called the <i>maximum a posteriori</i> (MAP) estimator, as discussed in class.
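To make this concrete, here is a minimal sketch that finds the MAP estimate on a grid of candidate values. The data (7 heads in 10 flips) and the Beta(3, 3) prior are hypothetical values chosen only for illustration, and the variable names are not used elsewhere in this notebook. It also checks that dividing by an approximation of $p(d)$ does not move the maximizer.

```python
import numpy as np
from scipy.stats import beta, binom

# Hypothetical data and prior, chosen only for illustration
n_flips, n_heads = 10, 7
a, b = 3, 3  # parameters of an assumed Beta(a, b) prior

# Grid of candidate biases, excluding the endpoints 0 and 1
theta_grid = np.linspace(0.001, 0.999, 999)

# Unnormalized posterior: likelihood times prior
unnormalized = binom.pmf(n_heads, n_flips, theta_grid) * beta.pdf(theta_grid, a, b)

# Riemann-sum approximation of the evidence p(d), used only to normalize
evidence = np.sum(unnormalized) * (theta_grid[1] - theta_grid[0])
normalized = unnormalized / evidence

# The maximizer is the same with or without the normalizing constant
theta_map = theta_grid[np.argmax(unnormalized)]
print(theta_map, theta_grid[np.argmax(normalized)])
```

A grid search like this is only an approximation, accurate to the grid spacing; it is shown here to illustrate the argument rather than as the method used in the rest of the notebook.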

## Prior probability density

We now need to find a functional form for our prior probability density, $p(\theta)$, which represents our baseline beliefs about the behaviour of coins. Since the value of $\theta$ can only be between 0 and 1, we model it with a $Beta$ distribution, which is defined on exactly this interval and has other convenient properties. Our initial prior distribution is plotted by the code below.

In [None]:
# Beta distribution pdf evaluated at value(s) theta
def prior(theta, prior_tails, prior_heads):    
    return beta.pdf(theta, prior_heads + 1, prior_tails + 1)

In [None]:
# This function plots the prior distribution
def plot_prior(prior_tails, prior_heads):
    
    x = np.arange(0,1,0.01)
    y = prior(x, prior_tails, prior_heads)
    
    plt.figure(1, figsize=(14,6))
    
    plt.plot(x, y, color='k')
    plt.xlabel('theta')
    plt.ylabel('Prior probability density = p(theta)')
    plt.title("Prior distribution over theta")

In [None]:
w = interactive(plot_prior, prior_tails=(0,10), prior_heads=(0,10))
display(w)

The graph above shows our ***prior beliefs*** about the probability, $\theta$, that the coin flip turns up heads. For mathematical convenience we use a Beta distribution, which can be shown to accurately reflect our belief about $\theta$ given that we have observed a certain number of heads and tails before we start our experiment.
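As a small illustration of how these imagined prior observations shape the prior, the sketch below evaluates the same Beta density used by `prior` at three values of $\theta$. The specific counts (none, and then five heads and five tails) are hypothetical choices for illustration.

```python
import numpy as np
from scipy.stats import beta

theta = np.array([0.1, 0.5, 0.9])

# No imagined prior flips: Beta(1, 1) is uniform, so every bias is equally plausible
flat = beta.pdf(theta, 0 + 1, 0 + 1)

# Five imagined heads and five imagined tails: Beta(6, 6) concentrates around 0.5
fair = beta.pdf(theta, 5 + 1, 5 + 1)

print(flat)
print(fair)
```

The more prior flips we imagine, the more sharply the prior concentrates, and the more data it takes to move our beliefs away from it.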

## Likelihood function

We also need to model our data distribution given each candidate value of $\theta$. That is, we seek a likelihood function, $p(d|\theta)$, that accurately describes the probability of coin-flip outcomes, in terms of heads and tails, if a coin is biased according to that particular value of $\theta$.

As this represents the outcome of $n$ independent and identically distributed $Bernoulli$ trials, the relevant probability distribution is therefore the $Binomial$:

$$p(d|\theta) = \binom {n} {k} \, \theta^k \, (1-\theta)^{(n-k)}$$

In other words, when considering our data of $n$ flips, the probability that $k$ of the flips landed heads is given by the product of the probabilities of the individual $Bernoulli$ outcomes needed to give that number of heads and tails, multiplied by the number of different ways to arrange these outcomes.

For more information on the Binomial distribution, see https://en.wikipedia.org/wiki/Binomial_distribution.
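As a quick check of the formula, the sketch below computes the Binomial likelihood by hand and compares it with `scipy.stats.binom.pmf`. The helper name `binomial_likelihood` and the example of 7 heads in 10 flips are hypothetical, introduced only for this illustration.

```python
import math
from scipy.stats import binom

def binomial_likelihood(k, n, theta):
    """Probability of k heads in n flips of a coin with bias theta."""
    return math.comb(n, k) * theta**k * (1 - theta)**(n - k)

# Hypothetical example: 7 heads in 10 flips, evaluated for three candidate biases
for theta in (0.3, 0.5, 0.7):
    by_hand = binomial_likelihood(7, 10, theta)
    from_scipy = binom.pmf(7, 10, theta)
    print(theta, by_hand, from_scipy)
```

Of the three candidates, the likelihood is largest at $\theta = 0.7$, the value that matches the observed fraction of heads.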

The code below will use these two distributions and Bayes' rule to plot the posterior density, and find the MAP estimate of $\theta$. Try altering each of the parameters, to get a sense of how the distribution and MAP change, and how the MAP compares to the posterior mean:

In [None]:
# Computes the beta-binomial posterior model pdf evaluated at value(s) theta
def posterior(theta, num_heads, num_tails, prior_heads, prior_tails):
    return beta.pdf(theta, prior_heads + num_heads + 1, num_tails + prior_tails + 1)
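
The `posterior` function can call `beta.pdf` directly because multiplying a Beta prior by a Binomial likelihood gives another Beta density. A short sketch of why, writing $k$ and $n-k$ for `num_heads` and `num_tails`, and $h_0$ and $t_0$ for `prior_heads` and `prior_tails` (the symbols $h_0$ and $t_0$ are notation introduced here, not used elsewhere in the notebook). Dropping every factor that does not depend on $\theta$:

$$ p(\theta|d) \propto p(d|\theta)\cdot p(\theta) \propto \theta^{k}\,(1-\theta)^{n-k}\cdot\theta^{h_0}\,(1-\theta)^{t_0} = \theta^{k+h_0}\,(1-\theta)^{n-k+t_0}$$

This is the unnormalized density of a $Beta(k+h_0+1,\; n-k+t_0+1)$ distribution, which matches the arguments passed to `beta.pdf`. For a $Beta(\alpha, \beta)$ distribution with $\alpha, \beta > 1$, the mode is $\frac{\alpha-1}{\alpha+\beta-2}$ and the mean is $\frac{\alpha}{\alpha+\beta}$, so:

$$ \theta_{MAP} = \frac{k+h_0}{n+h_0+t_0}, \qquad E[\theta|d] = \frac{k+h_0+1}{n+h_0+t_0+2}$$

Note that the MAP expression is undefined when $n + h_0 + t_0 = 0$: with no prior counts and no data the posterior is uniform and has no unique maximum.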

In [None]:
# This function plots the posterior distribution and marks the MAP and posterior-mean estimates
def plot_posterior(num_heads, num_tails, prior_heads, prior_tails):

    x = np.arange(0,1,0.01)
    y = posterior(x, num_heads, num_tails, prior_heads, prior_tails)

    plt.figure(1, figsize=(14,5))

    plt.plot(x,y, color='k')
    plt.xlabel('theta')
    plt.ylabel('Posterior probability density = p(theta | sequence)')
    plt.title('Posterior distribution over theta')

    # Total number of prior and observed flips
    total = prior_tails + prior_heads + num_heads + num_tails

    # With no prior counts and no data the posterior is uniform and has no unique mode;
    # fall back to 0.5 in that case to avoid dividing by zero
    map_estimator = (prior_heads + num_heads) / total if total > 0 else 0.5
    pm_estimator = (prior_heads + num_heads + 1) / (total + 2)

    map_line = plt.axvline(map_estimator, color='r', label='MAP')
    pm_line = plt.axvline(pm_estimator, color='b', label='Posterior Mean')

    plt.legend(handles=[map_line, pm_line])

    return {
        'MAP': map_estimator,
        'Posterior Mean': pm_estimator
    }

In [None]:
interactive(plot_posterior, num_heads=(0,30), num_tails=(0,30), prior_heads=(0,10), prior_tails=(0,10))

## Outcome

And we're done - we have an estimate of the most likely value of $\theta$, according to our prior beliefs and data!

For further insight into the choice of probability distributions and densities above, think about the probability density of the posterior. Is it similar to the prior or the likelihood function? What properties of this pairing ensure it will have such a shape?

---

Before turning this problem in, remember to do the following steps:

1. **Restart the kernel** (Kernel$\rightarrow$Restart)
2. **Run all cells** (Cell$\rightarrow$Run All)
3. **Save** (File$\rightarrow$Save and Checkpoint)

<div class="alert alert-danger">After you have completed these three steps, ensure that the following cell has printed "No errors". If it has <b>not</b> printed "No errors", then your code has a bug in it and has thrown an error! Make sure you fix this error before turning in your problem set.</div>

In [None]:
print("No errors!")