# Binomial model with Beta priors

This notebook plots Beta density curves to show how observing data changes the posterior in the binomial model.

Recall that the density of the $\text{Beta}(\alpha, \beta)$ distribution is defined as

$$
    p(\theta | \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta^{\alpha-1}(1-\theta)^{\beta-1}
$$
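
As a quick check of the formula above, here is a minimal sketch that evaluates the density at a single point using only the standard library. The function name `beta_pdf` is an illustrative choice, not part of this notebook's plotting code, and the sketch assumes $0 < \theta < 1$ and $\alpha, \beta > 0$. It works on the log scale because $\Gamma(\alpha+\beta)$ overflows quickly for large parameters.

```python
import math


def beta_pdf(theta, a, b):
    """Beta(a, b) density at theta (hypothetical helper; needs 0 < theta < 1, a, b > 0)."""
    # Log of the normalising constant Gamma(a + b) / (Gamma(a) * Gamma(b)).
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    # Log of the kernel theta^(a - 1) * (1 - theta)^(b - 1).
    log_kernel = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return math.exp(log_norm + log_kernel)


# Beta(1, 1) is the uniform distribution, so its density is 1 everywhere on (0, 1).
print(beta_pdf(0.5, 1, 1))
print(beta_pdf(0.5, 2, 2))
```

Unlike the grid-based approach used for plotting in this notebook, this evaluates the exact normalising constant rather than normalising numerically.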

In [None]:
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

In [None]:
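def dbeta(a, b, n_samples=10000):
    """
    Evaluate the Beta(a, b) density on a grid of n_samples points in [0, 1],
    normalised numerically so that it integrates to approximately 1.
    Intended for a >= 1 and b >= 1; otherwise the kernel is infinite at an
    endpoint of the grid.
    """
    x = np.linspace(0, 1, n_samples)
    # Kernel of the density; the Gamma-function constant is replaced by the
    # numerical normalisation below.
    y = x ** (a - 1) * (1 - x) ** (b - 1)
    # Rescaling the grid values to have mean 1 is a Riemann-sum approximation
    # to a unit integral over [0, 1].
    return x, y * n_samples / np.sum(y)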


def plot_beta(a, b, n_samples=10000, ax=None):
    """
    Plot the density of the beta distribution with parameters a and b. The
    curve comes from dbeta, so it is normalised to integrate to approximately
    1 and its values can exceed 1.
    """
    if ax is None:
        f, ax = plt.subplots()
    else:
        # Use the figure that owns the supplied axes, which need not be the
        # current figure.
        f = ax.figure
    ax.plot(*dbeta(a, b, n_samples=n_samples))
    return f, ax

## Interpretation 

Assume that $y \sim \mathrm{Binom}(n, \theta)$. We saw that the prior $\theta \sim \mathrm{Beta}(\alpha, \beta)$ then leads to the posterior $\theta \:|\: y \sim \mathrm{Beta}(\alpha+y, \beta+n-y)$.

The uniform distribution on $[0,1]$ can be characterised as $\mathrm{Beta}(1,1)$, and so with a uniform prior, the posterior distribution for $\theta$ having observed $y$ successes from $n$ trials is $\mathrm{Beta}(y+1, n+1-y)$.
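
The update rule above amounts to adding the number of successes to $\alpha$ and the number of failures to $\beta$. A minimal sketch, where the function name `posterior_params` is a hypothetical helper introduced only for illustration:

```python
def posterior_params(alpha, beta, y, n):
    """Parameters of the Beta posterior after y successes in n binomial trials.

    Hypothetical helper: the prior is Beta(alpha, beta).
    """
    if not 0 <= y <= n:
        raise ValueError("y must satisfy 0 <= y <= n")
    # Successes are added to alpha, failures (n - y) to beta.
    return alpha + y, beta + n - y


# Uniform prior Beta(1, 1), four trials, no successes.
print(posterior_params(1, 1, y=0, n=4))
```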

Below we plot $\mathrm{Beta}(1,1)$, $\mathrm{Beta}(1,2)$ and $\mathrm{Beta}(1,5)$, which correspond respectively to the prior, the posterior after observing one unsuccessful trial, and the posterior after observing 4 trials, all of which were unsuccessful. Note that the posterior density of $\theta$ becomes more and more concentrated near $0$, as the data suggest that success is unlikely.

In [None]:
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(18, 6))

plot_beta(1, 1, ax=ax[0])
plot_beta(1, 2, ax=ax[1])
plot_beta(1, 5, ax=ax[2])
ax[0].set_title("Beta(1, 1)", fontdict={"fontsize": 16})
ax[0].set_ylim(0, 1.1)
ax[1].set_title("Beta(1, 2)", fontdict={"fontsize": 16})
ax[2].set_title("Beta(1, 5)", fontdict={"fontsize": 16});

As we observe more data, our uncertainty drops. Here are plots of $\mathrm{Beta}(1+1, 3+1)$, $\mathrm{Beta}(10+1, 30+1)$ and $\mathrm{Beta}(100+1, 300+1)$, the posteriors under a uniform prior. In each case we observe three times as many failures as successes.

In [None]:
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(18, 6))

plot_beta(2, 4, ax=ax[0])
plot_beta(11, 31, ax=ax[1])
plot_beta(101, 301, ax=ax[2])
ax[0].set_title("Beta(2, 4)", fontdict={"fontsize": 16})
ax[1].set_title("Beta(11, 31)", fontdict={"fontsize": 16})
ax[2].set_title("Beta(101, 301)", fontdict={"fontsize": 16});
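
The narrowing seen in these three plots can also be quantified. The sketch below uses the standard closed-form mean $\alpha/(\alpha+\beta)$ and variance $\alpha\beta/\big((\alpha+\beta)^2(\alpha+\beta+1)\big)$ of a Beta distribution; the helper name `beta_mean_sd` is an assumption made for this example.

```python
import math


def beta_mean_sd(a, b):
    """Mean and standard deviation of Beta(a, b) (hypothetical helper)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# The three posteriors plotted above.
for a, b in [(2, 4), (11, 31), (101, 301)]:
    mean, sd = beta_mean_sd(a, b)
    print(f"Beta({a}, {b}): mean = {mean:.3f}, sd = {sd:.3f}")
```

The standard deviation falls from roughly 0.18 to roughly 0.02 across the three cases, while the mean stays between 0.25 and 0.34.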

Let's now imagine we specify an informative prior. We can see visually that as we observe more and more data, the prior's influence reduces.

The plots below show the same data observed under two different priors. The first row shows a uniform prior (top left), then the posterior after observing 10 successes in 40 trials, then the posterior after observing 100 successes in 400 trials. The second row repeats this for a non-uniform $\mathrm{Beta}(10, 2)$ prior, which represents the belief that the probability of success is high. As data is observed, the mass of the distribution quickly moves to lower values of $\theta$, and after 400 trials the two posteriors are visually almost indistinguishable.

In [None]:
f, ax = plt.subplots(nrows=2, ncols=3, figsize=(18, 12))

plot_beta(1, 1, ax=ax[0, 0])
plot_beta(11, 31, ax=ax[0, 1])
plot_beta(101, 301, ax=ax[0, 2])
ax[0, 0].set_title("Beta(1, 1)", fontdict={"fontsize": 16})
ax[0, 0].set_ylim(0, 1.1)
ax[0, 1].set_title("Beta(11, 31)", fontdict={"fontsize": 16})
ax[0, 2].set_title("Beta(101, 301)", fontdict={"fontsize": 16})

plot_beta(10, 2, ax=ax[1, 0])
plot_beta(20, 32, ax=ax[1, 1])
plot_beta(110, 302, ax=ax[1, 2])
ax[1, 0].set_title("Beta(10, 2)", fontdict={"fontsize": 16})
ax[1, 1].set_title("Beta(20, 32)", fontdict={"fontsize": 16})
ax[1, 2].set_title("Beta(110, 302)", fontdict={"fontsize": 16});
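
The shrinking influence of the prior can be checked numerically by comparing posterior means under the two priors. This is a minimal sketch using the posterior mean $(\alpha+y)/(\alpha+\beta+n)$; the names `posterior_mean`, `priors`, `data` and `gaps` are illustrative and not part of the plotting code.

```python
def posterior_mean(alpha, beta, y, n):
    """Mean of the Beta(alpha + y, beta + n - y) posterior (hypothetical helper)."""
    return (alpha + y) / (alpha + beta + n)


priors = {"uniform": (1, 1), "informative": (10, 2)}
# (successes, trials): no data, then the two data sets plotted above.
data = [(0, 0), (10, 40), (100, 400)]

gaps = []
for y, n in data:
    means = {name: posterior_mean(a, b, y, n) for name, (a, b) in priors.items()}
    # Difference between the posterior means under the two priors.
    gap = means["informative"] - means["uniform"]
    gaps.append(gap)
    print(f"n = {n:3d}: uniform = {means['uniform']:.3f}, "
          f"informative = {means['informative']:.3f}, gap = {gap:.3f}")
```

The gap between the two posterior means shrinks with each larger data set, which matches what the plots show.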