# Mathematics Behind the Beta-Binomial Distribution

The **beta-binomial** distribution is a discrete probability distribution that arises when the probability of success in each of a fixed, known number of Bernoulli trials is unknown or random. It resembles the **binomial** distribution, except that the binomial distribution fixes the probability of success, whereas the beta-binomial distribution treats it as random: specifically, the probability of success follows a **beta distribution**.
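The two-stage description above can be simulated directly: draw a success probability from a beta distribution, then count successes in $n$ Bernoulli trials with that probability. The sketch below is only illustrative. The parameter values, the function names, and the closed-form mass function ${n \choose k} B(k + \alpha, n - k + \beta) / B(\alpha, \beta)$ used for comparison are assumptions brought in from the standard definition of the beta-binomial distribution, not something derived in this notebook.

```python
import random
from math import comb, exp, lgamma


def log_beta_fn(a: float, b: float) -> float:
    """log B(a, b), using the identity B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """Standard closed form: C(n, k) * B(k + a, n - k + b) / B(a, b)."""
    return comb(n, k) * exp(log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b))


def sample_beta_binomial(n: int, a: float, b: float, rng: random.Random) -> int:
    """Draw q ~ Beta(a, b), then count successes in n Bernoulli(q) trials."""
    q = rng.betavariate(a, b)
    return sum(rng.random() < q for _ in range(n))


# Illustrative values only (not taken from the notebook).
rng = random.Random(0)
n, a, b = 10, 2.0, 5.0
draws = 50_000

counts = [0] * (n + 1)
for _ in range(draws):
    counts[sample_beta_binomial(n, a, b, rng)] += 1

for k in range(n + 1):
    simulated = counts[k] / draws
    exact = beta_binomial_pmf(k, n, a, b)
    print(f"k={k:2d}  simulated={simulated:.4f}  exact={exact:.4f}")
```

The simulated frequencies should track the closed-form values up to sampling noise.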

In this notebook, we will go over some of the mathematics explaining how to update the beta prior to obtain the posterior. In particular, we will explain what it means to say that the beta distribution is the **conjugate prior** to the binomial distribution.

Consider a binomial random variable $X$ that counts the number of successes in $n$ Bernoulli trials. Recall that the probability mass function of a binomial random variable is:

$$p(x) = {n \choose x}q^x (1-q)^{n-x}$$

1. $n$: Total number of trials
2. $x$: Number of successes
3. $q$: Probability of success on each trial
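As a quick check on the formula, here is a minimal sketch that evaluates the binomial probability mass function with the standard library only. The function name `binomial_pmf` and the values $n = 10$, $q = 0.3$ are illustrative choices, not part of the notebook.

```python
from math import comb


def binomial_pmf(x: int, n: int, q: float) -> float:
    """P(X = x) for X ~ Binomial(n, q): C(n, x) * q^x * (1 - q)^(n - x)."""
    return comb(n, x) * q**x * (1 - q) ** (n - x)


# Illustrative values only.
n, q = 10, 0.3
pmf = [binomial_pmf(x, n, q) for x in range(n + 1)]

print(f"P(X = 3) = {pmf[3]:.4f}")
print(f"sum over all x = {sum(pmf):.6f}")  # a valid pmf sums to 1
```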

We want to estimate the distribution of $q$, and we will model it with a beta distribution. The beta distribution has convenient mathematical properties for this purpose, which are partially explained below.

Recall that the probability density function of a beta distribution is given by
$$p(q) = \frac{q^{\alpha - 1} (1-q)^{\beta - 1}}{B(\alpha, \beta)}$$

1. $\alpha, \beta$: hyperparameters of the beta distribution
2. $B(\alpha, \beta)$ is the beta function (you can look this up here: https://en.wikipedia.org/wiki/Beta_function)
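The density can be evaluated without any external library. The sketch below assumes the standard identity $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta) / \Gamma(\alpha + \beta)$ (which the notebook does not state) and works in log space for numerical stability. The function names and the choice $\alpha = 2$, $\beta = 5$ are illustrative.

```python
from math import exp, lgamma, log


def log_beta_fn(a: float, b: float) -> float:
    """log B(a, b), assuming B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_pdf(q: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at q; taken to be zero outside the open interval (0, 1)."""
    if not 0.0 < q < 1.0:
        return 0.0
    return exp((a - 1) * log(q) + (b - 1) * log(1 - q) - log_beta_fn(a, b))


# Illustrative hyperparameters only.
a, b = 2.0, 5.0
print(f"p(0.2) = {beta_pdf(0.2, a, b):.4f}")

# Midpoint-rule check that the density integrates to roughly 1 over [0, 1].
m = 10_000
total = sum(beta_pdf((i + 0.5) / m, a, b) for i in range(m)) / m
print(f"integral over [0, 1] is approximately {total:.6f}")
```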

In Bayesian probability theory, if the posterior distribution is in the same family as the prior distribution, the prior and posterior are called **conjugate distributions**, and the prior is called the **conjugate prior** for the likelihood function.

Generally, we are interested in estimating the probability density function of $q$ given some data: $s$ successes and $f$ failures. In other words, we are interested in the posterior. From here on, $x$ denotes a candidate value of $q$ rather than the number of successes. Recall from Bayes' rule that the posterior is given by:

$$P(q = x | s, f) = \frac{P(s, f | x) P(x)}{\int P(s, f | x)P(x) dx}$$

Notice that $P(s, f | x)$, the likelihood function, is the binomial probability mass function, and the prior $P(x)$ is the beta density:

$$P(s, f | q = x) = {s + f \choose s}x^s (1-x)^{f}$$

$$P(x) = \frac{x^{\alpha - 1} (1-x)^{\beta - 1}}{B(\alpha, \beta)}$$

These are the only two terms we need to substitute into the expression for $P(q = x | s, f)$. Let's do this substitution to see what the posterior distribution looks like!

$$P(q = x | s, f) = \frac{P(s, f | x) P(x)}{\int P(s, f | x)P(x) dx}$$

$$ = \frac{{s + f \choose s}x^s (1-x)^{f} x^{\alpha - 1} (1-x)^{\beta - 1} / B(\alpha, \beta)}{\int P(s, f | x)P(x) dx}$$

$$ = \frac{{s + f \choose s}x^{s + \alpha - 1} (1-x)^{f + \beta - 1} / B(\alpha, \beta)}{\int {s + f \choose s}x^{s + \alpha - 1} (1-x)^{f + \beta - 1} / B(\alpha, \beta) dx}$$

$$ = \frac{x^{s + \alpha - 1} (1-x)^{f + \beta - 1} / B(\alpha, \beta)}{\int x^{s + \alpha - 1} (1-x)^{f + \beta - 1} / B(\alpha, \beta) dx}$$

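The factors $1 / B(\alpha, \beta)$ cancel, and the remaining integral over $[0, 1]$ is, by definition, the beta function $B(s + \alpha, f + \beta)$, so

$$P(q = x | s, f) = \frac{x^{s + \alpha - 1} (1-x)^{f + \beta - 1}}{B(s + \alpha, f + \beta)}$$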

which is the density of **another beta distribution** with parameters $(\alpha + s, \beta + f)$.
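The derivation can be sanity-checked numerically. The sketch below applies Bayes' rule by brute force on a grid (binomial likelihood times beta prior, normalized by a midpoint-rule approximation of the integral) and compares the result with the closed-form beta density with parameters $(\alpha + s, \beta + f)$. The helper names, the grid size, and the values of $\alpha$, $\beta$, $s$, $f$ are illustrative assumptions.

```python
from math import comb, exp, lgamma, log


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density for 0 < x < 1, using B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    log_b = lgamma(a) + lgamma(b) - lgamma(a + b)
    return exp((a - 1) * log(x) + (b - 1) * log(1 - x) - log_b)


def grid_posterior(s: int, f: int, a: float, b: float, m: int = 20_000):
    """Posterior over a midpoint grid on (0, 1), computed directly from Bayes' rule."""
    xs = [(i + 0.5) / m for i in range(m)]
    # Numerator of Bayes' rule: binomial likelihood times beta prior.
    unnormalized = [comb(s + f, s) * x**s * (1 - x) ** f * beta_pdf(x, a, b) for x in xs]
    # Denominator: midpoint-rule approximation of the integral over [0, 1].
    evidence = sum(unnormalized) / m
    return xs, [u / evidence for u in unnormalized]


# Illustrative prior and data only.
a, b = 2.0, 3.0
s, f = 7, 3

xs, numeric = grid_posterior(s, f, a, b)
max_gap = max(abs(p - beta_pdf(x, a + s, b + f)) for x, p in zip(xs, numeric))
print(f"largest gap between grid posterior and Beta(a + s, b + f): {max_gap:.2e}")
```

The gap should be tiny and shrink as the grid is refined, consistent with the posterior being exactly a beta density.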

Thus, using a beta prior with a binomial likelihood is mathematically convenient: the beta distribution is a **conjugate prior**, so the posterior is also a beta distribution, with the simple update rule $\alpha_{\text{posterior}} = \alpha + s$ and $\beta_{\text{posterior}} = \beta + f$.
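In code, the update rule amounts to two additions. The following is a minimal sketch: the class name `BetaBelief`, the uniform $\text{Beta}(1, 1)$ starting prior, and the observed counts are hypothetical, and the mean formula $\alpha / (\alpha + \beta)$ is the standard mean of a beta distribution rather than something shown in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Hypothetical container for the hyperparameters of a beta distribution."""

    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        """Conjugate update: alpha gains the successes, beta gains the failures."""
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        """Mean of Beta(alpha, beta), i.e. alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)


# Illustrative example: uniform prior, then 7 successes and 3 failures.
prior = BetaBelief(1.0, 1.0)
posterior = prior.update(successes=7, failures=3)
print(posterior, f"mean={posterior.mean:.4f}")

# Updating in two batches gives the same posterior as updating once.
in_batches = prior.update(4, 1).update(3, 2)
print(in_batches == posterior)
```

Because the update only adds counts, the order and batching of the observations do not matter.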