# Bayesian Data Analysis

This notebook provides a very simple example of Bayesian parameter estimation using the Beta-Binomial model. Both analytical and simulation-based results are presented.

## Approaches to Data Analysis

><i>"All models are wrong, but some are useful"</i> -- George Box

Statistical analyses usually rest on probabilistic assumptions. Broadly speaking, there are three commonly used levels of assumptions, listed below. Each level adds assumptions to the one before it, which increases the risk that some assumption is wrong; the hope is that the added assumptions are nonetheless useful.

### 1. Exploratory
><p>Descriptive statistics without any probabilistic assumptions
><p>Examples include: Average, Median, Quantiles, Range, Variance, Minimum/Maximum, Histogram, and various other plots and charts
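
As a small illustration of the exploratory level, the summaries listed above can be computed with Python's standard `statistics` module. This is only a sketch: the sample `data` is hypothetical and chosen purely for illustration.

```python
import statistics

# Hypothetical sample, chosen only for illustration
data = [2, 4, 4, 5, 7, 9, 10, 12]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "quartiles": statistics.quantiles(data, n=4),  # three cut points
    "range": max(data) - min(data),
    "variance": statistics.variance(data),  # sample variance (n - 1 denominator)
    "min": min(data),
    "max": max(data),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

None of these quantities requires a probability model; they describe only the sample at hand.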

<i>[Note: Quantities in items 2 & 3, below, can be scalars or vectors.]</i>

### 2. Frequentist
><p>The basic frequentist probability model consists of a random variable or vector (RV), $X$, with a cumulative distribution function, $F$, and fixed, deterministic parameters, $\theta_1, ..., \theta_m$.</p>
><p>$$X \sim F(x;\theta_1, ..., \theta_m)$$</p>
><p>For example, the normal model is $F(x; \mu, \sigma) = \Phi(\frac{x-\mu}{\sigma})$,</p>
><p>where $\Phi$ is the standard normal cumulative distribution function, with density $\frac{d}{dz}\Phi(z) = \phi(z) = \frac{1}{\sqrt {2\pi}} e^{-z^2/2}$
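
The normal example above can be evaluated numerically. The sketch below uses only the standard library and the identity $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$. The function names and the values of `mu` and `sigma` are assumptions made for illustration.

```python
import math

def std_normal_pdf(z):
    """phi(z): density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def std_normal_cdf(z):
    """Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def normal_cdf(x, mu, sigma):
    """F(x; mu, sigma) = Phi((x - mu) / sigma)."""
    return std_normal_cdf((x - mu) / sigma)

# Hypothetical fixed (deterministic) parameter values
mu, sigma = 10.0, 2.0

print(normal_cdf(mu, mu, sigma))          # probability of falling below the mean
print(normal_cdf(mu + sigma, mu, sigma))  # probability of falling below mu + sigma

# Numerical check that the derivative of Phi is phi (central difference)
z, h = 0.5, 1e-5
slope = (std_normal_cdf(z + h) - std_normal_cdf(z - h)) / (2 * h)
print(slope, std_normal_pdf(z))
```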

### 3. Bayesian
><p>The basic Bayesian probability model is similar to the Frequentist model except that it goes a step further and assumes that the parameters, themselves, are RVs with their own cumulative distribution functions and parameters (e.g., $G$ and $\gamma$, resp. below).</p>
><p>For example,</p>
><p>$(X \mid \Theta=\theta) \sim F(x;\theta)$, called the <b>Likelihood Distribution</b></p>
><p>$\Theta \sim G(\theta;\gamma)$, called the <b>Prior Distribution</b></p>
><p>$\gamma$ is called a <b>hyperparameter</b> and is usually deterministic</p>
><p>The next section provides an example.
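
The two-level structure above can be read as a recipe for generating data: first draw the parameter from the prior, then draw the observation from the likelihood given that parameter. A minimal sketch follows, assuming a Beta prior with hypothetical hyperparameters `a` and `b` and a binomial likelihood with `n_trials` trials. These specific choices and names are illustrative only.

```python
import random

rng = random.Random(0)  # seeded so the sketch is reproducible

# Hypothetical hyperparameters gamma = (a, b) and number of trials
a, b = 2.0, 2.0
n_trials = 20

def draw_from_model(rng):
    """Draw one (theta, k) pair from the two-level model."""
    # Theta ~ G(theta; gamma): here a Beta(a, b) prior
    theta = rng.betavariate(a, b)
    # (X | Theta = theta) ~ F(x; theta): here the number of successes
    # in n_trials independent trials, each succeeding with probability theta
    k = sum(rng.random() < theta for _ in range(n_trials))
    return theta, k

draws = [draw_from_model(rng) for _ in range(5)]
for theta, k in draws:
    print(f"theta = {theta:.3f}, successes = {k}")
```

Each run of `draw_from_model` uses a different `theta`, which is the key difference from the frequentist model, where the parameter is a single fixed value.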

To make inferences about a Frequentist or Bayesian probability model of $X$ it is necessary to estimate the parameters of the model, $F$.

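In the Frequentist case, the parameters are typically estimated with the Maximum Likelihood Estimate (MLE): the parameter value that maximizes the likelihood of the observed data. For a given data set, the MLE is a single deterministic value.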

In the Bayesian case, we assume we know the prior distributions of the parameters of $F$, so we seek to understand how observed values of $X$ affect that prior knowledge.  To do this, we need to obtain the conditional probability distribution of the parameters, given the observed data:
><p>$P(\Theta \mid X=x)$, called the <b>Posterior Distribution</b></p>
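
By Bayes' theorem, the posterior is the product of likelihood and prior, normalized so that it integrates to one. As a sketch, writing $f$ and $g$ for the densities of $F$ and $G$ (notation assumed here for illustration) and assuming those densities exist:

$$p(\theta \mid x) = \frac{f(x;\theta)\, g(\theta;\gamma)}{\int f(x;\theta')\, g(\theta';\gamma)\, d\theta'} \propto f(x;\theta)\, g(\theta;\gamma)$$

The integral in the denominator is the term that often has no closed form.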

Depending on the type of likelihood and prior, the analytical derivation of the posterior might be intractable, so simulation is used to approximate it.  A simple example of such a simulation follows.

## Beta-Binomial Example

This is a simple and often-cited example of Bayesian parameter estimation. The posterior distribution can be derived analytically. Moreover, the prior and posterior belong to the same family of distributions, [Beta](https://en.wikipedia.org/wiki/Beta_distribution), so the prior is called a <b>conjugate prior</b> for the likelihood.
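
To sketch why the Beta family is conjugate here, suppose the prior is $\Theta \sim Beta(\alpha, \beta)$ and that $k$ successes are observed in $n$ binomial trials. The symbols $\alpha$ and $\beta$ are assumed names for the hyperparameters. Dropping factors that do not depend on $\theta$:

$$p(\theta \mid k) \propto \theta^{k}(1-\theta)^{n-k} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}$$

This is the kernel of a $Beta(\alpha+k,\ \beta+n-k)$ density, so the posterior stays in the Beta family.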

### The Data

Assume that we've conducted an experiment consisting of $n$ independent Bernoulli trials (a [binomial experiment](https://en.wikipedia.org/wiki/Binomial_distribution)) with an unknown probability of success, $\theta$, and that we've observed $k_{obs}$ successes.

We'll use the following values for this example:

In [1]:
n = 20  # Number of trials
k_obs = 6  # Number of observed successes in n trials

### The Frequentist Binomial Model

Here, the parameter, $\theta$, is an unknown deterministic value.
<p>$$K \sim Binomial_n(k;\theta) \equiv \binom{n}{k} \theta^k(1-\theta)^{n-k}$$</p>
<p>where $n \in \mathbb{N}$, $k \in \{0, ... ,n\}$, and $\theta \in [0,1]$</p>
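
The binomial probability mass function above is easy to evaluate directly, and doing so gives a numerical view of the MLE mentioned earlier. The sketch below uses the same values as the example data (20 trials, 6 successes) under the assumed names `n_trials` and `k_successes`, and finds the maximizing $\theta$ by a simple grid search. The grid search is an illustrative choice; for the binomial model the MLE is known in closed form as $k/n$.

```python
import math

def binomial_pmf(k, n, theta):
    """P(K = k) for n trials with success probability theta."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

# Same values as the example data, under names assumed for this sketch
n_trials, k_successes = 20, 6

# For any fixed theta, the pmf sums to 1 over k = 0, ..., n
total = sum(binomial_pmf(k, n_trials, 0.3) for k in range(n_trials + 1))

# Likelihood of the observed count as a function of theta, on a grid over [0, 1]
grid = [i / 1000 for i in range(1001)]
theta_hat = max(grid, key=lambda t: binomial_pmf(k_successes, n_trials, t))

print(f"sum of pmf over k: {total:.6f}")
print(f"grid MLE: {theta_hat}, closed form k/n: {k_successes / n_trials}")
```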