# Bayesian Statistics

## Revisiting Bayes' Theorem

Basic conditional probability: 

$p(B|A)=\frac{p(B\cap A)}{p(A)}$ (1)

In general $p(B|A) \ne p(A|B)$, but the joint probability is symmetric, $p(B \cap A) = p(A \cap B)$, so swapping the roles of $A$ and $B$ gives:

$p(A|B)=\frac{p(B\cap A)}{p(B)}$ (2)

Solving (1) for $p(B \cap A)$ and substituting the result into (2), we obtain Bayes' rule:

$p(A|B)=\frac{p(B|A)p(A)}{p(B)}$ (3)

Now, let's assume that our model parameters are $\theta$ and our dataset is $y$:

$p({\boldsymbol{\theta }}|{\boldsymbol{y}})=\frac{p({\boldsymbol{y}}|{\boldsymbol{\theta }})p({\boldsymbol{\theta }})}{p({\boldsymbol{y}})}$ (4)

Because $p(y)$ does not depend on $\theta$, we often work with the unnormalized posterior, which is proportional to the fully normalized one:

$p({\boldsymbol{\theta }}|{\boldsymbol{y}})\propto p({\boldsymbol{y}}|{\boldsymbol{\theta }})p({\boldsymbol{\theta }})$ (5)
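
To make the link between (4) and (5) explicit, consider the discrete case. Assuming a finite set of hypotheses $\theta_1, \dots, \theta_K$ (notation introduced here only for illustration), the normalizing constant is the total probability of the data, summed over all hypotheses:

$p({\boldsymbol{y}})=\sum_{i=1}^{K} p({\boldsymbol{y}}|\theta_i)\,p(\theta_i)$

Dividing the right-hand side of (5) by this sum recovers (4), which is why normalization can be left as the final step of an update.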

Now, we can break this down into different terms:

- $p(\theta)$ is the **prior**: the probability of the model (hypothesis) before seeing the data
- $p(\theta | y)$ is the **posterior**: the probability of the model (hypothesis) after seeing the data
- $p(y | \theta)$ is the **likelihood**: the probability of the data given a hypothesis

**_So, what does this mean in terms of solving a statistical problem?_**

- **Prior:** encode any background information or assumptions we have
- **Likelihood:** use the data we collect to compute the probability of that data under each hypothesis
- **Posterior:** the result of a **Bayesian update** (combining the prior probabilities with new data to compute the current probability)

## Bayesian statistics treats models as non-permanent (ever-evolving) distributions

Benefits:

1. Prior information is captured explicitly
2. Updating with new data can improve the accuracy of the model
3. Models are interpretable
4. Probabilities can be assigned to one-off events, not only to long-run frequencies

Drawbacks:

1. Results depend on the prior assumptions
2. The setup is more complex

![Bayesian statistics workflow](bayesian_cycle.png)

*Figure 1: The Bayesian statistics workflow is similar to the way people naturally approach research problems.*

## Updates via Bayes Tables

> Suppose there are two bowls of cookies.
>
> * Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. 
>
> * Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.
>
> Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from Bowl 1?

Let's set this up!  Here are the steps:

1. What are our variables and how can we relate them?  In other words, what information do we have, and what are we trying to figure out?

2. Tabularize our information:

3. Multiply to obtain the un-normalized posterior:

4. Normalize the posterior by dividing by the total probability so far:
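
The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the dictionary names (`prior`, `likelihood`, `unnorm`, `posterior`) are chosen here for readability and are not part of the original problem statement. Exact fractions are used so the table can be checked by hand.

```python
from fractions import Fraction

# Step 1: the hypotheses are the two bowls; the data is "the cookie is vanilla".
hypotheses = ["Bowl 1", "Bowl 2"]

# Step 2: tabulate the prior p(bowl) and the likelihood p(vanilla | bowl).
prior = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
likelihood = {"Bowl 1": Fraction(30, 40), "Bowl 2": Fraction(20, 40)}

# Step 3: multiply to obtain the unnormalized posterior.
unnorm = {h: prior[h] * likelihood[h] for h in hypotheses}

# Step 4: normalize by dividing by the total probability of the data.
total = sum(unnorm.values())
posterior = {h: unnorm[h] / total for h in hypotheses}

# Print the Bayes table row by row.
print("hypothesis  prior  likelihood  unnorm  posterior")
for h in hypotheses:
    print(h, prior[h], likelihood[h], unnorm[h], posterior[h])
```

Under these numbers the posterior probability of Bowl 1 works out to 3/5, so seeing a vanilla cookie shifts belief toward the bowl with more vanilla cookies.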

## The power of Bayesian statistics!

### The Monty Hall Problem

The Monty Hall problem (one of the most contentious problems in probability!) is based on a game show called *Let's Make a Deal*. If you are a contestant on the show, here's how the game works:
1. The host, Monty Hall, shows you three closed doors -- numbered 1, 2, and 3 -- and tells you that there is a prize behind each door.
2. One prize is valuable (traditionally a car), the other two are less valuable (traditionally goats).
3. The object of the game is to guess which door has the car. If you guess right, you get to keep the car.

![Monty hall problem illustrated](monty_hall.png)

_Figure 2: The Monty Hall problem illustrated._

**The question:** Suppose you pick Door 1. Before opening the door you chose, Monty opens Door 3 and reveals a goat. Then Monty offers you the option to stick with your original choice or switch to the remaining unopened door. _To maximize your chance of winning the car, should you stick with Door 1 or switch to Door 2?_

### Bayes to the rescue! The solution to the Monty Hall problem:

To answer this question, we have to make some assumptions about the behavior of the host:

1.  Monty always opens a door and offers you the option to switch.

2.  He never opens the door you picked or the door with the car.

3.  If you choose the door with the car, he chooses one of the other
    doors at random.

We start with three hypotheses: the car might be behind Door 1, 2, or 3. 

According to the statement of the problem, the prior probability for each door is 1/3:

We also have new data: Monty opened Door 3 and revealed a goat. Let's convert this data into the probability of observing it under each hypothesis (the likelihood):

Now, we can perform a Bayesian update (compute the unnormalized posterior and normalized posterior):

**Our conclusion:** So, should we stick with Door 1 or switch to Door 2?
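
The update described above can be sketched with a small Bayes table. This is a minimal illustration that relies on the three assumptions about Monty's behavior listed earlier; the variable names are chosen here for readability.

```python
from fractions import Fraction

doors = ["Door 1", "Door 2", "Door 3"]

# Prior: the car is equally likely to be behind each door.
prior = {d: Fraction(1, 3) for d in doors}

# Likelihood: p(Monty opens Door 3 and shows a goat | car location),
# given that we picked Door 1.
likelihood = {
    "Door 1": Fraction(1, 2),  # car is behind our door: Monty picks Door 2 or 3 at random
    "Door 2": Fraction(1),     # Monty cannot open Door 1 (ours) or Door 2 (the car)
    "Door 3": Fraction(0),     # Monty never opens the door with the car
}

# Bayesian update: multiply, then normalize.
unnorm = {d: prior[d] * likelihood[d] for d in doors}
total = sum(unnorm.values())
posterior = {d: unnorm[d] / total for d in doors}

for d in doors:
    print(d, prior[d], likelihood[d], unnorm[d], posterior[d])
```

Under these assumptions the posterior is 1/3 for Door 1 and 2/3 for Door 2, so switching doubles the chance of winning the car.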

## Distribution functions

**pmf: Probability mass function (discrete), $f_X(x) = p(X = x)$**

**pdf: Probability density function (continuous), $f_X(x) = \frac{dF_X(x)}{dx}$**

**cdf: Cumulative distribution function, $F_X(x) = p(X \le x)$**
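
To see how these three functions relate, here is a short sketch. The two distributions used (a fair six-sided die and an exponential distribution with rate 1) are examples chosen for illustration and do not come from the text above. The discrete cdf is built by summing the pmf, and the continuous pdf is compared against a numerical derivative of the cdf.

```python
import math
from fractions import Fraction

# Discrete case: a fair six-sided die (illustrative example).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf_discrete(x):
    """F_X(x) = p(X <= x), obtained by summing the pmf up to x."""
    return sum(p for face, p in pmf.items() if face <= x)

# Continuous case: exponential distribution with rate 1 (illustrative example).
def cdf_exp(x):
    """F_X(x) = 1 - exp(-x) for x >= 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def pdf_exp(x):
    """f_X(x) = exp(-x) for x >= 0, the derivative of cdf_exp."""
    return math.exp(-x) if x >= 0 else 0.0

def numerical_derivative(F, x, h=1e-6):
    """Central-difference approximation of dF/dx."""
    return (F(x + h) - F(x - h)) / (2 * h)

print(cdf_discrete(3))  # p(X <= 3) for the die
print(pdf_exp(1.0), numerical_derivative(cdf_exp, 1.0))  # should nearly agree
```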

## Bayesian estimation example

I often see rabbits in the garden behind my house, but it’s not easy to tell them apart, so I don’t really know how many there are.

Suppose I deploy a motion-sensing camera trap that takes a picture of the first rabbit it sees each day. After three days, I compare the pictures and conclude that two of them are the same rabbit and the other is different.

How many rabbits visit my garden?

To answer this question, we have to think about the prior distribution and the likelihood of the data:

- I have sometimes seen four rabbits at the same time, so I know there are at least that many. I would be surprised if there were more than 10. So, at least as a starting place, I think a uniform prior from 4 to 10 is reasonable.

- To keep things simple, let’s assume that all rabbits who visit my garden are equally likely to be caught by the camera trap in a given day. Let’s also assume it is guaranteed that the camera trap gets a picture every day.

1. Set up the prior:

2. Incorporate the data into a likelihood. Our data is that two of the images show the same rabbit and the third shows a different one. For a given number of rabbits $N$, the probability of this observation is the product of two probabilities, $\frac{1}{N}\cdot\frac{N-1}{N}=\frac{N-1}{N^2}$:
    1. That the second image shows the same rabbit as the first: $1/N$ (uniform probability), where $N$ is the number of rabbits.
    2. That the third image shows a different rabbit from the first two: $\frac{N-1}{N}$ (the complement of the uniform probability).

Normally, the likelihood comes from experimental data, but in this case we compute it from this simple model for each candidate value of $N$:

3. Now, perform our Bayesian update:
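
Steps 1 to 3 can be sketched as follows. This is a minimal standard-library illustration of the setup described above (uniform prior over 4 to 10 rabbits, likelihood $\frac{N-1}{N^2}$); the variable names are chosen here for readability.

```python
from fractions import Fraction

# Step 1: uniform prior over N = 4, ..., 10 rabbits.
hypotheses = range(4, 11)
prior = {n: Fraction(1, len(hypotheses)) for n in hypotheses}

# Step 2: likelihood of "second image same as first, third image different",
# i.e. (1/N) * ((N-1)/N) for each candidate N.
likelihood = {n: Fraction(1, n) * Fraction(n - 1, n) for n in hypotheses}

# Step 3: Bayesian update (multiply, then normalize).
unnorm = {n: prior[n] * likelihood[n] for n in hypotheses}
total = sum(unnorm.values())
posterior = {n: unnorm[n] / total for n in hypotheses}

for n in hypotheses:
    print(n, round(float(posterior[n]), 3))
```

In this sketch the posterior decreases steadily from $N=4$ to $N=10$, so the smallest number of rabbits is the most probable, although no single value dominates.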

Let's plot the posterior to see which numbers of rabbits are most probable given our data:

Just as with frequentist estimation, we want to quantify how confident we can be. The Bayesian counterpart of a confidence interval is the **credible interval**: a 90% credible interval is bounded by the values that enclose the middle 90% of the posterior probability, for instance:
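
One way to compute such an interval for the rabbit example is sketched below. The helper names (`posterior_rabbits`, `quantile`, `credible_interval`) are hypothetical and introduced only for this illustration; the quantile convention used (smallest value whose cumulative probability reaches the target) is one common choice for discrete distributions, not the only one.

```python
from fractions import Fraction

def posterior_rabbits(low=4, high=10):
    """Posterior over N with a uniform prior on [low, high] and likelihood (N-1)/N^2."""
    ns = range(low, high + 1)
    unnorm = {n: Fraction(1, len(ns)) * Fraction(n - 1, n * n) for n in ns}
    total = sum(unnorm.values())
    return {n: p / total for n, p in unnorm.items()}

def quantile(pmf, q):
    """Smallest value whose cumulative probability is at least q."""
    cumulative = 0
    for value in sorted(pmf):
        cumulative += pmf[value]
        if cumulative >= q:
            return value
    return max(pmf)

def credible_interval(pmf, mass=0.9):
    """Bounds of the central interval holding `mass` of the probability."""
    tail = (1 - mass) / 2
    return quantile(pmf, tail), quantile(pmf, 1 - tail)

posterior = posterior_rabbits()
interval = credible_interval(posterior, 0.9)
print(interval)
```

With the uniform prior from 4 to 10, this sketch returns the interval from 4 to 10, which covers every value the prior allowed.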

**What can we draw from this?**

1. Our credible interval is very wide. We probably need more data.
2. With so little data, the posterior is sensitive to the prior: a different prior would give a noticeably different posterior. The uniform prior we used is called an uninformative prior.