# Introduction to Bayesian Statistics


---
<a id='properties'></a>
### Properties of Probability

**Joint Probability**

The joint probability of two events, $A$ and $B$, is

### $$ P(A \cap B) = P(A|B) \; P(B) = P(B|A) \; P(A) $$

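If we want to know the probability that both $A$ and $B$ happen, we can multiply the probability that $B$ happens by the probability that $A$ happens given that $B$ has happened. By symmetry, we can equally multiply the probability of $A$ by the probability of $B$ given $A$.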

**Conditional Probability**

The probability of an event that is conditional on another event is written using a vertical bar between the two events. The probability of event $A$ occurring _given_ that event $B$ occurs is calculated as:

### $$ P(A | B) = \frac{P(A \cap B)}{P(B)} $$

This is the probability of both $A$ and $B$ occurring, divided by the probability that $B$ occurs at all. It is defined only when $P(B) > 0$.
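To make the two formulas above concrete, here is a minimal sketch that counts outcomes for two fair dice. The events $A$ (the dice sum to 8) and $B$ (the first die is even) are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


# Hypothetical events chosen for illustration.
def A(o):
    return o[0] + o[1] == 8  # the two dice sum to 8


def B(o):
    return o[0] % 2 == 0  # the first die is even


p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))

# Conditional probabilities from the definition P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

print("P(A and B) =", p_a_and_b)
print("P(A|B)     =", p_a_given_b)
print("P(B|A)     =", p_b_given_a)
```

Using `Fraction` keeps the arithmetic exact, so both factorizations of the joint probability, $P(A|B)\,P(B)$ and $P(B|A)\,P(A)$, can be compared without floating-point error.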

<a id='bayes-rule'></a>
## Bayes' Theorem

---

Bayes' theorem relates the probability of $A$ given $B$ to the probability of $B$ given $A$. This rule is critical for performing statistical inference, as we'll see shortly. It's formulated as:

### $$ P(A|B) = \frac{P(B|A)\;P(A)}{P(B)} $$

Let's return to the courtroom example.

Say $A$ is the event that the suspect is guilty.

$B$ is the event that the suspect's wallet was found at the scene of the crime.

In words, Bayes' theorem says: the probability that the suspect is guilty given that the suspect's wallet was found at the scene of the crime, $P(A|B)$, equals the probability that the wallet would be found there if the suspect were guilty, $P(B|A)$, times the probability that the suspect is guilty before considering any evidence, $P(A)$, divided by the total probability that the wallet is found at the scene of the crime, $P(B)$.
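The courtroom calculation can be sketched numerically. Every number below is invented purely to illustrate the arithmetic, and the sketch assumes that $P(B)$ is obtained by summing over the two cases "guilty" and "not guilty" (the law of total probability).

```python
# All probabilities here are hypothetical values, not real data.
p_guilty = 0.01                  # P(A): prior probability of guilt
p_wallet_given_guilty = 0.6      # P(B|A): wallet at the scene if guilty
p_wallet_given_innocent = 0.001  # P(B|not A): wallet at the scene if innocent

# P(B): total probability that the wallet is found at the scene.
p_wallet = (
    p_wallet_given_guilty * p_guilty
    + p_wallet_given_innocent * (1 - p_guilty)
)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_guilty_given_wallet = p_wallet_given_guilty * p_guilty / p_wallet

print(f"P(B)   = {p_wallet:.5f}")
print(f"P(A|B) = {p_guilty_given_wallet:.3f}")
```

With these made-up inputs, a 1% prior rises to roughly 0.86 once the wallet is taken into account, because a wallet at the scene is far more probable under guilt than under innocence.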


<a id='parts'></a>
## Bayes' Theorem in Parts
---

Using the diachronic interpretation of Bayes' theorem, in which our belief in a model is updated over time as new data arrive, we can describe each part with its label, as in our coin flip example above.

### $$P\left(model\;|\;data\right) = \frac{P\left(data\;|\;model\right)}{P(data)}\; P\left(model\right)$$

**The Prior**

### $$ \text{prior} = P\left(model\right) $$

The prior is our belief in the model given no additional information. This model could be as simple as a statistic, such as the mean we're measuring, or a complex regression. 

**The Likelihood**

### $$ \text{likelihood} = P\left(data\;|\;model\right) $$

The likelihood is the probability of observing our data given the model. For example, assuming that a coin is biased toward heads with a probability of heads of 0.9, what is the likelihood that we observe 10 tails and 2 heads in 12 coin flips?

The likelihood is also the quantity at the center of many frequentist methods, such as maximum likelihood estimation.
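The coin question above can be answered with the binomial formula. This is a minimal sketch that assumes the 12 flips are independent and share the same probability of heads; the function name `binomial_likelihood` and the fair-coin comparison are illustrative additions.

```python
from math import comb


def binomial_likelihood(heads, flips, p_heads):
    """P(data | model): probability of `heads` heads in `flips` independent flips."""
    tails = flips - heads
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** tails


# The model from the text: a coin with probability of heads 0.9.
likelihood_biased = binomial_likelihood(heads=2, flips=12, p_heads=0.9)

# A fair coin, added here only as a point of comparison.
likelihood_fair = binomial_likelihood(heads=2, flips=12, p_heads=0.5)

print(f"P(2 heads in 12 | p=0.9) = {likelihood_biased:.3e}")
print(f"P(2 heads in 12 | p=0.5) = {likelihood_fair:.3e}")
```

The data are millions of times more probable under the fair coin than under the heavily biased one, which is exactly the kind of comparison the likelihood is meant to express.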

**The Marginal Probability or Total Probability of the Data**

### $$ \text{marginal probability of data} = P(data) $$

The marginal probability of the data is the probability that our data are observed regardless of what model we choose or believe in. Dividing by this value normalizes the result: it restricts our statement about the model to the context in which the data occurred, and it ensures that the posterior is a true probability distribution (more on this later).
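When only a handful of candidate models are under consideration, the marginal probability can be computed by weighting each model's likelihood by its prior and summing. The sketch below assumes just two hypothetical coin models with equal prior weight, and reuses the data of 2 heads in 12 flips.

```python
from math import comb


def likelihood(heads, flips, p_heads):
    """Binomial probability of the data under a coin with P(heads) = p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


# Hypothetical model space: maps each model's P(heads) to its prior probability.
priors = {0.9: 0.5, 0.5: 0.5}

heads, flips = 2, 12

# P(data) = sum over models of P(data | model) * P(model)
p_data = sum(likelihood(heads, flips, p) * prior for p, prior in priors.items())

print(f"P(data) = {p_data:.6f}")
```

Because the sum runs over every model we are willing to consider, `p_data` does not depend on any single model, which is why it acts purely as a normalizing constant.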

**The Posterior**

### $$ \text{posterior} = P\left(model\;|\;data\right) $$

The posterior is our _updated_ belief in the model given the new data we have observed. Bayesian statistics are all about updating a prior belief we have about the world with new data, so we're transforming our _prior_ belief into this new _posterior_ belief about the world.
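Putting the four parts together, the following sketch updates a prior over a small grid of candidate coin biases. The grid, the uniform prior, and the data (2 heads in 12 flips) are assumptions made for illustration; this is one simple way to carry out the update, not the only one.

```python
from math import comb


def likelihood(heads, flips, p_heads):
    """P(data | model) for a coin with P(heads) = p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)


# Candidate models: P(heads) = 0.0, 0.1, ..., 1.0 (a hypothetical grid).
grid = [i / 10 for i in range(11)]

# Prior: every candidate is equally plausible before seeing data.
prior = {p: 1 / len(grid) for p in grid}

heads, flips = 2, 12

# Numerator of Bayes' theorem for each model: likelihood * prior.
unnormalized = {p: likelihood(heads, flips, p) * prior[p] for p in grid}

# Marginal probability of the data: sum of the numerators over all models.
p_data = sum(unnormalized.values())

# Posterior: divide by P(data) so the values sum to 1.
posterior = {p: value / p_data for p, value in unnormalized.items()}

for p in grid:
    print(f"P(heads)={p:.1f}  prior={prior[p]:.3f}  posterior={posterior[p]:.3f}")
```

After the update, the belief that started out flat is concentrated on the candidates near 0.2, the grid value closest to the observed proportion of heads.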