---
title: "Bayes' Theorem"
subtitle: "Bayesian Statistics: From Concept to Data Analysis"
subject: "Bayesian Statistics"
description: "Outline of probability"
categories:
  - Bayesian Statistics
keywords:  
  - Conditional Probability
  - Bayes' theorem

---


# Bayes' Theorem {#sec-bayes-theorem}



## Conditional Probability {#sec-conditional-probability}

![conditional probability](images/c1l02-ss-01-conditional-probability.png){.column-margin}

$$
P(A \mid B)=\frac{P(A \cap B)}{P(B)}
$$ {#eq-conditional-probability}

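Two events $A$ and $B$ are independent when conditioning on $B$ does not change the probability of $A$, in which case the joint probability factorizes: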

$$
P(A \mid B) = P(A) \implies P(A \cap B) = P(A)P(B)
$$ {#eq-conditional-independence}
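The implication follows in one step from the definition of conditional probability. A short derivation, assuming $P(B) > 0$ so that the conditional probability is defined:

$$
\begin{aligned}
P(A \cap B) &= P(A \mid B)\,P(B) && \text{(rearranging the definition)} \\
            &= P(A)\,P(B)        && \text{(substituting } P(A \mid B) = P(A)\text{)}
\end{aligned}
$$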

::: {#exm-conditional-proability-students}
### Conditional Probability Example - Female CS Student

Suppose there are 30 students, 9 of whom are female. Of the 30 students, 12 are computer science majors. 4 of those 12 computer science majors are female.

We want to estimate the probability that a student is female given that the student is a computer science major.

We start by writing the above in the language of probability, converting frequencies to probabilities. We begin with the marginal probabilities.

First, the probability of a student being female, from the data given above:

$$P(Female) = \frac{9}{30} = \frac{3}{10}$$

Next, we estimate the probability of a student being a computer science major again just using the data given above.

$$P(CS) = \frac{12}{30} = \frac{2}{5}$$

Next, we can estimate the joint probability, i.e. the probability of being female and being a CS major. Again we have been given the numbers in the data above.

$$P(F\cap CS) = \frac{4}{30} = \frac{2}{15}$$

Finally, we can use the definition of conditional probability and substitute the values above:

$$P(F \mid CS) = \frac{P(F \cap CS)}{P(CS)} = \frac{2/15}{2/5} = \frac{1}{3}$$
:::
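The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using exact fractions; the variable names are illustrative and not part of the original notes.

```python
from fractions import Fraction

# Counts taken from the example above.
n_students = 30
n_female = 9
n_cs = 12
n_female_and_cs = 4

# Marginal and joint probabilities as exact fractions.
p_female = Fraction(n_female, n_students)                 # 3/10
p_cs = Fraction(n_cs, n_students)                         # 2/5
p_female_and_cs = Fraction(n_female_and_cs, n_students)   # 2/15

# Definition of conditional probability: P(F | CS) = P(F and CS) / P(CS).
p_female_given_cs = p_female_and_cs / p_cs

print(p_female_given_cs)  # 1/3
```

Using `Fraction` rather than floats keeps the result in the same form as the hand calculation.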

An intuitive way to think about a conditional probability is that we are looking at a sub-segment of the original population and asking a probability question within that segment. For example, among the 18 students who are not computer science majors, 5 are female:

$$P(F \mid CS^c) = \frac{P(F\cap CS^c)}{P(CS^c)} = \frac{5/30}{18/30} = \frac{5}{18}$$
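The "sub-segment" view can be made concrete by rebuilding the population and filtering it. The sketch below assumes a simple list-of-tuples representation and a hypothetical helper `p_female_given`; only the counts come from the example.

```python
from fractions import Fraction

# Rebuild the 30 students from the counts in the example.
# Each student is a pair (is_female, is_cs).
students = (
    [(True, True)] * 4       # female, CS major
    + [(False, True)] * 8    # not female, CS major
    + [(True, False)] * 5    # female, not a CS major
    + [(False, False)] * 13  # not female, not a CS major
)

def p_female_given(is_cs):
    """Share of females inside the segment selected by CS status."""
    segment = [female for female, cs in students if cs == is_cs]
    return Fraction(sum(segment), len(segment))

print(p_female_given(True))   # 1/3
print(p_female_given(False))  # 5/18
```

Conditioning corresponds to the filter: the denominator shrinks from all 30 students to only those in the chosen segment.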

Two events are **independent** when the occurrence of one does not change the probability of the other.

$$A \perp \!\!\! \perp B \iff P(A \mid B) = P(A)$$

It doesn't matter that B occurred.

If two events are independent then the following is true:

$$A \perp \!\!\! \perp B \iff P(A\cap B) = P(A)P(B) $$ {#eq-def-independence}

This can be derived from the conditional probability equation.
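As a quick check, the product rule can be tested on the student counts from the example above. This is an illustrative sketch; the helper name `is_independent` is an assumption, not something defined in the notes.

```python
from fractions import Fraction

def is_independent(p_a, p_b, p_a_and_b):
    """Return True when P(A and B) equals P(A) * P(B) exactly."""
    return p_a_and_b == p_a * p_b

# Probabilities from the student example: 9 female, 12 CS, 4 both, out of 30.
p_female = Fraction(9, 30)
p_cs = Fraction(12, 30)
p_female_and_cs = Fraction(4, 30)

print(p_female * p_cs)      # 3/25
print(p_female_and_cs)      # 2/15
print(is_independent(p_female, p_cs, p_female_and_cs))  # False
```

Since $3/25 \neq 2/15$, being female and being a CS major are not independent in this class. Exact equality is only meaningful with exact fractions; with floating-point estimates a tolerance would be needed.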

### Inverting Conditional Probabilities

Suppose we don't know $P(A \mid B)$ but we do know the inverse probability $P(B \mid A)$. We can then rewrite $P(A \mid B)$ in terms of $P(B \mid A)$:

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}$$ {#eq-bayes-rule}
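To see the inversion at work, the sketch below applies this formula to the student example, taking $A$ = "CS major" and $B$ = "female", and recovers $P(CS \mid F)$ from $P(F \mid CS)$. The function name `bayes_two_events` is hypothetical.

```python
from fractions import Fraction

def bayes_two_events(p_b_given_a, p_a, p_b_given_not_a):
    """P(A | B) from P(B | A), P(A) and P(B | A^c)."""
    numerator = p_b_given_a * p_a
    # Denominator: P(B) expanded over A and its complement.
    evidence = numerator + p_b_given_not_a * (1 - p_a)
    return numerator / evidence

p_cs_given_female = bayes_two_events(
    p_b_given_a=Fraction(1, 3),       # P(F | CS)
    p_a=Fraction(2, 5),               # P(CS)
    p_b_given_not_a=Fraction(5, 18),  # P(F | CS^c)
)

print(p_cs_given_female)  # 4/9
```

The result agrees with a direct count: 4 of the 9 female students are CS majors.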

::: {#exm-conditional-proability-hiv}
### Conditional Probability Example - ELISA HIV test

:::: {.content-visible unless-format="pdf"}

{{< video https://www.youtube.com/watch?v=R13BD8qKeTg&t=265s >}}

::::

Let's look at an example of an early test for HIV antibodies known as the ELISA test.

- The test has a true positive rate of 0.977.
- It has a true negative rate of 0.926.
- The incidence of HIV in North America is 0.0026.

Now we want to know the probability of an individual having the disease given that they tested positive, $P(HIV \mid +)$.

This is the inverse probability of the true positive, so we will need to use Bayes' theorem.

We start by encoding the above using mathematical notation, so we know what to substitute into Bayes' theorem.

The true positive rate is:

$$P(+ \mid HIV) = 0.977$$

The true negative rate is:

$$P(- \mid NO\_HIV) = 0.926$$

The probability of someone in North America having this disease is:

$$P(HIV) = 0.0026$$

What we want is $P(HIV \mid +)$:

$$
\begin{aligned}
P(HIV \mid +) &= \frac{P(+ \mid HIV)P(HIV)}{P(+ \mid HIV)P(HIV) + P(+ \mid NO\_HIV)P(NO\_HIV)}  \\ &= \frac{(0.977)(0.0026)}{(0.977)(0.0026) + (1-0.926)(1-0.0026)}  \\ &\approx 0.033
\end{aligned}
$$

This is a bit of a **surprise**: although the test's true positive and true negative rates both exceed 90%, a person who tests positive once has only about a 3% probability of actually having HIV. How is this possible?

What happens in Bayes' law is that we are updating probabilities. Since we started with such a low prior probability of 0.0026, Bayesian updating only brings it up to about 0.03.

More generally, if the events $A_1, \ldots, A_n$ partition the sample space, Bayes' theorem reads:

$$
P(A_1 \mid B) = \frac{P(B \mid A_1)P(A_1)}{\sum_{i=1}^{n} P(B \mid A_i)P(A_i)}
$$
:::
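The ELISA calculation is easy to reproduce numerically. The sketch below uses the three rates from the example, with the false positive rate taken as $1 - 0.926$; the function name and the second-test scenario are illustrative assumptions, not part of the original example.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(disease | +) for a binary test, via Bayes' theorem."""
    false_positive_rate = 1.0 - specificity   # P(+ | no disease)
    numerator = sensitivity * prior           # P(+ | disease) P(disease)
    evidence = numerator + false_positive_rate * (1.0 - prior)  # P(+)
    return numerator / evidence

sensitivity = 0.977   # P(+ | HIV), from the example
specificity = 0.926   # P(- | NO_HIV), from the example
prevalence = 0.0026   # P(HIV), from the example

after_one = posterior_given_positive(prevalence, sensitivity, specificity)
print(round(after_one, 3))  # 0.033

# ASSUMPTION (not in the source): a second positive result whose errors are
# conditionally independent of the first, so the first posterior can serve
# as the new prior.
after_two = posterior_given_positive(after_one, sensitivity, specificity)
print(after_two > after_one)  # True
```

Under that independence assumption, each further positive result moves the probability upward again, which is the sense in which Bayes' theorem "updates" a prior.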

Note: @McElreath2020Rethinking discusses how this result can be presented in a less surprising way.

## Bayes' theorem

![Bayes theorem](images/c1l01-ss-02-bayes-theorem.png){.column-margin style=".column-margin"}

Here are a few formulations of Bayes' theorem. We use $H$ to indicate our hypothesis and $E$ as the evidence or data.

We use the definition of conditional probability:

$$P(A \mid B) = \frac{ P(A \cap B)}{P(B)} \quad \text{conditional probability}$$

$$
\begin{aligned}
{\color{orange} \overbrace{\color{orange} P(H|E)}^{\text{Posterior}}} &= \frac{  {\color{pink} \overbrace{\color{pink} P(H \cap E)}^{\text{Joint}}}  } {  {\color{green} \underbrace{{\color{green} P(\text{E})}}_{\text{Marginal Evidence}}} } \\ &= \frac{  {\color{red} \overbrace{\color{red} P (\text{H})}^{\text{Prior}}} \cdot  {\color{blue} \overbrace{\color{blue} P (E \mid H)}^{\text{Likelihood}}} } { {\color{green} \underbrace{{\color{green} P(E)}}_{\text{Marginal Evidence}}} } \\ &= \frac{  {\color{red} \overbrace{\color{red} P (H)}^{\text{Prior}}} \cdot {\color{blue} \overbrace{\color{blue} P (E \mid H)}^{\text{Likelihood}}} }{  {\color{green} \underbrace{\color{green} P(E \mid H) P(H) + P(E \mid H^c) P(H^c)  }_{\text{Marginal Evidence}}} }
\end{aligned}
$$

We can extend Bayes' theorem to cases with multiple mutually exclusive events:

![mutually exclusive events](images/total-probability.jpg){.column-margin}

If $H_1, \ldots, H_n$ are mutually exclusive events whose probabilities sum to 1:

$$\begin{aligned} P(H_1 \mid E) &= \frac{P(E \mid H_1)P(H_1)}{P(E \mid H_1)P(H_1)+\cdots+P(E \mid H_n)P(H_n)} \\ &= \frac{P(E \mid H_1)P(H_1)}{\sum_{i=1}^{n} P(E \mid H_i)P(H_i)} \end{aligned} $$

where we used the law of total probability in the denominator.

If $\{B_i\}$ is a finite or countably infinite partition of a sample space, then

$$  P(A) = \sum_{i} P(A \cap B_i) = \sum_{i} P(A \mid B_i)P(B_i)$$

In compact form, Bayes' theorem is:

$${\color{orange} P (\text{H} \mid \text{E})} = \frac {{\color{red} P(\text{H})} \times {\color{blue}P(\text{E} \mid \text{H})}} {\color{gray} {P(\text{E})}}$$

Or, naming the roles of each term:

$${\color{orange} \overbrace{\color{orange} P (\text{Unknown} \mid \text{Data})}^{\text{Posterior}}} = \frac {{\color{red} \overbrace{\color{red} P (\text{Unknown})}^{\text{Prior}}} \times {\color{blue} \overbrace{\color{blue} P (\text{Data} \mid \text{Unknown})}^{\text{Likelihood}}}} {{\color{green} \underbrace{{\color{green} P(\text{Data})}}_{\text{Average likelihood}}}}$$
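The multi-hypothesis form translates directly into code: multiply each prior by its likelihood, then divide by the sum, which is the law of total probability. The sketch below uses made-up priors and likelihoods for three hypotheses purely for illustration; none of these numbers come from the notes.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Posterior P(H_i | E) for mutually exclusive, exhaustive hypotheses."""
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1")
    # Numerators P(E | H_i) P(H_i), one per hypothesis.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Law of total probability: P(E) = sum_i P(E | H_i) P(H_i).
    evidence = sum(joint)
    return [j / evidence for j in joint]

# HYPOTHETICAL inputs: three hypotheses H_1, H_2, H_3.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]           # P(H_i)
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]  # P(E | H_i)

post = posterior(priors, likelihoods)
print([str(p) for p in post])  # ['5/17', '6/17', '6/17']
print(sum(post))               # 1
```

The denominator is the same for every hypothesis, so it acts only as a normalizing constant that makes the posterior probabilities sum to 1.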

The following video explains Bayes' theorem.


{{< video https://youtu.be/UrWnE9zn94k >}}


## Bayes' Theorem for continuous distributions

When dealing with a continuous random variable $\theta$, we can write the conditional density for $\theta$ given $y$ as:

$$f(\theta \mid y) =\frac{f(y \mid \theta)f(\theta)}{\int f(y \mid \theta) f(\theta) \, d\theta }$$ {#eq-bayes-continuous}

This expression does the same thing that the versions of Bayes' theorem from Lesson 2 do. Because $\theta$ is continuous, we integrate over all possible values of $\theta$ in the denominator rather than take the sum over these values. The continuous version of Bayes' theorem will play a central role from Lesson 5 on.
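One way to build intuition for the integral in the denominator is to approximate it on a grid. The sketch below is an illustration under assumptions that are not in the notes: a Uniform(0, 1) prior on $\theta$, a binomial likelihood with $y = 7$ successes in $n = 10$ trials, and a trapezoid rule for the integral.

```python
import math

def trapezoid(values, step):
    """Trapezoid-rule integral of equally spaced function values."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))

def grid_posterior(y, n, num_points=2001):
    """Approximate f(theta | y) on a grid over [0, 1]."""
    step = 1.0 / (num_points - 1)
    thetas = [i / (num_points - 1) for i in range(num_points)]
    prior = [1.0] * num_points  # ASSUMED Uniform(0, 1) prior density
    # ASSUMED binomial likelihood f(y | theta).
    likelihood = [math.comb(n, y) * t**y * (1.0 - t) ** (n - y) for t in thetas]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    # Denominator of Bayes' theorem: integral of f(y | theta) f(theta) d theta.
    evidence = trapezoid(unnormalized, step)
    density = [u / evidence for u in unnormalized]
    return thetas, density, step

thetas, density, step = grid_posterior(y=7, n=10)
posterior_mean = trapezoid([t * d for t, d in zip(thetas, density)], step)
print(round(posterior_mean, 3))  # 0.667
```

For this particular choice the exact posterior is a Beta(8, 4) distribution with mean $8/12$, so the grid result can be checked against a known answer.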

![Rev. Thomas Bayes by Mark Riehl](images/bio-Thomas-Bayes.png){.column-margin}

::: callout-tip
### Historical Note on The Reverend Thomas Bayes

Bayes' rule is due to Thomas Bayes (1701-1761), an English statistician, philosopher, and Presbyterian minister. Bayes never published what would become his most famous accomplishment; his notes were edited and published posthumously by Richard Price.
:::