### Bayes' Theorem

Bayes' theorem is stated as:

$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $

Let's see a proof.

Consider the definition of conditional probability (see the notebook for intuition):

$ P(A|B) = \frac{P(A \cap B)}{P(B)} $

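And the same definition with the roles of $ A $ and $ B $ swapped:

$ P(B|A) = \frac{P(B \cap A)}{P(A)} $

Since intersection is commutative, $ P(B \cap A) = P(A \cap B) $, so:

$ P(B|A) = \frac{P(A \cap B)}{P(A)} $

Multiplying both sides by $ P(A) $:

$ P(A \cap B) = P(B|A)P(A) $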

We can now substitute this into the first equation:

$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $
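
As a sanity check on the proof, here is a minimal Python sketch that compares both sides of the theorem numerically. The joint probabilities in `joint` are made-up illustrative values (an assumption, not taken from this notebook); any four non-negative numbers summing to 1 would do.

```python
# Hypothetical joint distribution over two binary events A and B.
# Keys are (A occurs, B occurs); the values are illustrative and sum to 1.
joint = {
    (True, True): 0.12,    # P(A and B)
    (True, False): 0.18,   # P(A and not B)
    (False, True): 0.28,   # P(not A and B)
    (False, False): 0.42,  # P(not A and not B)
}

# Marginal probabilities, obtained by summing over the other event.
p_a = sum(p for (a, _), p in joint.items() if a)
p_b = sum(p for (_, b), p in joint.items() if b)
p_a_and_b = joint[(True, True)]

# Definition of conditional probability, in both directions.
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Right-hand side of Bayes' theorem.
bayes_rhs = p_b_given_a * p_a / p_b

# The two values should agree up to floating point rounding.
print(p_a_given_b, bayes_rhs)
```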

The terms can be interpreted as follows:

$ P(A) $ : the prior probability. If we know nothing about the situation, we might assume a uniform distribution.

$ P(A|B) $ : the posterior, the probability that $ A $ is true given the evidence $ B $. This is unknown, and it is what we want to find.

$ P(B|A) $ : the probability of observing the evidence if we assume $ A $ is true. This is also known as the likelihood, and it is where a model might be introduced.

$ P(B) $ : the probability of the evidence across all hypotheses.

### An application

Say we have two coins whose heads-to-tails ratios ($ h:t $) are assumed to be:

$ C_1 = 5:5, (h:t) $, which is fair.

$ C_2 = 2:8, (h:t) $, which is unfair.

If we modelled the hypotheses with a distribution rather than discrete events, then the update would change the parameters of that distribution. See the Bayesian Analyses folder for this.

Suppose also that it is a magician who gives us the coin.

---

We can now conduct an experiment:

Choose a coin.

$ H_1 $ : the coin chosen is $ C_1 $

$ H_2 $ : the coin chosen is $ C_2 $

$ E $ : "the evidence" of getting heads, $ h $

And then collect some data with 10 coin flips:

$ h,t,h,h,t,h,t,t,h,h $

$ P(H_1) = 2/10 $, the prior. Since the magician is very suspicious, we give the fair coin a low prior.

$ P(E | H_1) = 5/10 $, the probability of getting heads if we assume the coin chosen is $ C_1 $. This is the likelihood.

$ P(E) = 6/10 $, the probability of getting heads, taken from the 6 heads observed in the 10 flips.

Bayes' Theorem states:

$ P(H_1 | E) = \frac{P(E | H_1)P(H_1)}{P(E)} $

So:

$ P(H_1 | E) = \frac{0.5 \times 0.2}{0.6} $

In [2]:
(.5*.2)/.6

ans =  0.16667
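
The same calculation can be sketched in Python, starting from the recorded flips rather than from hand-counted numbers. The variable names are assumptions introduced for illustration. As in the text, $ P(E) $ is taken to be the observed fraction of heads.

```python
# The 10 recorded flips from the experiment above.
flips = "h,t,h,h,t,h,t,t,h,h".split(",")

p_h1 = 2 / 10          # prior P(H_1): the coin is the fair one, C_1
p_e_given_h1 = 5 / 10  # likelihood P(E | H_1): heads from a 5:5 coin

# P(E), estimated as the observed fraction of heads in the data.
p_e = flips.count("h") / len(flips)

# Bayes' theorem: P(H_1 | E) = P(E | H_1) P(H_1) / P(E)
posterior_h1 = p_e_given_h1 * p_h1 / p_e

print(round(posterior_h1, 5))  # 0.16667
```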


### Finding $ P(B_i|A) $ using Bayes' Theorem

Recall that:

$ P(A|B) = \frac{ P(B|A)P(A) }{ P(B) } $

Swapping the roles of $ A $ and $ B $ gives:

$ P(B|A) = \frac{ P(A|B)P(B) }{ P(A) } $

Imagine a sample space is partitioned by events $ B_i $, which are therefore mutually exclusive and together cover the whole space. Then for each $ B_i $:

$ P(B_i|A) = \frac{ P(A|B_i)P(B_i) }{ P(A) } $

where $ P(A) = \sum_j P(A|B_j)P(B_j) $, the sum taken over every event in the partition. This follows directly from the law of total probability.
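
A minimal Python sketch of this formula for an arbitrary partition follows. It computes $ P(A) $ as the sum of $ P(A|B_j)P(B_j) $ over the partition. The function name `posterior` and the three-event numbers are hypothetical, chosen only for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(B_i | A) for every i.

    priors[i] is P(B_i) and likelihoods[i] is P(A | B_i).
    The B_i are assumed to partition the sample space.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 for a partition")

    # Numerators P(A | B_i) P(B_i), one per event in the partition.
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]

    # Law of total probability: P(A) is the sum of those numerators.
    p_a = sum(joint)

    return [j / p_a for j in joint]


# Hypothetical three-event partition.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.7]
post = posterior(priors, likelihoods)
print(post)
```

Because every posterior shares the same denominator, the returned values sum to 1.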

#### Example

Let's say there is some disease, with the following events:

$ A $ : disease present

$ \bar A $ : disease absent

$ \oplus $ : test positive

$ \ominus $ : test negative

Let there be known probabilities, supplied by experts:

$ P(A) = 0.001 $, the prevalence of the disease in the population

$ P(\bar A) = 0.999 $

$ P(\oplus|A) = 0.99 $

$ P(\ominus| A) = 0.01 $

$ P(\ominus|\bar A) = 0.99 $

$ P(\oplus|\bar A) = 0.01 $

If a person has tested positive, then what is the probability they have the disease?

$ P(A|\oplus) = \frac{P(\oplus|A)P(A)}{P(\oplus)} $

Since having the disease and not having the disease partition the sample space, the law of total probability tells us:

$ P(\oplus) = P(\oplus|A)P(A) + P(\oplus|\bar A)P(\bar A) $

Then:

$ P(A|\oplus) = \frac{P(\oplus|A)P(A)}{P(\oplus|A)P(A) + P(\oplus|\bar A)P(\bar A)} $

In [2]:
(.99 * .001) / (.99 * .001 + 0.01 * .999)

ans = 0.090164
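
The calculation can also be wrapped in a small Python function so that the inputs are named. This is a sketch only: the function name `prob_disease_given_positive` and its parameter names are assumptions introduced here, and the numbers are the ones given above.

```python
def prob_disease_given_positive(p_disease, p_pos_given_disease, p_pos_given_healthy):
    """Return P(A | positive test) using Bayes' theorem."""
    p_healthy = 1 - p_disease

    # Law of total probability: P(positive) over the partition {A, not A}.
    p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * p_healthy

    return p_pos_given_disease * p_disease / p_pos


result = prob_disease_given_positive(
    p_disease=0.001,           # P(A)
    p_pos_given_disease=0.99,  # P(positive | A)
    p_pos_given_healthy=0.01,  # P(positive | not A)
)
print(round(result, 6))  # 0.090164
```

Even with a test that is right 99% of the time, a positive result here leaves only about a 9% chance of disease. The disease is so rare that false positives from the healthy majority outnumber the true positives.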
