# Bayes’s Theorem

## Example

#### Suppose there are two bowls of cookies.

- Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies.
- Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.
#### Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from Bowl 1?

- P(B1|V) = P(B1) P(V|B1) / P(V)
- P(B1) = 1/2
- P(V|B1) = 30/40 = 3/4 = 0.75
- P(V) = P(B1) P(V|B1) + P(B2) P(V|B2) = (1/2)(3/4) + (1/2)(1/2) = 5/8 = 0.625

#### Finally, we can apply Bayes’s theorem to compute the posterior probability of Bowl 1:
- P(B1 | V) = (1/2) (3/4) / (5/8) = 3/5

#### This example demonstrates one use of Bayes’s theorem: it provides a way to get from P(B|A) to P(A|B). This strategy is useful in cases like this one, where it is easier to compute the terms on the right side than the term on the left.
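
#### As a quick check of the arithmetic in the cookie example, here is a minimal sketch using exact fractions. The variable names are illustrative and not part of the original notes.

```python
from fractions import Fraction

# Priors: each bowl is equally likely to be chosen.
prior_b1 = Fraction(1, 2)
prior_b2 = Fraction(1, 2)

# Likelihoods of drawing a vanilla cookie from each bowl.
like_v_b1 = Fraction(30, 40)  # Bowl 1: 30 vanilla out of 40
like_v_b2 = Fraction(20, 40)  # Bowl 2: 20 vanilla out of 40

# Total probability of drawing a vanilla cookie, P(V).
prob_v = prior_b1 * like_v_b1 + prior_b2 * like_v_b2

# Bayes's theorem: P(B1|V) = P(B1) P(V|B1) / P(V).
posterior_b1 = prior_b1 * like_v_b1 / prob_v

print(prob_v)        # 5/8
print(posterior_b1)  # 3/5
```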

## Diachronic Bayes

#### There is another way to think of Bayes’s theorem: it gives us a way to update the probability of a hypothesis, H, given some body of data, D.

#### This interpretation is “diachronic”, which means “related to change over time”; in this case, the probability of the hypotheses changes as we see new data.

#### Rewriting Bayes’s theorem with H and D yields:
- P(H|D) = P(H) P(D|H) / P(D)

#### In this interpretation, each term has a name:
- P(H) is the probability of the hypothesis before we see the data, called the prior probability, or just prior.
- P(H|D) is the probability of the hypothesis after we see the data, called the posterior.
- P(D|H) is the probability of the data under the hypothesis, called the likelihood.
- P(D) is the total probability of the data, under any hypothesis.

#### Most often we simplify things by specifying a set of hypotheses that are:

- Mutually exclusive, which means that only one of them can be true, and
- Collectively exhaustive, which means that one of them must be true.

#### When these conditions apply, we can compute P(D) using the law of total probability. For example, with two hypotheses, H1 and H2:
- P(D) = P(H1) P(D|H1) + P(H2) P(D|H2)
#### And more generally, with any number of hypotheses:
- P(D) = Σ_i P(Hi) P(D|Hi)
#### The process in this section, using data and a prior probability to compute a posterior probability, is called a Bayesian update.
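
#### The update described in this section can be sketched in plain Python for any number of hypotheses. The helper names `total_probability` and `bayesian_update` are hypothetical, chosen only for illustration, and the sketch assumes the hypotheses are mutually exclusive and collectively exhaustive.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """Return P(D), the sum over i of P(Hi) * P(D|Hi)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayesian_update(priors, likelihoods):
    """Return the posterior P(Hi|D) for each hypothesis."""
    prob_data = total_probability(priors, likelihoods)
    return [p * l / prob_data for p, l in zip(priors, likelihoods)]

# The cookie problem: two bowls, data = a vanilla cookie.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(3, 4), Fraction(1, 2)]

posteriors = bayesian_update(priors, likelihoods)
print(posteriors)  # [Fraction(3, 5), Fraction(2, 5)]
```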

## Bayes Tables
#### A convenient tool for doing a Bayesian update is a Bayes table. You can write a Bayes table on paper or use a spreadsheet, but in this section I’ll use a pandas DataFrame.

In [1]:
import pandas as pd

table = pd.DataFrame(index=['Bowl 1', 'Bowl 2'])
table['prior'] = 1/2, 1/2
table

|  | prior |
|---|---|
| Bowl 1 | 0.5 |
| Bowl 2 | 0.5 |


In [2]:
table['likelihood'] = 3/4, 1/2
table

|  | prior | likelihood |
|---|---|---|
| Bowl 1 | 0.5 | 0.75 |
| Bowl 2 | 0.5 | 0.5 |


#### Here we see a difference from the previous method: we compute likelihoods for both hypotheses, not just Bowl 1:

- The chance of getting a vanilla cookie from Bowl 1 is 3/4.
- The chance of getting a vanilla cookie from Bowl 2 is 1/2.

#### You might notice that the likelihoods don’t add up to 1. That’s OK; each of them is a probability conditioned on a different hypothesis. There’s no reason they should add up to 1 and no problem if they don’t.

In [3]:
table['unnorm'] = table['prior'] * table['likelihood']
table

|  | prior | likelihood | unnorm |
|---|---|---|---|
| Bowl 1 | 0.5 | 0.75 | 0.375 |
| Bowl 2 | 0.5 | 0.5 | 0.25 |


#### I call the result unnorm because these values are the “unnormalized posteriors”. Each of them is the product of a prior and a likelihood:
- P(Bi) P(D|Bi)
#### which is the numerator of Bayes’s theorem. If we add them up, we have

- P(B1) P(D|B1) + P(B2) P(D|B2)

#### which is the denominator of Bayes’s theorem, P(D).

#### So we can compute the total probability of the data like this:

In [4]:
prob_data = table['unnorm'].sum()
prob_data

0.625

#### And we can compute the posterior probabilities like this:

In [5]:
table['posterior'] = table['unnorm'] / prob_data
table

|  | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Bowl 1 | 0.5 | 0.75 | 0.375 | 0.6 |
| Bowl 2 | 0.5 | 0.5 | 0.25 | 0.4 |


#### When we add up the unnormalized posteriors and divide through, we force the posteriors to add up to 1. This process is called “normalization”, which is why the total probability of the data is also called the “normalizing constant”.

In [6]:
from fractions import Fraction

table2 = pd.DataFrame(index=[6, 8, 12])

table2['prior'] = Fraction(1, 3)
table2['likelihood'] = Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)
table2

|  | prior | likelihood |
|---|---|---|
| 6 | 1/3 | 1/6 |
| 8 | 1/3 | 1/8 |
| 12 | 1/3 | 1/12 |


In [7]:
def update(table):
    """Compute the posterior probabilities."""
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return prob_data

In [8]:
prob_data = update(table2)
table2

|  | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| 6 | 1/3 | 1/6 | 1/18 | 4/9 |
| 8 | 1/3 | 1/8 | 1/24 | 1/3 |
| 12 | 1/3 | 1/12 | 1/36 | 2/9 |


## Monty Hall Problem

In [9]:
table3 = pd.DataFrame(index=['Door 1', 'Door 2', 'Door 3'])
table3['prior'] = Fraction(1, 3)
table3

|  | prior |
|---|---|
| Door 1 | 1/3 |
| Door 2 | 1/3 |
| Door 3 | 1/3 |


#### The data is that Monty opens Door 3. The likelihood of the data under each hypothesis about where the car is:
- If the car is behind Door 1, Monty chooses Door 2 or 3 at random, so the probability he opens Door 3 is 1/2.
- If the car is behind Door 2, Monty has to open Door 3, so the probability of the data under this hypothesis is 1.
- If the car is behind Door 3, Monty does not open it, so the probability of the data under this hypothesis is 0.

In [10]:
table3['likelihood'] = Fraction(1, 2), 1, 0
table3

|  | prior | likelihood |
|---|---|---|
| Door 1 | 1/3 | 1/2 |
| Door 2 | 1/3 | 1 |
| Door 3 | 1/3 | 0 |


In [11]:
update(table3)
table3

|  | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Door 1 | 1/3 | 1/2 | 1/6 | 1/3 |
| Door 2 | 1/3 | 1 | 1/3 | 2/3 |
| Door 3 | 1/3 | 0 | 0 | 0 |
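
#### One way to sanity-check the 2/3 posterior for Door 2 is a small simulation. This is only a sketch: it assumes you always pick Door 1, that Monty never opens your door or the door hiding the car, and that he chooses at random when he has two options. The function name `simulate_monty` and its parameters are hypothetical.

```python
import random

def simulate_monty(trials=100_000, seed=0):
    """Estimate P(car behind Door 2 | Monty opens Door 3)."""
    rng = random.Random(seed)
    opened_door_3 = 0
    car_behind_2 = 0
    for _ in range(trials):
        car = rng.choice([1, 2, 3])
        # You picked Door 1, so Monty can only open Door 2 or Door 3,
        # and he never opens the door that hides the car.
        options = [door for door in (2, 3) if door != car]
        monty = rng.choice(options)
        # Keep only the trials that match the data: Monty opens Door 3.
        if monty == 3:
            opened_door_3 += 1
            if car == 2:
                car_behind_2 += 1
    return car_behind_2 / opened_door_3

estimate = simulate_monty()
print(estimate)  # should be close to 2/3
```

#### The estimate varies with the seed and the number of trials, so it should be read as an approximate check on the exact result in the table, not a replacement for it.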


## Conclusion

- First, write down the hypotheses and the data.

- Next, figure out the prior probabilities.

- Finally, compute the likelihood of the data under each hypothesis.

## Exercises

#### 2.1: Suppose you have two coins in a box. One is a normal coin with heads on one side and tails on the other, and one is a trick coin with heads on both sides. You choose a coin at random and see that one of the sides is heads. What is the probability that you chose the trick coin?

- P(Fair Coin) = 1/2
- P(Rigged Coin) = 1/2
- We want P(Rigged Coin | Heads).

- P(Heads | Fair Coin) P(Fair Coin) = (1/2)(1/2) = 1/4
- P(Heads | Rigged Coin) P(Rigged Coin) = (1)(1/2) = 1/2
- P(Heads) = 1/4 + 1/2 = 3/4 = 0.75
- P(Rigged Coin | Heads) = P(Heads | Rigged Coin) P(Rigged Coin) / P(Heads)
- P(Rigged Coin | Heads) = (1)(1/2) / (3/4) = 2/3

#### The probability that you chose the rigged coin, given that you observed heads, is 2/3, or about 66.7%.
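
#### The same answer can be reached by counting sides instead of applying the formula. This sketch assumes that each of the four coin sides is equally likely to be the one you see. The labels in `sides` are illustrative.

```python
from fractions import Fraction

# Every (coin, face) pair that could be the side you observe.
sides = [
    ("fair", "H"),
    ("fair", "T"),
    ("trick", "H"),
    ("trick", "H"),
]

# Keep only the outcomes consistent with the data: the side shows heads.
heads = [coin for coin, face in sides if face == "H"]

# Fraction of those outcomes that belong to the trick coin.
posterior_trick = Fraction(heads.count("trick"), len(heads))
print(posterior_trick)  # 2/3
```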

#### 2.2: Suppose you meet someone and learn that they have two children. You ask if either child is a girl and they say yes. What is the probability that both children are girls?
#### Hint: Start with four equally likely hypotheses.

- GG (girl, girl)
- GB (girl, boy)
- BG (boy, girl)
- BB (boy, boy)

#### Each hypothesis has a prior probability of 1/4. The answer “yes” rules out BB, which narrows the possibilities to GG, GB, and BG.

- We want P(GG | at least one girl).
- P(GG) = 1/4
- P(at least one girl | GG) = 1
- P(at least one girl) = P(GG) + P(BG) + P(GB) = 1/4 + 1/4 + 1/4 = 3/4

- P(GG | at least one girl) = P(GG) P(at least one girl | GG) / P(at least one girl) = (1/4)(1) / (3/4) = 1/3
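
#### The result can also be checked by enumerating the four equally likely hypotheses from the hint. This is a minimal sketch, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# The four equally likely hypotheses: GG, GB, BG, BB.
families = list(product("GB", repeat=2))

# Condition on the data: at least one child is a girl.
at_least_one_girl = [f for f in families if "G" in f]

# Among the remaining hypotheses, count the ones with two girls.
both_girls = [f for f in at_least_one_girl if f == ("G", "G")]

posterior_gg = Fraction(len(both_girls), len(at_least_one_girl))
print(posterior_gg)  # 1/3
```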



## Addition Rule
#### P(A or B) = P(A) + P(B) - P(A and B)
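
#### To see the addition rule at work, here is a small sketch with one roll of a fair six-sided die. The events A (the roll is even) and B (the roll is greater than 3) are assumptions chosen for illustration and are not from the notes above.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))

A = {x for x in outcomes if x % 2 == 0}  # even rolls: {2, 4, 6}
B = {x for x in outcomes if x > 3}       # rolls above 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

lhs = prob(A | B)                       # P(A or B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)   # addition rule

print(lhs, rhs)  # 2/3 2/3
```

#### Subtracting P(A and B) corrects for the outcomes 4 and 6, which would otherwise be counted twice.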