# Notes

## Set-up

### Packages

In [7]:
from fractions import Fraction

import numpy as np
import pandas as pd

## Bayes' Theorem

Bayes' Theorem can be very useful when we don't have the complete dataset and so can't calculate conditional probabilities directly.

## Diachronic Bayes

*Diachronic*: Related to change over time.

Bayes' Theorem gives us a way to update the probability of a hypothesis, $H$, given some body of data $D$.

In this context Bayes' Theorem is

\begin{equation}
    P(H|D)= \frac{P(D|H) \cdot P(H)}{P(D)}.
\end{equation}

Each component has a name:
- $P(H)$ is the *prior* probability: the probability of the hypothesis before seeing the data.
- $P(D|H)$ is the *likelihood*: the probability of seeing the data under the hypothesis.
- $P(D)$ is the *total probability* of the data: the probability of seeing the data under any hypothesis.
- $P(H|D)$ is the *posterior* probability: the probability of the hypothesis after seeing the data.

The prior might be based on objective background information or might be subjective.

The total probability is a normalising factor. To calculate it we generally need a set of mutually exclusive and collectively exhaustive (MECE) hypotheses, each with a prior probability.
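
For hypotheses that are mutually exclusive and collectively exhaustive (MECE), the total probability can be written out with the law of total probability. The indexed notation $H_1, \dots, H_n$ is introduced here for illustration and is not used elsewhere in these notes:

\begin{equation}
    P(D) = \sum_{i=1}^{n} P(H_i) \cdot P(D|H_i).
\end{equation}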

The process of computing the posterior probability from the prior using data is called a *Bayesian update*.

## The Dice Problem

>Suppose I have a box with a 6-sided die, an 8-sided die, and a 12-sided die. I choose one of the dice at random, roll it, and report that the outcome is a 1. What is the probability that I chose the 6-sided die?

We perform a Bayesian update under the prior that we were equally likely to choose each die.

In [22]:
table = pd.DataFrame(index=["6-sided", "8-sided", "12-sided"])
table.index = table.index.rename("die")

table["prior"] = Fraction(1, 3)
table["likelihood"] = Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)

table

die,prior,likelihood
6-sided,1/3,1/6
8-sided,1/3,1/8
12-sided,1/3,1/12


In [18]:
def update(table):
    """Add posterior probabilities to `table` in place and return the total probability of the data."""
    # un-normalised posteriors
    table["unnorm"] = table["prior"] * table["likelihood"]

    total_prob = table["unnorm"].sum()
    table["posterior"] = table["unnorm"] / total_prob
    
    return total_prob


total_prob = update(table)
table

die,prior,likelihood,unnorm,posterior
6-sided,1/3,1/6,1/18,4/9
8-sided,1/3,1/8,1/24,1/3
12-sided,1/3,1/12,1/36,2/9
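
As a sanity check, the posterior for the 6-sided die can be recomputed directly from Bayes' Theorem without the table. This is a minimal sketch using only the standard library; the names `priors`, `likelihoods`, `total` and `posterior_6` are introduced here for illustration.

```python
from fractions import Fraction

# Prior and likelihood of rolling a 1 for each die (same values as the table)
priors = {
    "6-sided": Fraction(1, 3),
    "8-sided": Fraction(1, 3),
    "12-sided": Fraction(1, 3),
}
likelihoods = {
    "6-sided": Fraction(1, 6),
    "8-sided": Fraction(1, 8),
    "12-sided": Fraction(1, 12),
}

# Total probability of the data: sum of prior * likelihood over all hypotheses
total = sum(priors[die] * likelihoods[die] for die in priors)

# Bayes' Theorem applied to the 6-sided die
posterior_6 = priors["6-sided"] * likelihoods["6-sided"] / total

print(total)        # 1/8
print(posterior_6)  # 4/9
```

The result agrees with the `posterior` column: there is a 4/9 chance that the 6-sided die was chosen.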


# Exercises

## Exercise 1

Suppose you have two coins in a box. One is a normal coin with heads on one side and tails on the other, and one is a trick coin with heads on both sides. You choose a coin at random and see that one of the sides is heads. What is the probability that you chose the trick coin?

### Solution

In [20]:
table = pd.DataFrame(index=["normal", "trick"])
table.index = table.index.rename("coin")

# equally likely you chose each coin
table["prior"] = Fraction(1, 2)

# if you chose the normal coin there is a 50% chance you would see a head on the side you looked at
table.loc["normal", "likelihood"] = Fraction(1, 2)

# if you chose the trick coin there is a 100% chance you would see a head on the side you looked at
table.loc["trick", "likelihood"] = 1

_ = update(table)
table

coin,prior,likelihood,unnorm,posterior
normal,1/2,1/2,1/4,1/3
trick,1/2,1,1/2,2/3


So there is a 2/3 chance that you chose the trick coin.
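
The result can also be checked by simulation. The sketch below is not part of the original solution; it assumes that the coin and the side you look at are both chosen uniformly at random, and the function name `simulate_trick_coin` is hypothetical.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def simulate_trick_coin(trials):
    """Estimate P(trick coin | observed side is heads) by simulation."""
    heads_seen = 0
    trick_and_heads = 0
    for _ in range(trials):
        coin = random.choice(["normal", "trick"])
        sides = ["H", "T"] if coin == "normal" else ["H", "H"]
        side = random.choice(sides)  # look at one side at random
        if side == "H":
            heads_seen += 1
            if coin == "trick":
                trick_and_heads += 1
    # Fraction of heads observations that came from the trick coin
    return trick_and_heads / heads_seen


estimate = simulate_trick_coin(100_000)
print(round(estimate, 3))
```

With this many trials the estimate should land close to 2/3, in line with the Bayesian update.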

## Exercise 2

Suppose you meet someone and learn that they have two children. You ask if either child is a girl and they say yes. What is the probability that both children are girls?

### Solution

In [25]:
table = pd.DataFrame(index=["(f, f)", "(f, m)", "(m, f)", "(m, m)"])
table.index = table.index.rename("(sex1, sex2)")

table["prior"] = Fraction(1, 4)
table["likelihood"] = 1, 1, 1, 0

_ = update(table)
table

"(sex1, sex2)",prior,likelihood,unnorm,posterior
"(f, f)",1/4,1,1/4,1/3
"(f, m)",1/4,1,1/4,1/3
"(m, f)",1/4,1,1/4,1/3
"(m, m)",1/4,0,0,0


There is a 1/3 chance that both children are girls.
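
Because the four hypotheses are equally likely, the same answer can be reached by enumeration: count the families consistent with the data and see how many of them have two girls. This is a minimal sketch; the variable names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely (sex1, sex2) combinations
families = list(product("fm", repeat=2))

# Keep only the families consistent with the data: at least one girl
at_least_one_girl = [fam for fam in families if "f" in fam]

# Of those, the families where both children are girls
both_girls = [fam for fam in at_least_one_girl if fam == ("f", "f")]

prob_both_girls = Fraction(len(both_girls), len(at_least_one_girl))
print(prob_both_girls)  # 1/3
```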

## Exercise 3

There are many variations of the Monty Hall problem. For example, suppose Monty always chooses Door 2 if he can, and only chooses Door 3 if he has to (because the car is behind Door 2).

- If you choose Door 1 and Monty opens Door 2, what is the probability the car is behind Door 3?
- If you choose Door 1 and Monty opens Door 3, what is the probability the car is behind Door 2?

### Solution

In [26]:
table = pd.DataFrame(index=[1, 2, 3])
table.index = table.index.rename("door")

table["prior"] = Fraction(1, 3)

# if the car is behind door 1 or 3, Monty would definitely open door 2
table.loc[1, "likelihood"] = 1
table.loc[3, "likelihood"] = 1

# if the car is behind door 2, Monty cannot open door 2
table.loc[2, "likelihood"] = 0

_ = update(table)
table

door,prior,likelihood,unnorm,posterior
1,1/3,1.0,0.333333,0.5
2,1/3,0.0,0.0,0.0
3,1/3,1.0,0.333333,0.5


In the first scenario there is a 50% chance the car is behind door 3.

The second scenario doesn't require any calculation - the only way that Monty would open door 3 is if the car is behind door 2, so the probability that the car is behind door 2 is 1.
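
For completeness, the second scenario can be run through the same kind of update. The sketch below uses a small dictionary-based helper, `bayes_update`, which is a hypothetical stand-in for the table-based `update` function and relies only on the standard library.

```python
from fractions import Fraction


def bayes_update(priors, likelihoods):
    """Return the posterior for each hypothesis in `priors`."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}


priors = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Probability that Monty opens door 3, given where the car is:
# he only opens door 3 when the car is behind door 2
likelihoods = {1: 0, 2: 1, 3: 0}

posteriors = bayes_update(priors, likelihoods)
print(posteriors[2])  # 1
```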

## Exercise 4

M&M’s are small candy-coated chocolates that come in a variety of colors.
Mars, Inc., which makes M&M’s, changes the mixture of colors from time to time. In 1995, they introduced blue M&M’s.

- In 1994, the color mix in a bag of plain M&M’s was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan.
- In 1996, it was 24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.

Suppose a friend of mine has two bags of M&M’s, and he tells me that one is from 1994 and one from 1996. He won’t tell me which is which, but he gives me one M&M from each bag. One is yellow and one is green. What is the probability that the yellow one came from the 1994 bag?

### Solution

Let's reformulate the problem. We have two bags of M&M’s - bag 1 and bag 2. First we pull a yellow from bag 1. Then we pull a green from bag 2. What is the probability that bag 1 is the 1994 bag?

We perform two Bayesian updates - one for each selection of an M&M - and use the posterior of the first selection as the prior for the second.

In [30]:
table1 = pd.DataFrame(index=[1, 2])
table1.index = table1.index.rename("bag")

table1["prior"] = Fraction(1, 2)

# there is a 20% chance of pulling a yellow out of the 1994 bag and 14% out of the 1996 bag
table1["likelihood"] = 0.2, 0.14

_ = update(table1)
table1

bag,prior,likelihood,unnorm,posterior
1,1/2,0.2,0.1,0.588235
2,1/2,0.14,0.07,0.411765


In [31]:
table2 = pd.DataFrame(index=[1, 2])
table2.index = table2.index.rename("bag")

table2["prior"] = table1["posterior"]

# we pick the green out of the *second* bag
# row 1 (bag 1 is from 1994): bag 2 is from 1996, so there is a 20% chance of getting a green
# row 2 (bag 2 is from 1994): there is a 10% chance of getting a green
table2["likelihood"] = 0.2, 0.1

_ = update(table2)
table2

bag,prior,likelihood,unnorm,posterior
1,0.588235,0.2,0.117647,0.740741
2,0.411765,0.1,0.041176,0.259259


There is a 74% chance that the yellow M&M came from the 1994 bag.
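
The two sequential updates are equivalent to a single update whose likelihood is the product of the two draws. The sketch below checks this with exact fractions; the names `like_a`, `like_b` and `posterior_a` are introduced here for illustration.

```python
from fractions import Fraction

# Hypothesis A: the yellow came from the 1994 bag, the green from the 1996 bag
# Hypothesis B: the yellow came from the 1996 bag, the green from the 1994 bag
prior = Fraction(1, 2)

# Joint likelihood of drawing one yellow and one green under each hypothesis
like_a = Fraction(20, 100) * Fraction(20, 100)  # yellow from 1994, green from 1996
like_b = Fraction(14, 100) * Fraction(10, 100)  # yellow from 1996, green from 1994

posterior_a = prior * like_a / (prior * like_a + prior * like_b)

print(posterior_a)                    # 20/27
print(round(float(posterior_a), 6))   # 0.740741
```

The exact value 20/27 matches the 0.740741 in the second table.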

In [14]:
df = pd.DataFrame(index=[1, 2, 3])
df["prior"] = Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)
df["likelihood"] = Fraction(1, 2), 1, 0

total_probab = update(df)
df

door,prior,likelihood,unnorm,posterior
1,1/3,1/2,1/6,1/3
2,1/3,1,1/3,2/3
3,1/3,0,0,0


Let's look at the probability that the car is behind door $i$. Say you choose door 1 and the host reveals a goat behind door 3.

Likelihood:
- If the car is behind door 1, what is the probability that the host would reveal a goat behind door 3? $\frac{1}{2}$ - they could have revealed either door 2 or door 3
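- If the car is behind door 2, what is the probability that the host would reveal a goat behind door 3? 1 - it's the only one they could reveal.
- If the car is behind door 3, what is the probability that the host would reveal a goat behind door 3? 0 - the host never reveals the car.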