<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Additional Bayesian statistics problems

_Instructor: Aymeric Flaisler_

---


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_style("whitegrid")

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

### 1. German Tank Problem

A railroad numbers its railcars $1,\ldots,N$. You see a railcar with the number 60 painted on it. The problem is to come up with an estimate for $N$. We'll denote $N=\theta$ to stick with our standard notation.

Apply Bayesian analysis to this problem by articulating the hypothesis/hypotheses, the data, and the likelihood. Be sure to try at least three separate prior distributions for $\theta$. What effect does this have on your posterior distribution of $\theta$ and, thus, your estimate for $N$?

In [None]:
# A:

## The hypotheses are:
## H_60: N = 60.
## H_61: N = 61.
## H_62: N = 62.
## ...
## H_1000: N = 1,000. (I arbitrarily stop here, but we could add more hypotheses.)

## The data is: we observed railcar 60.

## What is the likelihood P(y=60|H) ?

In [None]:
## Now let's say that N can be at most 100
## What is the prior?

## Try to plot it
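
One way to set this up is a uniform prior over the hypotheses $N = 1, \ldots, 100$. The sketch below is only an illustration: the names `max_n`, `hypotheses` and `prior` are assumptions, and the upper bound of 100 is the one used in the following cells.

```python
# Hypotheses: the total number of railcars N is one of 1, 2, ..., 100.
# max_n = 100 is the assumed upper bound used in this walkthrough.
max_n = 100
hypotheses = list(range(1, max_n + 1))

# Uniform prior: every hypothesis gets the same probability, 1 / max_n.
# prior[i] is the belief for N = i + 1.
prior = [1.0 / max_n for _ in hypotheses]

print(prior[0], prior[-1])  # first and last entries are identical
print(sum(prior))           # should be 1.0 up to floating-point rounding
```

Passing `hypotheses` and `prior` to `plt.bar` would show this prior as a flat row of equal-height bars.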

We can write out the formula for this:

#### $$ P(\text{total trains} = N \;|\; \text{observed} = x) = \frac{P(\text{observed} = x \;|\; \text{total trains} = N)}{P(\text{observed} = x)} P(\text{total trains} = N) $$

In [None]:
# We have the prior, P(total trains = N): we believe that the total number of trains
# can only be between 1 and 100, and that each of those values is equally likely.

# We can write out now a function for the likelihood:
# P(observed = x | total trains = N)


# This will take two arguments: which train number was observed and how many trains
# there are (the conditional part, total trains = N)

def likelihood(observed, total_trains):
    if observed > total_trains:
        return 0.0
    else:
        return 1./total_trains

In [None]:
# Iterate over all of our hypotheses and calculate the likelihood. Because this is a discrete
# problem we can plug in each of our train numbers that we have a prior for to get out
# the corresponding likelihood:

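likelihoods = []
for x in range(len(prior)):
    # prior[x] is assumed to hold the belief for N = x + 1
    likelihoods.append(likelihood(60, x + 1))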

In [None]:
# We will multiply the prior by the likelihood at this point to get our likelihood
# adjusted by our prior belief:


In [None]:
# You can see these probabilities are tiny - and they don't sum to 1...

# This is where the denominator - the marginal probability of observing train 60
# comes into play. It will normalize this distribution so that the values sum to 1
# and form a proper probability distribution.

# In the Monty Hall problem, we could have calculated the denominator (the marginal 
# probability that Monty opens door B) as:
# P(opened = B) = P(opened=B|win=A)P(win=A) + P(opened=B|win=B)P(win=B) + P(opened=B|win=C)P(win=C)

# In order to find the marginal probability we sum the probabilities across all our hypotheses.
# This is from the law of total probability.

We can write out the marginal probability of observing train 60 with the law of total probability formula.

### $$ P(\text{observed} = 60) = \sum_{i=1}^{100} P(\text{observed} = 60 \cap \text{total trains} = i) $$

Which can be re-written as:

### $$ P(\text{observed} = 60) = \sum_{i=1}^{100} P(\text{observed} = 60 \;|\; \text{total trains} = i)P(\text{total trains} = i) $$
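
As a worked instance, assume the uniform prior $P(\text{total trains} = i) = 1/100$ and the likelihood $1/i$ for $i \geq 60$ (and $0$ otherwise). Every term with $i < 60$ then vanishes, and the sum reduces to:

### $$ P(\text{observed} = 60) = \sum_{i=60}^{100} \frac{1}{i} \cdot \frac{1}{100} \approx 0.0052 $$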

In [None]:
# So the denominator here, our marginal probability of observing train 60, is the sum
# of our likelihoods times priors for all hypotheses about the train.

# From a purely numerical standpoint this has to be the case: for all of our
# likelihood*prior values to sum to 1 and form a proper probability distribution,
# we need to divide them by their sum.

posteriors = 
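
To see how the choice of prior affects the result, here is a hedged sketch that normalizes likelihood times prior for three candidate priors. The helper names `posterior` and `normalize`, and the two non-uniform priors (proportional to $1/N$ and $1/N^2$), are illustrative assumptions rather than part of the exercise.

```python
def likelihood(observed, total_trains):
    """P(observed | total_trains) for equally likely railcar numbers."""
    if observed > total_trains:
        return 0.0
    return 1.0 / total_trains

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def posterior(prior, observed):
    """Bayes' rule on a grid; prior[i] is the belief for N = i + 1."""
    unnormalized = [likelihood(observed, n) * p
                    for n, p in enumerate(prior, start=1)]
    return normalize(unnormalized)  # dividing by the sum is the marginal step

max_n = 100  # assumed upper bound on N
ns = range(1, max_n + 1)

# Three illustrative priors (the last two are assumptions for comparison).
priors = {
    "uniform": normalize([1.0 for n in ns]),
    "1/N": normalize([1.0 / n for n in ns]),
    "1/N^2": normalize([1.0 / n ** 2 for n in ns]),
}

for name, prior in priors.items():
    post = posterior(prior, observed=60)
    mean = sum(n * p for n, p in zip(ns, post))
    best = max(ns, key=lambda n: post[n - 1])
    print(f"{name}: most probable N = {best}, posterior mean = {mean:.1f}")
```

Under all three priors the most probable value is $N = 60$, because the likelihood $1/N$ is largest at the smallest feasible $N$. The posterior mean is larger than 60 and moves closer to it as the prior puts more weight on small values of $N$.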

### 2. Dungeons & Dragons Dice Problem #1

There are five dice: a 4-sided die, a 6-sided die, an 8-sided die, a 12-sided die, and a 20-sided die. One die is chosen and rolled, and it shows a 6. The problem is to predict which die was thrown.

Apply Bayesian analysis to this problem by articulating the hypothesis/hypotheses, the data, and the likelihood. Identify which die you believe to be the thrown die and how likely this is to be the thrown die.

In [None]:
# A:
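
One possible approach, sketched under the assumptions that each die is fair and that all five dice are equally likely to have been picked. The names `die_likelihood`, `prior` and `posterior` are illustrative.

```python
from fractions import Fraction

dice = [4, 6, 8, 12, 20]  # number of sides on each candidate die

def die_likelihood(roll, sides):
    """P(roll | die with `sides` faces), assuming a fair die."""
    return Fraction(1, sides) if roll <= sides else Fraction(0)

# Assumed uniform prior: each die is equally likely to be the one thrown.
prior = {sides: Fraction(1, len(dice)) for sides in dice}

# Bayes' rule: multiply prior by likelihood, then divide by the sum.
unnormalized = {sides: die_likelihood(6, sides) * prior[sides] for sides in dice}
marginal = sum(unnormalized.values())
posterior = {sides: u / marginal for sides, u in unnormalized.items()}

for sides, p in posterior.items():
    print(f"d{sides}: {p} = {float(p):.3f}")
```

The 4-sided die is ruled out because it cannot show a 6. The 6-sided die comes out as the most probable at 20/51, roughly 0.39, so it is the best guess but far from certain.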

### 3. Dungeons & Dragons Dice Problem #2

There are five dice: a 4-sided die, a 6-sided die, an 8-sided die, a 12-sided die, and a 20-sided die. You roll the same die six times and get 6, 4, 8, 7, 5, 7. The problem is to predict which die was thrown.

Apply Bayesian analysis to this problem by articulating the hypothesis/hypotheses, the data, and the likelihood. Identify which die you believe to be the thrown die and how likely this is to be the thrown die.

In [None]:
# A:
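
With several rolls, one option is to update sequentially: the posterior after each roll becomes the prior for the next. The sketch below assumes fair dice, independent rolls and a uniform starting prior; `update` and `belief` are illustrative names.

```python
from fractions import Fraction

dice = [4, 6, 8, 12, 20]
rolls = [6, 4, 8, 7, 5, 7]

def die_likelihood(roll, sides):
    """P(roll | die with `sides` faces), assuming a fair die."""
    return Fraction(1, sides) if roll <= sides else Fraction(0)

def update(belief, roll):
    """One Bayesian update: weight each die by the likelihood, then renormalize."""
    unnormalized = {s: die_likelihood(roll, s) * p for s, p in belief.items()}
    marginal = sum(unnormalized.values())
    return {s: u / marginal for s, u in unnormalized.items()}

# Assumed uniform prior over the five dice.
belief = {s: Fraction(1, len(dice)) for s in dice}
for roll in rolls:
    belief = update(belief, roll)  # yesterday's posterior is today's prior

for s, p in belief.items():
    print(f"d{s}: {float(p):.4f}")
```

The roll of 8 eliminates the 4-sided and 6-sided dice, and the remaining rolls concentrate most of the probability on the 8-sided die, since each additional roll multiplies its advantage over the larger dice.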

### 4. M&M Problem

You have two bags of M&Ms. The first bag, created before 1995, has the following color distribution: 30% brown, 20% yellow, 20% red, 10% orange, 10% green, 10% tan. The second bag, created after 1995, has the following color distribution: 24% blue, 20% green, 16% orange, 14% yellow, 12% red, 12% brown.

From one bag, you pull a yellow M&M. The problem is to predict from which bag you pulled the yellow M&M.

Apply Bayesian analysis to this problem by articulating the hypothesis/hypotheses, the data, and the likelihood.

Consider the yellow M&M already pulled (so this is part of your data). From the other bag, you pull a green M&M. Update your posterior appropriately and update your answer to the problem.

In [None]:
# A:
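
A possible two-step update, sketched with an assumed 50/50 prior over which bag the yellow M&M came from. Only the yellow and green shares from the problem statement are needed; the hypothesis labels `"older"` and `"newer"` and the helper `normalize` are illustrative.

```python
from fractions import Fraction

# Yellow and green shares (in percent) taken from the problem statement.
older_bag = {"yellow": 20, "green": 10}  # bag created before 1995
newer_bag = {"yellow": 14, "green": 20}  # bag created after 1995

def normalize(weights):
    """Divide each weight by the total so the values sum to 1."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypotheses: the yellow M&M came from the "older" bag or from the "newer" bag.
prior = {"older": Fraction(1, 2), "newer": Fraction(1, 2)}  # assumed 50/50

# Step 1: data = a yellow M&M from the bag named by the hypothesis.
like_yellow = {"older": Fraction(older_bag["yellow"], 100),
               "newer": Fraction(newer_bag["yellow"], 100)}
after_yellow = normalize({h: prior[h] * like_yellow[h] for h in prior})

# Step 2: data = a green M&M from the OTHER bag.
like_green = {"older": Fraction(newer_bag["green"], 100),
              "newer": Fraction(older_bag["green"], 100)}
after_green = normalize({h: after_yellow[h] * like_green[h] for h in prior})

print("after yellow:", after_yellow["older"], float(after_yellow["older"]))
print("after green: ", after_green["older"], float(after_green["older"]))
```

After the yellow M&M alone, the older bag has probability 10/17 (about 0.59). Adding the green M&M from the other bag raises this to 20/27 (about 0.74), because green is more common in the newer bag.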