## The double dice problem

This notebook demonstrates a way of doing simple Bayesian updates using the table method, with a Pandas DataFrame as the table.

Copyright 2018 Allen Downey

MIT License: https://opensource.org/licenses/MIT


In [1]:
# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

import numpy as np
import pandas as pd

### The BayesTable class

Here's the class that represents a Bayesian table.

In [2]:
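class BayesTable(pd.DataFrame):
    """A table with one row per hypothesis, used for Bayesian updates."""

    def __init__(self, hypo, prior=1):
        columns = ['hypo', 'prior', 'likelihood', 'unnorm', 'posterior']
        super().__init__(columns=columns)
        self.hypo = hypo
        self.prior = prior
    
    def mult(self):
        """Compute the unnormalized posteriors: prior times likelihood."""
        self.unnorm = self.prior * self.likelihood
        
    def norm(self):
        """Normalize the posteriors and return the normalizing constant."""
        nc = np.sum(self.unnorm)
        self.posterior = self.unnorm / nc
        return nc
    
    def update(self):
        """Multiply by the likelihoods, then normalize."""
        self.mult()
        return self.norm()
    
    def reset(self):
        """Return a new table that uses the current posteriors as priors."""
        return BayesTable(self.hypo, self.posterior)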

### The double dice problem

Suppose I have a box that contains one each of 4-sided, 6-sided, 8-sided, and 12-sided dice. I choose a die at random and roll it twice without letting you see the die or the outcome. I report that I got the same outcome on both rolls.

1) What is the posterior probability that I rolled each of the dice?


2) If I roll the same die again, what is the probability that I get the same outcome a third time?

Here's a `BayesTable` that represents the four hypothetical dice.

In [3]:
table = BayesTable([4, 6, 8, 12])

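|   | hypo | prior | likelihood | unnorm | posterior |
|---|------|-------|------------|--------|-----------|
| 0 | 4 | 1 | | | |
| 1 | 6 | 1 | | | |
| 2 | 8 | 1 | | | |
| 3 | 12 | 1 | | | |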


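Since we didn't specify prior probabilities, every hypothesis gets the default prior of 1. These priors are equal but not normalized, which is fine for computing posteriors because `update` normalizes the result.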

Now we can specify the likelihoods: if a die has `n` sides, then whatever the first roll shows, the second roll matches it with probability `1/n`. So the chance of getting the same outcome twice is `1/n`.

So the likelihoods are:

In [4]:
table.likelihood = [1/4, 1/6, 1/8, 1/12]
table

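|   | hypo | prior | likelihood | unnorm | posterior |
|---|------|-------|------------|--------|-----------|
| 0 | 4 | 1 | 0.25 | | |
| 1 | 6 | 1 | 0.166667 | | |
| 2 | 8 | 1 | 0.125 | | |
| 3 | 12 | 1 | 0.083333 | | |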
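
The `1/n` likelihood can be checked by brute force. The following is a minimal sketch, not part of the original notebook, that enumerates every pair of rolls for a fair `n`-sided die and counts the matching pairs. The helper name `prob_same_twice` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob_same_twice(n):
    """Probability that two rolls of a fair n-sided die match, by enumeration."""
    # All n * n equally likely (first roll, second roll) pairs
    outcomes = list(product(range(1, n + 1), repeat=2))
    matches = [pair for pair in outcomes if pair[0] == pair[1]]
    return Fraction(len(matches), len(outcomes))

# One likelihood per hypothetical die
likelihoods = {n: prob_same_twice(n) for n in [4, 6, 8, 12]}
print(likelihoods)
```

There are `n` matching pairs out of `n * n` equally likely pairs, so each fraction reduces to `1/n`.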


Now we can use `update` to compute the posterior probabilities:

In [5]:
table.update()
table

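|   | hypo | prior | likelihood | unnorm | posterior |
|---|------|-------|------------|--------|-----------|
| 0 | 4 | 1 | 0.25 | 0.25 | 0.4 |
| 1 | 6 | 1 | 0.166667 | 0.166667 | 0.266667 |
| 2 | 8 | 1 | 0.125 | 0.125 | 0.2 |
| 3 | 12 | 1 | 0.083333 | 0.083333 | 0.133333 |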
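
As a cross-check on the table, the same update can be reproduced without `BayesTable`. This sketch is an illustration rather than the notebook's method: it uses Python's `fractions.Fraction` so the posteriors come out as exact fractions, and the variable names are chosen here for illustration.

```python
from fractions import Fraction

hypos = [4, 6, 8, 12]
priors = [Fraction(1)] * len(hypos)            # prior of 1 for each die, as in the table
likelihoods = [Fraction(1, n) for n in hypos]  # P(same outcome twice | n sides)

# Step 1: multiply priors by likelihoods
unnorm = [p * like for p, like in zip(priors, likelihoods)]

# Step 2: divide by the total so the posteriors sum to 1
nc = sum(unnorm)
posteriors = [u / nc for u in unnorm]

for n, post in zip(hypos, posteriors):
    print(n, post, float(post))
```

The exact values are 2/5, 4/15, 1/5, and 2/15, which agree with the rounded `posterior` column above.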


The second part of the problem asks for the probability of getting the same outcome a third time, if we roll the same die again.

If the die has `n` sides, the probability of getting the same value again is `1/n`, which happens to be the same as the likelihood function.

To get the total probability of getting the same outcome, we sum over the four dice, adding up one term of the following form for each value of `n`:

```
P(n | data) * P(same outcome | n)
```

The first term is the posterior probability; the second term is `1/n`.

In [6]:
total = 0
for _, row in table.iterrows():
    total += row.posterior / row.hypo
    
total

0.18055555555555555
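
The loop can also be written as a single vectorized expression. This is a sketch using plain NumPy arrays instead of the table; the array names are chosen here for illustration, and the posteriors are recomputed from the likelihoods so the block stands alone.

```python
import numpy as np

hypo = np.array([4, 6, 8, 12])

# Posterior after two matching rolls; equal priors cancel in the normalization
likelihood = 1 / hypo
posterior = likelihood / likelihood.sum()

# Law of total probability: sum over n of P(n | data) * P(same outcome | n)
total = np.sum(posterior / hypo)
print(total)
```

The result should agree with the loop's total up to floating-point rounding.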

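This calculation multiplies a probability for each hypothesis by a likelihood and adds up the products, which is what `update` does when it computes the normalizing constant. So we can also compute it by: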

1) Creating a new table with the posteriors from `table`.

2) Adding the likelihood of getting the same outcome a third time.

3) Computing the normalizing constant.

In [7]:
table2 = table.reset()
table2.likelihood = [1/4, 1/6, 1/8, 1/12]
table2

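|   | hypo | prior | likelihood | unnorm | posterior |
|---|------|-------|------------|--------|-----------|
| 0 | 4 | 0.4 | 0.25 | | |
| 1 | 6 | 0.266667 | 0.166667 | | |
| 2 | 8 | 0.2 | 0.125 | | |
| 3 | 12 | 0.133333 | 0.083333 | | |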


In [8]:
table2.update()

0.18055555555555552

In [9]:
table2

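|   | hypo | prior | likelihood | unnorm | posterior |
|---|------|-------|------------|--------|-----------|
| 0 | 4 | 0.4 | 0.25 | 0.1 | 0.553846 |
| 1 | 6 | 0.266667 | 0.166667 | 0.044444 | 0.246154 |
| 2 | 8 | 0.2 | 0.125 | 0.025 | 0.138462 |
| 3 | 12 | 0.133333 | 0.083333 | 0.011111 | 0.061538 |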


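The value returned by `update` matches the total we computed with the loop, up to floating-point rounding. The `posterior` column now holds the posterior probabilities after seeing the same outcome three times.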

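This example demonstrates a general truth: if you see an event and perform an update, the normalizing constant is the prior predictive probability of the event, provided the priors sum to 1. That condition holds for `table2`, whose priors are the normalized posteriors from `table`.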
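
To illustrate the point about normalizing constants, here is a minimal sketch that performs two successive updates with exact fractions and compares the product of the two constants with a direct calculation. It is an illustration, not part of the original notebook: the standalone `update` function is a hypothetical helper, and the priors are normalized to 1/4 each (an assumption that differs from the default prior of 1 used in `table`).

```python
from fractions import Fraction

hypos = [4, 6, 8, 12]

def update(priors, likelihoods):
    """Return (posteriors, normalizing constant) for one Bayesian update."""
    unnorm = [p * like for p, like in zip(priors, likelihoods)]
    nc = sum(unnorm)
    return [u / nc for u in unnorm], nc

likelihoods = [Fraction(1, n) for n in hypos]
priors = [Fraction(1, len(hypos))] * len(hypos)  # normalized equal priors

post1, nc1 = update(priors, likelihoods)  # nc1 = P(first two rolls match)
post2, nc2 = update(post1, likelihoods)   # nc2 = P(third roll matches | first two match)

# Direct calculation: P(all three rolls match) = sum over n of P(n) * (1/n)**2
p_three_same = sum(p * Fraction(1, n**2) for p, n in zip(priors, hypos))

print(nc1, nc2, nc1 * nc2 == p_three_same)
```

With normalized priors, each constant is a genuine probability and their product equals the probability of seeing the same outcome on all three rolls. With the default prior of 1 per hypothesis, the first constant would be four times larger, so it would not be a probability, although the posteriors would be unchanged.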