# Think Bayes

Second Edition

Copyright 2020 Allen B. Downey

License: [Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)

In [1]:
import numpy as np
import pandas as pd

## Introduction

The fundamental idea behind all Bayesian statistics is Bayes's theorem,
which is surprisingly easy to derive, provided that you understand
conditional probability. So we'll start with probability, then
conditional probability, then Bayes's theorem, and on to Bayesian
statistics.

## Probability

A probability is a number between 0 and 1 (including both) that
represents a degree of belief in a fact or prediction. The value 1
represents certainty that a fact is true, or that a prediction will come
true. The value 0 represents certainty that the fact is false.

Intermediate values represent degrees of certainty. The value 0.5, often
written as 50%, means that a predicted outcome is as likely to happen as
not. For example, the probability that a tossed coin lands with the "heads" side up is
close to 50%.
The probability that a six-sided die comes up 3 is close to 1/6 or 16.7%.

## Conditional probability

A conditional probability is a probability based on some relevant
information. For example, suppose I toss two coins. The probability that
both coins land heads is 25%.

But suppose I toss two coins and, without showing you the result, tell
you that at least one of the coins is heads. What is the probability
that both are heads? The answer is 1/3.

Here's how I got that: when I toss the coins, there are four equally
likely outcomes: heads-heads, heads-tails, tails-heads, and tails-tails.
When I tell you that at least one coin is heads, that eliminates one
outcome, tails-tails.

The remaining outcomes are heads-heads, heads-tails, and tails-heads,
and they are still equally likely. So the probability of heads-heads is
1/3.
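
The counting argument above can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original notebook; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# All four equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))

# Keep only the outcomes consistent with "at least one coin is heads".
at_least_one_heads = [o for o in outcomes if "H" in o]

# Among those, count the outcomes where both coins are heads.
both_heads = [o for o in at_least_one_heads if o == ("H", "H")]

p_both_given_one = Fraction(len(both_heads), len(at_least_one_heads))
print(p_both_given_one)  # 1/3
```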

That argument is correct, but if you don't find it entirely convincing,
we'll come back to this problem and solve it more carefully using
Bayes's Theorem.

In this example, we computed the conditional probability of two heads,
given the information that at least one coin is heads.

The usual notation for conditional probability is $P(A|B)$, which is
the probability of $A$ given that $B$ is true. In this example, $A$
represents the two heads outcome, and $B$ is the condition that at least one
coin is heads.

## Conjoint probability

**Conjoint probability** is a fancy way to say the probability that two
things are true. I'll use the notation $P(A~\mathrm{AND}~B)$ to mean the
probability that $A$ and $B$ are both true.

If you learned about probability in the context of coin tosses and dice,
you might have learned the following formula:

$P(A~\mathrm{AND}~B) = P(A)~P(B)$

For example, if I toss two coins, and $A$ means the first coin lands
heads, and $B$ means the second coin lands heads, then $P(A) =
P(B) = 0.5$, and sure enough,

$P(A~\mathrm{AND}~B) = P(A)~P(B) = 0.25$.

But this formula only works because in this case $A$ and $B$ are
independent; that is, knowing the first outcome does not change the
probability of the second. Or, more formally, $P(B|A) = P(B)$.

Here is a different example where the outcomes are not independent.
Suppose that $A$ means that it rains today and $B$ means that it rains
tomorrow. If I know that it rained today, it is more likely that it will
rain tomorrow, so $P(B|A) > P(B)$.

In general, the probability of a conjunction is

$P(A~\mathrm{AND}~B) = P(A)~P(B|A)$ 

for any $A$ and $B$. 
So if the chance of rain on any given day is 0.5, the chance of rain on two consecutive
days is not 0.25, but probably a bit higher.
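
To see the general rule at work on events that are not independent, here is a small sketch that reuses the two-coin setup. The events chosen here ("at least one heads" for $A$ and "both heads" for $B$) and the helper `prob` are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))

def prob(event, given=lambda o: True):
    """Probability of `event` among the equally likely outcomes satisfying `given`."""
    pool = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))

def at_least_one(o):   # event A
    return "H" in o

def both(o):           # event B
    return o == ("H", "H")

p_a = prob(at_least_one)
p_b = prob(both)
p_b_given_a = prob(both, given=at_least_one)
p_a_and_b = prob(lambda o: at_least_one(o) and both(o))

# The general rule holds; the independence shortcut does not.
print(p_a_and_b, p_a * p_b_given_a, p_a * p_b)  # 1/4 1/4 3/16
```

Because $A$ and $B$ are dependent here, $P(A)~P(B)$ gives the wrong answer, while $P(A)~P(B|A)$ matches the directly counted probability.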

## The cookie problem

We'll get to Bayes's theorem soon, but I want to motivate it with an
example called the cookie problem, which is based on [an example from Wikipedia that is no longer there](http://en.wikipedia.org/wiki/Bayes'_theorem).

> Suppose there are two bowls of cookies. Bowl 1 contains 30 vanilla
> cookies and 10 chocolate cookies. Bowl 2 contains 20 of each.
>
> Now suppose you choose one of the bowls at random and, without
> looking, select a cookie at random. The cookie is vanilla. What is the
> probability that it came from Bowl 1?

This is a conditional probability; we want $P(\mathrm{Bowl 1}~|~\mathrm{vanilla})$, but it is not obvious how to compute it. 
If I asked a
different question---the probability of a vanilla cookie given Bowl
1---it would be easy: 

$P(\mathrm{vanilla}~|~\mathrm{Bowl 1}) = 3/4$.

Sadly, $P(A|B)$ is *not* the same as $P(B|A)$, but there is a way to get from
one to the other: Bayes's theorem.

## Bayes's theorem


Here's how we derive Bayes's theorem. We'll start with the probability
of a conjunction: 

$P(A~\mathrm{AND}~B) = P(A)~P(B|A)$ 

Since we have not
said anything about what $A$ and $B$ mean, they are interchangeable.
Interchanging them yields 

$P(B~\mathrm{AND}~A) = P(B)~P(A|B)$ 

Also,
conjunction is commutative; that is 

$P(A~\mathrm{AND}~B) = P(B~\mathrm{AND}~A)$

That's all we need. Pulling those pieces together, we get

$P(B)~P(A|B) = P(A)~P(B|A)$ 

This means there are two ways to
compute the conjunction. If you have $P(A)$, you multiply by the
conditional probability $P(B|A)$. Or you can do it the other way
around; if you know $P(B)$, you multiply by $P(A|B)$.

Finally we divide through by $P(B)$:

$P(A|B) = \frac{P(A)~P(B|A)}{P(B)}$ 

And that's Bayes's theorem! It
might not look like much, but it turns out to be surprisingly powerful.

For example, we can use it to solve the cookie problem. I'll write $B_1$
for the hypothesis that the cookie came from Bowl 1 and $V$ for the
vanilla cookie. Plugging in Bayes's theorem we get

$P(B_1|V) = \frac{P(B_1)~P(V|B_1)}{P(V)}$ 

The term on the left is
what we want: the probability of Bowl 1, given that we chose a vanilla
cookie. The terms on the right are:

-   $P(B_1)$: This is the probability that we chose Bowl 1,
    unconditioned by what kind of cookie we got. Since the problem says
    we chose a bowl at random, we can assume $P(B_1) = 1/2$.

-   $P(V|B_1)$: This is the probability of getting a vanilla cookie
    from Bowl 1, which is 3/4.

-   $P(V)$: This is the probability of drawing a vanilla cookie from
    either bowl. Since we had an equal chance of choosing either bowl
    and the bowls contain the same number of cookies, we had the same
    chance of choosing any cookie. Between the two bowls there are 50
    vanilla and 30 chocolate cookies, so $P(V) = 5/8$.

Putting it together, we have $P(B_1|V) = \frac{(1/2)~(3/4)}{5/8}$
which reduces to 3/5. So the vanilla cookie is evidence in favor of the
hypothesis that we chose Bowl 1, because vanilla cookies are more likely
to come from Bowl 1.
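
The arithmetic can be reproduced with exact fractions. This is a minimal sketch using the numbers from the cookie problem; the variable names are chosen here for illustration.

```python
from fractions import Fraction

p_b1 = Fraction(1, 2)             # P(B_1): a bowl is chosen at random
p_v_given_b1 = Fraction(30, 40)   # P(V|B_1): 30 of the 40 cookies in Bowl 1 are vanilla
p_v = Fraction(50, 80)            # P(V): 50 of the 80 cookies overall are vanilla

# Bayes's theorem: P(B_1|V) = P(B_1) P(V|B_1) / P(V)
p_b1_given_v = p_b1 * p_v_given_b1 / p_v
print(p_b1_given_v)  # 3/5
```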

This example demonstrates one use of Bayes's theorem: it provides a
strategy to get from $P(B|A)$ to $P(A|B)$. 
This strategy is useful in cases, like the
cookie problem, where it is easier to compute the terms on the right
side of Bayes's theorem than the term on the left.

## The diachronic interpretation

There is another way to think of Bayes's theorem: it gives us a way to
update the probability of a hypothesis, $H$, in light of some body of
data, $D$.

This way of thinking about Bayes's theorem is called the **diachronic
interpretation**. "Diachronic" means that something is happening over
time; in this case, the probability of the hypotheses changes over time
as we see new data.

Rewriting Bayes's theorem with $H$ and $D$ yields:

$P(H|D) = \frac{P(H)~P(D|H)}{P(D)}$ 

In this interpretation, each term has a name:

-  $P(H)$ is the probability of the hypothesis before we see the data, called
    the prior probability, or just **prior**.

-  $P(H|D)$ is what we want to compute, the probability of the hypothesis after
    we see the data, called the **posterior**.

-  $P(D|H)$ is the probability of the data under the hypothesis, called the
    **likelihood**.

-  $P(D)$ is the **total probability of the data**, under any hypothesis.

Sometimes we can compute the prior based on background information. For
example, the cookie problem specifies that we choose a bowl at random
with equal probability.

In other cases the prior is subjective; that is, reasonable people might
disagree, either because they use different background information or
because they interpret the same information differently.

The likelihood is usually the easiest part to compute. In the cookie
problem, if we know which bowl the cookie came from, we find the
probability of a vanilla cookie by counting.

Computing the total probability of the data can be tricky. It is
supposed to be the probability of seeing the data under any hypothesis
at all, but in the most general case it is hard to nail down what that
means.

Most often we simplify things by specifying a set of hypotheses that
are:

* Mutually exclusive: At most one hypothesis in the set can be true, and

* Collectively exhaustive: There are no other possibilities; at least one of the hypotheses has to be true.

In the cookie problem, there are only two hypotheses---the cookie came
from Bowl 1 or Bowl 2---and they are mutually exclusive and collectively
exhaustive.

In that case we can compute $P(D)$ using the law of total probability, which
says that if there are two exclusive ways that something might happen,
you can add up the probabilities like this:

$P(D) = P(B_1)~P(D|B_1) + P(B_2)~P(D|B_2)$ 

Plugging in the values
from the cookie problem, we have

$P(D) = (1/2)~(3/4) + (1/2)~(1/2) = 5/8$ 

which is what we computed
earlier by mentally combining the two bowls.
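
The same sum can be written as a short computation over the set of hypotheses. This is a sketch with illustrative dictionary names; it generalizes to any number of mutually exclusive and collectively exhaustive hypotheses.

```python
from fractions import Fraction

# Priors and likelihoods of a vanilla cookie for each hypothesis.
priors = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
likelihoods = {"Bowl 1": Fraction(3, 4), "Bowl 2": Fraction(1, 2)}

# Law of total probability: sum of prior times likelihood over the hypotheses.
p_d = sum(priors[h] * likelihoods[h] for h in priors)
print(p_d)  # 5/8
```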

## Bayes Tables

In the cookie problem we can compute the probability of the data
directly, but that's not always the case. In fact, computing the total
probability of the data is often the hardest part of the problem.

Fortunately, there is another way to solve problems like this that makes
it easier: the Bayes table.

You can write a Bayes table on paper or use a spreadsheet, but for this
example I'll use a Pandas DataFrame.

First I'll make an empty DataFrame with one row for each hypothesis:

In [2]:
import pandas as pd

table = pd.DataFrame(index=['Bowl 1', 'Bowl 2'])

Now I'll add a column to represent the priors:

In [3]:
table['prior'] = 1/2, 1/2
table

| | prior |
|---|---|
| Bowl 1 | 0.5 |
| Bowl 2 | 0.5 |


And a column for the likelihoods:

In [4]:
table['likelihood'] = 3/4, 1/2
table

| | prior | likelihood |
|---|---|---|
| Bowl 1 | 0.5 | 0.75 |
| Bowl 2 | 0.5 | 0.5 |


Here we see a difference from the previous method: we compute likelihoods for both hypotheses, not just Bowl 1:

* The chance of getting a vanilla cookie from Bowl 1 is 3/4.

* The chance of getting a vanilla cookie from Bowl 2 is 1/2.

The following cells write the Bayes table to a file.

In [5]:
# Get utils.py

import os

if not os.path.exists('utils.py'):
    !wget https://github.com/AllenDowney/ThinkBayes2/raw/master/code/soln/utils.py
        
if not os.path.exists('tables'):
    !mkdir tables

In [6]:
from utils import write_table

write_table(table, 'table01-01')

The next step is similar to what we did with Bayes's Theorem; we multiply the priors by the likelihoods:

In [7]:
table['unnorm'] = table['prior'] * table['likelihood']
table

| | prior | likelihood | unnorm |
|---|---|---|---|
| Bowl 1 | 0.5 | 0.75 | 0.375 |
| Bowl 2 | 0.5 | 0.5 | 0.25 |


I called the result `unnorm` because it is an "unnormalized posterior".  To see what that means, let's compare the right-hand side of Bayes's Theorem:

$P(H) P(D|H)~/~P(D)$

to what we have computed so far:

$P(H) P(D|H)$

The difference is that we have not divided through by $P(D)$, the total probability of the data.  So let's do that.

There are two ways to compute $P(D)$:

1. Sometimes we can figure it out directly.

2. Otherwise, we can compute it by adding up the unnormalized posteriors.

Here's the total of the unnormalized posteriors:

In [8]:
prob_data = table['unnorm'].sum()
prob_data

0.625

Notice that we get 5/8, which is what we got by computing $P(D)$ directly.

Now we divide by $P(D)$ to get the posteriors:

In [9]:
table['posterior'] = table['unnorm'] / prob_data
table

| | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Bowl 1 | 0.5 | 0.75 | 0.375 | 0.6 |
| Bowl 2 | 0.5 | 0.5 | 0.25 | 0.4 |


The posterior probability for Bowl 1 is 0.6, which is what we got using Bayes's Theorem explicitly.

As a bonus, we also get the posterior probability of Bowl 2, which is 0.4.

The posterior probabilities add up to 1, which they should, because the hypotheses are "complementary"; that is, either one of them is true or the other, but not both.  So their probabilities have to add up to 1.

When we add up the unnormalized posteriors and divide through, we force the posteriors to add up to 1.  This process is called "normalization", which is why the total probability of the data is also called the "[normalizing constant](https://en.wikipedia.org/wiki/Normalizing_constant#Bayes'_theorem)".

In [10]:
write_table(table, 'table01-02')
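
Since the last three steps are the same for every Bayes table, they can be wrapped in a small function. The following is a sketch, not part of the original notebook; the name `bayes_update` and the DataFrame `cookies` are hypothetical, and the function assumes the table already has `prior` and `likelihood` columns.

```python
import pandas as pd

def bayes_update(table):
    """Add 'unnorm' and 'posterior' columns to a Bayes table in place.

    Expects 'prior' and 'likelihood' columns and returns the total
    probability of the data (the normalizing constant).
    """
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return prob_data

# Rebuild the cookie problem table and update it in one call.
cookies = pd.DataFrame(index=['Bowl 1', 'Bowl 2'])
cookies['prior'] = 1/2, 1/2
cookies['likelihood'] = 3/4, 1/2

prob_data = bayes_update(cookies)
print(prob_data)
print(cookies['posterior'])
```

Returning the normalizing constant is a design choice made here so that the caller can inspect $P(D)$ as well as the posteriors.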

## The Dice Problem 

A Bayes table can also solve problems with more than two hypotheses.  For example:

> Suppose I have a box with a 6-sided die, an 8-sided die, and a 12-sided
> die. I choose one of the dice at random, roll it, and report that the
> outcome is a 1. What is the probability that I chose the 6-sided die?

In this example, there are three hypotheses with equal prior
probabilities. The data is my report that the outcome is a 1. Under the
hypothesis that I chose the 6-sided die, the probability of the data is
1/6. If I chose the 8-sided die, the probability is 1/8, and if I chose
the 12-sided die, it's 1/12.

In [11]:
table2 = pd.DataFrame(index=[6, 8, 12])

I'll use fractions to represent the prior probabilities and the likelihoods.  That way they don't get rounded off to floating-point numbers.

In [12]:
from fractions import Fraction

table2['prior'] = Fraction(1, 3)
table2['likelihood'] = Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)
table2

| | prior | likelihood |
|---|---|---|
| 6 | 1/3 | 1/6 |
| 8 | 1/3 | 1/8 |
| 12 | 1/3 | 1/12 |


Once you have priors and likelihoods, the remaining steps are always the same.

In [13]:
table2['unnorm'] = table2['prior'] * table2['likelihood']
prob_data2 = table2['unnorm'].sum()
table2['posterior'] = table2['unnorm'] / prob_data2
table2

| | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| 6 | 1/3 | 1/6 | 1/18 | 4/9 |
| 8 | 1/3 | 1/8 | 1/24 | 1/3 |
| 12 | 1/3 | 1/12 | 1/36 | 2/9 |


The posterior probability of the 6-sided die is 4/9.
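
One way to double-check this result without a Bayes table is to enumerate every (die, face) pair and condition on the reported face. This is a sketch under the problem's assumptions (each die equally likely, each face equally likely); the variable names are illustrative.

```python
from fractions import Fraction

dice = [6, 8, 12]

# Joint probability of each (sides, face) pair: 1/3 for choosing the die
# times 1/sides for rolling that face.
joint = {(sides, face): Fraction(1, len(dice)) * Fraction(1, sides)
         for sides in dice for face in range(1, sides + 1)}

# Condition on the data: the reported outcome is a 1.
consistent = {pair: p for pair, p in joint.items() if pair[1] == 1}
total = sum(consistent.values())

# Renormalize so the remaining probabilities add up to 1.
posterior = {sides: p / total for (sides, _), p in consistent.items()}
print(posterior[6], posterior[8], posterior[12])  # 4/9 1/3 2/9
```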

In [14]:
write_table(table2, 'table01-03')

## The Monty Hall problem


Monty Hall was the original host of the game show *Let's Make a Deal*.
The Monty Hall problem is based on one of the regular games on the show.
If you are a contestant, here's how the game works:

-   Monty shows you three closed doors numbered 1, 2, and 3. He tells
    you that there is a prize behind each door.

-   One prize is valuable (traditionally a car), the other two are less
    valuable (traditionally goats).

-   The object of the game is to guess which door has the car. If you
    guess right, you get to keep the car.

Suppose you pick Door 1. Before opening the door you chose, Monty opens
Door 3 and reveals a goat. Then Monty offers you the option to stick
with your original choice or switch to the remaining unopened door.

To maximize your chance of winning the car, should you stick with Door 1
or switch to Door 2?

To answer this question, we have to make some assumptions about the
behavior of the host:

1.  Monty always opens a door and offers you the option to switch.

2.  He never opens the door you picked or the door with the car.

3.  If you choose the door with the car, he chooses one of the other
    doors at random.

Under these assumptions, you are better off switching. If you stick, you
win $1/3$ of the time. If you switch, you win $2/3$ of the time.

If you have not encountered this problem before, you might find the
answer surprising. You would not be alone; many people have the strong
intuition that it doesn't matter if you stick or switch. There are two
doors left, they reason, so the chance that the car is behind Door 1 is
50%. But that is wrong.

To see why, it might help to use a Bayes table. We start with three
hypotheses: the car might be behind Door 1, 2, or 3. According to the
statement of the problem, the prior probability for each door is 1/3.

The data is that Monty opened Door 3 and revealed a goat. So let's
consider the probability of the data under each hypothesis:

-   If the car were behind Door 3, Monty would not have opened it, so
    the probability of the data under this hypothesis is 0.

-   If the car were behind Door 2, Monty would have to open Door 3, so
    the probability of the data under this hypothesis is 1.

-   If the car were behind Door 1, Monty would choose Door 2 or 3 at
    random; the probability he would open Door 3 is $1/2$.

Once we figure out prior probabilities and likelihoods, the Bayes table
does the rest. 

In [15]:
table3 = pd.DataFrame(index=['Door 1', 'Door 2', 'Door 3'])

And here are the priors and likelihoods.

In [16]:
table3['prior'] = Fraction(1, 3)
table3['likelihood'] = Fraction(1, 2), 1, 0
table3

| | prior | likelihood |
|---|---|---|
| Door 1 | 1/3 | 1/2 |
| Door 2 | 1/3 | 1 |
| Door 3 | 1/3 | 0 |


The next step is always the same.

In [17]:
table3['unnorm'] = table3['prior'] * table3['likelihood']
prob_data3 = table3['unnorm'].sum()
table3['posterior'] = table3['unnorm'] / prob_data3
table3

| | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Door 1 | 1/3 | 1/2 | 1/6 | 1/3 |
| Door 2 | 1/3 | 1 | 1/3 | 2/3 |
| Door 3 | 1/3 | 0 | 0 | 0 |


After Monty opens Door 3, the posterior probability of Door 1 is $1/3$
and the posterior probability of Door 2 is $2/3$, so you are better off
switching.

As this example shows, our intuition for probability is not always
reliable. Bayes's Theorem provides a divide-and-conquer strategy that
can help:

1.  First, write down the hypotheses and the data.

2.  Next, figure out the prior probabilities.

3.  Finally, compute the likelihood of the data under each hypothesis.

The Bayes table does the rest.
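
If the Monty Hall result still seems suspicious, a simulation offers an independent check. The sketch below is not part of the original notebook; the function name `simulate_monty`, the number of trials, and the seed are arbitrary choices made here. It encodes the three assumptions about the host, fixes your pick at Door 1, and keeps only the games in which Monty opens Door 3.

```python
import random

def simulate_monty(trials=100_000, seed=0):
    """Estimate P(car behind Door 2 | you pick Door 1, Monty opens Door 3)."""
    rng = random.Random(seed)
    opened_3 = 0       # games in which Monty opened Door 3
    car_behind_2 = 0   # of those, games in which the car was behind Door 2
    for _ in range(trials):
        car = rng.choice([1, 2, 3])
        if car == 1:
            # You picked the car, so Monty chooses Door 2 or 3 at random.
            opened = rng.choice([2, 3])
        else:
            # Monty must open the one door that is neither yours nor the car's.
            opened = 2 if car == 3 else 3
        if opened == 3:
            opened_3 += 1
            car_behind_2 += (car == 2)
    return car_behind_2 / opened_3

estimate = simulate_monty()
print(estimate)  # should be close to 2/3
```

The estimate is a frequency from random trials, so it will only approximate the exact posterior computed in the Bayes table.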

In [18]:
write_table(table3, 'table01-04')

## Summary

In this chapter...

In the next chapter

But first you might want to work on these exercises.

## Exercises

**Exercise:** Suppose you have two coins in a box.
One is a normal coin with heads on one side and tails on the other, and one is a trick coin with heads on both sides.  You choose a coin at random and see that one of the sides is heads.
What is the probability that you chose the trick coin?

In [19]:
# Solution

table = pd.DataFrame(index=['Normal', 'Trick'])
table['prior'] = 1/2
table['likelihood'] = 1/2, 1

table['unnorm'] = table['prior'] * table['likelihood']
prob_data4 = table['unnorm'].sum()

table['posterior'] = table['unnorm'] / prob_data4
table

| | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Normal | 0.5 | 0.5 | 0.25 | 0.333333 |
| Trick | 0.5 | 1.0 | 0.5 | 0.666667 |


**Exercise:** Suppose you meet someone and learn that they have two children.
You ask if either child is a girl and they say yes.
What is the probability that both children are girls?

Hint: Start with four equally likely hypotheses.

In [20]:
# Solution

table = pd.DataFrame(index=['GG', 'GB', 'BG', 'BB'])
table['prior'] = 1/4
table['likelihood'] = 1, 1, 1, 0

table['unnorm'] = table['prior'] * table['likelihood']
prob_data = table['unnorm'].sum()

table['posterior'] = table['unnorm'] / prob_data
table

| | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| GG | 0.25 | 1 | 0.25 | 0.333333 |
| GB | 0.25 | 1 | 0.25 | 0.333333 |
| BG | 0.25 | 1 | 0.25 | 0.333333 |
| BB | 0.25 | 0 | 0.0 | 0.0 |


**Exercise:** There are many variations of the [Monty Hall problem](https://en.wikipedia.org/wiki/Monty_Hall_problem).  
For example, suppose Monty always chooses Door 2 if he can and
only chooses Door 3 if he has to (because the car is behind Door 2).

If you choose Door 1 and Monty opens Door 2, what is the probability the car is behind Door 3?

If you choose Door 1 and Monty opens Door 3, what is the probability the car is behind Door 2?

In [21]:
# Solution

# If the car is behind Door 1, Monty would always open Door 2 
# If the car is behind Door 2, Monty would have opened Door 3
# If the car is behind Door 3, Monty would always open Door 2

table = pd.DataFrame(index=['Door 1', 'Door 2', 'Door 3'])
table['prior'] = 1/3
table['likelihood'] = 1, 0, 1

table['unnorm'] = table['prior'] * table['likelihood']
prob_data = table['unnorm'].sum()

table['posterior'] = table['unnorm'] / prob_data
table

| | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Door 1 | 0.333333 | 1 | 0.333333 | 0.5 |
| Door 2 | 0.333333 | 0 | 0.0 | 0.0 |
| Door 3 | 0.333333 | 1 | 0.333333 | 0.5 |


In [22]:
# Solution

# If the car is behind Door 1, Monty would have opened Door 2
# If the car is behind Door 2, Monty would always open Door 3
# If the car is behind Door 3, Monty would have opened Door 2

table = pd.DataFrame(index=['Door 1', 'Door 2', 'Door 3'])
table['prior'] = 1/3
table['likelihood'] = 0, 1, 0

table['unnorm'] = table['prior'] * table['likelihood']
prob_data = table['unnorm'].sum()

table['posterior'] = table['unnorm'] / prob_data
table

| | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Door 1 | 0.333333 | 0 | 0.0 | 0.0 |
| Door 2 | 0.333333 | 1 | 0.333333 | 1.0 |
| Door 3 | 0.333333 | 0 | 0.0 | 0.0 |


**Exercise:** M&M's are small candy-coated chocolates that come in a variety of
colors.  Mars, Inc., which makes M&M's, changes the mixture of colors from time to time.
In 1995, they introduced blue M&M's.  

* In 1994, the color mix in a bag of plain M&M's was 30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan.  

* In 1996, it was 24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown.

Suppose a friend of mine has two bags of M&M's, and he tells me
that one is from 1994 and one from 1996.  He won't tell me which is
which, but he gives me one M&M from each bag.  One is yellow and
one is green.  What is the probability that the yellow one came
from the 1994 bag?

Hint: The trick to this question is to define the hypotheses and the data carefully.

In [23]:
# Solution

# Hypotheses:
# A: yellow from 94, green from 96
# B: yellow from 96, green from 94

table = pd.DataFrame(index=['A', 'B'])
table['prior'] = 1/2
table['likelihood'] = 0.2*0.2, 0.14*0.1

table['unnorm'] = table['prior'] * table['likelihood']
prob_data = table['unnorm'].sum()

table['posterior'] = table['unnorm'] / prob_data
table

| | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| A | 0.5 | 0.04 | 0.02 | 0.740741 |
| B | 0.5 | 0.014 | 0.007 | 0.259259 |
