# Introduction to Bayesian thinking and modeling

# Bayes's Rule

We have already derived Bayes's Rule:

$$P(A|B) = \frac{P(A) P(B|A)}{P(B)}$$

As an example, we applied Bayes's Rule to an oil drilling problem to compute the conditional probability of finding oil given that the seismic data indicated that the prospect holds oil. We could also compute the probability of finding oil if the seismic data indicated that the prospect **does not** hold oil, as well as the probabilities of the prospect **not holding oil** if the seismic data indicated "oil" or "dry" (not oil).

In this chapter, we'll use Bayes's Rule to solve several more challenging conditional probability problems.

## The Cookie Problem

<img src="../figs/cookie_problem.png" alt="Alternative Text" width="300"/>

We'll start with a simple case:

> Suppose there are two bowls of cookies.
>
> * Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. 
>
> * Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.
>
> Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from Bowl 1?

What we want is the conditional probability that we chose from Bowl 1 given that we got a vanilla cookie, $P(B_1 | V)$.

But what we get from the statement of the problem is:

* The conditional probability of getting a vanilla cookie, given that we chose from Bowl 1, $P(V | B_1)$ and

* The conditional probability of getting a vanilla cookie, given that we chose from Bowl 2, $P(V | B_2)$.


Bayes's Rule tells us how they are related:

$$P(B_1|V) = \frac{P(B_1)~P(V|B_1)}{P(V)}$$

The term on the left is what we want. The terms on the right are:

-   $P(B_1)$, the probability that we chose Bowl 1,
    unconditioned by what kind of cookie we got. 
    Since the problem says we chose a bowl at random, 
    we set $P(B_1) = 1/2$.

-   $P(V|B_1)$, the probability of getting a vanilla cookie
    from Bowl 1, which is 3/4.

-   $P(V)$, the probability of drawing a vanilla cookie from
    either bowl. 

To compute $P(V)$, we can use the law of total probability:

$$P(V) = P(B_1)~P(V|B_1) ~+~ P(B_2)~P(V|B_2)$$

Plugging in the numbers from the statement of the problem, we have

$$P(V) = (1/2)~(3/4) ~+~ (1/2)~(1/2) = 5/8$$

We can also compute this result directly, like this: 

* Since we had an equal chance of choosing either bowl and the bowls contain the same number of cookies, we had the same chance of choosing any cookie. 

* Between the two bowls there are 50 vanilla and 30 chocolate cookies, so $P(V) = 5/8$.

Finally, we can apply Bayes's Rule to compute the posterior probability of Bowl 1:

$$P(B_1|V) = (1/2)~(3/4)~/~(5/8) = 3/5$$
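The arithmetic can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and `Fraction` from the standard library is used so the results stay exact instead of being rounded to floating point.

```python
from fractions import Fraction

# Priors: each bowl is equally likely to be chosen
prior_b1 = Fraction(1, 2)
prior_b2 = Fraction(1, 2)

# Likelihoods: fraction of vanilla cookies in each bowl
like_b1 = Fraction(30, 40)  # Bowl 1: 30 vanilla out of 40
like_b2 = Fraction(20, 40)  # Bowl 2: 20 vanilla out of 40

# Law of total probability gives P(V)
p_vanilla = prior_b1 * like_b1 + prior_b2 * like_b2

# Bayes's Rule gives P(B_1 | V)
posterior_b1 = prior_b1 * like_b1 / p_vanilla

print(p_vanilla, posterior_b1)  # 5/8 3/5
```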

This example demonstrates one use of Bayes's theorem: it provides a
way to get from $P(B|A)$ to $P(A|B)$. 
This strategy is useful in cases like this where it is easier to compute the terms on the right side than the term on the left.

## Diachronic Bayes

There is another way to think of Bayes's theorem: it gives us a way to
update the probability of a hypothesis, $H$, given some body of data, $D$.

This interpretation is "diachronic", which means "related to change over time"; in this case, the probability of the hypotheses changes as we see new data.

Rewriting Bayes's rule with $H$ and $D$ yields:

$$P(H|D) = \frac{P(H)~P(D|H)}{P(D)}$$

In this interpretation, each term has a name:

-  $P(H)$ is the probability of the hypothesis before we see the data, called the prior probability, or just **prior**.

-  $P(H|D)$ is the probability of the hypothesis after we see the data, called the **posterior**.

-  $P(D|H)$ is the probability of the data under the hypothesis, called the **likelihood**.

-  $P(D)$ is the **total probability of the data**, under any hypothesis.

Sometimes we can compute the prior based on background information. For example, the cookie problem specifies that we choose a bowl at random with equal probability.

In other cases the prior is subjective; that is, reasonable people might disagree, either because they use different background information or because they interpret the same information differently.

The likelihood is usually the easiest part to compute. In the cookie
problem, we are given the number of cookies in each bowl, so we can compute the probability of the data under each hypothesis.

Computing the total probability of the data can be tricky. 
It is the probability of seeing the data under any hypothesis at all. 
Most often we work through this by specifying a set of hypotheses that
are:

* Mutually exclusive, which means that only one of them can be true, and

* Collectively exhaustive, which means one of them must be true.

When these conditions apply, we can compute $P(D)$ using the law of total probability.  For example, with two hypotheses, $H_1$ and $H_2$:

$$P(D) = P(H_1)~P(D|H_1) + P(H_2)~P(D|H_2)$$

And more generally, with any number of hypotheses:

$$P(D) = \sum_i P(H_i)~P(D|H_i)$$

The process in this section, using data and a prior probability to compute a posterior probability, is called a **Bayesian update**.
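A Bayesian update can be sketched as a small function over plain Python lists. The name `bayes_update` and its signature are assumptions made for illustration. The function assumes the hypotheses are mutually exclusive and collectively exhaustive, so the law of total probability applies.

```python
def bayes_update(priors, likelihoods):
    """Return (posteriors, prob_data) for parallel lists of P(H_i) and P(D|H_i)."""
    # Numerators of Bayes's Rule: P(H_i) * P(D|H_i)
    unnorm = [p * l for p, l in zip(priors, likelihoods)]
    # Total probability of the data: the sum over all hypotheses
    prob_data = sum(unnorm)
    # Dividing by P(D) makes the posteriors add up to 1
    posteriors = [u / prob_data for u in unnorm]
    return posteriors, prob_data


# Cookie problem: priors for Bowl 1 and Bowl 2, likelihoods of a vanilla cookie
posteriors, prob_data = bayes_update([1/2, 1/2], [3/4, 1/2])
print(posteriors, prob_data)
```

For the cookie problem this reproduces the total probability 5/8 and the posterior 3/5 for Bowl 1.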

## Bayes Tables

A convenient tool for doing a Bayesian update is a Bayes table.
You can write a Bayes table on paper or use a spreadsheet, but in this section I'll use a Pandas `DataFrame`.

First I'll make an empty `DataFrame` with one row for each hypothesis:

```python
import pandas as pd

# Define the hypotheses (choosing from Bowl 1 or Bowl 2)
hypotheses = ["Bowl 1", "Bowl 2"]

# Create an empty DataFrame with one row per hypothesis
bayes_table = pd.DataFrame(index=hypotheses)

# Add prior probabilities (P(H))
bayes_table["Prior"] = [1/2, 1/2]
bayes_table
```

|        | Prior |
|--------|-------|
| Bowl 1 | 0.5   |
| Bowl 2 | 0.5   |


The `Prior` column in the cell above already represents the priors, so the next step is a column for the likelihoods:
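As a sketch of this step, the likelihoods might be added as follows. The column name `Likelihood` is an assumption, chosen to match `Prior`. The table is rebuilt in one line so the cell runs on its own.

```python
import pandas as pd

# Rebuild the table with its priors so this cell is self-contained
bayes_table = pd.DataFrame({"Prior": [1/2, 1/2]}, index=["Bowl 1", "Bowl 2"])

# Likelihood of drawing a vanilla cookie under each hypothesis, P(V | Bowl)
bayes_table["Likelihood"] = [30/40, 20/40]
bayes_table
```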

Here we see a difference from the previous method: we compute likelihoods for both hypotheses, not just Bowl 1:

* The chance of getting a vanilla cookie from Bowl 1 is 3/4.

* The chance of getting a vanilla cookie from Bowl 2 is 1/2.

As you see, the likelihoods don't add up to 1.  That's OK; each of them is a probability conditioned on a different hypothesis.
There's no reason they should add up to 1 and no problem if they don't.

The next step is similar to what we did with Bayes's Rule; we multiply the priors by the likelihoods:
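A minimal sketch of the multiplication follows. The `Prior` and `Likelihood` column names are assumptions, and the table is rebuilt inline so the cell runs on its own.

```python
import pandas as pd

# Priors and likelihoods for the cookie problem
bayes_table = pd.DataFrame(
    {"Prior": [1/2, 1/2], "Likelihood": [3/4, 1/2]},
    index=["Bowl 1", "Bowl 2"],
)

# Multiply prior by likelihood, row by row
bayes_table["unnorm"] = bayes_table["Prior"] * bayes_table["Likelihood"]
bayes_table
```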

I call the result `unnorm` because these values are the "unnormalized posteriors".  Each of them is the product of a prior and a likelihood:

$$P(H_i)~P(D|H_i)$$

which is the numerator of Bayes's Rule. 
If we add them up, we have

$$P(H_1)~P(D|H_1) + P(H_2)~P(D|H_2)$$

which is the denominator of Bayes's Rule, $P(D)$.

So we can compute the total probability of the data like this:
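One way to write this is to sum the `unnorm` column. The variable name `prob_data` is an assumption, and the table is rebuilt here so the cell is self-contained.

```python
import pandas as pd

# Priors, likelihoods, and their product for the cookie problem
bayes_table = pd.DataFrame(
    {"Prior": [1/2, 1/2], "Likelihood": [3/4, 1/2]},
    index=["Bowl 1", "Bowl 2"],
)
bayes_table["unnorm"] = bayes_table["Prior"] * bayes_table["Likelihood"]

# Total probability of the data: the sum of the unnormalized posteriors
prob_data = bayes_table["unnorm"].sum()
print(prob_data)  # 0.625
```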

Notice that we get 5/8, which is what we got by computing $P(D)$ directly.

And we can compute the posterior probabilities like this:
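A sketch of the normalization step follows, with `Posterior` as an assumed column name and the earlier steps repeated so the cell runs on its own.

```python
import pandas as pd

# Priors, likelihoods, and unnormalized posteriors for the cookie problem
bayes_table = pd.DataFrame(
    {"Prior": [1/2, 1/2], "Likelihood": [3/4, 1/2]},
    index=["Bowl 1", "Bowl 2"],
)
bayes_table["unnorm"] = bayes_table["Prior"] * bayes_table["Likelihood"]
prob_data = bayes_table["unnorm"].sum()

# Divide each unnormalized posterior by the total probability of the data
bayes_table["Posterior"] = bayes_table["unnorm"] / prob_data
bayes_table
```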

The posterior probability for Bowl 1 is 0.6, which is what we got using Bayes's Rule explicitly.
As a bonus, we also get the posterior probability of Bowl 2, which is 0.4.

When we add up the unnormalized posteriors and divide through, we force the posteriors to add up to 1.  This process is called "normalization", which is why the total probability of the data is also called the "normalizing constant".

## The Newspaper Problem

Let's use a Bayes table to solve the newspaper problem we discussed in the PowerPoints.

> It is Saturday morning at 08:00, and I must decide whether to walk down to the bottom of my driveway to get the newspaper.
>
> On the basis of past experience, I judge that there is an 80% chance that the paper has been delivered by now.
> Looking out of the kitchen window, I can see exactly half of the bottom of the driveway, and the paper is not in the half that I see.
>
>If the paper has been delivered there’s an equal chance that it will fall in each half of the driveway.
>
>What is the probability that the paper has been delivered?

Let

- D = Delivered

- S = See the newspaper

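Prior

$$P(D) = 0.8, \qquad P(\lnot D) = 0.2$$

Likelihood of not seeing the paper under each hypothesis

$$P(\lnot S|D) = 1/2, \qquad P(\lnot S|\lnot D) = 1$$

Unnormalized posterior

$$P(D)~P(\lnot S|D) = (0.8)~(1/2) = 0.4, \qquad P(\lnot D)~P(\lnot S|\lnot D) = (0.2)~(1) = 0.2$$

Total probability (the denominator in Bayes's Rule) of not seeing the paper

$$P(\lnot S) = 0.4 + 0.2 = 0.6$$

Normalized posterior

$$P(D|\lnot S) = 0.4~/~0.6 = 2/3$$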
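The same steps can be sketched as a Bayes table. The table name `paper` and the column names are assumptions, and the likelihoods are read from the problem statement: if the paper was delivered, it is in the unseen half with probability 1/2, and if it was not delivered, it is certainly not visible.

```python
import pandas as pd

# Hypotheses: the paper has or has not been delivered
paper = pd.DataFrame(index=["Delivered", "Not delivered"])
paper["Prior"] = [0.8, 0.2]

# Data: the paper is not in the half of the driveway I can see
paper["Likelihood"] = [0.5, 1.0]

# Unnormalized posterior, total probability of the data, posterior
paper["unnorm"] = paper["Prior"] * paper["Likelihood"]
prob_data = paper["unnorm"].sum()
paper["Posterior"] = paper["unnorm"] / prob_data
paper
```

Under these assumptions the posterior probability that the paper has been delivered is 2/3, down from the prior of 0.8.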

## The Dice Problem

<img src="../figs/Dice-2.png" alt="Alternative Text" width="200"/>

A Bayes table can also solve problems with more than two hypotheses.  For example:

> Suppose I have a box with a 6-sided die, an 8-sided die, and a 12-sided die. I choose one of the dice at random, roll it, and report that the outcome is a 1. What is the probability that I chose the 6-sided die?

In this example, there are three hypotheses with equal prior
probabilities. The data is my report that the outcome is a 1. 

If I chose the 6-sided die, the probability of the data is
1/6. If I chose the 8-sided die, the probability is 1/8, and if I chose the 12-sided die, it's 1/12.

Here's a Bayes table that uses integers to represent the hypotheses:

```python
# One row per hypothesis: the number of sides on the chosen die
table2 = pd.DataFrame(index=[6, 8, 12])
```

We'll use fractions to represent the prior probabilities and the likelihoods.  That way they don't get rounded off to floating-point numbers.
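A sketch using `Fraction` from the standard library follows. The column names `Prior` and `Likelihood` are assumptions, and the table is rebuilt so the cell runs on its own.

```python
from fractions import Fraction
import pandas as pd

# One row per hypothesis: the number of sides on the chosen die
table2 = pd.DataFrame(index=[6, 8, 12])

# Each die is equally likely to be chosen
table2["Prior"] = [Fraction(1, 3)] * 3

# Probability of rolling a 1 on each die
table2["Likelihood"] = [Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)]
table2
```

Because the columns hold `Fraction` objects, pandas stores them with the generic `object` dtype and the values stay exact.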

Once you have priors and likelihoods, the remaining steps are always the same, so let's put them in a function and call it like this.
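Here is one possible version of such a function, applied to the dice table. The function body and the column names `Prior`, `Likelihood`, and `Posterior` are assumptions. Only the name `update` and the `unnorm` label come from the text. The dice table is rebuilt so the cell is self-contained.

```python
from fractions import Fraction
import pandas as pd


def update(table):
    """Add 'unnorm' and 'Posterior' columns in place; return P(D).

    Assumes `table` already has 'Prior' and 'Likelihood' columns.
    """
    table["unnorm"] = table["Prior"] * table["Likelihood"]
    prob_data = table["unnorm"].sum()
    table["Posterior"] = table["unnorm"] / prob_data
    return prob_data


# Dice problem: equal priors, likelihood of rolling a 1 on each die
table2 = pd.DataFrame(index=[6, 8, 12])
table2["Prior"] = [Fraction(1, 3)] * 3
table2["Likelihood"] = [Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)]

prob_data = update(table2)
table2
```

Returning the total probability of the data is a design choice: the table is modified in place, and the normalizing constant remains available to the caller.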

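Here is the final Bayes table:

| Die | Prior | Likelihood | Unnormalized posterior | Posterior |
|-----|-------|------------|------------------------|-----------|
| 6   | 1/3   | 1/6        | 1/18                   | 4/9       |
| 8   | 1/3   | 1/8        | 1/24                   | 1/3       |
| 12  | 1/3   | 1/12       | 1/36                   | 2/9       |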

The posterior probability of the 6-sided die is 4/9, which is a little more than the probabilities for the other dice, 3/9 and 2/9.
Intuitively, the 6-sided die is the most likely because it had the highest likelihood of producing the outcome we saw.

## The Monty Hall Problem

<img src="../figs/Monty_open_door.png" alt="Alternative Text" width="300"/>

Next we'll use a Bayes table to solve one of the most contentious problems in probability.

The Monty Hall problem is based on a game show called *Let's Make a Deal*. If you are a contestant on the show, here's how the game works:

* The host, Monty Hall, shows you three closed doors -- numbered 1, 2, and 3 -- and tells you that there is a prize behind each door.

* One prize is valuable (traditionally a car), the other two are less valuable (traditionally goats).

* The object of the game is to guess which door has the car. If you guess right, you get to keep the car.

Suppose you pick Door 1. Before opening the door you chose, Monty opens Door 3 and reveals a goat. Then Monty offers you the option to stick with your original choice or switch to the remaining unopened door.

To maximize your chance of winning the car, should you stick with Door 1 or switch to Door 2?

To answer this question, we have to make some assumptions about the behavior of the host:

1.  Monty always opens a door and offers you the option to switch.

2.  He never opens the door you picked or the door with the car.

3.  If you choose the door with the car, he chooses one of the other
    doors at random.

Under these assumptions, you are better off switching. 
If you stick, you win $1/3$ of the time. If you switch, you win $2/3$ of the time.

If you have not encountered this problem before, you might find that
answer surprising. You would not be alone; many people have the strong
intuition that it doesn't matter if you stick or switch. There are two
doors left, they reason, so the chance that the car is behind Door 1 is 50%. But that is wrong.

To see why, it can help to use a Bayes table. We start with three
hypotheses: the car might be behind Door 1, 2, or 3. According to the
statement of the problem, the prior probability for each door is 1/3.

The data is that Monty opened Door 3 and revealed a goat. So let's
consider the probability of the data under each hypothesis (remember that you have chosen Door 1, but not opened it, so Monty will not open Door 1):

* If the car is behind Door 1, Monty chooses Door 2 or 3 at random, so the probability he opens Door 3 is $1/2$.

* If the car is behind Door 2, Monty has to open Door 3, so the probability of the data under this hypothesis is 1.

* If the car is behind Door 3, Monty does not open it, so the probability of the data under this hypothesis is 0.

Those are the likelihoods: $1/2$, $1$, and $0$ for Doors 1, 2, and 3.

Now that we have priors and likelihoods, we can use `update` to compute the posterior probabilities.
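A sketch of this update follows, with a hypothetical `update` defined inline so the cell runs on its own. The table name `table3` and the column names are assumptions.

```python
from fractions import Fraction
import pandas as pd


def update(table):
    """Add 'unnorm' and 'Posterior' columns in place; return P(D)."""
    table["unnorm"] = table["Prior"] * table["Likelihood"]
    prob_data = table["unnorm"].sum()
    table["Posterior"] = table["unnorm"] / prob_data
    return prob_data


# Hypotheses: which door hides the car (you picked Door 1)
table3 = pd.DataFrame(index=["Door 1", "Door 2", "Door 3"])
table3["Prior"] = [Fraction(1, 3)] * 3

# Probability that Monty opens Door 3 under each hypothesis
table3["Likelihood"] = [Fraction(1, 2), Fraction(1), Fraction(0)]

prob_data = update(table3)
table3
```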

After Monty opens Door 3, the posterior probability of Door 1 is $1/3$;
the posterior probability of Door 2 is $2/3$.
So you are better off switching from Door 1 to Door 2.
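As a sanity check on the 1/3 versus 2/3 claim, the game can also be simulated. This is only a sketch under the three assumptions about Monty's behavior listed earlier. The function names, the number of trials, and the seed are illustrative choices.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant wins the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the pick nor the car,
    # choosing at random when he has two options
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one door that is neither picked nor opened
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning with a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


stick_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stick: {stick_rate:.3f}, switch: {switch_rate:.3f}")
```

With enough trials, the estimated win rates should land close to 1/3 for sticking and 2/3 for switching.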

The following discussion may provide more insight into the Monty Hall problem.

**How Monty’s Choice Gives Information**

- Monty always opens a door that is NOT the one you picked and does NOT hide the car.

- If the car is behind Door 1 (the door you picked), Monty can open either Door 2 or Door 3, and he chooses at random.

- If the car is behind Door 2, Monty is forced to open Door 3.

- If the car is behind Door 3, Monty is forced to open Door 2.

This means that some of Monty’s choices are more likely under certain conditions, which allows us to update our probabilities using Bayes’ Rule.

**Key Insight from the Information Update**

- **Obvious Information**: When Monty opens Door 3, we know for certain that the car is **not behind Door 3**.

- **Less Obvious Information**: Monty’s action is **not independent** of where the car is.

1. He was forced to open Door 3 if the car was behind Door 2.
2. If the car was behind Door 1, he had a choice between opening Door 2 or Door 3.
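3. Opening Door 3 is therefore twice as likely when the car is behind Door 2 (probability 1) as when it is behind Door 1 (probability 1/2). Seeing him open Door 3 raises the probability of Door 2 from 1/3 to 2/3, while the probability of Door 1 stays at 1/3.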

As this example shows, our intuition for probability is not always
reliable. 
Bayes's Rule can help by providing a divide-and-conquer strategy:

1.  First, write down the hypotheses and the data.

2.  Next, figure out the prior probabilities.

3.  Finally, compute the likelihood of the data under each hypothesis.

The Bayes table does the rest.

## Summary

We solved the Cookie Problem using Bayes's Rule explicitly and using a Bayes table.
There's no real difference between these methods, but the Bayes table can make it easier to compute the total probability of the data, especially for problems with more than two hypotheses.

Then we solved the Dice Problem, which we will see again later, and the Monty Hall problem, which you might hope you never see again 😊.

If the Monty Hall problem makes your head hurt, you are not alone.  But it demonstrates the power of Bayes's Rule as a divide-and-conquer strategy for solving tricky problems.  And I hope it provides some insight into *why* the answer is what it is.

When Monty opens a door, he provides information we can use to update our belief about the location of the car.  Part of the information is obvious.  If he opens Door 3, we know the car is not behind Door 3.  But part of the information is more subtle.  Opening Door 3 is more likely if the car is behind Door 2, and less likely if it is behind Door 1.  So the data is evidence in favor of Door 2.  We will come back to this notion of evidence in future chapters.

# The End