# Distributions

We will need a utility library, `utils` from Think Bayes.

The `utils.py` file should be in the same directory as your Jupyter notebook.

In the previous chapter we used Bayes's Rule to solve a cookie problem; then we solved it again using a Bayes table.
In this chapter we will solve it one more time using a `Pmf` object, which represents a "probability mass function".
We will talk about what that means, and why it is useful for Bayesian statistics.

We'll use `Pmf` objects to solve some more challenging problems and take one more step toward Bayesian statistics.
But we'll start with distributions.

## Distributions

In statistics a **distribution** is a set of possible outcomes and their corresponding probabilities, i.e., a combination of possibilities and probabilities.
For example, if you toss a coin, there are two possible outcomes with
approximately equal probability.
If you roll a six-sided die, the set of possible outcomes is the numbers 1 to 6, and the probability associated with each outcome is 1/6.

To represent distributions, we'll use a library called `empiricaldist`.
An "empirical" distribution is based on data, as opposed to a
theoretical distribution.
We'll use this library throughout our discussion. I'll introduce the basic features in this notebook, and we'll see additional features later.

## Probability Mass Functions

If the outcomes in a distribution are discrete, we can describe the distribution with a **probability mass function**, or PMF, which is a function that maps from each possible outcome to its probability.
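
To make the definition concrete, here is a minimal sketch that represents a PMF as a plain Python dictionary mapping each outcome to its probability. It is only an illustration of the idea, not the `Pmf` class; the name `die_pmf` is hypothetical.

```python
from fractions import Fraction

# Hypothetical stand-in for a PMF: a dict that maps each outcome to its probability.
outcomes = [1, 2, 3, 4, 5, 6]
die_pmf = {outcome: Fraction(1, len(outcomes)) for outcome in outcomes}

# Look up the probability of a single outcome.
print(die_pmf[3])

# The probabilities of all possible outcomes add up to 1.
total = sum(die_pmf.values())
print(total)
```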

`empiricaldist` provides a class called `Pmf` that represents a
probability mass function.
To use `Pmf` you can import it like this:

The following example makes a `Pmf` that represents the outcome of a
coin toss.

`Pmf()` creates an empty `Pmf` with no outcomes.
Then we can add new outcomes using the bracket operator.
In this example, the two outcomes are represented with strings, and they have the same probability, 0.5.

You can also make a `Pmf` from a sequence of possible outcomes.

The following example uses `Pmf.from_seq` to make a `Pmf` that represents a six-sided die.

In this example, all outcomes in the sequence appear once, so they all have the same probability, $1/6$.

More generally, outcomes can appear more than once, as in the following example:

- The letter `M` appears once out of 11 characters, so its probability is $1/11$.

- The letter `i` appears 4 times, so its probability is $4/11$.

- Since the letters in a string are not outcomes of a random process, we'll use the more general term "quantities" for the letters in the `Pmf`.

- The `Pmf` class inherits from a Pandas `Series`, so anything you can do with a `Series`, you can also do with a `Pmf`.

- For example, you can use the bracket operator to look up a quantity and get the corresponding probability.

- In the word "Mississippi", about 36% of the letters are "s".

- However, if you ask for the probability of a quantity that's not in the distribution, you get a `KeyError`.
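
The counting behind these probabilities can be sketched in plain Python. This is not how `Pmf.from_seq` is implemented; it is a rough equivalent built on `collections.Counter`, and the helper name `pmf_from_seq` is an assumption made for this example.

```python
from collections import Counter
from fractions import Fraction

def pmf_from_seq(seq):
    """Map each distinct item in seq to its relative frequency."""
    counts = Counter(seq)
    n = sum(counts.values())
    return {q: Fraction(count, n) for q, count in counts.items()}

letters = pmf_from_seq("Mississippi")

print(letters["M"])         # 1/11
print(float(letters["s"]))  # about 0.36

# Like the bracket operator described above, a plain dict raises KeyError
# for a quantity that is not in the distribution.
try:
    letters["t"]
except KeyError:
    print("'t' is not in the distribution")
```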



You can also call a `Pmf` as if it were a function, with a letter in parentheses.

- If the quantity is in the distribution the results are the same. 

- But if it is not in the distribution, the result is `0`, not an error.

With parentheses, you can also provide a sequence of quantities and get a sequence of probabilities.
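
The difference between the two kinds of lookup can be imitated with a dictionary. In the sketch below, `prob` and `probs` are hypothetical helpers that mimic the call syntax: a missing quantity yields 0 instead of an error, and a sequence of quantities yields a sequence of probabilities.

```python
from fractions import Fraction

# Hypothetical dict-based distribution for the letters of "Mississippi".
letters = {
    "M": Fraction(1, 11),
    "i": Fraction(4, 11),
    "s": Fraction(4, 11),
    "p": Fraction(2, 11),
}

def prob(pmf, q):
    """Return the probability of q, or 0 if q is not in the distribution."""
    return pmf.get(q, 0)

def probs(pmf, qs):
    """Return a list of probabilities, one for each quantity in qs."""
    return [prob(pmf, q) for q in qs]

print(prob(letters, "s"))
print(prob(letters, "t"))
print(probs(letters, ["M", "s", "t"]))
```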

- The quantities in a `Pmf` can be strings, numbers, or any other type that can be stored in the index of a Pandas `Series`.

- If you are familiar with Pandas, that will help you work with `Pmf` objects. 

- But we will cover what you need to know as we go along.

## The Cookie Problem Revisited

In this section we'll use a `Pmf` to solve the cookie problem from the previous chapter.
Here's the statement of the problem again:

> Suppose there are two bowls of cookies.
>
> * Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. 
>
> * Bowl 2 contains 20 vanilla cookies and 20 chocolate cookies.
>
> Now suppose you choose one of the bowls at random and, without looking, choose a cookie at random. If the cookie is vanilla, what is the probability that it came from Bowl 1?

Here's a `Pmf` that represents the two hypotheses and their prior probabilities:

- This distribution, which contains the prior probability for each hypothesis, is the **prior distribution**.

- To update the distribution based on new data (the vanilla cookie), we multiply the priors by the likelihoods. 

- The likelihood of drawing a vanilla cookie from Bowl 1 is $P(V | B_1) = 3/4$. The likelihood for Bowl 2 is $P(V | B_2) = 1/2$.

- The result is the unnormalized posteriors; that is, they don't add up to 1.

- To make them add up to 1, we can use `normalize`, which is a method provided by `Pmf`.

- The return value from `normalize` is the total probability of the data, which is $5/8$.

- `posterior`, which contains the posterior probability for each hypothesis, is the **posterior distribution**.

- $P(B_1 | V) = 0.6$

- $P(B_2 | V) = 0.4$

- From the posterior distribution we can select the posterior probability for Bowl 1:

- And the answer is 0.6.
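
The arithmetic of this update can be checked with a short plain-Python sketch. It uses dictionaries and exact fractions rather than a `Pmf`, and the `normalize` function here is a hypothetical helper written for this example; it only imitates the behavior described above (rescale so the values sum to 1, return the original total).

```python
from fractions import Fraction

def normalize(dist):
    """Rescale the values of dist in place so they sum to 1; return the original total."""
    total = sum(dist.values())
    for key in dist:
        dist[key] /= total
    return total

prior = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
likelihood_vanilla = {"Bowl 1": Fraction(3, 4), "Bowl 2": Fraction(1, 2)}

# Multiply each prior by the corresponding likelihood (unnormalized posteriors).
posterior = {h: prior[h] * likelihood_vanilla[h] for h in prior}

# Normalizing returns the total probability of the data.
prob_data = normalize(posterior)

print(prob_data)            # 5/8
print(posterior["Bowl 1"])  # 3/5
print(posterior["Bowl 2"])  # 2/5
```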


- One benefit of using `Pmf` objects is that it is easy to do successive updates with more data.

- For example, suppose you put the first cookie back (so the contents of the bowls don't change) and draw again from the same bowl.

- If the second cookie is also vanilla, we can do a second update like this (`posterior *= likelihood_vanilla` is shorthand for `posterior = posterior * likelihood_vanilla`):

## Why Does the Posterior Change After the Second Draw?

### Key Question
If the cookie is replaced and the bowl contents don’t change, why does the posterior change after the second draw?

### The Core Idea: Bayesian Updating and Sequential Learning
Even though the contents of the bowl remain unchanged, **our belief about which bowl we are drawing from changes** after each observation. This is the essence of **Bayesian updating**.

### Breaking It Down Step by Step
1. **Before drawing any cookies**, we start with a **prior**:  
   $$ P(B_1) = 0.5, \quad P(B_2) = 0.5 $$  

2. **After the first draw (vanilla cookie)**:
   - We update our beliefs using Bayes's Rule.
   - The posterior probability changes to:
     $$ P(B_1 | V) = 0.6, \quad P(B_2 | V) = 0.4 $$
   - This updated **posterior** is now our new belief about which bowl we are drawing from.

3. **Before the second draw**:
   - We do not reset our belief back to the original **prior**. Instead, we treat the updated **posterior** from step 2 as our new prior.
   - This reflects that we now have **one piece of evidence** that influences our expectations for future draws.

4. **Second draw (also vanilla)**:
   - Since our belief has already been adjusted by the first observation, the second observation **further refines** our belief.
   - We apply Bayesian updating **again**, using the previous posterior as the new prior.

### Key Takeaway
- Even though the bowl contents stay the same, our belief about which bowl we are drawing from **does not reset** after each draw.
- Each vanilla draw provides additional **evidence**, leading to a **progressively stronger belief** that we are drawing from Bowl 1.
- This is why the posterior after the second draw differs from the first: we are accumulating evidence rather than resetting to the original prior.
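
The steps above can be sketched in plain Python, assuming dictionaries in place of `Pmf` objects. The `update` function is a hypothetical helper: it multiplies a distribution by a likelihood and normalizes, so the output of one update can be fed in as the prior for the next.

```python
from fractions import Fraction

def update(dist, likelihood):
    """Return a new normalized distribution proportional to dist times likelihood."""
    unnorm = {h: dist[h] * likelihood[h] for h in dist}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

prior = {"Bowl 1": Fraction(1, 2), "Bowl 2": Fraction(1, 2)}
likelihood_vanilla = {"Bowl 1": Fraction(3, 4), "Bowl 2": Fraction(1, 2)}

# The posterior from the first draw becomes the prior for the second draw.
after_first = update(prior, likelihood_vanilla)
after_second = update(after_first, likelihood_vanilla)

print(after_first["Bowl 1"])          # 3/5
print(after_second["Bowl 1"])         # 9/13
print(float(after_second["Bowl 1"]))  # about 0.69
```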


Now the posterior probability for Bowl 1 is almost 70%.
But suppose we do the same thing again and get a chocolate cookie.

Here are the likelihoods for the new data:

And here's the update.

Now the posterior probability for Bowl 1 is about 53%.
After two vanilla cookies and one chocolate, the posterior probabilities are close to 50/50.
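
As a check on that figure, here is the third update worked by hand. It assumes the posteriors after two vanilla cookies are $9/13$ for Bowl 1 and $4/13$ for Bowl 2 (consistent with "almost 70%" above) and uses the chocolate likelihoods $P(C | B_1) = 1/4$ and $P(C | B_2) = 1/2$, which follow from the bowl contents:

$$ P(B_1 | V, V, C) = \frac{\frac{9}{13} \cdot \frac{1}{4}}{\frac{9}{13} \cdot \frac{1}{4} + \frac{4}{13} \cdot \frac{1}{2}} = \frac{9/52}{17/52} = \frac{9}{17} \approx 0.53 $$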

## 101 Bowls

Next let's solve a cookie problem with 101 bowls:

* Bowl 0 contains 0% vanilla cookies,

* Bowl 1 contains 1% vanilla cookies,

* Bowl 2 contains 2% vanilla cookies,

and so on, up to

* Bowl 99 contains 99% vanilla cookies, and

* Bowl 100 contains all vanilla cookies.

As in the previous version, there are only two kinds of cookies, vanilla and chocolate.  So Bowl 0 is all chocolate cookies, Bowl 1 is 99% chocolate, and so on.

Suppose we choose a bowl at random, choose a cookie at random, and it turns out to be vanilla.  What is the probability that the cookie came from Bowl $x$, for each value of $x$?

To solve this problem, I'll use `np.arange` to make an array that represents 101 hypotheses, numbered from 0 to 100.

We can use this array to make the prior distribution:

- As this example shows, we can initialize a `Pmf` with two parameters.
The first parameter is the prior probability; the second parameter is a sequence of quantities.

- In this example, the probabilities are all the same, so we only have to provide one of them; it gets "broadcast" across the hypotheses.

- Since all hypotheses have the same prior probability, this distribution is **uniform**.

- Here are the first few hypotheses and their probabilities.

The likelihood of the data is the fraction of vanilla cookies in each bowl, which we can calculate using `hypos`:

Now we can compute the posterior distribution in the usual way:
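
As a rough cross-check that does not depend on `empiricaldist` or NumPy, the same computation can be sketched with dictionaries and exact fractions. The variable names mirror the text, but the code is only an illustration of the arithmetic.

```python
from fractions import Fraction

hypos = range(101)  # bowl numbers 0 through 100

# Uniform prior: the same probability for every bowl.
prior = {x: Fraction(1, 101) for x in hypos}

# The likelihood of a vanilla cookie is the fraction of vanilla cookies in bowl x.
likelihood_vanilla = {x: Fraction(x, 100) for x in hypos}

# Multiply priors by likelihoods, then normalize.
unnorm = {x: prior[x] * likelihood_vanilla[x] for x in hypos}
total = sum(unnorm.values())
posterior = {x: p / total for x, p in unnorm.items()}

print(total)           # 1/2, the total probability of drawing a vanilla cookie
print(posterior[0])    # 0
print(posterior[100])  # 2/101
```

With a uniform prior, the posterior for Bowl $x$ works out to $x / 5050$, which is why the posterior rises in a straight line with the bowl number.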


The following figure shows the prior distribution and the posterior distribution after one vanilla cookie.

- The posterior probability of Bowl 0 is 0 because it contains no vanilla cookies.

- The posterior probability of Bowl 100 is the highest because it contains the most vanilla cookies.

- In between, the shape of the posterior distribution is a line because the likelihoods are proportional to the bowl numbers.

- Now suppose we put the cookie back, draw again from the same bowl, and get another vanilla cookie.

- Here's the update after the second cookie:

And here's what the posterior distribution looks like.

- After two vanilla cookies, the high-numbered bowls have the highest posterior probabilities because they contain the most vanilla cookies; the low-numbered bowls have the lowest probabilities.

- But suppose we draw again and get a chocolate cookie.

- Here's the update:

And here's the posterior distribution.

- Now Bowl 100 has been eliminated because it contains no chocolate cookies.

- But the high-numbered bowls are still more likely than the low-numbered bowls, because we have seen more vanilla cookies than chocolate.


- In fact, the peak of the posterior distribution is at Bowl 67, which corresponds to the fraction of vanilla cookies in the data we've observed, $2/3$.

- The quantity with the highest posterior probability is called the **MAP**, which stands for "maximum a posteriori probability", where "a posteriori" is unnecessary Latin for "posterior".

- To compute the MAP, we can use the `Series` method `idxmax`:

Or `Pmf` provides a more memorable name for the same thing:
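
For comparison, here is a plain-Python sketch of the three updates and the MAP. It uses dictionaries instead of `Pmf`, and `max` with a key function instead of `idxmax`; the variable names are assumptions chosen for this example.

```python
from fractions import Fraction

hypos = range(101)

# Likelihood of each kind of cookie under each hypothesis (bowl number).
likelihood = {
    "vanilla": {x: Fraction(x, 100) for x in hypos},
    "chocolate": {x: 1 - Fraction(x, 100) for x in hypos},
}

# Start from a uniform prior and update once per observed cookie.
posterior = {x: Fraction(1, 101) for x in hypos}
for cookie in ["vanilla", "vanilla", "chocolate"]:
    unnorm = {x: posterior[x] * likelihood[cookie][x] for x in hypos}
    total = sum(unnorm.values())
    posterior = {x: p / total for x, p in unnorm.items()}

# The MAP is the quantity with the highest posterior probability.
map_bowl = max(posterior, key=posterior.get)
print(map_bowl)  # 67
```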

- As you might suspect, this example isn't really about bowls; it's about estimating proportions.

- Imagine that you have one bowl of cookies.

- You don't know what fraction of cookies are vanilla, but you think it is equally likely to be any fraction from 0 to 1.

- If you draw three cookies and two are vanilla, what proportion of cookies in the bowl do you think are vanilla?

- The posterior distribution we just computed is the answer to that question.

- We'll come back to estimating proportions in the next chapter.

- But first let's use a `Pmf` to solve the dice problem.

## The Dice Problem

In the previous chapter we solved the dice problem using a Bayes table.
Here's the statement of the problem:

> Suppose you have a box with a 6-sided die, an 8-sided die, and a 12-sided die.
> You choose one of the dice at random, roll it, and report that the outcome is a 1.
> What is the probability that you chose the 6-sided die?

Let's solve it using a `Pmf`.
I'll use integers to represent the hypotheses:

We can make the prior distribution like this:


As in the previous example, the prior probability gets broadcast across the hypotheses.
The `Pmf` object has two attributes:

* `qs` contains the quantities in the distribution;

* `ps` contains the corresponding probabilities.

Now we're ready to do the update.
Here's the likelihood of the data for each hypothesis.

And here's the update.

The posterior probability for the 6-sided die is $4/9$.
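
That result can be verified with a few lines of plain Python. This sketch uses a dictionary keyed by the number of sides rather than a `Pmf`, so it shows the arithmetic only.

```python
from fractions import Fraction

hypos = [6, 8, 12]

# Each die is equally likely to be chosen.
prior = {sides: Fraction(1, 3) for sides in hypos}

# The probability of rolling a 1 on a die with n sides is 1/n.
likelihood = {sides: Fraction(1, sides) for sides in hypos}

unnorm = {sides: prior[sides] * likelihood[sides] for sides in hypos}
total = sum(unnorm.values())
posterior = {sides: p / total for sides, p in unnorm.items()}

print(posterior[6])   # 4/9
print(posterior[8])   # 1/3
print(posterior[12])  # 2/9
```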

Now suppose you roll the same die again and get a 7.
Here are the likelihoods:

- The likelihood for the 6-sided die is 0 because it is not possible to get a 7 on a 6-sided die.

- The other two likelihoods are the same as in the previous update.


- Here's the update:

After rolling a 1 and a 7, the posterior probability of the 8-sided die is about 69%.
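
To see where that number comes from, we can work the second update by hand. It takes the posteriors after the first roll ($4/9$, $3/9$, and $2/9$ for the 6-, 8-, and 12-sided dice) as the new priors, with likelihoods $0$, $1/8$, and $1/12$ for rolling a 7:

$$ P(8 | 1, 7) = \frac{\frac{3}{9} \cdot \frac{1}{8}}{\frac{4}{9} \cdot 0 + \frac{3}{9} \cdot \frac{1}{8} + \frac{2}{9} \cdot \frac{1}{12}} = \frac{1/24}{1/24 + 1/54} = \frac{9}{13} \approx 0.69 $$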

## Updating Dice

The following function is a more general version of the update in the previous section:

- The first parameter is a `Pmf` that represents the possible dice and their probabilities.

- The second parameter is the outcome of rolling a die.


- The first line selects quantities from the `Pmf` which represent the hypotheses.

- Since the hypotheses are integers, we can use them to compute the likelihoods.

- In general, if there are `n` sides on the die, the probability of any possible outcome is `1/n`.


- However, we have to check for impossible outcomes!

- If the outcome exceeds the hypothetical number of sides on the die, the probability of that outcome is 0.


- `impossible` is a Boolean `Series` that is `True` for each impossible outcome.
- We use it as an index into `likelihood` to set the corresponding probabilities to 0.


- Finally, we multiply `pmf` by the likelihoods and normalize.


- Here's how we can use this function to compute the updates in the previous section.

- We start with a fresh copy of the prior distribution:


And use `update_dice` to do the updates.

The result is the same.
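
As a rough illustration of the logic described in this section, here is a dictionary-based sketch. It is not the `update_dice` function itself: `update_dice_sketch` is a hypothetical name, it works on a plain dict rather than a `Pmf`, and it returns a new distribution instead of modifying its argument.

```python
from fractions import Fraction

def update_dice_sketch(dist, data):
    """Return the posterior over dice after observing the roll `data`.

    dist maps the number of sides on each die to its probability.
    """
    unnorm = {}
    for sides, p in dist.items():
        # An outcome that exceeds the number of sides is impossible.
        likelihood = Fraction(1, sides) if data <= sides else 0
        unnorm[sides] = p * likelihood
    total = sum(unnorm.values())
    return {sides: p / total for sides, p in unnorm.items()}

prior = {sides: Fraction(1, 3) for sides in [6, 8, 12]}

posterior = update_dice_sketch(prior, 1)
posterior = update_dice_sketch(posterior, 7)

print(posterior[6])   # 0
print(posterior[8])   # 9/13
print(posterior[12])  # 4/13
```

One limitation of this sketch: if the outcome is impossible for every die, the total is 0 and the final division fails, so a fuller version would need to handle that case.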

## Summary

- This notebook introduces the `empiricaldist` module, which provides the `Pmf` class; we use a `Pmf` to represent a set of hypotheses and their probabilities.


- `empiricaldist` is based on Pandas; the `Pmf` class inherits from the Pandas `Series` class and provides additional features specific to probability mass functions.

- We'll use `Pmf` and other classes from `empiricaldist` throughout the book because they simplify the code and make it more readable.

- But we could do the same things directly with Pandas.


- We used a `Pmf` to solve the cookie problem and the dice problem, which we saw in the previous chapter.

- With a `Pmf` it is easy to perform sequential updates with multiple pieces of data.


- We also solved a more general version of the cookie problem, with 101 bowls rather than two.

- Then we computed the MAP, which is the quantity with the highest posterior probability.
