<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Intro to Bayes

_Authors: Noelle Brown, Matt Brems_

---

# Part 1: Bayes Theorem

## Problem 1: The Monty Hall Problem

The "Monty Hall Problem" is a famous probability puzzle based on the game show "Let's Make a Deal." (Monty Hall was the show's original host.)

If you haven't heard of this game show, no worries. We’ll break down the basics below.

"Let's Make a Deal" features three doors labeled "A," "B," and "C." As the contestant, you are told that, behind exactly one door, there’s a new car. Behind the other two doors are goats. Your goal is to select the door with the car behind it.

<img src="./images/goat.jpeg" style="height: 250px">


The game goes as follows:

1. You select a door.
2. The game show host, knowing which door hides the car, opens one of the doors you didn’t select to reveal a goat. (Important: If you selected a door with a goat, the host picks the other door with a goat. If you started by selecting the door with the car, the host picks from the remaining two doors at random.)
3. The host then asks you if you would like to stick with the door you originally picked or switch to the other remaining door.
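Before doing any algebra, we can check the rules empirically. The sketch below is a minimal Monte Carlo simulation of the game exactly as described above. It assumes the car's location and the contestant's first pick are both uniformly random. The function names `play_monty_hall` and `win_rate` are hypothetical and exist only for this illustration.

```python
import random

def play_monty_hall(switch: bool, rng: random.Random) -> bool:
    """Play one round of the game; return True if the contestant wins the car."""
    doors = ["A", "B", "C"]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a goat door the contestant didn't pick.
    # If the pick hides the car, both other doors hide goats and the host chooses at random.
    goat_doors = [d for d in doors if d != pick and d != car]
    opened = rng.choice(goat_doors)
    if switch:
        # Move to the only door that is neither the original pick nor the opened door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch: bool, n_games: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated games won with the given strategy."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(n_games))
    return wins / n_games

print("stick: ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With the same seed, both strategies play identical games. So every game that sticking loses, switching wins, and the two rates add up to 1.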

### Strategy 1: Stick with our original door

We choose door A, and the host shows us that there is a goat behind door C.

Reminder: Bayes' theorem

$$
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
$$

Applied to our problem:

$$
P(\text{Car behind A}|\text{Shown C}) = \frac{P(\text{Shown C}|\text{Car behind A})P(\text{Car behind A})}{P(\text{Shown C})}
$$

First, let's knock out our priors.

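```python
# Priors: before any door is opened, the car is equally likely to be behind each door
p_car_A = 1/3
p_car_B = 1/3
p_car_C = 1/3
```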

Next, let's use the law of total probability to find $P(\text{Shown C})$.

Reminder: Law of Total Probability:

$$
P(B) = \sum_{i=1}^{n}P(B|A_{i})P(A_{i})
$$

Applied to our problem:

$$
P(\text{Shown C}) = P(\text{Shown C} | \text{Car behind A})*P(\text{Car behind A}) + P(\text{Shown C} | \text{Car behind B})*P(\text{Car behind B}) + P(\text{Shown C} | \text{Car behind C})*P(\text{Car behind C})
$$
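Bayes' theorem and the law of total probability together form one reusable step. For each hypothesis, multiply its prior by its likelihood, then divide by the sum of those products. Below is a minimal sketch of that step. The helper name `posterior` is hypothetical. The Monty Hall likelihoods follow from the rules above: the host opens C half the time when the car is behind A, always when it is behind B, and never when it is behind C.

```python
from fractions import Fraction

def posterior(priors: dict[str, Fraction], likelihoods: dict[str, Fraction]) -> dict[str, Fraction]:
    """Return P(A_i | B) for every hypothesis A_i.

    priors[h]      = P(A_i), the prior for hypothesis h
    likelihoods[h] = P(B | A_i), the probability of the evidence under h
    """
    # Law of total probability: P(B) = sum over i of P(B | A_i) P(A_i)
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    # Bayes' theorem for each hypothesis: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

# Monty Hall: we picked door A and the host opened door C
priors = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
likelihoods = {"A": Fraction(1, 2), "B": Fraction(1), "C": Fraction(0)}
print(posterior(priors, likelihoods))
```

`Fraction` keeps the arithmetic exact, so the results come out as $\frac{1}{3}$ and $\frac{2}{3}$ rather than rounded decimals.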

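```python
# Likelihoods: probability the host opens door C, given where the car is (we picked door A)
p_c_given_a = 1/2  # car behind A: host picks B or C at random
p_c_given_b = 1    # car behind B: host must open C, the only other goat door
p_c_given_c = 0    # car behind C: host never reveals the car
```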

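```python
# Law of total probability over the three possible car locations
p_shown_c = p_c_given_a * p_car_A + p_c_given_b * p_car_B + p_c_given_c * p_car_C
p_shown_c
```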

Plugging all of this into Bayes' theorem gives the probability that the car is behind the door we chose (again, we are sticking with our original choice of A here), given that there is a goat behind door C:

$$
P(\text{Car behind A}|\text{Shown C}) = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{1}{3}
$$

### Strategy 2: Switch doors

We choose door A, and the host shows us that there is a goat behind door C. Now we switch to door B.

Bayes' Theorem applied to our new strategy:

$$
P(\text{Car behind B}|\text{Shown C}) = \frac{P(\text{Shown C}|\text{Car behind B})P(\text{Car behind B})}{P(\text{Shown C})}
$$

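We can reuse the prior and $P(\text{Shown C})$ calculated above. The only new term is $P(\text{Shown C}|\text{Car behind B}) = 1$. If the car is behind door B, the host cannot open door A (our pick) or door B (the car), so he must open door C.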

Plugging all of this into Bayes' theorem gives the probability that the car is behind the door we switched to (door B), given that there is a goat behind door C:

$$
P(\text{Car behind B}|\text{Shown C}) = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}
$$
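We can also estimate these two conditional probabilities directly. The sketch below simulates many games in which we always pick door A. It discards every game where the host did not open door C, then counts where the car was in the games that remain. This "keep only the games consistent with what we observed" approach is a simulation stand-in for conditioning. The function name is hypothetical.

```python
import random

def estimate_given_shown_c(n_games: int = 200_000, seed: int = 1) -> dict[str, float]:
    """Estimate P(car behind each door | we pick A and the host opens C) by simulation."""
    rng = random.Random(seed)
    doors = ["A", "B", "C"]
    counts = {d: 0 for d in doors}
    kept = 0
    for _ in range(n_games):
        car = rng.choice(doors)
        # We always pick A; the host opens a goat door other than A (at random if the car is behind A)
        opened = rng.choice([d for d in doors if d not in ("A", car)])
        if opened != "C":
            continue  # keep only games matching our observation: the host opened door C
        kept += 1
        counts[car] += 1
    return {d: counts[d] / kept for d in doors}

print(estimate_given_shown_c())
```

Roughly half of the simulated games survive the filter, which matches $P(\text{Shown C}) = \frac{1}{2}$.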

<details><summary>What should we do?</summary>

Switch! Switching gives us a $\frac{2}{3}$ chance of winning the car, while sticking with our original choice only leaves us with a $\frac{1}{3}$ chance of winning!
</details>

If this is still not intuitive, we can think about it in another way.

When we make our original pick, there is a $\frac{1}{3}$ chance that the car is behind the door we picked and a $\frac{2}{3}$ chance that it is behind one of the doors we didn't pick.

![](./images/first-pick.png)

Being shown the goat behind door C does not change this. There is *still* a $\frac{2}{3}$ chance that the car is behind one of the doors we didn't pick, but now all of that probability rests on door B. Switching therefore doubles our chance of winning, from $\frac{1}{3}$ to $\frac{2}{3}$!

![](./images/switch.png)

*Images from* [*Mathigon*](https://mathigon.org/course/probability/monty-hall)

## Problem 2: Cookies!

We have two jars of cookies. In the first jar, we have 30 sugar cookies and 10 chocolate chip cookies. In the second jar, we have 20 sugar cookies and 20 chocolate chip cookies.

![](./images/cookies.png)

Without knowing which is which, we randomly select a jar and pull out a **sugar** cookie. What is the probability that the cookie came from jar 1?

[*Source*](http://www.greenteapress.com/thinkbayes/html/thinkbayes002.html)
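As with Monty Hall, a quick simulation gives a target for the exact answer. The minimal sketch below picks a jar uniformly at random and then a cookie uniformly from that jar. It keeps only the draws that produced a sugar cookie and reports how many of those came from jar 1. The function name is hypothetical; the jar contents come from the problem statement.

```python
import random

def estimate_p_jar1_given_sugar(n_draws: int = 200_000, seed: int = 2) -> float:
    """Estimate P(Jar 1 | Sugar Cookie) by simulating random draws."""
    rng = random.Random(seed)
    # Jar contents from the problem statement
    jars = {
        "jar1": ["sugar"] * 30 + ["chocolate chip"] * 10,
        "jar2": ["sugar"] * 20 + ["chocolate chip"] * 20,
    }
    sugar_draws = 0
    sugar_from_jar1 = 0
    for _ in range(n_draws):
        jar = rng.choice(["jar1", "jar2"])  # pick a jar uniformly at random
        cookie = rng.choice(jars[jar])      # pick a cookie uniformly from that jar
        if cookie == "sugar":
            sugar_draws += 1
            sugar_from_jar1 += jar == "jar1"  # True counts as 1
    return sugar_from_jar1 / sugar_draws

print(estimate_p_jar1_given_sugar())
```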

Bayes' theorem applied to this problem:

$$
P(\text{Jar 1}|\text{Sugar Cookie}) = \frac{P(\text{Sugar Cookie}|\text{Jar 1})P(\text{Jar 1})}{P(\text{Sugar Cookie})}
$$

First, let's knock out our priors.

```python
# Priors: we pick one of the two jars at random
p_jar1 = 1/2
p_jar2 = 1/2
```

Next, let's use the law of total probability to find $P(\text{Sugar Cookie})$.

$$
P(\text{Sugar Cookie}) = P(\text{Sugar Cookie} | \text{Jar 1})*P(\text{Jar 1}) + P(\text{Sugar Cookie} | \text{Jar 2})*P(\text{Jar 2})
$$

```python
# Likelihoods: the share of sugar cookies in each jar
p_sugar_given_jar1 = 30 / 40
p_sugar_given_jar2 = 20 / 40
```

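```python
# Law of total probability over the two jars
p_sugar = p_sugar_given_jar1 * p_jar1 + p_sugar_given_jar2 * p_jar2
p_sugar
```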
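The last step is Bayes' theorem itself: divide the numerator $P(\text{Sugar Cookie}|\text{Jar 1})P(\text{Jar 1})$ by $P(\text{Sugar Cookie})$. Here is a minimal sketch that uses Python's `fractions.Fraction` to keep the arithmetic exact. It redefines the same quantities so that it runs on its own.

```python
from fractions import Fraction

# Priors and likelihoods from the problem statement, as exact fractions
p_jar1 = p_jar2 = Fraction(1, 2)
p_sugar_given_jar1 = Fraction(30, 40)
p_sugar_given_jar2 = Fraction(20, 40)

# Law of total probability, then Bayes' theorem
p_sugar = p_sugar_given_jar1 * p_jar1 + p_sugar_given_jar2 * p_jar2
p_jar1_given_sugar = p_sugar_given_jar1 * p_jar1 / p_sugar

print(p_sugar)             # 5/8
print(p_jar1_given_sugar)  # 3/5
```

Drawing a sugar cookie raises our belief that we picked jar 1 from $\frac{1}{2}$ to $\frac{3}{5}$, because jar 1 has the higher share of sugar cookies.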

## Problem 3: Naïve Bayes

Remember when we used Naïve Bayes as a model during our NLP lessons?

Now we can see exactly how that works using Bayes' theorem!

Let's say we want to classify whether a message is spam or ham (not spam).

Assume these are the words that appear in our messages, with their total counts across 10 normal (ham) messages:

| hello | work | coffee | lunch | money |
| --- | --- | --- | --- | --- |
| 8 | 5 | 3 | 3 | 1 |

And these are the counts of the same words across 10 spam messages:

| hello | work | coffee | lunch | money |
| --- | --- | --- | --- | --- |
| 5 | 2 | 1 | 1 | 9 |
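To turn these tables into likelihoods such as $P(\text{'money'}|\text{spam})$, one common choice is the approach used by multinomial Naïve Bayes (and by the StatQuest video linked below). Each word's count is divided by the total number of words counted in that class. This is an assumption about how to read the tables: if the counts meant "number of messages containing the word", we would divide by 10 instead. A minimal sketch, with hypothetical names:

```python
# Word counts copied from the two tables above
ham_counts = {"hello": 8, "work": 5, "coffee": 3, "lunch": 3, "money": 1}
spam_counts = {"hello": 5, "work": 2, "coffee": 1, "lunch": 1, "money": 9}

def word_likelihoods(counts: dict[str, int]) -> dict[str, float]:
    """P(word | class), taken as each word's share of all words counted in that class."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

p_word_given_ham = word_likelihoods(ham_counts)    # ham total: 20 words
p_word_given_spam = word_likelihoods(spam_counts)  # spam total: 18 words
print(p_word_given_ham["money"], p_word_given_spam["money"])
```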

We get a new message with the word 'money' in the message. Let's classify it!

First, we will find the probability that a new message is spam given that it has the word 'money' in the message.

Reminder: Bayes' theorem

$$
\begin{eqnarray*}
\text{Bayes' Theorem: } P(A|B) &=& \frac{P(B|A)P(A)}{P(B)}
\end{eqnarray*}
$$

- Let $A$ be the event that the message is spam.
- Let $B$ be the event that 'money' appears in the message.

$$
\begin{eqnarray*}
P(\text{message is spam}|\text{'money' in message}) &=& \frac{P(\text{'money' in message}|\text{message is spam})P(\text{message is spam})}{P(\text{'money' in message})}
\end{eqnarray*}
$$

```python
# Priors: 10 spam and 10 normal messages, so each class has probability 10/20
p_spam = 10 / 20
p_ham = 10 / 20
```

$$
P(\text{'money' in message}) = P(\text{'money' in message}|\text{spam})*P(\text{spam}) + P(\text{'money' in message}|\text{ham})*P(\text{ham})
$$


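```python
# Likelihoods, assuming word frequencies as in multinomial Naïve Bayes:
# count of 'money' divided by the total word count in each class
p_money_given_spam = 9 / (5 + 2 + 1 + 1 + 9)  # 9/18
p_money_given_ham = 1 / (8 + 5 + 3 + 3 + 1)   # 1/20
```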

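```python
# Law of total probability over spam and ham
p_money = p_money_given_spam * p_spam + p_money_given_ham * p_ham
p_money
```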

Now, let's find the probability that a new message is ham given that it has the word 'money' in the message.

$$
\begin{eqnarray*}
P(\text{message is ham}|\text{'money' in message}) &=& \frac{P(\text{'money' in message}|\text{message is ham})P(\text{message is ham})}{P(\text{'money' in message})}
\end{eqnarray*}
$$
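Here is a minimal sketch that computes both posteriors side by side. It uses the word-frequency likelihoods ($\frac{9}{18}$ for spam, $\frac{1}{20}$ for ham) and equal priors. Both posteriors share the denominator $P(\text{'money' in message})$, so they add up to 1.

```python
# Equal priors; likelihoods assume the word-frequency reading of the tables
p_spam = p_ham = 0.5
p_money_given_spam = 9 / 18
p_money_given_ham = 1 / 20

# Law of total probability, then Bayes' theorem for each class
p_money = p_money_given_spam * p_spam + p_money_given_ham * p_ham
p_spam_given_money = p_money_given_spam * p_spam / p_money
p_ham_given_money = p_money_given_ham * p_ham / p_money

print(round(p_spam_given_money, 3), round(p_ham_given_money, 3))
```

Under the other reading of the tables (9 of 10 spam messages and 1 of 10 normal messages contain 'money'), the posteriors become 0.9 and 0.1. The classification is the same either way.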

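Since $P(\text{message is spam}|\text{'money' in message})$ is larger than $P(\text{message is ham}|\text{'money' in message})$, we would classify this message as spam!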

An actual Naïve Bayes model is slightly more complex because it incorporates every word in the message. Under the "naïve" assumption that words are independent of each other given the class, this just means multiplying the conditional probabilities of each word in the message together with the prior probability (the probability that a message is spam or ham). You can watch a video explanation of this [here](https://statquest.org/naive-bayes-clearly-explained/) (this video was also the inspiration behind this example).
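Here is a minimal multi-word sketch built on the same two tables. For each class it adds the log of the prior to the logs of $P(\text{word}|\text{class})$ for every word in the message (a repeated word is counted each time). It then picks the class with the larger total. Summing logs is equivalent to multiplying the probabilities, and it avoids numerical underflow on long messages. This sketch assumes every word in the message appears in both tables. A word with a zero count would need smoothing; for example, scikit-learn's `MultinomialNB` applies additive smoothing through its `alpha` parameter. The names `log_score` and `classify` are hypothetical.

```python
import math

# Word counts from the two tables above; 10 messages of each class gives equal priors
counts = {
    "ham": {"hello": 8, "work": 5, "coffee": 3, "lunch": 3, "money": 1},
    "spam": {"hello": 5, "work": 2, "coffee": 1, "lunch": 1, "money": 9},
}
priors = {"ham": 0.5, "spam": 0.5}

def log_score(message: list[str], label: str) -> float:
    """log P(label) + sum of log P(word | label): the log of the unnormalized posterior."""
    total = sum(counts[label].values())
    return math.log(priors[label]) + sum(math.log(counts[label][w] / total) for w in message)

def classify(message: list[str]) -> str:
    """Return the class with the highest unnormalized posterior."""
    return max(priors, key=lambda label: log_score(message, label))

print(classify(["hello", "lunch"]))           # ham
print(classify(["hello", "money", "money"]))  # spam
```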

Now, let's think about some extra work that we did to compare these two probabilities.

<details><summary>In both of these, we divided by the probability that 'money' was in the message. What would have happened if we didn't do that?</summary>

We would have still classified this message as spam! Since we divided by the same denominator in both cases, our answers without dividing by the probability that 'money' was in the message were proportional to our answers when we divided by it!
</details>
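The point in the answer above can be checked in a few lines. The sketch below keeps only the numerators of Bayes' theorem (likelihood times prior, using the word-frequency likelihoods $\frac{9}{18}$ and $\frac{1}{20}$). It then confirms that normalizing them does not change which class wins. The sum of the numerators is exactly $P(\text{'money' in message})$ from the law of total probability. That is why the posterior is often written as "proportional to likelihood times prior".

```python
# Numerators of Bayes' theorem only: P('money' | class) * P(class)
unnormalized = {"spam": (9 / 18) * 0.5, "ham": (1 / 20) * 0.5}

# Their sum is P('money' in message), by the law of total probability
evidence = sum(unnormalized.values())
normalized = {label: value / evidence for label, value in unnormalized.items()}

# Dividing every score by the same positive number cannot change which one is largest
assert max(unnormalized, key=unnormalized.get) == max(normalized, key=normalized.get)
print(unnormalized)
print(normalized)
```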

This sets up the idea behind Bayesian Inference.

---

To the slides!

---

# Part 2: Bayesian Inference

This afternoon you will see an example of how to use Bayesian inference to solve a problem. I also encourage you to check out [this blog post](https://towardsdatascience.com/bayesian-inference-intuition-and-example-148fd8fb95d6) which walks through another example of Bayesian inference in Python.

You can also play around with [this visualization](https://rpsychologist.com/d3/bayes/) to see how the posterior distribution changes with the prior and likelihood.
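To see in code how the posterior responds to the prior and the likelihood, here is a minimal grid-approximation sketch. It uses hypothetical data that is not from this lesson: 7 heads in 10 flips of a coin with an unknown probability of heads $p$. It evaluates likelihood times prior at 101 candidate values of $p$ and normalizes, which is the same "numerator, then divide by the sum" step used for spam. The second prior, a bell curve centered on a fair coin, is also a hypothetical choice.

```python
import math

# Hypothetical data (not from the lesson): 7 heads in 10 flips, unknown P(heads) = p
heads, flips = 7, 10
grid = [i / 100 for i in range(101)]  # candidate values of p from 0.00 to 1.00

def grid_posterior(prior: list[float]) -> list[float]:
    """Posterior over the grid: likelihood * prior, normalized to sum to 1."""
    # Binomial likelihood up to a constant factor, which cancels when we normalize
    likelihood = [p**heads * (1 - p) ** (flips - heads) for p in grid]
    unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def posterior_mode(prior: list[float]) -> float:
    """Grid value of p with the highest posterior probability."""
    post = grid_posterior(prior)
    return grid[post.index(max(post))]

flat_prior = [1.0] * len(grid)
# Hypothetical prior belief that the coin is close to fair (bell curve centered at 0.5)
near_fair_prior = [math.exp(-((p - 0.5) ** 2) / (2 * 0.05**2)) for p in grid]

print(posterior_mode(flat_prior))       # 0.7, where the likelihood alone peaks
print(posterior_mode(near_fair_prior))  # pulled toward 0.5 by the prior
```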