# The One Goal for Today

To understand Bayes' rule.

# Motivation

Today we are going to review terminology for talking about probabilities, and then talk about Bayes' rule. 

How is this relevant to data analysis, particularly supervised modeling? kNN is very flexible, but it is also **slow**. Every time we want to classify a new data point, we have to calculate the distance between it and every data point in the training data and then sort those distances. 

We will use Bayes' rule to develop a second, faster approach to supervised modeling for data with a qualitative dependent variable.

# Data samples

Now, in data analysis we very rarely have *all the data*. In most cases, we have a *sample* of the data that we assume / hope / know to be big enough to generalize over. 

If dataset A (or B) above were *all the data* we could calculate actual probabilities; for example, the probability that A[0, 0] = 4. 

Even if A (or B) is a *sample* of the data, we still calculate relative frequencies, and if the sample is large enough they will approximate probabilities. 
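
To see relative frequencies approaching a probability, here is a minimal sketch that uses simulated rolls of a fair six-sided die as a stand-in for a data sample. The function name `relative_frequency`, the fixed seed, and the sample sizes are assumptions made for illustration; they are not part of datasets A or B.

```python
import random

def relative_frequency(n_rolls, target=4, seed=0):
    """Fraction of n_rolls simulated fair-die rolls that equal target."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls

# Larger samples should land closer to the true probability, 1/6.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

With small samples the relative frequency can be far from 1/6; with large samples it should settle close to it.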

# Review of probability

## Probability of independent events

Let's imagine we have two dice, die $A$ and die $B$. 
* *What is the probability of rolling a 4 on die $A$, $P(A=4)$?*
* *What is the probability of rolling a 3 on die $B$, $P(B=3)$?*
* *What is the probability of rolling a 4 on die $A$ _and_ a 3 on die $B$, $P(A=4, B=3)$?*

Rolling a 4 on die $A$ is *independent* of rolling a 3 on die $B$: the outcome of one die tells us nothing about the outcome of the other.

So $P(A=4,B=3) = P(A=4)*P(B=3)$.
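
One way to check the product rule is to enumerate every outcome of the two dice and count. The sketch below assumes both dice are fair and six-sided; the helper `prob` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (a, b) outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event: the fraction of outcomes where it holds."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_a = prob(lambda o: o[0] == 4)                    # P(A=4)
p_b = prob(lambda o: o[1] == 3)                    # P(B=3)
p_joint = prob(lambda o: o[0] == 4 and o[1] == 3)  # P(A=4, B=3)

print(p_a, p_b, p_joint)
print(p_joint == p_a * p_b)  # independence: joint equals the product
```

Using `Fraction` keeps the arithmetic exact, so the equality check is not affected by floating-point rounding.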

# Conditional probabilities

Sometimes I carry an umbrella to work. Let's imagine that I look out my window each morning and if I see rain, I am more likely to carry my umbrella to work. Here are two work weeks of my life, with the weather each day (R = rain, S = sun) and whether I carried an umbrella (U = carried an umbrella, blank = did not):

| Day | M | T | W | Th | F | M | T | W | Th | F |
| - | - | - | - | - | - | - | - | - | - | - |
| Weather | R | S | R | R | S | S | R | R | R | S |
| Umbrella | U |   | U |   | U |   | U | U |   |   |

The probability of me carrying an umbrella to work is *not* independent of the probability that the weather is rainy:
* What's the probability that the weather is rainy, $P(W=R)$? 
* What's the probability that I carry my umbrella to work, $P(C=U)$? 
* Is the probability that I carry my umbrella to work _and_ the weather is rainy, $P(W=R, C=U)$, equal to $P(W=R)*P(C=U)$? 
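
The three questions above can be answered by counting days in the table. This is a small sketch under the assumption that the ten days are treated as the whole dataset; the strings below transcribe the table, with `-` standing in for a blank umbrella cell.

```python
from fractions import Fraction

# One character per day, in table order.
weather = "RSRRSSRRRS"    # R = rain, S = sun
umbrella = "U-U-U-UU--"   # U = carried an umbrella, - = did not
days = list(zip(weather, umbrella))

def prob(event):
    """Relative frequency of an event over the ten days."""
    return Fraction(sum(1 for d in days if event(d)), len(days))

p_rain = prob(lambda d: d[0] == "R")      # P(W=R)
p_umbrella = prob(lambda d: d[1] == "U")  # P(C=U)
p_both = prob(lambda d: d == ("R", "U"))  # P(W=R, C=U)

print(p_rain, p_umbrella, p_both)
print(p_both == p_rain * p_umbrella)  # False here: the events are dependent
```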

# Bayes' rule

Let's look deeper at $P(W=R, C=U)$. Let's calculate it two ways:

Way 1:
* What's the probability that the weather is rainy, $P(W=R)$? 
* What's the probability that I carry my umbrella to work _given that_ the weather is rainy, $P(C=U|W=R)$? 
* Putting those together, what is $P(W=R, C=U)$? 

Way 2:
* What's the probability that I carry my umbrella to work, $P(C=U)$? 
* What's the probability that the weather is rainy _given that_ I carry my umbrella to work, $P(W=R|C=U)$? 
* Putting those together, what is $P(W=R, C=U)$? 

That's interesting! So $P(W=R)*P(C=U|W=R) = P(C=U)*P(W=R|C=U)$. Let's rearrange: $P(C=U|W=R) = \frac{P(C=U)*P(W=R|C=U)}{P(W=R)}$. 
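
Both routes to the joint probability can be checked against the umbrella table by counting. The sketch below is illustrative only: the helper `prob` and its `given` argument are names introduced here, and `-` stands in for a blank umbrella cell.

```python
from fractions import Fraction

weather = "RSRRSSRRRS"    # R = rain, S = sun
umbrella = "U-U-U-UU--"   # U = carried an umbrella, - = did not
days = list(zip(weather, umbrella))

def rain(day):
    return day[0] == "R"

def carry(day):
    return day[1] == "U"

def prob(event, given=lambda day: True):
    """Relative frequency of `event` among the days where `given` holds."""
    pool = [d for d in days if given(d)]
    return Fraction(sum(1 for d in pool if event(d)), len(pool))

way_1 = prob(rain) * prob(carry, given=rain)   # P(W=R) * P(C=U | W=R)
way_2 = prob(carry) * prob(rain, given=carry)  # P(C=U) * P(W=R | C=U)
joint = prob(lambda d: rain(d) and carry(d))   # counted directly

print(way_1, way_2, joint)
```

All three values should agree, which is exactly the equality that Bayes' rule rearranges.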

This is a *universal rule* called Bayes' rule. Bayes' rule tells us how to update the probability of an event when we observe evidence that might influence it, starting from our *prior* knowledge about that event. In fact, there is a whole branch of statistics (Bayesian statistics) based on Bayes' rule. 

Let's restate Bayes' rule a little; instead of $P(C=U|W=R)$ we can just say $P(U|R)$. So $P(U|R) = \frac{P(U)*P(R|U)}{P(R)}$. $P(U)$ is the *prior* on $U$, $P(R|U)$ is the *likelihood* (of $R$ given $U$), $P(R)$ is the *evidence* or *normalization*, and $P(U|R)$ is the *posterior*.

## Bayes' rule example

Now let's do an example using these tasty peanut M&Ms I have here. In my cup, there are three M&Ms. They might all be yellow, they might all be blue, or some of them might be yellow and some blue. In other words, here are the possibilities:
* 3 blues, 0 yellows (3b0y)
* 2 blues, 1 yellow (2b1y)
* 1 blue, 2 yellows (1b2y)
* 0 blues, 3 yellows (0b3y)

Without any *evidence*, we will treat all four of those possibilities as equally likely. So let's accumulate some *evidence*. I'm going to:
1. Take one M&M from the cup (without looking into the cup!).
2. Write down the color of the M&M.
3. Put the M&M back (do **NOT** eat the M&M!!).

I draw *one* M&M from the cup, without looking into the cup. Then, *given* that one M&M (written $1?$ below, where $?$ stands for its color), let's see if we can estimate the probability that there are 2 blues and 1 yellow:

**2b1y**
* $P(2b1y | 1?) = P(2b1y)*P(1?|2b1y) / P(1?)$. 
* $P(2b1y) = 1/4$.  
* $P(1?|2b1y) = ??$.
* $P(1?) = P(1?|3b0y)*P(3b0y) + P(1?|2b1y)*P(2b1y) + P(1?|1b2y)*P(1b2y) + P(1?|0b3y)*P(0b3y) = ??$.

So $P(2b1y | 1?) = ??$.

Let's repeat for the other three possible outcomes:

__1b2y__

* $P(1b2y | 1?) = P(1b2y)*P(1?|1b2y) / P(1?)$. 
* $P(1b2y) = 1/4$.  
* $P(1?|1b2y) = ??$.
* $P(1?) = P(1?|3b0y)*P(3b0y) + P(1?|2b1y)*P(2b1y) + P(1?|1b2y)*P(1b2y) + P(1?|0b3y)*P(0b3y) = ??$.

So $P(1b2y | 1?) = ??$.

__3b0y__
* $P(3b0y | 1?) = P(3b0y)*P(1?|3b0y) / P(1?)$. 
* $P(3b0y) = 1/4$.  
* $P(1?|3b0y) = ??$.
* $P(1?) = P(1?|3b0y)*P(3b0y) + P(1?|2b1y)*P(2b1y) + P(1?|1b2y)*P(1b2y) + P(1?|0b3y)*P(0b3y) = ??$.

So $P(3b0y | 1?) = ??$.

__0b3y__
* $P(0b3y | 1?) = P(0b3y)*P(1?|0b3y) / P(1?)$. 
* $P(0b3y) = 1/4$.  
* $P(1?|0b3y) = ??$.
* $P(1?) = P(1?|3b0y)*P(3b0y) + P(1?|2b1y)*P(2b1y) + P(1?|1b2y)*P(1b2y) + P(1?|0b3y)*P(0b3y) = ??$.

So $P(0b3y | 1?) = ??$.

Sanity check: does the sum of all four probabilities equal 1?
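
Once you have filled in the blanks by hand, the whole calculation can be checked in a few lines. This is a minimal sketch, assuming a uniform prior over the possible cups and a single draw; the function name `cup_posterior` and its `drawn` argument are hypothetical names chosen for this example.

```python
from fractions import Fraction

def cup_posterior(n, drawn="blue"):
    """Posterior over the number of blue M&Ms in a cup of n, after one draw.

    Assumes every blue count from 0 to n is equally likely beforehand.
    """
    prior = Fraction(1, n + 1)
    # Likelihood of the drawn color for each possible number of blues k.
    if drawn == "blue":
        likelihood = {k: Fraction(k, n) for k in range(n + 1)}
    else:
        likelihood = {k: Fraction(n - k, n) for k in range(n + 1)}
    # Evidence: total probability of the drawn color across all cups.
    evidence = sum(prior * lk for lk in likelihood.values())
    return {k: prior * lk / evidence for k, lk in likelihood.items()}

posterior = cup_posterior(3, drawn="blue")
for blues, p in posterior.items():
    print(f"{blues}b{3 - blues}y: {p}")
print("sum:", sum(posterior.values()))
```

Calling `cup_posterior(3, drawn="yellow")` gives the mirror-image answer for a yellow draw.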

## Exercise 

Now you do one! Each of you will have a cup with *4* M&Ms. 

* What are the possible outcomes for color combinations in your cup?
* Draw one M&M. Given this M&M, what is the probability of each color combination in your cup?

Worked out below, where $x$ is the color of the M&M you drew and $y$ is the other color:

* $P(1x) = P(1x|4x0y)*P(4x0y) + P(1x|3x1y)*P(3x1y) + P(1x|2x2y)*P(2x2y) + P(1x|1x3y)*P(1x3y) + P(1x|0x4y)*P(0x4y) = 1*1/5 + 3/4*1/5 + 1/2*1/5 + 1/4*1/5 + 0 = 1/5 + 3/20 + 1/10 + 1/20 = 1/2$.

* $P(4x0y | 1x) = (P(4x0y)*P(1x|4x0y)) / P(1x)$. 
* $P(4x0y) = 1/5$ (possibilities: 4x0y, 3x1y, 2x2y, 1x3y, 0x4y)  
* $P(1x|4x0y) = 1$.

So $P(4x0y | 1x) = \frac{1/5*1}{1/2} = 2/5$.

* $P(3x1y | 1x) = (P(3x1y)*P(1x|3x1y)) / P(1x)$. 
* $P(3x1y) = 1/5$ (possibilities: 4x0y, 3x1y, 2x2y, 1x3y, 0x4y)  
* $P(1x|3x1y) = 3/4$.

So $P(3x1y | 1x) = \frac{1/5*3/4}{1/2} = 3/10$.

* $P(2x2y | 1x) = (P(2x2y)*P(1x|2x2y)) / P(1x)$. 
* $P(2x2y) = 1/5$ (possibilities: 4x0y, 3x1y, 2x2y, 1x3y, 0x4y)  
* $P(1x|2x2y) = 1/2$.

So $P(2x2y | 1x) = \frac{1/5*1/2}{1/2} = 1/5$.

* $P(1x3y | 1x) = (P(1x3y)*P(1x|1x3y)) / P(1x)$. 
* $P(1x3y) = 1/5$ (possibilities: 4x0y, 3x1y, 2x2y, 1x3y, 0x4y)  
* $P(1x|1x3y) = 1/4$.

So $P(1x3y | 1x) = \frac{1/5*1/4}{1/2} = 1/10$.

* $P(0x4y | 1x) = 0$. 

Sanity check: 0 + 1/10 + 1/5 + 3/10 + 2/5 = 10/10 = 1.
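
These posteriors can also be checked by simulation: build many random cups, draw one M&M from each, and keep only the cups where the draw came out as color $x$. The sketch below assumes a uniform prior over the five possible cups; `simulate_cups`, the number of trials, and the seed are choices made for this example.

```python
import random
from collections import Counter

def simulate_cups(n=4, trials=200_000, seed=1):
    """Estimate P(k x-colored M&Ms in the cup | one draw came out x)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(trials):
        k = rng.randint(0, n)      # uniform prior over 0..n x-colored M&Ms
        if rng.random() < k / n:   # the single draw comes out x
            counts[k] += 1
    total = sum(counts.values())   # number of cups where the draw was x
    return {k: counts[k] / total for k in range(n + 1)}

for k, freq in simulate_cups().items():
    print(f"{k}x{4 - k}y: {freq:.3f}")
```

The relative frequencies should come out close to 0, 1/10, 1/5, 3/10 and 2/5, matching the hand calculation.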