## Bayes' Rule reviewed 

Remember Monday, with the M&Ms? An additional, fully worked example for cups with *4* M&Ms is now in the Day 25 notebook, so you can check your own math.

$P(Y|X) = \frac{P(Y)*P(X|Y)}{P(X)}$.

* $P(Y|X)$ is the *posterior*
* $P(Y)$ is the *prior*
* $P(X|Y)$ is the *likelihood*
* $P(X)$ is the *evidence* (or *normalization*)
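
To make the roles of the four terms concrete, here is a minimal sketch of the rule as a Python function. The function name `posterior` and the numbers are hypothetical, chosen only to exercise the formula; they are not taken from Monday's example.

```python
from fractions import Fraction

def posterior(prior, likelihood, evidence):
    """Bayes' rule: P(Y|X) = P(Y) * P(X|Y) / P(X)."""
    return prior * likelihood / evidence

# Hypothetical values, for illustration only.
p_y = Fraction(1, 4)          # prior P(Y)
p_x_given_y = Fraction(1, 2)  # likelihood P(X|Y)
p_x = Fraction(1, 4)          # evidence P(X)

print(posterior(p_y, p_x_given_y, p_x))  # 1/2
```

Using `Fraction` keeps the arithmetic exact, which makes it easier to compare against hand calculations.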

## Fitting and predicting using Naive Bayes

Let's imagine I have a very, very large bag of M&Ms. Some are peanut, some raisin, and some regular. I only like peanut M&Ms. The thing is, I'm in a dark room (or it's Halloween night), so I can only guess the type of each M&M by feel. I repeatedly reach into the bag, grab one M&M, and then either eat it or do not eat it. This gives the dataset below:

| Candy | Outcome |
| ----- | ---- |
| peanut M&M | ate it |
| peanut M&M | ate it |
| peanut M&M | ate it |
| raisin M&M | did not eat it |
| raisin M&M | ate it |
| regular M&M | did not eat it |
| peanut M&M | did not eat it |
| regular M&M | did not eat it |
| raisin M&M | did not eat it |
| raisin M&M | did not eat it |

Side note: we typically convert qualitative values to numbers for machine learning; this lets us use all the power of numpy, at some cost in human readability. I'm not going to do that today, to keep things readable, but on Friday we'll be back to a 'numeric' representation of features.
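
For reference, one simple way such a conversion might look is sketched below in plain Python. The short sample list and the choice of sorted order for the integer codes are assumptions made for illustration; the resulting list of integers is the kind of thing one would then hand to numpy.

```python
# A small illustrative sample of candy types (not the full dataset).
candies = ["peanut", "peanut", "raisin", "regular"]

# Give each distinct category an integer code; sorted order is an arbitrary choice.
categories = sorted(set(candies))
code_of = {name: i for i, name in enumerate(categories)}

# Replace each qualitative value with its code.
encoded = [code_of[name] for name in candies]

print(code_of)  # {'peanut': 0, 'raisin': 1, 'regular': 2}
print(encoded)  # [0, 0, 1, 2]
```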

We want to fit a model that will tell us the probability that we ate it (or did not eat it) given the type of candy it is. We can do that by calculating:
* The prior that we ate it (or did not eat it)
* The likelihood of the type of candy given that we ate it (or did not eat it) 


__Fit__

A table of counts might help. Fill this in!

| Outcome | Peanut M&M | Regular M&M | Raisin M&M | Total |
| -------- | ------- | ---------- | ------------ | ------- |
| Ate it         | 3 | 0 | 1 | 4 |
| Did not eat it | 1 | 2 | 3 | 6 |
| Total |  4 | 2 | 4 | 10 |

* Calculate the prior for *ate it* and the prior for *did not eat it*: $P(ate it) = 4/10$; $P(did not eat it) = 6/10$
* Calculate the likelihood of *peanut M&M* given *ate it*, of *regular M&M* given *ate it*, of *raisin M&M* given *ate it*, and of each type of M&M given *did not eat it*. Each likelihood is estimated from the counts: $P(X|Y) \approx \frac{|X \& Y|}{|Y|}$

| Y | Peanut M&M | Regular M&M | Raisin M&M |
| -------- | ------- | ---------- | ------------ |
| Ate it         | 3/4 | 0 | 1/4 |
| Did not eat it | 1/6 | 2/6 | 3/6 |


Store both sets of values.
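
The fit step is just counting, so it can be sketched in a few lines of plain Python. This is a minimal sketch, not a library implementation: the function name `fit`, the shortened candy names, and the dictionary layout are assumptions made here for illustration.

```python
from collections import Counter
from fractions import Fraction

# Training data copied from the dataset table: (candy, outcome) pairs.
train = [
    ("peanut", "ate it"),
    ("peanut", "ate it"),
    ("peanut", "ate it"),
    ("raisin", "did not eat it"),
    ("raisin", "ate it"),
    ("regular", "did not eat it"),
    ("peanut", "did not eat it"),
    ("regular", "did not eat it"),
    ("raisin", "did not eat it"),
    ("raisin", "did not eat it"),
]

def fit(rows):
    """Estimate priors P(Y) and likelihoods P(X|Y) by counting."""
    label_counts = Counter(y for _, y in rows)  # |Y|
    pair_counts = Counter(rows)                 # |X & Y|
    candies = sorted({x for x, _ in rows})
    n = len(rows)
    priors = {y: Fraction(c, n) for y, c in label_counts.items()}
    likelihoods = {
        y: {x: Fraction(pair_counts[(x, y)], label_counts[y]) for x in candies}
        for y in label_counts
    }
    return priors, likelihoods

priors, likelihoods = fit(train)
print(priors["ate it"])                  # 2/5
print(likelihoods["ate it"]["peanut"])   # 3/4
print(likelihoods["ate it"]["regular"])  # 0
```

Note that `Fraction` reduces automatically, so 4/10 prints as 2/5 and 2/6 as 1/3.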

__Predict__

Given a new observation, *peanut M&M*, what is my most likely behavior?

* Compare $P(ate it|peanut M\&M)$ and $P(did not eat it|peanut M\&M)$. 



$P(ate it|peanut M\&M) = \frac{P(ate it)*P(peanut M\&M|ate it)}{P(peanut M\&M)}$

$P(did not eat it|peanut M\&M) = \frac{P(did not eat it)*P(peanut M\&M|did not eat it)}{P(peanut M\&M)}$

Since $P(peanut M\&M)$ is in the denominator in both cases, we can ignore it completely for the comparison. So we only need the prior and the likelihood, both of which we calculated during __fit__.
* $P(ate it|peanut M\&M) \propto 4/10*3/4 = 3/10$
* $P(did not eat it|peanut M\&M) \propto 6/10*1/6 = 1/10$

Which is higher?

The Naive Bayes "predict formula" is $\arg\max_Y P(Y)*P(X|Y)$.
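
The predict step can be sketched as a small function that computes prior times likelihood for each label and returns the label with the largest product. The priors and likelihoods below are typed in from the tables in the fit step; the function names `scores`, `predict`, and `posteriors` are hypothetical. The last function is an optional extra: it divides by the sum of the products (which equals the evidence) to recover true posteriors, showing that normalizing does not change which label wins.

```python
from fractions import Fraction

# Values from the fit step.
priors = {"ate it": Fraction(4, 10), "did not eat it": Fraction(6, 10)}
likelihoods = {
    "ate it": {"peanut": Fraction(3, 4), "regular": Fraction(0), "raisin": Fraction(1, 4)},
    "did not eat it": {"peanut": Fraction(1, 6), "regular": Fraction(2, 6), "raisin": Fraction(3, 6)},
}

def scores(candy):
    """Unnormalized posteriors P(Y) * P(X|Y) for each label Y."""
    return {y: priors[y] * likelihoods[y][candy] for y in priors}

def predict(candy):
    """argmax over Y of P(Y) * P(X|Y)."""
    s = scores(candy)
    return max(s, key=s.get)

def posteriors(candy):
    """Divide by the evidence P(X), the sum of the scores over all labels."""
    s = scores(candy)
    evidence = sum(s.values())
    return {y: v / evidence for y, v in s.items()}

print(scores("peanut")["ate it"])      # 3/10
print(predict("peanut"))               # ate it
print(posteriors("peanut")["ate it"])  # 3/4
```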

__Score__

Let's say I have the test data below:

| Candy | Outcome |
| ----- | ---- |
| peanut M&M | did not eat it |
| peanut M&M | ate it |
| raisin M&M | did not eat it |
| raisin M&M | ate it |
| regular M&M | did not eat it |

* Does this data include all my labels?
* Does this data include all possible values for my one independent variable?

How well does my Naive Bayes model perform on this test data? 

| Y | X | P(ate it)\*P(X\|ate it) | P(did not eat it)\*P(X\|did not eat it) | Yhat |
| --- | --- | --- | --- | --- |
| did not eat it | peanut M&M | 2/5*3/4 | 3/5*1/6  | ate it |
| ate it | peanut M&M | 2/5*3/4 | 3/5*1/6  | ate it  |
| did not eat it | raisin M&M | 2/5*1/4 | 3/5*3/6 | did not eat it |
| ate it | raisin M&M | 2/5*1/4  | 3/5*3/6 | did not eat it  |
| did not eat it | regular M&M | 2/5*0  | 3/5*2/6  | did not eat it |

Based on this test data:
* What is the accuracy? 3/5 (3 of the 5 test rows are predicted correctly)
* What is the confusion matrix? (Rows are actual outcomes; columns are predicted outcomes.)

|  | ate it | did not eat it |
| -- | --- | ---- |
| ate it | 1 | 1 |
| did not eat it | 1 | 2 |
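
The scoring can be checked with a short script. This is a sketch under a few assumptions: the priors and likelihoods are typed in from the fit step, candy names are shortened, and the confusion matrix is stored as a `Counter` keyed by (actual, predicted) pairs rather than as a 2D array.

```python
from collections import Counter
from fractions import Fraction

# Values from the fit step.
priors = {"ate it": Fraction(2, 5), "did not eat it": Fraction(3, 5)}
likelihoods = {
    "ate it": {"peanut": Fraction(3, 4), "regular": Fraction(0), "raisin": Fraction(1, 4)},
    "did not eat it": {"peanut": Fraction(1, 6), "regular": Fraction(2, 6), "raisin": Fraction(3, 6)},
}

# Test data copied from the table: (candy, actual outcome) pairs.
test = [
    ("peanut", "did not eat it"),
    ("peanut", "ate it"),
    ("raisin", "did not eat it"),
    ("raisin", "ate it"),
    ("regular", "did not eat it"),
]

def predict(candy):
    """Return the label with the largest P(Y) * P(X|Y)."""
    return max(priors, key=lambda y: priors[y] * likelihoods[y][candy])

predictions = [predict(x) for x, _ in test]

# Accuracy: fraction of test rows where the prediction matches the actual outcome.
correct = sum(yhat == y for (_, y), yhat in zip(test, predictions))
accuracy = Fraction(correct, len(test))

# Confusion matrix as counts of (actual, predicted) pairs.
confusion = Counter((y, yhat) for (_, y), yhat in zip(test, predictions))

print(accuracy)  # 3/5
for actual in priors:
    print(actual, [confusion[(actual, predicted)] for predicted in priors])
```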