# The One Goal for Today

To understand how to fit a Naive Bayes model.

# Bayes' Rule reviewed 

Remember Monday, with the M&Ms? The complete worked example for the additional case, cups with *4* M&Ms, is now in the Day 27 notebook, so you can check your own math.

Bayes' rule: $P(Y|X) = \frac{P(Y)*P(X|Y)}{P(X)}$.

* $P(Y|X)$ is the *posterior*
* $P(Y)$ is the *prior*
* $P(X|Y)$ is the *likelihood*
* $P(X)$ is the *evidence* (or *normalization*)
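As a minimal sketch, Bayes' rule can be written out directly in plain Python. The function name `posterior` and the dictionary layout are illustrative choices, not a library API. The evidence $P(X)$ is computed by adding up $P(Y)*P(X|Y)$ over every label, assuming the labels given are all the possible outcomes. That sum is why the evidence acts as a normalization constant. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def posterior(priors, likelihoods, y):
    """Return P(y | X) for one fixed observation X, via Bayes' rule.

    priors:      dict mapping each label to its prior P(Y)
    likelihoods: dict mapping each label to the likelihood P(X | Y) of the observed X
    """
    # Evidence P(X): add up P(Y) * P(X|Y) over every label (law of total probability).
    evidence = sum(priors[label] * likelihoods[label] for label in priors)
    return priors[y] * likelihoods[y] / evidence


# Hypothetical numbers, for illustration only (two made-up labels "A" and "B").
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
likelihoods = {"A": Fraction(4, 5), "B": Fraction(1, 5)}
print(posterior(priors, likelihoods, "A"))  # prints 4/5
```

Because every posterior is divided by the same evidence, the posteriors over all labels always sum to 1.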

# Fitting and predicting using Naive Bayes

Let's imagine I have a very, very large bag of M&Ms. Some of them are peanut (N), some raisin (R) and some regular chocolate (C). I only like peanut M&Ms. The thing is, I'm in a dark room (or it's Halloween night), so I can only guess the type of the M&M by feel. I repeatedly reach into the bag, grab one M&M and then either eat it (E) or don't eat it (D). You then record the type of the M&M and what I did. This gives the dataset below:

| Candy | Outcome |
| ----- | ---- |
| peanut M&M (N) | did eat it (E) |
| peanut M&M (N) | did eat it (E) |
| peanut M&M (N) | did eat it (E) |
| raisin M&M (R) | did not eat it (D) |
| raisin M&M (R) | did eat it (E) |
| chocolate M&M (C) | did not eat it (D) |
| peanut M&M (N) | did not eat it (D) |
| chocolate M&M (C) | did not eat it (D) |
| raisin M&M (R) | did not eat it (D) |
| raisin M&M (R) | did not eat it (D) |

Side note: for machine learning we typically convert qualitative values to numbers; this lets us use the full power of numpy, at some cost in human readability. I'm not doing that today, to keep things readable for us, but on Friday we'll return to a numeric representation of features.
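For the curious, here is one minimal way to turn the qualitative values into integer codes in plain Python. This is an illustrative sketch, not necessarily the encoding we will use on Friday; numpy and scikit-learn have their own helpers for this. The list `train` simply re-types the table above as (candy, outcome) pairs, and `encode` is a hypothetical helper.

```python
# The training data from the table above, as (candy, outcome) pairs.
train = [("N", "E"), ("N", "E"), ("N", "E"), ("R", "D"), ("R", "E"),
         ("C", "D"), ("N", "D"), ("C", "D"), ("R", "D"), ("R", "D")]


def encode(values):
    """Map each distinct value to an integer code (in sorted order) and encode the list."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


X_codes, x_map = encode([candy for candy, _ in train])
y_codes, y_map = encode([outcome for _, outcome in train])
print(x_map)    # prints {'C': 0, 'N': 1, 'R': 2}
print(y_map)    # prints {'D': 0, 'E': 1}
print(X_codes)  # prints [1, 1, 1, 2, 2, 0, 1, 0, 2, 2]
```

The integer codes are arbitrary labels. Only their identity matters, not their order or size.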

We want to fit a model that will tell us the probability that I ate it (or did not eat it) given the type of candy it is. We can do that by calculating:
* The priors for E and D
* The likelihood of the type of candy (N, R, C) given E and D

The dependent variable here will be the outcome, and the independent variable will be the type of M&M. The dependent variable is *qualitative*, so this will be a *classification* model.

## Fit

A table of counts might help. Fill this in using the data sample above:

| Outcome | Peanut M&M (N) | Chocolate M&M (C) | Raisin M&M (R) | Total |
| -------- | ------- | ---------- | ------------ | ------- |
| Did eat it (E)        |  |  |  |  |
| Did not eat it (D) |  |  |  |  |
| Total |  |  |  |  |
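Once you have filled in the table by hand, you can check it with a small plain-Python sketch that tallies each (outcome, candy) pair with `collections.Counter`. The list `train` just re-types the data table above.

```python
from collections import Counter

# Training data from the table above, as (candy, outcome) pairs.
train = [("N", "E"), ("N", "E"), ("N", "E"), ("R", "D"), ("R", "E"),
         ("C", "D"), ("N", "D"), ("C", "D"), ("R", "D"), ("R", "D")]

# How often each (outcome, candy) combination occurs; missing pairs count as 0.
counts = Counter((outcome, candy) for candy, outcome in train)
outcome_totals = Counter(outcome for _, outcome in train)  # row totals
candy_totals = Counter(candy for candy, _ in train)        # column totals

# One row per outcome, columns in the order N, C, R, then the row total.
for outcome in ["E", "D"]:
    row = [counts[(outcome, candy)] for candy in ["N", "C", "R"]]
    print(outcome, row, outcome_totals[outcome])
print("Total", [candy_totals[c] for c in ["N", "C", "R"]], len(train))
```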

* Calculate the prior for E and the prior for D: $P(E) = ??$; $P(D) = ??$
* Calculate the likelihood of each candy type given each outcome, that is $P(N|E)$, $P(C|E)$, $P(R|E)$, $P(N|D)$, $P(C|D)$ and $P(R|D)$ (rows are the outcome, columns the candy type):

| | N | C | R |
| -------- | ------- | ---------- | ------------ |
| E        |  |  |  |
| D |  |  |  |


Store both sets of values: the priors and the likelihoods together *are* the fitted model.
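After working the numbers by hand, you can check them against this sketch of the whole fit step. It assumes a single categorical feature. The function name `fit` and its dictionary return value are our own choices, not a library API. `Fraction` keeps the probabilities exact, so they line up with hand calculations.

```python
from collections import Counter
from fractions import Fraction

# Training data from the table above, as (candy, outcome) pairs.
train = [("N", "E"), ("N", "E"), ("N", "E"), ("R", "D"), ("R", "E"),
         ("C", "D"), ("N", "D"), ("C", "D"), ("R", "D"), ("R", "D")]


def fit(data):
    """Return (priors, likelihoods) for one categorical feature X and label Y.

    priors[y]         = P(Y=y)       = count(y) / number of rows
    likelihoods[y][x] = P(X=x | Y=y) = count(x and y) / count(y)
    """
    label_counts = Counter(y for _, y in data)
    pair_counts = Counter((y, x) for x, y in data)
    feature_values = sorted({x for x, _ in data})
    priors = {y: Fraction(n, len(data)) for y, n in label_counts.items()}
    likelihoods = {
        y: {x: Fraction(pair_counts[(y, x)], label_counts[y]) for x in feature_values}
        for y in label_counts
    }
    return priors, likelihoods


priors, likelihoods = fit(train)
print(priors)
print(likelihoods)
```

Each row of likelihoods sums to 1, because every candy drawn under a given outcome is one of N, C or R.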

## Predict 

Given a new observation, *peanut M&M*, what is my most likely behavior? Let's compare $P(E|N)$ and $P(D|N)$. 
1. Expand $P(E|N)$ using Bayes' rule: 
2. Expand $P(D|N)$ using Bayes' rule: 


Since both expressions have the same positive denominator, $P(N)$, dividing by it cannot change which one is larger, so we can drop it for the comparison. That leaves only the prior and the likelihood, both of which we calculated during __fit__:
* $P(E|N) \propto P(E)*P(N|E) = ??$
* $P(D|N) \propto P(D)*P(N|D) = ??$

Which is higher?
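To convince yourself that dropping $P(N)$ is safe, this sketch computes both the unnormalized scores $P(Y)*P(N|Y)$ and the full posteriors $P(Y|N)$. It gets $P(N)$ by summing the scores, then checks that the same outcome comes out on top. The priors and likelihoods are recomputed from the training table inside the block, so it runs on its own.

```python
from collections import Counter
from fractions import Fraction

# Training data from the table above, as (candy, outcome) pairs.
train = [("N", "E"), ("N", "E"), ("N", "E"), ("R", "D"), ("R", "E"),
         ("C", "D"), ("N", "D"), ("C", "D"), ("R", "D"), ("R", "D")]
label_counts = Counter(y for _, y in train)
pair_counts = Counter((y, x) for x, y in train)

x_new = "N"
# Unnormalized score P(Y) * P(X|Y) for each outcome Y.
scores = {
    y: Fraction(n, len(train)) * Fraction(pair_counts[(y, x_new)], n)
    for y, n in label_counts.items()
}
# Evidence P(X): the sum of the scores (here, the fraction of rows whose candy is x_new).
evidence = sum(scores.values())
posteriors = {y: s / evidence for y, s in scores.items()}

# Dividing every score by the same positive number cannot change which one is largest.
assert max(scores, key=scores.get) == max(posteriors, key=posteriors.get)
print(scores)
print(posteriors)
```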

The Naive Bayes "predict formula" is $\hat{Y} = \arg\max_Y P(Y)*P(X|Y)$.
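The predict formula translates almost directly into code. In this sketch, which again assumes one categorical feature, `likelihood` and `predict` are hypothetical helpers rather than library functions. The fitted values are recomputed from the training counts so the block stands alone. The `max(..., key=...)` call plays the role of $\arg\max_Y$.

```python
from collections import Counter
from fractions import Fraction

# Training data from the table above, as (candy, outcome) pairs.
train = [("N", "E"), ("N", "E"), ("N", "E"), ("R", "D"), ("R", "E"),
         ("C", "D"), ("N", "D"), ("C", "D"), ("R", "D"), ("R", "D")]
label_counts = Counter(y for _, y in train)
pair_counts = Counter((y, x) for x, y in train)
priors = {y: Fraction(n, len(train)) for y, n in label_counts.items()}


def likelihood(x, y):
    """P(X=x | Y=y), estimated from the training counts."""
    return Fraction(pair_counts[(y, x)], label_counts[y])


def predict(x):
    """Return the label Y that maximizes P(Y) * P(X|Y); the evidence P(X) is dropped."""
    return max(priors, key=lambda y: priors[y] * likelihood(x, y))


for candy in ["N", "C", "R"]:
    print(candy, predict(candy))
```

If two labels tie, `max` simply returns whichever comes first in `priors`. A real implementation would need a deliberate tie-breaking rule.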

## Score

Let's say I have the test data below:

| Candy | Outcome |
| ----- | ---- |
| peanut M&M (N) | did not eat it (D) |
| peanut M&M (N) | did eat it (E) |
| raisin M&M (R) | did not eat it (D) |
| raisin M&M (R) | did eat it (E) |
| chocolate M&M (C) | did not eat it (D) |

* Does this data include all my labels?
* Does this data include all possible values for my one independent variable?
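Both questions come down to set comparisons. In this minimal sketch, `train` and `test` re-type the two tables as (candy, outcome) pairs. "All possible values" is taken to mean the values that appear in the training data, which is an assumption.

```python
# Training and test data from the tables above, as (candy, outcome) pairs.
train = [("N", "E"), ("N", "E"), ("N", "E"), ("R", "D"), ("R", "E"),
         ("C", "D"), ("N", "D"), ("C", "D"), ("R", "D"), ("R", "D")]
test = [("N", "D"), ("N", "E"), ("R", "D"), ("R", "E"), ("C", "D")]

train_labels = {y for _, y in train}
test_labels = {y for _, y in test}
train_values = {x for x, _ in train}
test_values = {x for x, _ in test}

# Labels or candy types seen during training but absent from the test set.
print("labels missing from test:", train_labels - test_labels)
print("candy types missing from test:", train_values - test_values)
```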

How well does my Naive Bayes model perform on this test data? 

| Y | X | P(E)\*P(X\|E) | P(D)\*P(X\|D) | $\hat{Y}$ |
| --- | --- | --- | --- | --- |
| D | N |  |   |  |
| E | N |  |   |   |
| D | R |  |  |  |
| E | R |   |  |  |
| D | C |   |   |  |
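To check the table, this sketch computes both unnormalized scores for each test row and takes the larger one as $\hat{Y}$. It recomputes the fit from the training table so it runs on its own. The helper name `score` is illustrative, and sending ties to D is an arbitrary choice; there are no ties in this data.

```python
from collections import Counter
from fractions import Fraction

# Training and test data from the tables above, as (candy, outcome) pairs.
train = [("N", "E"), ("N", "E"), ("N", "E"), ("R", "D"), ("R", "E"),
         ("C", "D"), ("N", "D"), ("C", "D"), ("R", "D"), ("R", "D")]
test = [("N", "D"), ("N", "E"), ("R", "D"), ("R", "E"), ("C", "D")]

label_counts = Counter(y for _, y in train)
pair_counts = Counter((y, x) for x, y in train)


def score(x, y):
    """Unnormalized score P(Y=y) * P(X=x | Y=y) from the training counts."""
    prior = Fraction(label_counts[y], len(train))
    return prior * Fraction(pair_counts[(y, x)], label_counts[y])


rows = []
for x, y in test:
    s_e, s_d = score(x, "E"), score(x, "D")
    y_hat = "E" if s_e > s_d else "D"  # ties would go to D (arbitrary choice)
    rows.append((y, x, s_e, s_d, y_hat))
    print(y, x, s_e, s_d, y_hat)
```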

Based on this test data:
* What is the accuracy? 
* What is the confusion matrix?

|  | Predicted E | Predicted D |
| -- | --- | ---- |
| True E |  |  |
| True D |  |  |
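Accuracy and the confusion matrix follow from the true and predicted labels. In this sketch, rows of the confusion matrix are the true label and columns the predicted label, the same convention scikit-learn's `confusion_matrix` uses. The predictions are recomputed with the predict formula from the training counts so the block runs on its own. The `predict` helper is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Training and test data from the tables above, as (candy, outcome) pairs.
train = [("N", "E"), ("N", "E"), ("N", "E"), ("R", "D"), ("R", "E"),
         ("C", "D"), ("N", "D"), ("C", "D"), ("R", "D"), ("R", "D")]
test = [("N", "D"), ("N", "E"), ("R", "D"), ("R", "E"), ("C", "D")]
labels = ["E", "D"]

label_counts = Counter(y for _, y in train)
pair_counts = Counter((y, x) for x, y in train)


def predict(x):
    """argmax over Y of P(Y) * P(X|Y), estimated from the training counts."""
    return max(labels, key=lambda y: Fraction(label_counts[y], len(train))
               * Fraction(pair_counts[(y, x)], label_counts[y]))


y_true = [y for _, y in test]
y_pred = [predict(x) for x, _ in test]

# Accuracy: fraction of test rows where the prediction matches the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Confusion matrix: rows = true label, columns = predicted label.
confusion = {t: {p: 0 for p in labels} for t in labels}
for t, p in zip(y_true, y_pred):
    confusion[t][p] += 1

print("accuracy:", accuracy)
for t in labels:
    print(t, [confusion[t][p] for p in labels])
```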