# Lecture 2: Naive Bayes
***

<img src="figs/bayes.jpg" width=1201 height=50> 

<!---
![my_image](files/figs/bayes.jpg)
-->


<a id='prob1'></a>

<a id='prob2'></a>

### Problem 1: Naive Bayes on Symbols
***

> This problem was adapted from [Naive Bayes and Text Classification I: Introduction and Theory](https://arxiv.org/abs/1410.5329) by Sebastian Raschka

Consider the following training set of 12 symbols which have been labeled as either + or -: 

<br>

<img src="figs/shapes.png" width=500>


Answer the following questions: 


**A**: What are the general features associated with each training example? <br>
>The features are shape and color. The labels are + and -.

In the next part, we'll use Naive Bayes to classify the following test example: 

<img src="figs/bluesquare.png" width=200>

OK, so this symbol actually appears in the training set, but let's pretend that it doesn't.  

The decision rule can be defined as 

>Classify ${\bf x}$ as + if <br>
>$p(+ ~|~ {\bf x} = [blue,~ square]) \geq p(- ~|~ {\bf x} = [blue, ~square])$ <br>
>else classify sample as -

**B**: What are the Maximum Likelihood Estimates of the priors $p(+)$ and $p(-)$?<br>
>$p(+) = 7/12 \\ p(-) = 5/12$


**C**: Identify and compute estimates of the class-conditional probabilities required to predict the class of ${\bf x} = [blue,~square]$?
>$p(blue|+) ~= ~3/7 \\ ~p(square|+) ~= ~5/7 \\ ~p(blue|-) ~= ~3/5 \\~p(square|-) ~= ~3/5$<br>

**D**: Using the estimates computed above, compute the **posterior** scores for each label, and find the Naive Bayes prediction of the label for ${\bf x} = [blue,~square]$. <br>

>$p(+ ~|~ b,s) ~\propto~ 7/12 \cdot 3/7 \cdot 5/7 ~\approx~ 0.18$<br>
>$p(- ~|~ b,s) ~\propto~ 5/12 \cdot 3/5 \cdot 3/5 ~=~ 0.15$<br>

The Naive Bayes prediction is +.
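The arithmetic in parts B through D can be checked with a short script. This is only a sketch: the counts are read off the worked answers above rather than from `figs/shapes.png` itself, and the names `class_counts`, `color_counts`, `shape_counts`, and `nb_score` are illustrative choices, not part of the original notebook.

```python
from fractions import Fraction

# Counts taken from the worked answers (assumed to match the training figure).
class_counts = {"+": 7, "-": 5}
color_counts = {"+": {"blue": 3}, "-": {"blue": 3}}
shape_counts = {"+": {"square": 5}, "-": {"square": 3}}


def nb_score(label, color, shape):
    """Unnormalized posterior: prior * p(color | label) * p(shape | label)."""
    n = class_counts[label]
    prior = Fraction(n, sum(class_counts.values()))
    p_color = Fraction(color_counts[label].get(color, 0), n)
    p_shape = Fraction(shape_counts[label].get(shape, 0), n)
    return prior * p_color * p_shape


scores = {label: nb_score(label, "blue", "square") for label in class_counts}
# max() returns the first maximal key, so a tie would go to "+",
# which agrees with the ">=" in the decision rule.
prediction = max(scores, key=scores.get)

for label, score in scores.items():
    print(label, score, round(float(score), 4))
print("prediction:", prediction)
```

Using `Fraction` keeps the scores exact, so they can be compared against the hand-computed products before rounding.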

**E**: If you haven't already, compute the class-conditional probability scores $\hat{p}({\bf x} = [blue,~square] ~|~ +)$ and $\hat{p}({\bf x} = [blue,~square] ~|~ -)$ under the Naive Bayes assumption.  How can you reconcile these values with the final prediction that would be made? 
<br><br>
Note that if we considered only the class-conditional probabilities, we would predict -. Multiplying by the priors flips the decision to +, which is how prior knowledge affects the prediction.
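Written out with the estimates from part C, the comparison looks as follows. Under the Naive Bayes assumption the class-conditional scores are

$$
\hat{p}({\bf x} ~|~ +) = \frac{3}{7} \cdot \frac{5}{7} = \frac{15}{49} \approx 0.306, \qquad
\hat{p}({\bf x} ~|~ -) = \frac{3}{5} \cdot \frac{3}{5} = \frac{9}{25} = 0.36
$$

so the likelihood alone favors -. Weighting by the priors gives

$$
\frac{7}{12} \cdot \frac{15}{49} = \frac{5}{28} \approx 0.179, \qquad
\frac{5}{12} \cdot \frac{9}{25} = \frac{3}{20} = 0.15
$$

and the larger prior on + is enough to reverse the ordering.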

<a id='prob3'></a>

### Problem 2: Laplace Smoothing 
***

Consider the same training set from Problem 1, but suppose we see the following test example: 
    
<img src="figs/greencircle.png" width=200>

Before you get too far into trying to predict the label of the green circle, look carefully at the training set.  Notice that there are no green shapes labeled - in the training set, so when we try to compute the class-conditional probability $p(green ~|~ -)$ we'll get a zero probability.  To fix this, you'll implement Laplace smoothing. Notice that this is a little different than the SPAM vs HAM example shown in the video.  We actually have two very different features in shapes and colors. We'll apply Laplace Smoothing to the shape and color class-conditional probabilities separately. 

**A**: What would the general formula for the estimate of $p(shape ~|~ class)$ with Laplace Smoothing look like for the given training set?  What is the *vocabulary* in the shape case? <br>
>$\hat{p}(shape = s ~|~ class) = \dfrac{(\#~figures~in~class~with~shape~s) + 1}{(\#~figures~in~class) + |V_{shape}|}$, where $|V_{shape}|$ is the number of distinct shapes in the vocabulary (not just those seen in the class).

Vocabulary of shape: SQUARE, CIRCLE

**B**: What would the general formula for the estimate of $p(color ~|~ class)$ with Laplace Smoothing look like for the given training set?  What is the *vocabulary* in the color case?

>$\hat{p}(color = c ~|~ class) = \dfrac{(\#~figures~in~class~with~color~c) + 1}{(\#~figures~in~class) + |V_{color}|}$, where $|V_{color}|$ is the number of distinct colors in the vocabulary (not just those seen in the class).

Vocabulary of color: RED, GREEN, BLUE

**C**: Predict the label for the green circle using the Laplace-smoothed class-conditional probability formulas.  Don't forget to apply Laplace Smoothing to the priors as well! 

>$p(green|+) ~= ~(2+1)/(7+3) ~= 3/10\\
~p(circle|+) ~= ~(2+1)/(7+2) ~= 1/3\\
~p(green|-) ~= ~(0+1)/(5+3) ~= 1/8\\
~p(circle|-) ~= ~(2+1)/(5+2) ~= 3/7\\
~p(+) ~= ~(7+1)/(12+2) ~= 4/7\\
~p(-) ~= ~(5+1)/(12+2) ~= 3/7$<br>

Predicting label for green circle:<br>
>$p(+|green,circle) \propto p(+) \cdot p(green|+) \cdot p(circle|+) = 4/7 \cdot 3/10 \cdot 1/3 = 2/35 \approx 0.05714$<br>
>$p(-|green,circle) \propto p(-) \cdot p(green|-) \cdot p(circle|-) = 3/7 \cdot 1/8 \cdot 3/7 = 9/392 \approx 0.02296$
<br>

Predicted label is +
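The add-one estimates for the green circle can be reproduced with a small script. This is a sketch under assumptions: the counts come from the worked answers, the vocabularies are the ones listed in parts A and B, and the helper names (`smoothed`, `laplace_score`) are hypothetical.

```python
from fractions import Fraction

# Counts taken from the worked answers (assumed to match the training figure).
class_counts = {"+": 7, "-": 5}
color_counts = {"+": {"green": 2}, "-": {"green": 0}}
shape_counts = {"+": {"circle": 2}, "-": {"circle": 2}}

# Vocabularies as listed in parts A and B.
COLOR_VOCAB = ("red", "green", "blue")
SHAPE_VOCAB = ("square", "circle")


def smoothed(count, total, vocab_size):
    """Add-one (Laplace) estimate: (count + 1) / (total + |V|)."""
    return Fraction(count + 1, total + vocab_size)


def laplace_score(label, color, shape):
    """Unnormalized posterior with smoothed prior and class-conditionals."""
    n = class_counts[label]
    prior = smoothed(n, sum(class_counts.values()), len(class_counts))
    p_color = smoothed(color_counts[label].get(color, 0), n, len(COLOR_VOCAB))
    p_shape = smoothed(shape_counts[label].get(shape, 0), n, len(SHAPE_VOCAB))
    return prior * p_color * p_shape


scores = {label: laplace_score(label, "green", "circle") for label in class_counts}
prediction = max(scores, key=scores.get)

for label, score in scores.items():
    print(label, score, round(float(score), 5))
print("prediction:", prediction)
```

Note that the denominator uses the size of the whole vocabulary, which is why $p(green ~|~ -)$ comes out as $1/8$ even though no green symbol is labeled -.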

<br>

### Problem 3: Unknown Features
***

Once again consider the training set from Problem 1, but suppose we see the following test example: 
    
<img src="figs/yellowsquare.png" width=200>

OK, this is a weird one.  Up until this point, we've never seen the color *yellow*, and thus don't include it in the color vocabulary.  One way that we could handle this is to add it to the color vocabulary, and then recompute the class-conditional probabilities with *yellow* included in the vocabulary. 

But what happens when on the next test example we see a *pink* circle (or worse, a triangle)? We'd rather not continue to modify our probability estimates whenever we see a shape or color that we haven't seen before.  One solution to this is to just assume we'll see weird things in the future and combine all of the possibilities into an UNK feature. If we do this, then our class-conditional probabilities become 

$$
p(feature ~|~ class) = \frac{\#~instances~of~feature~in~class + 1}{\#~total~symbols~in~class + |V| + 1}
$$

where the vocabulary $V$ is the same vocabulary defined by the training set, and the extra $+1$ in the denominator accounts for the UNK feature. 

**A**: Predict the label of the yellow square.  

>$p(yellow|+) ~= ~(0+1)/(7+3+1) ~= 1/11\\
~p(square|+) ~= ~(5+1)/(7+2+1) ~= 3/5\\
~p(yellow|-) ~= ~(0+1)/(5+3+1) ~= 1/9\\
~p(square|-) ~= ~(3+1)/(5+2+1) ~= 1/2\\
~p(+) ~= ~(7+1)/(12+2) ~= 4/7\\
~p(-) ~= ~(5+1)/(12+2) ~= 3/7$<br>

Predicting label for yellow square:<br>
>$p(+|yellow,square) \propto p(+) \cdot p(yellow|+) \cdot p(square|+) = 4/7 \cdot 1/11 \cdot 3/5 = 12/385 \approx 0.03117$<br>
>$p(-|yellow,square) \propto p(-) \cdot p(yellow|-) \cdot p(square|-) = 3/7 \cdot 1/9 \cdot 1/2 = 1/42 \approx 0.0238$
<br>

Predicted label is +
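The UNK variant only changes the denominator of the class-conditional estimates, which a short script makes explicit. As before this is a sketch: the counts are taken from the worked answers, the priors use plain add-one smoothing as in the answer above, and `unk_smoothed` and `unk_score` are hypothetical helper names.

```python
from fractions import Fraction

# Counts taken from the worked answers (assumed to match the training figure).
class_counts = {"+": 7, "-": 5}
color_counts = {"+": {"blue": 3, "green": 2}, "-": {"blue": 3, "green": 0}}
shape_counts = {"+": {"square": 5}, "-": {"square": 3}}

# Vocabularies defined by the training set; UNK adds one extra slot to each.
COLOR_VOCAB = ("red", "green", "blue")
SHAPE_VOCAB = ("square", "circle")


def unk_smoothed(count, total, vocab_size):
    """(count + 1) / (total + |V| + 1), reserving one slot for UNK."""
    return Fraction(count + 1, total + vocab_size + 1)


def unk_score(label, color, shape):
    """Unnormalized posterior; unseen feature values fall back to count 0."""
    n = class_counts[label]
    # Priors use ordinary add-one smoothing over the two labels.
    prior = Fraction(n + 1, sum(class_counts.values()) + len(class_counts))
    p_color = unk_smoothed(color_counts[label].get(color, 0), n, len(COLOR_VOCAB))
    p_shape = unk_smoothed(shape_counts[label].get(shape, 0), n, len(SHAPE_VOCAB))
    return prior * p_color * p_shape


scores = {label: unk_score(label, "yellow", "square") for label in class_counts}
prediction = max(scores, key=scores.get)

for label, score in scores.items():
    print(label, score, round(float(score), 5))
print("prediction:", prediction)
```

Because any unseen value gets a count of 0, the same function handles a pink circle or a triangle without changing the vocabulary.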

<a id='prob1ans'></a>

<br><br><br><br>
<br><br><br><br>
<br><br><br><br>
<br><br><br><br>
### Helper Functions 
***

In [1]:
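from IPython.display import HTML, Image

# Set the left border width to zero on rendered MathJax spans.
# The HTML object must be the last expression in the cell to be displayed.
HTML("""
<style>
.MathJax nobr>span.math>span{border-left-width:0 !important};
</style>
""")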