# Naive Bayes



In [24]:
import pandas as pd
import numpy as np

eat = pd.read_csv("../data/does_james_eat.csv")
eat

Unnamed: 0,is_hungry,has_poptarts,is_driving,will_eat
0,yes,yes,no,yes
1,yes,no,no,yes
2,yes,no,yes,no
3,no,yes,yes,no
4,no,no,yes,no
5,yes,no,no,yes
6,yes,yes,no,yes
7,no,no,yes,no
8,yes,no,no,yes
9,yes,yes,no,yes
10,no,yes,no,yes


## Bayes' Theorem

$$ P(c|x) = \frac{P(x|c)\times{P(c)}}{P(x)}$$

$ P(c|x) $: posterior probability; probability of a class (c) given a predictor (x)
- example: probability that I will eat given that I am hungry

$ P(x|c) $: likelihood; probability of the predictor (x) given the class (c)
- example: probability that I am hungry given that I will eat

$ P(c) $: prior probability of the class

$ P(x) $: prior probability of the predictor
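
As a reminder of where the denominator comes from, $ P(x) $ can be expanded with the law of total probability. This is a standard identity rather than something specific to this notebook; the summation index $ c' $ is notation introduced here and simply ranges over every class:

$$ P(x) = \sum_{c'} P(x|c')\times{P(c')} $$

With two classes (eat, won't eat) the sum has just two terms, one per class.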



## Let's try this out
What is the probability that I will eat given that I am hungry?

$$ P(eat|hungry) = \frac{P(hungry|eat)\times{P(eat)}}{P(hungry)} $$

In [25]:
prob_will_eat = 7 / 11
prob_hungry_given_will_eat = 6/7
prob_hungry = 7/11  # P(hungry | eat) * P(eat) + P(hungry | won't eat) * P(won't eat) = 6/11 + 1/11

prob_will_eat_given_hungry = prob_will_eat * prob_hungry_given_will_eat/prob_hungry
print(prob_will_eat_given_hungry)

0.8571428571428571
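
The fractions above were typed in by hand. As a rough cross-check, the sketch below derives them by counting rows instead. It assumes `does_james_eat.csv` holds exactly the 11 observations transcribed here as tuples, and it uses plain Python with `fractions.Fraction` (not pandas) so the arithmetic stays exact; the variable names are illustrative only.

```python
from fractions import Fraction

# Assumed contents of the data set: (is_hungry, has_poptarts, is_driving, will_eat)
rows = [
    ("yes", "yes", "no", "yes"),
    ("yes", "no", "no", "yes"),
    ("yes", "no", "yes", "no"),
    ("no", "yes", "yes", "no"),
    ("no", "no", "yes", "no"),
    ("yes", "no", "no", "yes"),
    ("yes", "yes", "no", "yes"),
    ("no", "no", "yes", "no"),
    ("yes", "no", "no", "yes"),
    ("yes", "yes", "no", "yes"),
    ("no", "yes", "no", "yes"),
]

n = len(rows)
n_eat = sum(1 for r in rows if r[3] == "yes")
n_hungry = sum(1 for r in rows if r[0] == "yes")
n_hungry_and_eat = sum(1 for r in rows if r[0] == "yes" and r[3] == "yes")

p_eat = Fraction(n_eat, n)                              # prior P(eat)
p_hungry = Fraction(n_hungry, n)                        # evidence P(hungry)
p_hungry_given_eat = Fraction(n_hungry_and_eat, n_eat)  # likelihood P(hungry | eat)

# Bayes' theorem
p_eat_given_hungry = p_hungry_given_eat * p_eat / p_hungry

# Direct count of the same quantity: hungry rows that also eat / hungry rows
p_direct = Fraction(n_hungry_and_eat, n_hungry)

print(p_eat_given_hungry, p_direct)  # prints "6/7 6/7"
```

Both routes give 6/7, which is the 0.857 printed above.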


What is the probability that I won't eat given that I am hungry?

$$ P(wont eat|hungry) = \frac{P(hungry|wont eat)\times{P(wont eat)}}{P(hungry)} $$

In [26]:
prob_wont_eat = 4 / 11
prob_hungry_given_wont_eat = 1/4
prob_hungry = 7/11  # P(hungry | eat) * P(eat) + P(hungry | won't eat) * P(won't eat) = 6/11 + 1/11

prob_wont_eat_given_hungry = prob_wont_eat * prob_hungry_given_wont_eat/prob_hungry
print(prob_wont_eat_given_hungry)

0.14285714285714288


Note: the prior probability of the predictor, $ P(x) $, is sometimes difficult to calculate, so it is often dropped from the calculation. What remains is not a probability but a score that is proportional to the posterior probability $ P(c|x) $.

In other words,

$$  P(x|c)\times{P(c)} \propto P(c|x) $$

So whereas we previously picked the class with the highest posterior probability given that I am hungry, we now pick the class with the highest score. Because $ P(x) $ is the same for every class, dropping it does not change which class wins.

In [27]:
prob_will_eat = 7 / 11
prob_hungry_given_will_eat = 6/7

score_will_eat_given_hungry = prob_will_eat * prob_hungry_given_will_eat
print("score proportional to P(eat | hungry):", score_will_eat_given_hungry)

prob_wont_eat = 4 / 11
prob_hungry_given_wont_eat = 1/4

score_wont_eat_given_hungry = prob_wont_eat * prob_hungry_given_wont_eat
print("score proportional to P(wont eat | hungry):", score_wont_eat_given_hungry)

score proportional to P(eat | hungry): 0.5454545454545454
score proportional to P(wont eat | hungry): 0.09090909090909091
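
If actual probabilities are wanted later, the scores can be turned back into posteriors by dividing each one by the sum of all the scores. The sketch below illustrates this with the same fractions as the cell above; `evidence` is just a name chosen here for that sum.

```python
from fractions import Fraction

# Unnormalised scores: likelihood * prior for each class
score_will_eat = Fraction(6, 7) * Fraction(7, 11)  # P(hungry | eat) * P(eat)
score_wont_eat = Fraction(1, 4) * Fraction(4, 11)  # P(hungry | won't eat) * P(won't eat)

# The scores sum to P(hungry), the term that was dropped
evidence = score_will_eat + score_wont_eat

posterior_will_eat = score_will_eat / evidence
posterior_wont_eat = score_wont_eat / evidence

print(evidence)                                # prints "7/11"
print(posterior_will_eat, posterior_wont_eat)  # prints "6/7 1/7"
```

The normalised values match the posteriors computed earlier with the full formula, and the ordering of the classes is unchanged.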


In [28]:
buffer = eat['will_eat'] == 'yes'
print("will eat")
print(eat[buffer])

print("won't eat")
print(eat[~buffer])

will eat
   is_hungry has_poptarts is_driving will_eat
0        yes          yes         no      yes
1        yes           no         no      yes
5        yes           no         no      yes
6        yes          yes         no      yes
8        yes           no         no      yes
9        yes          yes         no      yes
10        no          yes         no      yes
won't eat
  is_hungry has_poptarts is_driving will_eat
2       yes           no        yes       no
3        no          yes        yes       no
4        no           no        yes       no
7        no           no        yes       no


What if we want a prediction (eat or won't eat) given multiple inputs, such as being hungry and having poptarts?

Because naive Bayes assumes the inputs are independent given the class, we simply multiply the individual likelihoods:

$$ p(eat | X) \propto p(hungry | eat) \times p(poptarts | eat) \times p(eat)$$

and

$$ p(wont eat | X) \propto p(hungry | wont eat) \times p(poptarts | wont eat) \times p(wont eat)$$

- $ p(hungry | eat) = \frac{6}{7}$
- $ p(poptarts | eat) = \frac{4}{7}$
- $ p(driving | eat) = \frac{0}{7}$

- $ p(hungry | wont eat) = \frac{1}{4}$
- $ p(poptarts | wont eat) = \frac{1}{4} $
- $ p(driving | wont eat) = \frac{4}{4} $ 

- $ p(eat) =  \frac{7}{11} $
- $ p(wont eat) = \frac{4}{11} $


Let's plug those values in to see what we should predict for a new observation with hungry = 'yes' and poptarts = 'yes'.




In [29]:
prob_will_eat = 7 / 11
prob_wont_eat = 4 / 11
prob_hungry_given_will_eat = 6/7
prob_poptart_given_will_eat = 4/7
prob_hungry_given_wont_eat = 1/4
prob_poptart_given_wont_eat = 1/4

score_will_eat = prob_will_eat * prob_hungry_given_will_eat * prob_poptart_given_will_eat
score_wont_eat = prob_wont_eat * prob_hungry_given_wont_eat * prob_poptart_given_wont_eat

print('score will eat:', score_will_eat)
print("score won't eat:", score_wont_eat)



score will eat: 0.3116883116883116
score won't eat: 0.022727272727272728
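
The same calculation can be wrapped in a small function so that any combination of inputs can be scored. This is a minimal sketch, not a library API: `naive_bayes_scores`, `FEATURES`, and the tuple layout are names made up for illustration, the rows are assumed to be the 11 observations in the data set, and no smoothing is applied.

```python
from fractions import Fraction

FEATURES = ("is_hungry", "has_poptarts", "is_driving")

# Assumed contents of the data set: (is_hungry, has_poptarts, is_driving, will_eat)
rows = [
    ("yes", "yes", "no", "yes"),
    ("yes", "no", "no", "yes"),
    ("yes", "no", "yes", "no"),
    ("no", "yes", "yes", "no"),
    ("no", "no", "yes", "no"),
    ("yes", "no", "no", "yes"),
    ("yes", "yes", "no", "yes"),
    ("no", "no", "yes", "no"),
    ("yes", "no", "no", "yes"),
    ("yes", "yes", "no", "yes"),
    ("no", "yes", "no", "yes"),
]


def naive_bayes_scores(rows, observation):
    """Return {class: P(class) * product of P(feature = value | class)}."""
    scores = {}
    for label in sorted({r[-1] for r in rows}):
        in_class = [r for r in rows if r[-1] == label]
        score = Fraction(len(in_class), len(rows))  # prior P(class)
        for name, value in observation.items():
            col = FEATURES.index(name)
            matches = sum(1 for r in in_class if r[col] == value)
            score *= Fraction(matches, len(in_class))  # P(feature = value | class)
        scores[label] = score
    return scores


scores = naive_bayes_scores(rows, {"is_hungry": "yes", "has_poptarts": "yes"})
prediction = max(scores, key=scores.get)  # class with the highest score

print(scores["yes"], scores["no"], prediction)  # prints "24/77 1/44 yes"
```

24/77 is about 0.3117 and 1/44 is about 0.0227, matching the hand calculation above.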


Okay, what if we use all of the inputs, for an observation with hungry = 'yes', poptarts = 'yes', and driving = 'yes'?



In [30]:
prob_will_eat = 7 / 11
prob_wont_eat = 4 / 11
prob_hungry_given_will_eat = 6/7
prob_poptart_given_will_eat = 4/7
prob_driving_given_will_eat = 0/7
prob_hungry_given_wont_eat = 1/4
prob_poptart_given_wont_eat = 1/4
prob_driving_given_wont_eat = 4/4

score_will_eat = prob_will_eat * prob_hungry_given_will_eat * prob_poptart_given_will_eat * prob_driving_given_will_eat
score_wont_eat = prob_wont_eat * prob_hungry_given_wont_eat * prob_poptart_given_wont_eat * prob_driving_given_wont_eat

print('score will eat:', score_will_eat)
print("score won't eat:", score_wont_eat)

score will eat: 0.0
score won't eat: 0.022727272727272728


**Why? Why did we get 0 for our score for `will eat`?**

It comes from $ P(driving | eat) $. This value is $ 0 $, and $ 0 $ multiplied by any other value is $ 0 $, so a single combination that never appears in the data wipes out all of the other evidence. To prevent this, we add a smoothing value $ \alpha $ to the counts (additive, or Laplace, smoothing) so that no estimate is exactly $ 0 $. We can set $ \alpha $ to $ 1 $.
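
For reference, the usual additive smoothing estimate can be written as below. This is the standard textbook form, offered as a sketch; $ k $ (the number of distinct values a feature can take, 2 for the yes/no columns here) and $ \text{count}(\cdot) $ are notation introduced for this note.

$$ P(x = v | c) = \frac{\text{count}(x = v, c) + \alpha}{\text{count}(c) + \alpha \times k} $$

With $ \alpha = 1 $, for example, $ P(driving | eat) $ becomes $ \frac{0 + 1}{7 + 2} = \frac{1}{9} $ instead of $ 0 $.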

In [31]:
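alpha = 1
n_classes = 2  # will eat / won't eat
n_values = 2   # every feature is yes / no

# smoothed estimate: (count + alpha) / (total + alpha * number of possible values)
prob_will_eat = (7 + alpha) / (11 + alpha * n_classes)  # 8/13
prob_wont_eat = (4 + alpha) / (11 + alpha * n_classes)  # 5/13
prob_hungry_given_will_eat = (6 + alpha) / (7 + alpha * n_values)   # 7/9
prob_poptart_given_will_eat = (4 + alpha) / (7 + alpha * n_values)  # 5/9
prob_driving_given_will_eat = (0 + alpha) / (7 + alpha * n_values)  # 1/9
prob_hungry_given_wont_eat = (1 + alpha) / (4 + alpha * n_values)   # 2/6
prob_poptart_given_wont_eat = (1 + alpha) / (4 + alpha * n_values)  # 2/6
prob_driving_given_wont_eat = (4 + alpha) / (4 + alpha * n_values)  # 5/6

score_will_eat = prob_will_eat * prob_hungry_given_will_eat * prob_poptart_given_will_eat * prob_driving_given_will_eat
score_wont_eat = prob_wont_eat * prob_hungry_given_wont_eat * prob_poptart_given_wont_eat * prob_driving_given_wont_eat

print('score will eat:', round(score_will_eat, 4))
print("score won't eat:", round(score_wont_eat, 4))

score will eat: 0.0295
score won't eat: 0.0356

With smoothing, `will eat` no longer scores exactly 0, but `won't eat` still wins: in this data, eating has never been observed while driving.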
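
To avoid adjusting every fraction by hand, the smoothing can be folded into a scoring function. The sketch below assumes the standard additive formula, (count + alpha) / (total + alpha * number of possible values), applied to both the class priors and the likelihoods, so its numbers depend on that choice. `smoothed_scores` and `FEATURES` are hypothetical names, and the rows are assumed to be the 11 observations in the data set.

```python
from fractions import Fraction

FEATURES = ("is_hungry", "has_poptarts", "is_driving")

# Assumed contents of the data set: (is_hungry, has_poptarts, is_driving, will_eat)
rows = [
    ("yes", "yes", "no", "yes"),
    ("yes", "no", "no", "yes"),
    ("yes", "no", "yes", "no"),
    ("no", "yes", "yes", "no"),
    ("no", "no", "yes", "no"),
    ("yes", "no", "no", "yes"),
    ("yes", "yes", "no", "yes"),
    ("no", "no", "yes", "no"),
    ("yes", "no", "no", "yes"),
    ("yes", "yes", "no", "yes"),
    ("no", "yes", "no", "yes"),
]


def smoothed_scores(rows, observation, alpha=1):
    """Naive Bayes scores with additive smoothing (alpha must be an int or Fraction)."""
    labels = sorted({r[-1] for r in rows})
    scores = {}
    for label in labels:
        in_class = [r for r in rows if r[-1] == label]
        # Smoothed prior: (class count + alpha) / (rows + alpha * number of classes)
        score = Fraction(len(in_class) + alpha, len(rows) + alpha * len(labels))
        for name, value in observation.items():
            col = FEATURES.index(name)
            n_values = len({r[col] for r in rows})  # distinct values of this feature
            matches = sum(1 for r in in_class if r[col] == value)
            # Smoothed likelihood: never exactly 0 when alpha > 0
            score *= Fraction(matches + alpha, len(in_class) + alpha * n_values)
        scores[label] = score
    return scores


observation = {"is_hungry": "yes", "has_poptarts": "yes", "is_driving": "yes"}
scores = smoothed_scores(rows, observation)
prediction = max(scores, key=scores.get)

print(scores["yes"], scores["no"], prediction)  # prints "280/9477 25/702 no"
```

Under these assumptions the scores are roughly 0.0295 for eating and 0.0356 for not eating. `Fraction` keeps the arithmetic exact but rejects floats, so a non-integer `alpha` would need a float-based version.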


## Naive Bayes

- For years, the best spam filtering methods used naive Bayes.
- It is our first probabilistic classifier, where we **think of learning as a problem of statistical inference**.

- It is a classification technique based on Bayes’ Theorem **with an assumption of independence among predictors**, hence the "naive".
    - The presence of a particular feature in a class is assumed to be unrelated to the presence of any other feature.
    - This is like saying: if you are hungry, the symptoms (growling stomach, mouth watering, weakness, ...) each appear independently of one another.

E.g. you receive a spam email that contains the words "Money", "URGENT!", and "Prize!". Even if these features depend on each other or on other features, naive Bayes treats each of them as contributing independently to the probability that the email is spam.
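
The independence assumption can be checked against the eating data from earlier. The sketch below compares the joint probability of being hungry and having poptarts (among the rows where eating happens) with the product of the two separate probabilities that naive Bayes uses in its place. The rows are assumed to be the 11 observations in the data set, and the variable names are illustrative.

```python
from fractions import Fraction

# Assumed contents of the data set: (is_hungry, has_poptarts, is_driving, will_eat)
rows = [
    ("yes", "yes", "no", "yes"),
    ("yes", "no", "no", "yes"),
    ("yes", "no", "yes", "no"),
    ("no", "yes", "yes", "no"),
    ("no", "no", "yes", "no"),
    ("yes", "no", "no", "yes"),
    ("yes", "yes", "no", "yes"),
    ("no", "no", "yes", "no"),
    ("yes", "no", "no", "yes"),
    ("yes", "yes", "no", "yes"),
    ("no", "yes", "no", "yes"),
]

eat_rows = [r for r in rows if r[3] == "yes"]
n_eat = len(eat_rows)

p_hungry = Fraction(sum(1 for r in eat_rows if r[0] == "yes"), n_eat)
p_poptarts = Fraction(sum(1 for r in eat_rows if r[1] == "yes"), n_eat)

# Joint probability counted directly from the data
p_joint = Fraction(
    sum(1 for r in eat_rows if r[0] == "yes" and r[1] == "yes"), n_eat
)

# What naive Bayes uses instead: the product of the individual probabilities
p_naive = p_hungry * p_poptarts

print(p_joint, p_naive)  # prints "3/7 24/49"
```

The two values differ (about 0.43 versus 0.49), so the features are not truly independent here. Naive Bayes accepts that approximation in exchange for only having to estimate one simple probability per feature and class.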

- Naive Bayes is easy to build and useful for very large data sets. 

- Naive Bayes can perform surprisingly well, sometimes matching more sophisticated classification methods, and it works well with text data.