# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) [Naive Bayes Classifier]

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*

- Explain Bayes' theorem
- Explain the components of the Bayesian "world view": posterior, prior, likelihood
- Describe Naive Bayes
- Understand how Naive Bayes is used in Machine Learning

### Introduction (5 mins)

**BAYESIAN PROBABILITY**

Probability is a representation of our uncertainty given what we know and believe to be true. Given a number of observed positive occurrences over a number of events *and our prior belief about the true probability of positive occurrences,* what is the *distribution of the true probability*?

### $$P\left(\;true\;|\;observed\;\right) = \frac{P\left(\;observed\;|\;true\;\right)}{P(\;observed\;)} P\left(\;true\;\right)$$

where:

$P\left(\;true\;|\;observed\;\right)$ is the **posterior probability**, a **conditional probability**. This is the probability of an occurrence given what we observed.

$P\left(\;observed\;|\;true\;\right)$ is the **likelihood**, which is the probability of what we observed given an assumed true probability of occurrence.

${P(\;observed\;)}$ is the **marginal probability** of the observed data. 

$P\left(\;true\;\right)$ is the **prior probability**, or prior belief. It is what you thought the probability was before observing the events.
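
As a rough illustration of the formula above, the sketch below updates a belief about the true probability on a coarse grid of candidate values. The grid, the uniform prior, and the observed counts (7 positives in 10 events) are all made-up assumptions for illustration, not values from the lesson.

```python
from math import comb

# Hypothetical candidate values for the true probability of a positive occurrence.
candidates = [i / 10 for i in range(11)]          # 0.0, 0.1, ..., 1.0
prior = [1 / len(candidates)] * len(candidates)   # assumed uniform prior belief

# Hypothetical observation: 7 positive occurrences out of 10 events.
positives, events = 7, 10

# Likelihood P(observed | true): binomial probability of the data for each candidate.
likelihood = [
    comb(events, positives) * p**positives * (1 - p) ** (events - positives)
    for p in candidates
]

# Marginal P(observed): likelihood weighted by the prior, summed over all candidates.
marginal = sum(l * pr for l, pr in zip(likelihood, prior))

# Posterior P(true | observed) by Bayes' theorem.
posterior = [l * pr / marginal for l, pr in zip(likelihood, prior)]

most_probable = candidates[posterior.index(max(posterior))]
print(most_probable)
```

With a uniform prior, the posterior peaks at the observed frequency; a different prior would pull the peak toward the prior belief.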

---

source: DSI2 W8D1 lecture notes

### Generative versus Discriminative models (5 mins)

A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal?

$$P\left(\;X\;|\;y\;\right)$$


A discriminative algorithm does not care how the data was generated; it simply categorizes a given signal.


$$P\left(\;y\;|\;X\;\right)$$
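
The two views are linked by Bayes' theorem. A generative model that has learned $P\left(\;X\;|\;y\;\right)$ together with the class prior $P\left(\;y\;\right)$ can still produce the conditional probability that a discriminative model estimates directly. Here $y'$ is a symbol introduced for illustration; it ranges over all categories:

$$P\left(\;y\;|\;X\;\right) = \frac{P\left(\;X\;|\;y\;\right) P\left(\;y\;\right)}{\sum_{y'} P\left(\;X\;|\;y'\;\right) P\left(\;y'\;\right)}$$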


source: http://stackoverflow.com/questions/879432/what-is-the-difference-between-a-generative-and-discriminative-algorithm

### So how is this relevant to Machine Learning? (10 mins)

![Example](breast_cancer_example.png)

source: https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/

Question:

What is the probability of actually having cancer when the test returns "Positive"?

Ans: 0.078
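
The answer can be checked with a few lines of Python. The figures below are not stated in the text of this lesson; they are assumed from the cited betterexplained.com article (1% of women have breast cancer, the test detects 80% of cancers, and 9.6% of healthy women test positive), so treat them as assumptions if the image differs.

```python
# Assumed figures from the cited article (see the example image).
p_cancer = 0.01              # prior: P(cancer)
p_pos_given_cancer = 0.80    # likelihood: P(positive | cancer)
p_pos_given_healthy = 0.096  # false positive rate: P(positive | no cancer)

# Marginal probability of a positive test, summed over both cases.
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Posterior by Bayes' theorem: P(cancer | positive).
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))  # 0.078
```

The posterior is small because the prior is small: most positive results come from the much larger group of healthy women.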

![](original_golf_data.png) 

![](transformed_golf_data.png)

source: 5 Minutes With Ingo: Naïve Bayes
https://www.youtube.com/watch?v=IlVINQDk4o8

But how do we use it to make a prediction?

Example:

    Should I play golf when the outlook is sunny, the temperature is cool, the humidity is high, and it is windy?


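Each class gets a score: the product of the per-feature likelihoods (outlook, temperature, humidity, windy) and the class prior. These scores are proportional to the posterior probabilities, not yet normalized:

$$P\left(\;play\ golf\;|\;sunny, cool, high, true\;\right) \propto (3 / 9) * (3 / 9) * (3 / 9) * (3 / 9) * (9 / 14) \approx 0.0079$$

$$P\left(\;Don't\ play\ golf\;|\;sunny, cool, high, true\;\right) \propto (2 / 4) * (1 / 5) * (4 / 5) * (3 / 5) * (5 / 14) \approx 0.0171$$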

Normalizing the two scores so that they sum to 1 gives the probability of playing:

    play = (3. / 9) * (3. / 9) * (3. / 9) * (3. / 9) * (9. / 14)
    no_play = (2. / 4) * (1. / 5) * (4. / 5) * (3. / 5) * (5. / 14)
    play / (play + no_play)
    # 0.31645569620253156

The probability of playing (about 0.32) is lower than 0.5, so the prediction is: don't play golf.
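
The same calculation can be wrapped in a small helper, sketched below. The function name `naive_bayes_posterior` and the dictionary layout are illustrative assumptions; the fractions are the ones used in the example above.

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods):
    """Return normalized class probabilities under the Naive Bayes assumption.

    priors: {class: P(class)}
    likelihoods: {class: [P(feature_value | class), ...]}
    """
    # Unnormalized score per class: prior times the product of per-feature likelihoods.
    scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    total = sum(scores.values())
    # Divide by the total so the class probabilities sum to 1.
    return {c: s / total for c, s in scores.items()}


posterior = naive_bayes_posterior(
    priors={"play": 9 / 14, "no_play": 5 / 14},
    likelihoods={
        "play": [3 / 9, 3 / 9, 3 / 9, 3 / 9],     # sunny, cool, high, windy
        "no_play": [2 / 4, 1 / 5, 4 / 5, 3 / 5],  # same features, other class
    },
)
prediction = max(posterior, key=posterior.get)
print(prediction, round(posterior["play"], 3))
```

Picking the class with the largest score gives the same prediction whether or not the scores are normalized; normalizing only matters when the probability itself is needed.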

### Why Naive?

Ans: It assumes that the features are independent of one another given the class. In most cases this is not true, but Naive Bayes (NB) still offers decent performance.
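
Written out for $n$ features $x_1, \dots, x_n$ (symbols introduced here for illustration), the naive assumption is that the class-conditional likelihood factorizes into one term per feature. This is why the golf example multiplies one fraction per feature by the class prior:

$$P\left(\;x_1, \dots, x_n\;|\;y\;\right) = \prod_{i=1}^{n} P\left(\;x_i\;|\;y\;\right)$$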

### Use cases:

- Spam classification
- Email foldering
- E-commerce product binning
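
To make the first use case concrete, here is a minimal sketch of a word-count Naive Bayes spam filter. The training messages are hypothetical toy data. The sketch also uses two techniques this lesson does not cover: add-one (Laplace) smoothing, and summing log probabilities instead of multiplying raw ones. Treat it as an illustration under those assumptions rather than a production approach.

```python
from collections import Counter
from math import log

# Hypothetical toy training data: (message, label). Not from the lesson.
training = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]

# Count words per class and messages per class.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in training:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = {word for counts in word_counts.values() for word in counts}


def classify(text):
    """Return the class with the highest log posterior score."""
    scores = {}
    for label in class_counts:
        # Log prior: fraction of training messages with this label.
        score = log(class_counts[label] / sum(class_counts.values()))
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing out the product.
            p_word = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += log(p_word)
        scores[label] = score
    return max(scores, key=scores.get)


print(classify("free money"))
print(classify("meeting tomorrow"))
```

Each word plays the role that outlook or humidity played in the golf example: one likelihood term per feature, combined with the class prior.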