# Naive Bayes

[Naive Bayes](https://github.com/justmarkham/DAT4/blob/master/slides/13_naive_bayes.pdf) is a simple yet often effective algorithm for classification.

It's based on [Bayes' Theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem):

$$
P(A|B) = \frac{P(B|A)*P(A)}{P(B)}
$$

Or in classification terms:

$$
P(class\ C\ |\ x_{i}) = \frac{P(x_{i}\ |\ class\  C)*P(class\ C)}{P(x_{i})}
$$

The predicted class for an instance is the class with the highest posterior $P(class\ C\ |\ x_{i})$.

Naive Bayes makes the naive assumption that, given the class, the individual features are independent of each other.

That is, for an instance with $n$ features $x_{1}, x_{2}, \dots, x_{n}$, the likelihood factorizes as:

$$
P(x_{1}, x_{2}, \dots, x_{n}\ |\ class\ C) = P(x_{1}\ |\ class\ C)\ P(x_{2}\ |\ class\ C)\ \dots\ P(x_{n}\ |\ class\ C)
$$
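As a small illustration of this factorization, the sketch below multiplies per-feature conditional probabilities to get the likelihood of a whole instance under one class. The helper name `joint_likelihood` and the probability values are invented for the example; they do not come from the data used later.

```python
from math import prod


def joint_likelihood(feature_likelihoods):
    """Multiply per-feature conditionals P(feature | class C) under the naive assumption."""
    return prod(feature_likelihoods)


# Hypothetical per-feature conditionals for a single class C (illustrative values only).
per_feature = [0.5, 0.2, 0.1]
joint = joint_likelihood(per_feature)
print(joint)
```

With many features this product becomes very small, which is one reason implementations often sum log-probabilities instead; the toy data here has only two features, so a plain product is fine.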

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Play Data

We have two weeks of data with the following attributes:

- Holiday: Whether the day was a holiday or not.
- Weather: What the weather of the day was.
- Play (Target): Whether the kids went out to play or not.

Our task is to learn a Naive Bayes classifier that predicts whether the kids will go out to play, given `Holiday` and `Weather`.

In [2]:
play_data = pd.DataFrame([("Yes", "Sunny", "Yes"),
                          ("No", "Overcast", "Yes"),
                          ("No", "Rainy", "No"),
                          ("Yes", "Sunny", "Yes"),
                          ("No", "Sunny", "No"),
                          ("Yes", "Overcast", "Yes"),
                          ("Yes", "Rainy", "No"),
                          ("No", "Sunny", "Yes"),
                          ("Yes", "Overcast", "Yes"),
                          ("No", "Rainy", "No"),
                          ("Yes", "Rainy", "Yes"),
                          ("Yes", "Overcast", "No"),
                          ("Yes", "Sunny", "Yes"),
                          ("No", "Overcast", "No")
                         ], columns=['Holiday', 'Weather', 'Play'])

In [3]:
play_data

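| | Holiday | Weather | Play |
|---|---|---|---|
| 0 | Yes | Sunny | Yes |
| 1 | No | Overcast | Yes |
| 2 | No | Rainy | No |
| 3 | Yes | Sunny | Yes |
| 4 | No | Sunny | No |
| 5 | Yes | Overcast | Yes |
| 6 | Yes | Rainy | No |
| 7 | No | Sunny | Yes |
| 8 | Yes | Overcast | Yes |
| 9 | No | Rainy | No |
| 10 | Yes | Rainy | Yes |
| 11 | Yes | Overcast | No |
| 12 | Yes | Sunny | Yes |
| 13 | No | Overcast | No |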


## "Training" a Naive Bayes classifier

"Training" a Naive Bayes classifier involves calculating the **likelihood**, which is the conditional probability of each feature value given each class, and the **prior** for each class.

Note: We can ignore the denominator (the **evidence**) because we are only comparing **posteriors** across classes, and the denominator is the same for every class.
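To see why dropping the denominator is safe, here is a minimal sketch with made-up numbers. The `scores` values are hypothetical products of likelihood and prior, not values computed from the play data.

```python
# Hypothetical unnormalized scores: likelihood * prior for each class.
scores = {"Yes": 0.03, "No": 0.01}

# The evidence is the sum of the unnormalized scores over all classes.
evidence = sum(scores.values())
posteriors = {label: score / evidence for label, score in scores.items()}

# Dividing every score by the same positive constant cannot change the ranking.
best_unnormalized = max(scores, key=scores.get)
best_normalized = max(posteriors, key=posteriors.get)
print(best_unnormalized, best_normalized)
```

Normalizing is only needed if you want actual probabilities that sum to one; for picking the winning class, the unnormalized scores are enough.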

### Computing class priors

#### Exercise 1

Compute the class priors for the two classes.

In [7]:
# Code Here
priors = play_data['Play'].value_counts(normalize=True).to_dict() 

In [8]:
priors

{'Yes': 0.5714285714285714, 'No': 0.42857142857142855}
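As a sanity check, the same priors can be obtained by counting labels by hand. The sketch below copies the 14 values of the `Play` column into a plain list (the names `play`, `counts`, and `manual_priors` are introduced here for illustration) and divides each count by the total.

```python
from collections import Counter

# The 14 values of the Play column, in row order.
play = ["Yes", "Yes", "No", "Yes", "No", "Yes", "No",
        "Yes", "Yes", "No", "Yes", "No", "Yes", "No"]

counts = Counter(play)  # number of days per class
manual_priors = {label: n / len(play) for label, n in counts.items()}
print(counts, manual_priors)
```

There are 8 "Yes" days and 6 "No" days, so the priors are 8/14 and 6/14, matching the `value_counts(normalize=True)` result.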

### Computing likelihood 

#### Exercise 2

Compute the likelihood for each feature for both classes.

In [10]:
# Likelihoods P(feature value | class), estimated as relative frequencies within each class.
played = play_data[play_data['Play'] == 'Yes']
not_played = play_data[play_data['Play'] == 'No']

likelihoods = {}
likelihoods['Play'] = {}
likelihoods['Play']['Holiday'] = played['Holiday'].value_counts(normalize=True).to_dict()
likelihoods['Play']['Weather'] = played['Weather'].value_counts(normalize=True).to_dict()

likelihoods['NoPlay'] = {}
likelihoods['NoPlay']['Holiday'] = not_played['Holiday'].value_counts(normalize=True).to_dict()
likelihoods['NoPlay']['Weather'] = not_played['Weather'].value_counts(normalize=True).to_dict()

In [11]:
likelihoods

{'Play': {'Holiday': {'Yes': 0.75, 'No': 0.25},
  'Weather': {'Sunny': 0.5, 'Overcast': 0.375, 'Rainy': 0.125}},
 'NoPlay': {'Holiday': {'No': 0.6666666666666666, 'Yes': 0.3333333333333333},
  'Weather': {'Rainy': 0.5,
   'Overcast': 0.3333333333333333,
   'Sunny': 0.16666666666666666}}}
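One possible way to cross-check these numbers is `pd.crosstab` with `normalize="index"`, which produces one conditional-probability table per feature. This is an alternative sketch, not the approach used above; the frame is rebuilt under the name `df` so the snippet runs on its own.

```python
import pandas as pd

# Same 14 rows as play_data, rebuilt so the snippet is standalone.
rows = [("Yes", "Sunny", "Yes"), ("No", "Overcast", "Yes"), ("No", "Rainy", "No"),
        ("Yes", "Sunny", "Yes"), ("No", "Sunny", "No"), ("Yes", "Overcast", "Yes"),
        ("Yes", "Rainy", "No"), ("No", "Sunny", "Yes"), ("Yes", "Overcast", "Yes"),
        ("No", "Rainy", "No"), ("Yes", "Rainy", "Yes"), ("Yes", "Overcast", "No"),
        ("Yes", "Sunny", "Yes"), ("No", "Overcast", "No")]
df = pd.DataFrame(rows, columns=["Holiday", "Weather", "Play"])

# normalize="index" divides each row by its total, giving P(feature value | Play).
weather_given_play = pd.crosstab(df["Play"], df["Weather"], normalize="index")
holiday_given_play = pd.crosstab(df["Play"], df["Holiday"], normalize="index")
print(weather_given_play)
print(holiday_given_play)
```

Each row of a table sums to one, and the entries should agree with the nested dictionary above, for example `weather_given_play.loc["Yes", "Sunny"]` against `likelihoods['Play']['Weather']['Sunny']`.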

## Making predictions

The prediction phase involves computing the **posterior** for each class and choosing the class with the highest probability.

#### Exercise 3

Create a class `NBClassifier` that takes `priors` and `likelihoods` when initialized. 

It should have a `predict` method that takes the inputs `Holiday` and `Weather` and returns the prediction for `Play`, i.e. `True` or `False`.

In [None]:
# Code Here
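One possible solution is sketched below; treat it as a starting point rather than the reference answer. It assumes `priors` is keyed by `'Yes'`/`'No'` and `likelihoods` by `'Play'`/`'NoPlay'`, as in the outputs above, so the `CLASSES` mapping that ties the two together is an assumption of this sketch. The dictionaries are re-entered by hand so the snippet runs on its own.

```python
class NBClassifier:
    """Naive Bayes over the Holiday and Weather features (illustrative sketch)."""

    # Assumed mapping: likelihood key -> (prior key, value returned by predict).
    CLASSES = {"Play": ("Yes", True), "NoPlay": ("No", False)}

    def __init__(self, priors, likelihoods):
        self.priors = priors
        self.likelihoods = likelihoods

    def predict(self, holiday, weather):
        scores = {}
        for name, (prior_key, _) in self.CLASSES.items():
            table = self.likelihoods[name]
            # Unnormalized posterior: prior * product of per-feature likelihoods.
            # .get(..., 0.0) treats a value never seen with this class as probability 0.
            scores[name] = (self.priors[prior_key]
                            * table["Holiday"].get(holiday, 0.0)
                            * table["Weather"].get(weather, 0.0))
        best = max(scores, key=scores.get)
        return self.CLASSES[best][1]


# Values copied from the priors and likelihoods computed earlier.
priors = {"Yes": 8 / 14, "No": 6 / 14}
likelihoods = {
    "Play": {"Holiday": {"Yes": 0.75, "No": 0.25},
             "Weather": {"Sunny": 0.5, "Overcast": 0.375, "Rainy": 0.125}},
    "NoPlay": {"Holiday": {"No": 2 / 3, "Yes": 1 / 3},
               "Weather": {"Rainy": 0.5, "Overcast": 1 / 3, "Sunny": 1 / 6}},
}

clf = NBClassifier(priors, likelihoods)
print(clf.predict("Yes", "Sunny"))  # sunny holiday -> True
print(clf.predict("No", "Rainy"))   # rainy non-holiday -> False
```

For a sunny holiday the unnormalized scores are roughly 0.214 for playing and 0.024 for not playing, so the classifier returns `True`; for a rainy non-holiday they are roughly 0.018 and 0.143, so it returns `False`.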