# Naive Bayes Classifier

Naive Bayes is a statistical classification technique based on Bayes' theorem. It is one of the simplest supervised learning algorithms.

<img src="image/bayes.png">

- P(h): the probability of hypothesis h being true (regardless of the data). This is known as the prior probability of h.
- P(D): the probability of the data (regardless of the hypothesis). This is known as the evidence (or marginal probability of the data).
- P(h|D): the probability of hypothesis h given the data D. This is known as the posterior probability.
- P(D|h): the probability of data D given that the hypothesis h is true. This is known as the likelihood.
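
The four quantities above combine as P(h|D) = P(D|h) * P(h) / P(D). As a minimal sketch, the relationship can be written as a small Python function. The function name `posterior` and the numbers passed to it are made up for illustration only.

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(h|D) = P(D|h) * P(h) / P(D)."""
    if evidence <= 0:
        raise ValueError("P(D) must be positive")
    return likelihood * prior / evidence


# Illustrative values only: P(D|h) = 0.5, P(h) = 0.2, P(D) = 0.25
result = posterior(likelihood=0.5, prior=0.2, evidence=0.25)
print(result)
```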

# How it works

Let’s walk through how Naive Bayes works with an example. You are given a record of weather conditions and whether cricket was played under each of them. The goal is to calculate the probability of playing for a given weather condition, and then to classify whether the players will play or not.

<img src="image/dataset.png" width="120">


## In case of a single feature

1. Calculate the prior probability of each class label.
2. Find the likelihood of each attribute value for each class.
3. Put these values into the Bayes formula and calculate the posterior probability.
4. Compare the posteriors and assign the input to the class with the higher probability.
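
The four steps above can be sketched in plain Python by counting. This is only an illustration, not a library implementation: the helper name `single_feature_posteriors` is hypothetical, and the `weather` and `play` lists are the ones used later in the scikit-learn section. The query is `"Sunny"` so that the overcast and rainy cases stay available as exercises.

```python
from collections import Counter


def single_feature_posteriors(feature_values, labels, query):
    """Return {label: P(label | feature == query)} estimated by counting."""
    n = len(labels)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(feature_values, labels))
    evidence = feature_values.count(query) / n  # P(query)
    if evidence == 0:
        raise ValueError(f"{query!r} never occurs in the data")

    posteriors = {}
    for label, count in label_counts.items():
        prior = count / n                                  # step 1: P(label)
        likelihood = joint_counts[(query, label)] / count  # step 2: P(query | label)
        posteriors[label] = likelihood * prior / evidence  # step 3: Bayes formula
    return posteriors


weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

posteriors = single_feature_posteriors(weather, play, 'Sunny')
predicted = max(posteriors, key=posteriors.get)  # step 4: pick the larger posterior
print(posteriors, predicted)
```

For sunny weather the counts give P(Yes | Sunny) = 2/5 and P(No | Sunny) = 3/5, so this sketch picks `'No'`.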


<img src="image/example.png" >


#### Now suppose you want to calculate the probability of playing when the weather is overcast.

<img src="image/table.png" width="480">

#### Probability of playing:

`P(Yes | Overcast) = P(Overcast | Yes) P(Yes) / P(Overcast) .....................(1)`

1. Calculate Prior Probabilities:

`P(Overcast) = 4/14 = 0.29`

`P(Yes)= 9/14 = 0.64`

2. Calculate Likelihood Probabilities:

`P(Overcast |Yes) = 4/9 = 0.44`

3. Put Prior and Likelihood probabilities in equation (1)

`P(Yes | Overcast) = 0.44 * 0.64 / 0.29 = 0.97`

With the unrounded fractions, (4/9)(9/14) / (4/14) = 1 exactly; the 0.97 is a rounding artifact.


#### Probability of not playing:

<img src="image/table.png" width="480">

`P(No | Overcast) = P(Overcast | No) P(No) / P (Overcast) .....................(2)`

1. Calculate Prior Probabilities:

`P(Overcast) = 4/14 = 0.29`

`P(No)= 5/14 = 0.36`

2. Calculate Likelihood Probabilities:

`P(Overcast | No) = 0/5 = 0`

3. Put Prior and Likelihood probabilities in equation (2)

`P (No | Overcast) = 0 * 0.36 / 0.29 = 0`

4. **The probability of the 'Yes' class is higher, so if the weather is overcast you can conclude that the players will play the game.**
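
To check the arithmetic without rounding, the same calculation can be repeated with exact fractions. This is a small sketch using Python's standard `fractions` module; the counts are the ones from the frequency table (4 overcast days out of 14, 9 'Yes' and 5 'No' days, all 4 overcast days being 'Yes'), and the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Counts taken from the frequency table
p_overcast = Fraction(4, 14)
p_yes = Fraction(9, 14)
p_no = Fraction(5, 14)
p_overcast_given_yes = Fraction(4, 9)
p_overcast_given_no = Fraction(0, 5)

# Equations (1) and (2) evaluated exactly
p_yes_given_overcast = p_overcast_given_yes * p_yes / p_overcast
p_no_given_overcast = p_overcast_given_no * p_no / p_overcast

print(p_yes_given_overcast, p_no_given_overcast)  # 1 0
```

The two posteriors sum to 1, which is a quick sanity check worth repeating whenever you work an example by hand.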

> # Task #1: 
### Now suppose you want to calculate the probability of playing when the weather is rainy. (5 mins)


# Classifier Building in Scikit-learn

In [1]:
from sklearn import preprocessing
import pandas as pd


# Assigning features and label variables
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny',
        'Rainy','Sunny','Overcast','Overcast','Rainy']

# temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']

play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']

In [2]:
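# Create one LabelEncoder per column so each mapping can be inspected or inverted later
weather_le = preprocessing.LabelEncoder()
play_le = preprocessing.LabelEncoder()

# Convert string labels into integer codes (classes are sorted alphabetically)
weather_encoded = weather_le.fit_transform(weather)
label = play_le.fit_transform(play)
print(weather_encoded, ' \n', label)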

[2 2 0 1 1 1 0 2 2 1 2 0 0 1]  
 [0 0 1 1 1 0 1 0 1 1 1 1 1 0]
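
It helps to know which integer stands for which weather condition before reading the predictions further down. The sketch below is self-contained and uses its own variable names (`encoder`, `mapping`, which are not part of the original cells); it reads the mapping from the encoder's `classes_` attribute, which holds the labels in sorted order.

```python
from sklearn.preprocessing import LabelEncoder

weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

encoder = LabelEncoder()
codes = encoder.fit_transform(weather)

# classes_[i] is the original label that was encoded as integer i
mapping = {str(name): code for code, name in enumerate(encoder.classes_)}
print(mapping)  # {'Overcast': 0, 'Rainy': 1, 'Sunny': 2}

# inverse_transform turns integer codes back into the original strings
decoded = encoder.inverse_transform([0, 1, 2])
print(list(decoded))
```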


In [3]:
df= pd.DataFrame()
df['weather'] = weather_encoded
df['play'] = label
df

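|    | weather | play |
|---:|--------:|-----:|
|  0 | 2 | 0 |
|  1 | 2 | 0 |
|  2 | 0 | 1 |
|  3 | 1 | 1 |
|  4 | 1 | 1 |
|  5 | 1 | 0 |
|  6 | 0 | 1 |
|  7 | 2 | 0 |
|  8 | 2 | 1 |
|  9 | 1 | 1 |
| 10 | 2 | 1 |
| 11 | 0 | 1 |
| 12 | 0 | 1 |
| 13 | 1 | 0 |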


In [8]:
df['weather'].values.reshape(-1,1)

array([[2],
       [2],
       [0],
       [1],
       [1],
       [1],
       [0],
       [2],
       [2],
       [1],
       [2],
       [0],
       [0],
       [1]], dtype=int64)

In [19]:
#Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

#Create a Gaussian Classifier
model = GaussianNB()

# Train the model 
model.fit(df['weather'].values.reshape(-1,1),
          df['play'])


GaussianNB(priors=None, var_smoothing=1e-09)

In [21]:
# Predict for rainy weather (Rainy is encoded as 1; the output 1 means 'Yes')
predicted= model.predict([[1]]) 
print("Predicted Value:", predicted)

Predicted Value: [1]
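
`GaussianNB` treats the encoded weather code as a continuous, normally distributed value. For a purely categorical feature, scikit-learn's `CategoricalNB` is arguably closer to the hand calculation earlier in this document. The sketch below is an alternative offered for comparison, not the method used above. It assumes the same encoding (Overcast=0, Rainy=1, Sunny=2; No=0, Yes=1), and the name `categorical_model` is introduced here for illustration.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Encoded data copied from the cells above
weather_encoded = np.array([2, 2, 0, 1, 1, 1, 0, 2, 2, 1, 2, 0, 0, 1])
label = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

X = weather_encoded.reshape(-1, 1)  # scikit-learn expects a 2D feature array

# The default alpha=1.0 applies Laplace smoothing to the category counts
categorical_model = CategoricalNB()
categorical_model.fit(X, label)

overcast = [[0]]
predicted_class = categorical_model.predict(overcast)
probabilities = categorical_model.predict_proba(overcast)  # columns: [No, Yes]
print("Predicted class:", predicted_class)
print("Class probabilities [No, Yes]:", probabilities)
```

Because of the smoothing, P(No | Overcast) comes out small but not exactly zero, unlike the hand calculation. This avoids a single zero count wiping out a class entirely.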


> # Assignment #2
There is a CSV dataset in the `data/` folder. Implement Naive Bayes on that data using scikit-learn.