Before you get into Naive Bayes, you need to understand what conditional probability is and what Bayes' rule is.

The conditional probability of A given B is the probability of A occurring given that B has already occurred.

Mathematically, the conditional probability of A given B can be computed as: $P(A|B)= \frac{P(A \cap B)}{P(B)}$

|         | Female | Male | Total |
|---------|--------|------|-------|
| Teacher | 8      | 12   | 20    |
| Student | 32     | 48   | 80    |
| Total   | 40     | 60   | 100   |

What is the probability of selecting a teacher, given that the person selected is male?
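
As a quick check, the question can be answered directly from the four cell counts in the table. The sketch below is only an illustration; the dictionary layout and variable names are made up for this example.

```python
# Cell counts from the table above, keyed by (role, gender)
counts = {
    ("teacher", "female"): 8,
    ("teacher", "male"): 12,
    ("student", "female"): 32,
    ("student", "male"): 48,
}

total = sum(counts.values())  # 100 people in all

# P(Teacher and Male): share of people who are both teachers and male
p_teacher_and_male = counts[("teacher", "male")] / total

# P(Male): share of people who are male, whatever their role
p_male = sum(n for (role, gender), n in counts.items() if gender == "male") / total

# Conditional probability: P(Teacher | Male) = P(Teacher and Male) / P(Male)
p_teacher_given_male = p_teacher_and_male / p_male
print(round(p_teacher_given_male, 2))
```

There are 12 + 48 = 60 males, of whom 12 are teachers, so the answer is 12 / 60 = 0.2.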

<img src="https://www.machinelearningplus.com/wp-content/uploads/2018/11/01_bayes_rule_derive_new-1024x355.png"/>

<img src="https://www.machinelearningplus.com/wp-content/uploads/2018/11/02_bayes_rule_new-1024x278.png"/>

<img src="https://www.machinelearningplus.com/wp-content/uploads/2018/11/03_bayes_rule_naive_bayes_new-1024x511.png" />

<img src="https://www.machinelearningplus.com/wp-content/uploads/2018/11/04_naive_bayes_interpretation_new-1024x550.png" />

The fundamental Naive Bayes assumption is that each feature makes an:

* independent
* equal

contribution to the outcome.
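
Written out, the independence part of this assumption says that, once the class $Y$ is known, the joint likelihood of the features $X_1, \dots, X_n$ factorizes into one term per feature:

$$P(X_1, X_2, \dots, X_n | Y) = \prod_{i=1}^{n} P(X_i | Y)$$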

What is Gaussian Naive Bayes?

So far we’ve seen the computations when the X’s are categorical. But how do you compute the probabilities when X is a continuous variable?

If you assume that X follows a particular distribution, you can plug in the probability density function of that distribution to compute the likelihoods.

If you assume the X’s follow a Normal (aka Gaussian) distribution, which is a fairly common choice, you substitute the probability density of a Normal distribution and call the result Gaussian Naive Bayes. You need just the mean and variance of X within each class to compute this formula.

$$P(X|Y=c) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{\frac{-(x-\mu)^2}{2\sigma^2}}$$

where $\mu$ and $\sigma^2$ are the mean and variance of the continuous X computed for a given class ‘c’ (of Y).
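
To make the formula concrete, here is a minimal sketch that estimates the mean and variance from the values of one feature within a single class and then evaluates the density at a new point. The function name `gaussian_likelihood` and the sample values are assumptions made up for illustration; they do not come from the weather dataset.

```python
import math

def gaussian_likelihood(x, mean, variance):
    """Normal density at x for a feature with the given class mean and variance."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return coefficient * math.exp(exponent)

# Hypothetical values of a continuous feature observed for one class c
values_in_class = [4.9, 5.1, 5.0, 5.4, 4.6]

mean = sum(values_in_class) / len(values_in_class)
# Sample variance (divides by n - 1); dividing by n is an equally common choice
variance = sum((v - mean) ** 2 for v in values_in_class) / (len(values_in_class) - 1)

# Density of a new observation x = 5.0 under this class
likelihood = gaussian_likelihood(5.0, mean, variance)
print(likelihood)
```

Note that the result is a density value rather than a probability, so it can be larger than 1 when the variance is small. That is fine for classification, because only the relative size across classes matters.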

To make the features more Gaussian-like, you might consider transforming them, for example with a Box-Cox transformation.
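
For reference, the Box-Cox transformation maps a strictly positive value $x$ to $(x^\lambda - 1)/\lambda$, or to $\log x$ when $\lambda = 0$. The sketch below applies it with a hand-picked $\lambda$; the helper `box_cox` and the sample values are made up for illustration. In practice $\lambda$ is usually estimated from the data, for example by `scipy.stats.boxcox`, rather than chosen by hand.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform for strictly positive values with a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # The limiting case lambda = 0 is the natural log
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

# Hypothetical right-skewed feature with one large value
skewed = [1.0, 2.0, 2.5, 3.0, 50.0]

log_transformed = box_cox(skewed, 0)    # lambda = 0: log transform
sqrt_like = box_cox(skewed, 0.5)        # lambda = 0.5: square-root-like transform
print(log_transformed)
print(sqrt_like)
```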

That’s it. Now, let’s build a Naive Bayes classifier.

In [2]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline

In [3]:
df = pd.read_csv("weather.nominal.csv")

In [4]:
df

Unnamed: 0,outlook,temperature,humidity,windy,play
0,sunny,hot,high,False,no
1,sunny,hot,high,True,no
2,overcast,hot,high,False,yes
3,rainy,mild,high,False,yes
4,rainy,cool,normal,False,yes
5,rainy,cool,normal,True,no
6,overcast,cool,normal,True,yes
7,sunny,mild,high,False,no
8,sunny,cool,normal,False,yes
9,rainy,mild,normal,False,yes


In [14]:
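# Count the play outcomes (no/yes) for each outlook value
outlook = df.groupby("outlook")['play'].value_counts().unstack()

# A missing combination (e.g. overcast with no) means a count of zero
outlook.fillna(0,inplace=True)

# Add a row with the column totals, i.e. the overall number of no and yes
outlook.loc['Total'] = outlook.sum()

# Despite the short column names, these are the likelihoods
# P(outlook value | yes) and P(outlook value | no)
outlook['P(yes)'] = outlook['yes']/outlook.loc['Total','yes']

outlook['P(no)'] = outlook['no']/outlook.loc['Total','no']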

In [15]:
outlook

outlook,no,yes,P(yes),P(no)
overcast,0,4,0.444444,0.0
rainy,2,3,0.333333,0.4
sunny,3,2,0.222222,0.6
Total,5,9,1.0,1.0
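
The same kind of likelihood table can be built for any feature with a small helper instead of repeating the steps by hand. This is a sketch of an alternative, not the approach used in this notebook: it relies on `pd.crosstab` with `normalize="columns"`, and the helper name `likelihood_table` is made up. It uses only the ten rows displayed earlier, so its numbers will not match the table above, which is computed from the full file.

```python
import pandas as pd

# The ten rows displayed above, reduced to the two columns needed here
sample = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy",
                "rainy", "overcast", "sunny", "sunny", "rainy"],
    "play": ["no", "no", "yes", "yes", "yes",
             "no", "yes", "no", "yes", "yes"],
})

def likelihood_table(frame, feature, target="play"):
    """Return P(feature value | target class), one column per class."""
    # normalize="columns" divides each count by its column (class) total
    return pd.crosstab(frame[feature], frame[target], normalize="columns")

table = likelihood_table(sample, "outlook")
print(table)
```

Each column of the result sums to 1, because it holds the distribution of the feature within one class.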


In [17]:
temperature = df.groupby("temperature")['play'].value_counts().unstack()

temperature.fillna(0,inplace=True)

temperature.loc['Total'] = temperature.sum()

temperature['P(yes)'] = temperature['yes']/temperature.loc['Total','yes']

temperature['P(no)'] = temperature['no']/temperature.loc['Total','no']

temperature

temperature,no,yes,P(yes),P(no)
cool,1,3,0.333333,0.2
hot,2,2,0.222222,0.4
mild,2,4,0.444444,0.4
Total,5,9,1.0,1.0


In [19]:
humidity = df.groupby("humidity")['play'].value_counts().unstack()

humidity.fillna(0,inplace=True)

humidity.loc['Total'] = humidity.sum()

humidity['P(yes)'] = humidity['yes']/humidity.loc['Total','yes']

humidity['P(no)'] = humidity['no']/humidity.loc['Total','no']

humidity

humidity,no,yes,P(yes),P(no)
high,4,3,0.333333,0.8
normal,1,6,0.666667,0.2
Total,5,9,1.0,1.0


In [22]:
wind = df.groupby("windy")['play'].value_counts().unstack()

wind.fillna(0,inplace=True)

wind.loc['Total'] = wind.sum()

wind['P(yes)'] = wind['yes']/wind.loc['Total','yes']

wind['P(no)'] = wind['no']/wind.loc['Total','no']

wind

windy,no,yes,P(yes),P(no)
False,2,6,0.666667,0.4
True,3,3,0.333333,0.6
Total,5,9,1.0,1.0


Predict the value of play for the following conditions (outlook, temperature, humidity, windy):

Sunny, Hot, Normal, False

$P(Yes|(Sunny, Hot, Normal, False)) = \frac{P(Sunny|Yes)P(Hot|Yes)P(Normal|Yes)P(False|Yes)P(Yes)}{P(Sunny)P(Hot)P(Normal)P(False)} $

In [29]:
outlook.loc['sunny']['P(yes)'],humidity.loc['normal']['P(yes)'],temperature.loc['hot']['P(yes)'],wind.loc[False]['P(yes)']

(0.2222222222222222,
 0.6666666666666666,
 0.2222222222222222,
 0.6666666666666666)
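
To turn these likelihoods into a prediction, the same product is needed for the "no" class, and each product is multiplied by the class prior. The sketch below does this with the counts read from the four tables above. One deliberate simplification: instead of dividing by $P(Sunny)P(Hot)P(Normal)P(False)$ as in the formula, it normalizes the two scores so that they sum to 1. The denominator is the same for both classes, so this does not change which class wins. The dictionary layout and key names are made up for illustration.

```python
from fractions import Fraction

# Counts read from the four tables above for (sunny, hot, normal, windy=False)
counts = {
    "yes": {"sunny": 2, "hot": 2, "normal": 6, "not_windy": 6, "total": 9},
    "no":  {"sunny": 3, "hot": 2, "normal": 1, "not_windy": 2, "total": 5},
}
n_rows = sum(c["total"] for c in counts.values())  # 9 + 5 = 14

scores = {}
for label, c in counts.items():
    # Start from the prior P(class), then multiply in each P(feature | class)
    score = Fraction(c["total"], n_rows)
    for feature in ("sunny", "hot", "normal", "not_windy"):
        score *= Fraction(c[feature], c["total"])
    scores[label] = score

# Normalize so the two scores sum to 1
evidence = sum(scores.values())
posteriors = {label: float(score / evidence) for label, score in scores.items()}

prediction = max(posteriors, key=posteriors.get)
print(prediction, posteriors)
```

With these counts the normalized score for "yes" comes out at roughly 0.67, so the prediction is that play happens.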