
# Gaussian Naive Bayes Classifier
---


## **Case Study:** Iris Dataset

**Objective:** The objective of this challenge is to learn how Naive Bayes is applied to numerical values.

**Dataset Columns:**<br>
*   Petal Height
*   Petal Width
*   Sepal Height
*   Sepal Width
*   Target: The kind of the Iris flower (Virginica, Setosa, Versicolor)

# Importing Libraries

Start by importing the necessary libraries. For this problem we need the following:


*   NumPy: for numerical calculations
*   Pandas: to deal with the dataset
*   math: to work on the mathematical aspects of Naive Bayes



In [None]:
import numpy as np
import pandas as pd
import math

# Loading the Dataset

The dataset we have does not include column names, so we assign them by hand: ['Sepal Height', 'Sepal Width', 'Petal Height', 'Petal Width', 'Target'].

In [None]:
df = pd.read_csv('iris.csv')
df.columns = ['Sepal Height', 'Sepal Width', 'Petal Height', 'Petal Width', 'Target']
df.head()

|   | Sepal Height | Sepal Width | Petal Height | Petal Width | Target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |


## Data Preprocessing

You may have noticed that the Target column contains string values rather than numbers, so we convert the string values to numerical labels.

In [None]:
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
df['Target'] = label_encoder.fit_transform(df['Target'])
df.head()

|   | Sepal Height | Sepal Width | Petal Height | Petal Width | Target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
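
To make the encoding concrete, here is a minimal plain-Python sketch of the kind of mapping that label encoding produces. It assumes the distinct labels are numbered in sorted order, which is how `LabelEncoder` assigns its integers. The helper `encode_labels` and the short `species` list are hypothetical and only serve as an illustration.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, numbering the labels in sorted order."""
    classes = sorted(set(labels))
    mapping = {label: index for index, label in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


# Hypothetical sample of the Target column before encoding.
species = ["Setosa", "Virginica", "Versicolor", "Setosa"]
codes, mapping = encode_labels(species)
print(mapping)
print(codes)
```

Assuming the labels are spelled as in the table above, Setosa maps to 0, Versicolor to 1 and Virginica to 2, so a predicted class of 0 later in the notebook would correspond to Setosa.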


In [None]:
if df.isnull().values.any():
    df = df.dropna()

# Naive Bayes

## Finding different Classes

In [None]:
num_classes = df['Target'].nunique()
print(f"Number of classes: {num_classes}")

Number of classes: 3


So we have 3 classes of flowers.

Remember the basic formula that we used for Naive Bayes:

$$P(\text{Class} \mid \text{Features}) = \frac{P(\text{Features} \mid \text{Class}) \cdot P(\text{Class})}{P(\text{Features})}$$

Since we have 3 classes and 4 features, we need to calculate the following probabilities:

$$P(\text{Class}_0 \mid F_1, F_2, F_3, F_4)$$

$$P(\text{Class}_1 \mid F_1, F_2, F_3, F_4)$$

$$P(\text{Class}_2 \mid F_1, F_2, F_3, F_4)$$


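The denominator $P(\text{Features})$ is the same for every class, and Naive Bayes assumes that the features are independent given the class. So in practice we only need to calculate the following products: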

$$P_0 = P(F_1 \mid \text{Class}_0)\,P(F_2 \mid \text{Class}_0)\,P(F_3 \mid \text{Class}_0)\,P(F_4 \mid \text{Class}_0)\,P(\text{Class}_0)$$

$$P_1 = P(F_1 \mid \text{Class}_1)\,P(F_2 \mid \text{Class}_1)\,P(F_3 \mid \text{Class}_1)\,P(F_4 \mid \text{Class}_1)\,P(\text{Class}_1)$$

$$P_2 = P(F_1 \mid \text{Class}_2)\,P(F_2 \mid \text{Class}_2)\,P(F_3 \mid \text{Class}_2)\,P(F_4 \mid \text{Class}_2)\,P(\text{Class}_2)$$



We compare $P_0$, $P_1$ and $P_2$ and assign the class whose value is the greatest.
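
As a small worked illustration of this comparison, the sketch below multiplies per-feature likelihoods by the class prior and picks the largest score. All numbers are made up for the example (they are not computed from the Iris data), and the names `likelihoods`, `priors` and `scores` are hypothetical.

```python
import math

# Hypothetical values of P(F_i | Class_k) for one flower, one list per class.
likelihoods = {
    0: [1.2, 0.9, 2.1, 3.0],
    1: [0.3, 0.8, 0.01, 0.02],
    2: [0.1, 0.7, 0.001, 0.005],
}
# Hypothetical priors P(Class_k); here every class is equally frequent.
priors = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}

# P_k = P(F_1 | Class_k) * ... * P(F_4 | Class_k) * P(Class_k)
scores = {k: math.prod(likelihoods[k]) * priors[k] for k in likelihoods}
predicted_class = max(scores, key=scores.get)

# Dividing every score by the same total rescales them into posteriors
# but does not change which class has the largest value.
evidence = sum(scores.values())
posteriors = {k: score / evidence for k, score in scores.items()}

print(scores)
print(predicted_class)
```

Because the shared denominator only rescales the three scores, skipping it saves work without affecting the prediction.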

Each conditional probability $P(F_i \mid \text{Class}_k)$ will be approximated using a distribution.
In this example, we will use the Gaussian distribution.

## Gaussian Probability Density Function

We recall that the Gaussian probability density function is given by:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x - \text{mean})^2}{2\sigma^2}\right\}$$

In [None]:
def gaussian_pdf(x, mean, sigma):
  return (1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-((x - mean) ** 2) / (2 * sigma ** 2))
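
A few quick sanity checks can help confirm that the function behaves like a Gaussian density. The sketch below repeats the definition so that it runs on its own. The chosen test points and the grid used for the area estimate are arbitrary assumptions.

```python
import numpy as np


def gaussian_pdf(x, mean, sigma):
    return (1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-((x - mean) ** 2) / (2 * sigma ** 2))


# At the mean of a standard Gaussian the density equals 1 / sqrt(2 * pi).
peak = gaussian_pdf(0.0, 0.0, 1.0)

# The density is symmetric around the mean.
left = gaussian_pdf(-1.0, 0.0, 1.0)
right = gaussian_pdf(1.0, 0.0, 1.0)

# A density is not a probability: with a small sigma it can exceed 1.
narrow_peak = gaussian_pdf(0.0, 0.0, 0.1)

# A simple Riemann sum over a wide grid approximates the total area under the curve.
xs = np.linspace(-8.0, 8.0, 16001)
area = np.sum(gaussian_pdf(xs, 0.0, 1.0)) * (xs[1] - xs[0])

print(peak, left, right, narrow_peak, area)
```

Note that the function divides by `sigma`, so a feature that is constant within a class (standard deviation 0) would need special handling. That case is not covered here.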

## Naive Bayes Implementation

In [None]:
def naive_bayes(df, features, target_name):
  """Predict the class (0, 1 or 2) of one flower with Gaussian Naive Bayes.

  The feature columns must come first in df, in the same order as features.
  """
  # Use a plain array so that features[i] is always positional,
  # also when a pandas Series (one row of a DataFrame) is passed in.
  features = np.asarray(features, dtype=float)

  n_data = len(df)

  # P(Class)
  p_class_0 = len(df[df[target_name] == 0]) / n_data
  p_class_1 = len(df[df[target_name] == 1]) / n_data
  p_class_2 = len(df[df[target_name] == 2]) / n_data

  p_feature_given_class_0 = []
  p_feature_given_class_1 = []
  p_feature_given_class_2 = []

  for i in range(len(features)):
    # Calculate P(F_i | Target=0)
    p_f_given_class_0 = gaussian_pdf(features[i], np.mean(df[df[target_name] == 0].iloc[:, i].values), np.std(df[df[target_name] == 0].iloc[:, i]))
    p_feature_given_class_0.append(p_f_given_class_0)

    # Calculate P(F_i | Target=1)
    p_f_given_class_1 = gaussian_pdf(features[i], np.mean(df[df[target_name] == 1].iloc[:, i].values), np.std(df[df[target_name] == 1].iloc[:, i]))
    p_feature_given_class_1.append(p_f_given_class_1)

    # Calculate P(F_i | Target=2)
    p_f_given_class_2 = gaussian_pdf(features[i], np.mean(df[df[target_name] == 2].iloc[:, i].values), np.std(df[df[target_name] == 2].iloc[:, i]))
    p_feature_given_class_2.append(p_f_given_class_2)

  # P_k = product of the feature likelihoods times the class prior
  p0 = np.prod(p_feature_given_class_0) * p_class_0
  p1 = np.prod(p_feature_given_class_1) * p_class_1
  p2 = np.prod(p_feature_given_class_2) * p_class_2

  # The index of the largest score is the predicted class
  return np.argmax([p0, p1, p2])
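
Multiplying many small densities can underflow to zero when there are more features, so a common variant sums logarithms instead. The sketch below is one possible way to write that variant, not the notebook's method. It takes NumPy arrays rather than a DataFrame and loops over whatever classes are present. The names `gaussian_log_pdf`, `naive_bayes_log`, `X_toy` and `y_toy` are hypothetical, and the toy data is made up.

```python
import numpy as np


def gaussian_log_pdf(x, mean, sigma):
    """Logarithm of the Gaussian density, computed without exponentiating."""
    return -np.log(np.sqrt(2 * np.pi) * sigma) - ((x - mean) ** 2) / (2 * sigma ** 2)


def naive_bayes_log(X_train, y_train, features):
    """Return the label with the largest log prior plus summed log likelihoods.

    X_train: 2-D array (rows = flowers, columns = features)
    y_train: 1-D array of integer class labels
    features: 1-D array with the feature values of the flower to classify
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    features = np.asarray(features, dtype=float)

    classes = np.unique(y_train)
    log_scores = []
    for label in classes:
        rows = X_train[y_train == label]
        log_prior = np.log(len(rows) / len(X_train))
        means = rows.mean(axis=0)
        sigmas = rows.std(axis=0)  # population standard deviation (ddof=0)
        log_scores.append(log_prior + gaussian_log_pdf(features, means, sigmas).sum())
    return classes[int(np.argmax(log_scores))]


# Tiny made-up dataset with two well-separated classes and two features.
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                  [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(naive_bayes_log(X_toy, y_toy, [1.1, 2.1]))
print(naive_bayes_log(X_toy, y_toy, [5.1, 5.9]))
```

Since the logarithm is increasing, the class with the largest log score is also the class with the largest product, so the prediction rule is unchanged.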

Get the corresponding class for a flower having the following features: [4.9, 3.0, 1.4, 0.2].

In [None]:
flower_features = np.array([4.9, 3.0, 1.4, 0.2])
y_pred = naive_bayes(df, flower_features, 'Target')
print(f"The predicted class is {y_pred}")

The predicted class is 0


Now we will split our data into 2 sets:

*   One from which the Naive Bayes model will estimate the probabilities (the **old** set, 80%)
*   One that it has not seen before, used to test it (the **new** set, 20%)

In [None]:
from sklearn.model_selection import train_test_split

X = df.drop('Target', axis=1)
y = df['Target']
old_x, new_x, old_y, new_y = train_test_split(X, y, test_size=0.2)
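
The split above is shuffled randomly, so the rows in each set, and therefore the accuracy, can change from run to run. `train_test_split` accepts a `random_state` argument to fix the shuffle. To show the idea behind a seeded 80/20 split, here is a minimal NumPy sketch. The helper `split_indices` is hypothetical, and its rounding of the test size is an assumption that need not match scikit-learn in every case.

```python
import numpy as np


def split_indices(n_rows, test_size=0.2, seed=0):
    """Shuffle the row positions and cut them into (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# 150 rows is an assumption matching the usual size of the Iris dataset.
train_idx, test_idx = split_indices(150)
print(len(train_idx), len(test_idx))
```

With a DataFrame, the resulting positions could be used as `X.iloc[train_idx]` and `X.iloc[test_idx]`. Calling the helper twice with the same seed yields the same split.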

Now use the function you built and get the corresponding testing predictions, and then compute the accuracy of your model.

In [None]:
df_Train = pd.concat([old_x, old_y], axis=1)
error = 0
predictions = []
for i in range(len(new_x)):
    predictions.append(naive_bayes(df_Train, new_x.iloc[i], 'Target'))
    if predictions[i] != new_y.iloc[i]:
        error += 1

accuracy = (len(new_x) - error) * 100/ len(new_x)
print(f"The accuracy of the model is {accuracy}")


The accuracy of the model is 93.33333333333333
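
The same accuracy can be computed without an explicit error counter, and a confusion matrix shows which classes get mixed up. The sketch below uses made-up label arrays, not the notebook's actual predictions. The names `y_true`, `y_hat` and `confusion` are hypothetical.

```python
import numpy as np

# Hypothetical true labels and predictions for eight test flowers.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_hat = np.array([0, 0, 1, 2, 2, 2, 1, 1])

# Share of matching positions, expressed as a percentage.
accuracy = np.mean(y_hat == y_true) * 100

# Confusion matrix: rows are true classes, columns are predicted classes.
n_classes = 3
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (y_true, y_hat), 1)

print(accuracy)
print(confusion)
```

The diagonal of the matrix counts the correct predictions, so its sum divided by the number of test flowers gives the accuracy again. If the test set holds 30 flowers, an accuracy of 93.33 corresponds to 28 correct predictions.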
