
# Challenge of the Week: Gaussian Naive Bayes Classifier
---
© 2024, Zaka AI, Inc. All Rights Reserved.

##**Case Study:** Iris Dataset

**Objective:** The objective of this challenge is to introduce you to Naive Bayes applied to numerical values.

**DataSet Columns:**<br>
*  Petal Height
*  Petal Width
*  Sepal Height
*  Sepal Width
*  Target: The kind of the Iris flower (Virginica, Setosa, Versicolor)

# Importing Libraries

Start by importing the necessary libraries. For this problem we need the following:


*   Numpy: for numerical calculations
*   Pandas: to deal with the dataset
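*   math: to work on the mathematical aspects of Naive Bayes
*   scikit-learn (preprocessing, train_test_split): to encode the target labels and split the dataset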



In [None]:
import numpy as np
import pandas as pd
import math

from sklearn import preprocessing
from sklearn.model_selection import train_test_split

**Mounting my drive and changing the working directory**

In [None]:
# Mounting the drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Changing the working directory
%cd /content/drive/MyDrive/Colab_Notebooks/ZAKA_AIC /Module2_2

/content/drive/MyDrive/Colab_Notebooks/ZAKA_AIC /Module2_2


# Loading the Dataset

Load the dataset into your environment. Note that the dataset you have does not include names for its columns, so you should name them by hand as ['Sepal Height', 'Sepal Width', 'Petal Height', 'Petal Width', 'Target']. Then show the head of your dataset to get a better insight into it.

In [None]:
# Creating a dataframe with the required columns
iris_df = pd.read_csv('iris.csv')
iris_df.columns = ['Sepal Height', 'Sepal Width', 'Petal Height', 'Petal Width', 'Target']

iris_df.head()

Unnamed: 0,Sepal Height,Sepal Width,Petal Height,Petal Width,Target
0,5.1,3.5,1.4,0.2,Setosa
1,4.9,3.0,1.4,0.2,Setosa
2,4.7,3.2,1.3,0.2,Setosa
3,4.6,3.1,1.5,0.2,Setosa
4,5.0,3.6,1.4,0.2,Setosa


##Data Preprocessing

You may have noticed that the Target column contains string values rather than numbers. This is why you will change the string values to numerical ones.

In [None]:
# Changing the strings in the target column by encoding them
encoder = preprocessing.LabelEncoder()
iris_df['Target'] = encoder.fit_transform(iris_df['Target'])

iris_df.head()

Unnamed: 0,Sepal Height,Sepal Width,Petal Height,Petal Width,Target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
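
LabelEncoder numbers the distinct labels in sorted order, so with the three species named here the mapping would be Setosa → 0, Versicolor → 1, Virginica → 2. A minimal sketch that reproduces this with NumPy alone, using a small hypothetical list of labels rather than the full dataset:

```python
import numpy as np

# Hypothetical sample of target labels (not the full dataset)
labels = np.array(["Virginica", "Setosa", "Versicolor", "Setosa"])

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the position of each original label within that sorted array
classes, encoded = np.unique(labels, return_inverse=True)

# Map each class name to its integer code
mapping = {str(name): int(code) for code, name in enumerate(classes)}
print(mapping)
print(encoded)
```

With the fitted encoder itself, `encoder.classes_` lists the labels in the same order, and `encoder.inverse_transform` turns the integers back into names.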


Make sure there are no null values; if there are, remove them.

In [None]:
# Checking for null values through info()
iris_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Sepal Height  150 non-null    float64
 1   Sepal Width   150 non-null    float64
 2   Petal Height  150 non-null    float64
 3   Petal Width   150 non-null    float64
 4   Target        150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
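
Every column reports 150 non-null values out of 150 entries, so nothing needs to be removed here. For a dataset that did contain gaps, a minimal sketch on a hypothetical toy frame (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing measurement
toy_df = pd.DataFrame({
    "Sepal Height": [5.1, 4.9, np.nan],
    "Sepal Width": [3.5, 3.0, 3.2],
})

# Count the missing values in each column
null_counts = toy_df.isnull().sum()
print(null_counts)

# Drop every row that contains at least one missing value
clean_df = toy_df.dropna()
print(len(clean_df))
```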


#Naive Bayes

##Finding different Classes

First, find how many classes we have in our dataset (this should normally also appear in the description of your dataset).

In [None]:
# Finding the number of classes through unique()
iris_df['Target'].unique()

array([0, 1, 2])

So we have 3 classes of flowers.

Remember the basic formula that we used for Naive Bayes. <br>
<img src="https://equatio-api.texthelp.com/svg/%5C%20P(%5Ctextcolor%7B%232B7FBB%7D%7BClass%7D%7C%5Ctextcolor%7B%23E94D40%7D%7BFeatures%7D)%3D%5Cfrac%7BP(%5Ctextcolor%7B%23E94D40%7D%7BFeatures%7D%7C%5Ctextcolor%7B%232B7FBB%7D%7BClass%7D)%5Ccdot%20P%5Cleft(%5Ctextcolor%7B%232B7FBB%7D%7BClass%7D%5Cright)%7D%7BP(%5Ctextcolor%7B%23E94D40%7D%7BFeatures%7D)%7D" alt="P of open paren C l a. s s divides F of e a. t u r e s close paren equals the fraction with numerator P of open paren F of e a. t u r e s divides C l a. s s close paren times P of open paren C l a. s s close paren and denominator P of F of e a. t u r e s">

Since we have 3 classes, and 4 features, we need to calculate the following probabilities.<br>
<img src="https://equatio-api.texthelp.com/svg/P(%5Ctextcolor%7B%232B7FBB%7D%7BClass_0%7D%7C%5Ctextcolor%7B%23E94D40%7D%7BF1%2CF2%2CF3%2CF4%7D)" alt="P of open paren C l a. s s sub 0 divides F of 1 comma F of 2 comma F of 3 comma F of 4 close paren"> <br>
<img src="https://equatio-api.texthelp.com/svg/P(%5Ctextcolor%7B%232B7FBB%7D%7BClass_1%7D%7C%5Ctextcolor%7B%23E94D40%7D%7BF1%2CF2%2CF3%2CF4%7D)" alt="P of open paren C l a. s s sub 1 divides F of 1 comma F of 2 comma F of 3 comma F of 4 close paren"> <br>
<img src="https://equatio-api.texthelp.com/svg/P(%5Ctextcolor%7B%232B7FBB%7D%7BClass_2%7D%7C%5Ctextcolor%7B%23E94D40%7D%7BF1%2CF2%2CF3%2CF4%7D)" alt="P of open paren C l a. s s sub 2 divides F of 1 comma F of 2 comma F of 3 comma F of 4 close paren">


So in reality we need to calculate the following:

<img src="https://equatio-api.texthelp.com/svg/P_0%3DP(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_1%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_0%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_2%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_0%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_3%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_0%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_4%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_0%7D%7D)" alt="P sub 0 equals P of open paren F sub 1 divides C l a. s s sub 0 close paren P of open paren F sub 2 divides C l a. s s sub 0 close paren P of open paren F sub 3 divides C l a. s s sub 0 close paren P of open paren F sub 4 divides C l a. s s sub 0 close paren"><img src="https://equatio-api.texthelp.com/svg/P%5Cleft(%5Ctextcolor%7B%232B7FBB%7D%7BClass_0%7D%5Cright)" alt="P of open paren C l a. s s sub 0 close paren"><br><img src="https://equatio-api.texthelp.com/svg/P_1%3DP(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_1%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_1%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_2%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_1%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_3%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_1%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_4%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_1%7D%7D)" alt="P sub 1 equals P of open paren F sub 1 divides C l a. s s sub 1 close paren P of open paren F sub 2 divides C l a. s s sub 1 close paren P of open paren F sub 3 divides C l a. s s sub 1 close paren P of open paren F sub 4 divides C l a. 
s s sub 1 close paren"><img src="https://equatio-api.texthelp.com/svg/P%5Cleft(%5Ctextcolor%7B%232B7FBB%7D%7BClass_1%7D%5Cright)" alt="P of open paren C l a. s s sub 1 close paren"><br>
<img src="https://equatio-api.texthelp.com/svg/P_2%3DP(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_1%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_2%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_2%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_2%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_3%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_2%7D%7D)P(%5Ctextcolor%7B%232B7FBB%7D%7B%5Ctextcolor%7B%23E94D40%7D%7BF_4%7D%7D%7C%5Ctextcolor%7B%23E94D40%7D%7B%5Ctextcolor%7B%232B7FBB%7D%7BClass_2%7D%7D)P%5Cleft(%5Ctextcolor%7B%232B7FBB%7D%7BClass_2%7D%5Cright)" alt="P sub 2 equals P of open paren F sub 1 divides C l a. s s sub 2 close paren P of open paren F sub 2 divides C l a. s s sub 2 close paren P of open paren F sub 3 divides C l a. s s sub 2 close paren P of open paren F sub 4 divides C l a. s s sub 2 close paren P of open paren C l a. s s sub 2 close paren">



We then see which of the three values is the greatest, and based on that we assign the class.
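
The denominator P(Features) is the same for every class, which is why it can be left out when only the largest value matters. A small sketch with made-up scores (the numbers are purely illustrative) shows that normalizing changes the values but not the winner:

```python
import numpy as np

# Hypothetical unnormalized scores P0, P1, P2 (likelihood * prior) for one flower
scores = np.array([2.0e-3, 5.0e-5, 1.0e-7])

# Dividing by their sum plays the role of dividing by P(Features)
posteriors = scores / scores.sum()

# The predicted class is the same before and after normalization
predicted_raw = int(np.argmax(scores))
predicted_norm = int(np.argmax(posteriors))
print(predicted_raw, predicted_norm)
print(posteriors.sum())
```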

The probabilities of each feature given a class will be approximated using a distribution.
In this example, we will use the Gaussian distribution.
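
Choosing a Gaussian means that, for every class, only a mean and a standard deviation per feature have to be estimated from the data, along with the class prior. A minimal sketch on a hypothetical two-feature toy set (the array names `X` and `y` are assumptions made for this illustration):

```python
import numpy as np

# Hypothetical toy data: two features per row, with integer class labels
X = np.array([[5.0, 1.4], [5.2, 1.6], [6.8, 5.0], [7.0, 5.4]])
y = np.array([0, 0, 1, 1])

params = {}
for c in np.unique(y):
    rows = X[y == c]                    # rows belonging to class c
    params[int(c)] = {
        "prior": len(rows) / len(X),    # P(Class_c)
        "mean": rows.mean(axis=0),      # per-feature mean
        "std": rows.std(axis=0),        # per-feature standard deviation (ddof=0)
    }

print(params[0]["mean"], params[0]["std"])
print(params[1]["mean"], params[1]["std"])
```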

##Gaussian Probability Density Function

We recall that the Gaussian probability density function is given by:
<br>
<img src="https://equatio-api.texthelp.com/svg/f%5Cleft(x%5Cright)%3D%5Cfrac%7B1%7D%7B%5Csqrt%7B2%5Cpi%7D%5Ctextcolor%7B%238D44AD%7D%7B%5Csigma%7D%7D%5Cexp%5Cleft%5C%7B-%5Cfrac%7B%5Cleft(x-%5Ctextcolor%7B%233697DC%7D%7Bmean%7D%5Cright)%5E2%7D%7B2%5Ctextcolor%7B%238D44AD%7D%7B%5Csigma%7D%5E2%7D%5Cright%5C%7D" alt="f of x equals 1 over the square root of 2 pi sigma the exp of open brace negative the fraction with numerator open paren x minus m e a. n close paren squared and denominator 2 sigma squared close brace">

Write a function that computes the probability density using the formula above.

In [None]:
def Gaussian_PDF(x, mean, sigma):
  prob = np.exp(-((x - mean)**2)/(2*sigma**2))/(sigma*np.sqrt(2*math.pi))
  return prob
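
One quick way to check such a function is against a value known in closed form: for a standard normal (mean 0, sigma 1) the density at x = 0 is 1/sqrt(2π) ≈ 0.3989. The sketch below restates the formula under the lowercase name `gaussian_pdf` (chosen here only so the block stands alone) and checks it. Note that the result is a density rather than a probability, so it can exceed 1 when sigma is small.

```python
import math
import numpy as np

def gaussian_pdf(x, mean, sigma):
    # exp(-(x - mean)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
    return np.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * math.pi))

peak = gaussian_pdf(0.0, 0.0, 1.0)      # density at the mean of a standard normal
narrow = gaussian_pdf(0.0, 0.0, 0.1)    # a small sigma gives a density above 1
print(round(float(peak), 4))            # 0.3989
print(float(narrow) > 1.0)              # True
```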

##Naive Bayes Implementation

Write a naive_bayes function that receives as input the dataframe df, the features, and the target name, and returns the predicted class as output.

In [None]:
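def naive_bayes (df, features, target_name):

  # Initializing parameters
  # Convert to a NumPy array so positional indexing works for arrays and pandas Series alike
  features = np.asarray(features, dtype=float)
  n_examples = float(len(df))
  n_features = features.shape[0]

  # Rows belonging to each class, keeping only the feature columns
  n_class_0 = df[df[target_name] == 0].drop(columns=target_name)
  n_class_1 = df[df[target_name] == 1].drop(columns=target_name)
  n_class_2 = df[df[target_name] == 2].drop(columns=target_name)

  # Prior probability of each class
  p_class0 = len(n_class_0) / n_examples
  p_class1 = len(n_class_1) / n_examples
  p_class2 = len(n_class_2) / n_examples

  # Per-feature mean of each class
  mean_given_0 = n_class_0.mean(axis = 0).to_numpy()
  mean_given_1 = n_class_1.mean(axis = 0).to_numpy()
  mean_given_2 = n_class_2.mean(axis = 0).to_numpy()

  # Per-feature standard deviation of each class (ddof = 0, the np.std default)
  std_given_0 = n_class_0.std(axis = 0, ddof = 0).to_numpy()
  std_given_1 = n_class_1.std(axis = 0, ddof = 0).to_numpy()
  std_given_2 = n_class_2.std(axis = 0, ddof = 0).to_numpy()

  p_features_given_0 = []
  p_features_given_1 = []
  p_features_given_2 = []

  # Density of each feature value under each class
  for i in range(n_features):
    p_give_0 = Gaussian_PDF(features[i], mean_given_0[i], std_given_0[i])
    p_features_given_0.append(p_give_0)

    p_give_1 = Gaussian_PDF(features[i], mean_given_1[i], std_given_1[i])
    p_features_given_1.append(p_give_1)

    p_give_2 = Gaussian_PDF(features[i], mean_given_2[i], std_given_2[i])
    p_features_given_2.append(p_give_2)

  # Product of the per-feature densities times the class prior
  p0 = np.prod(p_features_given_0)*p_class0
  p1 = np.prod(p_features_given_1)*p_class1
  p2 = np.prod(p_features_given_2)*p_class2

  # The class with the greatest value is the prediction
  return np.argmax([p0, p1, p2])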


Test Naive Bayes with a prediction.

Get the corresponding class for a flower having the following features: [4.9, 3.0, 1.4, 0.2].

In [None]:
# Defining the features for the flower
flower_features = np.array([4.9, 3.0, 1.4, 0.2])

# Calling the naive_bayes function
predicted_class = naive_bayes(iris_df, flower_features, 'Target')

print("The flower is of class:", predicted_class)

The flower is of class: 0
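
Multiplying several densities together can underflow toward zero when there are many features or when the values lie far from a class mean. A common variant, which is not the approach used in the function above, sums logarithms instead; the argmax is unchanged because the logarithm is monotonic. A minimal sketch, assuming the per-class priors, means and standard deviations are already available as arrays (the helper names and the numbers below are illustrative, not fitted values):

```python
import numpy as np

def log_gaussian_pdf(x, mean, sigma):
    # Logarithm of the Gaussian density, computed directly to avoid underflow
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - ((x - mean) ** 2) / (2 * sigma ** 2)

def predict_log(features, priors, means, stds):
    # features: shape (n_features,); priors: shape (n_classes,)
    # means, stds: shape (n_classes, n_features)
    log_likelihood = log_gaussian_pdf(features, means, stds).sum(axis=1)
    log_scores = np.log(priors) + log_likelihood
    return int(np.argmax(log_scores))

# Hypothetical parameters for 2 classes and 2 features
priors = np.array([0.5, 0.5])
means = np.array([[5.0, 1.5], [6.5, 4.5]])
stds = np.array([[0.3, 0.2], [0.5, 0.5]])

print(predict_log(np.array([4.9, 1.4]), priors, means, stds))
print(predict_log(np.array([6.7, 4.8]), priors, means, stds))
```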



See the performance of our NB model

Now we will split our data into 2 sets:

*   One from which the Naive Bayes model will estimate the probabilities (the **old** set): 80%
*   One that it hasn't seen before, to test on (the **new** set): 20%

In [None]:
x_old, x_new, y_old, y_new = train_test_split(iris_df.iloc[:,:-1],iris_df.iloc[:,-1], test_size=0.2)
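
Because the split is random, the accuracy measured on the new set will vary from run to run; train_test_split accepts a `random_state` argument to fix the shuffle. What such a split does can be sketched with NumPy alone (the helper name `simple_split` and the seed are assumptions made for this illustration, and the rounding of the test size may differ from scikit-learn's):

```python
import numpy as np

def simple_split(n_rows, test_size=0.2, seed=0):
    # Shuffle the row indices reproducibly, then keep the last part for testing
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_size))
    n_train = n_rows - n_test
    return order[:n_train], order[n_train:]

old_idx, new_idx = simple_split(150, test_size=0.2, seed=0)
print(len(old_idx), len(new_idx))   # 120 30
```

The two index arrays never overlap, so no flower used to estimate the probabilities is also used to test them.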

Now use the function you built and get the corresponding testing predictions, and then compute the accuracy of your model.

In [None]:
old_dataframe = pd.concat([x_old, y_old], axis=1)

errors = 0
predictions = []

for i in range(len(x_new)):
  predicted_class = naive_bayes(old_dataframe, x_new.iloc[i], 'Target')
  predictions.append(predicted_class)
  if predicted_class != y_new.iloc[i]:
    errors += 1

accuracy = (len(x_new) - errors) / len(x_new)
print("Accuracy:", accuracy)

Accuracy: 0.9333333333333333
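
The same accuracy can be computed in one line by comparing the predictions with the true labels, and a confusion matrix shows which classes get mixed up. A minimal sketch with hypothetical labels (not the results of the run above):

```python
import numpy as np

# Hypothetical true labels and predictions for six flowers
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])

# Fraction of predictions that match the true labels
accuracy = float(np.mean(y_true == y_pred))

# confusion[i, j] counts flowers of true class i predicted as class j
n_classes = 3
confusion = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    confusion[t, p] += 1

print(accuracy)
print(confusion)
```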


