#### Case Study: Iris Dataset
Dataset columns:

* Petal Height
* Petal Width
* Sepal Height
* Sepal Width
* Target: the species of Iris flower (Setosa, Versicolor, Virginica)

**Importing Libraries**

```python
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
```

**Loading Dataset**

```python
df = pd.read_csv('iris.csv')
df.columns = ['Petal Height', 'Petal Width', 'Sepal Height', 'Sepal Width', 'Target']
df.head()
```

|   | Petal Height | Petal Width | Sepal Height | Sepal Width | Target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |
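If `iris.csv` is not at hand, scikit-learn bundles the same 150 measurements. A minimal sketch, assuming scikit-learn is installed: `load_iris(as_frame=True)` returns the data as a DataFrame whose columns are named `sepal length (cm)`, `sepal width (cm)`, `petal length (cm)`, `petal width (cm)` and `target`, with the target already encoded as 0, 1, 2 (setosa, versicolor, virginica). The `species` column added below is only for readability.

```python
from sklearn.datasets import load_iris

# as_frame=True returns a Bunch whose .frame attribute is a pandas DataFrame
iris = load_iris(as_frame=True)
iris_df = iris.frame  # 4 feature columns + integer 'target' column

# target_names maps the integer codes 0, 1, 2 to species names
iris_df["species"] = iris.target_names[iris_df["target"].to_numpy()]
print(iris_df.head())
```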


**Data Preprocessing**

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
df['Target'] = encoder.fit_transform(df['Target'])
df.head()
```

|   | Petal Height | Petal Width | Sepal Height | Sepal Width | Target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
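`LabelEncoder` assigns codes in sorted (alphabetical) order of the labels, so here Setosa becomes 0, Versicolor 1 and Virginica 2. The fitted encoder stores that mapping in `classes_`, and `inverse_transform` turns predicted codes back into names. A small self-contained check with a few sample labels:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["Virginica", "Setosa", "Versicolor", "Setosa"])

print(codes)             # [2 0 1 0]
print(encoder.classes_)  # ['Setosa' 'Versicolor' 'Virginica']
# map integer predictions back to species names
print(encoder.inverse_transform([0, 2]))  # ['Setosa' 'Virginica']
```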


```python
df.isnull().any()
```

```
Petal Height    False
Petal Width     False
Sepal Height    False
Sepal Width     False
Target          False
dtype: bool
```

Finding the different classes

```python
targets = df['Target'].value_counts()
targets
```

```
0    50
1    50
2    50
Name: Target, dtype: int64
```
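These counts give the class priors directly: $P(\text{Class}_k) = n_k / n = 50/150 = 1/3$ for each class, which is what `naive_bayes` computes as `p_class_0`, `p_class_1` and `p_class_2`. In pandas, `value_counts(normalize=True)` returns the same fractions; the toy Series below only reproduces the 50/50/50 counts shown above.

```python
import pandas as pd

# toy target column with the same 50/50/50 class balance as the Iris data
target = pd.Series([0] * 50 + [1] * 50 + [2] * 50, name="Target")

# normalize=True divides each count by the total number of rows
priors = target.value_counts(normalize=True).sort_index()
print(priors)  # each class has prior 50/150, about 0.3333
```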

The basic formula for Naive Bayes is:

$$P(\text{Class} \mid \text{Features}) = \frac{P(\text{Features} \mid \text{Class})\,P(\text{Class})}{P(\text{Features})}$$

Since we have 3 classes and 4 features, we need to compute the posterior probability of each class given the four feature values:

$$P(\text{Class}_0 \mid F_1, F_2, F_3, F_4)$$

$$P(\text{Class}_1 \mid F_1, F_2, F_3, F_4)$$

$$P(\text{Class}_2 \mid F_1, F_2, F_3, F_4)$$


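Assuming the features are conditionally independent given the class (the "naive" assumption), the likelihood factorizes into a product of per-feature terms. The denominator $P(F_1, F_2, F_3, F_4)$ is the same for every class, so it can be dropped when we only need the most probable class. That leaves three scores to compare: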

$$
\begin{aligned}
P_0 &= P(F_1 \mid \text{Class}_0)\,P(F_2 \mid \text{Class}_0)\,P(F_3 \mid \text{Class}_0)\,P(F_4 \mid \text{Class}_0)\,P(\text{Class}_0)\\
P_1 &= P(F_1 \mid \text{Class}_1)\,P(F_2 \mid \text{Class}_1)\,P(F_3 \mid \text{Class}_1)\,P(F_4 \mid \text{Class}_1)\,P(\text{Class}_1)\\
P_2 &= P(F_1 \mid \text{Class}_2)\,P(F_2 \mid \text{Class}_2)\,P(F_3 \mid \text{Class}_2)\,P(F_4 \mid \text{Class}_2)\,P(\text{Class}_2)
\end{aligned}
$$
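In code, the predicted class is the `argmax` of the three scores. Two practical details are worth seeing concretely: because $P(F_1, F_2, F_3, F_4)$ equals the sum of the three unnormalized scores, dividing each score by that sum recovers posteriors that add up to 1; and because products of many small densities can underflow, implementations often compare sums of logarithms instead, which preserves the argmax since log is increasing. A sketch with made-up likelihood values (purely illustrative, not computed from Iris):

```python
import numpy as np

# hypothetical likelihood (density) values P(F_i | Class_k): rows = classes, columns = features
likelihoods = np.array([
    [0.9, 0.8, 1.2, 2.0],
    [0.1, 0.5, 0.3, 0.4],
    [0.05, 0.4, 0.1, 0.2],
])
priors = np.array([1 / 3, 1 / 3, 1 / 3])

# unnormalized scores P_k = prod_i P(F_i | Class_k) * P(Class_k)
scores = likelihoods.prod(axis=1) * priors
# same ranking in log space: sum_i log P(F_i | Class_k) + log P(Class_k)
log_scores = np.log(likelihoods).sum(axis=1) + np.log(priors)

# dividing by the evidence (the sum of the scores) gives posteriors that sum to 1
posteriors = scores / scores.sum()

print(np.argmax(scores), np.argmax(log_scores))  # 0 0
print(posteriors)
```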



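Each likelihood $P(F_i \mid \text{Class}_k)$ is approximated with a probability distribution. In this example we use the Gaussian distribution, whose mean and standard deviation are estimated from the values of feature $F_i$ among the training examples of $\text{Class}_k$.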

*Gaussian Probability Density Function*

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x - \text{mean})^2}{2\sigma^2}\right\}$$

```python
def probability(x, mean, sigma):
  # Gaussian probability density of x given the mean and standard deviation sigma
  prob = np.exp(-((x - mean)**2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * math.pi))
  return prob
```
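A quick sanity check for `probability`: at x = mean the exponent is 0, so the density equals $1/(\sigma\sqrt{2\pi})$, about 0.3989 for σ = 1. Python's standard library also provides `statistics.NormalDist(mu, sigma).pdf(x)`, which evaluates the same formula and can serve as a reference. The function is repeated so the snippet runs on its own, and the test points are arbitrary values chosen for illustration.

```python
import math
from statistics import NormalDist

import numpy as np

def probability(x, mean, sigma):
    # Gaussian probability density of x for the given mean and standard deviation
    return np.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * math.pi))

# at the mean the density is 1 / (sigma * sqrt(2*pi))
print(probability(0.0, 0.0, 1.0))  # about 0.3989

# compare with the standard-library implementation at a few illustrative points
points = [(5.1, 5.0, 0.35), (1.4, 1.46, 0.17), (0.2, 1.3, 0.2)]
for x, mean, sigma in points:
    ours = probability(x, mean, sigma)
    ref = NormalDist(mean, sigma).pdf(x)
    print(f"x={x}: {ours:.6g} vs {ref:.6g}")
```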

**Naive Bayes Implementation from scratch**

```python
def naive_bayes(df, features, target_name):
  # Predict the class (0, 1 or 2) of a single example with Gaussian Naive Bayes.
  # Convert to a NumPy array so features[i] is positional even when a pandas Series is passed
  features = np.asarray(features, dtype=float)

  n_examples = float(len(df))
  n_features = features.shape[0]

  # feature matrix (without the target column) and class labels
  X = df.drop(columns=target_name).to_numpy(dtype=float)
  y = df[target_name].to_numpy()

  # separate classes
  class_0 = X[y == 0]
  class_1 = X[y == 1]
  class_2 = X[y == 2]

  # prior probability of each class
  p_class_0 = len(class_0) / n_examples
  p_class_1 = len(class_1) / n_examples
  p_class_2 = len(class_2) / n_examples

  # std (population, ddof=0) and mean of each feature given each class
  std_given_0 = np.std(class_0, axis=0)
  std_given_1 = np.std(class_1, axis=0)
  std_given_2 = np.std(class_2, axis=0)

  mean_given_0 = np.mean(class_0, axis=0)
  mean_given_1 = np.mean(class_1, axis=0)
  mean_given_2 = np.mean(class_2, axis=0)

  # likelihood of each feature value given each class
  p_f_given_0 = []
  p_f_given_1 = []
  p_f_given_2 = []

  for i in range(n_features):
    p_f_given_0.append(probability(features[i], mean_given_0[i], std_given_0[i]))
    p_f_given_1.append(probability(features[i], mean_given_1[i], std_given_1[i]))
    p_f_given_2.append(probability(features[i], mean_given_2[i], std_given_2[i]))

  # unnormalized posteriors: product of the likelihoods times the prior
  p0 = np.prod(p_f_given_0) * p_class_0
  p1 = np.prod(p_f_given_1) * p_class_1
  p2 = np.prod(p_f_given_2) * p_class_2

  # index of the most probable class
  return np.argmax([p0, p1, p2])
```
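The function above re-estimates the class statistics on every call and hard-codes three classes. Below is a hedged sketch of the same Gaussian Naive Bayes split into a fit step and a predict step, vectorized with NumPy and working in log space. The names `fit_gaussian_nb` and `predict_gaussian_nb` are hypothetical helpers, not part of any library. Like the code above, it uses the population standard deviation (`np.std`'s default `ddof=0`) and assumes every per-class standard deviation is non-zero.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, means and standard deviations (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted class labels
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    stds = np.array([X[y == c].std(axis=0) for c in classes])
    return classes, priors, means, stds

def predict_gaussian_nb(model, X):
    """Return the most probable class for each row of X (hypothetical helper)."""
    classes, priors, means, stds = model
    X = np.atleast_2d(np.asarray(X, dtype=float))
    # log Gaussian density for every sample, class and feature:
    # shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * np.log(2 * np.pi * stds ** 2) - diff ** 2 / (2 * stds ** 2)
    # sum over features (naive independence) and add the log prior
    log_scores = log_pdf.sum(axis=2) + np.log(priors)
    return classes[np.argmax(log_scores, axis=1)]

# usage on a tiny synthetic data set with two well-separated classes
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(model, [[1.1, 2.0], [5.1, 6.1]]))  # [0 1]
```

Applied to the Iris features, `fit_gaussian_nb` would be called once on the training rows and `predict_gaussian_nb` once on all test rows, instead of refitting for every prediction.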

Testing Naive Bayes on a single example (the second row of the dataset)

```python
features = np.array([4.9, 3.0, 1.4, 0.2])
result = naive_bayes(df, features, 'Target')
print("This Iris flower is in the class ", result)
```

```
This Iris flower is in the class  0
```


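Let's evaluate the performance of our Naive Bayes model: we hold out 20% of the rows as a test set, estimate the statistics on the remaining 80%, and count misclassifications. Because the split is random, the exact accuracy varies between runs.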

```python
from sklearn.model_selection import train_test_split

# 80% training ("old") rows and 20% test ("new") rows
X_old, X_new, y_old, y_new = train_test_split(df.iloc[:, :-1], df.iloc[:, -1], test_size=0.2)
```

```python
# training rows together with their labels
old_dataframe = pd.concat([X_old, y_old], axis=1)

errors = 0
for i in range(len(X_new)):
  # classify each test row using statistics estimated on the training rows only
  res = naive_bayes(old_dataframe, X_new.iloc[i, :], 'Target')
  if res != y_new.iloc[i]:
    errors += 1

accuracy = (len(X_new) - errors) * 100 / len(X_new)
print(f"Accuracy of our Naive Bayes is: {accuracy} %")
```

```
Accuracy of our Naive Bayes is: 93.33333333333333 %
```
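As a cross-check, scikit-learn's `GaussianNB` implements the same model. The sketch below assumes scikit-learn is installed and loads its bundled copy of Iris instead of `iris.csv`. It fixes `random_state` (the value 0 is arbitrary) so the split is reproducible, and uses `stratify` so each class keeps its one-third share in both parts. `GaussianNB` adds a small variance-smoothing term (`var_smoothing`) to each variance, so its predictions can occasionally differ from the from-scratch version.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# fixed seed for a reproducible split; stratify keeps the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = GaussianNB().fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(f"scikit-learn GaussianNB accuracy: {accuracy_score(y_test, y_pred) * 100:.2f} %")
```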
