# Naive Bayes Algorithm

Naive Bayes is a family of classification algorithms based on Bayes' theorem. It is called naive because it assumes that the features in a dataset are conditionally independent of each other given the class. This assumption rarely holds in real data, but it simplifies the computation considerably and often gives good results in practice.

## Bayes Theorem

Bayes' theorem is a formula for computing a conditional probability from its inverse. It states:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

where $A$ and $B$ are events and $P(B) \neq 0$.
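
As a small worked example of the formula above, the sketch below computes a posterior for a diagnostic test. All numbers (prevalence, sensitivity, false positive rate) are hypothetical and chosen only for illustration; $A$ is "has the disease" and $B$ is "tests positive".

```python
# Hypothetical numbers, chosen only to illustrate Bayes' theorem.
p_disease = 0.01            # P(A): prior probability of the disease
p_pos_given_disease = 0.95  # P(B|A): probability of a positive test if diseased
p_pos_given_healthy = 0.05  # P(B|not A): false positive rate

# P(B) by the law of total probability.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
posterior = p_pos_given_disease * p_disease / p_pos

print(f"P(disease | positive) = {posterior:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the posterior stays low here because the prior $P(A)$ is small.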

## Naive Bayes Algorithm

The Naive Bayes algorithm applies Bayes' theorem to a class variable and a set of features:

$$P(y|x_1,x_2,...,x_n) = \frac{P(x_1,x_2,...,x_n|y)P(y)}{P(x_1,x_2,...,x_n)}$$

where $y$ is the class variable and $x_1, x_2, ..., x_n$ are the features.

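The algorithm assumes that the features are conditionally independent of each other given the class, so the joint likelihood factorizes into a product of per-feature likelihoods. The above equation can then be written as: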

$$P(y|x_1,x_2,...,x_n) = \frac{P(x_1|y)P(x_2|y)...P(x_n|y)P(y)}{P(x_1,x_2,...,x_n)}$$

The denominator does not depend on the class, so for a given input it is the same constant for every class. The equation can therefore be written as:

$$P(y|x_1,x_2,...,x_n) \propto P(x_1|y)P(x_2|y)...P(x_n|y)P(y)$$

The algorithm outputs the class that maximizes this expression, that is, the class with the highest posterior probability.
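
The decision rule above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements it: it assumes each $P(x_i|y)$ is a Gaussian (mirroring the idea behind `GaussianNB`), the helper names `fit_gaussian_nb`, `log_gaussian` and `predict` are hypothetical, and the toy data is made up. Log-probabilities are summed instead of multiplying probabilities, which avoids numerical underflow when there are many features.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant added so a variance is never exactly zero (assumption).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log of the Gaussian density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, row):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
    return max(scores, key=scores.get)

# Hypothetical toy data: two features, two well-separated classes.
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
y = ["a", "a", "a", "b", "b", "b"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 1.9]))  # prints a
print(predict(model, [3.1, 0.5]))  # prints b
```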

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# ignore warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
# load the dataset
df = sns.load_dataset('iris')
print(df.head())
X = df.drop('species', axis=1)
y = df['species']

# train test split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [4]:
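# initialize the model
# GaussianNB models each feature as a Gaussian within each class,
# which suits continuous measurements such as the iris features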
gnb = GaussianNB()

# train the model
gnb.fit(X_train, y_train)

# predict the test data
y_pred = gnb.predict(X_test)

# evaluate the model
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))

Accuracy Score:  1.0
Confusion Matrix: 
 [[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
Classification Report: 
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       1.00      1.00      1.00        11
   virginica       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38



In [5]:
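# initialize the model
# note: MultinomialNB is designed for count-like features (e.g. word counts);
# it runs here only because the iris measurements are non-negative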
mnb = MultinomialNB()

# train the model
mnb.fit(X_train, y_train)

# predict the test data
y_pred = mnb.predict(X_test)

# evaluate the model
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))

Accuracy Score:  0.9736842105263158
Confusion Matrix: 
 [[15  0  0]
 [ 0 11  0]
 [ 0  1 11]]
Classification Report: 
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.92      1.00      0.96        11
   virginica       1.00      0.92      0.96        12

    accuracy                           0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38



In [6]:
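# initialize the model
# note: BernoulliNB expects binary features; with the default binarize=0.0
# every value > 0 becomes 1, so all (positive) iris measurements collapse to 1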
bnb = BernoulliNB()

# train the model
bnb.fit(X_train, y_train)

# predict the test data
y_pred = bnb.predict(X_test)

# evaluate the model
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))

Accuracy Score:  0.2894736842105263
Confusion Matrix: 
 [[ 0 15  0]
 [ 0 11  0]
 [ 0 12  0]]
Classification Report: 
               precision    recall  f1-score   support

      setosa       0.00      0.00      0.00        15
  versicolor       0.29      1.00      0.45        11
   virginica       0.00      0.00      0.00        12

    accuracy                           0.29        38
   macro avg       0.10      0.33      0.15        38
weighted avg       0.08      0.29      0.13        38
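
The poor `BernoulliNB` score above is most likely an effect of binarization rather than of the data. The sketch below is a hedged, plain-Python illustration of that reasoning, not the scikit-learn code path itself. It assumes the default settings `binarize=0.0` and `alpha=1.0`, uses the five rows printed by `df.head()` earlier, and derives the training class counts (35, 39, 38) from the 50 samples per class in iris minus the test supports shown in the reports. The `binarize` and `log_score` helpers are hypothetical names.

```python
import math

# First five iris rows, copied from the df.head() output above.
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
    [4.6, 3.1, 1.5, 0.2],
    [5.0, 3.6, 1.4, 0.2],
]

def binarize(row, threshold=0.0):
    """Map values above the threshold to 1 and everything else to 0."""
    return [1 if value > threshold else 0 for value in row]

binarized = [binarize(row) for row in rows]
print(binarized[0])  # prints [1, 1, 1, 1]

# Training counts per class: 50 per class minus the test supports (15, 11, 12).
train_counts = {"setosa": 35, "versicolor": 39, "virginica": 38}
total = sum(train_counts.values())
alpha = 1.0      # assumed Laplace smoothing
n_features = 4

def log_score(n_class):
    """Log prior plus log-likelihood of an all-ones feature vector."""
    log_prior = math.log(n_class / total)
    # Every training sample of the class has the feature equal to 1.
    log_likelihood = n_features * math.log((n_class + alpha) / (n_class + 2 * alpha))
    return log_prior + log_likelihood

scores = {label: log_score(n) for label, n in train_counts.items()}
predicted = max(scores, key=scores.get)
print(predicted)  # prints versicolor
```

Because every sample becomes the same all-ones vector, only the class frequencies matter, and the most frequent training class is predicted for every test row. This matches the confusion matrix above, where all 38 test samples are assigned to versicolor.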



---