# Naive Bayes Algorithm

Naive Bayes is a classification technique based on Bayes' theorem with an assumption of independence among predictors. A Naive Bayes classifier assumes that, given the class, the presence of a particular feature is unrelated to the presence of any other feature. For example, when classifying a fruit by its color, shape, and size, the classifier treats each of these features as contributing independently to the probability that the fruit is an apple, an orange, or a banana, even if the features actually depend on each other.
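In symbols, the independence assumption can be written as follows. The notation is introduced here for illustration and is not taken from the text above: $y$ stands for the class and $x_1, \dots, x_n$ for the features.

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The denominator is the same for every class, so the classifier can predict the class with the largest numerator: $\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$.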

The Naive Bayes algorithm is simple, fast to train, and often effective, which makes it a reasonable first method to try on a classification problem.

## Types of Naive Bayes Algorithm

Three commonly used variants of the Naive Bayes algorithm are:

1. **Multinomial Naive Bayes**: It is used when the features are discrete counts, such as the number of times each word occurs in a document. It is primarily used for document classification and is one of the most widely used algorithms for text classification.

2. **Bernoulli Naive Bayes**: It is used when the features are binary. For example, when working with text data, Bernoulli Naive Bayes is useful when each feature records whether a word appears in a document, not how often.

3. **Gaussian Naive Bayes**: It is used when the features are continuous. It assumes that, within each class, every feature follows a normal distribution.
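As a rough illustration of how the variants map to feature types, the sketch below fits each scikit-learn estimator on a tiny hand-made dataset. The arrays `X_counts`, `X_binary`, and `X_cont` are hypothetical toy data invented for this example.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data: 4 samples, 2 classes.
y = np.array([0, 0, 1, 1])

# Count features (e.g. word counts per document) suit MultinomialNB.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Binary features (e.g. word present or absent) suit BernoulliNB.
X_binary = (X_counts > 0).astype(int)

# Continuous features (e.g. physical measurements) suit GaussianNB.
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 0.2], [4.1, 0.1]])

# Fit each variant on its matching feature type and predict the training rows.
predictions = {
    "multinomial": MultinomialNB().fit(X_counts, y).predict(X_counts),
    "bernoulli": BernoulliNB().fit(X_binary, y).predict(X_binary),
    "gaussian": GaussianNB().fit(X_cont, y).predict(X_cont),
}

for name, pred in predictions.items():
    print(name, pred)
```

All three estimators share the same `fit` and `predict` interface, so switching variants is mostly a matter of matching the estimator to the feature representation.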

### Assumptions

The assumptions made by Naive Bayes are:

1. The features are conditionally independent of each other given the class.
2. The features are equally important, so each one contributes to the outcome without extra weighting.
3. The features follow the distribution expected by the chosen variant: continuous and normally distributed for Gaussian, counts for Multinomial, and binary for Bernoulli.

## Steps to Implement Naive Bayes Algorithm

1. **Import the Libraries**: Import the necessary libraries.
2. **Load the Data**: Load the dataset on which you want to train the model.
3. **Data Preprocessing**: Preprocess the data by handling missing values, encoding categorical data, and splitting the data into training and testing sets.
4. **Train the Model**: Train the Naive Bayes model on the training set.
5. **Predict the Test Set**: Predict the test set results.
6. **Evaluate the Model**: Evaluate the model on the test set using metrics such as accuracy, a confusion matrix, and a classification report.
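The iris example in this notebook has no missing values or string labels, so step 3 is not exercised there. A minimal sketch of that step, using a hypothetical toy DataFrame whose column names and values are invented for illustration, might look like this:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data with missing values and a string label.
df = pd.DataFrame(
    {
        "length": [5.1, 4.9, np.nan, 6.3, 5.8, 6.1],
        "width": [3.5, 3.0, 3.2, 2.9, np.nan, 2.8],
        "label": ["a", "a", "a", "b", "b", "b"],
    }
)

# Encode the string labels as integers.
y = LabelEncoder().fit_transform(df["label"])

# Split first, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    df[["length", "width"]], y, test_size=2, random_state=0, stratify=y
)

# Fill missing values with column means learned from the training set only.
imputer = SimpleImputer(strategy="mean")
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

print(X_train.shape, X_test.shape)
```

Fitting the imputer on the training set alone keeps information from the test set out of the preprocessing step.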

### Summary

In summary, the Naive Bayes algorithm is a simple and effective algorithm for classification problems. It is based on Bayes' theorem with an assumption of independence among predictors. Three common variants are Multinomial Naive Bayes, Bernoulli Naive Bayes, and Gaussian Naive Bayes, which suit count, binary, and continuous features respectively.

In [2]:
# Import libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

In [3]:
# dataset
iris = load_iris()

# Splitting the dataset into X and y
X = iris.data
y = iris.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# model
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

# Evaluation metrics
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))

Accuracy Score:  0.9777777777777777
Confusion Matrix: 
 [[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
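The split above leaves only 45 test samples, so a single misclassified sample moves the accuracy by about 2 percentage points. One way to get a steadier estimate, sketched here as an addition to the original notebook, is k-fold cross-validation. The choice of 5 folds is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold is held out once for scoring
# while the model is trained on the remaining four folds.
scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```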



In [4]:
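# Model. MultinomialNB is designed for count features; it runs here only
# because the iris measurements are non-negative.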
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred = mnb.predict(X_test)

# Evaluation metrics
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))

Accuracy Score:  0.9555555555555556
Confusion Matrix: 
 [[19  0  0]
 [ 0 12  1]
 [ 0  1 12]]
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.92      0.92      0.92        13
           2       0.92      0.92      0.92        13

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
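Multinomial Naive Bayes is more typically applied to word counts than to measurements like these. The sketch below shows that use on a hypothetical four-document corpus; the documents and the `spam`/`ham` labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each document into a vector of word counts,
# which is the feature type MultinomialNB expects.
text_model = make_pipeline(CountVectorizer(), MultinomialNB())
text_model.fit(docs, labels)

text_pred = text_model.predict(["buy cheap pills", "project meeting on monday"])
print(text_pred)  # ['spam' 'ham']
```

Words that never appeared in the training corpus, such as "on" here, are ignored by the vectorizer at prediction time.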



In [5]:
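# Model. BernoulliNB binarizes each feature at 0.0 by default (binarize=0.0).
# Every iris measurement is positive, so all features become 1 and the model
# predicts the same class for every sample.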
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred = bnb.predict(X_test)

# Evaluation metrics
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))

Accuracy Score:  0.28888888888888886
Confusion Matrix: 
 [[ 0 19  0]
 [ 0 13  0]
 [ 0 13  0]]
Classification Report: 
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        19
           1       0.29      1.00      0.45        13
           2       0.00      0.00      0.00        13

    accuracy                           0.29        45
   macro avg       0.10      0.33      0.15        45
weighted avg       0.08      0.29      0.13        45
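The low score comes from the default `binarize=0.0` threshold: every iris measurement is positive, so each feature is mapped to 1 and the model has nothing to distinguish the classes with. One possible workaround, sketched here under the assumption that the training-set mean of each feature is a sensible cut point, is to binarize the data manually and pass `binarize=None`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# With the default threshold of 0.0, every feature value becomes 1.
all_ones = bool((X_train > 0.0).all())
default_pred = BernoulliNB().fit(X_train, y_train).predict(X_test)
print("All features binarize to 1:", all_ones)
print("Classes predicted by default model:", np.unique(default_pred))

# Assumed alternative: threshold each feature at its training-set mean.
thresholds = X_train.mean(axis=0)
X_train_bin = (X_train > thresholds).astype(int)
X_test_bin = (X_test > thresholds).astype(int)

# binarize=None tells BernoulliNB that the input is already binary.
bnb_bin = BernoulliNB(binarize=None).fit(X_train_bin, y_train)
acc_bin = bnb_bin.score(X_test_bin, y_test)
print("Accuracy with per-feature thresholds:", acc_bin)
```

Even with better thresholds, reducing each measurement to a single bit discards information, so Gaussian Naive Bayes remains the more natural fit for this dataset.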



