**Naive Bayes Algorithm**\
Naive Bayes is a classification algorithm based on Bayes' theorem. It is called naive because it assumes that the features in a dataset are independent of each other given the class. This assumption rarely holds in real-world data, but it simplifies the computation and often gives good results in practice.\
Naive Bayes is used in text classification, spam filtering, sentiment analysis, and recommendation systems.\
Naive Bayes is a probabilistic classifier: it estimates the prior probability of each category and the conditional probability of each feature given each category, then combines them to score every category for a given input. The category with the highest resulting probability is the output of the model.
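
The decision rule described above can be written compactly. As a sketch, with $c$ ranging over the categories and $x_1, \dots, x_n$ denoting the feature values of one input (this notation is introduced here for illustration):

$$\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)$$

The denominator of Bayes' theorem is the same for every category, so it can be dropped when only the most probable category is needed.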

**Bayes' Theorem**\
Bayes' theorem gives the probability of an event given that another event has already occurred. It is stated mathematically as:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Where
- P(A|B) is the probability of hypothesis A given that data B is true.
- P(B|A) is the probability of data B given that hypothesis A is true.
- P(A) and P(B) are the marginal probabilities of A and B, each considered on its own without reference to the other.

**Example of Bayes' Theorem**\
Imagine you are a teacher with a class of students and you know the following:
- 60% of the students own a bicycle.
- Of the students who own a bicycle, 30% bring a bicycle to school.
- Of the students who do not own a bicycle, 10% bring a bicycle to school (maybe they borrow one).

Now, if you see a student on a bicycle at school, what is the probability that the student owns a bicycle?
- A is the event that a student owns a bicycle.
- B is the event that a student brings a bicycle to school.
- P(A) = 0.6 (probability that a student owns a bicycle)
- P(A') = 1 - P(A) = 0.4 (probability that a student does not own a bicycle)
- P(B|A) = 0.3 (probability that a student brings a bicycle to school given that the student owns one)
- P(B|A') = 0.1 (probability that a student brings a bicycle to school given that the student does not own one)

Applying Bayes' theorem:
- P(A|B) = P(B|A) * P(A) / P(B)
- By the law of total probability, P(B) = P(B|A) * P(A) + P(B|A') * P(A') = 0.3 * 0.6 + 0.1 * 0.4 = 0.22
- P(A|B) = 0.3 * 0.6 / 0.22 = 0.18 / 0.22 ≈ 0.82
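
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the variable names are chosen here for illustration and mirror the events A and B from the example.

```python
# Worked bicycle example: posterior P(A|B) via Bayes' theorem.
p_a = 0.6              # P(A): student owns a bicycle
p_b_given_a = 0.3      # P(B|A): brings a bicycle, given ownership
p_b_given_not_a = 0.1  # P(B|A'): brings a bicycle, given no ownership

p_not_a = 1 - p_a      # P(A')

# Law of total probability: P(B) = P(B|A) P(A) + P(B|A') P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B)   = {p_b:.2f}")
print(f"P(A|B) = {p_a_given_b:.2f}")
```

Seeing a student on a bicycle raises the probability of ownership from the prior of 0.6 to roughly 0.82.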


# Types of NB Classifier
- **Multinomial Naive Bayes**: Used when the features are discrete counts. For example, in text classification, the features are the frequencies of words in the document.
- **Bernoulli Naive Bayes**: Used when the features are binary. For example, in text classification, the features are the presence or absence of each word in the document.
- **Gaussian Naive Bayes**: Used when the features are continuous; it assumes that each feature follows a normal (Gaussian) distribution within each class. For example, in a dataset of houses, the features might be the area, the number of bedrooms, and the price of the house, treated as continuous values.
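
To make the Gaussian variant concrete, here is a minimal from-scratch sketch using only the standard library. It is not how scikit-learn implements `GaussianNB`; the function names, the toy data, and the "small"/"large" labels are all invented for illustration, and the tiny constant added to the variance is an assumption of this sketch to avoid division by zero.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant keeps the variance strictly positive (assumption of this sketch).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior for one sample x."""
    def log_posterior(prior, means, variances):
        # Sum of per-feature Gaussian log-densities (the naive independence assumption).
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_lik
    return max(model, key=lambda label: log_posterior(*model[label]))

# Toy data (made up): [area in square metres, number of bedrooms]
X = [[50, 1], [60, 2], [55, 1], [120, 4], [130, 3], [110, 4]]
y = ["small", "small", "small", "large", "large", "large"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [58, 2]))
print(predict_gaussian_nb(model, [125, 3]))
```

Working with log-probabilities instead of multiplying raw probabilities avoids numerical underflow when there are many features.
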
# Applications of Naive Bayes Algorithm
- Email spam detection
- Sentiment analysis
- Document categorization
- Medical diagnosis
# Advantages
- It is simple and easy to implement.
- It often gives good results in practice, even though the independence assumption rarely holds exactly.
- It is computationally fast, both to train and to predict the class of the test data.
- It can be used for binary and multiclass classification problems.
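# Limitations
- It assumes that the features are independent of each other given the class, which is rarely true in real-world data.
- Zero-frequency problem: if a feature value (for example, a category or a word) appears in the test data but was never seen with a given class in the training data, the model assigns it a probability of 0, which zeroes out the entire product for that class. This is commonly handled with additive (Laplace) smoothing.
- It can be sensitive to irrelevant features.
- It does not perform well when features are highly correlated, because correlated evidence is effectively counted more than once.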
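
The zero-probability limitation and its usual remedy can be sketched in a few lines. The helper `word_likelihood`, the word counts, and the vocabulary below are hypothetical and exist only for illustration; the smoothing parameter is named `alpha` here to echo the `alpha` argument of scikit-learn's `MultinomialNB` and `BernoulliNB`.

```python
from collections import Counter

def word_likelihood(word, class_word_counts, vocabulary, alpha=1.0):
    """Estimate P(word | class) with additive (Laplace) smoothing.

    alpha=0 gives the unsmoothed estimate; alpha=1 is classic Laplace smoothing.
    """
    total = sum(class_word_counts.values())
    return (class_word_counts[word] + alpha) / (total + alpha * len(vocabulary))

# Toy word counts for a "spam" class (made up for illustration)
spam_counts = Counter({"free": 3, "win": 2, "money": 5})
vocabulary = {"free", "win", "money", "meeting"}

# "meeting" never appears in the spam training data
unsmoothed = word_likelihood("meeting", spam_counts, vocabulary, alpha=0.0)
smoothed = word_likelihood("meeting", spam_counts, vocabulary, alpha=1.0)

print(unsmoothed)  # 0.0: a single unseen word would zero out the whole class score
print(smoothed)    # 1/14: small but non-zero
```

With smoothing, an unseen word lowers the score of a class without ruling the class out entirely.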
  

In [6]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

In [9]:
# Load the iris dataset: 4 continuous features, 3 classes
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [10]:
# Fit Gaussian naive Bayes on the training set
gnb = GaussianNB()
gnb.fit(X_train, y_train)

In [11]:
# Predict the class of each test sample
y_pred = gnb.predict(X_test)

In [12]:
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification report: \n", classification_report(y_test, y_pred))

Accuracy:  1.0
Confusion matrix: 
 [[16  0  0]
 [ 0 18  0]
 [ 0  0 11]]
Classification report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        18
           2       1.00      1.00      1.00        11

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45



In [14]:
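# Use multinomial naive Bayes.
# Note: MultinomialNB is designed for count-like features (e.g. word counts);
# the iris features are continuous measurements, so it is a poor fit here.
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred = mnb.predict(X_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))
# zero_division=0 reports 0.0, without warnings, for classes that are never predicted
print("Classification report: \n", classification_report(y_test, y_pred, zero_division=0))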

Accuracy:  0.6
Confusion matrix: 
 [[16  0  0]
 [ 0  0 18]
 [ 0  0 11]]
Classification report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.00      0.00      0.00        18
           2       0.38      1.00      0.55        11

    accuracy                           0.60        45
   macro avg       0.46      0.67      0.52        45
weighted avg       0.45      0.60      0.49        45





In [15]:
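# Use Bernoulli naive Bayes.
# Note: BernoulliNB binarizes features at the default threshold binarize=0.0.
# Every iris measurement is positive, so all features become 1 and carry no
# information; the model then predicts a single class for every sample.
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred = bnb.predict(X_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Confusion matrix: \n", confusion_matrix(y_test, y_pred))
# zero_division=0 reports 0.0, without warnings, for classes that are never predicted
print("Classification report: \n", classification_report(y_test, y_pred, zero_division=0))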

Accuracy:  0.24444444444444444
Confusion matrix: 
 [[ 0  0 16]
 [ 0  0 18]
 [ 0  0 11]]
Classification report: 
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        16
           1       0.00      0.00      0.00        18
           2       0.24      1.00      0.39        11

    accuracy                           0.24        45
   macro avg       0.08      0.33      0.13        45
weighted avg       0.06      0.24      0.10        45



