
### Bayes' Theorem

**Definition:** Bayes' Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event. It is a fundamental concept in probability theory and statistics, used to update the probability estimate for a hypothesis as more evidence or information becomes available. It's often stated as 'the probability of A given B equals the probability of B given A, times the probability of A, divided by the probability of B'.

**Formula:**

```
P(A|B) = [P(B|A) * P(A)] / P(B)
```

Where:
*   `P(A|B)` is the posterior probability: the probability of event A occurring given that event B has occurred.
*   `P(B|A)` is the likelihood: the probability of event B occurring given that event A has occurred.
*   `P(A)` is the prior probability of A: the initial probability of event A occurring before any new evidence is considered.
*   `P(B)` is the marginal probability of B (the evidence): the overall probability of event B occurring, which must be nonzero for the formula to be defined.
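
As a worked illustration of the formula, the sketch below computes a posterior for a hypothetical diagnostic test. The numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) and the function name `posterior` are assumptions chosen for the example, not values from this notebook. `P(B)` is expanded with the law of total probability.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) using Bayes' Theorem."""
    # P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1 - prior_a))
    # P(A|B) = P(B|A) * P(A) / P(B)
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers: A = "has the condition", B = "test is positive"
p = posterior(prior_a=0.01,
              likelihood_b_given_a=0.95,
              likelihood_b_given_not_a=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, the low prior keeps the posterior at roughly 16%, which is the kind of update the theorem formalizes.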

### Naive Bayes Algorithm

**Definition:** The Naive Bayes algorithm is a classification technique based on Bayes' Theorem, with a strong (and 'naive') assumption that the predictors are conditionally independent given the class. In other words, once the class is known, the value of one feature is assumed to carry no information about the value of any other feature. Despite this simplifying assumption, Naive Bayes classifiers are often effective in practice, especially in text classification and spam detection.

**Formula (for a classifier):**

For a class `Ck` and a feature vector `x = (x1, x2, ..., xn)`, Bayes' Theorem combined with the independence assumption gives the posterior probability:

```
P(Ck | x1, ..., xn) = P(Ck) * P(x1 | Ck) * P(x2 | Ck) * ... * P(xn | Ck) / P(x1, ..., xn)
```

Since `P(x1, ..., xn)` does not depend on the class, it is the same for every `Ck` and cannot change which class scores highest, so the decision rule simplifies to:

```
Class = argmax[Ck] P(Ck) * P(x1 | Ck) * P(x2 | Ck) * ... * P(xn | Ck)
```

Where:
*   `P(Ck | x1, ..., xn)` is the posterior probability of class `Ck` given the predictors `x1` to `xn`.
*   `P(Ck)` is the prior probability of class `Ck`.
*   `P(xi | Ck)` is the likelihood of predictor `xi` given class `Ck`.
*   `P(x1, ..., xn)` is the marginal probability of the predictors (often ignored in classification as it's a normalizing constant).
*   `argmax[Ck]` means we select the class `Ck` that maximizes the product above, which is equivalent to maximizing the posterior probability.
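
The decision rule can be sketched by hand for continuous features. The code below is a minimal illustration, assuming each `P(xi | Ck)` is a normal density with a per-class mean and variance, and working with sums of logarithms instead of products to avoid numerical underflow. The function names `fit_gaussian_nb` and `predict_gaussian_nb` and the small variance floor of `1e-9` are assumptions made for this sketch; it is not a description of scikit-learn's internals.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def fit_gaussian_nb(X, y):
    """Estimate priors P(Ck) and per-class feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small floor (an assumption) keeps the variances strictly positive
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances


def predict_gaussian_nb(X, classes, priors, means, variances):
    """Return argmax over Ck of log P(Ck) + sum_i log P(xi | Ck)."""
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    # Summing over features is the log of the product in the decision rule
    log_post = np.log(priors)[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(log_post, axis=1)]


iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)
params = fit_gaussian_nb(X_tr, y_tr)
y_hat = predict_gaussian_nb(X_te, *params)
hand_accuracy = np.mean(y_hat == y_te)
print(hand_accuracy)
```

The normalizing constant `P(x1, ..., xn)` never appears in the code, since it does not affect which class attains the maximum.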

In [3]:
# Import general-purpose data and plotting libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Import the three Naive Bayes variants compared below
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
# Import the splitting helper, the evaluation metrics, and the Iris dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

In [6]:
# Load the Iris dataset
iris = load_iris()
# Separate the features (x) from the class labels (y)
x = iris.data
y = iris.target
# Hold out 25% of the samples for testing; random_state fixes the shuffle for reproducibility
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)

In [12]:
# Initialize the model
model = GaussianNB()
# Fit the model on the training data
model.fit(x_train, y_train)
# Predict labels for the test data
y_pred = model.predict(x_test)
# Evaluate the predictions
print('accuracy_score : ', accuracy_score(y_test, y_pred))
print('confusion_matrix : \n', confusion_matrix(y_test, y_pred))
print('classification_report : \n', classification_report(y_test, y_pred))

accuracy_score :  1.0
confusion_matrix : 
 [[15  0  0]
 [ 0 11  0]
 [ 0  0 12]]
classification_report : 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        12

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38
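
A perfect score on a single 38-sample test split is a noisy estimate. As a hedged cross-check, the sketch below scores the same `GaussianNB` model with 5-fold cross-validation over the full Iris data. The choice of `cv=5` is an assumption made for illustration, not something the cells above prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
# Each of the 5 folds serves once as the held-out test set
scores = cross_val_score(GaussianNB(), iris.data, iris.target, cv=5)
print('fold accuracies : ', scores)
print('mean accuracy   : ', scores.mean())
```

The mean across folds is usually a more stable summary than the accuracy of any one split.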



## Using the Multinomial Model

In [15]:
# Initialize and fit the multinomial model (it requires non-negative features, which holds for the Iris measurements)
model = MultinomialNB()
model.fit(x_train, y_train)

# Predict labels for the test data
y_pred = model.predict(x_test)

# Evaluate the predictions
print('accuracy_score : ', accuracy_score(y_test, y_pred))
print('confusion_matrix : \n', confusion_matrix(y_test, y_pred))
print('classification_report : \n', classification_report(y_test, y_pred))


accuracy_score :  0.9736842105263158
confusion_matrix : 
 [[15  0  0]
 [ 0 11  0]
 [ 0  1 11]]
classification_report : 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.92      1.00      0.96        11
           2       1.00      0.92      0.96        12

    accuracy                           0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38
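
`MultinomialNB` models count-like features, which is why it is commonly paired with word counts in text classification. The sketch below illustrates that setting on a tiny made-up corpus; the documents, the labels, and the variable names are all hypothetical and serve only to show the workflow.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: two spam and two non-spam ("ham") messages
docs = [
    "win money now",
    "free money offer",
    "meeting at noon",
    "lunch at noon tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each document into a vector of word counts
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Fit the multinomial model on the counts
clf = MultinomialNB().fit(counts, labels)

# Classify two unseen messages using the same vocabulary
new_docs = ["free money", "noon meeting"]
predictions = clf.predict(vectorizer.transform(new_docs))
print(predictions)  # ['spam' 'ham']
```

On the continuous Iris measurements the model still runs because the values are non-negative, but they are not counts, so the multinomial assumption is only a rough fit there.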



## Using the Bernoulli Model

In [16]:
# Initialize and fit the Bernoulli model (by default it binarizes every feature at a threshold of 0.0)
model = BernoulliNB()
model.fit(x_train, y_train)

# Predict labels for the test data
y_pred = model.predict(x_test)

# Evaluate the predictions
print('accuracy_score : ', accuracy_score(y_test, y_pred))
print('confusion_matrix : \n', confusion_matrix(y_test, y_pred))
print('classification_report : \n', classification_report(y_test, y_pred))

accuracy_score :  0.2894736842105263
confusion_matrix : 
 [[ 0 15  0]
 [ 0 11  0]
 [ 0 12  0]]
classification_report : 
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        15
           1       0.29      1.00      0.45        11
           2       0.00      0.00      0.00        12

    accuracy                           0.29        38
   macro avg       0.10      0.33      0.15        38
weighted avg       0.08      0.29      0.13        38
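
The collapse to a single predicted class has a simple likely cause: `BernoulliNB` expects binary features and, with its default `binarize=0.0`, maps every value above zero to 1. All Iris measurements are positive, so every sample becomes the same all-ones vector and the model can only fall back on class frequencies. The sketch below contrasts the default with one possible workaround, thresholding each feature at its training-set median. The median threshold is an assumption chosen for illustration, not a recommended setting.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42
)

# Default behaviour: binarize=0.0 turns every positive measurement into 1
default_pred = BernoulliNB().fit(x_train, y_train).predict(x_test)
acc_default = accuracy_score(y_test, default_pred)

# Assumed alternative: binarize each feature at its training-set median
thresholds = np.median(x_train, axis=0)
xb_train = (x_train > thresholds).astype(int)
xb_test = (x_test > thresholds).astype(int)

# binarize=None tells the model the inputs are already binary
median_pred = BernoulliNB(binarize=None).fit(xb_train, y_train).predict(xb_test)
acc_median = accuracy_score(y_test, median_pred)

print('default binarize=0.0 : ', acc_default)
print('median thresholds    : ', acc_median)
```

Binarizing discards most of the information in continuous measurements, so for data like Iris the Gaussian variant remains the more natural choice.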



