# Naive Bayes

Naive Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem. It assumes that, given the class, the value of each feature is independent of the value of every other feature, hence the term "naive." Despite this simplifying assumption, it works well on many real-world problems, such as text classification and spam filtering.

In this notebook, we will cover the intuition behind Naive Bayes, its types, advantages, disadvantages, and demonstrate its implementation in Python using the `scikit-learn` library.

---

## Table of Contents

1. [What is Naive Bayes?](#1-what-is-naive-bayes)
2. [Types of Naive Bayes Classifiers](#2-types-of-naive-bayes-classifiers)
3. [Advantages and Disadvantages of Naive Bayes](#3-advantages-and-disadvantages-of-naive-bayes)
4. [Use Cases of Naive Bayes](#4-use-cases-of-naive-bayes)
5. [Implementing Naive Bayes in Python](#5-implementing-naive-bayes-in-python)
6. [Evaluating the Naive Bayes Model](#6-evaluating-the-naive-bayes-model)

---

## 1. What is Naive Bayes?

Naive Bayes is a simple and effective classification algorithm based on Bayes' Theorem with an assumption of independence among features. Bayes' Theorem calculates the probability of a class given a feature set by combining prior probability with the likelihood of the data:

\[
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
\]

Where:
- \( P(C|X) \) is the posterior probability of class \( C \) given the observed features \( X \).
- \( P(X|C) \) is the likelihood of the features \( X \) given class \( C \).
- \( P(C) \) is the prior probability of class \( C \).
- \( P(X) \) is the evidence, i.e., the marginal probability of the features \( X \). It is the same for every class, so it can be ignored when comparing classes.
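
As a small worked illustration of the formula, the sketch below computes the posterior for a message under the naive assumption, where the likelihood factorizes as \( P(X|C) = \prod_i P(x_i|C) \). All priors and word likelihoods are made-up numbers chosen for illustration, and `posterior` is a hypothetical helper, not part of any library.

```python
from math import prod

# Hypothetical numbers, chosen only for illustration.
priors = {"spam": 0.4, "ham": 0.6}              # P(C)
likelihoods = {                                  # P(word | C)
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.03, "meeting": 0.20},
}


def posterior(words):
    """Return P(C | words) for each class, assuming words are independent given C."""
    # Numerator of Bayes' Theorem: P(C) times the product of P(word | C).
    joint = {c: priors[c] * prod(likelihoods[c][w] for w in words) for c in priors}
    # Denominator P(X): the sum of the numerators over all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


result = posterior(["free"])
print(result)
```

With these numbers, the word "free" moves the spam probability from the prior of 0.4 to roughly 0.87, since 0.12 / (0.12 + 0.018) ≈ 0.87.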

---

## 2. Types of Naive Bayes Classifiers

### 1. **Gaussian Naive Bayes**:
   - Assumes that features follow a Gaussian (normal) distribution.
   - Used for continuous data.
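
To make the Gaussian assumption concrete, here is a minimal sketch that estimates a per-class mean and variance for a single feature and evaluates the normal density at a new value. The sample values and class names are made up for illustration, and `gaussian_pdf` is a hypothetical helper. `scikit-learn`'s `GaussianNB` follows the same idea but also adds a small smoothing term to the variances for numerical stability.

```python
import math
from statistics import mean, pvariance


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical petal lengths (cm) for two classes, made up for illustration.
samples = {"short": [1.4, 1.5, 1.3, 1.6], "long": [4.7, 4.5, 5.0, 4.4]}

# A mean and a variance per class are the only parameters needed for this feature.
params = {c: (mean(v), pvariance(v)) for c, v in samples.items()}

# Likelihood P(x | C) of a new measurement under each class.
x_new = 1.7
likelihood = {c: gaussian_pdf(x_new, mu, var) for c, (mu, var) in params.items()}
print(likelihood)
```

The class whose distribution places the most density at the new value contributes the largest likelihood term to the posterior.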

### 2. **Multinomial Naive Bayes**:
   - Used for discrete data, especially for text classification (e.g., document classification).
   - Assumes the feature vectors represent frequencies or counts of different outcomes.
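
A minimal sketch of Multinomial Naive Bayes on word counts is shown below. The four-document corpus and its labels are made up for illustration; a real text classifier would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus, for illustration only.
docs = [
    "win a free prize now",
    "free cash offer win big",
    "project meeting at noon",
    "please review the meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# Convert each document into a vector of word counts.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

clf = MultinomialNB()
clf.fit(X_counts, labels)

# New documents must be transformed with the same fitted vocabulary.
new_docs = ["free prize offer", "notes from the meeting"]
pred = clf.predict(vectorizer.transform(new_docs))
print(list(pred))
```

Words that never appeared in training (such as "from" here) are simply ignored by the fitted vectorizer.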

### 3. **Bernoulli Naive Bayes**:
   - Designed for binary/boolean features (e.g., 0 and 1).
   - It is commonly used for document classification tasks where the input data is represented by binary features.
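
The following sketch fits `BernoulliNB` on binary presence/absence features. The feature columns and labels are hypothetical and exist only to show the input format.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Columns are hypothetical word-presence flags: ["free", "win", "meeting"].
X_bin = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
])
y_bin = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam (made-up labels)

bnb = BernoulliNB()
bnb.fit(X_bin, y_bin)

# Predict for one message containing only "free" and one containing only "meeting".
pred_bin = bnb.predict(np.array([[1, 0, 0], [0, 0, 1]]))
print(pred_bin)
```

Unlike the multinomial variant, the Bernoulli model also uses the absence of a feature as evidence, which is why each row lists a 0 or 1 for every feature.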

---


## 3. Advantages and Disadvantages of Naive Bayes

### Advantages:
- **Simple and Fast**: Easy to implement and fast to compute.
- **Works Well with Small Data**: Performs well on small datasets, especially for text classification.
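- **Tolerates Missing Data in Principle**: Because each feature contributes an independent term, a missing feature can be left out of the calculation. Note that `scikit-learn`'s Naive Bayes estimators do not do this automatically, so missing values generally need to be imputed or dropped first.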
- **Performs Well with High Dimensional Data**: Especially in problems like spam detection, text classification, etc.

### Disadvantages:
- **Naive Assumption**: The assumption that features are independent is rarely true in real-world scenarios.
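- **Zero Probability Problem**: If a feature value never occurs together with a class in the training set, its estimated likelihood for that class is zero, which forces the entire posterior for that class to zero regardless of the other features. Smoothing techniques such as Laplace (additive) smoothing address this.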
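- **Sensitive to Imbalanced Data**: Class priors estimated from an imbalanced training set favor the majority class, so minority classes can be predicted poorly unless the imbalance is handled (e.g., by resampling or adjusting the priors).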
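
The zero probability problem and its usual remedy can be sketched in a few lines. The word list below is made up for illustration, and `word_likelihoods` is a hypothetical helper. In `scikit-learn`, `MultinomialNB` and `BernoulliNB` expose the same idea through their `alpha` parameter.

```python
from collections import Counter


def word_likelihoods(words, vocabulary, alpha=1.0):
    """Estimate P(word | class) with additive smoothing.

    alpha=0 gives the unsmoothed estimate; alpha=1 is Laplace smoothing.
    """
    counts = Counter(words)
    total = len(words) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


# Made-up training words observed in the "spam" class.
spam_words = ["free", "win", "free", "prize"]
vocabulary = ["free", "win", "prize", "meeting"]

unsmoothed = word_likelihoods(spam_words, vocabulary, alpha=0.0)
smoothed = word_likelihoods(spam_words, vocabulary, alpha=1.0)

print(unsmoothed["meeting"])  # 0.0, which would zero out the whole product
print(smoothed["meeting"])    # 0.125, small but non-zero
```

Adding `alpha` to every count keeps the estimates a valid probability distribution while ensuring no unseen word can eliminate a class on its own.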

---

## 4. Use Cases of Naive Bayes

Naive Bayes is commonly used in:
- **Spam Detection**: Classifying emails as spam or not spam.
- **Text Classification**: Sentiment analysis, document categorization.
- **Medical Diagnosis**: Predicting diseases based on patient symptoms.
- **Recommender Systems**: Personalizing user recommendations by classifying preferences.

---

## 5. Implementing Naive Bayes in Python

Below is a Python implementation of Naive Bayes using `scikit-learn`. We will use the **Iris** dataset, a well-known classification dataset whose four features are continuous measurements, which makes Gaussian Naive Bayes a natural choice.

In [1]:
# Dataset loader, train/test splitting, the Gaussian Naive Bayes estimator,
# and the evaluation metrics used later in this notebook
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

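In [3]:
# Load the features (150 samples, 4 measurements each) and the class labels
iris = load_iris()
X = iris.data
y = iris.target
# Hold out 20% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)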

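In [4]:
# Fit the per-class feature means and variances on the training data
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# Predict the class of each held-out sample
y_pred = gnb.predict(X_test)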
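
Beyond hard predictions, it can be useful to look at what the fitted model learned. The sketch below refits the same model so that it runs on its own, then prints the class priors, the per-class feature means, and the posterior probabilities for a few test samples. It assumes the attribute names `class_prior_` and `theta_` and the `predict_proba` method as provided by `GaussianNB`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
gnb = GaussianNB().fit(X_train, y_train)

# Class priors P(C), estimated from class frequencies in the training split.
print("Class priors:", gnb.class_prior_)

# Per-class feature means, shape (n_classes, n_features).
print("Feature means per class:\n", gnb.theta_)

# Posterior probabilities P(C | X) for the first three test samples.
proba = gnb.predict_proba(X_test[:3])
print("Posteriors:\n", proba.round(3))
```

Each row of the posterior matrix sums to 1, and `predict` returns the class with the largest value in that row.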

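---

## 6. Evaluating the Naive Bayes Model

We evaluate the predictions on the held-out test set using accuracy, a confusion matrix, and a per-class classification report.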

In [6]:
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of Naive Bayes model: {accuracy:.2f}')

conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)

Accuracy of Naive Bayes model: 1.00
Confusion Matrix:
 [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
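
A perfect score on a single 30-sample test split can depend on how the data happened to be divided. As a complementary check, the sketch below uses 5-fold cross-validation via `cross_val_score`; the choice of five folds is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# 5-fold cross-validation: each fold is held out once while the rest trains the model.
scores = cross_val_score(GaussianNB(), iris.data, iris.target, cv=5)

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Averaging over several splits gives a more stable estimate of how the model is likely to perform on unseen data than any single split.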

