# Naive Bayes Classifier

In this notebook, we will explore the **Naive Bayes** algorithm, a probabilistic classifier based on Bayes’ Theorem:

\[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} \]

Naive Bayes assumes that features are **conditionally independent** given the class label, which makes it simple and efficient to train. It is often used for **text classification** (e.g., spam detection).
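
To see what the independence assumption buys, it may help to write Bayes' Theorem for a class label and a feature vector. The symbols \(y\) (class), \(x_1, \dots, x_n\) (features), and \(\hat{y}\) (predicted class) are introduced here for illustration and do not appear elsewhere in the notebook. Under the assumption, the joint likelihood factors into one term per feature:

\[ P(y \mid x_1, \dots, x_n) = \frac{P(y)\,P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)} \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \]

Since the denominator is the same for every class, the predicted class is the one that maximizes the numerator:

\[ \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y) \]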

## 1. Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## 2. Load Dataset
We will use the **Iris dataset**: 150 flower samples, each with 4 numeric features, split evenly across 3 species.

In [None]:
iris = load_iris()
X = iris.data
y = iris.target

print("Features:", iris.feature_names)
print("Classes:", iris.target_names)
print("Shape:", X.shape)

## 3. Train-Test Split

In [None]:
# Hold out 30% of the samples for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Training size:", X_train.shape)
print("Test size:", X_test.shape)

## 4. Train Naive Bayes Model

In [None]:
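# GaussianNB models each feature as normally distributed within each class
nb = GaussianNB()
# Estimate class priors and per-class feature means and variances from the training set
nb.fit(X_train, y_train)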

print("Model trained successfully!")
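
As a rough illustration of what fitting a Gaussian Naive Bayes model involves, the sketch below estimates class priors, per-class means, and per-class variances with NumPy, then predicts by maximizing the log posterior. This is a simplified sketch, not scikit-learn's implementation: the helper names `fit_gaussian_nb` and `predict_gaussian_nb`, the `eps` constant, and the toy data are all hypothetical.

```python
import numpy as np

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate class priors and per-class feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # eps keeps variances strictly positive (a simplification made for this sketch)
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + eps
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class maximizing log P(y) + sum_i log N(x_i | mean, variance)."""
    diff = X[:, None, :] - means[None, :, :]  # shape: (n_samples, n_classes, n_features)
    log_likelihood = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)
    log_joint = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(log_joint, axis=1)]

# Hypothetical toy data: two well-separated clusters
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                  [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_toy, y_toy)
toy_pred = predict_gaussian_nb(np.array([[1.1, 2.1], [5.1, 5.9]]), *params)
print("Toy predictions:", toy_pred)
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.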

## 5. Predictions

In [None]:
y_pred = nb.predict(X_test)
print("Predictions:", y_pred[:10])
print("Actual:", y_test[:10])

## 6. Model Evaluation

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))

cm = confusion_matrix(y_test, y_pred)
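# Rows are actual classes, columns are predicted classes; fmt="d" prints counts as integers
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=iris.target_names, yticklabels=iris.target_names)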
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
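
The metrics in the classification report can all be read off a confusion matrix. The sketch below shows the arithmetic on a hypothetical 3-class matrix (the values in `cm_example` are made up for illustration and are not the output of the model above), using the same convention as `confusion_matrix`: rows are actual classes, columns are predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual class, columns = predicted class
cm_example = np.array([
    [10, 0, 0],
    [0, 8, 2],
    [0, 1, 9],
])

true_positives = np.diag(cm_example)

# Accuracy: correctly classified samples over all samples
accuracy = true_positives.sum() / cm_example.sum()

# Precision: of everything predicted as a class (column sum), how much was right
precision = true_positives / cm_example.sum(axis=0)

# Recall: of everything that truly belongs to a class (row sum), how much was found
recall = true_positives / cm_example.sum(axis=1)

# F1: harmonic mean of precision and recall, per class
f1 = 2 * precision * recall / (precision + recall)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
```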

## 7. Key Notes
- Naive Bayes is **fast to train**: fitting only requires per-class summary statistics (counts, means, variances), so it scales well to large datasets.
- Works well for **text data** (spam filtering, sentiment analysis).
- Assumes conditional independence among features given the class (often not true in reality).
- Common variants:
  - **GaussianNB** → continuous data (each feature modeled as normally distributed within a class)
  - **MultinomialNB** → discrete counts (word frequencies)
  - **BernoulliNB** → binary features
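
To make the difference between the count-based variants concrete, here is a minimal sketch that fits `MultinomialNB` and `BernoulliNB` on a tiny word-count matrix. The vocabulary, counts, and labels are hypothetical and chosen only for illustration; both classifiers are used with their default settings.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# Hypothetical word counts; columns = occurrences of ["free", "meeting", "prize"]
X_counts = np.array([
    [3, 0, 2],  # spam
    [2, 0, 1],  # spam
    [0, 2, 0],  # ham
    [0, 3, 0],  # ham
])
y_labels = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham

# MultinomialNB uses the counts themselves (with additive smoothing, alpha=1.0 by default)
mnb = MultinomialNB()
mnb.fit(X_counts, y_labels)

# BernoulliNB only looks at presence/absence; by default any count > 0 is treated as 1
bnb = BernoulliNB()
bnb.fit(X_counts, y_labels)

# A new message containing "free" once and "prize" once
new_message = np.array([[1, 0, 1]])
mnb_pred = mnb.predict(new_message)
bnb_pred = bnb.predict(new_message)

print("MultinomialNB prediction:", mnb_pred)
print("BernoulliNB prediction:", bnb_pred)
```

In practice the count matrix would usually come from a text vectorizer rather than being typed by hand.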