# 6. Gaussian Naive Bayes

**Purpose:** Learn and revise **Gaussian Naive Bayes** in scikit-learn.

---

## What is Gaussian Naive Bayes?

**Naive Bayes** classifiers use **Bayes' theorem** and assume features are **conditionally independent** given the class. For **Gaussian Naive Bayes**, we assume that **each feature** (given the class) is normally distributed:

\[
P(x_j \mid y) = \frac{1}{\sqrt{2\pi \sigma_{jy}^2}} \exp\left(-\frac{(x_j - \mu_{jy})^2}{2\sigma_{jy}^2}\right)
\]
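To make the likelihood concrete, here is a minimal NumPy sketch of the class-conditional density for a single feature. `gaussian_pdf` is a hypothetical helper name used only for illustration; it is not part of scikit-learn.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Normal density N(x; mu, var) for one feature, given a class-specific mean and variance."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# The density peaks at the mean, where it equals 1 / sqrt(2 * pi * var).
print(gaussian_pdf(0.0, mu=0.0, var=1.0))
# Points equally far from the mean on either side get the same density.
print(gaussian_pdf([0.0, 1.0, 2.0], mu=1.0, var=0.5))
```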

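Combining the per-feature likelihoods with Bayes' theorem gives the posterior. The evidence \( P(X) \) is the same for every class, so it can be dropped, and the classifier predicts the class with the largest posterior:
\[
P(y \mid X) \propto P(y) \prod_{j} P(x_j \mid y), \qquad \hat{y} = \arg\max_{y} \, P(y) \prod_{j} P(x_j \mid y)
\]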

- The model estimates the **mean \( \mu_{jy} \)** and **variance \( \sigma_{jy}^2 \)** of each feature \( j \) within each class \( y \) from the training data.
- **"Naive"** = independence assumption between features; **"Gaussian"** = continuous features modeled with a normal distribution.

## Concepts to Remember

| Concept | Description |
|---------|-------------|
| **Prior** | \( P(y) \): empirical class frequencies in the training data, unless fixed with the `priors` parameter. |
| **Likelihood** | \( P(x_j \mid y) \): a Gaussian with class-specific mean and variance. |
| **Independence** | \( P(X \mid y) = \prod_j P(x_j \mid y) \), so only one mean and one variance per feature per class need to be estimated. |
| **When to use** | Classification with **continuous** features that are roughly bell-shaped within each class. |

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```

```python
# Synthetic data: two classes whose features are Gaussian with unit variance,
# centered at (0, 0) and (2, 2)
np.random.seed(42)
X0 = np.random.randn(60, 2) + np.array([0, 0])  # class 0
X1 = np.random.randn(60, 2) + np.array([2, 2])  # class 1
X = np.vstack([X0, X1])
y = np.array([0] * 60 + [1] * 60)

# Hold out 20% of the samples (24 points) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

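```python
# Fitting estimates the class priors plus per-class feature means and variances
model = GaussianNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)         # hard class labels
y_proba = model.predict_proba(X_test)  # posterior P(y | x) for each class

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print("Class priors:", model.class_prior_)
print("Per-class feature means (theta_):\n", model.theta_)
print("Per-class feature variances (var_, scikit-learn >= 1.0):\n", model.var_)
print("First 5 posterior probabilities:\n", y_proba[:5].round(3))
```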

## Key Takeaways

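- **GaussianNB** has very little to tune: the main hyperparameter is **var_smoothing** (default `1e-9`), the fraction of the largest feature variance that is added to every variance for numerical stability. The **priors** parameter can fix the class priors instead of estimating them.
- Use it when features are **continuous** and approximately Gaussian within each class.
- Training is fast (one pass to compute per-class means and variances), and the model works well when the independence assumption is not severely violated.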
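Since `var_smoothing` is the main knob, one way to choose it is a cross-validated grid search. This is a sketch on hypothetical data shaped like the notebook's two blobs; the log-scale grid range is an assumption, not a recommended setting.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Hypothetical data: two shifted Gaussian blobs, as in the example above.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)), rng.normal(2.0, 1.0, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)

# Search var_smoothing on a log scale around its default of 1e-9.
param_grid = {"var_smoothing": np.logspace(-12, -3, 10)}
search = GridSearchCV(GaussianNB(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best var_smoothing:", search.best_params_["var_smoothing"])
print("mean CV accuracy:  ", round(search.best_score_, 3))
```

On well-scaled data like this, scores across the grid are likely to be nearly identical; the parameter tends to matter more when some feature has a very small variance within a class.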