# Naive Bayes Classification 

## Theory

Naive Bayes is a **probabilistic classifier** based on **Bayes' theorem** with a strong assumption that features are **conditionally independent** given the class. Despite this "naive" assumption, it often performs surprisingly well in practice.

Bayes' theorem is given by:

$P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}$

Where:  
- $P(C|X)$ is the **posterior probability** of class $C$ given features $X$.  
- $P(X|C)$ is the **likelihood** of observing features $X$ given class $C$.  
- $P(C)$ is the **prior probability** of class $C$.  
- $P(X)$ is the **evidence**, i.e., probability of observing $X$ (constant across classes).
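
As a quick numeric illustration of the formula, the sketch below applies Bayes' theorem to two classes. The prior and likelihood values are made up for illustration and do not come from any dataset.

```python
# Hypothetical priors P(C) and likelihoods P(X|C) for two classes.
priors = {"A": 0.6, "B": 0.4}
likelihoods = {"A": 0.2, "B": 0.5}

# Evidence P(X): total probability of X summed over all classes.
evidence = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior P(C|X) for each class via Bayes' theorem.
posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}

print(posteriors)
```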

### Gaussian Naive Bayes

When features are continuous, we assume each feature follows a **Gaussian (Normal) distribution** within each class. The class-conditional density of feature $i$ is:

$P(x_i | C) = \frac{1}{\sqrt{2 \pi \sigma_{i,C}^2}} \exp\left( - \frac{(x_i - \mu_{i,C})^2}{2 \sigma_{i,C}^2} \right)$

Where:  
- $x_i$ is the value of feature $i$.  
- $\mu_{i,C}$ is the mean of feature $i$ for class $C$.  
- $\sigma_{i,C}^2$ is the variance of feature $i$ for class $C$.

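Because the evidence $P(X)$ is the same for every class, it can be dropped, and the independence assumption lets the likelihood factor into per-feature terms. The **class with the highest posterior probability** is assigned to the sample: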

$\hat{C} = \arg\max_C P(C|X) = \arg\max_C P(C) \prod_i P(x_i|C)$

### Steps in Gaussian Naive Bayes:

1. **Separate the training data** by class.  
2. **Compute mean and variance** for each feature per class.  
3. **Calculate likelihoods** using the Gaussian formula.  
4. **Multiply likelihoods with class prior** to get class probabilities.  
5. **Predict the class** with the highest posterior probability.

**Key Assumption:** Features are conditionally independent given the class.  

**Advantages:**  
- Simple and fast: training only computes per-class means, variances, and priors.  
- Works well even with small datasets, since few parameters are estimated per class.  

**Limitations:**  
- Independence assumption may not hold in real data.  
- Sensitive to strongly correlated (redundant) features, whose evidence is effectively counted more than once.

----
----

# Gaussian Naive Bayes Step-by-Step Example

We illustrate **Gaussian Naive Bayes** on a toy dataset with **continuous features** and **binary classes**.

| Feature 1 (X1) | Feature 2 (X2) | Class |
|----------------|----------------|-------|
| 1.0            | 2.0            | A     |
| 1.2            | 1.8            | A     |
| 2.0            | 3.0            | B     |
| 2.2            | 3.2            | B     |

We want to **predict the class** for a new sample:  
**X = [1.1, 2.1]**

---

## Step 0: Outline

1. Compute **prior probabilities** for each class.  
2. Compute **likelihoods** for each feature (for Gaussian, use mean & variance).  
3. Multiply likelihoods and prior to get **unnormalized posterior**.  
4. **Normalize** posteriors so they sum to 1.  
5. Choose the class with the **highest posterior probability**.


---

## Step 1: Compute class priors

Count of each class:

- Class A: 2  
- Class B: 2  
- Total: 4  

$$
P(A) = \frac{2}{4} = 0.5, \quad P(B) = \frac{2}{4} = 0.5
$$
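
The same counting can be sketched in a few lines of Python. The `labels` list simply restates the Class column of the toy table.

```python
from collections import Counter

# Class labels of the four training rows in the toy table.
labels = ["A", "A", "B", "B"]

counts = Counter(labels)
total = len(labels)

# Prior of each class = class count / total number of samples.
priors = {cls: n / total for cls, n in counts.items()}
print(priors)  # {'A': 0.5, 'B': 0.5}
```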

---

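## Step 2: Compute mean and variance for each feature per class

The variances below are population variances (dividing by the number of samples $n$), which is also what `np.var` computes by default.

**Class A:**  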
- Feature 1: $\mu_1 = 1.1$, $\sigma_1^2 = 0.01$  
- Feature 2: $\mu_2 = 1.9$, $\sigma_2^2 = 0.01$  

**Class B:**  
- Feature 1: $\mu_1 = 2.1$, $\sigma_1^2 = 0.01$  
- Feature 2: $\mu_2 = 3.1$, $\sigma_2^2 = 0.01$  
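
These statistics can be reproduced with NumPy. This is a minimal sketch on the toy table; the `stats` dictionary and its tuple layout are illustrative choices, not part of the exercise.

```python
import numpy as np

# Toy training data from the table: columns are X1 and X2.
X = np.array([[1.0, 2.0], [1.2, 1.8], [2.0, 3.0], [2.2, 3.2]])
y = np.array(["A", "A", "B", "B"])

stats = {}
for cls in np.unique(y):
    rows = X[y == cls]  # samples belonging to this class
    # Per-feature mean and population variance (ddof=0 is the default).
    stats[str(cls)] = (rows.mean(axis=0), rows.var(axis=0))
    print(cls, *stats[str(cls)])
```

With only two samples per class, passing `ddof=1` to `var` (the sample variance) would give 0.02 instead of 0.01, so the choice of estimator matters on tiny datasets.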

---

## Step 3: Compute Gaussian probability density for each feature

Gaussian PDF formula:  

$$
P(x_i \mid C) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)
$$
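
The formula translates directly into a small helper. The name `gaussian_pdf` is a hypothetical choice for this sketch; the inputs are the sample and the Class A statistics from the toy example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Density at the mean: the exponential term is 1, leaving 1 / sqrt(2*pi*var).
print(gaussian_pdf(1.1, 1.1, 0.01))

# Density two standard deviations away from the mean (sigma = 0.1).
print(gaussian_pdf(2.1, 1.9, 0.01))
```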

### Class A

- Feature 1: $P(X_1=1.1 \mid A) \approx 3.989$  
- Feature 2: $P(X_2=2.1 \mid A) \approx 0.540$  

### Class B

- Feature 1: $P(X_1=1.1 \mid B) \approx 7.69 \cdot 10^{-22}$  
- Feature 2: $P(X_2=2.1 \mid B) \approx 7.69 \cdot 10^{-22}$  

These values are densities, not probabilities, so they can exceed 1 when the variance is small.

---

## Step 4: Compute unnormalized posterior (likelihood × prior)

$$
\tilde{P}(A \mid X) = P(A) \cdot P(X_1 \mid A) \cdot P(X_2 \mid A) \approx 0.5 \cdot 3.989 \cdot 0.540 \approx 1.077
$$

$$
\tilde{P}(B \mid X) = P(B) \cdot P(X_1 \mid B) \cdot P(X_2 \mid B) \approx 0.5 \cdot (7.69 \cdot 10^{-22})^2 \approx 2.96 \cdot 10^{-43}
$$

---

## Step 5: Normalize to get actual posterior probabilities

Normalization ensures the probabilities sum to 1:

$$
P(A \mid X) = \frac{\tilde{P}(A \mid X)}{\tilde{P}(A \mid X) + \tilde{P}(B \mid X)} \approx \frac{1.077}{1.077 + 2.96 \cdot 10^{-43}} \approx 1.0
$$

$$
P(B \mid X) = \frac{\tilde{P}(B \mid X)}{\tilde{P}(A \mid X) + \tilde{P}(B \mid X)} \approx \frac{2.96 \cdot 10^{-43}}{1.077 + 2.96 \cdot 10^{-43}} \approx 0.0
$$

---

## Step 6: Predict class

Since $P(A \mid X) > P(B \mid X)$, we **predict class = A** for the sample $X=[1.1,2.1]$.
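
The whole worked example can be checked end to end with a short script. This is a sketch under the statistics from Steps 1 and 2; the helper `gaussian_pdf` and the tuple layout of `stats` are hypothetical choices made for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Per-class statistics: (prior, feature means, feature variances).
stats = {
    "A": (0.5, [1.1, 1.9], [0.01, 0.01]),
    "B": (0.5, [2.1, 3.1], [0.01, 0.01]),
}
x_new = [1.1, 2.1]

# Unnormalized posterior: prior times the product of per-feature densities.
scores = {}
for cls, (prior, means, variances) in stats.items():
    score = prior
    for x, m, v in zip(x_new, means, variances):
        score *= gaussian_pdf(x, m, v)
    scores[cls] = score

# Normalize so the posteriors sum to 1, then pick the most probable class.
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```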



-----
-----

## Exercise: Implement Gaussian Naive Bayes for the Iris Dataset 

In this exercise, you will manually implement Gaussian Naive Bayes using the Iris dataset. You will **NOT** use sklearn's Naive Bayes classifier; instead, you will follow the step-by-step process of computing probabilities, likelihoods, and predictions.

You can use the following Python modules:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
```

---

## Step 0: Load the dataset
1. Load the Iris dataset using `load_iris()`
2. Store the features in `X` and the target labels in `y`  
3. Split the dataset into train and test sets using `train_test_split()` (use `test_size=0.3` and `random_state=42`)

**Hint:** Use `iris.data` for features and `iris.target` for labels

---

## Step 1: Create the training function
Create a function `train_naive_bayes(X_train, y_train)` that:
- Gets unique classes using `np.unique(y_train)`
- For each class, finds all samples belonging to that class
- Calculates and stores: mean, variance, and prior probability

**Hints:**
- Use `X_train[y_train == cls]` to get samples for a specific class
- Use `np.mean(class_data, axis=0)` for mean of each feature
- Use `np.var(class_data, axis=0)` for variance of each feature
- Prior = number of samples in class / total samples

**Return:** A dictionary that maps each class label to its statistics (mean, variance, and prior)
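
One possible layout for the returned dictionary is sketched below, filled in with the hand-computed values from the toy example rather than Iris. The key names `mean`, `var`, and `prior` are assumptions chosen for illustration; any consistent naming works.

```python
import numpy as np

# Hypothetical layout: class label -> per-feature statistics and prior.
# Values are the hand-computed ones from the toy dataset, not from Iris.
example_stats = {
    "A": {"mean": np.array([1.1, 1.9]), "var": np.array([0.01, 0.01]), "prior": 0.5},
    "B": {"mean": np.array([2.1, 3.1]), "var": np.array([0.01, 0.01]), "prior": 0.5},
}

for cls, s in example_stats.items():
    print(cls, s["prior"], s["mean"], s["var"])
```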

---

## Step 2: Create the prediction function
Create a function `predict_sample(x, stats)` that:
- Loops through each class in the statistics
- Calculates likelihood using the Gaussian formula
- Multiplies likelihood by prior probability
- Returns the class with highest probability

**Gaussian Formula:**
```python
likelihood = (1 / np.sqrt(2 * np.pi * variance)) * np.exp(-0.5 * ((x - mean)**2) / variance)
```

**Hints:**
- Use `np.prod()` to multiply likelihoods of all features
- Keep track of best probability and best class
- Return the class with the highest probability
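
Multiplying many small densities can underflow to zero in floating point. A common workaround, sketched here as an optional variant rather than the method the exercise asks for, is to sum log-densities instead; the ranking of classes is unchanged because the logarithm is monotonic. The function name `log_score` and the example statistics (taken from the toy dataset) are hypothetical.

```python
import numpy as np

def log_score(x, mean, var, prior):
    """log(prior) plus the sum of per-feature Gaussian log-densities."""
    log_pdf = -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var
    return np.log(prior) + np.sum(log_pdf)

x = np.array([1.1, 2.1])
score_a = log_score(x, np.array([1.1, 1.9]), np.array([0.01, 0.01]), 0.5)
score_b = log_score(x, np.array([2.1, 3.1]), np.array([0.01, 0.01]), 0.5)

# The class with the larger log-score also has the larger posterior.
print("A" if score_a > score_b else "B")
```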

---

## Step 3: Train your model
- Call `train_naive_bayes(X_train, y_train)` to get model statistics
- Store the result in a variable called `model`

---

## Step 4: Make predictions
- Use list comprehension to predict each sample in the test set
- `predictions = [predict_sample(x, model) for x in X_test]`

---

## Step 5: Calculate accuracy
- Use `np.mean(np.array(predictions) == y_test)` to calculate accuracy (the fraction of correct predictions)
- Print the result
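
To see why the mean of a comparison gives accuracy, consider this small sketch. The prediction and label values are made up for illustration.

```python
import numpy as np

# Hypothetical predictions and true labels for five test samples.
predictions = [0, 2, 1, 1, 0]
y_true = np.array([0, 2, 1, 2, 0])

# Comparing elementwise gives booleans; their mean is the fraction correct.
accuracy = np.mean(np.array(predictions) == y_true)
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.80
```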

---

## Step 6: Print model details 
Print the statistics for each class to understand what the model learned:
- Prior probability
- Mean of each feature  
- Variance of each feature

---

## Expected Output Structure:
```
Accuracy: 0.XX

Model Statistics:
Class 0: Prior = 0.XXX
  Mean: [X.XX X.XX X.XX X.XX]
  Variance: [X.XX X.XX X.XX X.XX]

Class 1: Prior = 0.XXX
  Mean: [X.XX X.XX X.XX X.XX]  
  Variance: [X.XX X.XX X.XX X.XX]

Class 2: Prior = 0.XXX
  Mean: [X.XX X.XX X.XX X.XX]
  Variance: [X.XX X.XX X.XX X.XX]
```

---



# A simplified Naive Bayes implementation using scikit-learn

```python
# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
iris = load_iris()
X = iris.data  # features
y = iris.target  # target labels

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a Gaussian Naive Bayes classifier
nb_model = GaussianNB()

# Train the model
nb_model.fit(X_train, y_train)

# Make predictions
y_pred = nb_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```