# Naive Bayes
---

### ✅ 1. **Concept & Understanding**

**Naive Bayes** is a **probabilistic** machine learning algorithm based on **Bayes' Theorem**, with the "naive" assumption that **features are conditionally independent** given the class.

Used for:

* **Classification problems**
* Especially effective in **text classification** (spam filtering, sentiment analysis)

---

### ✅ 2. **Bayes’ Theorem Formula**

$$
P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)}
$$

Where:

* $P(C \mid X)$: Posterior – probability of class $C$ given features $X$
* $P(X \mid C)$: Likelihood – probability of features $X$ given class $C$
* $P(C)$: Prior – probability of class $C$
* $P(X)$: Evidence – probability of features $X$ (the same for every class, so it can be ignored when comparing classes)
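A small numeric sketch may make the four terms concrete. The probabilities below are hypothetical values chosen only for illustration, and the helper name `posterior` is not from any library.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(C | X) = P(X | C) * P(C) / P(X)."""
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p_spam = 0.4             # prior P(Spam)
p_free_given_spam = 0.5  # likelihood P("free" | Spam)
p_free_given_ham = 0.1   # likelihood P("free" | Not Spam)

# Evidence P("free") via the law of total probability over both classes.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

p_spam_given_free = posterior(p_spam, p_free_given_spam, p_free)
print(round(p_spam_given_free, 4))
```

Seeing the word raises the probability of Spam from the prior of 0.4 to roughly 0.77 under these assumed numbers.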

---

### ✅ 3. **"Naive" Assumption**

Assumes the features $x_1, x_2, \ldots, x_n$ are conditionally independent given the class $C$:

$$
P(X \mid C) = P(x_1 \mid C) \cdot P(x_2 \mid C) \cdot \ldots \cdot P(x_n \mid C)
$$

So the formula becomes:

$$
P(C \mid X) \propto P(C) \cdot \prod_{i=1}^n P(x_i \mid C)
$$
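In practice the product of many small probabilities can underflow to zero in floating point, so implementations commonly sum logarithms instead. The sketch below illustrates this with 200 hypothetical features that each have likelihood 0.01. The numbers are assumptions for demonstration only.

```python
import math

# Hypothetical per-feature likelihoods P(x_i | C) for 200 features.
likelihoods = [0.01] * 200
prior = 0.5  # hypothetical P(C)

# Direct product: 0.5 * 0.01**200 is far below the smallest positive float.
direct_score = prior * math.prod(likelihoods)

# Log-space version: log P(C) + sum_i log P(x_i | C).
log_score = math.log(prior) + sum(math.log(p) for p in likelihoods)

print(direct_score)  # underflows to 0.0
print(log_score)     # a finite negative number
```

Because the logarithm is monotonic, comparing log scores across classes gives the same ranking as comparing the original products.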

---

### ✅ 4. **Example: Text Classification (Spam Filter)**

Suppose we want to classify emails as **Spam** or **Not Spam** using words like:

* $x_1$: “free”
* $x_2$: “win”
* $x_3$: “money”

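Let each $x_i$ be 1 if the word appears in the email and 0 otherwise, and consider an email with:

* $X = \{x_1 = 1, x_2 = 1, x_3 = 0\}$

We compute for each class $C \in \{\text{Spam}, \text{Not Spam}\}$:

$$
P(\text{Spam} \mid X) \propto P(\text{Spam}) \cdot P(x_1 = 1 \mid \text{Spam}) \cdot P(x_2 = 1 \mid \text{Spam}) \cdot P(x_3 = 0 \mid \text{Spam})
$$

Here “money” is absent ($x_3 = 0$), so its factor is $P(x_3 = 0 \mid \text{Spam}) = 1 - P(x_3 = 1 \mid \text{Spam})$.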

And similarly for Not Spam. We compare both and choose the class with the **higher posterior probability**.
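The comparison can be sketched in a few lines of Python. All priors and word probabilities below are hypothetical, invented for this example. A word that is absent from the email contributes `1 - P(word | class)`.

```python
# Hypothetical parameters, for illustration only.
priors = {"Spam": 0.4, "Not Spam": 0.6}

# P(word is present | class)
p_word = {
    "Spam": {"free": 0.6, "win": 0.5, "money": 0.7},
    "Not Spam": {"free": 0.1, "win": 0.05, "money": 0.2},
}

# The email from the example: "free" and "win" present, "money" absent.
x = {"free": 1, "win": 1, "money": 0}


def score(cls):
    """Unnormalized posterior: P(C) * prod_i P(x_i | C)."""
    s = priors[cls]
    for word, present in x.items():
        p = p_word[cls][word]
        s *= p if present else (1 - p)
    return s


scores = {cls: score(cls) for cls in priors}

# Dividing by the sum recovers proper posteriors (this is the evidence P(X)).
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}

prediction = max(scores, key=scores.get)
print(prediction, posteriors)
```

Normalizing is optional for the decision itself, since both scores share the same denominator.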

---

### ✅ 5. **Types of Naive Bayes**

| Type            | Assumes                           | Used For                  |
| --------------- | --------------------------------- | ------------------------- |
| **Gaussian**    | Features are normally distributed | Continuous data           |
| **Multinomial** | Features are counts/frequencies   | Text classification       |
| **Bernoulli**   | Features are binary (0/1)         | Presence/absence of words |

---

### ✅ 6. **Mathematics – Gaussian Naive Bayes**

For continuous features (like height or weight), assume each feature is normally distributed within each class, so $P(x_i \mid C)$ is a probability density:

$$
P(x_i \mid C) = \frac{1}{\sqrt{2\pi\sigma_{i,C}^2}} \exp\left( -\frac{(x_i - \mu_{i,C})^2}{2\sigma_{i,C}^2} \right)
$$

Where:

* $\mu_{i,C}$: mean of feature $x_i$ for class $C$
* $\sigma_{i,C}^2$: variance of feature $x_i$ for class $C$
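As a rough sketch, the density and its two parameters can be computed with the standard library alone. The function names and the height values are hypothetical, and the variance here is the simple maximum-likelihood estimate (dividing by the number of samples).

```python
import math


def fit_gaussian(values):
    """Estimate the mean and variance of one feature within one class."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical heights (cm) observed for one class.
heights = [158.0, 162.0, 165.0, 170.0, 160.0]
mean, var = fit_gaussian(heights)

# The density is highest at the class mean and falls off away from it.
print(gaussian_pdf(mean, mean, var), gaussian_pdf(mean + 10, mean, var))
```

One `(mean, var)` pair is fitted per feature and per class, which is why training is so cheap.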

---

### ✅ 7. **Final Prediction**

$$
\hat{C} = \arg\max_{C} \left[ P(C) \cdot \prod_{i=1}^n P(x_i \mid C) \right]
$$
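The arg max can be sketched as a loop over classes that keeps the one with the highest score. The example below uses Gaussian likelihoods and works in log space. The class labels, priors, and per-feature `(mean, variance)` pairs are hypothetical values rather than fitted results.

```python
import math

# Hypothetical model: a prior per class, and (mean, variance) for each
# feature (height in cm, weight in kg) within that class.
model = {
    "A": {"prior": 0.5, "stats": [(160.0, 25.0), (55.0, 16.0)]},
    "B": {"prior": 0.5, "stats": [(178.0, 36.0), (80.0, 49.0)]},
}


def log_gaussian(x, mean, var):
    """Log of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def log_score(cls, x):
    """log P(C) + sum_i log P(x_i | C)."""
    params = model[cls]
    return math.log(params["prior"]) + sum(
        log_gaussian(xi, mean, var) for xi, (mean, var) in zip(x, params["stats"])
    )


def predict(x):
    """Return the class with the highest (log) score."""
    return max(model, key=lambda cls: log_score(cls, x))


print(predict([162.0, 58.0]))
print(predict([180.0, 78.0]))
```

Under these assumed parameters the first point falls near class A's means and the second near class B's, so the predictions are A and B respectively.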

---

### ✅ 8. **Advantages**

* Fast and simple
* Works well with high-dimensional data (e.g., text)
* Often works reasonably well with relatively little training data, since only per-feature, per-class parameters are estimated

### ✅ 9. **Limitations**

* Strong (naive) independence assumption
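* When features are correlated, each one is still counted as independent evidence, so the predicted probabilities tend to be pushed toward 0 or 1 and are poorly calibrated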
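A small sketch of why a violated independence assumption hurts: if the same word is accidentally included as a feature twice (a perfectly correlated pair), its evidence is counted twice. The probabilities and the function name `posterior_spam` are hypothetical.

```python
def posterior_spam(n_copies, p_spam=0.5, p_word_spam=0.6, p_word_ham=0.1):
    """Posterior P(Spam | word present) when the word is counted n_copies times."""
    spam = p_spam * p_word_spam ** n_copies
    ham = (1 - p_spam) * p_word_ham ** n_copies
    return spam / (spam + ham)


once = posterior_spam(1)   # the word used as a single feature
twice = posterior_spam(2)  # the same word duplicated as a second feature

print(round(once, 3), round(twice, 3))
```

No new information was added, yet the posterior moves from about 0.86 to about 0.97. The predicted class is often still right, but the probability is overconfident.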

---


![image.png](attachment:8efa0b45-9fef-4ca0-94f1-455c75de60e1.png)