
# 📘 Naive Bayes
---

# 🧠 1. What is Naive Bayes?

Naive Bayes is a **probabilistic classification algorithm** based on Bayes' theorem.

It predicts the class by estimating the posterior probability of each class given the features:

$$
P(C \mid X)
$$

Where:
- $C$ = class  
- $X = (x_1, x_2, ..., x_n)$ = features  

Assumption: the features are conditionally independent given the class.

---

# 🧮 2. Bayes' Theorem

$$
P(C \mid X) = \frac{P(C)\,P(X \mid C)}{P(X)}
$$

In practice, $P(X)$ is the same for every class, so it can be dropped when comparing classes:

$$
P(C \mid X) \propto P(C)\,P(X \mid C)
$$

---

# 🧠 3. Naive Independence Assumption

$$
P(x_1, x_2, ..., x_n \mid C)
=
\prod_{i=1}^{n} P(x_i \mid C)
$$

Final working formula:

$$
P(C \mid X) \propto P(C)\prod_{i=1}^{n} P(x_i \mid C)
$$

---

# ⚙️ 4. Working Steps

## Step 1: Prior

$$
P(C) = \frac{\text{number of training samples in class } C}{\text{total number of training samples}}
$$

## Step 2: Likelihood

For each feature and each class, estimate from the training data:

$$
P(x_i \mid C)
$$

## Step 3: Multiply

$$
\text{Score}(C) = P(C)\prod_{i=1}^{n} P(x_i \mid C)
$$

## Step 4: Predict

$$
\hat{y} = \arg\max_C \text{Score}(C)
$$
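
The four steps above can be sketched in plain Python for categorical features. This is a minimal illustration rather than a production implementation: the function names (`fit`, `score`, `predict`) and the toy weather data are made up for this example, and the likelihoods are raw frequency estimates with no smoothing.

```python
from collections import Counter, defaultdict

def fit(rows, labels):
    """Steps 1 and 2: estimate priors and per-feature value counts by counting."""
    class_counts = Counter(labels)
    priors = {c: k / len(labels) for c, k in class_counts.items()}
    # value_counts[(c, i)][v] = number of class-c rows whose feature i equals v
    value_counts = defaultdict(Counter)
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(c, i)][v] += 1
    return priors, value_counts, class_counts

def score(row, c, priors, value_counts, class_counts):
    """Step 3: Score(C) = P(C) * product of P(x_i | C), unsmoothed."""
    s = priors[c]
    for i, v in enumerate(row):
        s *= value_counts[(c, i)][v] / class_counts[c]
    return s

def predict(row, priors, value_counts, class_counts):
    """Step 4: return the class with the highest score."""
    return max(priors, key=lambda c: score(row, c, priors, value_counts, class_counts))

# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]

priors, value_counts, class_counts = fit(rows, labels)
prediction = predict(("sunny", "cool"), priors, value_counts, class_counts)
print(prediction)
```

With this toy data the prediction is `yes`: no training row of class `no` has temperature `cool`, so that class gets a score of exactly 0.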

---

# 📊 5. Types of Naive Bayes

## Gaussian NB (Continuous Data)

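For a single feature $x$, with $\mu_C$ and $\sigma_C^2$ the mean and variance of that feature within class $C$:

$$
P(x \mid C) =
\frac{1}{\sqrt{2\pi\sigma_C^2}}
\exp\left(-\frac{(x-\mu_C)^2}{2\sigma_C^2}\right)
$$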

---

## Multinomial NB (Text / Counts)

Used for:
- NLP
- Spam detection
- Document classification
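
As a rough sketch of the text use case, scikit-learn's `CountVectorizer` can turn documents into word-count vectors that `MultinomialNB` then consumes. The four-document corpus and its labels are invented purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus
docs = ["win money now", "free money offer",
        "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

# Convert each document into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

# Fit per-class word probabilities (additive smoothing, alpha=1.0 by default)
model = MultinomialNB()
model.fit(X_counts, labels)

# New text must go through the same vectorizer before prediction
pred = model.predict(vectorizer.transform(["free money now"]))
print(pred[0])
```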

---

## Bernoulli NB (Binary Features)

$$
x_i \in \{0,1\}
$$
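
For binary features, the per-feature likelihood is commonly written as $P(x_i \mid C) = p_i^{x_i}(1-p_i)^{1-x_i}$ with $p_i = P(x_i = 1 \mid C)$, so an absent feature also contributes evidence. A minimal sketch, assuming made-up probabilities and a hypothetical helper name `bernoulli_likelihood`:

```python
def bernoulli_likelihood(x, p):
    """P(x | C) for a binary vector x, where p[i] = P(x_i = 1 | C)."""
    result = 1.0
    for xi, pi in zip(x, p):
        # A present feature contributes p_i, an absent one contributes (1 - p_i)
        result *= pi if xi == 1 else (1.0 - pi)
    return result

# Hypothetical P(word present | spam) for two words
p_spam = [0.8, 0.1]
x = [1, 0]  # first word present, second word absent

likelihood = bernoulli_likelihood(x, p_spam)  # 0.8 * (1 - 0.1)
print(likelihood)
```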

---

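# 🧠 6. Log Trick (Very Important)

Multiplying many probabilities smaller than 1 quickly underflows floating-point precision. Taking logs turns the product into a sum:

$$
\log P(C \mid X) =
\log P(C) + \sum_{i=1}^{n} \log P(x_i \mid C) - \log P(X)
$$

The term $\log P(X)$ is the same for every class, so it is dropped when comparing classes. Real implementations work with these log scores.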
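
A small sketch of why the log form matters, assuming 100 features that each have a made-up likelihood of `1e-5`: the plain product collapses to zero in double precision, while the sum of logs stays finite and comparable across classes.

```python
import math

# Hypothetical per-feature likelihoods for one class
probs = [1e-5] * 100

# Naive product: the true value (1e-500) is far below the smallest positive float
product = 1.0
for p in probs:
    product *= p

# Log-space score: a sum of moderately sized negative numbers
log_score = sum(math.log(p) for p in probs)

print(product)    # underflows to 0.0
print(log_score)  # finite, roughly -1151.3
```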

---

# 🧪 7. Example Logic

Given class priors:

$$
P(Yes)=0.5,\quad P(No)=0.5
$$

Compute the likelihoods and compare the scores:

$$
\text{Score}(\text{Yes}) \quad \text{vs} \quad \text{Score}(\text{No})
$$

Choose the class with the higher score.
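
To make the comparison concrete, here is a short worked sketch. The priors match the ones above; the per-feature likelihoods are hypothetical numbers chosen only for illustration.

```python
import math

priors = {"Yes": 0.5, "No": 0.5}

# Hypothetical likelihoods P(x_1 | C), P(x_2 | C) for the observed features
likelihoods = {"Yes": [0.6, 0.3], "No": [0.2, 0.4]}

# Score(C) = P(C) * product of the likelihoods
scores = {c: priors[c] * math.prod(likelihoods[c]) for c in priors}

# Pick the class with the larger score
prediction = max(scores, key=scores.get)

# Optional: normalize the scores to get posterior probabilities
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}

print(scores, prediction, posteriors)
```

Here Score(Yes) = 0.5 × 0.6 × 0.3 = 0.09 and Score(No) = 0.5 × 0.2 × 0.4 = 0.04, so the prediction is `Yes`.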

---


The sections below cover:
- Problem solving using Bayes
- Intuition
- Mathematics behind Naive Bayes
- Code implementation
- Handling numerical data

---

# 🧠 1. Bayes' Theorem Refresher

$$
P(C \mid X) = \frac{P(C) P(X \mid C)}{P(X)}
$$

Used for classification:

$$
P(C \mid X) \propto P(C) P(X \mid C)
$$

---

# 🧮 2. Problem Solving Using Bayes

Typical format:

Given:
- Priors
- Conditional probabilities

Find:

$$
P(C_i \mid X)
$$

Steps:
1. Compute prior
2. Compute likelihood
3. Multiply
4. Compare scores

---

# 🧠 3. Intuition of Naive Bayes

The model asks:

> How likely is this class?  
> How likely are these features in that class?

It then combines the two answers by multiplying them.

Decision rule:

$$
\hat{y} = \arg\max_C P(C) \prod P(x_i \mid C)
$$

---

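# ⚡ 4. Why It Works Surprisingly Well

Even when the independence assumption is wrong:
- The estimated probabilities may be inaccurate, but classification only needs the correct class to score highest, which often still holds
- It copes well with high-dimensional data because each feature's likelihood is estimated separately
- It makes a strong baseline model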

---

# 🧮 5. Mathematical Derivation

Start from:

$$
P(C \mid X) = \frac{P(C, X)}{P(X)}
$$

Using conditional probability:

$$
P(C, X) = P(C)P(X \mid C)
$$

Apply independence assumption:

$$
P(X \mid C) = \prod_{i=1}^n P(x_i \mid C)
$$

Final:

$$
P(C \mid X) \propto P(C) \prod P(x_i \mid C)
$$

---

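# 🧠 6. Log Likelihood Form

To avoid underflow, compare classes in log space:

$$
\log P(C \mid X) =
\log P(C) + \sum_{i=1}^{n} \log P(x_i \mid C) - \log P(X)
$$

Since $\log P(X)$ does not depend on $C$, it can be ignored when picking the best class. This form is used in real implementations.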

---

# 🧪 7. Simple Example

Given two classes:
Spam vs Not Spam

Sentence:
"Win money now"

Compute:

$$
\text{Score}(\text{Spam}) = P(\text{Spam}) \prod_i P(\text{word}_i \mid \text{Spam})
$$

Compare with:

$$
\text{Score}(\text{NotSpam})
$$

Choose the class with the higher score.

---

# 🛠️ 8. Handling Numerical Data

For continuous features, use Gaussian NB.

Assume each feature is normally distributed within each class $C$:

$$
x \mid C \sim \mathcal{N}(\mu_C, \sigma_C^2)
$$

Likelihood:

$$
P(x \mid C) =
\frac{1}{\sqrt{2\pi\sigma_C^2}}
\exp\left(-\frac{(x-\mu_C)^2}{2\sigma_C^2}\right)
$$

---

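# 🧠 9. Parameter Estimation

For each class $C$ and each feature:

$$
\mu_C = \text{mean of the feature over training samples in class } C
$$

$$
\sigma_C^2 = \text{variance of the feature over training samples in class } C
$$

Both are computed from the training data.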
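
The estimation and the Gaussian likelihood can be sketched together in plain Python. This is an illustrative sketch under stated assumptions: the helper names (`gaussian_log_pdf`, `fit_gaussian_nb`, `predict_gaussian_nb`) and the toy data are hypothetical, the variance is the population variance, and there is no guard against a zero variance (scikit-learn's `GaussianNB` exposes a `var_smoothing` parameter for that purpose).

```python
import math
from statistics import fmean, pvariance

def gaussian_log_pdf(x, mu, var):
    """Log of the normal density with mean mu and variance var, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def fit_gaussian_nb(X, y):
    """Estimate the prior plus per-feature mean and variance for each class."""
    params = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        columns = list(zip(*rows))  # one tuple of values per feature
        params[c] = {
            "prior": len(rows) / len(y),
            "mu": [fmean(col) for col in columns],
            "var": [pvariance(col) for col in columns],  # divides by n
        }
    return params

def predict_gaussian_nb(x, params):
    """Return the class maximizing log prior + sum of log likelihoods."""
    def log_score(c):
        p = params[c]
        return math.log(p["prior"]) + sum(
            gaussian_log_pdf(xi, m, v) for xi, m, v in zip(x, p["mu"], p["var"])
        )
    return max(params, key=log_score)

# Hypothetical toy data: two well-separated classes, two features each
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]

params = fit_gaussian_nb(X, y)
print(predict_gaussian_nb([1.1, 2.0], params))
print(predict_gaussian_nb([2.9, 4.05], params))
```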

---

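# 💻 10. Code Implementation (Scikit-learn)

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Example data (an assumption for illustration): any numeric feature
# matrix X and label vector y will do.
X, y = load_iris(return_X_y=True)

# Hold out part of the data for evaluation; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit class priors and per-class feature means and variances
model = GaussianNB()
model.fit(X_train, y_train)

# Predict on unseen samples and report accuracy
pred = model.predict(X_test)
print(accuracy_score(y_test, pred))
```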

---

# 🧠 11. When to Use Which Variant

| Data Type | NB Variant |
|----------|------------|
| Continuous | Gaussian |
| Word counts | Multinomial |
| Binary features | Bernoulli |

---

# ⚡ 12. Strengths

- Extremely fast
- Works well with small data
- Great for text classification
- Easy baseline model

---

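# ❌ 13. Limitations

- The independence assumption is rarely realistic
- Correlated features are counted as separate pieces of evidence, which hurts performance
- Predicted probabilities are poorly calibrated (often pushed toward 0 or 1)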
---

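# 🎯 14. Interview Questions

**Why naive?**  
It assumes the features are independent given the class.

**Why log probabilities?**  
To avoid numerical underflow when multiplying many small probabilities.

**Why good for NLP?**  
It handles high-dimensional, sparse data such as word counts well.

**Gaussian NB assumption?**  
Each feature is normally distributed within each class.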

---

# 🧾 Final Formula

$$
\hat{y} =
\arg\max_C
\left[
\log P(C) + \sum \log P(x_i \mid C)
\right]
$$

---

# 🧠 15. Text Classification Example

Sentence: "free money now"

$$
P(\text{Spam} \mid \text{words}) \propto
P(\text{Spam})\prod_i P(\text{word}_i \mid \text{Spam})
$$

---

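# 🛠️ 16. Zero Probability Problem

If any likelihood is 0 (for example, a word never seen with a class during training), the whole product becomes 0.

Solution: **Laplace Smoothing**

$$
P(x \mid C) =
\frac{\text{count}(x, C)+1}{\text{total}(C)+V}
$$

Where $\text{count}(x, C)$ is the number of times $x$ occurs in class $C$, $\text{total}(C)$ is the total count of all words in class $C$, and $V$ is the vocabulary size.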
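
A minimal sketch of the smoothing formula for word counts. The helper name `smoothed_word_probs`, the two spam documents, and the five-word vocabulary are assumptions made for this example; `alpha=1` corresponds to the add-one formula above.

```python
from collections import Counter

def smoothed_word_probs(docs, vocabulary, alpha=1):
    """Estimate P(word | class) with additive smoothing from one class's documents."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts[w] for w in vocabulary)  # all word occurrences in this class
    V = len(vocabulary)
    return {w: (counts[w] + alpha) / (total + alpha * V) for w in vocabulary}

# Hypothetical spam documents and vocabulary
spam_docs = ["win money now", "free money"]
vocabulary = ["win", "money", "now", "free", "meeting"]

probs = smoothed_word_probs(spam_docs, vocabulary)
print(probs)
```

The word `meeting` never appears in the spam documents, yet it receives probability (0 + 1) / (5 + 5) = 0.1 instead of 0, and the probabilities still sum to 1.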

---

# ⚡ 17. Advantages

- Very fast  
- Works with small data  
- Great for NLP  
- Easy to implement  

---

# ❌ 18. Disadvantages

- Unrealistic independence assumption  
- Sensitive to correlated features  
- Poor probability calibration  

---

# 📊 19. Applications

- Spam filtering  
- Sentiment analysis  
- Fake news detection  
- Medical diagnosis  
- Document classification  

---

# 🧠 20. Naive Bayes vs Logistic Regression

| Aspect | Naive Bayes | Logistic Regression |
|--------|-------------|--------------------|
| Training speed | Very fast | Slower |
| Training data needed | Less | More |
| Handles correlated features | Poorly | Better |

---

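# 🧪 21. Scikit-learn Implementation

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Example data (an assumption for illustration); substitute your own X and y
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)   # estimate class priors, per-class means and variances
pred = model.predict(X_test)  # most probable class for each test sample
```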

---

# 🧠 22. Interview Questions

**Why naive?**  
Because of the independence assumption.

**Why logs?**  
To avoid numerical underflow.

**Best use case?**  
Text classification.

**When not to use?**  
When features are highly correlated.

---


# ✅ Final Summary

Naive Bayes is:
- A simple probabilistic classifier
- Based on Bayes' theorem
- Built on a conditional independence assumption
- A strong, fast baseline for NLP tasks

Despite its simplicity, it often performs surprisingly well in real-world scenarios.