# 🎯 Naive Bayes: Complete Professional Guide

## 📚 What You'll Master
1. **Bayes' Theorem** - Probabilistic classification from first principles
2. **Variants** - Gaussian, Multinomial, Bernoulli
3. **Real-World** - Spam filtering, sentiment analysis, medical diagnosis
4. **Exercises** - 4 practice problems
5. **Competition** - Text classification
6. **Interviews** - 7 questions

---


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris, fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB as SklearnGNB
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)
print('✅ Naive Bayes ready!')


---
# 📖 Chapter 1: Bayes' Theorem

## The Foundation

$$P(y|X) = \frac{P(X|y)P(y)}{P(X)}$$

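- $P(y|X)$: **Posterior** (what we want)
- $P(X|y)$: **Likelihood** (estimated from data)
- $P(y)$: **Prior** (class frequency)
- $P(X)$: **Evidence** (the same for every class, so it can be ignored when comparing classes)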
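As a worked example, the formula can be evaluated by hand for a single binary feature. The numbers below are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from any real spam dataset.

```python
# Hypothetical numbers, chosen only to illustrate Bayes' theorem.
p_spam = 0.20                 # prior P(y = spam)
p_ham = 1.0 - p_spam          # prior P(y = ham)
p_word_given_spam = 0.60      # likelihood P(X = "free" | spam)
p_word_given_ham = 0.05       # likelihood P(X = "free" | ham)

# Evidence P(X) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham

# Posterior P(spam | X).
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | 'free') = {p_spam_given_word:.3f}")
```

With these assumed inputs the evidence is 0.16 and the posterior is 0.12 / 0.16 = 0.75: seeing the word raises the spam probability from 20% to 75%.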

## Naive Assumption

Features are **conditionally independent**:

$$P(X|y) = \prod_{i=1}^{n} P(x_i|y)$$

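**"Naive"**: The assumption rarely holds in real data. Classification only needs the correct class to receive the highest score, not an accurate probability, so NB often works well anyway.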
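In practice the product of many small per-feature likelihoods underflows in floating point, so implementations usually sum log-likelihoods instead. The sketch below uses randomly generated likelihood values (an assumption for illustration, not real model output) to show the difference.

```python
import numpy as np

# Hypothetical per-feature likelihoods P(x_i | y) for one sample and one class.
rng = np.random.default_rng(0)
likelihoods = rng.uniform(1e-4, 1e-2, size=500)

# Direct product: 500 factors of at most 1e-2 fall below the smallest float64.
naive_product = np.prod(likelihoods)

# Sum of logs: the same quantity on a log scale, which stays finite.
log_likelihood = np.sum(np.log(likelihoods))

print(naive_product)      # underflows to 0.0
print(log_likelihood)     # a large negative but finite number
```

Because the logarithm is monotonic, comparing classes by summed log-likelihood picks the same class as comparing the raw products would.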

## Variants

1. **Gaussian**: Continuous features, $x_i \mid y \sim \mathcal{N}(\mu_{iy}, \sigma_{iy}^2)$ (one mean and variance per feature and class)
2. **Multinomial**: Count data (word frequencies)
3. **Bernoulli**: Binary features
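To make the mapping from data type to variant concrete, here is a minimal sketch using scikit-learn's `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The tiny arrays are made up for illustration; only the pairing of variant and feature type matters.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # toy labels shared by all three datasets

# Gaussian: continuous measurements.
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
# Multinomial: non-negative counts (for example, word counts).
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Bernoulli: binary presence/absence derived from the counts.
X_bin = (X_counts > 0).astype(int)

pred_g = GaussianNB().fit(X_cont, y).predict([[1.1, 2.0]])
pred_m = MultinomialNB().fit(X_counts, y).predict([[2, 0, 1]])
pred_b = BernoulliNB().fit(X_bin, y).predict([[0, 1, 1]])

print(pred_g, pred_m, pred_b)
```

Note that `BernoulliNB` also penalizes the absence of a feature, while `MultinomialNB` only uses the features that occur, which is one reason the two can disagree on short documents.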


In [None]:
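class GaussianNaiveBayes:
    """Gaussian Naive Bayes: one independent normal per feature and class."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing  # variance floor, avoids division by zero
        self.classes = None
        self.mean = {}
        self.var = {}
        self.priors = {}

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes = np.unique(y)
        # Scale the floor by the largest feature variance, as sklearn's GaussianNB does
        eps = self.var_smoothing * X.var(axis=0).max()
        for c in self.classes:
            X_c = X[y == c]
            self.mean[c] = X_c.mean(axis=0)
            self.var[c] = X_c.var(axis=0) + eps
            self.priors[c] = X_c.shape[0] / X.shape[0]
        return self

    def _log_gaussian_pdf(self, x, mean, var):
        # Compute the log density directly so tiny densities never become log(0)
        return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # Unnormalized log posterior: log P(y) + sum_i log P(x_i | y)
            posts = [np.log(self.priors[c])
                     + np.sum(self._log_gaussian_pdf(x, self.mean[c], self.var[c]))
                     for c in self.classes]
            preds.append(self.classes[np.argmax(posts)])
        return np.array(preds)

    def score(self, X, y):
        return accuracy_score(y, self.predict(X))

print('✅ GaussianNaiveBayes complete!')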


---
# 🏭 Chapter 3: Real-World Use Cases

### 1. Gmail Spam Filtering 📧
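- **Impact**: Google reports that Gmail blocks more than 99.9% of spam, though its production filter combines many models and signals, not Naive Bayes alone
- **Type**: Multinomial NB on word counts is the classic baseline for this task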
- **Scale**: Billions of emails filtered daily

### 2. Sentiment Analysis (Twitter) 🐦
- **Problem**: Classify tweets as positive/negative
- **Impact**: Real-time brand monitoring
- **Advantage**: Fast, scalable

### 3. Medical Diagnosis 🏥
- **Problem**: Disease prediction from symptoms
- **Why NB**: Probabilistic output crucial for doctors
- **Example**: Flu vs Cold classification

### 4. Document Classification (Reuters) 📰
- **Problem**: Auto-categorize news articles
- **Impact**: Content recommendation
- **Features**: TF-IDF word vectors


In [None]:
# Test on Iris
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # fixed seed for a reproducible split
)

nb = GaussianNaiveBayes()
nb.fit(X_train, y_train)
our_acc = nb.score(X_test, y_test)

sklearn_nb = SklearnGNB()
sklearn_nb.fit(X_train, y_train)
sklearn_acc = sklearn_nb.score(X_test, y_test)

print('='*60)
print(f'Our NB:      {our_acc:.4f}')
print(f'Sklearn:     {sklearn_acc:.4f}')
print('='*60)


---
# 🎯 Exercises

## Exercise 1: Laplace Smoothing ⭐⭐
Add smoothing to handle zero probabilities
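As a starting point, additive (Laplace) smoothing replaces the raw estimate $\frac{N_{iy}}{N_y}$ with $\frac{N_{iy} + \alpha}{N_y + \alpha d}$, where $d$ is the number of features. The helper below is a minimal sketch; the function name `smoothed_feature_probs` and the example counts are hypothetical.

```python
import numpy as np

def smoothed_feature_probs(counts, alpha=1.0):
    """Return P(feature | class) with additive smoothing (alpha=0 disables it)."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

# Hypothetical feature counts for one class; the second feature was never seen.
counts = [3, 0, 1]
unsmoothed = smoothed_feature_probs(counts, alpha=0.0)
smoothed = smoothed_feature_probs(counts, alpha=1.0)

print(unsmoothed)  # the unseen feature gets probability 0
print(smoothed)    # every feature gets a non-zero probability
```

Without smoothing, a single unseen feature zeroes out the whole product for that class; with $\alpha = 1$ the unseen feature here receives probability 1/7 instead.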

## Exercise 2: Multinomial NB ⭐⭐⭐
Implement for text classification

## Exercise 3: Compare Variants ⭐⭐
Gaussian vs Multinomial vs Bernoulli

## Exercise 4: Handle Missing Data ⭐
Make prediction work when some feature values are missing (NaN)
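One possible approach, sketched under assumptions: because the likelihood factorizes over features, a missing feature can simply be left out of the sum of log-likelihoods, which is equivalent to marginalizing it out. The function name and the class parameters below are hypothetical, and missing values are assumed to be encoded as `NaN`.

```python
import numpy as np

def log_joint_ignoring_nan(x, mean, var, log_prior):
    """Gaussian NB log P(x_observed, y): NaN features are skipped."""
    observed = ~np.isnan(x)
    diff = x[observed] - mean[observed]
    log_pdf = -0.5 * (np.log(2 * np.pi * var[observed]) + diff ** 2 / var[observed])
    return log_prior + log_pdf.sum()

# Hypothetical per-class parameters (two features, equal priors).
mean0, var0 = np.array([1.0, 2.0]), np.array([0.5, 0.5])
mean1, var1 = np.array([4.0, 4.0]), np.array([0.5, 0.5])

x = np.array([1.2, np.nan])  # second feature is missing
scores = np.array([
    log_joint_ignoring_nan(x, mean0, var0, np.log(0.5)),
    log_joint_ignoring_nan(x, mean1, var1, np.log(0.5)),
])
pred = int(np.argmax(scores))
print(pred)
```

Imputing the class mean is another common option; skipping the term has the advantage of not inventing a value for the missing feature.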


---
# 🏆 Competition: Text Classification

Classify news articles into categories

**Baseline**: 75%
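A baseline for this kind of task could look like the sketch below: `CountVectorizer` feeding `MultinomialNB` in a scikit-learn pipeline. The four training sentences and their labels are a made-up toy corpus, not the competition data, so this does not reproduce the 75% figure.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus (hypothetical); a real run would load the competition articles here.
train_docs = [
    "the team won the match with a late goal",
    "the striker scored twice in the final game",
    "the new phone has a faster processor",
    "the laptop ships with more memory and a better screen",
]
train_labels = ["sports", "sports", "tech", "tech"]

# Word counts -> Multinomial NB with Laplace smoothing (alpha=1).
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(train_docs, train_labels)

pred = model.predict([
    "the goalkeeper saved the game",
    "a phone with a bigger screen",
])
print(pred)
```

Words never seen in training (such as "goalkeeper") are ignored by the vectorizer, so predictions rest on the overlapping vocabulary only.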


---
# 💡 Interviews

### Q1: Why "naive"?
**Answer**: Assumes feature independence (rarely true)

### Q2: When does it work well?
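**Answer**: High-dimensional, sparse problems such as text classification. The assumption is violated there too, but picking the right class only requires the correct ranking, not exact probabilities.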

### Q3: Gaussian vs Multinomial?
**Gaussian**: Continuous features
**Multinomial**: Count/frequency data

### Q4: Handling zero probabilities?
**Answer**: Laplace smoothing (add $\alpha$ to all counts)

### Q5: NB vs Logistic Regression?
**NB**: Generative, faster to train, and often better when training data is scarce
**LR**: Discriminative, usually more accurate given enough data, and better calibrated

### Q6: Computational complexity?
**Answer**: Training: O(nd) for n samples and d features. Prediction: O(cd) per sample for c classes.
Both are linear, so NB is extremely fast!

### Q7: Probabilistic output?
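**Answer**: Yes! It outputs P(class|features) directly. Because of the independence assumption these probabilities tend to be pushed toward 0 or 1, so treat them as poorly calibrated.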
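Turning per-class log scores into probabilities requires normalizing by the evidence, and doing that naively underflows. A minimal sketch using the log-sum-exp trick follows; the function name `posterior_from_log_joint` and the log scores are hypothetical.

```python
import numpy as np

def posterior_from_log_joint(log_joint):
    """Normalize per-class log P(X, y) into P(y | X) without underflow."""
    log_joint = np.asarray(log_joint, dtype=float)
    shifted = log_joint - log_joint.max()  # largest term becomes exp(0) = 1
    probs = np.exp(shifted)
    return probs / probs.sum()

# Hypothetical log joint scores for three classes.
log_joint = np.array([-1050.0, -1052.0, -1060.0])
posterior = posterior_from_log_joint(log_joint)

print(np.exp(log_joint).sum())  # naive evidence underflows to 0.0
print(posterior)                # valid probabilities that sum to 1
```

Subtracting the maximum before exponentiating leaves the normalized result unchanged, since the shift cancels between numerator and denominator.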


---
# 📊 Summary

## Key Takeaways
✅ **Very fast** training (O(nd), a single pass over the data)
✅ **Probabilistic** output
✅ **Handles high dimensions** well
✅ **Little training data** needed
⚠️ **Naive independence** assumption
⚠️ **Sensitive to feature distribution**

## When to Use
✅ Text classification
✅ Real-time predictions
✅ Baseline model
✅ Small datasets

---

## Next: K-Means for clustering
