# Lesson 07 - Naive Bayes for Text + Laplace Smoothing


## Objectives
- Implement multinomial Naive Bayes with Laplace smoothing.
- Understand the bag-of-words representation.
- Visualize class log-odds for keywords.


## From the notes

**Naive Bayes**
- Assume conditional independence: $p(x|y) = \prod_j p(x_j|y)$.
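- Laplace smoothing (multinomial event model): $\phi_{j|y} = \frac{\text{count}(j, y) + 1}{\sum_{j'} \text{count}(j', y) + V}$, where $\text{count}(j, y)$ is the number of times vocabulary word $j$ occurs across all training documents of class $y$, the denominator sums over the whole vocabulary (i.e. it is the total number of word tokens in class $y$), and $V$ is the vocabulary size. The code below generalizes the $+1$ to $+\alpha$ and $V$ to $\alpha V$.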

_TODO: Validate formulas in the CS229 main notes PDF._
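As a worked check of the smoothing formula, here is a small sketch that computes the estimate with exact fractions. It assumes the per-word counts of the three spam documents from the toy dataset in the Data section (counted by hand: "free" 3, "money" 2, "offer" 2, 7 tokens in total, $V = 6$). The helper name `smoothed_phi` is hypothetical and not part of the lesson code.

```python
from fractions import Fraction

def smoothed_phi(counts, word, alpha=1):
    """Additive-smoothed p(word | class): (count + alpha) / (total + alpha * V)."""
    V = len(counts)                  # vocabulary size
    total = sum(counts.values())     # total word tokens in this class
    return Fraction(counts[word] + alpha, total + alpha * V)

# Hand-counted word occurrences in the three spam documents of the toy dataset
spam_counts = {"free": 3, "money": 2, "hello": 0, "meeting": 0, "offer": 2, "project": 0}

print(smoothed_phi(spam_counts, "free"))             # 4/13
print(smoothed_phi(spam_counts, "hello"))            # 1/13, not 0
print(smoothed_phi(spam_counts, "hello", alpha=0))   # 0: the unsmoothed estimate
print(sum(smoothed_phi(spam_counts, w) for w in spam_counts))  # 1: still a distribution
```

Using `Fraction` keeps the arithmetic exact, so it is easy to see that adding one pseudo-count per word to the numerator requires adding $V$ to the denominator for the estimates to keep summing to one.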


## Intuition
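Naive Bayes scores each class by multiplying its prior $p(y)$ by the per-word likelihoods $p(x_j|y)$ and predicts the highest-scoring class. Because the score is a product, a single word never seen with a class during training (likelihood 0) would zero out that class's score regardless of the other words; Laplace smoothing adds a pseudo-count to every word so all likelihoods stay strictly positive. In practice the product is computed as a sum of log-probabilities to avoid floating-point underflow.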
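Both failure modes described above can be reproduced in a few lines. This sketch assumes the spam-class likelihoods of "free", "money" and "hello" from the toy data (unsmoothed 3/7, 2/7, 0/7; Laplace-smoothed 4/13, 3/13, 1/13), plus an artificial list of 100 likelihoods of 1e-5 chosen only to trigger underflow.

```python
import math

# Unsmoothed spam-class likelihoods for "free", "money", "hello": "hello" was never seen in spam
unsmoothed = [3 / 7, 2 / 7, 0 / 7]
print(math.prod(unsmoothed))        # 0.0: one zero wipes out all other evidence

# Laplace-smoothed versions of the same likelihoods (alpha = 1, V = 6, 7 spam tokens)
smoothed = [4 / 13, 3 / 13, 1 / 13]
print(math.prod(smoothed) > 0)      # True

# Multiplying many small likelihoods underflows float64; summing their logs does not
tiny = [1e-5] * 100
print(math.prod(tiny))                      # 0.0 (the true value is 1e-500)
print(sum(math.log(p) for p in tiny))       # roughly -1151.29
```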


## Data
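We create a toy email dataset of six short documents over a six-word vocabulary, labeled 1 for spam and 0 for ham (non-spam), and represent each document as a bag-of-words count vector.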
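A bag-of-words vector keeps only how often each vocabulary word occurs, so word order is discarded and out-of-vocabulary tokens are dropped. The following standalone sketch illustrates this with `collections.Counter`; `bag_of_words` is a hypothetical helper that mirrors what the notebook's `vectorize` computes, and the example sentences are made up for illustration.

```python
from collections import Counter

vocab = ["free", "money", "hello", "meeting", "offer", "project"]

def bag_of_words(text, vocab):
    """Count vector over a fixed vocabulary; tokens outside vocab are ignored."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]   # Counter returns 0 for missing keys

print(bag_of_words("free money offer", vocab))   # [1, 1, 0, 0, 1, 0]
print(bag_of_words("offer money free", vocab))   # [1, 1, 0, 0, 1, 0]: same vector, order lost
print(bag_of_words("free free lunch", vocab))    # [2, 0, 0, 0, 0, 0]: "lunch" is dropped
```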


In [None]:
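import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

vocab = ["free", "money", "hello", "meeting", "offer", "project"]
# (text, label) pairs: 1 = spam, 0 = ham
docs = [
    ("free money offer", 1),
    ("free offer", 1),
    ("hello project meeting", 0),
    ("project meeting", 0),
    ("free money", 1),
    ("hello meeting", 0),
]

def vectorize(text):
    """Bag-of-words count vector over `vocab`; out-of-vocabulary tokens are ignored."""
    tokens = text.split()
    return np.array([tokens.count(word) for word in vocab])

X = np.vstack([vectorize(t) for t, _ in docs])  # shape (n_docs, V)
y = np.array([label for _, label in docs])

def train_nb(X, y, alpha=1.0):
    """Fit multinomial Naive Bayes with additive smoothing (Laplace when alpha=1)."""
    V = X.shape[1]
    # Class priors p(y): fraction of training documents in each class
    class_priors = np.array([np.mean(y==0), np.mean(y==1)])
    # word_counts[c, j]: total occurrences of word j across class-c documents
    word_counts = np.array([X[y==c].sum(axis=0) for c in [0,1]])
    # total_counts[c]: total number of word tokens in class-c documents
    total_counts = word_counts.sum(axis=1, keepdims=True)
    # Smoothed p(word_j | class c); each row sums to 1
    word_probs = (word_counts + alpha) / (total_counts + alpha * V)
    return class_priors, word_probs

priors, word_probs = train_nb(X, y)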


## Experiments


In [None]:
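def predict_nb(x):
    # Log joint score log p(y) + sum_j x_j * log p(word_j | y) for each class;
    # log space avoids underflow from multiplying many small probabilities
    log_probs = np.log(priors) + (x * np.log(word_probs)).sum(axis=1)
    return np.argmax(log_probs)

preds = np.array([predict_nb(x) for x in X])
# Training accuracy: evaluated on the same six documents used for fitting
(preds == y).mean()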
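`predict_nb` returns only the argmax. If posterior probabilities $p(y \mid x)$ are wanted, the log joint scores can be normalized with the log-sum-exp trick. The sketch below is self-contained: it hardcodes the smoothed estimates for the toy data with $\alpha = 1$, worked out by hand as (word count + 1) / (7 tokens + 6 words) per class, and `predict_proba_nb` is a hypothetical name, not a function defined elsewhere in this lesson.

```python
import numpy as np

vocab = ["free", "money", "hello", "meeting", "offer", "project"]
# Hand-computed smoothed estimates for the toy data (alpha = 1); each row sums to 1
priors = np.array([0.5, 0.5])                       # [p(ham), p(spam)]
word_probs = np.array([[1, 1, 3, 4, 1, 3],          # ham row
                       [4, 3, 1, 1, 3, 1]]) / 13.0  # spam row

def predict_proba_nb(x, priors, word_probs):
    """Posterior p(y | x) for a bag-of-words count vector x."""
    # Log joint log p(y) + sum_j x_j log p(word_j | y), one entry per class
    log_joint = np.log(priors) + x @ np.log(word_probs).T
    # Log-sum-exp: subtract the max before exponentiating to stay numerically stable
    m = log_joint.max()
    log_evidence = m + np.log(np.exp(log_joint - m).sum())
    return np.exp(log_joint - log_evidence)

x_new = np.array([1, 0, 0, 1, 1, 0])   # counts for the new document "free meeting offer"
print(predict_proba_nb(x_new, priors, word_probs))   # approximately [0.25 0.75]
```

By hand, the ham score is proportional to $1 \cdot 4 \cdot 1 = 4$ and the spam score to $4 \cdot 1 \cdot 3 = 12$ (both over $13^3$, with equal priors), giving a spam posterior of $12/16 = 0.75$.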


## Visualizations


In [None]:
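# Per-word log-likelihood ratio log p(word|spam) - log p(word|ham):
# each occurrence of a word adds its bar to the spam-vs-ham log posterior odds
log_odds = np.log(word_probs[1]) - np.log(word_probs[0])
plt.figure(figsize=(6,4))
plt.bar(vocab, log_odds)
plt.title("Naive Bayes log-odds by word")
plt.xlabel("word")
plt.ylabel("log-likelihood ratio (spam vs ham)")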
plt.xticks(rotation=30)
plt.show()

plt.figure(figsize=(6,4))
plt.imshow(word_probs, aspect="auto", cmap="viridis")
plt.colorbar(label="p(word|class)")
plt.yticks([0,1], ["ham", "spam"])
plt.xticks(range(len(vocab)), vocab, rotation=30)
plt.title("Word likelihoods")
plt.show()
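The bar heights can be checked without plotting. In the toy data both classes happen to contain 7 word tokens, so the smoothed denominators cancel and each bar equals $\log\frac{\text{count}_{\text{spam}} + 1}{\text{count}_{\text{ham}} + 1}$. This standalone sketch assumes the hand-counted per-class word counts below and $\alpha = 1$.

```python
import math

vocab = ["free", "money", "hello", "meeting", "offer", "project"]
# Hand-counted word occurrences in the spam and ham documents of the toy data
spam_counts = [3, 2, 0, 0, 2, 0]
ham_counts = [0, 0, 2, 3, 0, 2]
alpha = 1.0
V = len(vocab)

def smoothed_probs(counts):
    """Additive-smoothed p(word | class) for every vocabulary word."""
    total = sum(counts)
    return [(c + alpha) / (total + alpha * V) for c in counts]

p_spam = smoothed_probs(spam_counts)
p_ham = smoothed_probs(ham_counts)

# Same quantity as the bar chart: log p(w|spam) - log p(w|ham)
log_ratio = {w: math.log(s) - math.log(h) for w, s, h in zip(vocab, p_spam, p_ham)}

for word, r in sorted(log_ratio.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:8s} {r:+.3f}")
```

"free" ends up on top at $\log 4 \approx +1.386$ and "meeting" at the bottom at $-\log 4 \approx -1.386$.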


## Takeaways
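- Laplace smoothing keeps every $p(\text{word} \mid \text{class})$ strictly positive, so a word that never appeared with a class in training cannot veto that class at prediction time.
- Naive Bayes is often competitive for text classification despite its unrealistic independence assumption: it is fast to train, works with little data, and classification only needs the highest-scoring class to be correct, not well-calibrated probabilities.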


## Explain it in an interview
- Explain how Laplace smoothing changes Naive Bayes probabilities.
- Describe why Naive Bayes works well for text classification.
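One way to explain what Laplace smoothing does: write $\text{count}(j, y)$ for the number of occurrences of word $j$ in class-$y$ documents, $N_y = \sum_{j'} \text{count}(j', y)$ for the total number of tokens in class $y$, and $\alpha$ for the smoothing constant (the `alpha` argument of `train_nb`; $\alpha = 1$ is Laplace). The smoothed estimate is then a weighted average of the unsmoothed maximum-likelihood estimate and the uniform distribution $1/V$:

$$
\phi_{j|y} = \frac{\text{count}(j, y) + \alpha}{N_y + \alpha V}
= \underbrace{\frac{N_y}{N_y + \alpha V}}_{\lambda_y} \cdot \frac{\text{count}(j, y)}{N_y}
+ (1 - \lambda_y) \cdot \frac{1}{V}.
$$

For the toy spam class, $N_y = 7$ and $V = 6$, so $\lambda_y = 7/13$: almost half of the weight is pulled toward uniform because the dataset is tiny. As $N_y$ grows, $\lambda_y \to 1$ and the estimate approaches the raw word frequency.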


## Exercises
- Add bigrams to the vocabulary and retrain.
- Try different smoothing constants and observe behavior.
- Implement Bernoulli Naive Bayes and compare.
