# Week 6 — Naive Bayes (Generative Text Classifier)

Objectives
- Review joint probability, conditional probability, and Bayes’ theorem
- Understand Naive Bayes as a generative classifier with conditional independence assumptions
- Compare generative vs. discriminative models
- Implement Bernoulli and Multinomial Naive Bayes for text classification
- Train and evaluate on a small spam dataset


In [None]:
import math, random, sys, os
from pprint import pprint

from utils import (
    show_result, tokenize, build_vocab, vectorize_bow, train_test_split,
    NaiveBayesText, accuracy, confusion_matrix, tiny_spam_dataset,
    test_exercise_1_probability, test_exercise_2_nb_fit_predict, test_exercise_3_smoothing
)


## 1. Probability Warm‑up

Definitions
- Joint: $p(a,b)$; if $a$ and $b$ are independent, $p(a,b) = p(a)\,p(b)$
- Conditional: $p(a\mid b) = \frac{p(a,b)}{p(b)}$, with $p(b) > 0$
- Bayes’ theorem: $p(a \mid b) = \frac{p(b \mid a)\,p(a)}{p(b)}$, with $p(b) > 0$
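
As a quick numeric check of the definitions above, here is a small worked example. The probabilities are made up for illustration (they are not estimated from the course dataset), and the variable names are hypothetical.

```python
# Assumed (made-up) quantities for a single word, "free":
p_spam = 0.4                # prior p(spam)
p_ham = 1.0 - p_spam        # prior p(ham)
p_free_given_spam = 0.5     # p("free" | spam)
p_free_given_ham = 0.05     # p("free" | ham)

# Law of total probability: p(free) = sum over classes of p(free | y) p(y)
p_free = p_free_given_spam * p_spam + p_free_given_ham * p_ham

# Bayes' theorem: p(spam | free) = p(free | spam) p(spam) / p(free)
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(f"p(free) = {p_free:.3f}")                  # about 0.230
print(f"p(spam | free) = {p_spam_given_free:.3f}")  # about 0.870
```

Seeing the word moves the probability of spam from the prior 0.4 to roughly 0.87, because the word is ten times more likely under spam than under ham in this toy setup.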

Implement the functions below.


In [None]:
# Implement the following functions.
def joint(p_a, p_b):
    """Assume independence: p(a,b) = p(a)*p(b)."""
    raise NotImplementedError

def conditional(p_ab, p_b):
    """p(a|b) = p(a,b) / p(b), assuming p(b) > 0."""
    raise NotImplementedError

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: p(a|b) = p(b|a) p(a) / p(b)."""
    raise NotImplementedError


In [None]:
res = test_exercise_1_probability({"joint": joint, "conditional": conditional, "bayes": bayes})
show_result("Exercise 1 – Probability", res)


## 2. Naive Bayes as a Generative Model

- Model $p(x \mid y)$ and $p(y)$, then compute $p(y \mid x) \propto p(x \mid y)\,p(y)$ by Bayes’ rule
- Naive assumption: features are conditionally independent given $y$, so $p(x \mid y) = \prod_j p(x_j \mid y)$
- Bernoulli variant uses binary word presence; Multinomial uses word counts
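
To make the two variants concrete, the following is a minimal from-scratch sketch on a four-document toy corpus. It is not the `NaiveBayesText` implementation from `utils` (whose internals are not shown here). The corpus, the function names, and the choice to skip out-of-vocabulary tokens are all assumptions made for illustration.

```python
import math

# Toy corpus (hypothetical, not the course dataset): 1 = spam, 0 = ham.
docs = [
    (["free", "prize", "now"], 1),
    (["free", "free", "offer"], 1),
    (["meeting", "at", "noon"], 0),
    (["lunch", "at", "noon", "now"], 0),
]
vocab = sorted({w for words, _ in docs for w in words})
classes = [0, 1]
alpha = 1.0  # additive (Laplace) smoothing

def log_prior(c):
    """log p(y = c), estimated as the fraction of documents in class c."""
    n_c = sum(1 for _, y in docs if y == c)
    return math.log(n_c / len(docs))

def bernoulli_log_likelihood(words, c):
    """Sum over the whole vocabulary: absent words contribute log(1 - p)."""
    class_docs = [set(ws) for ws, y in docs if y == c]
    present = set(words)
    total = 0.0
    for w in vocab:
        doc_freq = sum(1 for ws in class_docs if w in ws)
        p = (doc_freq + alpha) / (len(class_docs) + 2 * alpha)
        total += math.log(p if w in present else 1.0 - p)
    return total

def multinomial_log_likelihood(words, c):
    """Sum over tokens: repeated words count each time; unknown tokens are skipped."""
    counts = {w: 0 for w in vocab}
    for ws, y in docs:
        if y == c:
            for w in ws:
                counts[w] += 1
    denom = sum(counts.values()) + alpha * len(vocab)
    return sum(math.log((counts[w] + alpha) / denom) for w in words if w in counts)

def posterior(words, log_likelihood):
    """Normalize log p(y) + log p(x | y) into p(y | x) with a stable softmax."""
    scores = {c: log_prior(c) + log_likelihood(words, c) for c in classes}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: v / z for c, v in exps.items()}

query = ["free", "prize", "now"]
post_bern = posterior(query, bernoulli_log_likelihood)
post_mult = posterior(query, multinomial_log_likelihood)
print("Bernoulli  :", post_bern)
print("Multinomial:", post_mult)
```

Both variants apply the same rule, prior times a product of per-feature likelihoods. They differ only in what counts as a feature: the Bernoulli version scores every vocabulary word as present or absent, while the Multinomial version scores each token occurrence.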


In [None]:
texts, labels = tiny_spam_dataset()
print(f"Dataset size: {len(texts)}  |  ham={sum(1 for y in labels if y==0)}  spam={sum(1 for y in labels if y==1)}")
for t, y in zip(texts, labels):
    print(f"[{y}] {t}")


In [None]:
vocab = build_vocab(texts, min_freq=1, max_size=2000)
print(f"Vocabulary (size={len(vocab)}): {list(vocab)[:10]}{'...' if len(vocab) > 10 else ''}")

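# Binary features: 1 if the word occurs in the text, 0 otherwise (Bernoulli NB)
X_bin = vectorize_bow(texts, vocab, binary=True)

# Count features: number of times each word occurs in the text (Multinomial NB)
X_cnt = vectorize_bow(texts, vocab, binary=False)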

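# Use the same test_size and seed for both calls so the binary and count
# splits (and their labels) are intended to line up.
Xtr_bin, Xte_bin, ytr, yte = train_test_split(X_bin, labels, test_size=0.3, seed=7)
Xtr_cnt, Xte_cnt, _, _ = train_test_split(X_cnt, labels, test_size=0.3, seed=7)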

## 3. Fit a Naive Bayes Classifier

Complete `student_fit_func(...)`:
1) Build vocabulary
2) Vectorize (binary for Bernoulli, counts for Multinomial)
3) Split into train/test
4) Train `NaiveBayesText(mode, alpha)` and return test accuracy
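
Steps 3 and 4 rely on a reproducible split and an accuracy score. The sketch below shows one common way to do both in plain Python. `split_indices` and `simple_accuracy` are hypothetical helpers written for this illustration; the `train_test_split` and `accuracy` functions in `utils` may behave differently.

```python
import random

def split_indices(n, test_size=0.3, seed=7):
    """Shuffle indices 0..n-1 with a fixed seed and cut off a test portion."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # local RNG: same seed gives same order
    n_test = max(1, int(round(n * test_size)))
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)

def simple_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Two calls with the same seed produce the same partition.
train_a, test_a = split_indices(10)
train_b, test_b = split_indices(10)
print(train_a == train_b and test_a == test_b)

print(simple_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Fixing the seed matters on a dataset this small: a different partition can change test accuracy noticeably, so results are only comparable across runs when the split is held constant.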


In [None]:
def student_fit_func(texts, labels, mode='bernoulli', alpha=1.0):
    """
    Returns test accuracy on the tiny dataset.
    """
    raise NotImplementedError


In [None]:
res = test_exercise_2_nb_fit_predict(student_fit_func)
show_result("Exercise 2 – Fit & Predict", res)


In [None]:
sample_text = "free prize claim now"
nb_bin = NaiveBayesText(mode='bernoulli', alpha=1.0)
nb_bin.fit(Xtr_bin, ytr)
vec = vectorize_bow([sample_text], vocab, binary=True)
proba = nb_bin.predict_proba(vec)[0]
pred_label = nb_bin.predict(vec)[0]
label_name = 'spam' if pred_label == 1 else 'ham'
print(f"Sample text: '{sample_text}'")
print(f"Predicted label: {pred_label} ({label_name})")
print(f"Posterior probabilities -> ham: {proba[0]:.3f}, spam: {proba[1]:.3f}")

## 4. Smoothing

Implement `student_train_eval(alpha, mode)` to train a single model with smoothing parameter $\alpha$ in the given mode and return `(train_acc, test_acc)`. Then try several values of $\alpha$.
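
For intuition about what $\alpha$ does, here is a minimal sketch of additive smoothing on a single probability estimate. The function name and the counts are hypothetical, and it is only an assumption that `NaiveBayesText` applies $\alpha$ in this form.

```python
def smoothed_estimate(count, total, alpha, n_outcomes):
    """Additive smoothing: (count + alpha) / (total + alpha * n_outcomes)."""
    return (count + alpha) / (total + alpha * n_outcomes)

# Bernoulli-style example: a word is either present or absent (2 outcomes).
# Suppose one word appears in 0 of 4 spam documents and another in 3 of 4.
for alpha in [0.0, 0.1, 1.0, 5.0]:
    unseen = smoothed_estimate(0, 4, alpha, 2)
    common = smoothed_estimate(3, 4, alpha, 2)
    print(f"alpha={alpha:.1f} -> unseen={unseen:.3f} | common={common:.3f}")
```

With $\alpha = 0$ the unseen word gets probability exactly 0, which would zero out the whole product for any document containing it. Larger $\alpha$ pulls every estimate toward the uniform value $1/2$, so very large values wash out the evidence in the counts.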


In [None]:
def student_train_eval(alpha=1.0, mode='bernoulli'):
    """
    Train Naive Bayes with the given alpha; return (train_acc, test_acc).
    """
    raise NotImplementedError

res = test_exercise_3_smoothing(student_train_eval)
show_result("Exercise 3 – Smoothing", res)

for a in [0.1, 0.5, 1.0, 2.0, 5.0]:
    tr, te = student_train_eval(a, mode='bernoulli')
    print(f"alpha={a:.1f} -> train={tr:.3f} | test={te:.3f}")


## 5. Generative vs. Discriminative (Short Answer)

1) How does a generative classifier differ from a discriminative classifier?  
2) Why can Naive Bayes be viewed as a simple text generator?  
3) Briefly relate Naive Bayes to modern generative models (e.g., GPT).


_Answer here._
