# 8. Multinomial Naive Bayes

**Purpose:** Learn and revise **Multinomial Naive Bayes** in Scikit-learn.

---

## What is Multinomial Naive Bayes?

**Multinomial Naive Bayes** is commonly used for **discrete count data**, especially **text** (e.g. word counts or term frequencies). It assumes features are **counts** and models the probability of counts with a multinomial distribution. For each class \( y \) and feature \( j \):

\[
P(x_j \mid y) \propto \theta_{jy}^{x_j}
\]

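where \( \theta_{jy} \) is the smoothed relative frequency of feature \( j \) among all feature counts in class \( y \), estimated from the training data (so \( \sum_j \theta_{jy} = 1 \) for each class). The multinomial coefficient does not depend on \( y \), so it cancels when comparing classes and can be dropped. The likelihood of a document/sample is then proportional to: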

\[
P(X \mid y) \propto \prod_{j} \theta_{jy}^{x_j}
\]
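To make the formulas concrete, the sketch below estimates \( \theta_{jy} \) by hand with the additive-smoothing estimator \( \theta_{jy} = (N_{jy} + \alpha) / (N_y + \alpha n) \), where \( N_{jy} \) is the total count of feature \( j \) in class \( y \), \( N_y = \sum_j N_{jy} \), and \( n \) is the number of features. It then scores a new sample in log space as \( \log P(y) + \sum_j x_j \log \theta_{jy} \). The toy matrix and the names `theta` and `log_prior` are illustrative assumptions; the comparison against `MultinomialNB`'s `feature_log_prob_` and `predict_proba` checks that the hand computation matches scikit-learn.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy counts: 4 samples x 3 features, 2 classes
X = np.array([[3, 0, 1],
              [2, 1, 0],
              [0, 2, 3],
              [1, 0, 4]])
y = np.array([0, 0, 1, 1])
alpha = 1.0
classes = np.unique(y)
n_features = X.shape[1]

# Smoothed per-class feature probabilities: theta[c, j]
theta = np.empty((len(classes), n_features))
for i, c in enumerate(classes):
    counts = X[y == c].sum(axis=0)  # N_{jy} for every feature j
    theta[i] = (counts + alpha) / (counts.sum() + alpha * n_features)

# Class priors from class frequencies (what fit_prior=True does)
log_prior = np.log(np.bincount(y) / len(y))

# Log joint score for a new sample: log P(y) + sum_j x_j * log theta_{jy}
x_new = np.array([[2, 1, 0]])
log_joint = log_prior + x_new @ np.log(theta).T
# Normalize in log space to obtain posterior probabilities
posterior = np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))

model = MultinomialNB(alpha=alpha).fit(X, y)
print(np.allclose(np.exp(model.feature_log_prob_), theta))  # True
print(np.allclose(model.predict_proba(x_new), posterior))   # True
```

Working in log space avoids underflow: with long documents, the raw product \( \prod_j \theta_{jy}^{x_j} \) quickly becomes too small to represent as a float.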

- **When to use:** Text classification (e.g. spam vs ham) with **count** or **tf-idf** features. Features must be **non-negative**: integer counts are the natural input, and fractional weights such as tf-idf also tend to work in practice.
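A minimal sketch of the text use case, assuming scikit-learn's `CountVectorizer` (word counts) and `TfidfVectorizer` (tf-idf weights) for feature extraction. The six-document corpus and its labels are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only
docs = [
    "win cash prize now",
    "claim your free prize",
    "free cash offer win",
    "meeting at noon tomorrow",
    "see you at the meeting",
    "lunch tomorrow at noon",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Raw word counts -> MultinomialNB
count_clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
count_clf.fit(docs, labels)

# tf-idf weights (fractional but non-negative) -> MultinomialNB
tfidf_clf = make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=1.0))
tfidf_clf.fit(docs, labels)

new_docs = ["free prize cash", "noon meeting tomorrow"]
print(count_clf.predict(new_docs))  # ['spam' 'ham']
print(tfidf_clf.predict(new_docs))  # ['spam' 'ham']
```

Both vectorizers output non-negative sparse matrices, which `MultinomialNB` accepts directly, so the pipeline needs no densifying or scaling step.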

## Concepts to Remember

| Concept | Description |
|--------|-------------|
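| **Counts** | Each \( x_j \) is a non-negative count (e.g. how often word \( j \) occurs); it enters the likelihood as the exponent of \( \theta_{jy} \). |
| **alpha** | Additive smoothing (`alpha=1` is Laplace, `0 < alpha < 1` is Lidstone) so words unseen in a class's training data don't get zero probability. |
| **fit_prior** | Whether to learn class priors \( P(y) \) from the data (True) or use uniform priors (False). |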
| **vs Bernoulli** | Multinomial: "how many times"; Bernoulli: "present or absent" (binary). |
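The three parameters in the table can be observed directly on a small fitted model. This sketch assumes a hypothetical imbalanced toy dataset (3 samples of class 0, 1 of class 1) and inspects scikit-learn's `class_log_prior_` and `feature_log_prob_` attributes. It also contrasts `MultinomialNB` with `BernoulliNB`, whose default `binarize=0.0` threshold turns every positive count into 1.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy data: 3 features, imbalanced classes (3 vs 1)
X = np.array([[4, 1, 0],
              [3, 0, 0],
              [5, 2, 0],
              [0, 1, 3]])
y = np.array([0, 0, 0, 1])

# fit_prior: learned priors follow class frequencies; False gives uniform priors
learned = MultinomialNB(fit_prior=True).fit(X, y)
uniform = MultinomialNB(fit_prior=False).fit(X, y)
print(np.exp(learned.class_log_prior_))  # [0.75 0.25]
print(np.exp(uniform.class_log_prior_))  # [0.5 0.5]

# alpha: feature 2 never occurs in class 0, so a smaller alpha
# pushes its smoothed probability closer to zero
p_unseen = {}
for a in (1.0, 0.01):
    m = MultinomialNB(alpha=a).fit(X, y)
    p_unseen[a] = np.exp(m.feature_log_prob_[0, 2])
print(p_unseen)

# Multinomial vs Bernoulli: the two query rows differ only in
# how often feature 0 occurs
X_q = np.array([[1, 1, 0],
                [6, 1, 0]])
mnb_lp = MultinomialNB().fit(X, y).predict_log_proba(X_q)
bnb_lp = BernoulliNB().fit(X, y).predict_log_proba(X_q)
print(mnb_lp)  # rows differ: counts matter
print(bnb_lp)  # rows identical: both binarize to [1, 1, 0]
```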

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```

```python
# Simulated count data: 120 samples, 4 "words", 2 classes
np.random.seed(42)
n = 120
X = np.random.poisson(lam=2, size=(n, 4))  # non-negative integer counts; zeros are fine
# Class 1 when words 0-1 occur more often in total than words 2-3
y = (X[:, 0] + X[:, 1] > X[:, 2] + X[:, 3]).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
# alpha=1.0 is Laplace smoothing
model = MultinomialNB(alpha=1.0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)  # one column per class, rows sum to 1

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print("First 3 predicted probabilities:\n", y_proba[:3].round(3))
```

## Key Takeaways

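- **MultinomialNB** expects **non-negative** features: integer counts such as bag-of-words, or fractional weights such as tf-idf, which also tend to work in practice.
- **alpha**: additive smoothing that prevents zero probabilities for tokens unseen in a class; values from about 0.01 to 1.0 are common starting points, best chosen by cross-validation.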
- Classic choice for **text classification** when using count-based or tf-idf feature vectors.
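Two of these takeaways can be checked in code. The sketch below assumes scikit-learn's input validation, which raises a `ValueError` when `MultinomialNB` receives negative values (as it would after standardizing features), and uses `GridSearchCV` to pick `alpha` by 5-fold cross-validation. The alpha grid and the simulated Poisson data are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

# Negative inputs (e.g. standardized features) are rejected
X_bad = np.array([[1.0, -0.5],
                  [0.2, 0.3]])
try:
    MultinomialNB().fit(X_bad, [0, 1])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Simulated count data, same labelling rule as the example above
rng = np.random.default_rng(0)
X = rng.poisson(lam=2, size=(200, 4))
y = (X[:, 0] + X[:, 1] > X[:, 2] + X[:, 3]).astype(int)

# Choose alpha by 5-fold cross-validated accuracy
search = GridSearchCV(
    MultinomialNB(),
    param_grid={"alpha": [0.01, 0.1, 0.5, 1.0]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

On real text data, the best `alpha` depends on vocabulary size and corpus size, so tuning it on held-out folds is usually safer than fixing it in advance.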