# 7. Bernoulli Naive Bayes

**Purpose:** Learn and revise **Bernoulli Naive Bayes** in Scikit-learn.

---

## What is Bernoulli Naive Bayes?

**Bernoulli Naive Bayes** assumes each feature is **binary** (0 or 1). It models the probability that feature \( j \) equals 1 given class \( y \):

\[
P(x_j = 1 \mid y) = p_{jy}
\]

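Each \( p_{jy} \) is estimated from the training data as the fraction of class-\( y \) samples in which \( x_j = 1 \). With smoothing parameter \( \alpha \), Scikit-learn uses \( \hat{p}_{jy} = (N_{jy} + \alpha) / (N_y + 2\alpha) \), where \( N_{jy} \) counts the class-\( y \) samples with \( x_j = 1 \) and \( N_y \) counts all class-\( y \) samples. Assuming the features are conditionally independent given the class, the likelihood for a sample \( X \) is: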

\[
P(X \mid y) = \prod_{j} p_{jy}^{x_j} (1 - p_{jy})^{1-x_j}
\]
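
As a small worked example of the product above, the sketch below evaluates the likelihood for one sample and one class. The probabilities in `p` and the sample `x` are made-up values chosen for illustration, not estimates from any dataset.

```python
import numpy as np

# Hypothetical values of P(x_j = 1 | y) for a single class y (illustrative only)
p = np.array([0.8, 0.1, 0.5])
# One binary sample with three features
x = np.array([1, 0, 1])

# P(X | y) = prod_j p_j^x_j * (1 - p_j)^(1 - x_j)
# Features that are present contribute p_j; absent features contribute (1 - p_j).
likelihood = np.prod(p**x * (1 - p) ** (1 - x))

# The same quantity in log space, which avoids underflow when there are many features
log_likelihood = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

print("likelihood:", likelihood)  # 0.8 * 0.9 * 0.5, i.e. about 0.36
print("log-likelihood:", log_likelihood)
```

Note that the absent feature (index 1) still affects the result through the factor \( 1 - p_{jy} \).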

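- **When to use:** Binary/Boolean features (e.g. word present/absent in text, yes/no flags). In Scikit-learn, **BernoulliNB** uses **binarize=0.0** by default: every value greater than 0 is mapped to 1 and everything else to 0. You can therefore pass count-like data directly, but the counts are reduced to presence/absence before fitting and predicting.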
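
The default binarization can be checked with a short sketch. The tiny count matrix `X_counts` and its labels are invented for illustration; the point is only that fitting on raw counts and fitting on a manually thresholded copy should give the same model.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical count-like data (e.g. word counts), 4 samples x 3 features
X_counts = np.array([[3, 0, 1],
                     [0, 2, 0],
                     [5, 0, 0],
                     [0, 1, 4]])
y = np.array([1, 0, 1, 0])

# Manual presence/absence version: anything > 0 becomes 1
X_binary = (X_counts > 0).astype(int)

# Default binarize=0.0 thresholds the counts internally
model_counts = BernoulliNB().fit(X_counts, y)
model_binary = BernoulliNB().fit(X_binary, y)

same = np.allclose(model_counts.feature_log_prob_, model_binary.feature_log_prob_)
print("Same fitted parameters:", same)
```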

## Concepts to Remember

| Concept | Description |
|--------|-------------|
| **Binary features** | Each \( x_j \in \{0, 1\} \); model learns \( P(x_j=1 \mid y) \) per class. |
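| **Binarize** | **binarize** is the threshold applied to the input (default 0.0): values > threshold become 1, all others 0. Set it to None only if the data is already binary. |
| **Laplace smoothing** | **alpha** (default 1.0) is added to the feature counts so that no estimated probability is exactly 0 or 1. alpha = 1 is Laplace smoothing; values between 0 and 1 give Lidstone smoothing. |
| **vs Multinomial NB** | Bernoulli: "word present or not", and an absent feature counts as evidence through \( 1 - p_{jy} \); Multinomial: "how many times" (counts). |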
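
The smoothing row can be verified numerically. The sketch below uses a made-up four-sample dataset and compares the fitted `feature_log_prob_` attribute with the smoothed estimate \( (N_{jy} + \alpha) / (N_y + 2\alpha) \) computed by hand; the variable `manual` is introduced here only for the comparison.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Illustrative data: feature 1 never appears in class 0
X = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [1, 1]])
y = np.array([0, 0, 1, 1])

alpha = 1.0
model = BernoulliNB(alpha=alpha).fit(X, y)

# Hand-computed smoothed estimates, one row per class:
# (count of x_j = 1 in class c + alpha) / (samples in class c + 2 * alpha)
manual = np.vstack([
    (X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
    for c in model.classes_
])

print("manual estimates:\n", manual)
print("from the model:\n", np.exp(model.feature_log_prob_))
```

Without smoothing, the estimate for feature 1 in class 0 would be 0/2 = 0; with alpha = 1 it becomes 1/4, so a test sample containing that feature does not force the class probability to zero.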

In [None]:
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [None]:
# Binary features: 5 features, two classes
np.random.seed(42)
n = 100
X = np.random.randint(0, 2, size=(n, 5))  # 0/1 only
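noise = np.random.randint(0, 2, n)  # one random bit of label noise per sample
# y = 1 when at least two of (feature 0, feature 1, noise bit) are set,
# so the label depends on the first two features but is not fully determined by them
y = ((X[:, 0] + X[:, 1] + noise) >= 2).astype(int)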

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
model = BernoulliNB(alpha=1.0)  # Laplace smoothing
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
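
To see where the output of `predict_proba` comes from, the following sketch rebuilds it from the fitted attributes `feature_log_prob_` and `class_log_prior_`. It uses its own small made-up dataset so that it runs independently of the cells above, and the names `jll` and `manual_proba` are chosen here for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Illustrative binary data: 6 samples x 3 features
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1]])
y = np.array([1, 1, 0, 0, 1, 0])

model = BernoulliNB(alpha=1.0).fit(X, y)

log_p = model.feature_log_prob_        # log P(x_j = 1 | y), shape (n_classes, n_features)
log_1mp = np.log1p(-np.exp(log_p))     # log P(x_j = 0 | y)

# Joint log-likelihood per class: log P(y) + sum_j [x_j log p + (1 - x_j) log(1 - p)]
jll = X @ log_p.T + (1 - X) @ log_1mp.T + model.class_log_prior_

# Normalize across classes (subtract the row maximum first for numerical stability)
jll = jll - jll.max(axis=1, keepdims=True)
manual_proba = np.exp(jll)
manual_proba /= manual_proba.sum(axis=1, keepdims=True)

print("Matches predict_proba:", np.allclose(manual_proba, model.predict_proba(X)))
```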

## Key Takeaways

- **BernoulliNB** expects (or binarizes to) **binary** features. Use for presence/absence data.
- **alpha**: smoothing parameter; helps when a feature never appears in a class (avoids \( P=0 \)).
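- **binarize**: threshold for mapping inputs to 0/1 (default 0.0, so any positive value becomes 1). Set it to another number to threshold continuous/count inputs differently, or to None if the input is already binary.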