## Naive Bayes: An Introduction

Naive Bayes is a straightforward and effective classification algorithm that applies **Bayes’ theorem** with the assumption that all features are **conditionally independent** given the class label.

### Bayes’ Theorem

Bayes’ theorem describes the probability of a class $C$ given observed data $X$:

$$
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
$$

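* $P(C|X)$: posterior probability of class $C$ given features $X$
* $P(X|C)$: likelihood of features $X$ given class $C$
* $P(C)$: prior probability of class $C$
* $P(X)$: evidence, the overall probability of features $X$; it is the same for every class, so it can be ignored when choosing the most probable class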

### Conditional Independence

Naive Bayes assumes that, once the class is known, each feature is independent of the others.
This lets the likelihood of the full feature vector $X = (x_1, \dots, x_n)$ factor into a product of per-feature terms:

$$
P(X|C) = P(x_1|C) \times P(x_2|C) \times \cdots \times P(x_n|C)
$$
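Combining this factorization with Bayes' theorem, and dropping $P(X)$ because it is the same for every class, a classifier picks the class with the largest $P(C)\prod_i P(x_i|C)$. Multiplying many small probabilities can underflow, so the comparison is usually done on log-probabilities instead. A minimal sketch of that decision rule; `naive_bayes_predict`, `safe_log`, and the toy probabilities are hypothetical, chosen only for illustration:

```python
import math

def safe_log(p):
    # log(0) is undefined; map it to -inf so a zero-probability class can never win
    return math.log(p) if p > 0 else -math.inf

def naive_bayes_predict(priors, likelihoods, x):
    """Return argmax_C [log P(C) + sum_i log P(x_i | C)].

    priors:      dict class -> P(C)
    likelihoods: dict class -> list of functions, one per feature, value -> P(x_i | C)
    x:           list of observed feature values
    P(X) is omitted: it is the same for every class, so it cannot change the argmax.
    """
    scores = {
        c: safe_log(prior) + sum(safe_log(f(v)) for f, v in zip(likelihoods[c], x))
        for c, prior in priors.items()
    }
    return max(scores, key=scores.get)

# Hypothetical example with two binary features ("contains free", "contains now")
priors = {"Spam": 0.5, "Not Spam": 0.5}
likelihoods = {
    "Spam":     [lambda v: 0.8 if v else 0.2, lambda v: 0.6 if v else 0.4],
    "Not Spam": [lambda v: 0.1 if v else 0.9, lambda v: 0.3 if v else 0.7],
}
print(naive_bayes_predict(priors, likelihoods, [1, 0]))  # Spam
print(naive_bayes_predict(priors, likelihoods, [0, 1]))  # Not Spam
```

For `[1, 0]` the Spam score is $0.5 \times 0.8 \times 0.4 = 0.16$ against $0.5 \times 0.1 \times 0.7 = 0.035$ for Not Spam; working in log space preserves that ordering because the logarithm is monotonic.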

---

## Main Types of Naive Bayes

### 1. Gaussian Naive Bayes

* Used for **continuous data** (like age or height)
* Assumes each feature follows a normal (Gaussian) distribution

**Example:**
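If “Age” for class “Buy” has mean 30 and variance 25, the likelihood of an observed age of 28 is the normal density at 28 (a density value rather than a true probability, which is fine because only its relative size across classes matters):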

$$
P(\text{Age}=28 | \text{Buy}) = \frac{1}{\sqrt{2\pi \times 25}} \exp\left(-\frac{(28-30)^2}{2 \times 25}\right)
$$
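The expression can be evaluated directly. A minimal sketch; `gaussian_pdf` is an illustrative name, and the result is cross-checked against `statistics.NormalDist` from the Python standard library (3.8 or later), which takes the standard deviation rather than the variance:

```python
import math
from statistics import NormalDist

def gaussian_pdf(x, mean, variance):
    """Normal density with the given mean and variance, used as P(x | C)."""
    coeff = 1.0 / math.sqrt(2 * math.pi * variance)
    return coeff * math.exp(-((x - mean) ** 2) / (2 * variance))

# Age = 28 under class "Buy" with mean 30 and variance 25 (standard deviation 5)
print(round(gaussian_pdf(28, mean=30, variance=25), 4))  # 0.0737

# Cross-check with the standard library implementation
print(round(NormalDist(mu=30, sigma=5).pdf(28), 4))  # 0.0737
```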

---

### 2. Multinomial Naive Bayes

* Used for **discrete counts**, such as word counts in text documents

**Example:**
Email: “free money free”, with word counts free = 2 and money = 1.
For class “Spam”, suppose:

* $P(\text{free}|\text{Spam}) = 0.5$
* $P(\text{money}|\text{Spam}) = 0.4$

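Raising each word probability to the power of its count (and dropping the multinomial coefficient, which is identical for every class):

$$
P(X|\text{Spam}) \propto 0.5^2 \times 0.4^1 = 0.25 \times 0.4 = 0.1
$$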
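Counting words and raising each probability to its count takes only a few lines. A minimal sketch; `multinomial_likelihood` is a hypothetical helper, and splitting on whitespace stands in for real tokenization:

```python
from collections import Counter

def multinomial_likelihood(text, word_probs):
    """Product of P(word | C) ** count(word) over the words in `text`.

    The result is proportional to P(X | C). Every word in `text` must
    appear in `word_probs`, otherwise a KeyError is raised.
    """
    counts = Counter(text.split())
    result = 1.0
    for word, count in counts.items():
        result *= word_probs[word] ** count
    return result

spam_probs = {"free": 0.5, "money": 0.4}  # the assumed values from the example
print(round(multinomial_likelihood("free money free", spam_probs), 4))  # 0.1
```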

---

### 3. Bernoulli Naive Bayes

* Used for **binary data** (feature is either present or absent)

**Example:**
Email: “free money”, which over the vocabulary {free, money, now} gives the binary feature vector [free=1, money=1, now=0].
For class “Spam”, suppose:

* $P(\text{free}|\text{Spam}) = 0.7$
* $P(\text{money}|\text{Spam}) = 0.5$
* $P(\text{now}|\text{Spam}) = 0.1$

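A present word contributes $P(\text{word}|\text{Spam})$ and an absent word contributes $1 - P(\text{word}|\text{Spam})$, so “now” contributes $1 - 0.1 = 0.9$:

$$
P(X|\text{Spam}) = 0.7 \times 0.5 \times (1 - 0.1) = 0.315
$$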
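The same rule as a short sketch; `bernoulli_likelihood` is an illustrative name, and `math.prod` requires Python 3.8 or later. Unlike the multinomial case, every vocabulary word contributes a factor, whether or not it appears:

```python
import math

def bernoulli_likelihood(x, p):
    """P(X | C) for binary features: prod_i p_i**x_i * (1 - p_i)**(1 - x_i).

    x: dict word -> 1 (present) or 0 (absent)
    p: dict word -> P(word = 1 | C)
    """
    # A present word contributes p, an absent word contributes 1 - p
    return math.prod(p[w] if x[w] else 1 - p[w] for w in x)

spam_p = {"free": 0.7, "money": 0.5, "now": 0.1}
x = {"free": 1, "money": 1, "now": 0}
print(round(bernoulli_likelihood(x, spam_p), 4))  # 0.315
```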

---

## Summary Table

| Model       | Suitable For    | Example      |
| ----------- | --------------- | ------------ |
| Gaussian    | Continuous data | Age, height  |
| Multinomial | Count data      | Word counts  |
| Bernoulli   | Binary data     | Word present |



## **1. Gaussian Naive Bayes: Worked Example**

Suppose we have the following training data:

| Person | Age | Class   |
| ------ | --- | ------- |
| A      | 22  | Buy     |
| B      | 28  | Buy     |
| C      | 25  | Not Buy |
| D      | 32  | Buy     |
| E      | 35  | Not Buy |

We want to predict the class for **Age = 30**.

**Calculate the mean and variance for each class** (population variance, dividing by the number of samples $n$):

* **Buy:** ages 22, 28, 32
  * Mean: $\mu_{\text{Buy}} = \frac{22 + 28 + 32}{3} \approx 27.33$
  * Variance: $\sigma^2_{\text{Buy}} = \frac{(22-27.33)^2 + (28-27.33)^2 + (32-27.33)^2}{3} \approx 16.89$

* **Not Buy:** ages 25, 35
  * Mean: $\mu_{\text{NotBuy}} = \frac{25 + 35}{2} = 30$
  * Variance: $\sigma^2_{\text{NotBuy}} = \frac{(25-30)^2 + (35-30)^2}{2} = 25$

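**Gaussian likelihoods:**

$$
P(30|\text{Buy}) = \frac{1}{\sqrt{2\pi \times 16.89}} \exp \left( -\frac{(30-27.33)^2}{2 \times 16.89} \right) \approx 0.0786
$$

$$
P(30|\text{NotBuy}) = \frac{1}{\sqrt{2\pi \times 25}} \exp \left( -\frac{(30-30)^2}{2 \times 25} \right) \approx 0.0798
$$

**Prediction:** with priors $P(\text{Buy}) = 3/5$ and $P(\text{NotBuy}) = 2/5$ taken from the class frequencies, the unnormalized posteriors are $0.6 \times 0.0786 \approx 0.0472$ for Buy and $0.4 \times 0.0798 \approx 0.0319$ for Not Buy, so Age = 30 is classified as **Buy**, even though the Not Buy likelihood is slightly higher.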

---

## **2. Multinomial Naive Bayes: Worked Example**

Suppose we have the following word counts in training emails:

| Email | Free | Money | Now | Class    |
| ----- | ---- | ----- | --- | -------- |
| 1     | 2    | 1     | 0   | Spam     |
| 2     | 1    | 0     | 1   | Spam     |
| 3     | 0    | 1     | 2   | Not Spam |

**Estimate word probabilities for Spam** (each word's share of all word occurrences in Spam emails):

* Total ‘free’ in Spam = 2 + 1 = 3
* Total ‘money’ in Spam = 1 + 0 = 1
* Total ‘now’ in Spam = 0 + 1 = 1
* Total words in Spam = 3 + 1 + 1 = 5

$$
\begin{aligned}
P(\text{free}|\text{Spam}) &= \frac{3}{5} = 0.6 \\
P(\text{money}|\text{Spam}) &= \frac{1}{5} = 0.2 \\
P(\text{now}|\text{Spam}) &= \frac{1}{5} = 0.2
\end{aligned}
$$

**Test email:** [free=1, money=1, now=0]

$$
P(X|\text{Spam}) \propto 0.6^1 \times 0.2^1 \times 0.2^0 = 0.6 \times 0.2 \times 1 = 0.12
$$

---

## **3. Bernoulli Naive Bayes: Worked Example**

Suppose we have binary data (presence/absence of words):

| Email | Free | Money | Now | Class    |
| ----- | ---- | ----- | --- | -------- |
| 1     | 1    | 1     | 0   | Spam     |
| 2     | 1    | 0     | 1   | Spam     |
| 3     | 0    | 1     | 1   | Not Spam |

**Estimate for Spam** (fraction of Spam emails that contain each word):

* $P(\text{free}=1|Spam) = 2/2 = 1.0$
* $P(\text{money}=1|Spam) = 1/2 = 0.5$
* $P(\text{now}=1|Spam) = 1/2 = 0.5$

**Test email:** [free=1, money=1, now=0]

$$
P(X|\text{Spam}) = 1.0^1 \times 0.5^1 \times (1-0.5)^1 = 1 \times 0.5 \times 0.5 = 0.25
$$



## Gaussian Naive Bayes in Python

```python
import math

# Training data
data = [
    {"age": 22, "label": "Buy"},
    {"age": 28, "label": "Buy"},
    {"age": 32, "label": "Buy"},
    {"age": 25, "label": "Not Buy"},
    {"age": 35, "label": "Not Buy"},
]

# Separate ages by class
buy_ages = [d["age"] for d in data if d["label"] == "Buy"]
notbuy_ages = [d["age"] for d in data if d["label"] == "Not Buy"]

# Calculate mean and variance per class
def mean(nums):
    return sum(nums) / len(nums)

def variance(nums, m):
    # Population (maximum-likelihood) variance: divide by n, as in the worked example
    return sum((x - m) ** 2 for x in nums) / len(nums)

mean_buy = mean(buy_ages)
var_buy = variance(buy_ages, mean_buy)

mean_notbuy = mean(notbuy_ages)
var_notbuy = variance(notbuy_ages, mean_notbuy)

# Gaussian density, used as the class-conditional likelihood p(age | class)
def gaussian(x, mu, var):
    return (1 / math.sqrt(2 * math.pi * var)) * math.exp(-((x - mu) ** 2) / (2 * var))

# Evaluate the likelihoods for age = 30
p_30_buy = gaussian(30, mean_buy, var_buy)
p_30_notbuy = gaussian(30, mean_notbuy, var_notbuy)

print(f"P(age=30 | Buy) = {p_30_buy:.4f}")
print(f"P(age=30 | Not Buy) = {p_30_notbuy:.4f}")
```

Output:

```
P(age=30 | Buy) = 0.0786
P(age=30 | Not Buy) = 0.0798
```
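The cell above stops at the class-conditional densities. A self-contained sketch of the remaining step, assuming priors estimated from class frequencies in the training table (3 of 5 rows are Buy), multiplies each density by its prior and normalizes by the evidence $P(X)$. It uses `statistics.fmean` and `statistics.pvariance` (Python 3.8 or later) in place of the hand-written helpers:

```python
import math
from statistics import fmean, pvariance

# Training ages per class, from the table in the worked example
ages = {"Buy": [22, 28, 32], "Not Buy": [25, 35]}
n_total = sum(len(v) for v in ages.values())

def gaussian(x, mu, var):
    """Normal density N(x; mu, var)."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

x = 30
scores = {}
for label, values in ages.items():
    prior = len(values) / n_total  # P(C) estimated from class frequencies
    # pvariance divides by n (population variance), matching the worked example
    scores[label] = prior * gaussian(x, fmean(values), pvariance(values))

# Dividing by the evidence P(X), the sum over classes, turns scores into posteriors
evidence = sum(scores.values())
posteriors = {label: s / evidence for label, s in scores.items()}

for label, post in posteriors.items():
    print(f"P({label} | age={x}) = {post:.3f}")
print("Prediction:", max(posteriors, key=posteriors.get))
```

Under these assumptions the Buy posterior comes out at roughly 0.60 against 0.40 for Not Buy: the larger prior outweighs the slightly higher Not Buy density.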


## Multinomial Naive Bayes in Python

```python
# Training data: word counts and class
data = [
    {"free": 2, "money": 1, "now": 0, "label": "Spam"},
    {"free": 1, "money": 0, "now": 1, "label": "Spam"},
    {"free": 0, "money": 1, "now": 2, "label": "Not Spam"},
]

# Estimate P(word | Spam) as each word's share of all word occurrences in Spam emails
spam = [d for d in data if d["label"] == "Spam"]
total_words = sum(d["free"] + d["money"] + d["now"] for d in spam)
p_free = sum(d["free"] for d in spam) / total_words
p_money = sum(d["money"] for d in spam) / total_words
p_now = sum(d["now"] for d in spam) / total_words

# Test email: free=1, money=1, now=0
test = {"free": 1, "money": 1, "now": 0}
# Likelihood up to the multinomial coefficient (constant across classes)
prob = (p_free ** test["free"]) * (p_money ** test["money"]) * (p_now ** test["now"])
print(f"P(X | Spam) = {prob:.4f}")
```

Output:

```
P(X | Spam) = 0.1200
```
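The cell above scores only the Spam class. Extending it to both classes, with priors taken from class frequencies (2 of 3 emails are Spam, an assumption of this sketch), exposes the zero-frequency problem: “free” never appears in the Not Spam email, so its estimate is 0 and the entire Not Spam likelihood collapses to 0 regardless of the other words. Implementations commonly add smoothing (for example Laplace, add-one) to the counts for this reason; the sketch deliberately leaves it out to match the worked example:

```python
from collections import defaultdict

# Training data: word counts and class (same as the cell above)
data = [
    {"free": 2, "money": 1, "now": 0, "label": "Spam"},
    {"free": 1, "money": 0, "now": 1, "label": "Spam"},
    {"free": 0, "money": 1, "now": 2, "label": "Not Spam"},
]
words = ["free", "money", "now"]
test = {"free": 1, "money": 1, "now": 0}

# Sum word counts and document counts per class
word_totals = defaultdict(lambda: dict.fromkeys(words, 0))
doc_counts = defaultdict(int)
for d in data:
    doc_counts[d["label"]] += 1
    for w in words:
        word_totals[d["label"]][w] += d[w]

scores = {}
for label, totals in word_totals.items():
    n_words = sum(totals.values())
    prior = doc_counts[label] / len(data)
    likelihood = 1.0
    for w in words:
        # Unsmoothed estimate: a word never seen in this class gets probability 0
        likelihood *= (totals[w] / n_words) ** test[w]
    scores[label] = prior * likelihood
    print(f"{label}: prior={prior:.3f}, P(X|C)={likelihood:.4f}, score={scores[label]:.4f}")

print("Prediction:", max(scores, key=scores.get))
```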


## Bernoulli Naive Bayes in Python

```python
# Training data: word presence (1 = present, 0 = absent) and class
data = [
    {"free": 1, "money": 1, "now": 0, "label": "Spam"},
    {"free": 1, "money": 0, "now": 1, "label": "Spam"},
    {"free": 0, "money": 1, "now": 1, "label": "Not Spam"},
]
words = ["free", "money", "now"]

# Estimate P(word=1 | Spam) as the fraction of Spam emails containing the word
spam = [d for d in data if d["label"] == "Spam"]
p = {w: sum(d[w] for d in spam) / len(spam) for w in words}

# Test email: free=1, money=1, now=0
test = {"free": 1, "money": 1, "now": 0}

# Every word contributes p**x * (1 - p)**(1 - x):
# p if the word is present, (1 - p) if it is absent
prob = 1.0
for w in words:
    prob *= p[w] ** test[w] * (1 - p[w]) ** (1 - test[w])
print(f"P(X | Spam) = {prob:.4f}")
```

Output:

```
P(X | Spam) = 0.2500
```
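The cell above scores only Spam for a single test email. A sketch that fits both classes and scores any binary vector makes the role of absent words visible; the `fit` and `score` helpers and the frequency-based priors are illustrative assumptions. Every Spam training email contains “free”, so $P(\text{free}=1|\text{Spam}) = 1$, and an email without “free” receives a Spam score of exactly 0:

```python
# Training data: word presence (1/0) and class, as in the cell above
data = [
    {"free": 1, "money": 1, "now": 0, "label": "Spam"},
    {"free": 1, "money": 0, "now": 1, "label": "Spam"},
    {"free": 0, "money": 1, "now": 1, "label": "Not Spam"},
]
words = ["free", "money", "now"]

def fit(rows_all):
    """Return {label: (prior, {word: P(word=1 | label)})} from plain relative frequencies."""
    model = {}
    for label in sorted({d["label"] for d in rows_all}):
        rows = [d for d in rows_all if d["label"] == label]
        p = {w: sum(d[w] for d in rows) / len(rows) for w in words}
        model[label] = (len(rows) / len(rows_all), p)
    return model

def score(model, x):
    """Unnormalized posterior P(C) * prod_i p_i**x_i * (1 - p_i)**(1 - x_i) per class."""
    scores = {}
    for label, (prior, p) in model.items():
        s = prior
        for w in words:
            # A present word contributes p, an absent word contributes 1 - p
            s *= p[w] if x[w] else 1 - p[w]
        scores[label] = s
    return scores

model = fit(data)
for x in ({"free": 1, "money": 1, "now": 0}, {"free": 0, "money": 1, "now": 1}):
    s = score(model, x)
    print(x, "->", max(s, key=s.get))
```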
