# 📌 Laplace Smoothing with Naïve Bayes

This notebook demonstrates **Laplace Smoothing** in a simple **Naïve Bayes text classifier**.
We will:
- Create a small dataset of text messages (spam vs not spam)
- Train a Naïve Bayes classifier
- Apply Laplace Smoothing to handle zero probabilities
- Predict categories for test messages


## 📌 Step 1: Load Required Libraries

In [None]:
import numpy as np
from collections import defaultdict


## 📌 Step 2: Define Training Data
We use a small dataset of text messages labeled as **Spam (1)** or **Not Spam (0)**.

In [None]:
training_data = [
    ("buy cheap meds now", 1),
    ("discount on new phones", 1),
    ("win lottery cash prize", 1),
    ("hello, how are you?", 0),
    ("let's meet for coffee", 0),
    ("have a great day", 0)
]

## 📌 Step 3: Tokenization and Vocabulary Creation
We split sentences into words and count their occurrences in spam and non-spam messages.

In [None]:
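import re

def tokenize(text):
    # Lowercase, then keep runs of letters, digits, and apostrophes so that
    # punctuation does not create separate tokens ("hello," becomes "hello")
    return re.findall(r"[a-z0-9']+", text.lower())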

# Create word count dictionaries
word_counts = {0: defaultdict(int), 1: defaultdict(int)}
class_counts = {0: 0, 1: 0}
vocab = set()

# Populate word frequency per class
for text, label in training_data:
    words = tokenize(text)
    class_counts[label] += 1
    for word in words:
        word_counts[label][word] += 1
        vocab.add(word)

# Vocabulary Size
V = len(vocab)

## 📌 Step 4: Laplace Smoothing Function
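We use the additive-smoothing formula, where \(\alpha\) is the smoothing constant (`alpha` in the code):

\[
P(w \mid C) = \frac{\text{count}(w, C) + \alpha}{\sum_{w' \in V} \text{count}(w', C) + \alpha |V|}
\]

Setting \(\alpha = 1\) gives Laplace smoothing. Because the numerator is always at least \(\alpha\), every word gets a nonzero probability in every class, even one that never appeared in that class during training.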
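
As a quick sanity check on the formula, the sketch below applies it to assumed counts: a class with 12 tokens in total and a vocabulary of 24 distinct words, which should match the toy dataset used in this notebook. The helper name `smoothed_prob` is illustrative and not part of the notebook's code.

```python
def smoothed_prob(count, total, vocab_size, alpha=1):
    """Additive (Laplace) smoothing estimate of P(w | C)."""
    return (count + alpha) / (total + alpha * vocab_size)

total_tokens = 12   # assumed: tokens observed in one class
vocab_size = 24     # assumed: distinct words across both classes

unseen_raw = 0 / total_tokens                                  # 0.0 without smoothing
unseen_smoothed = smoothed_prob(0, total_tokens, vocab_size)   # 1/36
seen_smoothed = smoothed_prob(1, total_tokens, vocab_size)     # 2/36

# 12 words seen once in this class plus 12 never seen in it:
# the smoothed estimates still sum to 1 over the vocabulary
total_mass = 12 * seen_smoothed + 12 * unseen_smoothed

print(unseen_raw, unseen_smoothed, seen_smoothed, total_mass)
```

Without smoothing, a single unseen word would zero out the whole product for that class (or produce `log(0)` in log space). With smoothing, it only lowers the score.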

In [None]:
def laplace_probability(word, label, alpha=1):
    # Frequency of word in class; .get() returns 0 for unseen words
    # without adding new keys to the defaultdict
    word_freq = word_counts[label].get(word, 0)
    total_words_in_class = sum(word_counts[label].values())  # Total words in class
    return (word_freq + alpha) / (total_words_in_class + V * alpha)

## 📌 Step 5: Prediction Function
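For each class, we add the log prior to the log of each word's smoothed probability, then choose the class with the higher total. Working in log space avoids the numerical underflow that comes from multiplying many small probabilities.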
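
To illustrate why log space is used, the sketch below multiplies many small probabilities directly and compares that with summing their logs. The message length (400 words) and the per-word probabilities are made-up values chosen for illustration.

```python
import math

n_words = 400        # hypothetical long message
p_spam = 2 / 36      # assumed per-word probability under the spam class
p_ham = 1 / 36       # assumed per-word probability under the not-spam class

# Direct products underflow to 0.0 in 64-bit floats
product_spam = 1.0
product_ham = 1.0
for _ in range(n_words):
    product_spam *= p_spam
    product_ham *= p_ham

# Sums of logs stay finite and remain comparable
log_spam = n_words * math.log(p_spam)
log_ham = n_words * math.log(p_ham)

print(product_spam, product_ham)   # both collapse to 0.0, so they cannot be ranked
print(log_spam > log_ham)          # the log scores still rank spam higher
```

Because the logarithm is monotonic, comparing log scores selects the same class as comparing the raw products would, as long as those products are representable.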

In [None]:
def predict(text):
    words = tokenize(text)
    probs = {0: np.log(class_counts[0] / sum(class_counts.values())),
             1: np.log(class_counts[1] / sum(class_counts.values()))}

    for word in words:
        probs[0] += np.log(laplace_probability(word, 0))
        probs[1] += np.log(laplace_probability(word, 1))

    return 1 if probs[1] > probs[0] else 0  # Higher log score wins; ties fall back to not spam (0)

## 📌 Step 6: Testing the Model
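Let's test the model on new messages. Several of them contain words that never appear in the training data (for example, "tomorrow" and "available"), which is exactly the case Laplace smoothing is meant to handle.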
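
Before running the cell, one prediction can be checked by hand. The sketch below recomputes the log scores for "win cash now" from assumed summary numbers: 24 vocabulary words, 12 tokens per class, and equal priors of 3/6, which should correspond to the six training messages above. The helper `log_score` is a stand-in written for this check, not the notebook's `predict`.

```python
import math

V = 24          # assumed: distinct words across the training messages
N_SPAM = 12     # assumed: total tokens in the spam messages
N_HAM = 12      # assumed: total tokens in the not-spam messages
PRIOR = 3 / 6   # three messages per class

def log_score(counts, total, vocab_size=V, alpha=1):
    """Log prior plus the sum of log smoothed word probabilities."""
    score = math.log(PRIOR)
    for count in counts:
        score += math.log((count + alpha) / (total + alpha * vocab_size))
    return score

# "win", "cash", "now" each occur once in spam and never in not-spam
spam_score = log_score([1, 1, 1], N_SPAM)   # about -9.36
ham_score = log_score([0, 0, 0], N_HAM)     # about -11.44

predicted = 1 if spam_score > ham_score else 0
print(spam_score, ham_score, predicted)
```

Under these assumptions the spam score is higher by 3·ln 2, so the message is classified as spam (1).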

In [None]:
test_messages = [
    "win cash now",
    "coffee meeting tomorrow",
    "cheap phones available",
    "great day ahead"
]

for message in test_messages:
    print(f"Message: '{message}' → Predicted Class: {predict(message)}")