# Smoothing Techniques in Language Models

This notebook introduces smoothing in N-gram language models. Smoothing keeps a model from assigning zero probability to word sequences that never appeared in its training data.

## 🎛️ Advanced: Smoothing Techniques

**Handling the "zero probability" problem**

⚡ *Advanced technique to make Language Models more robust*

## What is Smoothing?

- 🎯 **Problem:** Unseen N-grams get a probability of exactly 0
- ✨ **Solution:** Move some probability mass from seen N-grams to unseen ones
- 🛡️ **Benefit:** Model doesn't break on new data
- ⚖️ **Trade-off:** Seen N-grams give up some probability, so their estimates become less accurate in exchange for robustness
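
To make the problem concrete, here is a minimal sketch of an unsmoothed bigram model. The toy corpus and the helper names (`mle_probability`, `sentence_probability`) are assumptions made up for this illustration. A single unseen bigram is enough to drive the probability of a whole sentence to zero.

```python
from collections import Counter

# Hypothetical toy corpus, chosen only for illustration
corpus = "i love cats . cats are cute . i love dogs .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def mle_probability(prev_word, word):
    """Unsmoothed estimate: count(prev_word, word) / count(prev_word)."""
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

def sentence_probability(words):
    """Multiply the bigram probabilities along the sentence."""
    prob = 1.0
    for prev_word, word in zip(words, words[1:]):
        prob *= mle_probability(prev_word, word)
    return prob

seen = sentence_probability("i love cats".split())
unseen = sentence_probability("i love cute cats".split())

print(f"Sentence made of seen bigrams:  {seen}")
print(f"Sentence with an unseen bigram: {unseen}")
```

Every word in the second sentence appears in the corpus, but the pair "love cute" never does, so the product collapses to 0.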

## Common Smoothing Methods

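- ➕ **Add-One (Laplace):** Add 1 to every N-gram count before normalizing
- 📊 **Good-Turing:** Estimate the mass for unseen N-grams from how many N-grams were seen exactly once
- 🔙 **Back-off:** Fall back to a shorter N-gram when the longer one was never seen
- 🎯 **Interpolation:** Blend the estimates from several N-gram sizes using weights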
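
Back-off and interpolation from the list above can be sketched as follows. This is a simplified illustration, not a reference implementation: the toy corpus, the function names, the `penalty` of 0.4 and the `bigram_weight` of 0.7 are all assumptions. The back-off variant shown is the simple "stupid back-off" style, which returns an unnormalized score and not a true probability.

```python
from collections import Counter

# Hypothetical toy corpus, for illustration only
corpus = "i love cats . cats are cute . i love dogs .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
total_words = len(corpus)

def unigram_prob(word):
    return unigram_counts[word] / total_words

def bigram_prob(prev_word, word):
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

def backoff_score(prev_word, word, penalty=0.4):
    """Use the bigram estimate if the bigram was seen,
    otherwise fall back to a penalized unigram estimate."""
    if bigram_counts[(prev_word, word)] > 0:
        return bigram_prob(prev_word, word)
    return penalty * unigram_prob(word)

def interpolated_prob(prev_word, word, bigram_weight=0.7):
    """Weighted blend of the bigram and unigram estimates (weights sum to 1)."""
    return (bigram_weight * bigram_prob(prev_word, word)
            + (1 - bigram_weight) * unigram_prob(word))

for pair in [("i", "love"), ("love", "cute")]:
    print(pair,
          f"backoff={backoff_score(*pair):.4f}",
          f"interpolated={interpolated_prob(*pair):.4f}")
```

The pair `("love", "cute")` never occurs in the corpus, yet both functions give it a nonzero value because the unigram estimate for "cute" is nonzero. In practice the weights would be tuned on held-out data.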

## Simple Add-One Smoothing

Below is an example of how add-one smoothing can be implemented in Python.

In [None]:
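# Before smoothing: any bigram missing from this table gets probability 0
bigram_counts = {"I love": 5, "love cats": 3, "cats are": 2}
vocab_size = 1000  # V: number of distinct words in the vocabulary

# Add-one smoothing for P(next word | context word):
# add 1 to the bigram count and V to the context word's count,
# so the V possible next words still sum to probability 1
def smooth_probability(count, context_count, vocab_size):
    return (count + 1) / (context_count + vocab_size)

# Illustrative assumption: the context word "I" appears 10 times in the corpus
context_count = 10

# Now even unseen bigrams get a small probability > 0
# ("I hate" is a made-up example that is not in the table, so its count is 0)
unseen_prob = smooth_probability(bigram_counts.get("I hate", 0), context_count, vocab_size)
seen_prob = smooth_probability(bigram_counts["I love"], context_count, vocab_size)

print(f"Unseen bigram probability: {unseen_prob:.6f}")
print(f"Seen bigram probability: {seen_prob:.6f}")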

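**Result:** No more zero probabilities! The unseen bigram now gets about 0.00099 instead of 0. The cost is visible too: the seen bigram's estimate drops from 5/10 = 0.5 to about 0.0059, because add-one moves a lot of probability mass when the vocabulary is much larger than the counts.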
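
As a sanity check, the sketch below builds the counts from a tiny corpus and confirms that the add-one estimates for one context word form a proper distribution. The corpus and the name `add_one_prob` are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical toy corpus, for illustration only
corpus = "i love cats . cats are cute . i love dogs .".split()
vocab = sorted(set(corpus))

# Count each word only where it acts as a context (it has a following word)
context_counts = Counter(corpus[:-1])
bigram_counts = Counter(zip(corpus, corpus[1:]))

def add_one_prob(prev_word, word):
    """P(word | prev_word) with add-one smoothing."""
    return (bigram_counts[(prev_word, word)] + 1) / (context_counts[prev_word] + len(vocab))

# Smoothed distribution over every possible word following "love"
distribution = {word: add_one_prob("love", word) for word in vocab}
total = sum(distribution.values())

for word, prob in distribution.items():
    print(f"P({word!r} | 'love') = {prob:.4f}")
print(f"Sum over vocabulary: {total:.4f}")
```

Every word gets a nonzero probability after "love", and the probabilities still sum to 1 because the denominator grows by exactly the vocabulary size.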

## Why Smoothing Matters

**🎯 Real-world robustness:**

- 📱 **Autocorrect:** Handles typos and new words
- 🌐 **Search:** Works with never-seen query combinations
- 🗣️ **Speech recognition:** Manages accents and variations
- 🧠 **Foundation:** Led to more sophisticated neural approaches

*Modern neural language models smooth implicitly: they share parameters across similar words, so unseen word combinations still receive nonzero probability.*
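
One ingredient of that implicit smoothing, shown here as an illustration and not as a full account, is the softmax layer that neural language models typically use to turn scores into probabilities. The `softmax` helper and the example scores below are assumptions for this sketch. For finite scores of moderate size, every candidate word ends up with a probability above zero.

```python
import math

def softmax(logits):
    """Convert real-valued scores into probabilities that sum to 1."""
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores a model might assign to four candidate next words
logits = [4.0, 1.5, -2.0, -6.0]
probs = softmax(logits)

for score, prob in zip(logits, probs):
    print(f"score={score:5.1f} -> probability={prob:.6f}")
```

Even the lowest-scoring word keeps a small positive probability, which is the same guarantee add-one smoothing provides for count-based models.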