# Smoothing Techniques in Language Models

This notebook introduces smoothing in language models. Smoothing prevents a model from assigning zero probability to word sequences it never saw during training.

## 🎛️ Advanced: Smoothing Techniques

**Handling the "zero probability" problem**

⚡ *Advanced technique to make language models more robust*

## What is Smoothing?

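- 🎯 **Problem:** Unseen N-grams get a probability of exactly 0, which zeroes out any sequence that contains them
- ✨ **Solution:** Redistribute a small amount of probability mass from seen N-grams to unseen ones
- 🛡️ **Benefit:** The model doesn't break on new data
- ⚖️ **Trade-off:** Seen N-grams get slightly lower probabilities in exchange for better robustness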
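
To make the problem concrete, here is a minimal sketch of an unsmoothed (maximum-likelihood) bigram model. The three-sentence corpus and the helper names `mle_bigram_prob` and `sentence_prob` are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter

# Toy corpus (hypothetical, for illustration only).
corpus = [
    "i love cats",
    "i love dogs",
    "cats are cute",
]

bigram_counts = Counter()
context_counts = Counter()  # how often each word appears with a successor
for sentence in corpus:
    tokens = sentence.split()
    bigram_counts.update(zip(tokens, tokens[1:]))
    context_counts.update(tokens[:-1])


def mle_bigram_prob(prev_word, word):
    """Unsmoothed estimate: count(prev_word, word) / count(prev_word)."""
    if context_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / context_counts[prev_word]


def sentence_prob(sentence):
    """Product of bigram probabilities; a single zero wipes out the product."""
    tokens = sentence.split()
    prob = 1.0
    for prev_word, word in zip(tokens, tokens[1:]):
        prob *= mle_bigram_prob(prev_word, word)
    return prob


seen = sentence_prob("i love cats")    # every bigram was observed
unseen = sentence_prob("i love cute")  # "love cute" never occurs in the corpus
print(f"P('i love cats') = {seen:.4f}")
print(f"P('i love cute') = {unseen:.4f}")
```

The second sentence is perfectly plausible, yet the single unseen bigram "love cute" drives its probability to zero.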

## Common Smoothing Methods

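- ➕ **Add-One (Laplace):** Add 1 to every N-gram count before normalizing
- 📊 **Good-Turing:** Re-estimate counts from frequency-of-frequency statistics, so the mass for unseen N-grams comes from the N-grams seen exactly once
- 🔙 **Back-off:** Fall back to a shorter N-gram when the longer one is unseen
- 🎯 **Interpolation:** Blend weighted estimates from different N-gram sizes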
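
As a rough illustration of the last two methods, the sketch below combines unigram and bigram estimates on a made-up token stream. The interpolation weight `lam=0.7` and the back-off factor `alpha=0.4` are arbitrary assumptions, and all function names are hypothetical. The back-off shown is the simplified score-based variant (often called "stupid backoff"); unlike Katz back-off, it does not produce normalized probabilities.

```python
from collections import Counter

# Toy token stream (hypothetical, for illustration only).
tokens = "i love cats i love dogs cats are cute".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
context_counts = Counter(tokens[:-1])  # how often each word has a successor
total_tokens = len(tokens)


def unigram_prob(word):
    return unigram_counts[word] / total_tokens


def bigram_prob(prev_word, word):
    if context_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / context_counts[prev_word]


def interpolated_prob(prev_word, word, lam=0.7):
    """Linear interpolation: always blend the bigram and unigram estimates."""
    return lam * bigram_prob(prev_word, word) + (1 - lam) * unigram_prob(word)


def backoff_score(prev_word, word, alpha=0.4):
    """Simplified back-off: use the bigram if seen, else a discounted unigram."""
    if bigram_counts[(prev_word, word)] > 0:
        return bigram_prob(prev_word, word)
    return alpha * unigram_prob(word)


for prev_word, word in [("love", "cats"), ("love", "cute")]:
    interp = interpolated_prob(prev_word, word)
    backoff = backoff_score(prev_word, word)
    print(f"{prev_word} {word}: interpolation={interp:.4f}, back-off={backoff:.4f}")
```

Interpolation always mixes in the lower-order estimate, while back-off consults it only when the bigram count is zero. Both give the unseen pair "love cute" a nonzero value.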

## Simple Add-One Smoothing

Below is an example of how add-one smoothing can be implemented in Python.

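In [None]:
# Bigram counts from a toy corpus; any bigram not listed here was never observed.
bigram_counts = {"I love": 5, "love cats": 3, "cats are": 2}
vocab_size = 1000  # number of distinct words V (assumed for this example)

# Add-one (Laplace) smoothing for the conditional probability P(w2 | w1):
# add 1 to the bigram count and V to the count of the context word w1.
def smooth_probability(bigram_count, context_count, vocab_size):
    return (bigram_count + 1) / (context_count + vocab_size)

# In this toy data "I" is only ever followed by "love", so count("I") = 5.
context_count = bigram_counts["I love"]

# Without smoothing, an unseen bigram such as "I hate" would get 0 / 5 = 0.
# With smoothing, even unseen bigrams get a small probability > 0.
unseen_prob = smooth_probability(bigram_counts.get("I hate", 0), context_count, vocab_size)
seen_prob = smooth_probability(bigram_counts["I love"], context_count, vocab_size)

print(f"Unseen bigram probability: {unseen_prob:.6f}")
print(f"Seen bigram probability: {seen_prob:.6f}")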

**Result:** No more zero probabilities!
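
One way to see both the benefit and the cost of add-one smoothing is to compare it with the unsmoothed estimate over a whole vocabulary. The sketch below assumes a hypothetical five-word vocabulary and made-up counts. `add_k_prob` is an illustrative helper that generalizes add-one to an arbitrary constant `k`, where `k=1` gives Laplace smoothing and `k=0` gives the unsmoothed estimate.

```python
# Hypothetical vocabulary and counts of words observed after the context "love".
vocab = ["cats", "dogs", "are", "cute", "i"]
counts_after_love = {"cats": 3, "dogs": 1}  # the other three words were never seen here
context_count = sum(counts_after_love.values())  # count of "love" as a context = 4


def add_k_prob(word, k=1.0):
    """Add-k estimate of P(word | 'love'); k=1 is add-one (Laplace) smoothing."""
    return (counts_after_love.get(word, 0) + k) / (context_count + k * len(vocab))


for k in (0.0, 1.0, 0.1):
    probs = {word: add_k_prob(word, k) for word in vocab}
    unseen_mass = sum(p for word, p in probs.items() if word not in counts_after_love)
    print(
        f"k={k}: P(cats)={probs['cats']:.3f}, "
        f"mass on unseen words={unseen_mass:.3f}, total={sum(probs.values()):.3f}"
    )
```

In this toy setup the distribution still sums to 1 for every `k`, but with `k=1` a third of the probability mass moves to words never seen after "love". A smaller constant shifts less mass, which is one reason add-k variants are sometimes used instead.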

## Why Smoothing Matters

**🎯 Real-world robustness:**

- 📱 **Autocorrect:** Handles typos and new words
- 🌐 **Search:** Works with never-seen query combinations
- 🗣️ **Speech recognition:** Manages accents and variations
- 🧠 **Foundation:** The data-sparsity problem that smoothing addresses also motivated more sophisticated neural approaches

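*Modern neural language models smooth implicitly: a softmax output layer gives every word a nonzero probability, and learned embeddings let similar contexts share statistics.*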
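
One concrete reading of this point is that a neural language model's output layer is typically a softmax, which assigns a positive probability to every vocabulary item (up to floating-point underflow), whether or not that word ever followed the context in training. The sketch below uses made-up logits rather than a trained model, and the `softmax` helper is written out by hand for illustration.

```python
import math

# Hypothetical logits a model might output for the next word; not from a real model.
logits = {"cats": 4.0, "dogs": 2.5, "cute": -3.0, "zebras": -6.0}


def softmax(scores):
    """Convert raw scores to probabilities that are all positive and sum to 1."""
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(score - max_score) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


probs = softmax(logits)
for word, prob in probs.items():
    print(f"{word}: {prob:.6f}")
```

Even the lowest-scoring word keeps a small positive probability, so no explicit count adjustment is needed.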