# N-gram Language Models

Welcome to this beginner-friendly introduction to N-gram models! In this notebook, we'll explore how sequences of words can help us predict what comes next in a sentence.

Let's get started!

## What are N-grams?

- **N-gram**: A sequence of N consecutive words.
- **Unigram (1-gram)**: A single word (e.g., "cat")
- **Bigram (2-gram)**: Two consecutive words (e.g., "black cat")
- **Trigram (3-gram)**: Three consecutive words (e.g., "the black cat")

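**Higher N:** Captures more context, but the number of possible N-grams grows rapidly with N. Each specific N-gram is therefore seen less often, so more training data is needed to count it reliably.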

## Real-World Analogy

**🎵 N-grams are like musical phrases:**

- **Unigram:** Single note - "Do"
- **Bigram:** Two notes - "Do-Re"
- **Trigram:** Three notes - "Do-Re-Mi"
- **Prediction:** After "Do-Re", the next note is likely "Mi"

*Musicians recognize patterns, and N-grams do too!*

## N-gram Examples in Action

Text: "The quick brown fox jumps"

- **Unigrams:** ["The", "quick", "brown", "fox", "jumps"]
- **Bigrams:** ["The quick", "quick brown", "brown fox", "fox jumps"]
- **Trigrams:** ["The quick brown", "quick brown fox", "brown fox jumps"]

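**Prediction:** After "fox jumps", what comes next? Probably "over". A bigram model would base this guess on "jumps" alone, while a trigram model would use both words, "fox jumps".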
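The slicing shown above works the same way for any N. Here is a minimal sketch; the helper name `ngrams` is our own, and "tokenization" is just whitespace splitting with the original capitalization kept.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, as tuples, in order."""
    # A sequence of length L has L - n + 1 windows of size n (none if n > L).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "The quick brown fox jumps".split()

for n, name in [(1, "Unigrams"), (2, "Bigrams"), (3, "Trigrams")]:
    print(f"{name}:", [" ".join(gram) for gram in ngrams(tokens, n)])
```

Running it prints the same three lists as above; changing `n` to 4 or 5 gives the longer N-grams with no other changes.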

Let's see how we can build and use N-grams!

## Building an N-gram Model Demo

We'll create a simple bigram model from some example text. This model will help us see patterns in word sequences and predict the next word.

In [None]:
# Building a simple bigram model
text = "I love cats. I love dogs. I love animals."
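# Strip periods, lowercase, and split on whitespace.
# Because the periods are removed, some bigrams (e.g. ('cats', 'i')) cross sentence boundaries.
words = text.replace('.', '').lower().split()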

# Extract bigrams
bigrams = []
for i in range(len(words) - 1):
    bigram = (words[i], words[i+1])
    bigrams.append(bigram)

print("Bigrams found:", bigrams)

In [None]:
# Count occurrences for prediction
from collections import Counter
bigram_counts = Counter(bigrams)
print("Most common:", bigram_counts.most_common(3))
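Counts alone do not yet give a prediction; they are usually turned into probabilities first. A standard choice is maximum-likelihood estimation (the cells above do not compute it explicitly): divide each bigram's count by the total count of bigrams that start with the same word, where $C(\cdot)$ denotes a count in the training text.

$$
P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}, w_i)}{\sum_{w} C(w_{i-1}, w)}
$$

For the demo text, "i" is followed by "love" in all three of its bigrams, so $P(\text{love} \mid \text{i}) = 3/3 = 1$, while "love" is followed by "cats", "dogs", and "animals" once each, so $P(\text{cats} \mid \text{love}) = 1/3$.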

## N-grams Made Simple

**Core Process:**

- ✂️ **Step 1:** Slice text into overlapping chunks
- 📊 **Step 2:** Count how often each N-gram appears
- 🔮 **Step 3:** Use counts to predict the next word
- 🎲 **Rule of thumb:** The more often a word follows the current context, the more likely it is to be predicted
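The three steps can be sketched in a few lines of Python. This is an illustrative sketch, not a complete language model: the function names `train_bigrams` and `predict_next` are hypothetical, ties between equally frequent words go to whichever follower appeared first, and a context word never seen in training returns `None`.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Steps 1 and 2: slice the text into bigrams and count them per context word."""
    words = text.replace(".", "").lower().split()
    counts = defaultdict(Counter)  # context word -> Counter of following words
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Step 3: return (most likely next word, its probability), or None if unseen."""
    followers = counts.get(word.lower())  # .get() avoids creating an empty entry
    if not followers:
        return None
    best, best_count = followers.most_common(1)[0]
    return best, best_count / sum(followers.values())

model = train_bigrams("I love cats. I love dogs. I love animals.")
print(predict_next(model, "I"))     # ('love', 1.0)
print(predict_next(model, "love"))  # ('cats', 0.3333333333333333)
print(predict_next(model, "fish"))  # None
```

The `None` result for "fish" shows a basic limitation of raw counts: a model can only predict after contexts it has actually seen.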


## Visualizing N-grams

**Interactive Visualization:** Building Predictions

<svg id="ngram-svg" width="800" height="400" class="svg"></svg>

Let's see how bigrams can help us predict "The cat ___"!
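The same idea can be reproduced without the interactive figure by ranking candidate next words. The tiny training corpus below is made up for illustration, so real data would give a different ranking. Note that a bigram model only looks at "cat" here; the word "The" does not affect the prediction.

```python
from collections import Counter

# Hypothetical toy corpus, already lowercased and split into words.
corpus = "the cat sat on the mat the cat ate the fish the cat sat on the chair".split()

# Count only the words that immediately follow "cat".
followers = Counter(nxt for prev, nxt in zip(corpus, corpus[1:]) if prev == "cat")
total = sum(followers.values())

# Rank candidates for "The cat ___" by estimated probability.
for word, count in followers.most_common():
    print(f"The cat {word}: {count}/{total} = {count / total:.2f}")
```

With this corpus, "sat" wins with 2/3 of the observations and "ate" gets the remaining 1/3.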

## The N-gram Trade-off

**⚖️ Context vs Data Requirements:**

- 📏 **Small N (unigram):** Fast, less data needed, but no context
- 📐 **Large N (5-gram):** Rich context, but needs massive data
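- 🎯 **Common compromise:** Bigrams and trigrams often balance context and data needs well in practice
- 🚀 **Modern neural language models:** Handle much longer contexts by learning generalizations instead of storing a count for every possible word sequence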

*N-grams laid the foundation for today's advanced models.*
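The data problem behind large N can be made concrete by comparing how many N-grams are possible with how many a text actually contains. For a vocabulary of V words there are V^N possible N-grams, while a text can contain at most as many distinct N-grams as it has positions. This sketch reuses the tiny demo text from earlier; the variable names are our own.

```python
text = "I love cats. I love dogs. I love animals."
words = text.replace(".", "").lower().split()
vocab_size = len(set(words))  # V = 5 for this text

coverage = {}
for n in range(1, 5):
    # Distinct n-grams that actually occur in the text.
    seen = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    possible = vocab_size ** n  # every sequence of n words drawn from the vocabulary
    coverage[n] = (len(seen), possible)
    print(f"n={n}: seen {len(seen)} of {possible} possible ({len(seen) / possible:.1%})")
```

For this text the coverage drops from 5 of 5 unigrams to 6 of 25 bigrams, 7 of 125 trigrams, and 6 of 625 4-grams, so most longer N-grams are never observed and get a count of zero.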

## Puzzle Time!

**If you train a bigram model only on sports articles, what will happen when you ask it to complete "The recipe calls for..."?**

*Think about how domain-specific data influences predictions.*