# Understanding Sentence Probability

## What is Sentence Probability?

Ever wondered how likely a sentence is to occur? 
In language models, we assign a **probability score** to sentences to measure how natural or common they are.

### Key Ideas:
- The probability of a sentence is represented as P(sentence).
- We use the **chain rule** to compute it:
  
  P(w1, w2, w3, ...) = P(w1) × P(w2|w1) × P(w3|w1,w2) × ...
- Higher probability indicates a more natural or common sentence.
- Lower probability suggests the sentence is unusual or unlikely.
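
For a sentence of n words, the same chain rule can be written compactly. This is standard notation rather than anything specific to one model:

$$
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
$$

For i = 1 there are no earlier words, so the first factor is simply P(w_1).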

## Real-World Analogy

Think of sentence probability like following a recipe:
- A **common recipe** like "Add salt and pepper" has **HIGH probability**.
- An **odd recipe** like "Add salt and toothpaste" has **LOW probability**.

The more experience you have (more cookbooks you read), the better you can judge this.
Language models do the same, but with precise math!

## Examples of Sentence Probability

- **High Probability:** "I love pizza" (sounds natural)
- **Medium Probability:** "Pizza loves I" (familiar words, but grammatically incorrect)
- **Low Probability:** "Elephant purple flying Wednesday" (nonsense)

The role of language models is to assign accurate probability scores to sentences.

_Better models = more accurate probability estimates!_

## Demonstration: Computing Sentence Probability

Let's walk through an example to see how sentence probabilities are calculated!

### Example: Probability of "I love cats"

In [None]:
# Calculating: P("I love cats")
import math

# Word probabilities from training data
word_probs = {
    "I": 0.1,           # P(I)
    "love|I": 0.3,      # P(love | I)
    "cats|love": 0.15   # P(cats | love), approximating P(cats | I, love) with the previous word only
}

# Sentence probability = product of conditional probabilities
sentence_prob = word_probs["I"] * word_probs["love|I"] * word_probs["cats|love"]
print(f"P('I love cats') = {sentence_prob:.4f}")
print(f"Log probability = {math.log(sentence_prob):.2f}")

**Output:**
P('I love cats') = 0.0045
Log probability = -5.40

## Simplifying Sentence Probability

Think of sentence probability as a multiplication chain:
- Each word's probability depends on the previous words.
- Multiply these individual probabilities together.
- The final result indicates how likely the entire sentence is.
- A higher score means a more natural sentence.
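
The multiplication chain can be sketched as a small reusable function. This is a minimal illustration, not a full language model: the helper name `sentence_probability` is hypothetical, it reuses the demonstration's key format (`"word|previous"`), and it assumes each word depends only on the word right before it.

```python
import math

def sentence_probability(words, cond_probs):
    """Multiply P(first word) by P(word | previous word) for each later word.

    Assumes cond_probs uses the key format "word|previous" (hypothetical convention).
    """
    keys = [words[0]] + [f"{cur}|{prev}" for prev, cur in zip(words, words[1:])]
    return math.prod(cond_probs[key] for key in keys)

# Same illustrative values as the "I love cats" demonstration
word_probs = {"I": 0.1, "love|I": 0.3, "cats|love": 0.15}

prob = sentence_probability(["I", "love", "cats"], word_probs)
print(f"P('I love cats') = {prob:.4f}")
```

A missing key raises `KeyError` here; a real model would need some way to handle word pairs it has never seen.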

## Visualizing Probability as a Chain

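Let's imagine the sentence "The cat sat" as a chain of probabilities:
- P(The) → P(cat | The) → P(sat | The, cat)

Each link multiplies the running total by one conditional probability, so we can trace the probability flow step by step.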
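
A text-only way to trace the chain is to print the running product after each word. The probabilities below are made-up values for illustration, not estimates from any real corpus.

```python
# Hypothetical conditional probabilities for "The cat sat" (illustrative only)
steps = [
    ("P(The)", 0.2),
    ("P(cat | The)", 0.05),
    ("P(sat | The, cat)", 0.1),
]

running = 1.0
trace = []  # (label, step probability, running product) for each link in the chain
for label, p in steps:
    running *= p
    trace.append((label, p, running))
    print(f"{label:<18} = {p:.2f}  ->  running product = {running:.4f}")
```

Each line shows one link of the chain, and the last running product is the probability of the whole sentence under these assumed values.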

## Why Sentence Probability Matters

Practical applications include:
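- **Text quality:** Scoring how fluent a text is, which is one (imperfect) signal for telling human-written from AI-generated text
- **Autocorrect:** Picking the correction that makes the sentence most probable
- **Speech recognition:** Choosing the most probable word sequence among similar-sounding candidates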
- **Text generation:** Creating more natural and coherent sentences

_Understanding sentence probability helps improve many language-based technologies._
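
To make the speech-recognition case concrete, here is a minimal sketch that picks the most probable of several similar-sounding transcriptions. The bigram table, the `UNSEEN` floor value, and the `log_prob` helper are all hypothetical, chosen only for illustration; a real system would use a trained model. Scores are summed log probabilities, which rank candidates in the same order as multiplying the probabilities.

```python
import math

# Hypothetical bigram probabilities; "<s>" marks the start of a sentence
bigram_probs = {
    ("<s>", "I"): 0.2,
    ("I", "want"): 0.1,
    ("want", "to"): 0.5,
    ("to", "eat"): 0.05,
}
UNSEEN = 1e-6  # assumed floor for word pairs missing from the table

def log_prob(sentence):
    """Sum log P(word | previous word) over the sentence."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(bigram_probs.get(pair, UNSEEN))
               for pair in zip(words, words[1:]))

# Candidates that sound alike when spoken
candidates = ["I want to eat", "I want two eat", "eye want to eat"]
best = max(candidates, key=log_prob)

for sentence in candidates:
    print(f"{sentence!r}: log probability = {log_prob(sentence):.2f}")
print(f"Most probable: {best!r}")
```

The same ranking idea applies to autocorrect: generate candidate corrections, score each resulting sentence, and keep the highest-scoring one.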

## A Thought to Consider...

Multiplying many tiny probabilities results in very small numbers.
Why might this be a problem for computers?
And how could we address this issue?
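
One way to explore these questions is a quick experiment. The sketch below assumes a hypothetical 400-word text in which every word has probability 0.1, then compares direct multiplication with summing logarithms, a common workaround.

```python
import math

# Hypothetical setup: 400 words, each with conditional probability 0.1
word_probs = [0.1] * 400

# Direct multiplication: the true value (1e-400) is smaller than the
# smallest positive double-precision float, so the result underflows
product = 1.0
for p in word_probs:
    product *= p

# Working in log space: add log probabilities instead of multiplying
log_prob = sum(math.log(p) for p in word_probs)

print(f"Direct product  = {product}")
print(f"Log probability = {log_prob:.2f}")
```

The direct product collapses to zero, so all such long texts would look equally impossible. The log probability stays a manageable negative number that can still be compared between sentences.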