# Lab: Experiment With N-Gram Models
## Purpose:
- Estimate next-word probabilities
- Build a (small) n-gram model on a (tiny) dataset.
- Understand n-gram models & their limitations
### Topics:
- Tokenization
- Probability estimation
- Token prediction

Date: 2026-02-14

Source: https://colab.research.google.com/github/google-deepmind/ai-foundations/blob/master/course_1/gdm_lab_1_2_experiment_with_n_gram_models.ipynb#scrollTo=pbtgZxrpjm6j

References: https://github.com/google-deepmind/ai-foundations
- GitHub repository from Google DeepMind, used in AI training courses at the university and college level.

### Understanding the math
**N-gram**: A contiguous sequence of $n$ words.

**Context**: The preceding sequence of $n-1$ words.

**How are n-grams related to the context?** N-gram models use n-grams to estimate the probability of the next word based on the context.
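To make the two definitions concrete, here is a minimal sketch that slides a window of size $n$ over a list of tokens and splits each n-gram into its context (the first $n-1$ words) and its next word. The helper name `extract_ngrams` and the whitespace tokenization via `str.split` are assumptions made for illustration; they are not taken from the lab's own code.

```python
def extract_ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n tokens (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: simple whitespace tokenization of a toy sentence.
tokens = "the cat sat on the mat".split()
trigrams = extract_ngrams(tokens, 3)

for ngram in trigrams:
    # The context is the first n-1 words; the last word is the one to predict.
    context, next_word = ngram[:-1], ngram[-1]
    print(f"context={context} next_word={next_word!r}")
```

A six-token sentence yields four trigrams, and a window longer than the sentence yields none.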

**Text Corpus**: A dataset consisting of a collection of texts.

Computing the probability of the next word
---
Let $\mbox{A}$ be the context.

Let $\mbox{B}$ be the next word.

Compute the probability $P(\mbox{B} \mid \mbox{A})$:

$$P(\mbox{B} \mid \mbox{A}) = \frac{\mbox{Count}(\mbox{A B})}{\mbox{Count}(\mbox{A})}$$

The full n-gram counts, $\mbox{Count}(\mbox{A B})$, and the context counts, $\mbox{Count}(\mbox{A})$ (counts of the $(n-1)$-word context), can be computed by counting word sequences in a dataset (the **text corpus**).
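The count ratio above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the lab's reference solution: the helper names `count_ngrams` and `estimate_probability`, the toy corpus, and the choice to return 0 for a context that never occurs are all hypothetical.

```python
from collections import Counter


def count_ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous run of n tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def estimate_probability(
    tokens: list[str], context: tuple[str, ...], word: str
) -> float:
    """Estimate P(word | context) as Count(context + word) / Count(context)."""
    n = len(context) + 1
    full_counts = count_ngrams(tokens, n)         # Count(A B)
    context_counts = count_ngrams(tokens, n - 1)  # Count(A)
    context_count = context_counts[context]
    if context_count == 0:
        # Assumption: a context never seen in the corpus gets probability 0.
        return 0.0
    return full_counts[context + (word,)] / context_count


# Assumption: whitespace tokenization of a tiny toy corpus.
corpus_tokens = "the cat sat on the mat the cat ran".split()
p = estimate_probability(corpus_tokens, ("the",), "cat")
print(p)
```

In this toy corpus "the" occurs three times and "the cat" occurs twice, so the estimate for $P(\mbox{cat} \mid \mbox{the})$ is $2/3$.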

### Set up the environment
- for information only

sudo apt update

sudo apt install python3.12

sudo apt install python3.12-venv

python3.12 -m venv .venv

source .venv/bin/activate

In [4]:
%%capture
%%bash
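# Call the venv's own pip explicitly: after `cd .venv/bin`, a bare `pip3.12`
# is still resolved via PATH and may belong to a different interpreter.
.venv/bin/pip3.12 install "git+https://github.com/google-deepmind/ai-foundations.git@main"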

In [3]:
# Packages used.
import random # For sampling from probability distributions.
from collections import Counter, defaultdict # For counting n-grams.

import textwrap # For automatically adding line breaks to long texts.
import pandas as pd # For constructing and visualizing tables.

# Custom functions for providing feedback on your solutions.
from ai_foundations.feedback.course_1 import ngrams

ModuleNotFoundError: No module named 'ai_foundations'