
# COGS 150: Lab 1

**Naomi Gu**

*Spring 2025*

This lab is all about $n$-gram language models. Broadly, we'll cover the following concepts:

- What are $n$-grams?  
- Building a simple $n$-gram model.  
- Using an $n$-gram model to calculate **surprisal**.
- Using a language model to **generate** text.

To use this lab:

- First, **make a copy**. Click "File" --> "Save a copy in Drive", then open up and use that version instead.  
- Then, work through the different `code` and *text* cells in this notebook. To modify a cell, *double-click* it. (Try not to delete existing code, as that could cause errors.)
- Once you're done responding to all the questions, **save the notebook as a PDF** and upload it to your Canvas assignment.

In [None]:
%pip install nltk
import nltk
nltk.download('gutenberg')



[nltk_data] Downloading package gutenberg to /root/nltk_data...
[nltk_data]   Unzipping corpora/gutenberg.zip.


True

## Part 1: Introducing $n$-grams

> An $n$-gram is an adjacent sequence of length $n$ of characters or words in a corpus of text.

In this section, you'll learn how to extract $n$-grams. There are a few components to this process:

- **Tokenizing** the text: This means identifying all the words, as well as where each sentence starts and ends.  
- **Extracting $n$-grams**: given a list of all the tokens in a corpus, identify all the sequences of length $n$.  
- **Applying** these components to a larger corpus.

In [None]:
### For making visualizations
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format = 'retina'  # makes figs nicer!

### 1a. Tokenizing

There are many ways to **tokenize** a corpus. Here, we'll opt for a simple approach, which gets rid of all unwanted punctuation (e.g., commas), identifies all the words, and also identifies the *beginning* and *end* of each sentence.

- Each unique word token will be represented as an item in a Python list.  
- The beginning and end of a sentence will be represented as `<s>` and `</s>`, respectively.

**Reflection**: If we're building a model of language, why is it useful to identify where sentences tend to begin and end?

In [None]:
small_corpus = "The cat chased the mouse. The mouse hid in the wall. The cat could not find the mouse."

In [None]:
import re

def tokenize(text):
    # Split the text into sentences
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text)

    # Tokenize each sentence and add start (<s>) and stop (</s>) tokens
    tokens = []
    for sentence in sentences:
        # Add the start token
        tokens.append('<s>')
        # Lowercase the sentence and keep only runs of word characters (this drops punctuation), then add to tokens
        tokens.extend(re.findall(r'\b\w+\b', sentence.lower()))
        # Add the stop token
        tokens.append('</s>')

    return tokens

In [None]:
### Here's an example of how this works.
tokenize("The man walked to the store.")

['<s>', 'the', 'man', 'walked', 'to', 'the', 'store', '</s>']

#### Questions

Write a line of code that calls "tokenize" on "small_corpus". Call the result `tokens`.

In [None]:
### Your code here
tokens = tokenize(small_corpus)

### 1b. Extract (and count) $n$-grams

To get a better sense of what an $n$-gram is, we'll first look at a function for identifying and counting all the $n$-grams in a big corpus of text.

In [None]:
from collections import defaultdict, Counter

def extract_ngrams(tokens, n):
    """Identifies every sequence of n adjacent tokens and counts how many times each occurs."""
    ngrams = zip(*[tokens[i:] for i in range(n)])
    return Counter([" ".join(ngram) for ngram in ngrams])

In [None]:
### Here's an example of how this function works
ngrams = extract_ngrams(tokens, 2)
len(ngrams)

16
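
To see what the `zip` idiom inside `extract_ngrams` is doing, here is a minimal sketch on a hand-written token list. The names `sliding_ngrams` and `demo_tokens` are illustrative assumptions, not part of the lab code.

```python
from collections import Counter

def sliding_ngrams(tokens, n):
    """Return every adjacent run of n tokens, in order."""
    # tokens[0:], tokens[1:], ..., tokens[n-1:] are the same list shifted by 0..n-1.
    # zip stops at the shortest list, so only full-length n-grams are produced.
    shifted = [tokens[i:] for i in range(n)]
    return list(zip(*shifted))

demo_tokens = ['<s>', 'the', 'cat', 'sat', '</s>']
demo_bigrams = sliding_ngrams(demo_tokens, 2)
demo_counts = Counter(" ".join(gram) for gram in demo_bigrams)

print(demo_bigrams)
print(demo_counts)
```

A list of 5 tokens yields 4 bigrams and 3 trigrams: each extra unit of $n$ removes one possible starting position.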

#### Questions

1. How many unique $n$-grams of length 2 are in `small_corpus`? (Hint: use `len` on `ngrams`.)
2. What is the most common $n$-gram? (Hint: use `ngrams.most_common()`)

In [None]:
### Your code here
len(ngrams)

16

In [None]:
ngrams.most_common()

[('<s> the', 3),
 ('the mouse', 3),
 ('the cat', 2),
 ('mouse </s>', 2),
 ('</s> <s>', 2),
 ('cat chased', 1),
 ('chased the', 1),
 ('mouse hid', 1),
 ('hid in', 1),
 ('in the', 1),
 ('the wall', 1),
 ('wall </s>', 1),
 ('cat could', 1),
 ('could not', 1),
 ('not find', 1),
 ('find the', 1)]

In [None]:
print("1. 16")
print("2. '<s> the' and 'the mouse' are tied as the most common n-grams (3 occurrences each).")

1. 16
2. '<s> the' and 'the mouse' are tied as the most common n-grams (3 occurrences each).


### 1c. Apply this to a larger corpus.

Now, let's apply this to a much larger **corpus** of text: the book *Emma*, by Jane Austen.

In [None]:
import nltk

emma = ' '.join(nltk.corpus.gutenberg.words('austen-emma.txt'))
emma = emma.replace("Mr .", "Mr").replace("Mrs .", "Mrs")

In [None]:
emma_tokens = tokenize(emma)
len(emma_tokens)

172495

#### Questions

1. Use `extract_ngrams` to identify all n-grams of length $2$ from `emma_tokens`.
2. How many are there? (Use `len`.)
3. What is the most common n-gram? What about the second most common? (Use `ngrams.most_common()`.)  
4. Now use `extract_ngrams` to identify all n-grams of length $3$. How many are there? Which is most common?

In [None]:
### Your code here
bigram = extract_ngrams(emma_tokens,2)

In [None]:
len(bigram)

65567

In [None]:
bigram.most_common()

[('</s> <s>', 5255),
 ('<s> i', 676),
 ('to be', 607),
 ('of the', 564),
 ('it was', 448),
 ('in the', 445),
 ('<s> she', 431),
 ('i am', 395),
 ('she had', 332),
 ('she was', 328),
 ('<s> he', 319),
 ('had been', 308),
 ('it is', 299),
 ('mr knightley', 299),
 ('<s> it', 286),
 ('i have', 281),
 ('could not', 278),
 ('of her', 262),
 ('<s> the', 258),
 ('mrs weston', 256),
 ('have been', 243),
 ('he had', 240),
 ('to the', 237),
 ('do not', 235),
 ('mr elton', 229),
 ('and the', 224),
 ('he was', 222),
 ('would be', 216),
 ('such a', 200),
 ('a very', 199),
 ('of his', 191),
 ('to her', 188),
 ('and i', 186),
 ('to have', 184),
 ('that she', 184),
 ('did not', 183),
 ('must be', 181),
 ('that he', 181),
 ('i do', 181),
 ('in a', 180),
 ('<s> you', 179),
 ('miss woodhouse', 173),
 ('she could', 171),
 ('<s> but', 170),
 ('for the', 169),
 ('mr weston', 167),
 ('all the', 167),
 ('it </s>', 166),
 ('any thing', 165),
 ('was not', 164),
 ('will be', 160),
 ('but i', 154),
 ('frank church

In [None]:
trigram = extract_ngrams(emma_tokens,3)

In [None]:
trigram.most_common()

[('</s> <s> i', 676),
 ('</s> <s> she', 431),
 ('</s> <s> he', 319),
 ('</s> <s> it', 286),
 ('</s> <s> the', 258),
 ('</s> <s> you', 179),
 ('</s> <s> but', 170),
 ('it </s> <s>', 166),
 ('</s> <s> mr', 138),
 ('i do not', 135),
 ('<s> it was', 119),
 ('</s> <s> they', 117),
 ('her </s> <s>', 114),
 ('i am sure', 109),
 ('</s> <s> and', 108),
 ('</s> <s> emma', 104),
 ('<s> i am', 93),
 ('</s> <s> a', 88),
 ('</s> <s> there', 87),
 ('<s> she was', 86),
 ('</s> <s> if', 78),
 ('</s> <s> her', 76),
 ('</s> <s> mrs', 76),
 ('she could not', 72),
 ('</s> <s> this', 70),
 ('<s> it is', 69),
 ('him </s> <s>', 69),
 ('you </s> <s>', 68),
 ('</s> <s> we', 67),
 ('<s> i have', 67),
 ('a great deal', 64),
 ('<s> he had', 63),
 ('it would be', 63),
 ('<s> she had', 62),
 ('would have been', 60),
 ('me </s> <s>', 60),
 ('</s> <s> my', 59),
 ('do not know', 55),
 ('it was not', 55),
 ('</s> <s> what', 54),
 ('it was a', 53),
 ('she had been', 53),
 ('</s> <s> oh', 52),
 ('them </s> <s>', 50),
 ('i

In [None]:
print("2. 65567")
print("3. '</s> <s>' is the most common and '<s> i' is the second most common.")
print("4. '</s> <s> i' is the most common trigram.")

2. 65567
3. '</s> <s>' is the most common and '<s> i' is the second most common.
4. '</s> <s> i' is the most common trigram.


## Part 2: Building a simple $n$-gram model

> An **n-gram language model** is a statistical language model, which assigns a probability to some word $w$ as a function of the $(n-1)$ words preceding $w$. For a bigram model, then, this could be written as: $p(w_i | w_{i-1})$


We'll break this down into steps:

1. Theoretical foundations.  
2. Building a simple *bigram* model.  
3. Generalizing to an $n$-gram model.

### 2a: Theoretical foundations

We want to estimate: $p(w_i | w_{i-1})$

Usually, this **conditional probability** is based on the number of times word $w_i$ occurs in a given context, relative to the number of times that *context* appears.

For a bigram model, we could write this as follows:

$p(w_i | w_{i-1}) = \frac{Count(w_{i-1}, w_i)}{Count(w_{i-1})}$

For example, $p(dog|the)$ would be calculated by dividing the number of times "the dog" occurs by the number of times "the" occurs.
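
As a rough illustration of this formula, the sketch below estimates bigram probabilities from raw counts. The toy token list and the helper name `bigram_probability` are assumptions made for this example. The helper also assumes the context word was seen at least once, since otherwise the division fails.

```python
from collections import Counter

# Hand-tokenized toy corpus (hypothetical): "The dog barked. The dog slept. The cat slept."
toy_tokens = ['<s>', 'the', 'dog', 'barked', '</s>',
              '<s>', 'the', 'dog', 'slept', '</s>',
              '<s>', 'the', 'cat', 'slept', '</s>']

unigram_counts = Counter(toy_tokens)                      # Count(w_{i-1})
bigram_counts = Counter(zip(toy_tokens, toy_tokens[1:]))  # Count(w_{i-1}, w_i)

def bigram_probability(prev_word, word):
    """Estimate p(word | prev_word) as Count(prev_word, word) / Count(prev_word)."""
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

print(bigram_probability('the', 'dog'))  # "the dog" occurs 2 times, "the" occurs 3 times
print(bigram_probability('the', 'cat'))  # "the cat" occurs 1 time, "the" occurs 3 times
```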

### 2b: Build a *bigram* model.  

Now let's build a **bigram** model. To represent our $n$-grams, we'll use a nested dictionary structure that looks something like this:

```python
{'the':
     {'dog': 5,
      'cat': 5,
      'person': 10}
}
```

Here, each number represents the number of times that word (e.g., "dog") occurs after the word `"the"`. Those numbers could be converted to probabilities by dividing them by the *sum* of their values.

```python
{'the':
     {'dog': .25,
      'cat': .25,
      'person': .5}
}
```
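
The conversion from counts to probabilities can be sketched in a few lines. The helper name `normalize_counts` is an assumption for illustration; the numbers are the ones from the example dictionaries above.

```python
def normalize_counts(counts):
    """Convert a {word: count} dictionary into {word: probability}."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

following_the = {'dog': 5, 'cat': 5, 'person': 10}
following_the_probs = normalize_counts(following_the)
print(following_the_probs)
# {'dog': 0.25, 'cat': 0.25, 'person': 0.5}
```

Because each count is divided by the same total, the resulting probabilities for a given context always sum to 1.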

The function below builds a bigram model for you. Take a look at the function and see if you can figure out what it's doing.

In [None]:
from collections import defaultdict

def build_bigram_model(tokens):
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(tokens)-1):
        # The context is the single preceding word, stored as a 1-tuple; next_word is the word that follows it
        ngram = tuple(tokens[i:i+1])
        next_word = tokens[i+1]
        model[ngram][next_word] += 1
    return model

In [None]:
tokens = tokenize(small_corpus)
bigram_model = build_bigram_model(tokens)
bigram_model

defaultdict(<function __main__.build_bigram_model.<locals>.<lambda>()>,
            {('<s>',): defaultdict(int, {'the': 3}),
             ('the',): defaultdict(int, {'cat': 2, 'mouse': 3, 'wall': 1}),
             ('cat',): defaultdict(int, {'chased': 1, 'could': 1}),
             ('chased',): defaultdict(int, {'the': 1}),
             ('mouse',): defaultdict(int, {'</s>': 2, 'hid': 1}),
             ('</s>',): defaultdict(int, {'<s>': 2}),
             ('hid',): defaultdict(int, {'in': 1}),
             ('in',): defaultdict(int, {'the': 1}),
             ('wall',): defaultdict(int, {'</s>': 1}),
             ('could',): defaultdict(int, {'not': 1}),
             ('not',): defaultdict(int, {'find': 1}),
             ('find',): defaultdict(int, {'the': 1})})

In [None]:
### index into the model like so
bigram_model[('the',)]

defaultdict(int, {'cat': 2, 'mouse': 3, 'wall': 1})

#### Questions

1. What are all the words that occur after the word `the`?  
2. What is the probability: $p(cat | the)$?
3. What about $p(mouse | the)$?

In [None]:
### Your code here
bigram_model[('the',)]

defaultdict(int, {'cat': 2, 'mouse': 3, 'wall': 1})

In [None]:
print("1. Words that occur after the word 'the' are: 'cat', 'mouse', and 'wall'")
print("2. p(cat|the) = 2/(2+3+1) = 2/6 = 0.333")
print("3. p(mouse|the) = 3/(2+3+1) = 3/6 = 0.5")

1. Words that occur after the word 'the' are: 'cat', 'mouse', and 'wall'
2. p(cat|the) = 2/(2+3+1) = 2/6 = 0.333
3. p(mouse|the) = 3/(2+3+1) = 3/6 = 0.5


### 2c: Build an *n-gram* model.  

Now let's build a more general $n$-gram model. Unlike the bigram model, this will allow us to represent contexts of arbitrary length: for a given `n`, the context is the previous `n - 1` tokens.
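
To make the context window concrete, the sketch below lists each (context, next word) pair that an $n$-gram model would count. The helper `context_pairs` and the list `demo_tokens` are hypothetical names used only for this illustration.

```python
def context_pairs(tokens, n):
    """List each (context, next_word) pair, where the context is the previous n-1 tokens."""
    pairs = []
    for i in range(len(tokens) - (n - 1)):
        context = tuple(tokens[i:i + n - 1])  # the n-1 tokens starting at position i
        next_word = tokens[i + n - 1]         # the token right after that context
        pairs.append((context, next_word))
    return pairs

demo_tokens = ['<s>', 'the', 'cat', 'sat', '</s>']
trigram_pairs = context_pairs(demo_tokens, 3)
for context, next_word in trigram_pairs:
    print(context, '->', next_word)
```

With `n = 2` the same helper produces one-word contexts, which is the bigram case.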

In [None]:
def build_ngram_model(tokens, n):
    model = defaultdict(lambda: defaultdict(int))
    for i in range(len(tokens)-(n-1)):
        # The context is the previous n-1 tokens; next_word is the word that follows it
        ngram = tuple(tokens[i:i+n-1])
        next_word = tokens[i+n-1]
        model[ngram][next_word] += 1
    return model

In [None]:
tokens = tokenize(small_corpus)
trigram_model = build_ngram_model(tokens, 3)
trigram_model

defaultdict(<function __main__.build_ngram_model.<locals>.<lambda>()>,
            {('<s>', 'the'): defaultdict(int, {'cat': 2, 'mouse': 1}),
             ('the', 'cat'): defaultdict(int, {'chased': 1, 'could': 1}),
             ('cat', 'chased'): defaultdict(int, {'the': 1}),
             ('chased', 'the'): defaultdict(int, {'mouse': 1}),
             ('the', 'mouse'): defaultdict(int, {'</s>': 2, 'hid': 1}),
             ('mouse', '</s>'): defaultdict(int, {'<s>': 1}),
             ('</s>', '<s>'): defaultdict(int, {'the': 2}),
             ('mouse', 'hid'): defaultdict(int, {'in': 1}),
             ('hid', 'in'): defaultdict(int, {'the': 1}),
             ('in', 'the'): defaultdict(int, {'wall': 1}),
             ('the', 'wall'): defaultdict(int, {'</s>': 1}),
             ('wall', '</s>'): defaultdict(int, {'<s>': 1}),
             ('cat', 'could'): defaultdict(int, {'not': 1}),
             ('could', 'not'): defaultdict(int, {'find': 1}),
             ('not', 'find'): defaultdict(

#### Questions

1. Calculate the probability: $p(cat|<s>, the)$. How does this compare to $p(cat|the)$ from earlier?
2. Calculate the probability: $p(mouse|<s>, the)$. How does this compare to $p(mouse|the)$ from earlier?
3. Why do you think these probabilities are different from the bigram model?

In [None]:
### Your code here
tokens = tokenize(small_corpus)
trigram_model = build_ngram_model(tokens, 3)
trigram_model

defaultdict(<function __main__.build_ngram_model.<locals>.<lambda>()>,
            {('<s>', 'the'): defaultdict(int, {'cat': 2, 'mouse': 1}),
             ('the', 'cat'): defaultdict(int, {'chased': 1, 'could': 1}),
             ('cat', 'chased'): defaultdict(int, {'the': 1}),
             ('chased', 'the'): defaultdict(int, {'mouse': 1}),
             ('the', 'mouse'): defaultdict(int, {'</s>': 2, 'hid': 1}),
             ('mouse', '</s>'): defaultdict(int, {'<s>': 1}),
             ('</s>', '<s>'): defaultdict(int, {'the': 2}),
             ('mouse', 'hid'): defaultdict(int, {'in': 1}),
             ('hid', 'in'): defaultdict(int, {'the': 1}),
             ('in', 'the'): defaultdict(int, {'wall': 1}),
             ('the', 'wall'): defaultdict(int, {'</s>': 1}),
             ('wall', '</s>'): defaultdict(int, {'<s>': 1}),
             ('cat', 'could'): defaultdict(int, {'not': 1}),
             ('could', 'not'): defaultdict(int, {'find': 1}),
             ('not', 'find'): defaultdict(

In [None]:
print("1. p(cat|<s>,the) = 2/3 = 0.667, which is higher than p(cat|the) = 0.333 from earlier.")
print("2. p(mouse|<s>,the) = 1/3 = 0.333, which is lower than p(mouse|the) = 0.5 from earlier.")
print("3. The bigram model conditions only on the one-word context 'the', while the trigram model conditions on the two-word context '<s>, the'. The trigram context is more specific (here, 'the' at the start of a sentence), so fewer occurrences are counted, and that changes the probabilities.")

1. p(cat|<s>,the) = 2/3 = 0.667, which is higher than p(cat|the) = 0.333 from earlier.
2. p(mouse|<s>,the) = 1/3 = 0.333, which is lower than p(mouse|the) = 0.5 from earlier.
3. The bigram model conditions only on the one-word context 'the', while the trigram model conditions on the two-word context '<s>, the'. The trigram context is more specific (here, 'the' at the start of a sentence), so fewer occurrences are counted, and that changes the probabilities.


## Part 3: Calculating *surprisal*

> **Surprisal** is defined as the negative log probability of an event. This term comes from [information theory](https://en.wikipedia.org/wiki/Information_content), and measures the "unexpectedness" of an event.

Surprisal is defined as follows: $Surprisal(x) = -\log_2(p(x))$. For language models, surprisal ends up being a really useful way to *evaluate* the model, and also *measure* how likely different words are in different contexts.

In this section, we'll:

- Learn about *bits*.  
- Implement and use a function to calculate the surprisal of a word in context, from an n-gram model.

### 3a: What are "bits"?

Surprisal is usually computed with a base-2 logarithm ($\log_2$), which means it is measured in *bits*.

This is because information theory is interested in [bits](https://en.wikipedia.org/wiki/Bit): a logical state with two possible values (`1` vs. `0`). You can think of *bits* as measuring the number of binary coin flips you'd need to arrive at a certain outcome. E.g., for an event of probability $p = 0.5$, you only need to flip a coin once to determine the outcome.  
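
The coin-flip intuition can be checked numerically. In this small sketch, `bits_needed` is a hypothetical helper name; it applies the surprisal formula to outcomes that need 1, 2, or 3 fair coin flips.

```python
import math

def bits_needed(p):
    """Surprisal in bits: the negative base-2 log of probability p."""
    return -math.log2(p)

# An outcome pinned down by k fair coin flips has probability 0.5 ** k.
for flips in range(1, 4):
    p = 0.5 ** flips
    print(flips, p, bits_needed(p))
```

Each additional flip halves the probability and adds exactly one bit of surprisal.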

In [None]:
import math

def surprisal(p):
    return -math.log2(p)

In [None]:
surprisal(.5)

1.0

#### Questions

1. What is the `surprisal` of $p = 0.25$?  
2. What about $p = 0.1$?
3. Which is larger? What does that tell us about how `surprisal` relates to probability?

In [None]:
### Your code here
surprisal(.25)

2.0

In [None]:
surprisal(.1)

3.321928094887362

In [None]:
print("1. Surprisal of p = 0.25 is 2.0")
print("2. Surprisal of p = 0.1 is 3.322")
print("3. Surprisal of p = 0.1 is larger. High probability has low surprisal, low probability has high surprisal.")

1. Surprisal of p = 0.25 is 2.0
2. Surprisal of p = 0.1 is 3.322
3. Surprisal of p = 0.1 is larger. High probability has low surprisal, low probability has high surprisal.


### 3b: Surprisal and n-gram models

In this section, we define a new function, called `calculate_surprisal`. Given a trained n-gram `model`, a `context`, and a `word`, this function estimates the probability of `word` given the `context` and returns the corresponding surprisal.

- Take a moment to look through the function and see if you can understand how it works.
- The cell below the function contains some examples of the function in action. Feel free to modify these or write your own to experiment.


In [None]:
def calculate_surprisal(model, context, word):
    # Calculate the probability of the word given the context
    # In a bigram model, the context is just the previous word
    # In an n-gram model, the context is the previous n-1 words
    context = tuple(context)
    if context in model and word in model[context]:
        # Calculate the probability of the word given the context
        word_count = model[context][word]
        total_count = sum(model[context].values())
        probability = word_count / total_count
        # Calculate the surprisal
        surprisal = -math.log2(probability)
    else:
        # If the context or word is not found, the estimated probability is 0,
        # so the surprisal is (positive) infinity
        surprisal = float('inf')
    return surprisal

In [None]:
## Example
print(calculate_surprisal(trigram_model, ('<s>', 'the'), 'mouse'))
print(calculate_surprisal(trigram_model, ('<s>', 'the'), 'cat'))
print(calculate_surprisal(trigram_model, ('<s>', 'the'), 'dog'))

1.5849625007211563
0.5849625007211563
inf


#### Questions

1. Why is the surprisal of "mouse" higher than "cat"?  
2. Why is the surprisal of "dog" `inf`?
3. What technique discussed in class would address the issue of a surprisal of `inf`?

In [None]:
print("1. 'cat' follows '<s> the' more often than 'mouse' does (2 vs. 1 out of 3), so 'cat' has the higher probability and therefore the lower surprisal.")
print("2. 'dog' never follows '<s> the' in the corpus, so the model assigns it probability 0 and its surprisal is infinite.")
print("3. With smoothing, 'dog' gets a very small nonzero probability, so its surprisal is large but finite.")

1. 'cat' follows '<s> the' more often than 'mouse' does (2 vs. 1 out of 3), so 'cat' has the higher probability and therefore the lower surprisal.
2. 'dog' never follows '<s> the' in the corpus, so the model assigns it probability 0 and its surprisal is infinite.
3. With smoothing, 'dog' gets a very small nonzero probability, so its surprisal is large but finite.


## Part 4: Generating Text

In this section, we'll use our n-gram models to **generate** text.

This process typically works as follows:

- First, **fit** an n-gram model to a corpus.  
- Then, **seed** the n-gram model with a start token (e.g., `<s>`).  
- Select a next token given that context (e.g., the most likely one, or one sampled according to its probability).  
- Continue this process until you've generated either the desired number of tokens or generated an end-of-sentence token (e.g., `</s>`).
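
The steps above can be sketched with a plain dictionary of bigram counts. This is a simplified, greedy version under stated assumptions: `greedy_generate`, `toy_tokens`, and `toy_model` are hypothetical names, and the loop always takes the most frequent next token, whereas library implementations often sample from the distribution instead.

```python
from collections import defaultdict

def greedy_generate(model, seed='<s>', max_tokens=10, stop='</s>'):
    """Start from seed and repeatedly append the most frequent next token."""
    generated = [seed]
    while len(generated) < max_tokens:
        followers = model.get(generated[-1])
        if not followers:  # unseen context: nothing to continue with
            break
        next_token = max(followers, key=followers.get)
        generated.append(next_token)
        if next_token == stop:
            break
    return generated

# Fit: count which token follows which in a hand-tokenized toy corpus.
toy_tokens = ['<s>', 'the', 'dog', 'slept', '</s>',
              '<s>', 'the', 'dog', 'slept', '</s>',
              '<s>', 'the', 'cat', 'barked', '</s>']
toy_model = defaultdict(lambda: defaultdict(int))
for prev_word, word in zip(toy_tokens, toy_tokens[1:]):
    toy_model[prev_word][word] += 1

generated_tokens = greedy_generate(toy_model)
print(' '.join(generated_tokens))
```

Greedy selection is deterministic, so it produces the same sentence every time; sampling trades that predictability for variety.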

Because this will involve more complex functions, we'll move away from *custom* functions and we'll use functions from an existing library called `nltk`.

In [None]:
### Libraries to import
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import ngrams

### 4a. Fitting bigram model.

In this section, we fit a bigram model using the `Laplace` class and its `fit` method.

For more details, check out the [`nltk.lm` package documentation](https://www.nltk.org/api/nltk.lm.html). Try to understand what each code block is doing below.

In [None]:
# Generate padded bigrams and vocabulary for training data
n = 2  # Bigram model
train_data, vocab = padded_everygram_pipeline(n, [emma_tokens])

In [None]:
# Build and train the bigram model
model = Laplace(n)
model.fit(train_data, vocab)

#### Questions

1. What does `Laplace` refer to, and why is it used?
2. What other related techniques did we discuss in class?

In [None]:
print("1. Laplace refers to add-one smoothing for n-gram models. It adds 1 to the count of every word in each context, so that no word ends up with zero probability.")
print("2. We also discussed the back-off technique in class.")

1. Laplace refers to add-one smoothing for n-gram models. It adds 1 to the count of every word in each context, so that no word ends up with zero probability.
2. We also discussed the back-off technique in class.
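
As a rough sketch of the idea behind add-one (Laplace) smoothing, the code below computes smoothed probabilities by hand. It is not `nltk`'s implementation: the helper `laplace_probability`, the counts, and the 4-word vocabulary are all assumptions made for illustration.

```python
import math

def laplace_probability(counts, word, vocab_size):
    """Add-one estimate: (count + 1) / (total + vocab_size)."""
    total = sum(counts.values())
    return (counts.get(word, 0) + 1) / (total + vocab_size)

# Hypothetical counts of words seen after one context, plus a hypothetical vocabulary.
following_counts = {'cat': 2, 'mouse': 1}
vocabulary = ['cat', 'mouse', 'dog', 'wall']

smoothed = {word: laplace_probability(following_counts, word, len(vocabulary))
            for word in vocabulary}
for word, p in smoothed.items():
    print(word, round(p, 3), round(-math.log2(p), 3))
```

Unseen words such as `'dog'` now receive probability 1/7 rather than 0, so their surprisal is finite, and the smoothed probabilities still sum to 1.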


### 4b. Generating text

We can now *generate* text using this fit model, using:

```
model.generate(num_words)
```

As you'll see, we can also add a `text_seed` to specify which word we want to start with.


In [None]:
### No text seed
num_tokens = 10
tokens = model.generate(num_tokens)
print(' '.join(tokens))

passed since my friend and growing late </s> <s> but


In [None]:
tokens = model.generate(num_tokens, text_seed = ['this'])
print(' '.join(tokens))

reflected more voluntary praise emma putting all reaching it was


#### Questions

1. What do you think about the text the model is generating? Totally random? Sensible at all?
2. Try fitting the model object to different values of $n$. Qualitatively, do you notice any differences between, say, a unigram model ($n = 1$) and a trigram model ($n = 3$)?

**Note**: To refit the model, you'll need to rerun these lines:

```
train_data, vocab = padded_everygram_pipeline(n, [emma_tokens])
model = Laplace(n) ### Where n = the desired n-gram model
model.fit(train_data, vocab)
```

In [None]:
### Your code here
train_data, vocab = padded_everygram_pipeline(n, [emma_tokens])
model = Laplace(n) ### Where n = the desired n-gram model
model.fit(train_data, vocab)

In [None]:
n = 3  # Trigram model: use the same n for the pipeline and the model
train_data, vocab = padded_everygram_pipeline(n, [emma_tokens])
model = Laplace(n)
model.fit(train_data, vocab)

In [None]:
print("1.The sentence is not totally random, but clearly doesn't make full sense.")
print("2.A trigram model seems to generate texts that make more sense than a unigram model.")

1.The sentence is not totally random, but clearly doesn't make full sense.
2.A trigram model seems to generate texts that make more sense than a unigram model.


## Final reflections

Now that you've learned a little more about how $n$-gram models work (and how to build simple versions in Python): do you think an $n$-gram model "understands" language? Why or why not?

Although n-gram models can generate language-like sentences, I don't think these models really "understand" language the way humans do, because their predictions are based solely on statistical patterns in the training text.