# AIG230 NLP (Week 3 Lab) - Notebook 2: Statistical Language Models (Train, Test, Evaluate)

This notebook focuses on **n-gram Statistical Language Models (SLMs)**:
- Train **unigram**, **bigram**, **trigram** models
- Handle **OOV** with `<UNK>`
- Apply **smoothing** (Add-k)
- Evaluate with **cross-entropy** and **perplexity**
- Do **next-word prediction** and simple **text generation**

> Industry framing: even if modern systems use neural LMs, n-gram LMs are still useful for
baselines, constrained domains, and for understanding evaluation.


## 0) Setup


In [None]:
import re
import math
import random
from collections import Counter, defaultdict
from typing import List, Tuple, Dict


## 1) Data: domain text you might see in real systems


We use short texts that resemble:
- release notes
- incident summaries
- operational runbooks
- customer support messaging

In practice, you would load thousands to millions of lines.
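
For a larger corpus, the texts would typically be streamed from a file instead of being hard-coded. The sketch below is one possible way to do that; the helper name `iter_texts` and the file name `tickets.txt` are hypothetical and not part of this lab.

```python
from typing import Iterable, Iterator

def iter_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield one cleaned text per non-empty line (works on any iterable of strings)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield line

# Usage with a hypothetical file, read lazily line by line:
# with open("tickets.txt", encoding="utf-8") as f:
#     corpus = list(iter_texts(f))

# Small in-memory demo so the cell runs without any file
demo = list(iter_texts(["vpn disconnects\n", "\n", "  api timeout  \n"]))
print(demo)
```

Because the helper accepts any iterable of strings, the same function works on an open file handle or on a list.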


In [57]:

corpus = [
    "vpn disconnects frequently after windows update",
    "password reset link expired user cannot login",
    "api requests timeout when latency spikes",
    "portal returns 500 error after deployment",
    "email delivery delayed messages queued",
    "mfa prompt never arrives user stuck at login",
    "wifi drops in meeting rooms access point reboot helps",
    "outlook search not returning results index corrupted",
    "printer driver install fails with error 1603",
    "teams calls choppy audio jitter high",
    "permission denied accessing shared drive though in correct group",
    "battery drains fast after bios update power settings unchanged",
    "push notifications not working on android app",
    "mailbox full cannot receive emails auto archive not running",
    "server overloaded capacity reached",
    "database connection lost service unavailable",
    "disk space low performance degraded",
    "application crashed unexpectedly log analysis needed",
    "network latency high affecting user experience",
    "authentication failed invalid credentials provided",
    "storage quota exceeded cannot upload files",
    "memory leak detected service restarting",
    "cpu utilization spiking abnormal process activity",
    "firewall blocking access to external resources",
    "dns resolution failing domain not found",
    "load balancer misconfigured traffic routing issues",
    "certificate expired secure connection failed",
    "backup process failed data integrity compromised",
    "monitoring system alert threshold breached",
    "container exited unexpectedly check logs",
    "queue full message processing stalled",
    "api key invalid access denied",
    "firmware update failed device unresponsive",
    "geolocation service inaccurate location data",
    "billing discrepancy incorrect invoice generated",
    "user interface unresponsive freezing on interactions",
    "data synchronization error inconsistent records",
    "scheduler failed tasks not executing",
    "encryption key missing secure communication impossible"
]

# Train/test split at sentence level
random.seed(42)
random.shuffle(corpus)
split = int(0.75 * len(corpus))
train_texts = corpus[:split]
test_texts = corpus[split:]

len(train_texts), len(test_texts), train_texts[:2], test_texts[:2]


(29,
 10,
 ['firewall blocking access to external resources',
  'geolocation service inaccurate location data'],
 ['mailbox full cannot receive emails auto archive not running',
  'network latency high affecting user experience'])

## 2) Tokenization + special tokens


We will:
- lowercase
- keep alphanumerics
- split on whitespace
- add sentence boundary tokens: `<s>` and `</s>`

We will also map rare tokens to `<UNK>` based on training frequency.


In [58]:

def tokenize(text: str) -> List[str]:
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.split()

def add_boundaries(tokens: List[str], n: int) -> List[str]:
    # For n-grams, prepend (n-1) start tokens for simpler context handling
    return ["<s>"]*(n-1) + tokens + ["</s>"]

# Example
tokens = tokenize("Printer driver install fails with error 1603")
add_boundaries(tokens, n=3)


['<s>',
 '<s>',
 'printer',
 'driver',
 'install',
 'fails',
 'with',
 'error',
 '1603',
 '</s>']

## 3) Build vocabulary and handle OOV with `<UNK>`


In [59]:

# Build vocab from training data
train_tokens_flat = []
for t in train_texts:
    train_tokens_flat.extend(tokenize(t))

freq = Counter(train_tokens_flat)

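# Keep tokens seen at least min_count times; everything else becomes <UNK>.
# With min_count = 1 every training token is kept, so only unseen test tokens map to <UNK>.
# A typical practical rule for small corpora is min_count = 2 (singletons become <UNK>).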
min_count = 1
vocab = {w for w, c in freq.items() if c >= min_count}
vocab |= {"<UNK>", "<s>", "</s>"}

def replace_oov(tokens: List[str], vocab: set) -> List[str]:
    return [tok if tok in vocab else "<UNK>" for tok in tokens]

# Show OOV effect
sample = tokenize(test_texts[0])
sample, replace_oov(sample, vocab)


(['mailbox',
  'full',
  'cannot',
  'receive',
  'emails',
  'auto',
  'archive',
  'not',
  'running'],
 ['<UNK>',
  'full',
  'cannot',
  '<UNK>',
  '<UNK>',
  '<UNK>',
  '<UNK>',
  'not',
  '<UNK>'])

## 4) Train n-gram counts (unigram, bigram, trigram)


We will compute:
- `ngram_counts[(w1,...,wn)]`
- `context_counts[(w1,...,w_{n-1})]`

Then probability:

P(w_n | context) = count(context + w_n) / count(context)

This fails when an n-gram is unseen, so we add smoothing.
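
To make the failure concrete, here is a minimal sketch on two toy sentences (made up for illustration, not taken from the lab corpus). The helper `mle_prob` is a hypothetical name for the unsmoothed estimate above.

```python
from collections import Counter

# Two toy sentences with boundary tokens already added
sentences = [
    ["<s>", "api", "timeout", "</s>"],
    ["<s>", "api", "key", "invalid", "</s>"],
]

bigram_counts = Counter()
context_counts = Counter()
for sent in sentences:
    for prev, cur in zip(sent, sent[1:]):
        bigram_counts[(prev, cur)] += 1
        context_counts[prev] += 1

def mle_prob(prev: str, cur: str) -> float:
    """Unsmoothed estimate: count(prev, cur) / count(prev)."""
    ctx = context_counts[prev]
    return bigram_counts[(prev, cur)] / ctx if ctx else 0.0

p_seen = mle_prob("api", "timeout")    # bigram observed once, context observed twice
p_unseen = mle_prob("api", "latency")  # bigram never observed
print(p_seen, p_unseen)
```

A single zero probability makes the log-probability of the whole test sentence negative infinity, which is why the evaluation later needs smoothing.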


In [60]:
def get_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    # Slide a window of size n over the token list
    return [tuple(tokens[i:i+n]) for i in range(len(tokens)-n+1)]

# Returns two Counters: (ngram_counts, context_counts)
def train_ngrams_counts(texts: List[str], n: int, vocab: set) -> Tuple[Counter, Counter]:
    ngram_counts = Counter()
    context_counts = Counter()
    for t in texts:
        tokens = replace_oov(tokenize(t), vocab)
        tokens = add_boundaries(tokens, n)
        ngrams = get_ngrams(tokens, n)
        for ng in ngrams:
            ngram_counts[ng] += 1
            context = ng[:-1]
            context_counts[context] += 1
    return ngram_counts, context_counts
    

In [61]:
uni_counts, uni_ctx = train_ngrams_counts(train_texts, 1, vocab) # type: ignore

In [62]:
uni_counts

Counter({('</s>',): 29,
         ('failed',): 5,
         ('data',): 3,
         ('after',): 3,
         ('update',): 3,
         ('not',): 3,
         ('access',): 2,
         ('service',): 2,
         ('error',): 2,
         ('denied',): 2,
         ('unresponsive',): 2,
         ('on',): 2,
         ('api',): 2,
         ('key',): 2,
         ('invalid',): 2,
         ('process',): 2,
         ('secure',): 2,
         ('firewall',): 1,
         ('blocking',): 1,
         ('to',): 1,
         ('external',): 1,
         ('resources',): 1,
         ('geolocation',): 1,
         ('inaccurate',): 1,
         ('location',): 1,
         ('portal',): 1,
         ('returns',): 1,
         ('500',): 1,
         ('deployment',): 1,
         ('battery',): 1,
         ('drains',): 1,
         ('fast',): 1,
         ('bios',): 1,
         ('power',): 1,
         ('settings',): 1,
         ('unchanged',): 1,
         ('permission',): 1,
         ('accessing',): 1,
         ('shared',): 1,
        

In [63]:
bi_counts, bi_ctx = train_ngrams_counts(train_texts, 2, vocab)

In [64]:
bi_counts

Counter({('<s>', 'api'): 2,
         ('<s>', 'firewall'): 1,
         ('firewall', 'blocking'): 1,
         ('blocking', 'access'): 1,
         ('access', 'to'): 1,
         ('to', 'external'): 1,
         ('external', 'resources'): 1,
         ('resources', '</s>'): 1,
         ('<s>', 'geolocation'): 1,
         ('geolocation', 'service'): 1,
         ('service', 'inaccurate'): 1,
         ('inaccurate', 'location'): 1,
         ('location', 'data'): 1,
         ('data', '</s>'): 1,
         ('<s>', 'portal'): 1,
         ('portal', 'returns'): 1,
         ('returns', '500'): 1,
         ('500', 'error'): 1,
         ('error', 'after'): 1,
         ('after', 'deployment'): 1,
         ('deployment', '</s>'): 1,
         ('<s>', 'battery'): 1,
         ('battery', 'drains'): 1,
         ('drains', 'fast'): 1,
         ('fast', 'after'): 1,
         ('after', 'bios'): 1,
         ('bios', 'update'): 1,
         ('update', 'power'): 1,
         ('power', 'settings'): 1,
         ('setti

In [65]:
tri_counts, tri_ctx = train_ngrams_counts(train_texts, 3, vocab)

In [66]:
tri_counts

Counter({('<s>', '<s>', 'api'): 2,
         ('<s>', '<s>', 'firewall'): 1,
         ('<s>', 'firewall', 'blocking'): 1,
         ('firewall', 'blocking', 'access'): 1,
         ('blocking', 'access', 'to'): 1,
         ('access', 'to', 'external'): 1,
         ('to', 'external', 'resources'): 1,
         ('external', 'resources', '</s>'): 1,
         ('<s>', '<s>', 'geolocation'): 1,
         ('<s>', 'geolocation', 'service'): 1,
         ('geolocation', 'service', 'inaccurate'): 1,
         ('service', 'inaccurate', 'location'): 1,
         ('inaccurate', 'location', 'data'): 1,
         ('location', 'data', '</s>'): 1,
         ('<s>', '<s>', 'portal'): 1,
         ('<s>', 'portal', 'returns'): 1,
         ('portal', 'returns', '500'): 1,
         ('returns', '500', 'error'): 1,
         ('500', 'error', 'after'): 1,
         ('error', 'after', 'deployment'): 1,
         ('after', 'deployment', '</s>'): 1,
         ('<s>', '<s>', 'battery'): 1,
         ('<s>', 'battery', 'drains'): 

## 5) Add-k smoothing and probability function


Add-k smoothing (a common baseline):

a) Add *k* to every possible next-word count  
b) Normalize by context_count + k * |V|

P_k(w|h) = (count(h,w) + k) / (count(h) + k*|V|)

Where V is the vocabulary.
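
As a quick sanity check, the sketch below applies the formula to a single toy context and confirms that the smoothed probabilities sum to 1 over the vocabulary. The counts, the four-word vocabulary, and the helper name `addk` are illustrative assumptions.

```python
from collections import Counter

# Toy setup: words observed after the context "api"
vocab_toy = ["api", "key", "timeout", "</s>"]
counts_after_api = Counter({"key": 1, "timeout": 1})
k = 0.5

def addk(word: str) -> float:
    """(count(h, w) + k) / (count(h) + k * |V|) for the fixed context h = 'api'."""
    context_total = sum(counts_after_api.values())
    return (counts_after_api[word] + k) / (context_total + k * len(vocab_toy))

probs = {w: addk(w) for w in vocab_toy}
total = sum(probs.values())
print(probs, total)
```

Seen words keep most of the mass, and each unseen word receives k / (count(h) + k * |V|) instead of zero.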


In [67]:
def prob_addk(ngram: Tuple[str, ...], ngram_counts: Dict[Tuple[str, ...], int], context_counts: Dict[Tuple[str, ...], int], V: int, k: float = 1.0) -> float:
    # (count(h,w) + k) / (count(h) + k*|V|)
    if len(ngram) == 1: # Unigram probability
        # Unigram contexts are the empty tuple, so its count is the total token count (including </s>)
        total_tokens = context_counts[tuple()] if tuple() in context_counts else sum(ngram_counts.values())

        return (ngram_counts.get(ngram, 0) + k) / (total_tokens + k * V)
    else:
        context = ngram[:-1]
        numerator = ngram_counts.get(ngram, 0) + k
        denominator = context_counts.get(context, 0) + k * V
        if denominator == 0:
            return 1e-10
        return numerator / denominator


uni_counts, uni_ctx = train_ngrams_counts(train_texts, 1, vocab)
bi_counts, bi_ctx = train_ngrams_counts(train_texts, 2, vocab)
tri_counts, tri_ctx = train_ngrams_counts(train_texts, 3, vocab)




## 6) Evaluate: cross-entropy and perplexity on test set


We evaluate an LM by how well it predicts held-out text.

Cross-entropy (average negative log probability):
H = - (1/N) * sum log2 P(w_i | context)

Perplexity:
PP = 2^H

Lower perplexity is better.
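
A small worked example may help here. The sketch below computes both quantities from a list of per-token probabilities; the function names `cross_entropy_bits` and `perplexity_from_probs` are illustrative and separate from the lab's own `perplexity` function.

```python
import math
from typing import List

def cross_entropy_bits(probs: List[float]) -> float:
    """H = -(1/N) * sum(log2 p_i), in bits per token."""
    return -sum(math.log2(p) for p in probs) / len(probs)

def perplexity_from_probs(probs: List[float]) -> float:
    """PP = 2 ** H."""
    return 2 ** cross_entropy_bits(probs)

# A model that assigns probability 1/4 to every token it is asked to predict
uniform = [0.25] * 8
H = cross_entropy_bits(uniform)
PP = perplexity_from_probs(uniform)

# The same perplexity using natural logs and exp, since the log base cancels out
PP_nat = math.exp(-sum(math.log(p) for p in uniform) / len(uniform))
print(H, PP, PP_nat)
```

A perplexity of 4 here can be read as the model being, on average, as uncertain as a uniform choice among 4 tokens.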


In [68]:
def perplexity(test_texts: List[str], n: int,
               ngram_counts: Dict[Tuple[str, ...], int],
               context_counts: Dict[Tuple[str, ...], int],
               vocab: set,
               delta: float = 0.0) -> float:
    N = 0  # total number of tokens
    log_prob_sum = 0.0  # sum of log probabilities

    V = len(vocab)

    for t in test_texts:
        tokens = replace_oov(tokenize(t), vocab)
        tokens = add_boundaries(tokens, n)
        ngrams = get_ngrams(tokens, n)

        for ng in ngrams:
            N += 1
            context = ng[:-1]
            count_ng = ngram_counts.get(ng, 0)
            count_ctx = context_counts.get(context, 0)

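            # Apply add-delta smoothing; guard against a zero denominator
            # (delta = 0 with an unseen context)
            denom = count_ctx + delta * V
            prob = (count_ng + delta) / denom if denom > 0 else 0.0
            log_prob_sum += math.log(prob) if prob > 0 else float('-inf')

    # Natural log with exp() gives the same perplexity as log2 with 2 ** H
    avg_log_prob = log_prob_sum / N
    return math.exp(-avg_log_prob)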


In [69]:
pp_uni = perplexity(test_texts, 1, uni_counts, uni_ctx, vocab, delta=1.0)
pp_bi = perplexity(test_texts, 2, bi_counts, bi_ctx, vocab, delta=1.0)
pp_tri = perplexity(test_texts, 3, tri_counts, tri_ctx, vocab, delta=1.0)
print(f"Unigram Perplexity: {pp_uni:.2f}")
print(f"Bigram Perplexity: {pp_bi:.2f}")
print(f"Trigram Perplexity: {pp_tri:.2f}")

Unigram Perplexity: 184.42
Bigram Perplexity: 150.75
Trigram Perplexity: 150.43


## 7) Next-word prediction (top-k)


Given a context, compute the probability of each candidate next token and return the top-k.

This mirrors:
- autocomplete in constrained domains
- template suggestion systems
- command prediction in runbooks


In [70]:

def next_word_topk(context_tokens: List[str], n: int, ngram_counts: Dict[Tuple[str, ...], int], context_counts: Dict[Tuple[str, ...], int], vocab: set, k_smooth: float = 1.0, top_k: int = 5) -> List[Tuple[str, float]]:
    V = len(vocab)
    context = tuple(context_tokens[-(n-1):]) if n > 1 else tuple()
    candidates = [(w, prob_addk(context + (w,), ngram_counts, context_counts, V, k=k_smooth)) for w in vocab if w not in {"<s>", "</s>"}]
    candidates.sort(key=lambda x: x[1], reverse=True)
    return candidates[:top_k]

next_word_topk(["<s>"], n=2, ngram_counts=bi_counts, context_counts=bi_ctx, vocab=vocab, k_smooth=5, top_k=3)


[('api', 0.009162303664921465),
 ('vpn', 0.007853403141361256),
 ('disk', 0.007853403141361256)]

## 8) Simple generation (bigram or trigram)


Text generation is not the main goal in SLMs, but it helps you verify:
- boundary handling
- smoothing
- OOV decisions

We will sample tokens until we hit `</s>`.


In [71]:

def sample_next(context_tokens: List[str], n: int, ngram_counts: Counter, context_counts: Counter, vocab: set, k_smooth: float = 0.5):
    V = len(vocab)
    context = tuple(context_tokens[-(n-1):]) if n > 1 else tuple()
    words = [w for w in vocab if w != "<s>"]
    probs = []
    for w in words:
        ng = context + (w,)
        probs.append(prob_addk(ng, ngram_counts, context_counts, V, k=k_smooth))
    # Normalize
    s = sum(probs)
    probs = [p/s for p in probs]
    return random.choices(words, weights=probs, k=1)[0]

def generate(n: int, ngram_counts: Counter, context_counts: Counter, vocab: set, max_len: int = 20, k_smooth: float = 0.5):
    tokens = ["<s>"]*(n-1) if n > 1 else []
    out = []
    for _ in range(max_len):
        w = sample_next(tokens, n, ngram_counts, context_counts, vocab, k_smooth=k_smooth)
        if w == "</s>":
            break
        out.append(w)
        tokens.append(w)
    return " ".join(out)

for _ in range(5):
    print("BIGRAM:", generate(2, bi_counts, bi_ctx, vocab, max_len=18))


BIGRAM: battery spiking message group geolocation audio discrepancy timeout threshold device message spiking scheduler power vpn windows resolution
BIGRAM: when on tasks invoice <UNK> disconnects when teams denied external routing working connection detected disconnects permission unexpectedly inconsistent
BIGRAM: interactions integrity check when domain unresponsive in when queue deployment low latency timeout notifications to teams in email
BIGRAM: accessing 500 traffic incorrect full blocking alert provided vpn api invoice container load disk disconnects blocking executing data
BIGRAM: spiking jitter geolocation delivery audio authentication invoice api returns to drains deployment in unchanged detected notifications requests low


## 9) Model comparison: effect of n and smoothing


Try different `k` values. Notes:
- `k=1.0` is Laplace smoothing (often too strong)
- smaller `k` (like 0.1 to 0.5) is often better

In real corpora, trigrams often beat bigrams, but require more data.
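
One way to run this comparison is to sweep `k` and print the held-out perplexity for each value. The sketch below does this for a self-contained toy bigram model; the three training sentences, the held-out sentence, and the helper names are assumptions made for illustration, so the numbers say nothing about the lab corpus.

```python
import math
from collections import Counter
from typing import List, Tuple

train_toy = ["api key invalid", "api requests timeout", "service unavailable"]
held_out_toy = ["api key timeout"]

def bigrams(text: str) -> List[Tuple[str, str]]:
    toks = ["<s>"] + text.split() + ["</s>"]
    return list(zip(toks, toks[1:]))

bi_toy = Counter(bg for t in train_toy for bg in bigrams(t))
ctx_toy = Counter(prev for t in train_toy for prev, _ in bigrams(t))
# Predictable tokens: every training word plus the end-of-sentence marker
vocab_toy = {w for t in train_toy for w in t.split()} | {"</s>"}

def toy_perplexity(texts: List[str], k: float) -> float:
    """Add-k bigram perplexity (requires k > 0 so no probability is zero)."""
    V = len(vocab_toy)
    log_sum, n_tokens = 0.0, 0
    for t in texts:
        for prev, cur in bigrams(t):
            p = (bi_toy[(prev, cur)] + k) / (ctx_toy[prev] + k * V)
            log_sum += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_sum / n_tokens)

results = {k: toy_perplexity(held_out_toy, k) for k in (0.01, 0.1, 0.5, 1.0)}
for k, pp in results.items():
    print(f"k={k}: perplexity={pp:.2f}")
```

The same loop can be pointed at the lab's `perplexity` function by varying its `delta` argument.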


## Exercises (do these during lab)
1) Add 20 more realistic domain sentences to the corpus and re-run training/evaluation.  
2) Change `min_count` (OOV threshold) and explain how perplexity changes.  
3) Implement **backoff**: if a trigram is unseen, fall back to bigram; if unseen, fall back to unigram.  
4) Create a function that returns **top-5 next words** given a phrase like: `"user cannot"`.


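Changing the out-of-vocabulary threshold (`min_count`) from two to one drastically increases perplexity: unigram goes from 2.20 to 168.83, bigram from 2.91 to 150.75, and trigram from 3.20 to 150.43 (both sets of figures are for the expanded corpus). With a threshold of two, most tokens in a corpus this small are likely mapped to `<UNK>`, so the model mostly has to predict `<UNK>` and the task becomes much easier. Perplexities computed over different vocabularies are therefore not directly comparable.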
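
A rough way to see why the threshold matters so much is to measure how many training tokens end up as `<UNK>` for each `min_count`. The sketch below uses a made-up ten-token sample and a hypothetical helper `unk_rate`; it is not computed on the lab corpus.

```python
from collections import Counter

# Toy token stream: "api" appears 3 times, "timeout" twice, everything else once
train_tokens_toy = (
    "api key invalid api requests timeout service unavailable api timeout".split()
)
freq_toy = Counter(train_tokens_toy)

def unk_rate(min_count: int) -> float:
    """Fraction of training tokens replaced by <UNK> at this threshold."""
    kept = {w for w, c in freq_toy.items() if c >= min_count}
    n_unk = sum(1 for w in train_tokens_toy if w not in kept)
    return n_unk / len(train_tokens_toy)

rate_1 = unk_rate(1)  # every token is kept
rate_2 = unk_rate(2)  # singletons are replaced by <UNK>
print(rate_1, rate_2)
```

In this toy sample half of the tokens collapse into one symbol at `min_count = 2`, which makes the next token far easier to guess.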

In [78]:
def simple_backoff(ngram: Tuple[str, ...],
                        uni_counts: Dict[Tuple[str, ...], int],
                        uni_ctx: Dict[Tuple[str, ...], int],
                        bi_counts: Dict[Tuple[str, ...], int],
                        bi_ctx: Dict[Tuple[str, ...], int],
                        tri_counts: Dict[Tuple[str, ...], int],
                        tri_ctx: Dict[Tuple[str, ...], int],
                        vocab_size: int,
                        k_smooth: float = 0.5) -> float:
    n = len(ngram)

    if n == 3:
        # Check if trigram exists (count > 0). If so, use its probability.
        if tri_counts.get(ngram, 0) > 0:
            return prob_addk(ngram, tri_counts, tri_ctx, vocab_size, k=k_smooth)
        else:
            # Otherwise, fall back to bigram (last two tokens).
            return simple_backoff(ngram[1:], uni_counts, uni_ctx, bi_counts, bi_ctx, tri_counts, tri_ctx, vocab_size, k_smooth)
    elif n == 2:
        # Check if bigram exists (count > 0). If so, use its probability.
        if bi_counts.get(ngram, 0) > 0:
            return prob_addk(ngram, bi_counts, bi_ctx, vocab_size, k=k_smooth)
        else:
            # Otherwise, fall back to unigram (last token).
            return simple_backoff(ngram[1:], uni_counts, uni_ctx, bi_counts, bi_ctx, tri_counts, tri_ctx, vocab_size, k_smooth)
    elif n == 1:
        # Base case: unigram probability. No further backoff.
        return prob_addk(ngram, uni_counts, uni_ctx, vocab_size, k=k_smooth)
    else:
        # Should not be reached for valid n-grams (n=1, 2, or 3)
        return 1e-10 # Return a very small probability for unsupported n-gram lengths

In [79]:
# The perplexity function must be modified to use the backoff algorithm.
def backoff_perplexity(test_texts: List[str],
                            n_model: int,
                            uni_counts: Dict[Tuple[str, ...], int],
                            uni_ctx: Dict[Tuple[str, ...], int],
                            bi_counts: Dict[Tuple[str, ...], int],
                            bi_ctx: Dict[Tuple[str, ...], int],
                            tri_counts: Dict[Tuple[str, ...], int],
                            tri_ctx: Dict[Tuple[str, ...], int],
                            vocab: set,
                            k_smooth: float = 0.5) -> float:
    N = 0  # total number of tokens
    log_prob_sum = 0.0  # sum of log probabilities

    V = len(vocab)

    for t in test_texts:
        tokens = replace_oov(tokenize(t), vocab)
        tokens = add_boundaries(tokens, n_model)
        
        # When calculating perplexity for an n-gram model, we are interested in P(w_i | w_{i-1}...w_{i-n+1})
        # So we iterate through the tokens to form n-grams where the last token is the one being predicted.
        # The `get_ngrams` function gives us (w_{i-n+1}, ..., w_i)
        for i in range(len(tokens) - n_model + 1):
            ngram_to_evaluate = tuple(tokens[i : i + n_model])
            
            # We use simple_backoff here
            prob = simple_backoff(ngram_to_evaluate,
                                     uni_counts, uni_ctx,
                                     bi_counts, bi_ctx,
                                     tri_counts, tri_ctx,
                                     V, k_smooth)

            # For each word in the test set (excluding initial start tokens for context),
            # we count it as N. So for an n-gram (w_1, ..., w_n), w_n is the word being predicted.
            # The boundary token '</s>' also counts as a word.
            if ngram_to_evaluate[-1] != '<s>': # Don't count '<s>' as N for perplexity calculation
                N += 1
                log_prob_sum += math.log2(prob) if prob > 0 else float('-inf')
    
    # Ensure N is not zero to avoid division by zero
    if N == 0: 
        return float('inf')

    avg_log_prob = log_prob_sum / N
    perplexity_score = 2 ** (-avg_log_prob)
    return perplexity_score

In [80]:
pp_bi_backoff = backoff_perplexity(test_texts, 2, uni_counts, uni_ctx, bi_counts, bi_ctx, tri_counts, tri_ctx, vocab, k_smooth=0.5)
pp_tri_backoff = backoff_perplexity(test_texts, 3, uni_counts, uni_ctx, bi_counts, bi_ctx, tri_counts, tri_ctx, vocab, k_smooth=0.5)

print(f"Bigram Perplexity (Add-k only): {pp_bi:.2f}")
print(f"Trigram Perplexity (Add-k only): {pp_tri:.2f}")
print(f"Bigram Perplexity (with backoff): {pp_bi_backoff:.2f}")
print(f"Trigram Perplexity (with backoff): {pp_tri_backoff:.2f}")

Bigram Perplexity (Add-k only): 150.75
Trigram Perplexity (Add-k only): 150.43
Bigram Perplexity (with backoff): 241.32
Trigram Perplexity (with backoff): 241.32


In [81]:
def get_top_n_next_words_from_phrase(phrase: str, n_model: int, vocab: set,
                                     uni_counts: Dict[Tuple[str, ...], int],
                                     uni_ctx: Dict[Tuple[str, ...], int],
                                     bi_counts: Dict[Tuple[str, ...], int],
                                     bi_ctx: Dict[Tuple[str, ...], int],
                                     tri_counts: Dict[Tuple[str, ...], int],
                                     tri_ctx: Dict[Tuple[str, ...], int],
                                     k_smooth: float = 1.0, top_k: int = 5) -> List[Tuple[str, float]]:
    """
    Given a phrase, returns the top-k next words predicted by an n-gram model.
    The prediction uses the specified n-gram model (unigram, bigram, or trigram).
    """
    # Helper to get the correct counts based on n_model
    def _get_ngram_models(n_val: int):
        if n_val == 1:
            return uni_counts, uni_ctx
        elif n_val == 2:
            return bi_counts, bi_ctx
        elif n_val == 3:
            return tri_counts, tri_ctx
        else:
            raise ValueError("n_model must be 1, 2, or 3")

    # 1. Tokenize and handle OOV for the input phrase
    phrase_tokens = tokenize(phrase)
    phrase_tokens_oov = replace_oov(phrase_tokens, vocab)

    # 2. Get the context for prediction
    # For an n-gram model, the context length is n-1.
    # e.g., for trigram (n_model=3), context is last 2 words.
    # If the phrase is shorter than n-1, use what's available.
    context_tokens = phrase_tokens_oov[-(n_model - 1):] if n_model > 1 else []

    # 3. Get the correct n-gram counts and context counts for the specified model order
    ngram_counts, context_counts = _get_ngram_models(n_model)

    # 4. Use the existing next_word_topk function (defined in section 7)
    return next_word_topk(context_tokens, n_model, ngram_counts, context_counts, vocab, k_smooth, top_k)

# Examples of usage
# Using k_smooth=0.5 as it was used in the generate function previously.

print("Top 5 next words for 'user cannot' (trigram model, k_smooth=0.5):")
top_words_tri = get_top_n_next_words_from_phrase("user cannot", 3, vocab,
                                                uni_counts, uni_ctx,
                                                bi_counts, bi_ctx,
                                                tri_counts, tri_ctx, k_smooth=0.5, top_k=5)
for word, prob in top_words_tri:
    print(f"- {word}: {prob:.4f}")

print("\nTop 5 next words for 'mfa' (bigram model, k_smooth=0.5):")
top_words_bi = get_top_n_next_words_from_phrase("mfa", 2, vocab,
                                               uni_counts, uni_ctx,
                                               bi_counts, bi_ctx,
                                               tri_counts, tri_ctx, k_smooth=0.5, top_k=5)
for word, prob in top_words_bi:
    print(f"- {word}: {prob:.4f}")

print("\nTop 5 next words for 'disconnected' (bigram model, k_smooth=0.5): This word might be OOV.")
top_words_bi_oov = get_top_n_next_words_from_phrase("disconnected", 2, vocab,
                                                  uni_counts, uni_ctx,
                                                  bi_counts, bi_ctx,
                                                  tri_counts, tri_ctx, k_smooth=0.5, top_k=5)
for word, prob in top_words_bi_oov:
    print(f"- {word}: {prob:.4f}")

Top 5 next words for 'user cannot' (trigram model, k_smooth=0.5):
- restarting: 0.0068
- jitter: 0.0068
- processing: 0.0068
- update: 0.0068
- domain: 0.0068

Top 5 next words for 'mfa' (bigram model, k_smooth=0.5):
- restarting: 0.0068
- jitter: 0.0068
- processing: 0.0068
- update: 0.0068
- domain: 0.0068

Top 5 next words for 'disconnected' (bigram model, k_smooth=0.5): This word might be OOV.
- restarting: 0.0068
- jitter: 0.0068
- processing: 0.0068
- update: 0.0068
- domain: 0.0068
