# Lab 3 — Decoding Strategies in NMT

**Goal:** Explore decoding algorithms (greedy, beam, sampling) on a pretrained Hugging Face translation model.  
This notebook is Kaggle-friendly: it downloads models from Hugging Face and runs locally.

**Instructions for students:** run the cells in order. Sections contain short reflection prompts: you need to answer them in your report.


**BEFORE YOU START:**

If you modify a version that is NOT in your account, you will lose all your changes. If you want to save any changes, make a copy of this notebook on your own drive. That is the one you will be able modify and the changes will be saved. In a jupyter notebook, you can run the code cells by pressing `Ctrl+Enter` or `Shift+Enter`.

## Setup the environment
Install and import required libraries. On Kaggle this will work out of the box so you can skip this step; on other environments you may need to `pip install transformers datasets torch`.

In [None]:
# If running on a fresh environment uncomment the following line:
# !pip install transformers sentencepiece torch evaluate

In [1]:
import math
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from typing import List, Dict

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Device:', device)

Device: cpu


## Model selection
You need to pick a language direction you want to work with. Below are some common OPUS-MT choices. Change `model_name` to test another direction.

Here you can find all the models available from the Helsinki-NLP group: https://huggingface.co/Helsinki-NLP/models

In [2]:
EXAMPLE_MODEL_OPTIONS = {
    'en-de': 'Helsinki-NLP/opus-mt-en-de',
    'en-fr': 'Helsinki-NLP/opus-mt-en-fr',
    'en-cs': 'Helsinki-NLP/opus-mt-en-cs',
}

print('Examples of available models:')
for k in EXAMPLE_MODEL_OPTIONS:
    print(' -', k)


model_name = 'Helsinki-NLP/opus-mt-en-zh' # Change this variable to work with anothe model (and translation dierection)
print('\nUsing model:', model_name)
print('\nChange the variable model_name to work with anothe model (and translation dierection)')

Examples of available models:
 - en-de
 - en-fr
 - en-cs

Using model: Helsinki-NLP/opus-mt-en-zh

Change the variable model_name to work with anothe model (and translation dierection)


## Load tokenizer & model
We will load the tokenizer and the Seq2Seq model. We will use `model.generate` for convenience, but we also inspect the `scores` that generate can return.

In [3]:
from huggingface_hub import snapshot_download
from transformers import MarianTokenizer, MarianMTModel

# download model repo into a local folder (cache) because the newest transformers version requires some chat templates, which NMT  models do not use.
repo_dir = snapshot_download(model_name, force_download=False, repo_type="model")
print("Downloaded to:", repo_dir)

# now load tokenizer/model from the local directory, forcing local files only
tokenizer = MarianTokenizer.from_pretrained(repo_dir, local_files_only=True)
model = MarianMTModel.from_pretrained(repo_dir, local_files_only=True)
model.eval()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Fetching 13 files:   0%|          | 0/13 [00:00<?, ?it/s]

config.json: 0.00B [00:00, ?B/s]

pytorch_model.bin:   0%|          | 0.00/312M [00:00<?, ?B/s]

.gitattributes:   0%|          | 0.00/391 [00:00<?, ?B/s]

rust_model.ot:   0%|          | 0.00/578M [00:00<?, ?B/s]

metadata.json: 0.00B [00:00, ?B/s]

generation_config.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

flax_model.msgpack:   0%|          | 0.00/310M [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

source.spm:   0%|          | 0.00/806k [00:00<?, ?B/s]

tf_model.h5:   0%|          | 0.00/313M [00:00<?, ?B/s]

target.spm:   0%|          | 0.00/805k [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

tokenizer_config.json:   0%|          | 0.00/44.0 [00:00<?, ?B/s]

Downloaded to: /root/.cache/huggingface/hub/models--Helsinki-NLP--opus-mt-en-zh/snapshots/408d9bc410a388e1d9aef112a2daba955b945255




MarianMTModel(
  (model): MarianModel(
    (shared): Embedding(65001, 512, padding_idx=65000)
    (encoder): MarianEncoder(
      (embed_tokens): Embedding(65001, 512, padding_idx=65000)
      (embed_positions): MarianSinusoidalPositionalEmbedding(512, 512)
      (layers): ModuleList(
        (0-5): 6 x MarianEncoderLayer(
          (self_attn): MarianAttention(
            (k_proj): Linear(in_features=512, out_features=512, bias=True)
            (v_proj): Linear(in_features=512, out_features=512, bias=True)
            (q_proj): Linear(in_features=512, out_features=512, bias=True)
            (out_proj): Linear(in_features=512, out_features=512, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (activation_fn): SiLU()
          (fc1): Linear(in_features=512, out_features=2048, bias=True)
          (fc2): Linear(in_features=2048, out_features=512, bias=True)
          (final_layer_norm): LayerNorm((512,), eps=1e-05

## Utilities: inspect next-token logits / probabilities
`model.generate(..., output_scores=True, return_dict_in_generate=True)` returns `sequences` and `scores` where `scores[i]` is a tensor of logits (before softmax) for the i-th generated token **after** the input / previous tokens.

In [5]:
import numpy as np

def display_topk_from_logits(logits, tokenizer, k=10):
    # this is a helper function to go through the exercises... will show only the top k logits
    # Logits: 1D tensor of shape (vocab_size,)
    probs = F.softmax(logits, dim=-1)
    topk = torch.topk(probs, k)
    idxs = topk.indices.cpu().numpy()
    vals = topk.values.cpu().numpy()
    items = [(tokenizer.convert_ids_to_tokens(int(i)), float(v)) for i, v in zip(idxs, vals)]
    for tok, p in items:
        print(f"{tok:15} \t {p:.4f}")
    return items

# Example: tokenizing an input and inspecting model encoder output size
example = "Two dogs are running in the park."
inputs = tokenizer(example, return_tensors='pt')
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']
encoder_outputs = model.get_encoder()(input_ids, attention_mask=attention_mask)
print('encoder hidden states shape:', encoder_outputs.last_hidden_state.shape)

encoder hidden states shape: torch.Size([1, 9, 512])


## Small test-sentence set
We provide 5 simple and 5 tricky sentences. Students should add 3 of their own (see Task C). Change or extend this list to experiment.

In [6]:
from pprint import pprint
SENTENCES = [
    # simple
    "A man is riding a bicycle.",
    "Children are playing in the garden.",
    "The dog is sleeping on the sofa.",
    "She opened the window.",
    "The book is on the table.",
    # ambiguous
    "He saw her duck in front of the store.",  # duck: verb or noun
    "I saw the man with the telescope.",
    "They are serving turkey at the restaurant.",
    "The bank will not approve the loan.",
    "She always likes to read before bed.",
]

print('First 5 example sentences:')
for s in SENTENCES[:5]:
    print(' -', s)

# EXERCISE: Add 3-5 sentences that are tricky to translate below
YOUR_SENTENCES = ['Visiting relatives can be boring.', 'He said the teacher was angry yesterday.', 'The chicken is ready to eat.', 'The fisherman went to the bank.']
EXERCISE_SENTENCES = SENTENCES+YOUR_SENTENCES
print("\n\nYou will be working with these sentences through out this lab")
pprint(EXERCISE_SENTENCES)

First 5 example sentences:
 - A man is riding a bicycle.
 - Children are playing in the garden.
 - The dog is sleeping on the sofa.
 - She opened the window.
 - The book is on the table.


You will be working with these sentences through out this lab
['A man is riding a bicycle.',
 'Children are playing in the garden.',
 'The dog is sleeping on the sofa.',
 'She opened the window.',
 'The book is on the table.',
 'He saw her duck in front of the store.',
 'I saw the man with the telescope.',
 'They are serving turkey at the restaurant.',
 'The bank will not approve the loan.',
 'She always likes to read before bed.',
 'Visiting relatives can be boring.',
 'He said the teacher was angry yesterday.',
 'The chicken is ready to eat.',
 'The fisherman went to the bank.']


## 1. Greedy decoding and logits exploration
For each sentence do:
1. Tokenize and run generate with `output_scores=True` and `return_dict_in_generate=True`.
2. Show the generated text (greedy) and show the top-10 token probs for the first 3 decoding steps.

In [7]:
from transformers import GenerationConfig

def generate_with_scores(src: str, max_new_tokens: int=40, **gen_kwargs):
    inputs = tokenizer(src, return_tensors='pt').to(device)
    gen = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        output_scores=True,
        return_dict_in_generate=True,
        **gen_kwargs
    )
    sequences = gen.sequences
    scores = gen.scores  # list of tensors (len = generated_length) each shape (batch_size, vocab_size)
    return sequences, scores, inputs

# demo for one sentence
sent = SENTENCES[0]
seqs, scores, inputs = generate_with_scores(sent, do_sample=False)  # greedy by default
decoded_greedy = tokenizer.decode(seqs[0], skip_special_tokens=True)
print('SRC:', sent)
print('Decoded (greedy):', decoded_greedy)
print('\nTop tokens and their probabilities for first 5 generated steps:')
for i, logits in enumerate(scores[:5]):
    print(f'--- Step {i+1} ---')
    display_topk_from_logits(logits[0], tokenizer, k=10)

SRC: A man is riding a bicycle.
Decoded (greedy): 一个人骑自行车

Top tokens and their probabilities for first 5 generated steps:
--- Step 1 ---
▁               	 0.4898
▁一个             	 0.1966
▁有人             	 0.0209
▁-              	 0.0207
▁一              	 0.0129
▁有              	 0.0120
▁他              	 0.0094
▁有一个            	 0.0079
▁我              	 0.0074
▁A              	 0.0053
--- Step 2 ---
一个人             	 0.5738
男人              	 0.1645
男子              	 0.0266
人               	 0.0175
一名              	 0.0160
有个              	 0.0141
骑               	 0.0125
男               	 0.0101
一位              	 0.0088
男性              	 0.0050
--- Step 3 ---
骑               	 0.5279
在               	 0.1100
正在              	 0.0945
是               	 0.0241
正               	 0.0224
是在              	 0.0088
要               	 0.0066
驾驶              	 0.0050
会               	 0.0045
騎               	 0.0043
--- Step 4 ---
自行车             	 0.4908
着               	 0.2604
单               	

**Exercise 1:**
- In your report describe what you observe in the top-k lists. Are the top tokens plausible continuations? Why might the model select the highest-probability token at each step?

## 2. Beam search <-> Greedy decoding
Generate translations with greedy decoding and different beam sizes. For beams we'll use `num_beams` in `generate` with `do_sample=False`.

In [8]:
import tqdm
import time
def translate_with_beams(src: str, beams: List[int]=[1,3,5,10], max_new_tokens=60):
    results = {}
    for b in beams:
        start_time = time.time()
        # REMEMBER: a beam_size of 1 is the same as greedy decoding (num_beams=1)
        output = model.generate(
            **tokenizer(src, return_tensors='pt').to(device),
            max_new_tokens=max_new_tokens,
            num_beams=b,
            early_stopping=False,
            do_sample=False,
            return_dict_in_generate=True,    # crucial for counting
            output_scores=True               #  gives per-step scores
        )
        elapsed = time.time() - start_time  # time taken for this beam
        # Count iterations = how many decoding time steps occurred (scores is a list: one tensor of scores per time step)
        num_steps = len(output.scores)
        translation = tokenizer.decode(output.sequences[0], skip_special_tokens=True)
        results[b] = {"translation": translation, "steps": num_steps, "time_sec": elapsed}

    return results

# Demo compare
src = "Y'all'd've met srendipitous spam that took you into a multitasking loophole"
beam_results = translate_with_beams(src, beams=[1,3,5,10,25,50,100])

print('Source:', src)
for b, info in beam_results.items():
    print(f"\n### Beam size={b}:")
    print("Translation:", info["translation"])
    print(f"Decoding steps: {info['steps']}  Time taken: {info['time_sec']:.4f} sec")

Source: Y'all'd've met srendipitous spam that took you into a multitasking loophole

### Beam size=1:
Translation: 你们会遇到些不愉快的垃圾垃圾 让你陷入了多任务漏洞
Decoding steps: 17  Time taken: 0.7489 sec

### Beam size=3:
Translation: 你们会遇到些反常的垃圾垃圾 把你带入一个多任务漏洞
Decoding steps: 18  Time taken: 1.1124 sec

### Beam size=5:
Translation: 你们会遇到些反常的垃圾垃圾 把你带入一个多任务漏洞
Decoding steps: 18  Time taken: 1.5828 sec

### Beam size=10:
Translation: 你们都遇到过恶作剧般的垃圾垃圾 将你们带入一个多任务漏洞
Decoding steps: 21  Time taken: 2.0236 sec

### Beam size=25:
Translation: 你们会遇到恶作剧的垃圾垃圾 将你们带入一个多任务漏洞
Decoding steps: 20  Time taken: 3.0582 sec

### Beam size=50:
Translation: {\fn黑体\fs22\bord1\shad0\3aHBE\4aH00\fscx67\fscy66\2cHFFFFFF\3cH808080}你们会遇到恶作剧的垃圾垃圾垃圾 {\fn黑体\fs22\bord1\shad0\3aHBE\4aH00\fscx67\fscy66\2cHFFFFFF\3cH
Decoding steps: 60  Time taken: 17.8510 sec

### Beam size=100:
Translation: {\fn黑体\fs22\bord1\shad0\3aHBE\4aH00\fscx67\fscy66\2cHFFFFFF\3cH808080}你们会遇到恶作剧的垃圾垃圾垃圾 {\fn黑体\fs22\bord1\shad0\3aHBE\4aH00\fscx67\fscy66\2cHFFFFFF\3cH
D

In [9]:
def translate_with_beams(
    src: str,
    beams: List[int] = [1, 3, 5, 10],
    max_new_tokens=60,
):
    results = {}
    for b in beams:
        start_time = time.time()
        # chose decoding strategy that avoids tweaks for smooth beam search:
        output = model.generate(
            **tokenizer(src, return_tensors='pt').to(device),
            max_new_tokens=max_new_tokens,   # force to stop at some point
            num_beams=b,
            num_return_sequences=b,          # <-- return all beams
            return_dict_in_generate=True,
            output_scores=True,
            early_stopping=False,     # don’t stop early
            length_penalty=1.0,       # no length penalty
            no_repeat_ngram_size=0,   # allow repeated n-grams
            do_sample=False,          # pure beam search, no sampling
        )
        elapsed = time.time() - start_time

        # Decode all beams
        translations = [tokenizer.decode(seq, skip_special_tokens=True) for seq in output.sequences]
        # recover sequence scores (log probs)... we need to compute sequence log-probabilities manually
        seq_scores = []
        if hasattr(output, "scores") and output.scores is not None:
            # output.scores is a list of logits for each generation step
            # shape of each: [batch*num_beams, vocab_size]
            # We'll sum log-probabilities along the sequence
            for i, seq in enumerate(output.sequences):
                logprob = 0.0
                # skip the first token (usually start token)
                for step, logits in enumerate(output.scores):
                    token_id = seq[step + 1]  # shift by 1 to skip first token
                    logprob += torch.log_softmax(logits[i], dim=-1)[token_id].item()
                seq_scores.append(logprob)
        else:
            seq_scores = [None] * len(translations)


        results[b] = {
            "translations": translations,
            "scores": seq_scores,  # add scores
            "time_sec": elapsed,
            "steps": len(output.scores),
        }

    return results


In [10]:
src = "Y'all'd've met serendipitous spam that took you into a multitasking loophole..."
beam_results = translate_with_beams(src, beams=[1,3,5,10])

print('Source:', src)
for b, info in beam_results.items():
    print(f"\n### Beam size={b}: ({info['time_sec']:.2f}s, {info['steps']} steps)")
    for i, (t, s) in enumerate(zip(info["translations"], info["scores"]), 1):
        print(f"  Beam {i}: {t}  |  Score: {s:.4f}")


Source: Y'all'd've met serendipitous spam that took you into a multitasking loophole...

### Beam size=1: (0.69s, 16 steps)
  Beam 1: 你们都遇到了些天大的垃圾邮件 让你陷入了多重任务漏洞  |  Score: -22.2210

### Beam size=3: (0.94s, 18 steps)
  Beam 1: 你们会遇到些天大的垃圾邮件 把你带入一个多任务漏洞...  |  Score: -63.4182
  Beam 2: 你们会遇到些天大的垃圾邮件 把你带入一个多任务漏洞  |  Score: -inf
  Beam 3: 你们会遇到些天大的垃圾邮件 让你陷入了多重任务漏洞...  |  Score: -inf

### Beam size=5: (1.08s, 18 steps)
  Beam 1: 你们都遇到过 疯狂的垃圾邮件 让你陷入了多重任务漏洞...  |  Score: -95.1318
  Beam 2: 你们都遇到过天大的垃圾邮件 让你陷入了多重任务漏洞...  |  Score: -inf
  Beam 3: 你们都遇到过 疯狂的垃圾邮件 把你带入一个多任务漏洞  |  Score: -93.7423
  Beam 4: 你们都遇到过 疯狂的垃圾邮件 把你带进了一个多任务漏洞  |  Score: -61.5165
  Beam 5: 你们都遇到过 疯狂的垃圾邮件 让你陷入了多重任务漏洞  |  Score: -inf

### Beam size=10: (1.41s, 18 steps)
  Beam 1: 你们都遇到过疯狂的垃圾邮件 把你带入一个多任务漏洞...  |  Score: -75.2324
  Beam 2: 你们都遇到过疯狂的垃圾邮件 把你带进了一个多任务漏洞...  |  Score: -96.2743
  Beam 3: 你们都遇到过疯狂的垃圾邮件 把你带入了一个多任务漏洞...  |  Score: -64.6789
  Beam 4: 你们都遇到过疯狂的垃圾邮件 让你陷入了多重任务漏洞...  |  Score: -inf
  Beam 5: 你们都遇到过疯狂的垃圾邮件 把你带

In [15]:
for src in EXERCISE_SENTENCES:

    print("\n=============================================")
    print("Source:", src)

    beam_results = translate_with_beams(src, beams=[1, 3, 5, 10])

    # Print each beam size result
    for b, info in beam_results.items():
        print(f"\n### Beam size={b}: ({info['time_sec']:.2f}s, {info['steps']} steps)")
        for i, (t, s) in enumerate(zip(info["translations"], info["scores"]), 1):
            print(f"  Beam {i}: {t}  |  Score: {s:.4f}")


Source: A man is riding a bicycle.

### Beam size=1: (0.49s, 5 steps)
  Beam 1: 一个人骑自行车  |  Score: -3.4484

### Beam size=3: (0.35s, 6 steps)
  Beam 1: 一个人骑自行车  |  Score: -inf
  Beam 2: 一个人骑着自行车  |  Score: -9.8062
  Beam 3: 一个人骑自行车。  |  Score: -10.7673

### Beam size=5: (0.39s, 6 steps)
  Beam 1: 一个人骑自行车  |  Score: -inf
  Beam 2: 一个人骑着自行车  |  Score: -10.1718
  Beam 3: 一个人骑自行车。  |  Score: -11.2589
  Beam 4: 一个人在骑自行车  |  Score: -11.0427
  Beam 5: 一个男人骑着自行车  |  Score: -11.9525

### Beam size=10: (0.54s, 6 steps)
  Beam 1: 一个人骑自行车  |  Score: -inf
  Beam 2: 一个人骑着自行车  |  Score: -10.1718
  Beam 3: 一个人骑自行车。  |  Score: -11.2782
  Beam 4: 一个人在骑自行车  |  Score: -11.5339
  Beam 5: 一个人正在骑自行车  |  Score: -9.1643
  Beam 6: 一个男人骑着自行车  |  Score: -13.1168
  Beam 7: 一个男人在骑自行车  |  Score: -24.6367
  Beam 8: 一个男人骑自行车  |  Score: -inf
  Beam 9: 一个男人正在骑自行车  |  Score: -17.1664
  Beam 10: 一个男人骑自行车。  |  Score: -34.2340

Source: Children are playing in the garden.

### Beam size=1: (0.28s, 8 steps)
  Beam 1: 孩子们在花园里

**Exercise 2:**
- Perform beam search translations for `EXERCISE_SENTENCES` using 3 redically different `beam_size` configurations (avoid the lengthier beam_size=50,100 because it takes long per sentence).
- In your report answer: How do translations change as beam size increases? Is there a consistent improvement in adequacy or fluency? Note any length bias or repetition?
- What happens if you turn off the early stopping in the code above? (i.e., change the line `early_stopping=False` to be True, inside of the call to `model.generate`). Do you see changes in the num of decoding steps reported? why did it increase/decrease? What about the time taken to translate?

## 3. Sampling strategies: Temperature, Top-k, Top-p
We will produce multiple samples for each strategy and inspect diversity and adequacy.

In [12]:
def sample_translations(src: str, method: str='temperature', params: Dict=None, n_samples=5, max_new_tokens=60):
    params = params or {}
    gen = model.generate(
            **tokenizer(src, return_tensors='pt').to(device),
            max_new_tokens=max_new_tokens,
            do_sample=True,
            output_scores=False,
            return_dict_in_generate=False,
            num_return_sequences=n_samples,
            num_beams=1,                 # <-- force no beam search
            **params
        )
    outputs = [tokenizer.decode(seq, skip_special_tokens=True) for seq in gen]
    return outputs

# temperature examples
src = "You and you, no, not you, you, your job is to translate 'you' for yourselves!" #<- this a pretty difficult sentence when no context is provided
for T in [0.2, 1.0, 2.5]:
    outs = sample_translations(src, method='temperature', params={'temperature': T}, n_samples=5)
    print(f'\nTemperature={T} samples:')
    for o in outs:
        print(' -', o)

# top-k examples
for k in [3, 10, 50]:
    outs = sample_translations(src, method='top-k', params={'top_k': k}, n_samples=5)
    print(f'\nTop-k={k} samples:')
    for o in outs:
        print(' -', o)

# top-p examples
for p in [0.6, 0.9, 0.95]:
    outs = sample_translations(src, method='top-p', params={'top_p': p}, n_samples=5)
    print(f'\nTop-p={p} samples:')
    for o in outs:
        print(' -', o)


Temperature=0.2 samples:
 - 你和你的,不,不是你和你的,你,你的工作是翻译"你" 为自己!
 - 你,不,不是你自己,你,你,你的工作是翻译"你" 为自己!
 - 你,不,不是你,是你,你,你的工作是翻译"你" 为自己!
 - 你,不,不是你自己,你,你,你的工作是翻译"你" 为自己!
 - 你 你 你 不 你 你 你 你的工作 就是 为自己翻译"你"

Temperature=1.0 samples:
 - 你们的工作是自己翻译"你们"自己 这是你们的工作?
 - 你、你... 不 你、你 你的工作是替自己演"你"
 - 你和你们... 不 不是你 你的工作为你们翻译"他们"
 - 你和你的 不 不是你 你 你的工作是代自己翻译"你自己"!
 - 轮到你你你,不, 不是你,你, 你, 你的任务是翻译 “Youth”... ... 为自己!

Temperature=2.5 samples:
 - 不对错 你的作为是对 你们俩都满意 你们在"你们自己
 - 你不能让这种人知道... 你在演给你自己的女人唱! 不 你来啊 不 你! 你一个人干么这么麻烦吗
 - 不 你说"自己"也代替你 还有 翻译自己"你们会"的 就是你的职责吧 她会死于空洞地"被流掉
 - 是为自己传译自己! -什么
 - 你不会觉得这是你的事情的. 那是让你也活在现实中. 不要不是你们自己做你的功活. 你们就是把自己做一个为自己翻译了自我!

Top-k=3 samples:
 - 你和你的,不,不是你,你 你的工作是翻译"自己"!
 - 你和你的, 不, 而不是你, 你, 你,你的工作 是翻译"你"自己!
 - 你和你的 不 不是你 你的任务 就是为自己翻译你
 - 你和你 不 不 你 你的工作 就是为自己 翻译"你"
 - 和你 不 不是你和你的 你的工作是为自己翻译"你"

Top-k=10 samples:
 - 你和你 你的任务是自己翻译"你"
 - 你和你,没有的,不是的, 不是你,是你,你,你的工作 是翻译“你”的自己!
 - 和你和你的 不是妳,你的工作就是翻译自己
 - 不,不是你,是你,是你,你的工作就是自己翻译“你”
 - 你 你 不 你不是 你 你的职责 就是为自己翻译自己"自己"

Top-k=50 samples:
 - 你和你的 不 不是

**Exercise 3:**
- Compare temperature, top-k and top-p results. Which strategy produced the most diverse outputs? Which ones stayed more faithful to the source? Explain why.

## 4. Inspecting logits during sampling
For one step we can inspect how temperature / top-k / top-p change the distribution.

In [13]:
from torch.nn.functional import softmax

def get_next_token_logits(input_ids, attention_mask=None):
    # run one forward pass and return logits for the decoder next token using generate with output_scores
    gen_out = model.generate(
        **tokenizer(input_ids, return_tensors='pt').to(device) if isinstance(input_ids, str) else dict(input_ids=input_ids, attention_mask=attention_mask),
        max_new_tokens=1,
        output_scores=True,
        return_dict_in_generate=True,
        do_sample=False
    )
    # scores is a list of length generated tokens (1) each tensor (batch, vocab)
    scores = gen_out.scores[0][0]  # (vocab,)
    return scores

# We'll show logits/probs for the first decoding step of our example
src = "The bank will not approve the loan."
inputs = tokenizer(src, return_tensors='pt').to(device)
# use generate with return_dict to obtain scores for the first generated token
gen = model.generate(**inputs, max_new_tokens=3, output_scores=True, return_dict_in_generate=True, do_sample=False)
print('Generated text (greedy):', tokenizer.decode(gen.sequences[0], skip_special_tokens=True))
# show topk for first token
first_logits = gen.scores[0][0]
print('\nTop-10 tokens for step 1 (greedy logits->probs):')
display_topk_from_logits(first_logits, tokenizer, k=10)

# Now apply temperature scaling to the same logits and show top tokens
for T in [0.3, 1.0, 1.5]:
    scaled = first_logits / T
    probs = F.softmax(scaled, dim=-1)
    topk = torch.topk(probs, 10)
    items = [(tokenizer.convert_ids_to_tokens(int(i)), float(v)) for i, v in zip(topk.indices.cpu().numpy(), topk.values.cpu().numpy())]
    print(f"\nTop-10 with temperature={T}:")
    for tok, p in items:
        print(f"{tok:15} \t {p:.4f}")

Generated text (greedy): 银行

Top-10 tokens for step 1 (greedy logits->probs):
▁               	 0.7729
▁该              	 0.0146
▁但              	 0.0067
▁这              	 0.0065
▁-              	 0.0062
▁但是             	 0.0044
▁而              	 0.0036
▁那              	 0.0033
▁如果             	 0.0030
▁这个             	 0.0016

Top-10 with temperature=0.3:
▁               	 1.0000
▁该              	 0.0000
▁但              	 0.0000
▁这              	 0.0000
▁-              	 0.0000
▁但是             	 0.0000
▁而              	 0.0000
▁那              	 0.0000
▁如果             	 0.0000
▁这个             	 0.0000

Top-10 with temperature=1.0:
▁               	 0.7729
▁该              	 0.0146
▁但              	 0.0067
▁这              	 0.0065
▁-              	 0.0062
▁但是             	 0.0044
▁而              	 0.0036
▁那              	 0.0033
▁如果             	 0.0030
▁这个             	 0.0016

Top-10 with temperature=1.5:
▁               	 0.0769
▁该              	 0.0055
▁但              	 0.0032
▁这     

**Exercise 4:**
- How does temperature change the probability mass among top tokens? Which temperature produces a distribution closest to greedy behavior?

## 5. Experiment: sample a pool of translations and analyze them
For a single source sentence, collect a pool of 100 translations using nucleus sampling (p=0.9) and greedy/beam outputs. Can you analyze the pool of outputs for: diversity (unique outputs), adequacy (human judgment), and token-level differences.

In [14]:
from collections import Counter

src = "She always likes to read before bed."
print('Source:', src)

# collect 100 nucleus samples
pool = sample_translations(src, method='top-p', params={'top_p':0.9}, n_samples=100, max_new_tokens=40)
unique_pool = list(dict.fromkeys(pool))  # preserve order & unique
print('Pool size:', len(pool), 'Unique outputs:', len(unique_pool))

# frequency of top 10 outputs
freq = Counter(pool).most_common(10)
print('\nTop outputs frequency:')
for txt, c in freq:
    print(f' {c:3d}x -> {txt}')

# get greedy and beam for comparison
greedy = sample_translations(src, method='greedy', params={}, n_samples=1)[0]
beam5 = model.generate(**tokenizer(src, return_tensors='pt').to(device), num_beams=5, max_new_tokens=50, early_stopping=True)
beam15 = model.generate(**tokenizer(src, return_tensors='pt').to(device), num_beams=15, max_new_tokens=50, early_stopping=True)
beam5_decoded = tokenizer.decode(beam5[0], skip_special_tokens=True)
beam15_decoded = tokenizer.decode(beam15[0], skip_special_tokens=True)
print('\nGreedy:', greedy)
print('Beam-5:', beam5_decoded)
print('Beam-15:', beam5_decoded)

Source: She always likes to read before bed.
Pool size: 100 Unique outputs: 80

Top outputs frequency:
   7x -> 她总是喜欢睡前看书
   3x -> 她总是喜欢睡觉前读书
   3x -> 她总是喜欢在床上读书
   3x -> 她总是喜欢在床前读书
   2x -> 她总是喜欢在床上看书.
   2x -> 她总是喜欢在睡前看书
   2x -> 她总是喜欢在睡觉前念书
   2x -> 她总是喜欢睡前看书的
   2x -> 她总是喜欢在睡前阅读。
   2x -> 她总是喜欢睡前阅读

Greedy: 羆琌尺舧玡弄弄
Beam-5: 她总是喜欢睡前看书
Beam-15: 她总是喜欢睡前看书


**Exercise 5:**
- How many unique translations did the sampling produce? Are many outputs minor variants (word order, punctuation) or truly different paraphrases?
- Compare the most frequent sample to the beam outputs. Which is more adequate? Which is more fluent? Which would you pick as the "final" translation and why?

## 6. Assignment and submission
**Deliverables:**
1. Submit your report on moodle with:
   - Answers to all reflection prompts in this notebook
   - A table comparing outputs (greedy, beam=5, top-p=0.9, temperature=1.5) for 5 chosen sentences
2. Add a link to you notebook with any additional experiments students ran (e.g., changing the model, and adding your example sentences)