# Implementing Transformer Models
## Practical X
Carel van Niekerk & Hsien-Chin Lin

6-11.01.2025

---

In this practical we will evaluate the performance of the transformer model we trained.

### 1. Autoregressive Generation

To generate a translation we use the autoregressive property of the transformer decoder: each target token is predicted from the encoded source sentence and the target tokens generated so far. We will use the following procedure:

1. Encode the source sentence using the encoder.
2. Provide the encoder output to the decoder, which attends to it through cross-attention at every step.
3. Generate the first token of the translation by passing the start of text token through the decoder.
4. Append the generated token to the decoder input, run the decoder again to generate the next token, and repeat until the end of text token is generated.
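
The steps above can be sketched as a small loop. This is only an illustration, not the model code of this practical: `generate`, `toy_encode`, `toy_decode_step` and `argmax` are hypothetical names, and the toy "decoder" simply copies the source tokens and then emits the end of text token so that the loop can be run without a trained model.

```python
from typing import Callable, List, Sequence


def generate(
    src_tokens: Sequence[int],
    encode: Callable[[Sequence[int]], object],
    decode_step: Callable[[object, List[int]], List[float]],
    select: Callable[[List[float]], int],
    bos_id: int,
    eos_id: int,
    max_len: int,
) -> List[int]:
    """Generate target token ids one at a time until EOS or max_len is reached."""
    memory = encode(src_tokens)   # step 1: the encoder runs only once
    prefix = [bos_id]             # steps 2-3: decoding starts from the start of text token
    while len(prefix) < max_len:
        scores = decode_step(memory, prefix)  # scores for the next token given the prefix
        next_id = select(scores)
        if next_id == eos_id:
            break
        prefix.append(next_id)    # step 4: feed the new token back into the decoder
    return prefix[1:]             # drop the start of text token


# Hypothetical toy stand-ins for the encoder and decoder (assumptions for this sketch).
BOS, EOS, VOCAB_SIZE = 0, 1, 6


def toy_encode(src):
    return list(src)


def toy_decode_step(memory, prefix):
    pos = len(prefix) - 1  # number of tokens generated so far
    target = memory[pos] if pos < len(memory) else EOS
    return [1.0 if tok == target else 0.0 for tok in range(VOCAB_SIZE)]


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


translation = generate([3, 4, 5], toy_encode, toy_decode_step, argmax, BOS, EOS, max_len=10)
print(translation)
```

The `max_len` argument bounds the loop, so generation stops even when the end of text token is never selected.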

#### 1.1. Greedy Decoding

The simplest way to generate a translation is to use greedy decoding. In greedy decoding we simply select the token with the highest probability at each step.
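
A single greedy step can be sketched as follows. The helper names `softmax` and `greedy_pick` and the example logits are assumptions made for illustration; in the practical the scores would come from the decoder output at the current position.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits: List[float]) -> Tuple[int, float]:
    """Return the id of the most probable token and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


# Made-up scores over a vocabulary of four tokens.
logits = [0.5, 2.0, -1.0, 1.0]
token_id, prob = greedy_pick(logits)
print(token_id, round(prob, 3))
```

Because softmax preserves the ordering of the scores, taking the argmax of the logits selects the same token. Greedy decoding only makes the locally best choice at each step, so the resulting sequence is not guaranteed to be the most probable translation overall.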

### 2. Evaluation

To evaluate the performance of the model we will use the BLEU score. BLEU measures the n-gram overlap between a generated translation and one or more reference translations, and applies a brevity penalty to translations that are shorter than the reference. See the [Hugging Face evaluate documentation](https://huggingface.co/spaces/evaluate-metric/bleu) for more information on the BLEU score, as well as details on using the metric in Hugging Face evaluate.
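
To make the definition concrete, here is a minimal sketch of corpus-level BLEU. It assumes whitespace tokenization, a single reference per translation, n-grams up to length 4 and no smoothing; `ngrams` and `corpus_bleu` are hypothetical helper names. The Hugging Face metric uses its own tokenizer, so its scores can differ from this sketch.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(predictions: List[str], references: List[str], max_n: int = 4) -> float:
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # n-grams in the predictions per order
    pred_len = ref_len = 0
    for pred, ref in zip(predictions, references):
        p, r = pred.split(), ref.split()
        pred_len += len(p)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            p_counts, r_counts = ngrams(p, n), ngrams(r, n)
            # Counter intersection clips each n-gram count to its count in the reference
            matches[n - 1] += sum((p_counts & r_counts).values())
            totals[n - 1] += sum(p_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, one empty precision makes the score zero
    # geometric mean of the modified n-gram precisions
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # brevity penalty for predictions that are shorter than the references
    brevity_penalty = 1.0 if pred_len > ref_len else math.exp(1 - ref_len / pred_len)
    return brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(round(score, 4))
```

An exact match scores 1.0, and a translation that shares no 4-gram with its reference scores 0.0 in this unsmoothed variant, which is one reason BLEU is usually reported over a whole test set rather than per sentence.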



# Exercises

1. Implement the autoregressive generation procedure described above for your transformer model, using greedy decoding. Add a maximum length to the generation procedure so that it always terminates, even if the model never generates the end of text token.
2. Generate translations for the test set (or a subset of the test set) of WMT17 German-English.
3. Evaluate the BLEU score of your model on the test set (or a subset of the test set) of WMT17 German-English.
4. Evaluate some of the translations generated by your model. Do they make sense? What are some of the errors made by your model?

In [1]:
import transformers
from modelling.model import Transformer
from train import TransformerModel
import torch
from dataset import get_costum_dataset
import evaluate


tokenizer = transformers.GPT2TokenizerFast.from_pretrained("modelling/bpe_v=30016_l=64")
# pytorch lightning takes care of loading configuration and checkpoint
model = TransformerModel.load_from_checkpoint("lightning_logs/without_source_BOS/checkpoints/epoch=9-step=132074.ckpt")
model.eval()
print(model.device)
src_input = "Mein Name ist Leon und ich bin ein Student."
tgt_input = "My name is Leon and I am a student."
tokenizer.model_max_length



mps:0


64

In [2]:
def greedy_translate(src_input_sentence, model, tokenizer, first_token_bos=False):
    print(src_input_sentence)
    # Tokenize to one position more than the model length so that one token can be sliced off below
    src_input = tokenizer(
        src_input_sentence,
        truncation=True,
        padding="max_length",
        return_tensors="pt",
        max_length=tokenizer.model_max_length + 1,
    )['input_ids']
    if first_token_bos:
        # keep the leading token and drop the last position
        src_input = src_input[:, :-1]
    else:
        # drop the leading token so that the source starts with the first word
        src_input = src_input[:, 1:]
    print(src_input.shape)
    print(tokenizer.decode(src_input[0]))
    # Shift the source one more position to the left. This removes the first remaining
    # token and leaves model_max_length - 1 positions; no padding token is appended.
    src_input = src_input[:, 1:]
    tgt_input = torch.zeros_like(src_input)
    # Set the first target token to the start of sentence token (id 1 is assumed to be its id in this tokenizer)
    tgt_input[:, 0] = 1
    stop_ids = [tokenizer.convert_tokens_to_ids('[PAD]'), tokenizer.convert_tokens_to_ids('[EOS]')]
    with torch.no_grad():
        # Bound the loop by the target length so that indexing can never run past the decoder output
        for i in range(1, tgt_input.size(1)):
            output = model(src_input.to(model.device), tgt_input.to(model.device)).softmax(dim=-1)
            # Take the most probable token at position i (whose decoder input is still 0 at this point)
            score, output_token = output[:, i].max(dim=-1)
            if output_token.item() in stop_ids:
                break
            tgt_input[:, i] = output_token
            print(tokenizer.decode(output_token), '\t', (score * 100).item(), '%')

    return tokenizer.decode(tgt_input[0], skip_special_tokens=True)

greedy_translate(src_input, model, tokenizer)

Mein Name ist Leon und ich bin ein Student.
torch.Size([1, 64])
Mein Name ist Leon und ich bin ein Student.[EOS][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
a 	 12.54144287109375 %
 name 	 72.57445526123047 %
 is 	 19.266252517700195 %
 Leon 	 11.922411918640137 %
 and 	 67.21080017089844 %
. 	 20.942171096801758 %
 is 	 6.50723934173584 %
. 	 4.774482250213623 %
 to 	 4.989753246307373 %
 a 	 7.778448581695557 %
. 	 6.572515487670898 %


'a name is Leon and. is. to a.'

In [3]:
test_ds = get_costum_dataset("test")

with torch.no_grad():
    for batch_idx, sample in enumerate(test_ds):
        src_input = sample['src_input']
        tgt_output = sample['tgt_output']
        # predict_step expects the batch index as its second positional argument
        prediction = model.predict_step(sample, batch_idx)
        # only run the first sample as a smoke test
        break