<div align="center">

---
# Transformer from Scratch [Python]
---
</div>

This notebook focuses on training the Transformer on a translation task, then validating and testing its performance on unseen data.

---
## Train

The Transformer is applied to translation from English to Portuguese, using the HuggingFace `opus_books` dataset for both the training and the test / validation steps.

In [30]:
%load_ext autoreload
%autoreload 2

# Importing Dependencies
import warnings
import pandas as pd
import numpy as np
import torch
from torch import (nn)
from Model import (Transformer)
from Configuration import (Get_Configuration, Get_Weights_File_Path)
from Train import (Train_Model, Get_Model, Get_Dataset)
from Validation import (Greedy_Decode, Run_Validation)
from AttentionVisualization import (Load_Next_Batch, Get_All_Attention_Maps)
import altair as alt

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [2]:
# Filtering the Pytorch Warnings
warnings.filterwarnings('ignore')

# Loading the Configuration Dictionary
config = Get_Configuration()

In [3]:
# Train the Model
Train_Model(config, validate=False)

Using device: cuda
Max Length of Source Sentence: 204
Max Length of Target Sentence: 196


Processing epoch 00: 100%|███████████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=5.883]
Processing epoch 01: 100%|███████████████████████████████████████| 158/158 [01:24<00:00,  1.86it/s, Loss=5.553]
Processing epoch 02: 100%|███████████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=4.920]
Processing epoch 03: 100%|███████████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=4.447]
Processing epoch 04: 100%|███████████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=4.736]
Processing epoch 05: 100%|███████████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=4.123]
Processing epoch 06: 100%|███████████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=4.167]
Processing epoch 07: 100%|███████████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=3.806]
Processing epoch 08: 100%|███████████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss

---
## Test / Validation

Now, let's take a look at how the model translates new texts, i.e., let's run inference with the trained model.

In [4]:
# Define the Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using Device: {device}")
# config = Get_Configuration()
train_dataloader, test_dataloader, tokenizer_source, tokenizer_target = Get_Dataset(config)
model = Get_Model(config, tokenizer_source.get_vocab_size(), tokenizer_target.get_vocab_size()).to(device)

Using Device: cuda
Max Length of Source Sentence: 204
Max Length of Target Sentence: 196


In [5]:
# Load the Pretrained Model
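model_filename = Get_Weights_File_Path(config, "10")
# map_location keeps the load working when the checkpoint was saved on another device
state = torch.load(model_filename, map_location=device)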
model.load_state_dict(state['model_state_dict'])

<All keys matched successfully>

In [6]:
# Run Validation
Run_Validation(model, test_dataloader,
               tokenizer_source, tokenizer_target,
               config['sequence_length'], device, print, 0, None, num_examples=5)

--------------------------------------------------------------------------------
[SOURCE]: CHAPTER XI Who Stole the Tarts?
[TARGET]: Capítulo XI Quem Roubou as Tortas?
[PREDICTED]: Capítulo Quem ?
--------------------------------------------------------------------------------
[SOURCE]: Who for such dainties would not stoop?
[TARGET]: Para tal guloseima não iria parar?
[PREDICTED]: Quem não que não um ?
--------------------------------------------------------------------------------
[SOURCE]: But the insolence of his Normans--" How are you getting on now, my dear?' it continued, turning to Alice as it spoke.
[TARGET]: Mas a insolência de seus Normandos... Como está você agora, minha querida?", continuou, virando-se para Alice enquanto falava.
[PREDICTED]: Mas o que não de a , que ela continuou a , " O que você está ?"
--------------------------------------------------------------------------------
[SOURCE]: 'She'd soon fetch it back!'
[TARGET]: 'Ela logo o traria de volta!'
[PREDICTED]
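
The implementation of `Greedy_Decode` in the `Validation` module is not shown in this notebook. As a rough, framework-free sketch of the general idea behind greedy decoding: start from a start-of-sentence token, repeatedly append the highest-scoring next token, and stop at the end-of-sentence token or at a maximum length. The names `greedy_decode`, `next_token_scores` and `toy_scores` below are hypothetical and only stand in for a trained decoder; they are not part of this project.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    sos_id: int,
    eos_id: int,
    max_length: int,
) -> List[int]:
    """Append the highest-scoring next token until EOS or max_length is reached."""
    output = [sos_id]
    while len(output) < max_length:
        # Hypothetical stand-in for a decoder forward pass on the current prefix.
        scores = next_token_scores(output)
        # Greedy choice: index of the largest score (ties go to the lowest id).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        output.append(next_id)
        if next_id == eos_id:
            break
    return output


def toy_scores(prefix: Sequence[int], vocab_size: int = 5) -> List[float]:
    """Toy scorer (assumption): always prefers the id following the last token."""
    target = (prefix[-1] + 1) % vocab_size
    return [1.0 if token_id == target else 0.0 for token_id in range(vocab_size)]


decoded = greedy_decode(toy_scores, sos_id=0, eos_id=3, max_length=10)
print(decoded)
```

Because each step commits to a single best token, an early mistake cannot be undone later, which is one possible reason for short or incomplete predictions such as the ones above.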

---
## Attention Visualization

In [7]:
# Create an instance of the Model
train_dataloader, test_dataloader, vocabulary_source, vocabulary_target = Get_Dataset(config)
model = Get_Model(config, vocabulary_source.get_vocab_size(), vocabulary_target.get_vocab_size()).to(device)

Max Length of Source Sentence: 204
Max Length of Target Sentence: 196


In [8]:
# Load the Pretrained Model
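model_filename = Get_Weights_File_Path(config, "10")
# map_location keeps the load working when the checkpoint was saved on another device
state = torch.load(model_filename, map_location=device)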
model.load_state_dict(state['model_state_dict'])

<All keys matched successfully>

In [20]:
# Load the next batch and print its source / target sentence pair
batch, encoder_input_tokens, decoder_input_tokens = Load_Next_Batch(config, model, test_dataloader,
                                                                    vocabulary_source, vocabulary_target,
                                                                    device)
print(f'[SOURCE]: {batch["source_text"]}')
print(f'[TARGET]: {batch["target_text"]}')

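# Sentence length = number of tokens before the first [PAD]
# (falls back to the full length if the sequence contains no padding)
sentence_length = (encoder_input_tokens.index('[PAD]')
                   if '[PAD]' in encoder_input_tokens else len(encoder_input_tokens))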

[SOURCE]: ["Besides, she's she, and I'm I, and--oh dear, how puzzling it all is!"]
[TARGET]: ['Além disso, ela é ela e eu sou eu, e-- Puxa, quão misterioso é tudo isso!']


In [21]:
# Define the layers and heads whose attention we want to visualize
layers = [0, 1, 2]
heads = [0, 1, 2, 3, 4, 5, 6, 7]

In [24]:
# Visualize the Encoder Self-Attention
Get_All_Attention_Maps(model, "encoder", layers, heads, encoder_input_tokens, encoder_input_tokens, min(20, sentence_length))

In [25]:
# Visualize the Decoder Self-Attention
Get_All_Attention_Maps(model, "decoder", layers, heads, decoder_input_tokens, decoder_input_tokens, min(20, sentence_length))

In [26]:
# Visualize the Encoder-Decoder Cross-Attention, which links each target token to the source tokens
Get_All_Attention_Maps(model, "encoder-decoder", layers, heads, encoder_input_tokens, decoder_input_tokens, min(20, sentence_length))
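
How `Get_All_Attention_Maps` extracts the scores from the model is not shown here. As a minimal sketch of the quantity such maps usually display, the block below computes scaled dot-product attention weights, `softmax(Q K^T / sqrt(d_k))`, in plain Python. The optional `causal` flag illustrates the look-ahead mask that decoder self-attention typically applies. The functions `softmax` and `attention_weights` and the toy vectors are assumptions made for illustration, not code from this project.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(
    queries: List[List[float]],
    keys: List[List[float]],
    causal: bool = False,
) -> List[List[float]]:
    """Return one row of attention weights per query (each row sums to 1)."""
    d_k = len(keys[0])
    weights = []
    for i, query in enumerate(queries):
        # Scaled dot product between this query and every key.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
            for key in keys
        ]
        if causal:
            # Look-ahead mask: position i may only attend to positions <= i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights.append(softmax(scores))
    return weights


# Toy token vectors (assumption): three positions, two dimensions each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = attention_weights(tokens, tokens)
masked = attention_weights(tokens, tokens, causal=True)
for row in masked:
    print([round(w, 3) for w in row])
```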

---