<div align="center">

---
# Transformer from Scratch [Python]
---
</div>

This notebook trains the Transformer on a translation task, then validates and tests its performance on new, unseen data.

---
## Train

The Transformer is trained to translate from English to Portuguese, using the HuggingFace `opus_books` dataset for both the training and the test / validation steps.
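To make the data format concrete, here is a minimal sketch of how sentence pairs could be pulled out of `opus_books`-style records. It assumes the layout the dataset exposes on the HuggingFace Hub (an `id` plus a `translation` dictionary keyed by language code). The helper `extract_pairs` and the `id` value are hypothetical, and how `Get_Dataset` actually consumes the records is not shown in this notebook.

```python
from typing import Dict, List, Tuple

# One record shaped like an `opus_books` row. The sentence pair is taken from
# this notebook's own output; the "id" value is made up for illustration.
sample_records: List[Dict] = [
    {"id": "0", "translation": {"en": "'One side of what?", "pt": "'Um lado de quê?"}},
]


def extract_pairs(records: List[Dict],
                  source_language: str = "en",
                  target_language: str = "pt") -> List[Tuple[str, str]]:
    """Return (source, target) sentence pairs from translation records.

    Hypothetical helper, not part of the notebook's Transformer package.
    """
    return [
        (record["translation"][source_language],
         record["translation"][target_language])
        for record in records
    ]


pairs = extract_pairs(sample_records)
for source_text, target_text in pairs:
    print(f"[SOURCE]: {source_text}")
    print(f"[TARGET]: {target_text}")
```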

In [1]:
%load_ext autoreload
%autoreload 2

# Importing Dependencies
import warnings
import pandas as pd
import numpy as np
import torch
from torch import (nn)
from Transformer.Model import (Transformer)
from Transformer.Configuration import (Get_Configuration, Get_Weights_File_Path)
from Transformer.Train import (Train_Model, Get_Model, Get_Dataset)
from Transformer.Validation import (Greedy_Decode, Run_Validation)
from Transformer.AttentionVisualization import (Load_Next_Batch, Get_All_Attention_Maps)
import altair as alt

In [2]:
# Suppress All Warnings (including the ones raised by PyTorch)
warnings.filterwarnings('ignore')

# Loading the Configuration Dictionary
config = Get_Configuration()

In [3]:
# Train the Model
Train_Model(config, validate=False)

Using device: cuda
Max Length of Source Sentence: 204
Max Length of Target Sentence: 196


Processing epoch 00: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=5.983]
Processing epoch 01: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=5.108]
Processing epoch 02: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=5.014]
Processing epoch 03: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.84it/s, Loss=4.966]
Processing epoch 04: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.85it/s, Loss=3.899]
Processing epoch 05: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.84it/s, Loss=4.434]
Processing epoch 06: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.84it/s, Loss=3.999]
Processing epoch 07: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.84it/s, Loss=4.140]
Processing epoch 08: 100%|█████████████████████████████████| 158/158 [01:25<00:00,  1.84it/s, Loss=3.638]
Processing epoch 09: 100%|████████████████████

---
## Test / Validation

Now, let's look at how the model performs when translating new texts, i.e., let's run inference with the trained model.
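The notebook imports a `Greedy_Decode` helper whose implementation is not shown here. As a rough illustration of the general technique, the sketch below generates one token at a time and always keeps the highest-scoring candidate until an end-of-sentence token appears or a length limit is reached. The `next_token_scores` callable, the toy scorer, and the token ids are assumptions made for this example; in a real setup the scores would come from the decoder's output projection.

```python
from typing import Callable, List, Sequence


def greedy_decode(next_token_scores: Callable[[List[int]], Sequence[float]],
                  sos_id: int, eos_id: int, max_length: int) -> List[int]:
    """Generate token ids by always picking the highest-scoring next token."""
    decoded = [sos_id]
    while len(decoded) < max_length:
        scores = next_token_scores(decoded)
        # argmax over the vocabulary
        next_id = max(range(len(scores)), key=scores.__getitem__)
        decoded.append(next_id)
        if next_id == eos_id:
            break
    return decoded


def toy_scorer(prefix: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: emits a fixed token script.

    Vocabulary of 5 ids, where 0 is [SOS] and 1 is [EOS].
    """
    script = [3, 2, 4, 1]
    step = len(prefix) - 1
    target = script[min(step, len(script) - 1)]
    return [1.0 if token_id == target else 0.0 for token_id in range(5)]


result = greedy_decode(toy_scorer, sos_id=0, eos_id=1, max_length=10)
print(result)
```

Greedy decoding is cheap and deterministic, but it commits to each token without reconsidering earlier choices, which is one possible reason for dropped or repeated words in the predictions.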

In [4]:
# Define the Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using Device: {device}")
# config = Get_Configuration()
train_dataloader, test_dataloader, tokenizer_source, tokenizer_target = Get_Dataset(config)
model = Get_Model(config, tokenizer_source.get_vocab_size(), tokenizer_target.get_vocab_size()).to(device)

Using Device: cuda
Max Length of Source Sentence: 204
Max Length of Target Sentence: 196


In [5]:
# Load the Pretrained Model
model_filename = Get_Weights_File_Path(config, "100")
state = torch.load(model_filename, map_location=device)
model.load_state_dict(state['model_state_dict'])

<All keys matched successfully>

In [6]:
# Run Validation
Run_Validation(model, test_dataloader,
               tokenizer_source, tokenizer_target,
               config['sequence_length'], device, print, 0, None, num_examples=5)

--------------------------------------------------------------------------------
[SOURCE]: 'I call it purring, not growling,' said Alice.
[TARGET]: "Eu chamaria isso de ronronar, não rosnar", disse Alice.
[PREDICTED]: " Eu isso de , não ", disse Alice .
--------------------------------------------------------------------------------
[SOURCE]: 'Of the mushroom,' said the Caterpillar, just as if she had asked it aloud; and in another moment it was out of sight.
[TARGET]: 'Do cogumelo', disse a Lagarta, como se ela tivesse perguntado em voz alta; e em seguida ela estava fora de vista.
[PREDICTED]: ' Claro que era ', disse a Lagarta ; e ela , se - se em um tom de novo ; e começou a cabeça .
--------------------------------------------------------------------------------
[SOURCE]: He looked at Alice, and tried to speak, but for a minute or two sobs choked his voice.
[TARGET]: Ele olhou para Alice e tentou falar, mas por um minuto ou dois o choro sufocou sua voz.
[PREDICTED]: Ele olhou para 

---
## Attention Visualization

In [7]:
# Create an instance of the Model
train_dataloader, test_dataloader, vocabulary_source, vocabulary_target = Get_Dataset(config)
model = Get_Model(config, vocabulary_source.get_vocab_size(), vocabulary_target.get_vocab_size()).to(device)

Max Length of Source Sentence: 204
Max Length of Target Sentence: 196


In [8]:
# Load the Pretrained Model
model_filename = Get_Weights_File_Path(config, "100")
state = torch.load(model_filename, map_location=device)
model.load_state_dict(state['model_state_dict'])

<All keys matched successfully>

In [9]:
# Visualize the Sentence
batch, encoder_input_tokens, decoder_input_tokens = Load_Next_Batch(config, model, test_dataloader,
                                                                    vocabulary_source, vocabulary_target,
                                                                    device)
print(f'[SOURCE]: {batch["source_text"]}')
print(f'[TARGET]: {batch["target_text"]}')

# Calculate the Sentence Length (index of the first [PAD] token)
sentence_length = encoder_input_tokens.index('[PAD]')

[SOURCE]: ["'One side of what?"]
[TARGET]: ["'Um lado de quê?"]


In [10]:
# Define the layers and heads for which we want to visualize the Attention
layers = [0, 1, 2]
heads = [0, 1, 2, 3, 4, 5, 6, 7]

In [11]:
# Visualize the Encoder Self-Attention
Get_All_Attention_Maps(model, "encoder", layers, heads, encoder_input_tokens, encoder_input_tokens, min(20, sentence_length))
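Each cell of an attention map is a softmax-normalized weight that says how strongly one position (the query) attends to another (the key). The sketch below computes such weights with scaled dot-product attention on tiny hand-made vectors. It is a plain-Python illustration, not the code behind `Get_All_Attention_Maps`, and the function names and toy vectors are assumptions for this example.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    highest = max(row)
    exps = [math.exp(value - highest) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def attention_weights(queries: List[List[float]],
                      keys: List[List[float]]) -> List[List[float]]:
    """Return softmax(Q K^T / sqrt(d_k)), one row per query position."""
    d_k = len(keys[0])
    weights = []
    for query in queries:
        scores = [
            sum(q_i * k_i for q_i, k_i in zip(query, key)) / math.sqrt(d_k)
            for key in keys
        ]
        weights.append(softmax(scores))
    return weights


# Toy vectors for three tokens; in self-attention the queries and keys
# come from the same sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights(tokens, tokens)
for row in weights:
    print([round(weight, 3) for weight in row])
```

Every row sums to 1, so a bright cell in a heatmap means that query position spends most of its attention on that key position.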

In [12]:
# Visualize the Decoder Self-Attention
Get_All_Attention_Maps(model, "decoder", layers, heads, decoder_input_tokens, decoder_input_tokens, min(20, sentence_length))
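In a standard Transformer decoder, self-attention is causally masked so that each position can attend only to itself and to earlier positions. Assuming this model follows that design (the notebook does not show the masking code), the decoder self-attention maps should be lower-triangular. The helper `causal_mask` below is a hypothetical illustration of that pattern.

```python
from typing import List


def causal_mask(size: int) -> List[List[bool]]:
    """Return a size x size mask where entry [row][col] is True
    when position `row` is allowed to attend to position `col`."""
    return [[col <= row for col in range(size)] for row in range(size)]


mask = causal_mask(4)
for row in mask:
    # "x" marks an allowed connection, "." a masked one
    print(" ".join("x" if allowed else "." for allowed in row))
```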

In [13]:
# Visualize the Encoder-Decoder Cross-Attention (where target tokens attend to source tokens)
Get_All_Attention_Maps(model, "encoder-decoder", layers, heads, encoder_input_tokens, decoder_input_tokens, min(20, sentence_length))

---