<div align="center">

---
# Transformer from Scratch [Python]
---
</div>

This notebook focuses on training the Transformer on a translation task, then validating and testing its performance on unseen data.

---
## Train

The Transformer is trained to translate between English and Portuguese, using the HuggingFace `opus_books` dataset for both the training and the test / validation steps.

In [None]:
%load_ext autoreload
%autoreload 2

# Importing Dependencies
import warnings
import pandas as pd
import numpy as np
import torch
from torch import (nn)
from Transformer.Model import (Transformer)
from Transformer.Configuration import (Get_Configuration, Get_Weights_File_Path)
from Transformer.Train import (Train_Model, Get_Model, Get_Dataset)
from Transformer.Validation import (Greedy_Decode, Run_Validation)
from Transformer.AttentionVisualization import (Load_Next_Batch, Get_All_Attention_Maps)
import altair as alt

In [None]:
# Filtering the PyTorch Warnings
warnings.filterwarnings('ignore')

# Loading the Configuration Dictionary
config = Get_Configuration()

In [None]:
# Train the Model
Train_Model(config, validate=False)

---
## Test / Validation

Now, let's look at how the model performs when translating new texts, i.e., let's run inference with the trained model.
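
The inference step relies on greedy decoding (see the imported `Greedy_Decode`): the output is built one token at a time, always keeping the highest-scoring next token, until an end-of-sentence token or a length limit is reached. The sketch below illustrates the idea only; it is not the repository's implementation. `next_token_scores`, `toy_scores`, and the token ids are hypothetical stand-ins for a real decoder forward pass and vocabulary.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[List[int]], Sequence[float]],
    sos_id: int,
    eos_id: int,
    max_length: int,
) -> List[int]:
    """Generate token ids one at a time, always taking the top-scoring token.

    `next_token_scores` is a hypothetical stand-in for a decoder forward pass:
    it receives the ids generated so far and returns one score per vocabulary entry.
    """
    output = [sos_id]
    while len(output) < max_length:
        scores = next_token_scores(output)
        # Greedy choice: index of the largest score (argmax).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        output.append(next_id)
        if next_id == eos_id:
            break
    return output


def toy_scores(prefix: List[int]) -> List[float]:
    """Toy scorer over a 5-token vocabulary that always prefers (last id + 1)."""
    target = min(prefix[-1] + 1, 4)
    return [1.0 if i == target else 0.0 for i in range(5)]


# Token 0 plays the role of start-of-sentence, token 4 of end-of-sentence.
decoded = greedy_decode(toy_scores, sos_id=0, eos_id=4, max_length=10)
print(decoded)
```

Greedy decoding is cheap and deterministic, but it can miss translations that a wider search (for example beam search) would find.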

In [None]:
# Define the Device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using Device: {device}")
# config = Get_Configuration()
train_dataloader, test_dataloader, tokenizer_source, tokenizer_target = Get_Dataset(config)
model = Get_Model(config, tokenizer_source.get_vocab_size(), tokenizer_target.get_vocab_size()).to(device)

In [None]:
# Load the Pretrained Model
model_filename = Get_Weights_File_Path(config, "100")
# map_location lets a checkpoint saved on GPU be loaded on a CPU-only machine
state = torch.load(model_filename, map_location=device)
model.load_state_dict(state['model_state_dict'])

In [None]:
# Run Validation
Run_Validation(model, test_dataloader,
               tokenizer_source, tokenizer_target,
               config['sequence_length'], device, print, 0, None, num_examples=5)

---
## Attention Visualization
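
An attention map for one layer and head is typically a matrix of weights, one row per query token and one column per key token, where each row sums to 1. The sketch below computes such weights with scaled dot-product attention in plain Python. It assumes the maps plotted in this section are of this form; the implementation of `Get_All_Attention_Maps` is not shown here. The `queries` and `keys` values are made-up toy data.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(
    queries: List[List[float]], keys: List[List[float]]
) -> List[List[float]]:
    """Return softmax(Q K^T / sqrt(d_k)), one row of weights per query."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights.append(softmax(scores))
    return weights


# Toy example: two tokens with 2-dimensional query / key vectors.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
weights = attention_weights(queries, keys)
for row in weights:
    print([round(w, 3) for w in row])
```

In a heat map of these weights, a bright cell at row i and column j means that token i attends strongly to token j.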

In [None]:
# Create an instance of the Model
train_dataloader, test_dataloader, vocabulary_source, vocabulary_target = Get_Dataset(config)
model = Get_Model(config, vocabulary_source.get_vocab_size(), vocabulary_target.get_vocab_size()).to(device)

In [None]:
# Load the Pretrained Model
model_filename = Get_Weights_File_Path(config, "100")
# map_location lets a checkpoint saved on GPU be loaded on a CPU-only machine
state = torch.load(model_filename, map_location=device)
model.load_state_dict(state['model_state_dict'])

In [None]:
# Visualize the Sentence
batch, encoder_input_tokens, decoder_input_tokens = Load_Next_Batch(config, model, test_dataloader,
                                                                    vocabulary_source, vocabulary_target,
                                                                    device)
print(f'[SOURCE]: {batch["source_text"]}')
print(f'[TARGET]: {batch["target_text"]}')

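# Calculate the Sentence Length (number of tokens before the first '[PAD]')
# Fall back to the full length when the sequence contains no padding
sentence_length = (encoder_input_tokens.index('[PAD]')
                   if '[PAD]' in encoder_input_tokens
                   else len(encoder_input_tokens))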

In [None]:
# Define the layers and heads for which we want to visualize the Attention
layers = [0, 1, 2]
heads = [0, 1, 2, 3, 4, 5, 6, 7]

In [None]:
# Visualize the Encoder Self-Attention
Get_All_Attention_Maps(model, "encoder", layers, heads, encoder_input_tokens, encoder_input_tokens, min(20, sentence_length))

In [None]:
# Visualize the Decoder Self-Attention
Get_All_Attention_Maps(model, "decoder", layers, heads, decoder_input_tokens, decoder_input_tokens, min(20, sentence_length))

In [None]:
# Let's take a look at the Cross-Attention [Where the translation task happens]
Get_All_Attention_Maps(model, "encoder-decoder", layers, heads, encoder_input_tokens, decoder_input_tokens, min(20, sentence_length))

---