### Seq2Seq With Attention
In this notebook we are going to learn new things. WE are going to learn the following:

1. Using the Dataset Class to load our own data
2. Creating a model that will be able to translate from French to English.


### Basic Imports
WE are going to import modules we need as we go. These are the basic moodules that we may need to import first.

In [1]:
import torch
import time, os, math, random

from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader, random_split
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence, pad_sequence

import numpy as np 

from collections import Counter

import pandas as pd

torch.__version__

'1.9.0+cu102'

### Seeds 

In [2]:
SEED = 42

np.random.seed(SEED)
torch.manual_seed(SEED)
random.seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deteministic = True

### Device

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

### Data
We are going to load the data from my google drive. I've uploaded a txt file that we can load and make some damage on it. First we need to mount the drive

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Path to data.

In [5]:
base_path = '/content/drive/MyDrive/NLP Data/seq2seq/en-fr'
os.path.exists(base_path)

True

### Data processing

We are going to create a new file called `en-fr.txt` that will contain english to french sentence pairs. We are going this because our giant file `fra.txt` contain some unnecessary or extra pair on each line. We are going to trim that and come up with a fresh text file in a couple of code cells that follows:
 

In [6]:
lines = open(os.path.join(base_path, 'fra.txt')).read().split('\n')
lines[:2]

['Go.\tVa !\tCC-BY 2.0 (France) Attribution: tatoeba.org #2877272 (CM) & #1158250 (Wittydev)',
 'Go.\tMarche.\tCC-BY 2.0 (France) Attribution: tatoeba.org #2877272 (CM) & #8090732 (Micsmithel)']

In [7]:
clean_lines = []
for line in lines:
  try:
    en, fr, _ = line.split('\t')
    clean_lines.append(f'{en}\t{fr}')
  except:
    pass
len(clean_lines)

190206

In [8]:
clean_lines[:2]

['Go.\tVa !', 'Go.\tMarche.']

Now that we have line that are paired. We are now ready to create our file.

In [9]:
with open(os.path.join(base_path, 'en-fr.txt' ), 'w') as f:
  f.write("\n".join(clean_lines))
print("Created file en-fr.txt")

Created file en-fr.txt


### Tokenizers for each language.
We are going to use the spacy tokenizer to create two tokenizers. One will be for `french` and the other one will be for `english`.

In [10]:
from torchtext.data.utils import get_tokenizer

In [11]:
!python -m spacy download en
!python -m spacy download fr

Collecting en_core_web_sm==2.2.5
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz (12.0 MB)
[K     |████████████████████████████████| 12.0 MB 5.4 MB/s 
[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('en_core_web_sm')
[38;5;2m✔ Linking successful[0m
/usr/local/lib/python3.7/dist-packages/en_core_web_sm -->
/usr/local/lib/python3.7/dist-packages/spacy/data/en
You can now load the model via spacy.load('en')
Collecting fr_core_news_sm==2.2.5
  Downloading https://github.com/explosion/spacy-models/releases/download/fr_core_news_sm-2.2.5/fr_core_news_sm-2.2.5.tar.gz (14.7 MB)
[K     |████████████████████████████████| 14.7 MB 5.5 MB/s 
[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('fr_core_news_sm')
[38;5;2m✔ Linking successful[0m
/usr/local/lib/python3.7/dist-packages/fr_core_news_sm -->
/usr/local/lib/python3.7/dist-pa

In [12]:
fr_tokenizer = get_tokenizer('spacy', language="fr")
en_tokenizer = get_tokenizer('spacy', language="en")

In [13]:
en_tokenizer("This is a boy?")

['This', 'is', 'a', 'boy', '?']

Now we are ready to create our Dataset class. We are going to call this dataset class `EnglishFrenchDataset`. If you want to understand more about how to inherit from the pytorch `Dataset` class and the magic i recommend [this](https://github.com/CrispenGari/pytorch-python/blob/main/01_PyTorch_Basics/08_DataLoaders/DataLoader.ipynb)


### `EnglishFrenchDataset`:

Args:
```
  path:        path to a text file
  transform:   list transformation
  max_vocab:   maximum number for words
  shuffle:     shuffling the data
```

In [14]:
PATH = os.path.join(base_path, 'en-fr.txt')

In [15]:
class EnglishFrenchDataset(Dataset):
  def __init__(self, path:str, transform=None,
               max_vocab = None):
    self.max_vocab = max_vocab
    self.transform = transform

    # Special tokens
    self.pad = '<pad>'
    self.sos = '<sos>'
    self.eos = '<eos>'
    self.unk = '<unk>'

    # Helper functions that flattens lists
    self.flatten = lambda x: [sublist for lst in x for sublist in lst]
    
    # Loading the dataset
    df = pd.read_csv(path, names=["en", "fr"], sep="\t")
    self.len = len(df)

    # Tokenize src (en -> fr)
    self.tokenize_df(df)
   
    # Replacing rare occuring words in the corpus to be <unk>
    # self.replace_rare_tokens(df)

    # tokens to index maping (stoi -> string to intenger)
    self.stoi(df)
    # Remove sequences with mostly <UNK>
    if self.max_vocab is not None:
      df = self.remove_mostly_unk(df)
      
    # Convert tokens to indices
    self.tokens_to_indices(df)
    self.df = df

  def __len__(self):
    return self.len

  def __getitem__(self, index):
    return self.tokens_pairs[index][0], self.tokens_pairs[index][1]
  # Every sequence (input and target) should start with <sos>  and end with <eos>
  def add_start_and_end_to_tokens(self, x):
    return  [self.sos] + x + [self.eos]

  def tokenize_df(self, df):
    """Turn src/trg into tokens"""
    df['src'] = df.fr.apply(lambda x: x.lower()).apply(fr_tokenizer).apply(self.add_start_and_end_to_tokens)
    df['trg'] = df.en.apply(lambda x: x.lower()).apply(en_tokenizer).apply(self.add_start_and_end_to_tokens)

  def replace_rare_tokens(self, df):
    """replacing rare tokens with unk token"""
    common_tokens_src = self.get_most_common_tokens(
        df.src.tolist()
    )
    common_tokens_trg = self.get_most_common_tokens(
        df.src.tolist()
    )
    df.loc[:, 'src'] = df.src.apply(
          lambda tokens: [token if token in common_tokens_src 
                          else self.unk for token in tokens]
      )
    df.loc[:, 'trg'] = df.trg.apply(
          lambda tokens: [token if token in common_tokens_trg 
                          else self.unk for token in tokens]
    )

  def get_most_common_tokens(self, tokens):
    """Return the max_vocab most common tokens."""
    all_tokens = self.flatten(tokens)
    # Substract 4 for <pad>, <sos>, <eos>, and <unk>
    common_tokens = set(list(zip(*Counter(all_tokens).most_common(
            self.max_vocab - 4)))[0])
    return common_tokens

  def remove_mostly_unk(self, df, threshold=0.99):
      """Remove sequences with mostly <UNK>."""
      calculate_ratio = (
          lambda tokens: sum(1 for token in tokens if token != '<UNK>')
          / len(tokens) > threshold
      )
      df = df[df.src.apply(calculate_ratio)]
      df = df[df.trg.apply(calculate_ratio)]
      return df
  def stoi(self, df):
    unique_tokens_src = set(self.flatten(df.src))
    unique_tokens_trg = set(self.flatten(df.trg))
    for token in reversed([self.pad, self.sos, self.eos, self.unk]):
      if token in unique_tokens_src:
        unique_tokens_src.remove(token)
      if token in unique_tokens_trg:
        unique_tokens_trg.remove(token)
            
    unique_tokens_src = sorted(list(unique_tokens_src))
    unique_tokens_trg = sorted(list(unique_tokens_trg))

    # Add <pad>, <sos>, <eos>, and <unk> tokens
    for token in reversed([self.pad, self.sos, self.eos, self.unk]):
      unique_tokens_src = [token] + unique_tokens_src
      unique_tokens_trg = [token] + unique_tokens_trg
            
      self.stoi_src = {token: idx for idx, token
                                 in enumerate(unique_tokens_src)}
      self.itos_src = {idx: token for token, idx
                                 in self.stoi_src.items()}
      self.stoi_trg = {token: idx for idx, token
                                  in enumerate(unique_tokens_trg)}
      self.itos_trg = {idx: token for token, idx
                                  in self.stoi_trg.items()}
    
  def tokens_to_indices(self, df):
    """Convert tokens to indices."""
    df['src_tokens'] = df.src.apply(
        lambda tokens: [self.stoi_src[token] for token in tokens])
   
    df['trg_tokens'] = df.trg.apply(
        lambda tokens: [self.stoi_trg[token] for token in tokens])
    self.tokens_pairs = list(zip(df.src_tokens, df.trg_tokens))


### Custom transformation
We are going to create a custom transformation class that will convert list to tensors. We are going to call the `ToTensor` class.

In [16]:
class ToTensor:
  def __init__(self):
    pass

  def __call__(self, sample):
    _, __ = sample
    return torch.tensor(_), torch.tensor(__)

### Creating the dataset

In [17]:
%%time
dataset = EnglishFrenchDataset(
    os.path.join(base_path, 'en-fr.txt'),
    transform = ToTensor()
)

CPU times: user 23.7 s, sys: 456 ms, total: 24.2 s
Wall time: 24.2 s


In [19]:
len(dataset)

190206

In [20]:
dataset.df.head(10).iloc[:, 2:]

Unnamed: 0,src,trg,src_tokens,trg_tokens
0,"[<sos>, va, !, <eos>]","[<sos>, go, ., <eos>]","[1, 24169, 4, 2]","[1, 5989, 20, 2]"
1,"[<sos>, marche, ., <eos>]","[<sos>, go, ., <eos>]","[1, 14284, 18, 2]","[1, 5989, 20, 2]"
2,"[<sos>, bouge, !, <eos>]","[<sos>, go, ., <eos>]","[1, 3175, 4, 2]","[1, 5989, 20, 2]"
3,"[<sos>, salut, !, <eos>]","[<sos>, hi, ., <eos>]","[1, 21175, 4, 2]","[1, 6532, 20, 2]"
4,"[<sos>, salut, ., <eos>]","[<sos>, hi, ., <eos>]","[1, 21175, 18, 2]","[1, 6532, 20, 2]"
5,"[<sos>, cours, , !, <eos>]","[<sos>, run, !, <eos>]","[1, 5923, 25871, 4, 2]","[1, 11437, 4, 2]"
6,"[<sos>, courez, , !, <eos>]","[<sos>, run, !, <eos>]","[1, 5904, 25871, 4, 2]","[1, 11437, 4, 2]"
7,"[<sos>, prenez, vos, jambes, à, vos, cous, !, ...","[<sos>, run, !, <eos>]","[1, 17825, 24735, 13126, 24988, 24735, 5946, 4...","[1, 11437, 4, 2]"
8,"[<sos>, file, !, <eos>]","[<sos>, run, !, <eos>]","[1, 10134, 4, 2]","[1, 11437, 4, 2]"
9,"[<sos>, filez, !, <eos>]","[<sos>, run, !, <eos>]","[1, 10138, 4, 2]","[1, 11437, 4, 2]"


### Splitting Sets

In [27]:
train_size = int(len(dataset) * .9)
valid_size = int(len(dataset) * .08)
test_size = len(dataset) - train_size - valid_size
train_dataset,valid_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, valid_size, test_size])

train_size, test_size, valid_size

(171185, 3805, 15216)

### Creating generators using DataLoader

In [28]:
def collate(batch):
  src = [torch.LongTensor(item[0]) for item in batch]
  trg = [torch.LongTensor(item[1])  for item in batch]

  padded_src = pad_sequence(src,
                               padding_value=dataset.stoi_src[dataset.pad],
                               batch_first=True)
  padded_trg = pad_sequence(trg,
                            padding_value=dataset.stoi_src[dataset.pad],
                            batch_first=True)

  # Sort by length for CUDA optimizations
  lengths = torch.LongTensor([len(x) for x in src])
  lengths, permutation = lengths.sort(dim=0, descending=True)

  return padded_src[permutation].to(device), padded_trg[permutation].to(device), lengths.to(device)
 


In [29]:
BATCH_SIZE = 128

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, collate_fn=collate)
valid_loader = DataLoader(valid_dataset, batch_size=BATCH_SIZE, collate_fn=collate)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, collate_fn=collate)

### Seq2Seq with Attention

### Encoder model

In [30]:
class Encoder(nn.Module):
  def __init__(self, vocab_size,
               embedding_dim,
               hidden_size, batch_size):
    super(Encoder, self).__init__()
    self.batch_size = batch_size
    self.hidden_size = hidden_size
    self.vocab_size = vocab_size
    self.embedding_dim = embedding_dim
    self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
    self.gru = nn.GRU(
        self.embedding_dim,
        self.hidden_size,
        batch_first=True,
    )
    
  def forward(self, src, lengths):
    self.batch_size = src.size(0)
    # Turn input indices into distributed embeddings
    x = self.embedding(src)
    # Remove padding for more efficient RNN application
    x = pack_padded_sequence(x, lengths.to('cpu'), batch_first=True)

    # Apply RNN to get hidden state at all timesteps (output)
    # and hidden state of last output (self.hidden)
    output, self.hidden = self.gru(x, self.init_hidden())
    # Pad the sequences like they were before
    output, _ = pad_packed_sequence(output)
    return output, self.hidden
  def init_hidden(self):
    # Randomly initialize the weights of the RNN
    return torch.randn(1, self.batch_size, self.hidden_size).to(device)


### Decoder Model

In [31]:
class Decoder(nn.Module):
  def __init__(
      self, 
      vocab_size,
      embedding_dim, 
      decoder_hidden_size,
      encoder_hidden_size, 
      batch_size):
    super(Decoder, self).__init__()
    self.batch_size = batch_size
    self.encoder_hidden_size = encoder_hidden_size
    self.decoder_hidden_size = decoder_hidden_size
    self.vocab_size = vocab_size
    self.embedding_dim = embedding_dim
    self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
    self.gru = nn.GRU(
        self.embedding_dim + self.encoder_hidden_size, 
        self.decoder_hidden_size,
        batch_first=True,
    )
    self.fc = nn.Linear(self.encoder_hidden_size, self.vocab_size)
    
    # Attention weights
    self.W1 = nn.Linear(self.encoder_hidden_size, self.decoder_hidden_size)
    self.W2 = nn.Linear(self.encoder_hidden_size, self.decoder_hidden_size)
    self.V = nn.Linear(self.encoder_hidden_size, 1)

  def forward(self, trg, hidden, encoder_output):
    self.batch_size = trg.size(0)
    # Switch the dimensions of sequence_length and batch_size
    encoder_output = encoder_output.permute(1, 0, 2)

    # Add an extra axis for a time dimension
    hidden_with_time_axis = hidden.permute(1, 0, 2)
    
    # Attention score (Bahdanaus)
    score = torch.tanh(self.W1(encoder_output) + self.W2(hidden_with_time_axis))

    # Attention weights
    attention_weights = torch.softmax(self.V(score), dim=1)
    
    # Find the context vectors
    context_vector = attention_weights * encoder_output
    context_vector = torch.sum(context_vector, dim=1)
    
    # Turn target indices into distributed embeddings
    x = self.embedding(trg)
    
    # Add the context representation to the target embeddings
    x = torch.cat((context_vector.unsqueeze(1), x), -1)
    
    # Apply the RNN
    output, state = self.gru(x, self.init_hidden())
    
    # Reshape the hidden states (output)
    output = output.view(-1, output.size(2))
    
    # Apply a linear layer
    x = self.fc(output)
    
    return x, state, attention_weights
  
  def init_hidden(self):
    # Randomly initialize the weights of the RNN
    return torch.randn(1, self.batch_size, self.decoder_hidden_size).to(device)

### Encoder Decoder Model

In [75]:
criterion = nn.CrossEntropyLoss(ignore_index=dataset.stoi_src[dataset.pad])
def loss_function(real, pred):
  # Use mask to only consider non-zero inputs in the loss
  mask = real.ge(1).float().to(device)
  loss_ = criterion(pred, real) * mask 
  return torch.mean(loss_)


class EncoderDecoder(nn.Module):

  def __init__(self, inputs_vocab_size,
                targets_vocab_size, hidden_size,
                embedding_dim, batch_size, 
                targets_start_idx, targets_stop_idx):
      super(EncoderDecoder, self).__init__()
      self.batch_size = batch_size
      self.targets_start_idx = targets_start_idx
      self.targets_stop_idx = targets_stop_idx
      
      self.encoder = Encoder(inputs_vocab_size, embedding_dim,
                              hidden_size, batch_size).to(device)
      
      self.decoder = Decoder(targets_vocab_size, embedding_dim,
                              hidden_size, hidden_size, batch_size).to(device)
      
  def predict(self, inputs, lengths):
    self.batch_size = inputs.size(0)
    
    encoder_output, encoder_hidden = self.encoder(
        inputs.to(device),
        lengths,
    )
    decoder_hidden = encoder_hidden

    # Initialize the input of the decoder to be <SOS>
    decoder_input = torch.LongTensor(
        [[self.targets_start_idx]] * self.batch_size,
    )
    
    # Output predictions instead of loss
    output = []
    for _ in range(20):
      predictions, decoder_hidden, _ = self.decoder(
          decoder_input.to(device), 
          decoder_hidden.to(device),
          encoder_output.to(device),
      )
      prediction = torch.multinomial(F.softmax(predictions, dim=1), 1)
      decoder_input = prediction
      
      prediction = prediction[0].item()
      output.append(prediction)

      if prediction == self.targets_stop_idx:
          return output
    return output

  def forward(self, inputs, targets, lengths):
    self.batch_size = inputs.size(0)
    
    encoder_output, encoder_hidden = self.encoder(
        inputs.to(device),
        lengths,
    )
    decoder_hidden = encoder_hidden
    
    # Initialize the input of the decoder to be <SOS>
    decoder_input = torch.LongTensor(
        [[self.targets_start_idx]] * self.batch_size,
    )
            
    # Use teacher forcing to train the model. Instead of feeding the model's
    # own predictions to itself, feed the target token at every timestep.
    # This leads to faster convergence
    loss = 0
    for timestep in range(1, targets.size(1)):
      predictions, decoder_hidden, _ = self.decoder(
          decoder_input.to(device), 
          decoder_hidden.to(device),
          encoder_output.to(device),
      )
      decoder_input = targets[:, timestep].unsqueeze(1)
      
      loss += loss_function(targets[:, timestep], predictions)
    return loss / targets.size(1)

### Model Instance

In [76]:
model = EncoderDecoder(
    inputs_vocab_size=len(dataset.stoi_src),
    targets_vocab_size=len(dataset.stoi_trg),
    hidden_size=256,
    embedding_dim=100, 
    batch_size=BATCH_SIZE, 
    targets_start_idx=dataset.stoi_trg[dataset.sos],
    targets_stop_idx=dataset.stoi_trg[dataset.eos],
).to(device)

model

EncoderDecoder(
  (encoder): Encoder(
    (embedding): Embedding(25876, 100)
    (gru): GRU(100, 256, batch_first=True)
  )
  (decoder): Decoder(
    (embedding): Embedding(15139, 100)
    (gru): GRU(356, 256, batch_first=True)
    (fc): Linear(in_features=256, out_features=15139, bias=True)
    (W1): Linear(in_features=256, out_features=256, bias=True)
    (W2): Linear(in_features=256, out_features=256, bias=True)
    (V): Linear(in_features=256, out_features=1, bias=True)
  )
)

### Counting model parameters

In [77]:
def count_trainable_params(model):
  return sum(p.numel() for p in model.parameters()), sum(p.numel() for p in model.parameters() if p.requires_grad)

n_params, trainable_params = count_trainable_params(model)
print(f"Total number of paramaters: {n_params:,}\nTotal tainable parameters: {trainable_params:,}")

Total number of paramaters: 8,870,560
Total tainable parameters: 8,870,560


### Initializing model weights

In [78]:
def init_weights(m: nn.Module):
  for name, param in m.named_parameters():
    if 'weight' in name:
      nn.init.normal_(param.data, mean=0, std=0.01)
    else:
      nn.init.constant_(param.data, 0)

In [79]:
model.apply(init_weights)

EncoderDecoder(
  (encoder): Encoder(
    (embedding): Embedding(25876, 100)
    (gru): GRU(100, 256, batch_first=True)
  )
  (decoder): Decoder(
    (embedding): Embedding(15139, 100)
    (gru): GRU(356, 256, batch_first=True)
    (fc): Linear(in_features=256, out_features=15139, bias=True)
    (W1): Linear(in_features=256, out_features=256, bias=True)
    (W2): Linear(in_features=256, out_features=256, bias=True)
    (V): Linear(in_features=256, out_features=1, bias=True)
  )
)

In [80]:
optimizer = torch.optim.Adam(model.parameters())

### Training and evaluate functions

In [81]:
def train(model, iterator, optimizer):
  model.train()
  total_loss = total = 0
  for index, (inputs, targets, lengths) in enumerate(iterator):
    optimizer.zero_grad()
    loss = model(inputs, targets, lengths)
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
    total += targets.size(1)
  train_loss = total_loss / total
  return train_loss

def evaluate(model, iterator):
  model.eval()
  total_loss = total = 0
  with torch.no_grad():
      for index, (inputs, targets, lengths) in enumerate(iterator):
        loss = model(inputs, targets, lengths)
        total_loss += loss.item()
        total += targets.size(1)
  test_loss = total_loss / total
  return test_loss


### Helper functions

1. time to string

In [82]:
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

2. Tabulate training epoch

In [83]:
from prettytable import PrettyTable
def tabulate_training(column_names, data, title):
  table = PrettyTable(column_names)
  table.title= title
  table.align[column_names[0]] = 'l'
  table.align[column_names[1]] = 'r'
  table.align[column_names[2]] = 'r'
  table.align[column_names[3]] = 'r'
  for row in data:
    table.add_row(row)
  print(table)

### Train Loop

In [53]:
N_EPOCHS = 10
best_valid_loss = float('inf')
column_names = ["SET", "LOSS", "PPL", "ETA"]
print("TRAINING STARTS....")
for epoch in range(N_EPOCHS):
  start = time.time()
  train_loss = train(model, train_loader, optimizer )
  valid_loss = evaluate(model, valid_loader)
  end = time.time()
  title = f"EPOCH: {epoch+1:02}/{N_EPOCHS:02} | {'saving model...' if valid_loss < best_valid_loss else 'not saving...'}" 
  if valid_loss < best_valid_loss:
      best_valid_loss = valid_loss
      torch.save(model.state_dict(), 'best-model.pt')
  rows_data =[
        ["train", f"{train_loss:.3f}", f"{math.exp(train_loss):7.3f}", hms_string(end - start) ],
        ["val", f"{valid_loss:.3f}", f"{math.exp(train_loss):7.3f}", '' ]
  ]
  tabulate_training(column_names, rows_data, title)

print("TRAINING ENDS....")


TRAINING STARTS....
+--------------------------------------+
|    EPOCH: 01/10 | saving model...    |
+-------+-------+---------+------------+
| SET   |  LOSS |     PPL |        ETA |
+-------+-------+---------+------------+
| train | 0.094 |   1.099 | 0:03:48.32 |
| val   | 0.086 |   1.099 |            |
+-------+-------+---------+------------+
+--------------------------------------+
|    EPOCH: 02/10 | saving model...    |
+-------+-------+---------+------------+
| SET   |  LOSS |     PPL |        ETA |
+-------+-------+---------+------------+
| train | 0.071 |   1.074 | 0:03:48.55 |
| val   | 0.065 |   1.074 |            |
+-------+-------+---------+------------+
+--------------------------------------+
|    EPOCH: 03/10 | saving model...    |
+-------+-------+---------+------------+
| SET   |  LOSS |     PPL |        ETA |
+-------+-------+---------+------------+
| train | 0.055 |   1.056 | 0:03:48.87 |
| val   | 0.051 |   1.056 |            |
+-------+-------+---------+----------

### Evaluating the best model.

In [84]:
model.load_state_dict(torch.load('best-model.pt'))

test_loss = evaluate(model, test_loader)
title = "Model Evaluation Summary"
data_rows = [["Test", f'{test_loss:.3f}', f'{math.exp(test_loss):7.3f}', ""]]

tabulate_training(["SET", "LOSS", "PPL", "ETA"], data_rows, title)


+------------------------------+
|   Model Evaluation Summary   |
+------+-------+---------+-----+
| SET  |  LOSS |     PPL | ETA |
+------+-------+---------+-----+
| Test | 0.025 |   1.025 |     |
+------+-------+---------+-----+


### Inference

In [86]:
model.eval()
total_loss = total = 0
with torch.no_grad():
  for inputs, targets, lengths in test_loader:
    print('input| >', ' '.join([
        dataset.itos_src[idx]
        for idx in inputs.cpu()[0].numpy()[1:-1]
    ]))
    print('target| >', ' '.join([
        dataset.itos_trg[idx]
        for idx in targets.cpu()[0].numpy()[1:-1]
    ]))
    # Forwards pass
    outputs = model.predict(inputs, lengths)
    prediction = ' '.join([
        dataset.itos_trg[idx]
        for idx in outputs[:-1]
    ])
    print()
    print("predicted| =", prediction)
    print("*" * 100)
    print()

input| > après avoir réfléchi sur ma vie jusqu' à présent , j' ai décidé que j' avais besoin de changer mes objectifs .
target| > after reflecting on my life up to now , i decided that i needed to change my goals .

predicted| = after your soul in my english to him , i was decided i change .
****************************************************************************************************

input| > pourquoi ne restes - tu pas un moment après que tout le monde soit parti de manière à ce que nous puissions discuter   ?
target| > why do n't you hang around a while after everyone else leaves so we can talk ?

predicted| = why do n't you stay a while after everybody else that happen , what we can play himself about
****************************************************************************************************

input| > je me fiche que tu y ailles ou pas . j' y vais de toutes façons .
target| > i do n't care if you go or not . i 'm going anyway .

predicted| = i do n't care if you 're

### Conclusion

In this notebook we have learned ho to create and load our dataset from local files and prepare it step by step to make it ready for training.

### Ref 

* [this github repo](https://github.com/scoutbee/pytorch-nlp-notebooks)