# French to English Language Translation with RNN and Transformers

This notebook trains two French-to-English translation models, one based on a recurrent neural network (RNN) and one on the Transformer architecture. The model architectures are defined in their respective files, ```rnn.py``` and ```transformers.py```, and the data pre-processing code is in ```data.py```.

In [1]:
import os 
import random
import pandas as pd

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader

import warnings
warnings.filterwarnings('ignore')

from data import TranslationDataset
from rnn import RNN, RNNTools
from transformers import Transformers, TransformersTools

from IPython.display import Markdown

To retrain the models, set ```skip_training``` to ```False```. With ```skip_training = True```, previously saved model files are loaded instead.
Newly trained models will be saved as ```models/{architecture}.pth```.

In [2]:
# configurable parameters, change as needed

# set to true if loading existing model file, false if training a new model
skip_training = True
data_dir = 'data'
rnn_model_save_path = 'models/rnn.pth'
tra_model_save_path = 'models/transformers.pth'


In [3]:
# create dirs if not existing
os.makedirs(data_dir, exist_ok=True)
os.makedirs('models', exist_ok=True)
os.makedirs('logs', exist_ok=True)

When training, ```device_type``` is automatically set to ```cuda:0``` if CUDA is available and to ```cpu``` otherwise. When ```skip_training``` is ```True```, ```cpu``` is always used. You can also overwrite the value manually if you have a different device setup.

In [4]:
# additional settings: use cpu when only loading a saved model, otherwise cuda if available
if skip_training:
    device_type = 'cpu'
elif torch.cuda.is_available():
    device_type = 'cuda:0'
else:
    device_type = 'cpu'

# set manually if needed e.g. device_type = 'cpu'
print("Using device type:", device_type)
device = torch.device(device_type)

Using device type: cpu


Load the data and preprocess. 

Preprocessing includes tasks such as tokenization, where each sentence is split into individual words or subword units, and mapping each word or subword unit to an index value. This mapping creates a dictionary, which is used to convert the sentences into sequences of index values.
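The preprocessing described above can be sketched in a few lines. This is only an illustration, not the code in ```data.py```: the ```normalize``` function, the ```Vocab``` class, and the special-token indices are assumptions made for the example.

```python
import re
import unicodedata

PAD, SOS, EOS = 0, 1, 2  # assumed special-token indices


def normalize(text):
    """Lowercase, strip accents, and keep only letters and basic punctuation."""
    text = unicodedata.normalize("NFD", text.lower().strip())
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = re.sub(r"([.!?])", r" \1", text)   # split punctuation off words
    text = re.sub(r"[^a-z.!?]+", " ", text)   # everything else becomes a space
    return text.strip()


class Vocab:
    """Hypothetical word <-> index dictionary built incrementally from sentences."""

    def __init__(self):
        self.word2index = {"<pad>": PAD, "<sos>": SOS, "<eos>": EOS}
        self.index2word = {i: w for w, i in self.word2index.items()}

    @property
    def n_words(self):
        return len(self.word2index)

    def add_sentence(self, sentence):
        for word in normalize(sentence).split():
            if word not in self.word2index:
                index = self.n_words
                self.word2index[word] = index
                self.index2word[index] = word

    def encode(self, sentence):
        """Convert a sentence into a list of indices ending with EOS."""
        return [self.word2index[w] for w in normalize(sentence).split()] + [EOS]


vocab = Vocab()
vocab.add_sentence("Je suis intéressé.")
indices = vocab.encode("Je suis intéressé.")
print(normalize("Je suis intéressé."), indices)
```

Each new word receives the next free index, so ```n_words``` is also the size of the embedding table the model needs for that language.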

In [5]:
trainset = TranslationDataset(data_dir, train=True)
testset = TranslationDataset(data_dir, train=False)
print('Number of sentence pairs in the training set: ', len(trainset))
print('Number of sentence pairs in the test set: ', len(testset))

Number of sentence pairs in the training set:  8682
Number of sentence pairs in the test set:  2171


## Recurrent Neural Networks (RNN)

The next cell wraps the training and test sets in ```DataLoader``` objects that batch the data using a collate function. The collate function is responsible for processing and organizing the input data into batches that can be fed into the neural network for training.

The RNN collate function performs several important tasks, including padding the sequences to ensure that they are of equal length, sorting the sequences by length to optimize the training process, and converting the sequences into tensors that can be processed by the neural network.
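A collate function of this kind might look roughly like the sketch below. It is not the actual ```RNNTools.collate```: the function name, the padding index, and the assumption that each dataset item is a pair of 1-D index tensors are illustrative. The sequence-first layout ```(max_len, batch_size)``` follows how the notebook indexes batches (for example ```src_seqs[:, r]```).

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PADDING_VALUE = 0  # assumed padding index, matching the training cell


def rnn_collate_sketch(pairs):
    """pairs: list of (src_tensor, tgt_tensor), each a 1-D LongTensor of word indices."""
    # sort by source length, longest first (the order packed sequences expect by default)
    pairs = sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)
    src_list = [src for src, _ in pairs]
    tgt_list = [tgt for _, tgt in pairs]
    src_seq_lengths = [len(src) for src in src_list]
    # pad to the longest sequence; the result has shape (max_len, batch_size)
    src_seqs = pad_sequence(src_list, padding_value=PADDING_VALUE)
    tgt_seqs = pad_sequence(tgt_list, padding_value=PADDING_VALUE)
    return src_seqs, src_seq_lengths, tgt_seqs


batch = [
    (torch.tensor([5, 6, 2]), torch.tensor([7, 8, 9, 2])),
    (torch.tensor([3, 4, 5, 6, 2]), torch.tensor([7, 2])),
]
src_seqs, src_seq_lengths, tgt_seqs = rnn_collate_sketch(batch)
print(src_seqs.shape, src_seq_lengths, tgt_seqs.shape)
```

Returning the unpadded source lengths alongside the padded tensor lets the encoder skip the padding positions.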

In [6]:
trainloader = DataLoader(dataset=trainset, batch_size=64, shuffle=True, collate_fn=RNNTools.collate, pin_memory=True)
testloader = DataLoader(dataset=testset, batch_size=64, shuffle=False, collate_fn=RNNTools.collate)

### Encoder-Decoder architecture using RNN/GRU:

In [7]:
rnn = RNN(trainset.input_lang.n_words, trainset.output_lang.n_words, embed_size=256, hidden_size=256)
rnn.to(device)

RNN(
  (encoder): Encoder(
    (embedding): Embedding(4489, 256)
    (gru): GRU(256, 256)
  )
  (decoder): Decoder(
    (embedding): Embedding(2925, 256)
    (gru): GRU(256, 256)
    (out): Linear(in_features=256, out_features=2925, bias=True)
  )
)

### Training

The RNN model is optimized using the Adam optimizer and the negative log-likelihood loss (NLLLoss).

During training, the RNN model also uses a technique called teacher forcing, which involves feeding the correct previous word in the target sequence to the decoder as input, instead of using the predicted word from the previous time step.

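Here, the decision is made once per batch with ```teacher_forcing_ratio = 0.5```, so on average half of the batches are decoded with teacher forcing and the other half without it. Access to the ground truth gives the decoder a reliable input signal during training, while the remaining batches expose it to its own predictions, as happens at inference time.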
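The difference between the two decoding modes can be shown with a toy example. The ```decode``` and ```toy_step``` functions below are hypothetical stand-ins for the decoder in ```rnn.py```, which is not shown here; only the choice of the next input token is the point of the sketch.

```python
import random

SOS = 1  # assumed start-of-sequence index


def decode(step_fn, target, teacher_forcing):
    """Run a decoder for len(target) steps and return its predictions.

    step_fn(prev_token, state) -> (predicted_token, new_state) stands in for
    one decoder step (embedding, GRU, and output layer in a real model).
    """
    predictions = []
    prev_token, state = SOS, None
    for t in range(len(target)):
        predicted, state = step_fn(prev_token, state)
        predictions.append(predicted)
        # teacher forcing feeds the ground-truth token; otherwise the prediction is fed back
        prev_token = target[t] if teacher_forcing else predicted
    return predictions


def toy_step(prev_token, state):
    """Hypothetical decoder step that always predicts prev_token + 1."""
    return prev_token + 1, state


target = [10, 20, 30]
forced = decode(toy_step, target, teacher_forcing=True)
free = decode(toy_step, target, teacher_forcing=False)

# per-batch coin flip, in the spirit of the training cell
teacher_forcing_ratio = 0.5
use_teacher_forcing = random.random() < teacher_forcing_ratio
print(forced, free, use_teacher_forcing)
```

With teacher forcing, every step restarts from the correct previous token, so one early mistake does not carry into later steps. Without it, the toy decoder keeps building on its own outputs.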

In [8]:
if not skip_training:
    PADDING_VALUE = 0 
    teacher_forcing_ratio = 0.5
    num_epochs = 2

    optimizer = torch.optim.Adam(rnn.parameters(), lr=0.001)    
    criterion = nn.NLLLoss(ignore_index=PADDING_VALUE)
    
    rnn.train()
    
    for epoch in range(num_epochs):
        total_loss = 0
        total_data = 0
        for src_seqs, src_seq_lengths, tgt_seqs in trainloader:
            src_seqs, tgt_seqs = src_seqs.to(device), tgt_seqs.to(device)
            
            # decide once per batch whether to use teacher forcing
            teacher_forcing = torch.rand(1).item() < teacher_forcing_ratio
            
            # forward pass
            outputs = rnn(src_seqs, tgt_seqs, src_seq_lengths, teacher_forcing)
            loss = criterion(outputs.permute(0, 2, 1).to(device), tgt_seqs)
            
            # compute loss metric
            total_loss += (loss.item() * src_seqs.shape[1])
            total_data += src_seqs.shape[1]

            # backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print("epoch: {0} training loss: {1:.3f}".format(epoch, total_loss/total_data))

In [9]:
if not skip_training:
    torch.save(rnn.state_dict(), rnn_model_save_path)

In [10]:
if skip_training:
    rnn.load_state_dict(torch.load(rnn_model_save_path, map_location=lambda storage, loc: storage))
    print('RNN model loaded from: {}'.format(rnn_model_save_path))
    rnn.to(device)
    rnn.eval()

RNN model loaded from: models/rnn.pth


In [11]:
rnntools = RNNTools(device)

In [30]:
# initialize a dataframe where we will save our results for better display formatting
results_df = pd.DataFrame(index=range(20), columns=['batch_i', 'Source', 'Actual Translation', 'RNN Translation', 'Transformer Translation'])

### Results

The table below shows the RNN translations of randomly sampled sentences from the test dataset.

In [13]:
i = 0
for src_seqs, src_seq_lengths, tgt_seqs in testloader:
    if i >= 20:
        break

    out_seqs = rnntools.translate(rnn, src_seqs, src_seq_lengths)

    # pick one random sentence from this batch (a batch may hold fewer than 64 sentences)
    r = random.randrange(src_seqs.shape[1])
    results_df.loc[i, 'batch_i'] = r
    results_df.loc[i, 'Source'] = rnntools.seq_to_string(src_seqs[:,r], testset.input_lang)
    results_df.loc[i, 'Actual Translation'] = rnntools.seq_to_string(tgt_seqs[:,r], testset.output_lang)
    results_df.loc[i, 'RNN Translation'] = rnntools.seq_to_string(out_seqs[:,r], testset.output_lang)

    i += 1

display(Markdown(results_df[['Source', 'Actual Translation', 'RNN Translation']].to_markdown()))

Test data:


|    | Source                                      | Actual Translation                | RNN Translation                   |
|---:|:--------------------------------------------|:----------------------------------|:----------------------------------|
|  0 | je suis en train de boire une biere .       | i m drinking a beer .             | i am drinking a letter .          |
|  1 | elles cherchent un bouc emissaire .         | they re looking for a scapegoat . | they re looking for a scapegoat . |
|  2 | ils ne constituent pas une menace .         | they re not a threat .            | they re not a bad good . .        |
|  3 | tu mens n est ce pas ?                      | you re lying aren t you ?         | you re staying aren t you ?       |
|  4 | je suis heureux de vous avoir invitee .     | i m glad i invited you .          | i m glad i invited you .          |
|  5 | il connait le maire .                       | he is acquainted with the mayor . | he s open a chinese .             |
|  6 | je suis interesse .                         | i m interested .                  | i m not .                         |
|  7 | nous sommes amoureux .                      | we re in love .                   | we re in .                        |
|  8 | elles sont chretiennes .                    | they are christians .             | they are christians .             |
|  9 | je crains que tu m aies mal compris .       | i m afraid you misunderstood me . | i m afraid that will be happy .   |
| 10 | on a vraiment besoin d eau .                | we are badly in want of water .   | we re really proud of this .      |
| 11 | je suis toujours heureux .                  | i m always happy .                | i m always happy .                |
| 12 | tu es tout ce que j ai .                    | you re all i ve got .             | you re all i ve got .             |
| 13 | je mange un sandwich .                      | i m eating a sandwich .           | i am eating a sandwich .          |
| 14 | il est trop sensible .                      | he is too sensitive .             | he s too drunk .                  |
| 15 | vous etes prevenant .                       | you re considerate .              | you re considerate .              |
| 16 | j y vais .                                  | i m going .                       | i m going going .                 |
| 17 | on n est jamais trop vieux pour apprendre . | you re never too old to learn .   | he s too old to learn too old .   |
| 18 | elle prepare le dejeuner .                  | she is making dinner .            | she s missed to the . .           |
| 19 | nous sommes tous en train de diner .        | we re all having lunch .          | we re all in . .                  |

The BLEU score measures the similarity between machine-generated translations and reference translations. It is reported here on a scale from 0 to 100, where higher is better. The next cell computes it for the RNN model on the training and test data.
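For readers unfamiliar with the metric, here is a minimal sketch of corpus-level BLEU with clipped n-gram precisions up to 4-grams and a brevity penalty. It is not necessarily what ```compute_bleu_score``` does internally; the function names and the single-reference setup are assumptions for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU in [0, 1] with one reference per candidate."""
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
            # clipping: an n-gram is credited at most as often as it occurs in the reference
            matches[n - 1] += sum((cand_counts & ref_counts).values())
            totals[n - 1] += sum(cand_counts.values())
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    # geometric mean of the n-gram precisions
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # penalize candidates that are shorter than their references
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


references = ["i m always happy .".split(), "we re in love .".split()]
candidates = ["i m always happy .".split(), "we re in .".split()]
score = corpus_bleu(candidates, references)
print(f"BLEU: {score * 100:.1f}")
```

Because the n-gram counts are pooled over the whole corpus before the precisions are computed, a single short sentence without a matching 4-gram does not force the overall score to zero.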

In [14]:
score = rnntools.compute_bleu_score(rnn, trainloader, trainset.output_lang)
print(f'BLEU score on training data: {score*100}')
score = rnntools.compute_bleu_score(rnn, testloader, trainset.output_lang)
print(f'BLEU score on test data: {score*100}')

BLEU score on training data: 96.69817090034485
BLEU score on test data: 47.73730933666229


## Transformers

In [15]:
# skip_training can be set differently for the Transformers model than for the RNN
# skip_training = True

As with the RNN, we first wrap the datasets in data loaders that batch the data with a collate function.

The Transformers' collate function takes a batch of sequences of varying lengths, pads them to the maximum length in the batch, and creates attention masks that mark the padded positions. It also builds the batch of target sequences with a start-of-sequence token at the beginning of each one, so that the decoder input is the target shifted by one time step.
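The sketch below illustrates these steps. It is not the actual ```TransformersTools.collate```: the function name, the token indices, and the mask layout are assumptions. The mask is laid out as ```(batch_size, max_src_len)``` with ```True``` at padded positions, which is the convention ```nn.MultiheadAttention``` uses for ```key_padding_mask```.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PADDING_VALUE = 0  # assumed padding index
SOS_TOKEN = 1      # assumed start-of-sequence index


def transformer_collate_sketch(pairs):
    """pairs: list of (src_tensor, tgt_tensor), each a 1-D LongTensor of word indices."""
    src_list = [src for src, _ in pairs]
    # prepend the start-of-sequence token to every target sequence
    tgt_list = [torch.cat([torch.tensor([SOS_TOKEN]), tgt]) for _, tgt in pairs]
    src_seqs = pad_sequence(src_list, padding_value=PADDING_VALUE)  # (max_src_len, batch)
    tgt_seqs = pad_sequence(tgt_list, padding_value=PADDING_VALUE)  # (max_tgt_len + 1, batch)
    # True marks padded source positions; transposed to (batch, max_src_len)
    src_mask = (src_seqs == PADDING_VALUE).T
    return src_seqs, src_mask, tgt_seqs


batch = [
    (torch.tensor([5, 6, 2]), torch.tensor([7, 8, 9, 2])),
    (torch.tensor([3, 4, 5, 6, 2]), torch.tensor([7, 2])),
]
src_seqs, src_mask, tgt_seqs = transformer_collate_sketch(batch)
print(src_mask)
```

With this layout, ```tgt_seqs[:-1]``` can serve as the decoder input and ```tgt_seqs[1:]``` as the labels. The training cell further down computes its loss against ```tgt_seqs[1:]```.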

In [16]:
trainloader = DataLoader(dataset=trainset, batch_size=64, shuffle=True, collate_fn=TransformersTools.collate, pin_memory=True)
testloader = DataLoader(dataset=testset, batch_size=64, shuffle=False, collate_fn=TransformersTools.collate)

### Encoder-Decoder architecture using Transformers:

In [17]:
tra = Transformers(trainset.input_lang.n_words, trainset.output_lang.n_words, n_blocks=3, n_features=256, n_heads=16, n_hidden=1024)
tra.to(device)

Transformers(
  (encoder): Encoder(
    (embedding): Embedding(4489, 256, padding_idx=0)
    (positional_encoding): PositionalEncoding(
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder_blocks): ModuleList(
      (0-2): 3 x EncoderBlock(
        (self_attention): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
        )
        (dropout1): Dropout(p=0.1, inplace=False)
        (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        (mlp): Sequential(
          (0): Linear(in_features=256, out_features=1024, bias=True)
          (1): Dropout(p=0.1, inplace=False)
          (2): ReLU()
          (3): Linear(in_features=1024, out_features=256, bias=True)
        )
        (dropout2): Dropout(p=0.1, inplace=False)
        (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (decoder): Decoder(
    (embedding): Embedding(2925, 256, padding_idx=0)
 

### Training

The Transformers model is also trained with the Adam optimizer and NLLLoss.
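To make the role of ```ignore_index``` concrete, the following pure-Python sketch computes the quantity that ```nn.NLLLoss(ignore_index=0)``` returns with its default mean reduction: the average negative log-probability of the correct word, skipping padded positions. The ```masked_nll``` function and the toy numbers are illustrative assumptions.

```python
import math


def masked_nll(log_probs, targets, ignore_index=0):
    """Mean negative log-likelihood over positions whose target is not ignore_index.

    log_probs: one list of per-class log-probabilities for each position.
    targets:   one class index for each position.
    """
    losses = [-lp[t] for lp, t in zip(log_probs, targets) if t != ignore_index]
    return sum(losses) / len(losses)


log_probs = [
    [math.log(0.1), math.log(0.7), math.log(0.2)],  # target 1 contributes -log(0.7)
    [math.log(0.2), math.log(0.3), math.log(0.5)],  # target 2 contributes -log(0.5)
    [math.log(0.6), math.log(0.3), math.log(0.1)],  # target 0 is padding and is skipped
]
loss = masked_nll(log_probs, targets=[1, 2, 0])
print(f"{loss:.4f}")
```

Since the padded positions are excluded from both the sum and the count, the amount of padding in a batch does not dilute the loss.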

In [18]:
if not skip_training:
    PADDING_VALUE = 0
    num_epochs = 2

    optimizer = torch.optim.Adam(tra.parameters(), lr=0.001)
    criterion = nn.NLLLoss(ignore_index=PADDING_VALUE)

    tra.train()  # make sure dropout is active during training

    for epoch in range(num_epochs):
        total_loss = 0
        total_data = 0
        for src_seqs, src_mask, tgt_seqs in trainloader:
            src_seqs, src_mask, tgt_seqs = src_seqs.to(device), src_mask.to(device), tgt_seqs.to(device)
            
            # forward
            outputs = tra(src_seqs, tgt_seqs, src_mask)
            
            # compute loss metric
            loss = criterion(outputs.permute(0, 2, 1).to(device), tgt_seqs[1:])
            total_loss += (loss.item() * src_seqs.shape[1])
            total_data += src_seqs.shape[1]

            # backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print("epoch: {0} training loss: {1:.3f}".format(epoch, total_loss/total_data))


In [19]:
if not skip_training:
    torch.save(tra.state_dict(), tra_model_save_path)

In [20]:
if skip_training:
    tra.load_state_dict(torch.load(tra_model_save_path, map_location=lambda storage, loc: storage))
    print('Transformers model loaded from: {}'.format(tra_model_save_path))
    tra.to(device)
    tra.eval()

Transformers model loaded from: models/transformers.pth


In [21]:
tratools = TransformersTools(device)

### Results

The table below shows the Transformers translations of the same test sentences as above.

In [22]:
i = 0
for src_seqs, src_mask, tgt_seqs in testloader:
    if i >= 20:
        break

    out_seqs = tratools.translate(tra, src_seqs, src_mask)

    r = results_df.loc[i, 'batch_i']
    results_df.loc[i, 'Transformer Translation'] = tratools.seq_to_string(out_seqs[:,r], testset.output_lang)

    i += 1

display(Markdown(results_df[['Source', 'Actual Translation', 'Transformer Translation']].to_markdown()))

Test data:


|    | Source                                      | Actual Translation                | Transformer Translation           |
|---:|:--------------------------------------------|:----------------------------------|:----------------------------------|
|  0 | je suis en train de boire une biere .       | i m drinking a beer .             | i am drinking a beer .            |
|  1 | elles cherchent un bouc emissaire .         | they re looking for a scapegoat . | they re looking for a scapegoat . |
|  2 | ils ne constituent pas une menace .         | they re not a threat .            | they re not watching .            |
|  3 | tu mens n est ce pas ?                      | you re lying aren t you ?         | you re lying aren t you ?         |
|  4 | je suis heureux de vous avoir invitee .     | i m glad i invited you .          | i m glad i invited you .          |
|  5 | il connait le maire .                       | he is acquainted with the mayor . | he s stalling for tea .           |
|  6 | je suis interesse .                         | i m interested .                  | i m interested .                  |
|  7 | nous sommes amoureux .                      | we re in love .                   | we re biased .                    |
|  8 | elles sont chretiennes .                    | they are christians .             | they are christians .             |
|  9 | je crains que tu m aies mal compris .       | i m afraid you misunderstood me . | i m afraid you will get may .     |
| 10 | on a vraiment besoin d eau .                | we are badly in want of water .   | we re truly need .                |
| 11 | je suis toujours heureux .                  | i m always happy .                | i m always happy .                |
| 12 | tu es tout ce que j ai .                    | you re all i ve got .             | you re all i ve got .             |
| 13 | je mange un sandwich .                      | i m eating a sandwich .           | i m eating a sandwich .           |
| 14 | il est trop sensible .                      | he is too sensitive .             | he s too sensitive .              |
| 15 | vous etes prevenant .                       | you re considerate .              | you re considerate .              |
| 16 | j y vais .                                  | i m going .                       | i m going there .                 |
| 17 | on n est jamais trop vieux pour apprendre . | you re never too old to learn .   | you are too old to learn .        |
| 18 | elle prepare le dejeuner .                  | she is making dinner .            | she is making dinner .            |
| 19 | nous sommes tous en train de diner .        | we re all having lunch .          | we re all growing .               |

Here is the BLEU score for the Transformers translation. On the test data it is higher than the RNN's (58.8 vs. 47.7), while on the training data the RNN scores higher (96.7 vs. 92.8).

In [23]:
score = tratools.compute_bleu_score(tra, trainloader, trainset.output_lang)
print(f'BLEU score on training data: {score*100}')
score = tratools.compute_bleu_score(tra, testloader, trainset.output_lang)
print(f'BLEU score on test data: {score*100}')

BLEU score on training data: 92.83134043323167
BLEU score on test data: 58.79185315508608


## RNN vs. Transformers: Combined Results

Here are the same sentences shown side by side for easier comparison. In this sample, the Transformers output matches the reference translation exactly for 10 of the 20 sentences, compared with 6 for the RNN, which is consistent with the BLEU scores on the test data.

In [24]:
display(Markdown(results_df[['Source', 'Actual Translation', 'RNN Translation', 'Transformer Translation']].to_markdown()))

|    | Source                                      | Actual Translation                | RNN Translation                   | Transformer Translation           |
|---:|:--------------------------------------------|:----------------------------------|:----------------------------------|:----------------------------------|
|  0 | je suis en train de boire une biere .       | i m drinking a beer .             | i am drinking a letter .          | i am drinking a beer .            |
|  1 | elles cherchent un bouc emissaire .         | they re looking for a scapegoat . | they re looking for a scapegoat . | they re looking for a scapegoat . |
|  2 | ils ne constituent pas une menace .         | they re not a threat .            | they re not a bad good . .        | they re not watching .            |
|  3 | tu mens n est ce pas ?                      | you re lying aren t you ?         | you re staying aren t you ?       | you re lying aren t you ?         |
|  4 | je suis heureux de vous avoir invitee .     | i m glad i invited you .          | i m glad i invited you .          | i m glad i invited you .          |
|  5 | il connait le maire .                       | he is acquainted with the mayor . | he s open a chinese .             | he s stalling for tea .           |
|  6 | je suis interesse .                         | i m interested .                  | i m not .                         | i m interested .                  |
|  7 | nous sommes amoureux .                      | we re in love .                   | we re in .                        | we re biased .                    |
|  8 | elles sont chretiennes .                    | they are christians .             | they are christians .             | they are christians .             |
|  9 | je crains que tu m aies mal compris .       | i m afraid you misunderstood me . | i m afraid that will be happy .   | i m afraid you will get may .     |
| 10 | on a vraiment besoin d eau .                | we are badly in want of water .   | we re really proud of this .      | we re truly need .                |
| 11 | je suis toujours heureux .                  | i m always happy .                | i m always happy .                | i m always happy .                |
| 12 | tu es tout ce que j ai .                    | you re all i ve got .             | you re all i ve got .             | you re all i ve got .             |
| 13 | je mange un sandwich .                      | i m eating a sandwich .           | i am eating a sandwich .          | i m eating a sandwich .           |
| 14 | il est trop sensible .                      | he is too sensitive .             | he s too drunk .                  | he s too sensitive .              |
| 15 | vous etes prevenant .                       | you re considerate .              | you re considerate .              | you re considerate .              |
| 16 | j y vais .                                  | i m going .                       | i m going going .                 | i m going there .                 |
| 17 | on n est jamais trop vieux pour apprendre . | you re never too old to learn .   | he s too old to learn too old .   | you are too old to learn .        |
| 18 | elle prepare le dejeuner .                  | she is making dinner .            | she s missed to the . .           | she is making dinner .            |
| 19 | nous sommes tous en train de diner .        | we re all having lunch .          | we re all in . .                  | we re all growing .               |
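One simple way to quantify the side-by-side comparison is to count exact matches with the reference translation. The sketch below does this for the 20 rows of the table above. The ```exact_matches``` helper is an illustrative assumption, and exact match is a much stricter criterion than BLEU.

```python
# (reference, RNN output, Transformers output), copied from the table above
rows = [
    ("i m drinking a beer .", "i am drinking a letter .", "i am drinking a beer ."),
    ("they re looking for a scapegoat .", "they re looking for a scapegoat .", "they re looking for a scapegoat ."),
    ("they re not a threat .", "they re not a bad good . .", "they re not watching ."),
    ("you re lying aren t you ?", "you re staying aren t you ?", "you re lying aren t you ?"),
    ("i m glad i invited you .", "i m glad i invited you .", "i m glad i invited you ."),
    ("he is acquainted with the mayor .", "he s open a chinese .", "he s stalling for tea ."),
    ("i m interested .", "i m not .", "i m interested ."),
    ("we re in love .", "we re in .", "we re biased ."),
    ("they are christians .", "they are christians .", "they are christians ."),
    ("i m afraid you misunderstood me .", "i m afraid that will be happy .", "i m afraid you will get may ."),
    ("we are badly in want of water .", "we re really proud of this .", "we re truly need ."),
    ("i m always happy .", "i m always happy .", "i m always happy ."),
    ("you re all i ve got .", "you re all i ve got .", "you re all i ve got ."),
    ("i m eating a sandwich .", "i am eating a sandwich .", "i m eating a sandwich ."),
    ("he is too sensitive .", "he s too drunk .", "he s too sensitive ."),
    ("you re considerate .", "you re considerate .", "you re considerate ."),
    ("i m going .", "i m going going .", "i m going there ."),
    ("you re never too old to learn .", "he s too old to learn too old .", "you are too old to learn ."),
    ("she is making dinner .", "she s missed to the . .", "she is making dinner ."),
    ("we re all having lunch .", "we re all in . .", "we re all growing ."),
]


def exact_matches(rows, column):
    """Count rows whose translation in the given column equals the reference (column 0)."""
    return sum(1 for row in rows if row[column] == row[0])


rnn_exact = exact_matches(rows, 1)
tra_exact = exact_matches(rows, 2)
print(f"RNN: {rnn_exact}/{len(rows)}, Transformers: {tra_exact}/{len(rows)}")
```

Near misses such as "i am eating a sandwich ." versus "i m eating a sandwich ." count as failures under this criterion, which is one reason an n-gram based metric like BLEU is the more informative measure.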