# HW5 Coding - Machine Translation

In this assignment, we will create a very simple Seq2Seq (encoder-decoder) model for machine translation. A small dataset has been constructed for this assignment for translating short sentences from French to English; it can be found in the `data` folder.

Before you start working on this homework, make sure to install the necessary dependencies with `pip install -r requirements.txt` (preferably in a virtual environment running Python 3.7).

In [1]:
!pip install -r requirements.txt



In [2]:
# Imports
import string
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
from torchtext.data.metrics import bleu_score
import pandas as pd
import time
import math
import numpy as np

# Plotting
# for colab
# %matplotlib inline 
# for local notebook
%matplotlib notebook 
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
from dataset import split_train_val, initLang, create_data_loader, SOS_token, EOS_token, UNK_token
from model import *

## Read Data

Make sure the `data/fra-eng.csv` file exists, then read the French-English translation data. The dataset has already been cleaned and filtered to exclude non-letter characters.

In [5]:
data_df = pd.read_csv('data/fra-eng.csv')

Look at the data

In [6]:
print(data_df.shape)
data_df.sample(5)

(10599, 2)


| | fra | eng |
|---|---|---|
| 886 | il est ton fils . | he s your son . |
| 6572 | je n y suis pas tres bon . | i m not very good at it . |
| 604 | je n en ai pas termine . | i m not done . |
| 10339 | elle se plaint toujours de son travail . | she is always complaining of her job . |
| 1491 | elle est adorable . | she s adorable . |


### Implement

We will now create train / val splits. Please go to `dataset.py` and finish the implementation of the function `split_train_val`.
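For reference, here is a minimal sketch of one way such a split could work, assuming a reproducible shuffle followed by a cut (the name `split_train_val_sketch` and the `seed` parameter are hypothetical; the real `split_train_val` in `dataset.py` may shuffle differently). Truncating `len(df) * props[0]` to an integer reproduces the sizes in the assertions of the next cell: 10599 * 0.8 = 8479.2, giving 8479 training rows and 2120 validation rows.

```python
import pandas as pd


def split_train_val_sketch(df, props=(0.8, 0.2), seed=0):
    """Shuffle the rows reproducibly, then cut them into train / val parts."""
    if abs(sum(props) - 1.0) > 1e-9:
        raise ValueError("props must sum to 1")
    # sample(frac=1.0) returns every row, in a random (seeded) order
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = int(len(shuffled) * props[0])  # truncates: 10599 * 0.8 -> 8479
    train_df = shuffled.iloc[:n_train].reset_index(drop=True)
    val_df = shuffled.iloc[n_train:].reset_index(drop=True)
    return train_df, val_df
```

Fixing the seed keeps the split identical across notebook restarts, so validation scores stay comparable between runs.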

In [7]:
train_df, val_df = split_train_val(data_df, props=[0.8, 0.2])
# view train val size
print(train_df.shape, val_df.shape)
# verify implementation
assert train_df.shape[0] == 8479, "split is not implemented properly"
assert val_df.shape[0] == 2120, "split is not implemented properly"

(8479, 2) (2120, 2)


Initialize the input and target language vocabularies from `train_df` using the `initLang` function in `dataset.py`. This is already implemented for you.
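`initLang` is provided, but for orientation here is a sketch of the kind of vocabulary object it presumably builds, modelled on the `Lang` class from the PyTorch seq2seq tutorial. The special-token indices (SOS = 0, EOS = 1, UNK = 2), the class name `LangSketch` and the helper `getIndicesFromSentence` are assumptions; only `n_words` and `getWordsFromIndices` are names this notebook actually uses. Because the vocabulary is built from `train_df` only, validation words never seen in training would map to UNK.

```python
SOS_token, EOS_token, UNK_token = 0, 1, 2  # assumed special-token indices


class LangSketch:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {SOS_token: "SOS", EOS_token: "EOS", UNK_token: "UNK"}
        self.n_words = 3  # the three special tokens

    def addSentence(self, sentence):
        for word in sentence.split(" "):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.index2word[self.n_words] = word
            self.word2count[word] = 1
            self.n_words += 1
        else:
            self.word2count[word] += 1

    def getIndicesFromSentence(self, sentence):
        # hypothetical helper: words -> indices, UNK for unseen words, EOS appended
        return [self.word2index.get(w, UNK_token) for w in sentence.split(" ")] + [EOS_token]

    def getWordsFromIndices(self, indices):
        # int() also accepts 0-d tensors, e.g. from tensor.tolist() or argmax
        return [self.index2word[int(i)] for i in indices]
```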

In [8]:
[input_language_name, target_language_name] = train_df.columns.to_list()
print(f"Input language\t: {input_language_name}\nTarget language\t: {target_language_name}")

Input language	: fra
Target language	: eng


In [9]:
input_lang = initLang(name=input_language_name, df=train_df)
target_lang = initLang(name=target_language_name, df=train_df)

Counting words for lang=fra ...
fra total word types: 3525
Counting words for lang=eng ...
eng total word types: 2183


### Implement

Create dataloaders for the train and validation data using the `input_lang` and `target_lang` variables defined above. Please go to `dataset.py` and finish the implementation of `MT_Dataset.__getitem__()`.
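A minimal sketch of what `__getitem__` needs to return, judging from the sample batch printed in the next cells: a dict with `input_tensor` and `target_tensor`, each a 1-D index tensor ending in EOS. The class `MTDatasetSketch`, the plain word-to-index dicts, the special-token values and `batch_size=1` are assumptions; the real `MT_Dataset` presumably looks words up through `input_lang` / `target_lang` instead.

```python
import torch
from torch.utils.data import DataLoader, Dataset

EOS_token, UNK_token = 1, 2  # assumed special-token indices


class MTDatasetSketch(Dataset):
    """Turns (source, target) sentence pairs into index tensors ending in EOS."""

    def __init__(self, pairs, input_w2i, target_w2i):
        self.pairs = pairs            # list of (source_sentence, target_sentence)
        self.input_w2i = input_w2i    # word -> index for the source language
        self.target_w2i = target_w2i  # word -> index for the target language

    def __len__(self):
        return len(self.pairs)

    @staticmethod
    def _to_tensor(sentence, w2i):
        # unknown words map to UNK; every sentence is terminated with EOS
        ids = [w2i.get(word, UNK_token) for word in sentence.split(" ")]
        ids.append(EOS_token)
        return torch.tensor(ids, dtype=torch.long)

    def __getitem__(self, idx):
        src, tgt = self.pairs[idx]
        return {"input_tensor": self._to_tensor(src, self.input_w2i),
                "target_tensor": self._to_tensor(tgt, self.target_w2i)}
```

With `batch_size=1`, PyTorch's default collate function stacks each field into shape `(1, seq_len)`, which is why the training loop calls `.squeeze()` before stepping through the tokens.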

In [10]:
train_dataloader = create_data_loader(train_df, 
                                      input_lang=input_lang, 
                                      target_lang=target_lang,
                                      shuffle=True)
val_dataloader = create_data_loader(val_df, 
                                    input_lang=input_lang, 
                                    target_lang=target_lang, 
                                    shuffle=True)

In [11]:
# view sample from dataloader
next(iter(train_dataloader))

{'input_tensor': tensor([[  7,  12, 116, 118,   6,   1]]),
 'target_tensor': tensor([[ 3,  4, 75,  5,  1]])}

## Model Seq2Seq

We will create a simple encoder-decoder machine translation model. Both the encoder and the decoder are RNNs; for this assignment, we will use gated recurrent units (GRUs) for both. In the following section, you will implement the `EncoderRNN` and `DecoderRNN` models. The `EncoderRNN` processes the tokens of the input sentence one by one, and its last hidden state is used as the context for the decoder. Ideally, this *context* encodes the "meaning" of the input sequence so that the decoder can efficiently produce the translated target sequence.

The `DecoderRNN`'s first input is an \<SOS\> token, together with the context from the encoder. At test time, we feed the decoder's output back in as the next input until it generates an \<EOS\> token or we reach some MAX_LEN. During training, we can randomly switch between feeding in the decoder's own outputs as inputs and *teacher forcing*, where we feed in the target tokens (true labels) regardless of what the decoder outputs at each time step.
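The difference between teacher forcing and free-running decoding can be seen without any neural network. In this toy sketch, `toy_step` stands in for one decoder call (a hypothetical lookup table, not a trained model), and the helper `run_decoder` and all token values are made up for illustration:

```python
SOS, EOS = 0, 1  # hypothetical special-token indices


def run_decoder(step, hidden, targets, teacher_forcing, max_len=10):
    """Return (tokens fed to the decoder, tokens it predicted)."""
    fed, predicted = [], []
    token = SOS
    # teacher forcing walks the reference; free-running stops at EOS or max_len
    n_steps = len(targets) if teacher_forcing else max_len
    for t in range(n_steps):
        fed.append(token)
        pred, hidden = step(token, hidden)
        predicted.append(pred)
        if teacher_forcing:
            token = targets[t]  # feed the ground truth, whatever was predicted
        else:
            token = pred        # feed the model's own guess back in
            if pred == EOS:
                break
    return fed, predicted


# toy "decoder": the next token is a fixed function of the previous one
TABLE = {SOS: 5, 5: 6, 6: EOS}


def toy_step(token, hidden):
    return TABLE.get(token, EOS), hidden


targets = [5, 7, EOS]
print(run_decoder(toy_step, None, targets, teacher_forcing=True))   # ([0, 5, 7], [5, 6, 1])
print(run_decoder(toy_step, None, targets, teacher_forcing=False))  # ([0, 5, 6], [5, 6, 1])
```

At the third step the teacher-forced run feeds the reference token 7 even though the model predicted 6, while the free-running run feeds its own 6. In the training code, `random.random() < teacher_forcing_ratio` picks one of the two modes per sentence.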

**Relevant work**:
- [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/pdf/1409.3215.pdf)
- [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]

### EncoderRNN and DecoderRNN

Please **implement** the `__init__` and `forward` methods of `EncoderRNN` and `DecoderRNN` in `model.py`.
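One possible implementation, following the design of the PyTorch seq2seq tutorial: an embedding followed by a single-layer GRU, with the decoder adding a ReLU, a linear layer and `log_softmax` so its output suits `nn.NLLLoss`. The constructor arguments and return values are inferred from how this notebook calls the classes (`EncoderRNN(input_lang.n_words, hidden_size)`, `encoder.initHidden()`, `DecoderRNN(hidden_size, target_lang.n_words)`); treat the internals as a sketch, not the reference solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        # one token index -> embedding shaped (seq_len=1, batch=1, hidden_size)
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

    def initHidden(self):
        # zero initial state on the same device as the model's weights
        return torch.zeros(1, 1, self.hidden_size, device=self.embedding.weight.device)


class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden):
        output = F.relu(self.embedding(input).view(1, 1, -1))
        output, hidden = self.gru(output, hidden)
        # (1, output_size) log-probabilities over the target vocabulary
        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden
```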

### Training the Model

Please **implement** the following `train` function, which processes one sample at a time. The training loop that calls it is implemented for you after this cell. The template initializes the relevant variables. You will implement a for loop that steps the encoder through `input_tensor`, and a decoder for loop that steps through `target_tensor` (with **and** without teacher forcing). The decoder loop is also where you calculate the loss using the criterion.

If you are unsure what arguments are being passed to the `train` function, please look at `trainEvalIters` (and `evalIters`, which calls `evaluate`) in the cells below.
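Assuming the decoder ends in a `log_softmax` layer, `nn.NLLLoss` simply picks out the negative log-probability of the reference token at each step, which is the same as `nn.CrossEntropyLoss` applied to the raw scores. A quick check with made-up numbers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 6)       # raw scores for one decoder step, vocabulary of 6
target = torch.tensor([3])       # reference token index, shape (batch=1,)

log_probs = F.log_softmax(logits, dim=1)   # what the decoder is assumed to return
nll = nn.NLLLoss()(log_probs, target)
ce = nn.CrossEntropyLoss()(logits, target)
by_hand = -log_probs[0, 3]

print(nll.item(), ce.item(), by_hand.item())  # three equal values
```

Summing this per-step loss over the decoder loop gives the sentence loss that `train` backpropagates; `train` then reports it divided by `target_length`.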

In [12]:
teacher_forcing_ratio = 0.5

def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion):
    # initialize encoder hidden state
    encoder_hidden = encoder.initHidden()

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)
    loss = 0
    
    # Process input_tensor one token at a time with a for loop
    ## YOUR CODE STARTS HERE (~2 lines of code) ##
    for ei in range(input_length):
        # call encoder with input_tensor[ei] and the encoder's hidden vector;
        #     only the final hidden state is needed as the decoder's context
        _, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)

    ## YOUR CODE ENDS HERE ##

    # Start decoder's input with <SOS> token
    decoder_input = torch.tensor([[SOS_token]], device=device)

    # initialize decoder hidden state as encoder's final hidden state
    decoder_hidden = encoder_hidden

    # decide whether to use teacher forcing 
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    if use_teacher_forcing:
        # Teacher forcing: Feed the target as the next input
        ## YOUR CODE STARTS HERE (~4 lines of code) ##
        # for loop header to loop through target_tensor
        for di in range(target_length):
            # call decoder with decoder_input and decoder_hidden. make sure to reset
            #     decoder_hidden with the new hidden state returned
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)

            # calculate loss using decoder's output and corresponding target_tensor[di]
            #     the criterion used is a negative log likelihood loss (NLLLoss)
            loss += criterion(decoder_output, target_tensor[di])

            # (Teacher forcing) set next decoder input as target_tensor[di]
            decoder_input = target_tensor[di]

        ## YOUR CODE ENDS HERE ##

    else:
        # Without teacher forcing: use its own predictions as the next input
        ## YOUR CODE STARTS HERE (~5-7 lines of code) ##
        # for loop header to loop through target_tensor
        for di in range(target_length):
            # call decoder same as above
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
            # calculate loss same as above
            loss += criterion(decoder_output, target_tensor[di])
            # set next decoder input as argmax of decoder's output
            #     (detached: it is a token index, not part of the graph)
            decoder_input = decoder_output.argmax().detach()
            # if new decoder_input is EOS_token: break
            if decoder_input.item() == EOS_token:
                break
        ## YOUR CODE ENDS HERE ##

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length

Next, please **implement** the `evaluate` function in the next cell. The process will be the same as `train`, except you will not use teacher forcing when stepping through the decoder. 

In [13]:
# evaluate
def evaluate(input_tensor, target_tensor, encoder, decoder, criterion, max_length=MAX_LENGTH):
    with torch.no_grad():
        input_length = input_tensor.size()[0]
        target_length = target_tensor.size()[0]
        encoder_hidden = encoder.initHidden()
        loss = 0
        # Process input_tensor one token at a time with a for loop
        ## YOUR CODE STARTS HERE (~2 lines of code) ##
        for ei in range(input_length):
            # call encoder with input_tensor[ei] and the encoder's hidden vector
            _, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
        ## YOUR CODE ENDS HERE ##

        # Start decoder's input with <SOS> token
        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        # initialize decoder hidden state as encoder's final hidden state
        decoder_hidden = encoder_hidden

        decoded_indices = []
        decoded_words = []

        # Run decoder starting with encoder's context (decoder_hidden)
        #     without using teacher forcing: use its own predictions as the next input
        ## YOUR CODE STARTS HERE (~5-7 lines of code) ##
        # for loop header to loop through target_tensor
        for di in range(target_length):
            # call decoder
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
            # calculate loss same as above
            loss += criterion(decoder_output, target_tensor[di])
            # set next decoder input as argmax of decoder's output
            decoder_input = decoder_output.argmax()
            # append outputted index (as a Python int) to decoded_indices
            decoded_indices.append(decoder_input.item())
            # if new decoder_input is EOS_token: break
            if decoder_input.item() == EOS_token:
                break

        ## YOUR CODE ENDS HERE ##
        decoded_words = target_lang.getWordsFromIndices(decoded_indices)

        return decoded_words, loss.item() / len(decoded_words)

`trainEvalIters` is implemented for you. Simply review the implementation and run the following cell.

In [14]:
def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))

def createDataFrame(points):
    df = pd.DataFrame(points, columns=['train_loss', 'val_loss', 'bleu_scores_val'])
    return df

def evalIters(encoder, decoder, val_dataloader, criterion, num_samples=100):
    if num_samples > len(val_dataloader):
        num_samples = len(val_dataloader)
    
    candidates = [] # predicted output
    references = [] # true targets
    
    total_loss = 0
    for i, sample in enumerate(val_dataloader):
        if i >= num_samples: break

        input_tensor = sample['input_tensor'].squeeze().to(device)
        target_tensor = sample['target_tensor'].squeeze().to(device)
        
        decoded_out, loss = evaluate(input_tensor, target_tensor, encoder, decoder, criterion)
        candidates.append(decoded_out)
        target_indices = sample['target_tensor'].squeeze().tolist()
        target_words = target_lang.getWordsFromIndices(target_indices)
        references.append([target_words])
        total_loss += loss
    
    return total_loss / num_samples, bleu_score(candidate_corpus=candidates, 
                                                references_corpus=references,
                                                max_n=4)

def trainEvalIters(encoder, decoder, epochs, train_dataloader, val_dataloader, 
                   eval_every=2000, eval_samples=200, learning_rate=0.01):
    start = time.time()
    plot_stats = []
    train_loss_total = 0  # Reset every eval_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)

    criterion = nn.NLLLoss()
    num_samples = len(train_dataloader)
    n_iters = epochs * num_samples
    
    for ep in range(epochs):
        for i, sample in enumerate(train_dataloader):
            input_tensor = sample['input_tensor'].squeeze().to(device)
            target_tensor = sample['target_tensor'].squeeze().to(device)

            loss = train(input_tensor, target_tensor, encoder,
                         decoder, encoder_optimizer, decoder_optimizer, criterion)
            train_loss_total += loss

            # `i` restarts at 0 every epoch, so the first average of a new epoch also
            #     includes the samples left over after the previous epoch's last evaluation
            if (i + 1) % eval_every == 0:
                train_loss_average = train_loss_total / eval_every
                train_loss_total = 0

                # validate model (val_bleu avoids shadowing torchtext's bleu_score)
                eval_loss, val_bleu = evalIters(encoder, decoder, val_dataloader, criterion, num_samples=eval_samples)

                curr_iter = ep * num_samples + i + 1
                print('%s (%d %d%%) Average train loss: %.4f, Average val loss: %.4f ,val BLEU: %.4f' % (timeSince(start, curr_iter / n_iters), curr_iter, curr_iter / n_iters * 100, train_loss_average, eval_loss, val_bleu))

                plot_stats.append([train_loss_average, eval_loss, val_bleu])

    return createDataFrame(plot_stats)
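`timeSince` estimates the remaining time by linear extrapolation: if a fraction p of the `n_iters = epochs * len(train_dataloader)` steps took s seconds, the whole run is assumed to take s / p. A small re-derivation (the names `as_minutes` and `eta` are hypothetical) reproduces the first line of the training log in the next cell, with roughly 80 s elapsed after 2000 of 8 * 8479 = 67832 steps:

```python
import math


def as_minutes(seconds):
    m = math.floor(seconds / 60)
    return "%dm %ds" % (m, seconds - m * 60)


def eta(elapsed, done, total):
    """Return (elapsed, remaining) as strings, assuming a constant time per step."""
    fraction_done = done / total
    estimated_total = elapsed / fraction_done
    return as_minutes(elapsed), as_minutes(estimated_total - elapsed)


print(eta(80.0, 2000, 8 * 8479))  # ('1m 20s', '43m 53s')
```

The constant-speed assumption is also why the attention run's estimate jumps from 49m 23s to 129m 41s remaining when one evaluation interval took much longer than the others.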

In [15]:
hidden_size = 256
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
decoder1 = DecoderRNN(hidden_size, target_lang.n_words).to(device)

# training for 8 epochs on CPU will take about 25-30 mins (and ~15 mins with colab GPU)
epochs = 8

losses_df = trainEvalIters(encoder1, decoder1, epochs, train_dataloader, val_dataloader, eval_every=2000, eval_samples=200, learning_rate=0.01)

1m 20s (- 43m 53s) (2000 2%) Average train loss: 3.0546, Average val loss: 4.5648 ,val BLEU: 0.0000
2m 51s (- 45m 38s) (4000 5%) Average train loss: 2.5687, Average val loss: 4.6553 ,val BLEU: 0.0587
4m 15s (- 43m 55s) (6000 8%) Average train loss: 2.3356, Average val loss: 4.6296 ,val BLEU: 0.0666
5m 38s (- 42m 10s) (8000 11%) Average train loss: 2.2239, Average val loss: 4.2498 ,val BLEU: 0.0920
7m 21s (- 40m 16s) (10479 15%) Average train loss: 2.4639, Average val loss: 4.1702 ,val BLEU: 0.1453
8m 44s (- 38m 48s) (12479 18%) Average train loss: 1.8688, Average val loss: 4.4738 ,val BLEU: 0.1102
10m 7s (- 37m 19s) (14479 21%) Average train loss: 1.7804, Average val loss: 4.3059 ,val BLEU: 0.1266
11m 30s (- 35m 52s) (16479 24%) Average train loss: 1.7793, Average val loss: 4.6640 ,val BLEU: 0.1249
13m 13s (- 34m 5s) (18958 27%) Average train loss: 1.9301, Average val loss: 4.5374 ,val BLEU: 0.1211
14m 36s (- 32m 40s) (20958 30%) Average train loss: 1.4569, Average val loss: 4.3848 ,va
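The validation BLEU in this log comes from torchtext's `bleu_score` with `max_n=4`. The sketch below is a from-scratch version of standard corpus-level BLEU (clipped n-gram precisions for n = 1..4, geometric mean, brevity penalty); torchtext's implementation may differ in small details, so treat it as illustrative. One property worth noting: if no 4-gram of any candidate matches its reference, the geometric mean is zero, which is one plausible reading of the `val BLEU: 0.0000` at the first evaluation. `corpus_bleu_sketch` and `ngram_counts` are hypothetical names.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu_sketch(candidates, references, max_n=4):
    """candidates: list of token lists; references: list of lists of token lists."""
    matched = [0] * max_n   # clipped n-gram matches, per order
    possible = [0] * max_n  # n-grams in the candidates, per order
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # reference length closest to the candidate length (ties -> shorter)
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)  # element-wise max over references
            cand_counts = ngram_counts(cand, n)
            # clip each candidate n-gram count to its count in the references
            matched[n - 1] += sum((cand_counts & max_ref).values())
            possible[n - 1] += max(len(cand) - n + 1, 0)
    if min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / p) for m, p in zip(matched, possible)) / max_n
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)
```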

In [16]:
# Plot losses
losses_df.iloc[:, :-1].plot.line()

[Figure: training and validation loss per evaluation step]

In [17]:
losses_df.iloc[:, -1].plot.line()

[Figure: validation BLEU per evaluation step]

### View translations

Here is a quick function for manually assessing the quality of your translator:

In [18]:
def evaluateRandomly(encoder, decoder, val_dataloader, criterion, n=10):
    for i, sample in enumerate(val_dataloader):
        if i > n: break
        input_tensor = sample['input_tensor'].squeeze()
        target_tensor = sample['target_tensor'].squeeze()
        
        print('>', ' '.join(input_lang.getWordsFromIndices(input_tensor.tolist())))
        print('=', ' '.join(target_lang.getWordsFromIndices(target_tensor.tolist())))
        
        input_tensor = input_tensor.to(device)
        target_tensor = target_tensor.to(device)
        decoded_out, _ = evaluate(input_tensor, target_tensor, encoder, decoder, criterion)
        
        output_sentence = ' '.join(decoded_out)
        print('<', output_sentence)
        print('')

In [19]:
criterion = nn.NLLLoss()
evaluateRandomly(encoder1, decoder1, val_dataloader, criterion)

> il en a marre de mes problemes . EOS
= he is fed up with my UNK . EOS
< he is in in swimming . EOS

> je suis inquiet pour ta sante . EOS
= i am anxious about your health . EOS
< i m so worried about this . EOS

> nous avons des problemes avec notre nouveau voisin . EOS
= we are having trouble with our new neighbor . EOS
< we re going shopping . EOS

> j ai du mal a me concentrer . EOS
= i m having a hard time concentrating . EOS
< i m having trouble . EOS

> il est mon UNK par UNK . EOS
= he is related to me by UNK . EOS
< he s my love by love . EOS

> je passe une UNK pour le UNK . EOS
= i m UNK for the part . EOS
< i m waiting for the job . EOS

> il s interesse aux UNK UNK . EOS
= he s interested in UNK UNK . EOS
< he s always complaining . EOS

> elle est inquiete de ta securite . EOS
= she s worried about your UNK . EOS
< she is used to be . EOS

> c est une femme UNK . EOS
= she is a self UNK woman . EOS
< she s a woman woman . EOS

> j ai hate de UNK ta UNK . EOS
= i m lookin

## Bonus: Add Attention

We can use the same encoder but create a modified version of `DecoderRNN` with attention applied. Implement `AttnDecoderRNN` in `model.py`. 
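A sketch of one common formulation, the attention decoder from the PyTorch seq2seq tutorial: attention weights over a fixed number of `max_length` encoder positions are computed from the current input embedding and the previous hidden state, then used to take a weighted sum of the encoder outputs. This is simpler than Bahdanau-style additive attention, which scores each encoder output directly. The constructor signature is inferred from the call `AttnDecoderRNN(hidden_size, target_lang.n_words, dropout_p=0.1)` later in this notebook, the `max_length` default of 10 mirrors this notebook's `MAX_LENGTH`, and the forward signature from the `(output, hidden, attention)` triple unpacked in the training code; the internals are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_LENGTH = 10  # same value this notebook uses for MAX_LENGTH


class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super().__init__()
        self.hidden_size = hidden_size
        self.max_length = max_length
        self.embedding = nn.Embedding(output_size, hidden_size)
        # scores for the max_length encoder slots, from [embedding; hidden]
        self.attn = nn.Linear(hidden_size * 2, max_length)
        # mixes [embedding; attended context] back down to hidden_size
        self.attn_combine = nn.Linear(hidden_size * 2, hidden_size)
        self.dropout = nn.Dropout(dropout_p)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden, encoder_outputs):
        # input: one token index; hidden: (1, 1, H); encoder_outputs: (max_length, H)
        embedded = self.dropout(self.embedding(input).view(1, 1, -1))
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), dim=1)), dim=1)  # (1, max_length)
        # weighted sum of encoder outputs: (1, 1, L) @ (1, L, H) -> (1, 1, H)
        context = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
        output = self.attn_combine(torch.cat((embedded[0], context[0]), dim=1)).unsqueeze(0)
        output, hidden = self.gru(F.relu(output), hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)  # (1, output_size)
        return output, hidden, attn_weights
```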

### Training the Seq2Seq with Attention


Please **implement** the `train` and `evaluate` functions below. The overall structure is the same as before, with a few differences. During encoder processing, we also need to keep track of all encoder outputs to feed into the decoder later. In the decoder loop, we have to adjust the decoder call to pass in the encoder outputs and to receive the decoder's attention weights.
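The `encoder_outputs` buffer in the training code is a zero tensor with a fixed `max_length` rows, so an input longer than `MAX_LENGTH` tokens (EOS included) would raise an index error, and rows past the input length stay zero. A minimal sketch with an explicit length check, using a bare embedding plus `nn.GRU` as a stand-in encoder (`collect_encoder_outputs` is a hypothetical helper):

```python
import torch
import torch.nn as nn


def collect_encoder_outputs(embedding, gru, input_tensor, max_length):
    """Run the encoder token by token and store each output in a zero-padded buffer."""
    input_length = input_tensor.size(0)
    if input_length > max_length:
        raise ValueError(f"{input_length} tokens do not fit in max_length={max_length}")
    hidden_size = gru.hidden_size
    encoder_outputs = torch.zeros(max_length, hidden_size)
    hidden = torch.zeros(1, 1, hidden_size)
    for ei in range(input_length):
        embedded = embedding(input_tensor[ei]).view(1, 1, -1)
        output, hidden = gru(embedded, hidden)
        encoder_outputs[ei] = output[0, 0]  # rows >= input_length stay zero
    return encoder_outputs, hidden
```

For a single-layer GRU, the last stored output equals the final hidden state, which is what the decoder is initialized with.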

In [20]:
teacher_forcing_ratio = 0.5
MAX_LENGTH = 10

def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length=MAX_LENGTH):
    # initialize encoder hidden state
    encoder_hidden = encoder.initHidden()

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)
    loss = 0
    
    # this time we need to keep track of encoder outputs
    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)
    
    # Process input input_tensor (one token at a time with a for loop)
    ## YOUR CODE STARTS HERE (~3 lines of code) ##
    # for loop header
    for ei in range(input_length):
        # call encoder with input_tensor[ei] and encoder's hidden vector
        encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
        # save encoder output in encoder_outputs
        encoder_outputs[ei] = encoder_output[0, 0]
    ## YOUR CODE ENDS HERE ##

    # Start decoder's input with <SOS> token
    decoder_input = torch.tensor([[SOS_token]], device=device)

    # initialize decoder hidden state as encoder's final hidden state
    decoder_hidden = encoder_hidden

    # decide whether to use teacher forcing 
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    if use_teacher_forcing:
        # Teacher forcing: Feed the target as the next input
        ## YOUR CODE STARTS HERE (~4 lines of code) ##
        # for loop header to loop through target_tensor
        for di in range(target_length):
            # call decoder with decoder_input, decoder_hidden and encoder_outputs. 
            #     make sure to reset decoder_hidden with the new hidden state returned
            decoder_output, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_hidden, encoder_outputs)

            # calculate loss using decoder's output and corresponding target_tensor[di]
            #     the criterion used is a negative log likelihood loss (NLLLoss)
            loss += criterion(decoder_output, target_tensor[di])

            # (Teacher forcing) set next decoder input as target_tensor[di]
            decoder_input = target_tensor[di] 

        ## YOUR CODE ENDS HERE ##

    else:
        # Without teacher forcing: use its own predictions as the next input
        ## YOUR CODE STARTS HERE (~5-7 lines of code) ##
        # for loop header to loop through target_tensor
        for di in range(target_length):
            # call decoder same as above
            decoder_output, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_hidden, encoder_outputs)

            # calculate loss same as above
            loss += criterion(decoder_output, target_tensor[di])

            # set next decoder input as argmax of decoder's output
            #     (detached: it is a token index, not part of the graph)
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()

            # if new decoder_input is EOS_token: break
            if decoder_input.item() == EOS_token:
                break

        ## YOUR CODE ENDS HERE ##

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length

Next, please **implement** the `evaluate` function in the next cell. The process is the same as `train`, except that you do not use teacher forcing when stepping through the decoder, and you also record each step's attention weights in `decoder_attentions`.

In [21]:
# evaluate
def evaluate(input_tensor, target_tensor, encoder, decoder, criterion, max_length=MAX_LENGTH):
    with torch.no_grad():
        input_length = input_tensor.size()[0]
        target_length = target_tensor.size()[0]
        encoder_hidden = encoder.initHidden()
        loss = 0
        
        # this time we need to keep track of encoder outputs
        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)
        
        # Process input input_tensor (one token at a time with a for loop)
        ## YOUR CODE STARTS HERE (~3 lines of code) ##
        for ei in range(input_length):
            # call encoder with input_tensor[ei] and save its output in encoder_outputs
            encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
            encoder_outputs[ei] = encoder_output[0, 0]

        ## YOUR CODE ENDS HERE ##

        # Start decoder's input with <SOS> token
        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        # initialize decoder hidden state as encoder's final hidden state
        decoder_hidden = encoder_hidden

        decoded_indices = []
        decoded_words = []
        # keep track of decoder attention for analysis later
        decoder_attentions = torch.zeros(max_length, max_length)

        # Run decoder starting with encoder's context (decoder_hidden)
        #     without using teacher forcing: use its own predictions as the next input
        ## YOUR CODE STARTS HERE (~5-7 lines of code) ##
        # for loop header to loop through target_tensor
        for di in range(target_length):

            # call decoder
            decoder_output, decoder_hidden, decoder_attention = decoder(decoder_input, decoder_hidden, encoder_outputs)
            # save decoder attention in decoder_attentions
            decoder_attentions[di] = decoder_attention.data

            # calculate loss same as above
            loss += criterion(decoder_output, target_tensor[di])
            
            # set next decoder input as argmax of decoder's output
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()
            # append outputted index to decoded_indices
            decoded_indices.append(topi.item())
            # if new decoder_input is EOS_token: break
            if topi.item() == EOS_token:
                break

        ## YOUR CODE ENDS HERE ##
        decoded_words = target_lang.getWordsFromIndices(decoded_indices)

        return decoded_words, loss.item() / len(decoded_words), decoder_attentions[:di + 1]


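Train and eval loops are implemented for you. `evalIters` is redefined here because the attention version of `evaluate` also returns the attention weights.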

In [22]:
def createDataFrame(points):
    df = pd.DataFrame(points, columns=['train_loss', 'val_loss', 'bleu_scores_val'])
    return df

def evalIters(encoder, decoder, val_dataloader, criterion, num_samples=100):
    if num_samples > len(val_dataloader):
        num_samples = len(val_dataloader)
    
    candidates = [] # predicted output
    references = [] # true targets
    
    total_loss = 0
    for i, sample in enumerate(val_dataloader):
        if i >= num_samples: break

        input_tensor = sample['input_tensor'].squeeze().to(device)
        target_tensor = sample['target_tensor'].squeeze().to(device)
        
        decoded_out, loss, _ = evaluate(input_tensor, target_tensor, encoder, decoder, criterion)
        candidates.append(decoded_out)
        target_indices = sample['target_tensor'].squeeze().tolist()
        target_words = target_lang.getWordsFromIndices(target_indices)
        references.append([target_words])
        total_loss += loss
    
    return total_loss / num_samples, bleu_score(candidate_corpus=candidates, 
                                                references_corpus=references,
                                                max_n=4)

def trainEvalIters(encoder, decoder, epochs, train_dataloader, val_dataloader, 
                   eval_every=2000, eval_samples=200, learning_rate=0.01):
    start = time.time()
    plot_stats = []
    train_loss_total = 0  # Reset every eval_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)

    criterion = nn.NLLLoss()
    num_samples = len(train_dataloader)
    n_iters = epochs * num_samples
    
    for ep in range(epochs):
        for i, sample in enumerate(train_dataloader):
            input_tensor = sample['input_tensor'].squeeze().to(device)
            target_tensor = sample['target_tensor'].squeeze().to(device)

            loss = train(input_tensor, target_tensor, encoder,
                         decoder, encoder_optimizer, decoder_optimizer, criterion)
            train_loss_total += loss

            # `i` restarts at 0 every epoch, so the first average of a new epoch also
            #     includes the samples left over after the previous epoch's last evaluation
            if (i + 1) % eval_every == 0:
                train_loss_average = train_loss_total / eval_every
                train_loss_total = 0

                # validate model (val_bleu avoids shadowing torchtext's bleu_score)
                eval_loss, val_bleu = evalIters(encoder, decoder, val_dataloader, criterion, num_samples=eval_samples)

                curr_iter = ep * num_samples + i + 1
                print('%s (%d %d%%) Average train loss: %.4f, Average val loss: %.4f ,val BLEU: %.4f' % (timeSince(start, curr_iter / n_iters), curr_iter, curr_iter / n_iters * 100, train_loss_average, eval_loss, val_bleu))

                plot_stats.append([train_loss_average, eval_loss, val_bleu])

    return createDataFrame(plot_stats)

In [23]:
hidden_size = 256
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
decoder1 = AttnDecoderRNN(hidden_size, target_lang.n_words, dropout_p=0.1).to(device)

epochs = 8

losses_df = trainEvalIters(encoder1, decoder1, epochs, train_dataloader, val_dataloader, eval_every=2000, eval_samples=200, learning_rate=0.01)

1m 34s (- 51m 57s) (2000 2%) Average train loss: 3.0099, Average val loss: 4.6913 ,val BLEU: 0.0320
3m 10s (- 50m 42s) (4000 5%) Average train loss: 2.4381, Average val loss: 4.1611 ,val BLEU: 0.0759
4m 47s (- 49m 23s) (6000 8%) Average train loss: 2.2985, Average val loss: 4.4519 ,val BLEU: 0.0643
17m 20s (- 129m 41s) (8000 11%) Average train loss: 2.1441, Average val loss: 4.2163 ,val BLEU: 0.0976
19m 33s (- 107m 5s) (10479 15%) Average train loss: 2.4027, Average val loss: 4.0920 ,val BLEU: 0.1053
21m 24s (- 94m 55s) (12479 18%) Average train loss: 1.8153, Average val loss: 4.2871 ,val BLEU: 0.1281
23m 13s (- 85m 36s) (14479 21%) Average train loss: 1.7398, Average val loss: 4.3262 ,val BLEU: 0.1044
25m 3s (- 78m 4s) (16479 24%) Average train loss: 1.6786, Average val loss: 4.1727 ,val BLEU: 0.1307
27m 18s (- 70m 24s) (18958 27%) Average train loss: 1.8707, Average val loss: 4.0871 ,val BLEU: 0.1464
29m 8s (- 65m 9s) (20958 30%) Average train loss: 1.3934, Average val loss: 4.3126 ,

In [24]:
# Plot losses
losses_df.iloc[:, :-1].plot.line()

[Figure: training and validation loss per evaluation step (attention model)]

In [25]:
losses_df.iloc[:, -1].plot.line()

[Figure: validation BLEU per evaluation step (attention model)]

## Visualize Attention

In [26]:
import matplotlib.ticker as ticker
def showAttention(input_words, output_words, attentions):
    # Set up figure with colorbar
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(attentions.numpy(), cmap='bone')
    fig.colorbar(cax)

    # Set up axes: fix the tick positions first so each label sits on its own row/column
    #     (this also avoids matplotlib's FixedFormatter warning)
    ax.set_xticks(range(len(input_words)))
    ax.set_xticklabels(input_words, rotation=90)
    ax.set_yticks(range(len(output_words)))
    ax.set_yticklabels(output_words)

    plt.show()

def evaluateRandomlyAndShowAttention(encoder, decoder, val_dataloader, criterion, n = 10):

    for i, sample in enumerate(val_dataloader):
        if i > n: break
        input_words = input_lang.getWordsFromIndices(sample['input_tensor'].squeeze().tolist())
        input_tensor = sample['input_tensor'].squeeze().to(device)
        target_tensor = sample['target_tensor'].squeeze().to(device)
        output_words, _, attention = evaluate(input_tensor, target_tensor, encoder, decoder, criterion)
        target_words = target_lang.getWordsFromIndices(sample['target_tensor'].squeeze().tolist())
        print('input : ', ' '.join(input_words))
        print('target: ',' '.join(target_words))
        print('predicted :', ' '.join(output_words))
        showAttention(input_words, output_words, attention)

criterion = nn.NLLLoss()
evaluateRandomlyAndShowAttention(encoder1, decoder1, val_dataloader, criterion)

input :  il craint de commettre des erreurs . EOS
target:  he s afraid of making mistakes . EOS
predicted : he is making to trouble . EOS


[Figure: attention heatmap for the sentence above]

input :  il est plus grand que son frere . EOS
target:  he is taller than his brother . EOS
predicted : he is taller than her brother . EOS




[Figure: attention heatmap for the sentence above]

input :  vous etes chanceuse d avoir un travail . EOS
target:  you re lucky that you have a job . EOS
predicted : you re fortunate to a job . EOS


[Figure: attention heatmap for the sentence above]

input :  je suis certain qu il a tort . EOS
target:  i am UNK that he is wrong . EOS
predicted : i m sure he was wrong . EOS


[Figure: attention heatmap for the sentence above]

input :  ils courent dans le parc . EOS
target:  they are running in the park . EOS
predicted : they re in the good . EOS


[Figure: attention heatmap for the sentence above]

input :  tu es difficile et incorrigible . EOS
target:  you are UNK and incorrigible . EOS
predicted : you are cruel and . EOS


[Figure: attention heatmap for the sentence above]

input :  je n ai plus peur de vous . EOS
target:  i m not scared of you anymore . EOS
predicted : i m no at you . EOS


[Figure: attention heatmap for the sentence above]

input :  je suis convaincu de ton UNK . EOS
target:  i am convinced of your UNK . EOS
predicted : i m proud of your enemy . EOS


[Figure: attention heatmap for the sentence above]

input :  c est l homme parfait pour toi . EOS
target:  he s the perfect man for you . EOS
predicted : you re on the master . EOS


[Figure: attention heatmap for the sentence above]

input :  il est notre professeur d anglais . EOS
target:  he is our teacher of english . EOS
predicted : he is an teacher of english . EOS


[Figure: attention heatmap for the sentence above]

input :  nous avons des invites ce soir . EOS
target:  we re having some guests over this evening . EOS
predicted : we re having together . EOS


[Figure: attention heatmap for the sentence above]
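Besides eyeballing the heatmaps, a rough word alignment can be read off by taking, for each output word, the input position with the largest attention weight. If `AttnDecoderRNN` uses a softmax over all `MAX_LENGTH` slots without masking (as in the PyTorch tutorial formulation), some weight can fall on zero-padded positions past the end of the input, so this sketch ignores those columns. `hard_alignment` is a hypothetical helper; `attentions` stands for the third value returned by `evaluate`, and the numbers below are made up.

```python
import torch


def hard_alignment(input_words, output_words, attentions):
    """Pair each output word with the input word it attends to most."""
    # keep only columns that correspond to real input tokens (the rest are padding)
    weights = attentions[:len(output_words), :len(input_words)]
    best = weights.argmax(dim=1).tolist()
    return [(out_w, input_words[j]) for out_w, j in zip(output_words, best)]


input_words = ["il", "est", "grand", "EOS"]
output_words = ["he", "is", "tall", "EOS"]
attentions = torch.full((4, 10), 0.01)
for row in range(4):
    attentions[row, row] = 0.5      # mostly diagonal alignment
attentions[0, 7] = 0.9              # large weight on a padded slot, ignored by the helper
print(hard_alignment(input_words, output_words, attentions))
# [('he', 'il'), ('is', 'est'), ('tall', 'grand'), ('EOS', 'EOS')]
```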