# Using Attention for Neural Machine Translation
In this notebook we are going to perform machine translation using a deep learning based approach and attention mechanism.

Specifically, we are going to train a sequence to sequence model for Spanish to English translation.  We will use Sequence to Sequence Models for this Assignment. In this assignment you only need tto implement the encoder and decoder, we implement all the data loading for you.Please **refer** to the following resources for more details:

1.   https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
2.   https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html
3. https://arxiv.org/pdf/1409.0473.pdf



In [1]:
import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
import unicodedata
import re
import time
import nltk
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction
print(torch.__version__)

1.3.1


# Download The Data

Here we will download the translation data. We will learn a model to translate Spanish to English.

In [2]:
from google.colab import drive
drive.mount('/gdrive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /gdrive


In [3]:
cd sample_data/

/content/sample_data


In [4]:
!wget http://www.manythings.org/anki/spa-eng.zip

--2019-11-25 22:12:43--  http://www.manythings.org/anki/spa-eng.zip
Resolving www.manythings.org (www.manythings.org)... 104.24.109.196, 104.24.108.196, 2606:4700:30::6818:6cc4, ...
Connecting to www.manythings.org (www.manythings.org)|104.24.109.196|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4752884 (4.5M) [application/zip]
Saving to: ‘spa-eng.zip’


2019-11-25 22:12:46 (2.66 MB/s) - ‘spa-eng.zip’ saved [4752884/4752884]



In [5]:
!unzip spa-eng.zip

Archive:  spa-eng.zip
  inflating: _about.txt              
  inflating: spa.txt                 


In [6]:
f = open('spa.txt', encoding='UTF-8').read().strip().split('\n')
lines = f
total_num_examples = 30000 
original_word_pairs = [[w for w in l.split('\t')][:2] for l in lines[:total_num_examples]]
data = pd.DataFrame(original_word_pairs, columns=["eng", "es"])
data # visualizing the data

Unnamed: 0,eng,es
0,Go.,Ve.
1,Go.,Vete.
2,Go.,Vaya.
3,Go.,Váyase.
4,Hi.,Hola.
...,...,...
29995,Stop blaming yourself.,Deja de culparte a ti mismo.
29996,Summer has just begun.,El verano acaba de comenzar.
29997,Tadpoles become frogs.,Los renacuajos se convierten en ranas.
29998,Take a walk every day.,Da un paseo cada día.


In [0]:
# Converts the unicode file to ascii
def unicode_to_ascii(s):
    """
    Normalizes latin chars with accent to their canonical decomposition
    """
    return ''.join(c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn')

# Preprocessing the sentence to add the start, end tokens and make them lower-case
def preprocess_sentence(w):
    w = unicode_to_ascii(w.lower().strip())
    w = re.sub(r"([?.!,¿])", r" \1 ", w)
    w = re.sub(r'[" "]+', " ", w)

    w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)
    
    w = w.rstrip().strip()
    w = '<start> ' + w + ' <end>'
    return w

In [8]:
# Now we do the preprocessing using pandas and lambdas
# Make sure YOU only run this once - if you run it twice it will mess up the data so you will have run the few above cells again
data["eng"] = data.eng.apply(lambda w: preprocess_sentence(w))
data["es"] = data.es.apply(lambda w: preprocess_sentence(w))
data[250:260]

Unnamed: 0,eng,es
250,<start> be brief . <end>,<start> se breve . <end>
251,<start> be brief . <end>,<start> sea breve . <end>
252,<start> be brief . <end>,<start> sean breves . <end>
253,<start> be quiet . <end>,<start> estate quieto . <end>
254,<start> be still . <end>,<start> no te muevas . <end>
255,<start> call tom . <end>,<start> llamalo a tomas ! <end>
256,<start> call tom . <end>,<start> llamalo a tomas ! <end>
257,<start> call tom . <end>,<start> llamenlo a tomas ! <end>
258,<start> cheer up ! <end>,<start> animate . <end>
259,<start> cheer up . <end>,<start> venga . <end>


# Vocabulary Class

We create a class here for managing our vocabulary as we did in MP2. In this MP, we have a separate class for the vocabulary as we need 2 different vocabularies - one for English and one for Spanish.

In [0]:
class Vocab_Lang():
    def __init__(self, data):
        """ data is the list of all sentences in the language dataset"""
        self.data = data
        self.word2idx = {}
        self.idx2word = {}
        self.vocab = set()
        
        self.create_index()
        
    def create_index(self):
        for sentence in self.data:
            # update with individual tokens
            self.vocab.update(sentence.split(' '))

        # add a padding token
        self.word2idx['<pad>'] = 0
        
        # word to index mapping
        for index, word in enumerate(self.vocab):
            self.word2idx[word] = index + 1 # +1 because of pad token
        
        # index to word mapping
        for word, index in self.word2idx.items():
            self.idx2word[index] = word 

In [0]:
# index language using the class above
inp_lang = Vocab_Lang(data["es"].values.tolist())
targ_lang = Vocab_Lang(data["eng"].values.tolist())
# Vectorize the input and target languages
input_tensor = [[inp_lang.word2idx[s] for s in es.split(' ')]  for es in data["es"].values.tolist()]
target_tensor = [[targ_lang.word2idx[s] for s in eng.split(' ')]  for eng in data["eng"].values.tolist()]

In [0]:
def max_length(tensor):
    return max(len(t) for t in tensor)

In [0]:
# calculate the max_length of input and output tensor for padding
max_length_inp, max_length_tar = max_length(input_tensor), max_length(target_tensor)

In [0]:
def pad_sequences(x, max_len):
    padded = np.zeros((max_len), dtype=np.int64)
    if len(x) > max_len: padded[:] = x[:max_len]
    else: padded[:len(x)] = x
    return padded

# pad all the sentences in the dataset with the max_length
input_tensor = [pad_sequences(x, max_length_inp) for x in input_tensor]
target_tensor = [pad_sequences(x, max_length_tar) for x in target_tensor]

In [0]:
# Creating training and test/val sets using an 80-20 split
input_tensor_train, input_tensor_val, target_tensor_train, target_tensor_val = input_tensor[:24000], input_tensor[24000:], target_tensor[:24000], target_tensor[24000:]

assert(len(input_tensor_train)==24000)
assert(len(target_tensor_train)==24000)
assert(len(input_tensor_val)==6000)
assert(len(target_tensor_val)==6000)

# Dataloader for our Encoder and Decoder

We prepare the dataloader and make sure the dataloader returns the source sentence, target sentence and the length of the source sentenc sampled from the training dataset.

In [0]:
# conver the data to tensors and pass to the Dataloader 
# to create an batch iterator
from torch.utils.data import Dataset, DataLoader
class MyData(Dataset):
    def __init__(self, X, y):
        self.data = X
        self.target = y
        # TODO: convert this into torch code is possible
        self.length = [ np.sum(1 - np.equal(x, 0)) for x in X]
        
    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]
        x_len = self.length[index]
        return x,y,x_len
    
    def __len__(self):
        return len(self.data)

In [0]:
BUFFER_SIZE = len(input_tensor_train)
BATCH_SIZE = 8
N_BATCH = BUFFER_SIZE//BATCH_SIZE
embedding_dim = 256
units = 1024
vocab_inp_size = len(inp_lang.word2idx)
vocab_tar_size = len(targ_lang.word2idx)

train_dataset = MyData(input_tensor_train, target_tensor_train)
val_dataset = MyData(input_tensor_val, target_tensor_val)

dataset = DataLoader(train_dataset, batch_size = BATCH_SIZE, 
                     drop_last=True,
                     shuffle=True)

val_dataset = DataLoader(val_dataset, batch_size = BATCH_SIZE, 
                     drop_last=True,
                     shuffle=False)

# Encoder Model

First we build a simple encoder model, which will be very similar to what you did in MP2. But instead of using a fully connected layer as the output, you should the return the output of your recurrent net (GRU/LSTM) as well as the hidden output. They are used in the decoder later.


In [0]:
## Feel free to change any parameters class definitions as long as you can change the training code, but make sure
## evaluation should get the tensor format it expects
class Encoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        ### TO - DO
        self.hidden_size = enc_units
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = 0)
        self.lstm = nn.LSTM(input_size=embedding_dim,
                             hidden_size=self.hidden_size, bidirectional = True)
        
    def forward(self, x, lens):
        '''
        Pseudo-code
        - Pass x through an embedding layer
        - Make sure x is correctly packed before the recurrent net 
        - Pass it through the recurrent net
        - Make sure the output is unpacked correctly
        - return output and hidden states from the recurrent net
        - Feel free to play around with dimensions - the training loop should help you determine the dimensions
        '''
        ### TO - DO
        x = x.long().cuda()
        embedded = self.embedding(x)
        hidden = None
        pack_embedded = nn.utils.rnn.pack_padded_sequence(embedded, lens, enforce_sorted=False)
        padded_output,(hidden, cell_state)= self.lstm(pack_embedded, hidden)
        padded_output, _ = nn.utils.rnn.pad_packed_sequence(padded_output)
        output = padded_output.transpose(0,1)
        
        hidden_cat = torch.cat((hidden[0], hidden[1]), dim = 1)
        return output, hidden_cat


# Decoder Model
We will implement a Decoder model which uses an attention mechanism. We will implement the decoder as provided in https://arxiv.org/pdf/1409.0473.pdf. **Please read** the links provided above first, at the start of this assignment for review. The pseudo-code for your implementation should be somewhat as follows:



1.   The input is put through an encoder model which gives us the encoder output of shape *(batch_size, max_length, hidden_size)* and the encoder hidden state of shape *(batch_size, hidden_size)*. 
2.   Using the output your encoder you will calculate the score and subsequently the attention using following equations : 
<img src="https://www.tensorflow.org/images/seq2seq/attention_equation_0.jpg" alt="attention equation 0" width="800">
<img src="https://www.tensorflow.org/images/seq2seq/attention_equation_1.jpg" alt="attention equation 1" width="800">

3. Once you have calculated this attention vector, you pass the original input x through a embedding layer. The output of this embedding layer is concatenated with the attention vector which is passed into a GRU.

4. Finally you pass the output of the GRU into a fully connected layer with an output size same as that vocab, to see the probability of the most possible word.




In [0]:
class Decoder(nn.Module):
  def __init__(self, vocab_size, embedding_dim, dec_units, enc_units, batch_sz):
    super(Decoder, self).__init__()
    self.hidden_size = enc_units * 2
    self.output_size = vocab_size
    self.embedding_dim = embedding_dim

    self.embedding = nn.Embedding(self.output_size, self.embedding_dim)
    
    self.fc_hidden = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
    self.fc_encoder = nn.Linear(self.hidden_size, self.hidden_size, bias=False)
    self.weight = nn.Linear(self.hidden_size, 1)
    self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
    self.gru = nn.GRU(self.hidden_size + self.embedding_dim, self.hidden_size, batch_first=True)
    self.classifier = nn.Linear(self.hidden_size, self.output_size)

  def forward(self, inputs, hidden, enc_output):
    '''
        Pseudo-code
        - Calculate the score using the formula shown above using encoder output and hidden output. 
        Note h_t is the hidden output of the decoder and h_s is the encoder output in the formula
        - Calculate the attention weights using softmax and 
        passing through V - which can be implemented as a fully connected layer
        - Finally find c_t which is a context vector where the shape of context_vector should be (batch_size, hidden_size)
        - You need to unsqueeze the context_vector for concatenating with x aas listed in Point 3 above
        - Pass this concatenated tensor to the GRU and follow as specified in Point 4 above

        Returns :
        output - shape = (batch_size, vocab)
        hidden state - shape = (batch_size, hidden size)
    '''
    enc_output = enc_output.squeeze()
    hidden = hidden.squeeze(0)  ### [batch_size, hidden_size] 
    # Embed input words
    embedded = self.embedding(inputs)
    # print("embedded", embedded.shape)
    
    # Calculating Attention Scores
    x = torch.tanh(self.fc_hidden(hidden.unsqueeze(1))+self.fc_encoder(enc_output)) ### [batch_size, max_length, hidden_size]
    # print("x", x.shape)
    alignment_scores = self.weight(x) ### [batch_size, max_length,1]
    # print("alignment_scores", alignment_scores.shape)
    
    # Calculating Attention weights
    attn_weights = F.softmax(alignment_scores, dim=1).squeeze(2).unsqueeze(1)  ### [batch_size, 1, max_length]
    # print("attn_weights", attn_weights.shape)
    
    # Calculating the context vector
    context_vector = torch.bmm(attn_weights, enc_output)  ### [batch_size, 1, hidden_size]
    # print("context_vector", context_vector.shape)
    
    # Concatenating context vector with embedded input word
    output = torch.cat((embedded, context_vector), dim = 2)   ### [batch_size，1, hidden_size + embedding_size]
    # print("output", output.shape)
    hidden = hidden.unsqueeze(0)  ### [1, batch_size, hidden_size] 
    # print("hidden", hidden.shape)
    # Through the GRU cell
    output, hidden = self.gru(output, hidden) ### [batch_size, 1, hidden_size]
    # Through a classifier
    output = F.log_softmax(self.classifier(hidden.squeeze(0)), dim=1)
    # print("output", output.shape)
    return output, hidden, attn_weights

In [0]:
### sort batch function to be able to use with pad_packed_sequence
def sort_batch(X, y, lengths):
    lengths, indx = lengths.sort(dim=0, descending=True)
    X = X[indx]
    y = y[indx]
    return X.transpose(0,1), y, lengths # transpose (batch x seq) to (seq x batch)

In [0]:
criterion = nn.CrossEntropyLoss()

def loss_function(real, pred):
    """ Only consider non-zero inputs in the loss; mask needed """
    #mask = 1 - np.equal(real, 0) # assign 0 to all above 0 and 1 to all 0s
    #print(mask)
    mask = real.ge(1).type(torch.cuda.FloatTensor)
    
    loss_ = criterion(pred, real) * mask 
    return torch.mean(loss_)

In [0]:
# Device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


## Feel free to change any parameters class definitions as long as you can change the training code, but make sure
## evaluation should get the tensor format it expects, this is only for reference
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
decoder = Decoder(vocab_tar_size, embedding_dim, units, units, BATCH_SIZE)

encoder.to(device)
decoder.to(device)

optimizer = optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), 
                       lr=0.1)

# Train your model

You will train your model here.
*   Pass the source sentence and their corresponding lengths into the encoder
*   Creating the decoder input using <start> tokens
*   Now we find out the decoder outputs conditioned on the previous predicted word usually, but in our training we use teacher forcing. Read more about teacher forcing at https://machinelearningmastery.com/teacher-forcing-for-recurrent-neural-networks/



In [22]:
EPOCHS = 1

for epoch in range(EPOCHS):
    start = time.time()
    
    encoder.train()
    decoder.train()
    
    total_loss = 0
    
    for (batch, (inp, targ, inp_len)) in enumerate(dataset):
        loss = 0
        
        xs, ys, lens = sort_batch(inp, targ, inp_len)
        enc_output, enc_hidden = encoder(xs.to(device), lens.to(device))
        dec_hidden = enc_hidden
        
        # use teacher forcing - feeding the target as the next input (via dec_input)
        dec_input = torch.tensor([[targ_lang.word2idx['<start>']]] * BATCH_SIZE)
        
        # run code below for every timestep in the ys batch
        for t in range(1, ys.size(1)):
            predictions, dec_hidden, _ = decoder(dec_input.to(device), 
                                         dec_hidden.to(device), 
                                         enc_output.to(device))
            loss += loss_function(ys[:, t].to(device), predictions.to(device))
            #loss += loss_
            dec_input = ys[:, t].unsqueeze(1)
            
        
        batch_loss = (loss / int(ys.size(1)))
        total_loss += batch_loss
        
        optimizer.zero_grad()
        
        loss.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()
        
        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                         batch,
                                                         batch_loss.detach().item()))
        
        
    ### TODO: Save checkpoint for model
    print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                        total_loss / N_BATCH))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
    torch.save(encoder.state_dict(),'./encoder.pth')
    torch.save(decoder.state_dict(),'./decoder.pth')

Epoch 1 Batch 0 Loss 5.0056
Epoch 1 Batch 100 Loss 2.2636
Epoch 1 Batch 200 Loss 1.9826
Epoch 1 Batch 300 Loss 1.6793
Epoch 1 Batch 400 Loss 1.4052
Epoch 1 Batch 500 Loss 1.1084
Epoch 1 Batch 600 Loss 1.4587
Epoch 1 Batch 700 Loss 1.0960
Epoch 1 Batch 800 Loss 1.5162
Epoch 1 Batch 900 Loss 1.2819
Epoch 1 Batch 1000 Loss 1.0916
Epoch 1 Batch 1100 Loss 1.6456
Epoch 1 Batch 1200 Loss 0.9997
Epoch 1 Batch 1300 Loss 1.3759
Epoch 1 Batch 1400 Loss 1.3040
Epoch 1 Batch 1500 Loss 1.2797
Epoch 1 Batch 1600 Loss 1.2738
Epoch 1 Batch 1700 Loss 0.9114
Epoch 1 Batch 1800 Loss 1.1159
Epoch 1 Batch 1900 Loss 0.9139
Epoch 1 Batch 2000 Loss 1.1578
Epoch 1 Batch 2100 Loss 1.1039
Epoch 1 Batch 2200 Loss 0.7620
Epoch 1 Batch 2300 Loss 0.7778
Epoch 1 Batch 2400 Loss 0.8835
Epoch 1 Batch 2500 Loss 1.1731
Epoch 1 Batch 2600 Loss 0.9125
Epoch 1 Batch 2700 Loss 0.8573
Epoch 1 Batch 2800 Loss 1.2353
Epoch 1 Batch 2900 Loss 0.7879
Epoch 1 Loss 1.2840
Time taken for 1 epoch 155.1126148700714 sec



In [23]:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

E: Package 'python-software-properties' has no installation candidate
Selecting previously unselected package google-drive-ocamlfuse.
(Reading database ... 145605 files and directories currently installed.)
Preparing to unpack .../google-drive-ocamlfuse_0.7.14-0ubuntu1~ubuntu18.04.1_amd64.deb ...
Unpacking google-drive-ocamlfuse (0.7.14-0ubuntu1~ubuntu18.04.1) ...
Setting up google-drive-ocamlfuse (0.7.14-0ubuntu1~ubuntu18.04.1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
··········
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope

In [0]:
!mkdir -p drive
!google-drive-ocamlfuse drive

In [0]:
import os
os.chdir("drive/HW4")

In [0]:
torch.save(encoder.state_dict(),'./encoder.pth')
torch.save(decoder.state_dict(),'./decoder.pth')

In [0]:
# encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)
# decoder = Decoder(vocab_tar_size, embedding_dim, units, units, BATCH_SIZE)
import sys
sys.path.insert(0, './')
encoder.load_state_dict(torch.load(saving_path+'encoder.pth'))
decoder.load_state_dict(torch.load(saving_path+'decoder.pth'))

FileNotFoundError: ignored

# Evaluation


*   We evaluate on the test set.
*   In this evaluation, instead of using the concept of teacher forcing, we use the prediction of the decoder as the input to the decoder for the sequence of outputs.



In [27]:
start = time.time()

encoder.eval()
decoder.eval()

total_loss = 0

final_output = torch.zeros((len(target_tensor_val),max_length_tar))
target_output = torch.zeros((len(target_tensor_val),max_length_tar))

for (batch, (inp, targ, inp_len)) in enumerate(val_dataset):
    loss = 0
    xs, ys, lens = sort_batch(inp, targ, inp_len)
    enc_output, enc_hidden = encoder(xs.to(device), lens.to(device))
    dec_hidden = enc_hidden
    
    dec_input = torch.tensor([[targ_lang.word2idx['<start>']]] * BATCH_SIZE)
    curr_output = torch.zeros((ys.size(0), ys.size(1)))
    curr_output[:, 0] = dec_input.squeeze(1)

    for t in range(1, ys.size(1)): # run code below for every timestep in the ys batch
        predictions, dec_hidden, _ = decoder(dec_input.to(device), 
                                      dec_hidden.to(device), 
                                      enc_output.to(device))
        loss += loss_function(ys[:, t].to(device), predictions.to(device))
        dec_input = torch.argmax(predictions, dim=1).unsqueeze(1)
        curr_output[:, t] = dec_input.squeeze(1)
    final_output[batch*BATCH_SIZE:(batch+1)*BATCH_SIZE] = curr_output
    target_output[batch*BATCH_SIZE:(batch+1)*BATCH_SIZE] = targ
    batch_loss = (loss / int(ys.size(1)))
    total_loss += batch_loss

print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                    total_loss / N_BATCH))
print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Loss 0.9447
Time taken for 1 epoch 16.613391399383545 sec



# Bleu Score Calculation for evaluation

Read more about Bleu Score at :


1.   https://en.wikipedia.org/wiki/BLEU
2.   https://www.aclweb.org/anthology/P02-1040.pdf

We expect your BLEU Scores to be in the range of for full credit. No partial credit :( 


*   BLEU-1 > 0.14
*   BLEU-2 > 0.08
*   BLEU-3 > 0.02
*   BLEU-4 > 0.15





In [0]:
def get_reference_candidate(target, pred):
  reference = list(target)
  reference = [targ_lang.idx2word[s] for s in np.array(reference[1:])]
  candidate = list(pred)
  candidate = [targ_lang.idx2word[s] for s in np.array(candidate[1:])]
  return reference, candidate

In [30]:
bleu_1 = 0.0
bleu_2 = 0.0
bleu_3 = 0.0
bleu_4 = 0.0
smoother = SmoothingFunction()
save_reference = []
save_candidate = []

for i in range(len(target_tensor_val)):
  reference, candidate = get_reference_candidate(target_output[i], final_output[i])
  #print(reference)
  #print(candidate)
  save_reference.append(reference)
  save_candidate.append(candidate)

  bleu_1 += sentence_bleu(reference, candidate, weights=(1, 0, 0, 0), smoothing_function=smoother.method1)
  bleu_2 += sentence_bleu(reference, candidate, weights=(0, 1, 0, 0), smoothing_function=smoother.method2)
  bleu_3 += sentence_bleu(reference, candidate, weights=(0, 0, 1, 0), smoothing_function=smoother.method3)
  bleu_4 += sentence_bleu(reference, candidate, weights=(0, 0, 0, 1), smoothing_function=smoother.method4)

print('Individual 1-gram: %f' % (bleu_1/len(target_tensor_val)))
print('Individual 2-gram: %f' % (bleu_2/len(target_tensor_val)))
print('Individual 3-gram: %f' % (bleu_3/len(target_tensor_val)))
print('Individual 4-gram: %f' % (bleu_4/len(target_tensor_val)))
assert(len(save_reference)==len(target_tensor_val))

Individual 1-gram: 0.158788
Individual 2-gram: 0.098452
Individual 3-gram: 0.030766
Individual 4-gram: 0.189989


# Save File for Submission
You just need to submit your **results.pickle** file to the autograder.

In [31]:
# import pickle
# from google.colab import drive
# drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
with open('./results.pickle', 'wb') as fil:
    pickle.dump(save_candidate, fil)