# Homework 4: Sequence to Sequence Modeling

The aim of this homework is to familiarize you with sequence-to-sequence language modeling, specifically using an encoder-decoder model. This is the coding portion; you also have a written portion. The written portion can be found the homework instructions, i.e. the pdf you download from the syllabus website. In this notebook, you are provided with pre-written code for a simple sequence-to-sequence model that already works and learns how to reverse short sequences of numbers.

If you run this whole jupyter notebook, it will learn to reverse short sequences of numbers. Although much of this code you will not be modifying, we recommend reading through it to get a sense of how the model and training works.

This starter code is based on [this tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) by Sean Robertson from the PyTorch website. It has been modified by Katy Ilonka Gero for COMS W4705 taught at Columbia University by Professor Kathy McKeown. 

### Overview

Your assignment is to:

1. modify this code to run with the E2E restaurant data set (provided)
2. train a model on this dataset on a GPU
3. implement beam search for evaluation
4. implement a BLEU evaluator and report BLEU scores

These do not need to be done in this order.

You must submit:

1. This jupyter notebook, with your solutions to the above assignments. (Note cells that require specific outputs. Do not clear outputs before submitting.)
2. A saved model (encoder and decoder) that takes in a meaning representation and generates a restaurant description.

Write all your code **in this jupyter notebook**. Cells are provided where you should be implementing your code. See homework instructions for further details on how to submit this homework.

### 1. Modify to work with E2E Dataset

You will be working with the end-to-end (E2E) challenge dataset. More information can be found on [their website](http://www.macs.hw.ac.uk/InteractionLab/E2E/). In this dataset, the inputs are restaurant meaning representations, which are a series of key-value pairs that encode information about a restaurant. The outputs are fluent sentences that describe the restaurant. Here is an example:

*Input: Meaning Representation*

```
name[The Eagle],
eatType[coffee shop],
food[French],
priceRange[moderate],
customerRating[3/5],
area[riverside],
kidsFriendly[yes],
near[Burger King]
```

*Output: Fluent Sentence*

```
The three star coffee shop, The Eagle, gives families a mid-priced dining experience featuring a variety of wines and cheeses. Find The Eagle near Burger King.
```

You will need to read in and process the training and development data. This data is provided in csv format. Here is an example line from the `trainset.csv` file:

```
"name[Browns Cambridge], eatType[coffee shop], food[English], customer rating[5 out of 5], area[riverside], familyFriendly[no], near[Crowne Plaza Hotel]","Browns Cambridge, a 5 out of 5 English coffee shop, is not kid friendly. It is located near Crowne Plaza Hotel and riverside."
```

You will need to tokenize the input and output. The input should be tokenized such that each token is a single entry from the meaning representation. You can decide how to tokenize the output. Here is how the input should be tokenize, and a simple tokenization for the output:

*Input:*

```
['name[Browns Cambridge]', 'eatType[coffee shop]', 'food[English]', 'customer rating[5 out of 5]', 'area[riverside]', 'familyFriendly[no]', 'near[Crowne Plaza Hotel]']
```

*Output:*

```
['<SOS>', 'Browns', 'Cambridge,', 'a', '5', 'out', 'of', '5', 'English', 'coffee', 'shop,', 'is', 'not', 'kid', 'friendly.', 'It', 'is', 'located', 'near', 'Crowne', 'Plaza', 'Hotel', 'and', 'riverside.', '<EOS>']
```

Be sure to note the `<SOS>` (start-of-sequence) and `<EOS>` (end-of-sequence) tokens in the output. This is important and necessary! The decoder requires start and end tokens; the start token gives it an initial input to start generating, and the end token lets you know when to stop.

Your first goal is to load in this data with [torchtext](https://torchtext.readthedocs.io/en/latest/index.html), a library used to manage text datasets in pytorch. *You do not need to change anything in the model or training or evaluation.* All you need to do is load in the data similar to how the number-reversal data is loaded in.

### 2. Train a model on this data

To train a model on this data in a reasonable period of time, you will need to run this notebook on the Google Cloud VM with a GPU. [This tutorial](https://towardsdatascience.com/running-jupyter-notebook-in-google-cloud-platform-in-15-min-61e16da34d52) gives a good explanation of how to use jupyter notebooks from a Google Cloud VM. It can take time to correctly get set up on a GPU, so don't leave this to the last minute. 

However, we *do* recommend testing your code by loading in a small amount of data (say, 5 examples,) and training on these. This should train quickly even without a GPU and the model should be able to almost memorize the data. This is generally good practice with generative networks -- ensure your model will memorize a small amount of data.

With the full e2e dataset on a GPU, it should take around 20 minutes to train a single epoch. You should see decent results after a single epoch. Decent results are sentences with a few mistakes, but are mostly readable. You are encouraged to see what kind of improvements can be found with more training or different parameters.

You do not need to modify any code to train the model, nor are you allowed to modify the model. You may modify the `trainIters` function, if you would like to improve how you track progress, or change parameters while training. For example, it can be useful to decrease the teacher-forcing ratio as training progresses.

*You must submit a trained model. This model must be a GPU model. It is not reasonable to train this model with a CPU; part of the assignment is training it on a GPU.*

*Note that the model is trained using single examples -- that is, it doesn't use batching. Batching is possible with seq2seq models, but for simplicity of reading the code we have not implemented it here.*

### 3. Implement a beam search evaluator

We provide you with an evaluation function that takes in an input sequence and generates an output sentence given a trained model. This evaluation function performs *greedy decoding* by taking the most likely token at each generation step. You are required to implement a beam search version of this evaluation function, that keeps track of the top *k* most likely sequences at each generation step, and then returns the top *k* best sequences with their associated probabilities.

### 4. Implement a BLEU evaluator and report scores

While loss and accuracy are good for tracking training progress, they don't tell us much about how well the model generates meaningful sentences. You need to implement a BLEU evaluation function that takes in an input/output pair and returns the BLEU score for how well the model predicts the output.

You can find a formal description of how to calculate BLEU in the original paper, [BLEU: A Method for Automatic Evaluation of Machine Translation](https://www.aclweb.org/anthology/P02-1040.pdf). We reprodue this formal description for you in the homework instructions.

When reporting these scores, use the *development dataset* provided. Report scores for greedy decoding and beam search (beam size=3). For beam search, use the top-scoring output sentence as the score for that datapoint.

*You must implement your BLEU evaluator from scratch.* There exist python libraries that implement BLEU for you. Do not use these.

### Don't forget the written portion!

A series of open-ended questions about these tasks are required for the written portion of this homework. Please see the homework instructions for this, as well as instructions about how to submit.

In [0]:
# You may modify this cell

import random
import time

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

import torchtext

In [2]:
# DO NO MODIFY

# this is useful for checking if your code is successfully using the GPU

mydevice = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mydevice

device(type='cuda')

In [0]:
# DO NOT MODIFY

SOS_TOKEN = '<sos>'
EOS_TOKEN = '<eos>'

MAX_LEN = 50

def len_filter(example):
    return len(example.src) <= MAX_LEN and len(example.tgt) <= MAX_LEN

### Load dummy number reversal dataset

In [4]:
from google.colab import drive
drive.mount('/content/gdrive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/gdrive


In [0]:
# # DO NOT MODIFY

# train_path = '/content/gdrive/My Drive/NLP/hw4/data/toy_reverse/train/data.txt'
# dev_path = '/content/gdrive/My Drive/NLP/hw4/data/toy_reverse/dev/data.txt'

# src = torchtext.data.Field(
#     batch_first=True, 
#     include_lengths=True
#     )
# tgt = torchtext.data.Field(
#     batch_first=True, 
#     preprocessing = lambda seq: [SOS_TOKEN] + seq + [EOS_TOKEN]
#     )

# data_train = torchtext.data.TabularDataset(
#         path=train_path, format='tsv',
#         fields=[('src', src), ('tgt', tgt)],
#         filter_pred=len_filter
#     )
# data_dev = torchtext.data.TabularDataset(
#         path=dev_path, format='tsv',
#         fields=[('src', src), ('tgt', tgt)],
#         filter_pred=len_filter
#     )

### 1. Load the e2e data

Load in the E2E data similar to how the dummy number reversal dataset is loaded. That is, use the same `torchtext.data.Field` and `torchtext.data.TabularDataset` classes.

In [0]:
# WRITE YOUR CODE FOR LOADING THE E2E DATA HERE
train_path = '/content/gdrive/My Drive/NLP/hw4/data/e2e-dataset/trainset.csv'
dev_path = '/content/gdrive/My Drive/NLP/hw4/data/e2e-dataset/devset.csv'

src = torchtext.data.Field(
    batch_first=True, 
    tokenize=lambda row: row.split(', '),
    include_lengths=True
    )
tgt = torchtext.data.Field(
    batch_first=True, 
    tokenize=lambda row: row.split(),
    preprocessing = lambda seq: [SOS_TOKEN] + seq + [EOS_TOKEN]
    )

data_train = torchtext.data.TabularDataset(
        path=train_path, format='csv',
        skip_header=True,
        fields=[('src', src), ('tgt', tgt)],
        filter_pred=len_filter
    )
data_dev = torchtext.data.TabularDataset(
        path=dev_path, format='csv',
        skip_header=True,
        fields=[('src', src), ('tgt', tgt)],
        filter_pred=len_filter
    )

Have a look at the vocab and some example data points.

*If you have loaded in the E2E dataset correctly, the code in the cell below should work without any modification.*

In [7]:
# You may modify this cell

src.build_vocab(data_train, max_size=50000)
tgt.build_vocab(data_train, max_size=50000)
input_vocab = src.vocab
output_vocab = tgt.vocab

print('20 tokens from input vocab:\n', list(input_vocab.stoi.keys())[:20])
print('\n20 tokens from output vocab:\n', list(output_vocab.stoi.keys())[:20])

print('\nnum training examples:', len(data_train.examples))

item = random.choice(data_train.examples)
print('\nexample train data:')
print('src:\n', item.src)
print('tgt:\n', item.tgt)

20 tokens from input vocab:
 ['<unk>', '<pad>', 'familyFriendly[yes]', 'area[riverside]', 'eatType[coffee shop]', 'familyFriendly[no]', 'area[city centre]', 'eatType[pub]', 'food[Japanese]', 'food[Italian]', 'food[Fast food]', 'food[French]', 'priceRange[moderate]', 'priceRange[less than £20]', 'customer rating[average]', 'customer rating[low]', 'priceRange[high]', 'customer rating[5 out of 5]', 'priceRange[more than £30]', 'food[Indian]']

20 tokens from output vocab:
 ['<unk>', '<pad>', 'is', '<eos>', '<sos>', 'a', 'The', 'the', 'in', 'near', 'of', 'and', 'food', 'customer', 'located', 'It', 'restaurant', 'has', 'coffee', 'price']

num training examples: 42037

example train data:
src:
 ['name[The Waterman]', 'food[Fast food]', 'priceRange[less than £20]', 'customer rating[low]', 'area[riverside]', 'familyFriendly[yes]']
tgt:
 ['<sos>', 'The', 'Waterman', 'provides', 'cheap', 'fast', 'food.', 'It', 'is', 'located', 'riverside.', 'It', 'is', 'family-Friendly,', 'but', 'has', 'a', 'low

### Model definition and training functions

In [0]:
# DO NOT MODIFY

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, myinput, hidden):
        embedded = self.embedding(myinput).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=mydevice)

    
class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        output = self.embedding(input).view(1, 1, -1)
        output = F.relu(output)
        output, hidden = self.gru(output, hidden)
        output = self.softmax(self.out(output[0]))
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=mydevice)

In [0]:
# DO NOT MODIFY

def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion,
          max_length=MAX_LEN, teacher_forcing_ratio=0.5):
    
    # get an initial hidden state for the encoder
    encoder_hidden = encoder.initHidden()

    # zero the gradients of the optimizers
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # get the seq lengths, used for iterating through encoder/decoder
    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    # create empty tensor to fill with encoder outputs
    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=mydevice)

    # create a variable for loss
    loss = 0
    
    # pass the inputs through the encoder
    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0, 0]

    # create a start-of-sequence tensor for the decoder
    decoder_input = torch.tensor([[output_vocab.stoi[SOS_TOKEN]]], device=mydevice)

    # set the decoder hidden state to the final encoder hidden state
    decoder_hidden = encoder_hidden

    # decide if we will use teacher forcing
    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False

    for di in range(target_length):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        
        topv, topi = decoder_output.topk(1)
        decoder_input = topi.squeeze().detach()  # detach from history as input
                
        loss += criterion(decoder_output, target_tensor[di].unsqueeze(0))
        
        if use_teacher_forcing:
            decoder_input = target_tensor[di]
        
        if decoder_input.item() == output_vocab.stoi[EOS_TOKEN]:
                break

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length

In [0]:
# You may modify this cell

def trainIters(encoder, decoder, n_iters, print_every=1000, learning_rate=0.01, teacher_forcing_ratio=0.5):
    print(f'Running {n_iters} epochs...')
    print_loss_total = 0
    print_loss_epoch = 0

    encoder_optim = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optim = optim.SGD(decoder.parameters(), lr=learning_rate)

    # note batch size of 1, just for simplicity
    # DO NOT INCREASE THE BATCH SIZE
    batch_iterator = torchtext.data.Iterator(
        dataset=data_train, batch_size=1,
        sort=False, sort_within_batch=True,
        sort_key=lambda x: len(x.src),
        device=mydevice, repeat=False)
    

    criterion = nn.NLLLoss()

    for e in range(n_iters):
        batch_generator = batch_iterator.__iter__()
        step = 0
        start = time.time()
        for batch in batch_generator:
            step += 1
            
            # get the input and target from the batch iterator
            input_tensor, input_lengths = getattr(batch, 'src')
            target_tensor = getattr(batch, 'tgt')
            
            # this is because we're not actually using the batches.
            # batch size is 1 and this just selects that first one
            input_tensor = input_tensor[0]
            target_tensor = target_tensor[0]

            loss = train(input_tensor, target_tensor, encoder, decoder, encoder_optim, decoder_optim, criterion, teacher_forcing_ratio=teacher_forcing_ratio)
            print_loss_total += loss
            print_loss_epoch += loss
            

            if step % print_every == 0:
                print_loss_avg = print_loss_total / print_every
                print_loss_total = 0
                t = (time.time() - start) / 60
                print(f'step: {step}\t avg loss: {print_loss_avg:.2f}\t time for {print_every} steps: {t:.2f} min')
                start = time.time()
        
        print_loss_avg = print_loss_epoch / step
        print_loss_epoch = 0
        print(f'End of epoch {e}, avg loss {print_loss_avg:.2f}')  

### 2. Create and train a model

In [0]:
# You may modify this cell

hidden_size = 128
encoder1 = EncoderRNN(len(input_vocab), hidden_size).to(mydevice)
decoder1 = DecoderRNN(hidden_size, len(output_vocab)).to(mydevice)

Here are some guidelines for how much training to expect. Note that these *guidelines*; they are not exact.

Only 1 epoch is needed for the number reversal dataset. This produces near-perfect results, and should take less than 5 minutes to run on a CPU.

To memorize ~5 examples of the e2e dataset, ~100 epochs are needed (with a high teacher forcing ratio). This produces near-perfect results.

To train on the full e2e dataset, only 1 epoch is needed to see decent outputs on the training data. More are required to increase fluency and see improvements on the development data.

In [12]:
# You may modify this cell
# but be sure that it prints some indication of how training is progressing

trainIters(encoder1, decoder1, 1, print_every=1000, teacher_forcing_ratio=0.75)

Running 1 epochs...
step: 1000	 avg loss: 4.31	 time for 1000 steps: 0.38 min
step: 2000	 avg loss: 3.60	 time for 1000 steps: 0.39 min
step: 3000	 avg loss: 3.42	 time for 1000 steps: 0.38 min
step: 4000	 avg loss: 3.24	 time for 1000 steps: 0.38 min
step: 5000	 avg loss: 3.10	 time for 1000 steps: 0.40 min
step: 6000	 avg loss: 3.03	 time for 1000 steps: 0.39 min
step: 7000	 avg loss: 3.07	 time for 1000 steps: 0.41 min
step: 8000	 avg loss: 2.94	 time for 1000 steps: 0.39 min
step: 9000	 avg loss: 2.84	 time for 1000 steps: 0.39 min
step: 10000	 avg loss: 2.82	 time for 1000 steps: 0.40 min
step: 11000	 avg loss: 2.72	 time for 1000 steps: 0.40 min
step: 12000	 avg loss: 2.66	 time for 1000 steps: 0.39 min
step: 13000	 avg loss: 2.60	 time for 1000 steps: 0.40 min
step: 14000	 avg loss: 2.65	 time for 1000 steps: 0.39 min
step: 15000	 avg loss: 2.64	 time for 1000 steps: 0.39 min
step: 16000	 avg loss: 2.62	 time for 1000 steps: 0.39 min
step: 17000	 avg loss: 2.57	 time for 1000 st

In [0]:
# WRITE YOUR CODE FOR SAVING YOUR MODEL HERE
ENCODER_PATH = '/content/gdrive/My Drive/NLP/hw4/encoder.mdl'
DECODER_PATH = '/content/gdrive/My Drive/NLP/hw4/decoder.mdl'

In [0]:
torch.save(encoder1.state_dict(), ENCODER_PATH)
torch.save(decoder1.state_dict(), DECODER_PATH)

In [15]:
# We encourage you to confirm that you can load your trained model here also
encoder = EncoderRNN(len(input_vocab), hidden_size).to(mydevice)
decoder = DecoderRNN(hidden_size, len(output_vocab)).to(mydevice)
encoder.load_state_dict(torch.load(ENCODER_PATH))
decoder.load_state_dict(torch.load(DECODER_PATH))

<All keys matched successfully>

In [0]:
# DO NOT MODIFY

def evaluate(encoder, decoder, sentence, max_length=MAX_LEN):
    with torch.no_grad():
        input_tensor = torch.tensor([input_vocab.stoi[word] for word in sentence], device=mydevice)
        input_length = input_tensor.size()[0]
        encoder_hidden = encoder.initHidden()

        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=mydevice)

        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        decoder_input = torch.tensor([[output_vocab.stoi[SOS_TOKEN]]], device=mydevice)

        decoder_hidden = encoder_hidden

        decoded_words = []

        for di in range(max_length):
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
            topv, topi = decoder_output.data.topk(1)
            next_word = output_vocab.itos[topi.item()]
            decoded_words.append(next_word)
            if next_word == EOS_TOKEN:
                break

            decoder_input = topi.squeeze().detach()

        return decoded_words

In [0]:
# for i in range(5):
#     item = random.choice(data_train.examples)
#     seq = item.src
#     print(seq)
#     words = evaluate(encoder_loaded, decoder_loaded, seq)
#     print(' '.join(words))
#     print()

### 3. Implement beam search evaluator

Be sure to return all the output sequences (i.e. if the beam size is k, you should return k sequences) and their associated probabilities. You will need the associated probabilities to select the best performing sequence when calculating BLEU.

In [0]:
# WRITE YOUR CODE FOR BEAM SEARCH HERE

# The output of this cell should be an example input from the dev set, 
# and three outputs from a beam search evaluator.

def evaluate_beam_search(encoder, decoder, sentence, max_length=MAX_LEN, beam_size=3):
    with torch.no_grad():
        input_tensor = torch.tensor([input_vocab.stoi[word] for word in sentence], device=mydevice)
        input_length = input_tensor.size()[0]
        encoder_hidden = encoder.initHidden()
        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=mydevice)
        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        decoder_input = torch.tensor([[output_vocab.stoi[SOS_TOKEN]]], device=mydevice)
        decoder_hidden = encoder_hidden

        # Above are the same with greedy evaluate.
####################################################################
        # Initialize the beams
        beams = [] #First element is sequence of tokens, second is corresponding log likelihood.
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        topv, topi = decoder_output.data.topk(beam_size)
        for i in range(beam_size):
            beams.append([[topi[0][i]], topv[0][i].item()])

        # Since above has run 1 decoder, here run max_length-1 times.
        for di in range(max_length-1):
            candidates = [] # Store k*k candidate seqs.
            for ibeam in beams:
                decoder_input = ibeam[0][-1] # Take last word in beam seq as input
                # If current beam has ended, add it to candidates without expanding
                if output_vocab.itos[decoder_input] == EOS_TOKEN:
                    candidates.append(ibeam)
                    continue 
                decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
                topv, topi = decoder_output.data.topk(beam_size)
                # Add the new words to seqs
                for i in range(beam_size):
                    candidates.append([ibeam[0]+[topi[0][i]], ibeam[1]+topv[0][i].item()])
                    # Since the output of decoder is log likelihood, use summation.
            # Choose top k seqs according to length-normalized log-likelihood.
            if candidates:
                beams = sorted(candidates, key=lambda x:x[1]/len(x[0]), reverse=True)[:beam_size]

        # Transform tokens into words:
        decoded_words = []
        for i in range(beam_size):
            tmp_words = []
            for j in beams[i][0]:
                tmp_words.append(output_vocab.itos[j])
            decoded_words.append(tmp_words)

        return decoded_words

In [0]:
# # test beam search1
# # paras
# sentence = random.choice(data_train.examples).src
# max_length=MAX_LEN
# beam_size=3

# #code
# input_tensor = torch.tensor([input_vocab.stoi[word] for word in sentence], device=mydevice)
# input_length = input_tensor.size()[0]
# encoder_hidden = encoder.initHidden()

# encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=mydevice)

# for ei in range(input_length):
#     encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
#     encoder_outputs[ei] += encoder_output[0, 0]

# decoder_input = torch.tensor([[output_vocab.stoi[SOS_TOKEN]]], device=mydevice)

# decoder_hidden = encoder_hidden

In [0]:
# # test beam search1
# beams = [] #first element is sequence, second is corresponding log likelihood.

# decoder_input = torch.tensor([[output_vocab.stoi[SOS_TOKEN]]], device=mydevice)
# decoder_hidden = encoder_hidden
# decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
# topv, topi = decoder_output.data.topk(beam_size)

# for i in range(beam_size):
#     beams.append([[topi[0][i]], topv[0][i].item()])
# print(beams)
# for di in range(max_length-1):
#     candidates = []
#     for ibeam in beams:
#         decoder_input = ibeam[0][-1]
#         if output_vocab.itos[decoder_input] == EOS_TOKEN:
#             candidates.append(ibeam)
#             continue
#         decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
#         topv, topi = decoder_output.data.topk(beam_size)
#         for i in range(beam_size):
#             candidates.append([ibeam[0]+[topi[0][i]], ibeam[1]+topv[0][i].item()])
#     if candidates:
#         beams = sorted(candidates, key=lambda x:x[1]/len(x[0]), reverse=True)[:beam_size]
#     #print(beams)
# test = []
# for i in beams[0][0]:
#     test.append(output_vocab.itos[i])
# print(np.shape(test),test)
# test = []
# for i in beams[1][0]:
#     test.append(output_vocab.itos[i])
# print(np.shape(test),test)
# test = []
# for i in beams[2][0]:
#     test.append(output_vocab.itos[i])
# print(np.shape(test),test)
#     # decoder_input = topi.squeeze().detach()

Have a look at some generated sequences! This is the fun part.

In [21]:
# You may modify this cell

# This selects 5 random datapoints from the training data and shows the generated sequence

for i in range(5):
    item = random.choice(data_train.examples)
    seq = item.src
    print(seq)
    words = evaluate(encoder, decoder, seq)
    print(' '.join(words))
    print()

['name[Green Man]', 'food[Chinese]', 'priceRange[high]', 'area[riverside]', 'familyFriendly[no]', 'near[All Bar One]']
<sos> Green Man is a Chinese restaurant in the price range It is is located in the the is in the the is in the the is the price range in the the is <eos>

['name[Giraffe]', 'eatType[coffee shop]', 'priceRange[more than £30]', 'customer rating[high]', 'familyFriendly[yes]', 'near[The Bakers]']
<sos> Giraffe is a children friendly coffee shop near The Bakers. It is located near The Bakers. It is located near the riverside. It is near The Bakers. <eos>

['name[Fitzbillies]', 'eatType[coffee shop]', 'food[French]', 'priceRange[more than £30]', 'customer rating[5 out of 5]', 'area[riverside]', 'familyFriendly[yes]']
<sos> Fitzbillies is a coffee shop in the It 5 5 star in the price range of 5 <eos>

['name[Fitzbillies]', 'eatType[coffee shop]', 'food[French]', 'priceRange[less than £20]', 'customer rating[average]', 'area[riverside]', 'familyFriendly[no]']
<sos> Fitzbillies

### 4. Implement BLEU evaluator

Remember that when calculating BLEU using beam search, select the top-scoring sequence output using the model probability.

In [0]:
# The output of this cell should be the average BLEU score on the dev set
# for greedy decoding AND for beam search decoding (beam size = 3)
import string
# Preprocess the seq from data_dev
def preprocessing(seq):
    if EOS_TOKEN in seq:
        seq.remove(EOS_TOKEN)
    if SOS_TOKEN in seq:
        seq.remove(SOS_TOKEN)
    seq = [i.translate(str.maketrans('','',string.punctuation)).lower() for i in seq]
    return seq

In [23]:
# WRITE YOUR CODE FOR THE BLEU EVALUATION HERE
# Get the src-tgt dictionary  in dev data.
dev_dict = {}
for i in data_dev.examples:
    key = tuple(i.src)
    if key not in dev_dict:
        dev_dict[key] = [preprocessing(i.tgt)]
    else:
        dev_dict[key] = dev_dict[key]+[preprocessing(i.tgt)]
print(len(dev_dict))

547


In [0]:
# Calculate pn: modified precision scores for a set of ngrams.
from collections import Counter
from nltk import ngrams
def pn(src, tgt_list, n):
    counts = Counter(ngrams(src,n))
    max_counts = {}
    for i in tgt_list:
        reference_counts = Counter(ngrams(i,n))
        for ngram in counts:
            max_counts[ngram] = max(max_counts.get(ngram, 0), reference_counts[ngram])

    clipped_counts = {ngram: min(count, max_counts[ngram]) for ngram, count in counts.items()}
    return sum(clipped_counts.values())/max(1, sum(counts.values())) 

In [0]:
# Calculate the Brevity Penalty
import math
def brevity_penalty(tgt_list, src_len):
    ref_lens = [len(i) for i in tgt_list]
    closest_ref_len = min(ref_lens, key = lambda ref_len: (abs(ref_len-src_len), ref_len))
    if closest_ref_len <= src_len:
        return 1
    else:
        return math.exp(1-closest_ref_len/src_len)

In [0]:
# Calculate BLEU4
def BLEU4(src, tgt_list):
    p1 = pn(src, tgt_list, 1)
    p2 = pn(src, tgt_list, 2)
    p3 = pn(src, tgt_list, 3)
    p4 = pn(src, tgt_list, 4)
    if p1 == 0:
        avg = 0
    elif p2 == 0:
        avg = p1
    elif p3 == 0:
        avg = (p1*p2)**0.5
    elif p4 == 0:
        avg = (p1*p1*p3)**(1/3)
    else:
        avg = (p1*p1*p3*p4)**(1/4)
    return avg*brevity_penalty(tgt_list, len(src))

In [0]:
# from nltk.translate.bleu_score import sentence_bleu
# sentence_bleu(value, candidate)

In [0]:
bleu_greedy = []
for key, value in dev_dict.items():
    candidate = preprocessing(evaluate(encoder, decoder, key))
    bleu_greedy.append(BLEU4(candidate, value))
avg_bleu_greedy = sum(bleu_greedy)/len(bleu_greedy)

In [29]:
avg_bleu_greedy

0.4393657025745546

In [0]:
bleu_beam = []
for key, value in dev_dict.items():
    candidates = evaluate_beam_search(encoder, decoder, key)
    max_bleu = 0
    for ican in candidates:
        max_bleu = max(max_bleu, BLEU4(preprocessing(ican), value))
    bleu_beam.append(max_bleu)
avg_bleu_beam = sum(bleu_beam)/len(bleu_beam)

In [39]:
avg_bleu_beam

0.45755794719648657

## error analysis

In [33]:
key

('name[Wildwood]',
 'eatType[coffee shop]',
 'food[English]',
 'priceRange[moderate]',
 'customer rating[3 out of 5]',
 'near[Ranch]')

In [34]:
test_name = ('name[Jings Malatang]','eatType[coffee shop]', 'priceRange[high]', 'area[city centre]', 'near[The Sorrento]')
candidate = evaluate(encoder, decoder, test_name)
print(" ".join(candidate))
candidates = evaluate_beam_search(encoder, decoder, test_name)
for ican in candidates:
    print(" ".join(ican))

<sos> The Mill is a coffee shop located near The near The coffee shop in the in the <eos>
<sos> The Sorrento is a coffee shop that serves Indian food. <eos>
<sos> Near The Mill is a coffee shop that <eos>
<sos> The Sorrento is a coffee shop that serves Indian food <eos>


In [42]:
i = 0
for key, value in dev_dict.items():
    i+=1
    if i%20!=0:continue
    print(i,'\n')
    print(key)
    candidate = evaluate(encoder, decoder, key)
    print("GREEDY PREDICT:")
    print(" ".join(candidate))
    print(BLEU4(candidate, value))
    print("\n")
    print("BEAM PREDICT:")
    candidates = evaluate_beam_search(encoder, decoder, key)
    for ican in candidates:
        print(" ".join(ican))
        print( BLEU4(preprocessing(ican), value))
    # print("\n REFERENCE:")
    # for ival in value:
    #     print(" ".join(ival))
    #print(BLEU4(candidate, value),"\n")
   

20 

('name[Aromi]', 'eatType[coffee shop]', 'food[Chinese]', 'customer rating[low]', 'area[city centre]', 'familyFriendly[no]')
GREEDY PREDICT:
<sos> Aromi is a coffee shop in the city centre. It city centre. It is not family-friendly. It is not family-friendly. <eos>
0.30051459922164164


BEAM PREDICT:
<sos> Aromi is a coffee shop serving Chinese food is not family friendly. <eos>
0.5890769257099558
<sos> Aromi is a coffee shop serving Chinese food. <eos>
0.33483239880625626
<sos> Aromi in the coffee shop serving Chinese food. <eos>
0.3277793447161845
40 

('name[Bibimbap House]', 'area[riverside]', 'near[Café Sicilia]')
GREEDY PREDICT:
<sos> Near the riverside near the Café is called The Dumpling Tree is a family friendly family friendly coffee shop <eos>
0.2760262237369417


BEAM PREDICT:
<sos> Near the riverside is a family friendly restaurant. <eos>
0.44153854132913556
<sos> Near the riverside is a family friendly restaurant called Clowns is located near Clare Hall. <eos>
0.30254

In [36]:
import numpy as np
np.shape(data_train.examples)

(42037,)

In [37]:
for i in range(1000):
    if data_train.examples[i].src[0] == 'name[The Eagle]':
        print(data_train.examples[i].src)

['name[The Eagle]', 'eatType[coffee shop]', 'food[Japanese]', 'priceRange[less than £20]', 'customer rating[low]', 'area[riverside]', 'familyFriendly[yes]', 'near[Burger King]']
['name[The Eagle]', 'food[Chinese]', 'customer rating[1 out of 5]']
['name[The Eagle]', 'eatType[coffee shop]', 'food[French]', 'priceRange[more than £30]', 'customer rating[5 out of 5]', 'area[riverside]', 'familyFriendly[yes]', 'near[Burger King]']
['name[The Eagle]', 'eatType[coffee shop]', 'food[French]', 'priceRange[more than £30]', 'customer rating[low]', 'area[riverside]', 'familyFriendly[yes]', 'near[Burger King]']
['name[The Eagle]', 'eatType[coffee shop]', 'food[Italian]', 'priceRange[high]', 'customer rating[average]', 'area[riverside]', 'familyFriendly[yes]', 'near[Burger King]']
['name[The Eagle]', 'food[English]', 'customer rating[5 out of 5]']
['name[The Eagle]', 'eatType[coffee shop]', 'food[French]', 'priceRange[cheap]', 'customer rating[5 out of 5]', 'area[city centre]', 'familyFriendly[no]', 