# Assignment 3



# Question 2.1. Train the sequence to sequence model (Model 1) provided for a language pair where the output is English and the input is a language of your choice.

This code translates Italian to English.

We followed a tutorial by Sean Robertson at

https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb
 

In [1]:
%matplotlib inline
#import matplotlib.pyplot as plt

In this project we will be teaching a neural network to translate from
Italian to English.

This is made possible by the simple but powerful idea of the `sequence
to sequence network <https://arxiv.org/abs/1409.3215>`__, in which two
recurrent neural networks work together to transform one sequence to
another. An encoder network condenses an input sequence into a vector,
and a decoder network unfolds that vector into a new sequence.

To improve upon this model we'll use an `attention
mechanism <https://arxiv.org/abs/1409.0473>`__, which lets the decoder
learn to focus over a specific range of the input sequence.



In [2]:
from __future__ import unicode_literals, print_function, division
from io import open
import unicodedata
import string
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load data files

Similar to the character encoding used in the character-level RNN
tutorials, we will be representing each word in a language as a one-hot
vector, or giant vector of zeros except for a single one (at the index
of the word).

We'll need a unique index per word to use as the inputs and targets of
the networks later. To keep track of all this we will use a helper class
called ``Lang`` which has word → index (``word2index``) and index → word
(``index2word``) dictionaries, as well as a count of each word
(``word2count``), which can be used later to replace rare words.




In [3]:
SOS_token = 0
EOS_token = 1

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1
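
As a rough illustration of the bookkeeping described above, the following standalone sketch builds the same kind of word/index dictionaries with plain Python and turns an index into a one-hot list. The names `build_vocab` and `one_hot` are hypothetical helpers for this example only; they are not part of the notebook.

```python
def build_vocab(sentences):
    """Give each new word the next free index, after SOS (0) and EOS (1)."""
    word2index = {}
    index2word = {0: "SOS", 1: "EOS"}
    word2count = {}
    for sentence in sentences:
        for word in sentence.split(' '):
            if word not in word2index:
                idx = len(index2word)
                word2index[word] = idx
                index2word[idx] = word
                word2count[word] = 1
            else:
                word2count[word] += 1
    return word2index, index2word, word2count


def one_hot(index, size):
    """Return a list of zeros with a single one at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


word2index, index2word, word2count = build_vocab(["io sono alto", "io sono tom"])
n_words = len(index2word)
print(word2index)                            # {'io': 2, 'sono': 3, 'alto': 4, 'tom': 5}
print(one_hot(word2index["alto"], n_words))  # [0, 0, 0, 0, 1, 0]
```

In the networks below the one-hot vector is never built explicitly; the word index is passed to an embedding layer, which amounts to selecting one row of a weight matrix.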

The files are all in Unicode. To simplify, we will turn Unicode
characters to ASCII, make everything lowercase, and trim most
punctuation.




In [4]:
# Turn a Unicode string to plain ASCII, thanks to
# https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters

def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s
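
To see what the normalization does to a concrete sentence, here is a small self-contained sketch. The two helpers are repeated (under snake_case names chosen for this example) so the snippet runs on its own; the sample sentences are made up.

```python
import re
import unicodedata


def unicode_to_ascii(s):
    # Decompose accented characters and drop the combining marks
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )


def normalize_string(s):
    s = unicode_to_ascii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)      # put a space before . ! ?
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)  # collapse everything else to one space
    return s


print(normalize_string("Perch\u00e9 hai mentito?"))  # perche hai mentito ?
print(normalize_string("  Ciao!  "))                 # ciao !
```

Note that in this notebook `readLangs` already removes `string.punctuation` from the raw text before `normalizeString` is called, so the punctuation handling shown here mostly goes unused.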

In [7]:
#from google.colab import files
#uploaded = files.upload()

#### here choose eng-ita.txt for full
#### choose eng-itS.txt for small

To read the data file we will split the file into lines, and then split
lines into pairs. The files are all English → Other Language, so to
translate from Other Language → English we pass the ``reverse`` flag,
which swaps each pair.




In [8]:
def readLangs(lang1, lang2, reverse = False):
    print("Reading lines...")

    # Read the file and split into lines

    #lines = open('data/%s-%s.txt' % (lang1, lang2), encoding='utf-8').\
    lines = open('%s-%s.txt' % (lang1, lang2), encoding='utf-8').read().strip().\
    translate(str.maketrans('', '', string.punctuation)).split('\n')

    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]

    # Reverse pairs, make Lang instances
    if reverse:
        print("Reverse is true")
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        pairs = [list(p) for p in pairs]
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)

    return input_lang, output_lang, pairs
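
The parsing and reversing steps can be tried on an in-memory string instead of a file. This is a minimal sketch under that assumption; `read_pairs` and the two sample lines are hypothetical and only mirror the tab-separated layout described above.

```python
import string


def read_pairs(text, reverse=False):
    """Split raw text into [source, target] pairs; optionally swap the columns."""
    # Tabs and newlines are not in string.punctuation, so the layout survives
    cleaned = text.strip().translate(str.maketrans('', '', string.punctuation))
    pairs = [line.lower().split('\t') for line in cleaned.split('\n')]
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
    return pairs


sample = "I'm tall.\tSono alto.\nSay goodbye.\tSaluta."
pairs_it_en = read_pairs(sample, reverse=True)
print(pairs_it_en[0])  # ['sono alto', 'im tall']
```

Stripping punctuation this way also removes apostrophes, which is why contractions appear as `im` or `cant` in the outputs further down.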

Since there are a *lot* of example sentences and we want to train
something quickly, we trim the data set to relatively short sentences:
both sides of a pair must have fewer than ``MAX_LENGTH`` = 10 words.
The tutorial's extra filter, which keeps only sentences that translate
to the form "I am" or "He is" etc. (accounting for apostrophes replaced
earlier), is commented out in ``filterPair``, so ``eng_prefixes`` is
unused here.




In [9]:
MAX_LENGTH = 10

eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s ",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)

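def filterPair(p):
    # Keep pairs where both sentences have fewer than MAX_LENGTH words.
    # The eng_prefixes filter from the tutorial is disabled:
    #     (p[1].startswith(eng_prefixes) or p[0].startswith(eng_prefixes))
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH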

def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]
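
Because the comparison is a strict `<`, the longest sentence that survives has nine words, not ten. The sketch below checks this on synthetic sentences; `filter_pair` is a stand-in written for this example.

```python
MAX_LENGTH = 10


def filter_pair(pair, max_length=MAX_LENGTH):
    # Keep the pair only if both sides have fewer than max_length tokens
    return all(len(side.split(' ')) < max_length for side in pair)


nine = ' '.join(['w'] * 9)
ten = ' '.join(['w'] * 10)
print(filter_pair([nine, nine]))  # True
print(filter_pair([nine, ten]))   # False
```

This leaves room for the EOS token that is appended later: nine words plus EOS fill exactly the `MAX_LENGTH` rows that `train` allocates for the encoder outputs.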

The full process for preparing the data is:

-  Read text file and split into lines, split lines into pairs
-  Normalize text, filter by length
-  Make word lists from sentences in pairs




In [10]:
def prepareData(lang1, lang2, reverse=False):
  
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    print("Read %s sentence pairs" % len(pairs))

    #### remove call to filterPairs() when do final training
    pairs = filterPairs(pairs)
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs


In [11]:
# PREPARE DATA TO TRANSLATE FROM ITALIAN TO ENGLISH

#### change language here to ita from itS when do final run
input_lang, output_lang, pairs = prepareData('eng', 'ita', True)

print(random.choice(pairs))

Reading lines...
Reverse is true
Read 331799 sentence pairs
Trimmed to 314351 sentence pairs
Counting words...
Counted words:
ita 25472
eng 12427
['qual era il problema di tom a riguardo', 'what was toms problem with that']


The Seq2Seq Model
=================

A Recurrent Neural Network, or RNN, is a network that operates on a
sequence and uses its own output as input for subsequent steps.

A `Sequence to Sequence network <https://arxiv.org/abs/1409.3215>`__, or
seq2seq network, or `Encoder Decoder
network <https://arxiv.org/pdf/1406.1078v3.pdf>`__, is a model
consisting of two RNNs called the encoder and decoder. The encoder reads
an input sequence and outputs a single vector, and the decoder reads
that vector to produce an output sequence.

Unlike sequence prediction with a single RNN, where every input
corresponds to an output, the seq2seq model frees us from sequence
length and order, which makes it ideal for translation between two
languages.

With a seq2seq model the encoder creates a single vector which, in the
ideal case, encodes the "meaning" of the input sequence as a single
point in some N-dimensional space of sentences.




The Encoder
-----------

The encoder of a seq2seq network is an RNN that outputs some value for
every word from the input sentence. For every input word the encoder
outputs a vector and a hidden state, and uses the hidden state for the
next input word.



In [12]:
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        
#        if glove_model == True:
#          self.embedding.weight.data.copy_(torch.from_numpy(weights_matrix))
    

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden      

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

The Decoder
-----------

The decoder is another RNN that takes the encoder output vector(s) and
outputs a sequence of words to create the translation.




Simple Decoder
^^^^^^^^^^^^^^

In the simplest seq2seq decoder we use only the last output of the encoder.
This last output is sometimes called the *context vector* as it encodes
context from the entire sequence. This context vector is used as the
initial hidden state of the decoder.

At every step of decoding, the decoder is given an input token and
hidden state. The initial input token is the start-of-string ``<SOS>``
token, and the first hidden state is the context vector (the encoder's
last hidden state).






In [13]:
class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        output = self.embedding(input).view(1, 1, -1)
        output = F.relu(output)
        output, hidden = self.gru(output, hidden)
        output = self.softmax(self.out(output[0]))
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

I encourage you to train and observe the results of this model, but to
save space we'll be going straight for the gold and introducing the
Attention Mechanism.




Attention Decoder
^^^^^^^^^^^^^^^^^

If only the context vector is passed between the encoder and decoder,
that single vector carries the burden of encoding the entire sentence.

Attention allows the decoder network to "focus" on a different part of
the encoder's outputs for every step of the decoder's own outputs. First
we calculate a set of *attention weights*. These will be multiplied by
the encoder output vectors to create a weighted combination. The result
(called ``attn_applied`` in the code) should contain information about
that specific part of the input sequence, and thus help the decoder
choose the right output words.

.. figure:: https://i.imgur.com/1152PYf.png
   :alt:

Calculating the attention weights is done with another feed-forward
layer ``attn``, using the decoder's input and hidden state as inputs.
Because there are sentences of all sizes in the training data, to
actually create and train this layer we have to choose a maximum
sentence length (input length, for encoder outputs) that it can apply
to. Sentences of the maximum length will use all the attention weights,
while shorter sentences will only use the first few.






In [14]:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)
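
The core arithmetic of the attention step (softmax over scores, then a weighted sum of encoder outputs) can be followed without PyTorch. The sketch below uses plain lists and made-up numbers; `softmax` and `apply_attention` are illustrative helpers, and `scores` merely stands in for whatever the `attn` layer would produce.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_attention(scores, encoder_outputs):
    """Weighted sum of encoder output vectors, using softmax(scores) as weights."""
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    context = [
        sum(w * vec[d] for w, vec in zip(weights, encoder_outputs))
        for d in range(dim)
    ]
    return weights, context


# Three encoder positions, each a 2-dimensional output vector (made-up numbers)
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.0, 0.0]  # stand-in for the output of the `attn` layer
weights, context = apply_attention(scores, encoder_outputs)
print(weights)
print(context)
```

In `AttnDecoderRNN` the same weighted sum is computed by `torch.bmm`, with one weight per encoder position up to `max_length`.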

<div class="alert alert-info"><h4>Note</h4><p>There are other forms of attention that work around the length
  limitation by using a relative position approach. Read about "local
  attention" in `Effective Approaches to Attention-based Neural Machine
  Translation <https://arxiv.org/abs/1508.04025>`__.</p></div>

Training
========

Preparing Training Data
-----------------------

To train, for each pair we will need an input tensor (indexes of the
words in the input sentence) and target tensor (indexes of the words in
the target sentence). While creating these vectors we will append the
EOS token to both sequences.




In [15]:
def indexesFromSentence(lang, sentence):
    return [lang.word2index[word] for word in sentence.split(' ')]


def tensorFromSentence(lang, sentence):
    indexes = indexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)


def tensorsFromPair(pair):
    input_tensor = tensorFromSentence(input_lang, pair[0])
    target_tensor = tensorFromSentence(output_lang, pair[1])
    return (input_tensor, target_tensor)
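
Before the index list is wrapped in a tensor, it is just a list of integers ending in the EOS index. A minimal sketch with a toy vocabulary (the dictionary and the helper name `indexes_with_eos` are assumptions for this example):

```python
EOS_token = 1


def indexes_from_sentence(word2index, sentence):
    return [word2index[word] for word in sentence.split(' ')]


def indexes_with_eos(word2index, sentence):
    # Append EOS so the model can learn where the sentence ends
    return indexes_from_sentence(word2index, sentence) + [EOS_token]


word2index = {"io": 2, "sono": 3, "alto": 4}  # toy vocabulary
print(indexes_with_eos(word2index, "io sono alto"))  # [2, 3, 4, 1]
```

As in the notebook's `indexesFromSentence`, a word that is missing from the vocabulary raises a `KeyError`, so sentences passed to `evaluate` should only contain words seen during data preparation.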

Training the Model
------------------

To train we run the input sentence through the encoder, and keep track
of every output and the latest hidden state. Then the decoder is given
the ``<SOS>`` token as its first input, and the last hidden state of the
encoder as its first hidden state.

"Teacher forcing" is the concept of using the real target outputs as
each next input, instead of using the decoder's guess as the next input.
Using teacher forcing causes it to converge faster but `when the trained
network is exploited, it may exhibit
instability <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.378.4095&rep=rep1&type=pdf>`__.

You can observe outputs of teacher-forced networks that read with
coherent grammar but wander far from the correct translation -
intuitively it has learned to represent the output grammar and can "pick
up" the meaning once the teacher tells it the first few words, but it
has not properly learned how to create the sentence from the translation
in the first place.

Because of the freedom PyTorch's autograd gives us, we can randomly
choose to use teacher forcing or not with a simple if statement. Turn
``teacher_forcing_ratio`` up to use more of it.




In [16]:
teacher_forcing_ratio = 0
#### set to 0 when run final training


def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length=MAX_LENGTH):
    encoder_hidden = encoder.initHidden()

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    loss = 0

    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(
            input_tensor[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0, 0]

    decoder_input = torch.tensor([[SOS_token]], device=device)

    decoder_hidden = encoder_hidden

    use_teacher_forcing = random.random() < teacher_forcing_ratio

    if use_teacher_forcing:
        # Teacher forcing: Feed the target as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]  # Teacher forcing

    else:
        # Without teacher forcing: use its own predictions as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()  # detach from history as input

            loss += criterion(decoder_output, target_tensor[di])
            if decoder_input.item() == EOS_token:
                break

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length
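
The teacher-forcing switch in `train` comes down to one random decision per training example and one choice of next input per decoding step. The following sketch isolates those two pieces; both function names are hypothetical.

```python
import random


def decide_teacher_forcing(ratio, rng=random):
    # One decision per training example, as in `train`
    return rng.random() < ratio


def next_decoder_input(target_token, predicted_token, use_teacher_forcing):
    """Pick what the decoder sees at the next step."""
    return target_token if use_teacher_forcing else predicted_token


rng = random.Random(0)
always = [decide_teacher_forcing(1.0, rng) for _ in range(100)]
never = [decide_teacher_forcing(0.0, rng) for _ in range(100)]
print(all(always), any(never))  # True False
print(next_decoder_input(7, 9, use_teacher_forcing=False))  # 9
```

With `teacher_forcing_ratio = 0`, as set in this notebook, the decision is always `False`, so the decoder is trained on its own predictions throughout.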

This is a helper function to print time elapsed and estimated time
remaining given the current time and progress %.




In [17]:
import time
import math


def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))
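
The estimate is simple proportional arithmetic: the projected total is the elapsed time divided by the fraction done, and the remaining time is that total minus the elapsed time. A small worked example, with the helper `remaining_seconds` introduced here for illustration:

```python
import math


def as_minutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def remaining_seconds(elapsed, fraction_done):
    # Estimated total = elapsed / fraction_done; remaining = total - elapsed
    return elapsed / fraction_done - elapsed


elapsed = 1638  # seconds, i.e. 27m 18s
print(as_minutes(elapsed), as_minutes(remaining_seconds(elapsed, 0.5)))  # 27m 18s 27m 18s
```

Halfway through, the remaining time equals the elapsed time, which is the pattern visible in the 50% line of the training log.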

The whole training process looks like this:

-  Start a timer
-  Initialize optimizers and criterion
-  Create set of training pairs
-  Start empty losses array for plotting

Then we call ``train`` many times and occasionally print the progress (%
of examples, time so far, estimated time) and average loss.




In [18]:
def trainIters(encoder, decoder, n_iters, print_every=1000, plot_every=100, learning_rate=0.01):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
    training_pairs = [tensorsFromPair(random.choice(pairs))
                      for i in range(n_iters)]
    criterion = nn.NLLLoss()

    for iter in range(1, n_iters + 1):
        training_pair = training_pairs[iter - 1]
        input_tensor = training_pair[0]
        target_tensor = training_pair[1]

        loss = train(input_tensor, target_tensor, encoder,
                     decoder, encoder_optimizer, decoder_optimizer, criterion)
        print_loss_total += loss
        plot_loss_total += loss

        if iter % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            print('%s (%d %d%%) %.4f' % (timeSince(start, iter / n_iters),
                                         iter, iter / n_iters * 100, print_loss_avg))

        if iter % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0

    showPlot(plot_losses)

Plotting results
----------------

Plotting is done with matplotlib, using the array of loss values
``plot_losses`` saved while training.




In [19]:
import matplotlib.pyplot as plt
plt.switch_backend('agg')
import matplotlib.ticker as ticker
import numpy as np


def showPlot(points):
    # plt.subplots() already creates the figure, so no separate plt.figure()
    fig, ax = plt.subplots()
    # this locator puts ticks at regular intervals
    loc = ticker.MultipleLocator(base=0.2)
    ax.yaxis.set_major_locator(loc)
    plt.plot(points)

Evaluation
==========

Evaluation is mostly the same as training, but there are no targets so
we simply feed the decoder's predictions back to itself for each step.
Every time it predicts a word we add it to the output string, and if it
predicts the EOS token we stop there. We also store the decoder's
attention outputs for display later.




In [20]:
def evaluate(encoder, decoder, sentence, max_length=MAX_LENGTH):
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_length = input_tensor.size()[0]
        encoder_hidden = encoder.initHidden()

        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei],
                                                     encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        decoder_hidden = encoder_hidden

        decoded_words = []
        decoder_attentions = torch.zeros(max_length, max_length)

        for di in range(max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            decoder_attentions[di] = decoder_attention.data
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            decoder_input = topi.squeeze().detach()

        return decoded_words, decoder_attentions[:di + 1]
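
Stripped of the network, the evaluation loop is greedy decoding: feed each prediction back in and stop at EOS or after `max_length` steps. The sketch below replaces the decoder with a toy lookup table; `greedy_decode` and `transitions` are assumptions made for this example.

```python
EOS_token = 1


def greedy_decode(step_fn, start_token, max_length):
    """Feed each prediction back in until EOS or max_length steps."""
    tokens = []
    current = start_token
    for _ in range(max_length):
        current = step_fn(current)
        tokens.append(current)
        if current == EOS_token:
            break
    return tokens


# Toy "decoder": a fixed lookup table from the current token to the next one
transitions = {0: 5, 5: 7, 7: 1}
print(greedy_decode(lambda tok: transitions[tok], start_token=0, max_length=10))  # [5, 7, 1]
print(greedy_decode(lambda tok: transitions[tok], start_token=0, max_length=2))   # [5, 7]
```

The second call shows the length cap: if EOS is not produced within `max_length` steps, the output is simply cut off.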

We can evaluate random sentences from the training set and print out the
input, target, and output to make some subjective quality judgements:




In [21]:
def evaluateRandomly(encoder, decoder, n=10):
    for i in range(n):
        pair = random.choice(pairs)
        print('>', pair[0])
        print('=', pair[1])
        output_words, attentions = evaluate(encoder, decoder, pair[0])
        output_sentence = ' '.join(output_words)
        print('<', output_sentence)
        print('')

Training and Evaluating
=======================

With all these helper functions in place (it looks like extra work, but
it makes it easier to run multiple experiments) we can actually
initialize a network and start training.

Remember that here the input sentences were filtered only by length,
which leaves 314,351 pairs. We still use a relatively small network of
256 hidden nodes and a single GRU layer. The run of 250,000 iterations
recorded in the log took about 55 minutes and gives some reasonable
results.

.. Note::
   If you run this notebook you can train, interrupt the kernel,
   evaluate, and continue training later. Comment out the lines where the
   encoder and decoder are initialized and run ``trainIters`` again.




In [22]:
hidden_size = 256
glove_model = False
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)


####
#trainIters(encoder1, attn_decoder1, 10000, print_every=1000)
trainIters(encoder1, attn_decoder1, 250000, print_every=25000)

5m 17s (- 47m 34s) (25000 10%) 3.6881
10m 42s (- 42m 49s) (50000 20%) 3.0849
16m 11s (- 37m 47s) (75000 30%) 2.7710
21m 44s (- 32m 36s) (100000 40%) 2.5695
27m 18s (- 27m 18s) (125000 50%) 2.4404
32m 53s (- 21m 55s) (150000 60%) 2.3408
38m 29s (- 16m 29s) (175000 70%) 2.2891
44m 6s (- 11m 1s) (200000 80%) 2.1983
49m 44s (- 5m 31s) (225000 90%) 2.1492
55m 22s (- 0m 0s) (250000 100%) 2.1452


In [23]:
evaluateRandomly(encoder1, attn_decoder1)

> lascio casa
= he left home
< i left <EOS>

> perche hai mentito a riguardo
= why did you lie about that
< why did you lie to that <EOS>

> lei resta in contatto con lui
= she stays in touch with him
< she him him with touch touch with <EOS>

> voi avete un ottimo alibi
= you have a great alibi
< youve have have alibi alibi <EOS>

> e sabato
= its saturday
< she is <EOS>

> saluta
= say goodbye
< write it <EOS>

> chi pensi che siamo
= who do you think we are
< who do you think <EOS>

> mi sono divertita a giocare a carte con tom
= i had fun playing cards with tom
< i took good with with with tom <EOS>

> vi siete decisi
= did you make up your mind
< you you up <EOS>

> io sono il piu alto
= im the tallest
< im the tallest <EOS>



Visualizing Attention
---------------------

A useful property of the attention mechanism is its highly interpretable
outputs. Because it is used to weight specific encoder outputs of the
input sequence, we can imagine looking where the network is focused most
at each time step.

You could simply run ``plt.matshow(attentions)`` to see attention output
displayed as a matrix, with the columns being input steps and rows being
output steps:




In [24]:
output_words, attentions = evaluate(
    encoder1, attn_decoder1, "non posso aspettarla")
output_words
#plt.matshow(attentions.numpy())

['i', 'cant', 'stand', 'it', '<EOS>']

For a better viewing experience we will do the extra work of adding axes
and labels:




In [25]:
def showAttention(input_sentence, output_words, attentions):
    # Set up figure with colorbar
    fig = plt.figure()
    ax = fig.add_subplot(111)
    cax = ax.matshow(attentions.numpy(), cmap='bone')
    fig.colorbar(cax)

    # Set up axes
    ax.set_xticklabels([''] + input_sentence.split(' ') +
                       ['<EOS>'], rotation=90)
    ax.set_yticklabels([''] + output_words)

    # Show label at every tick
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
    #### remove to show plots
    #plt.show()




In [26]:
def evaluateAndShowAttention(input_sentence):
    input_sentenceR = input_sentence.translate(str.maketrans('', '', string.punctuation)).lower()
    output_words, attentions = evaluate(
        encoder1, attn_decoder1, input_sentenceR)
    print('input =', input_sentence)
    print('output =', ' '.join(output_words))
    showAttention(input_sentence, output_words, attentions)


In [27]:
# EVALUATE ON TEST SENTENCES

evaluateAndShowAttention("Sono grosso e grasso.")
evaluateAndShowAttention("Tu stai dormendo.")
evaluateAndShowAttention("Io sono piu alto di lui.")
evaluateAndShowAttention("Siamo attori famosi.")
evaluateAndShowAttention("Tu starai meglio con me.")


input = Sono grosso e grasso.
output = im is fat <EOS>
input = Tu stai dormendo.
output = youre you sleeping <EOS>
input = Io sono piu alto di lui.
output = im taller than than him <EOS>
input = Siamo attori famosi.
output = were lucky <EOS>
input = Tu starai meglio con me.
output = you you better better with me <EOS>


# Question 2.2. Now train another model (Model 2) for the reverse (i.e., from English to the language you chose). In this model, use the GloVe 100 dimensional embeddings (see notebook 4, cell 2 for an example) while training.

# PREPARE DATA TO TRANSLATE FROM ENGLISH TO ITALIAN

#### change language here to ita from itS when do final run
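input_lang2, output_lang2, pairs = prepareData('eng', 'ita', False)
# The helper functions (tensorsFromPair, evaluate) read the global
# input_lang/output_lang, so rebind them to the new direction.
input_lang, output_lang = input_lang2, output_lang2
print(random.choice(pairs))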

# First let's try the existing model on English to Italian translations
# TRAIN MODEL

# MODEL ARCHITECTURE
hidden_size = 256
encoder1 = EncoderRNN(input_lang2.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang2.n_words, dropout_p=0.1).to(device)

# TRAIN MODEL
#trainIters(encoder1, attn_decoder1, 75000, print_every=5000)
trainIters(encoder1, attn_decoder1, 750, print_every=100)

# EVALUATE
evaluateRandomly(encoder1, attn_decoder1)

# TEST ON FIVE SENTENCES
evaluateAndShowAttention("I am walking to the store.")
evaluateAndShowAttention("This dinner is delicious.")
evaluateAndShowAttention("My mother lived in a white house.")
evaluateAndShowAttention("He has three friends.")
evaluateAndShowAttention("I love a good red wine.")


print(input_lang2.n_words)
print(hidden_size)

# USING GloVe 100
Now let's figure out how to change the input to the encoder. In the first part, we developed the encodings from our own vocabulary using one-hot encoding (word indexes fed to a randomly initialized embedding layer). Here, we will use the GloVe embeddings as our initial weights.

We followed this tutorial and article on loading and using the GloVe embeddings:

https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/B%20-%20A%20Closer%20Look%20at%20Word%20Embeddings.ipynb

https://medium.com/@martinpella/how-to-use-pre-trained-word-embeddings-in-pytorch-71ca59249f76

In [28]:
import torch
from torchtext import data
from torchtext import datasets
import random


In [29]:


# PREPARE DATA TO TRANSLATE FROM ENGLISH TO ITALIAN (set to false, so english is lang 1)

#### change language here to ita from itS when do final run
input_lang, output_lang, pairs = prepareData('eng', 'ita', False)
print(random.choice(pairs))

Reading lines...
Read 331799 sentence pairs
Trimmed to 314351 sentence pairs
Counting words...
Counted words:
eng 12427
ita 25472
['they named their dog lucky', 'loro chiamarono il loro cane lucky']


In [30]:

import torchtext.vocab

glove = torchtext.vocab.GloVe(name = '6B', dim = 100)

print(f'There are {len(glove.itos)} words in the vocabulary')

There are 400000 words in the vocabulary


The returned GloVe object includes these attributes:

-  ``stoi`` (string-to-index): a dictionary mapping words to indexes
-  ``itos`` (index-to-string): a list of words ordered by index
-  ``vectors``: the actual vectors. To get a word's vector, look up its index in ``stoi`` and use that index on ``vectors``:

In [31]:
# dimensions of vectors
glove.vectors.shape

torch.Size([400000, 100])

In [32]:
def get_embedding(word):
    return glove.vectors[glove.stoi[word]]
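
The relationship between the three attributes can be mimicked with ordinary Python containers. The values below are toy stand-ins, not real GloVe data, and `get_toy_embedding` is a hypothetical counterpart of `get_embedding`.

```python
# Toy stand-ins for glove.itos, glove.stoi and glove.vectors
itos = ['the', 'horse', 'to']
stoi = {word: index for index, word in enumerate(itos)}
vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]


def get_toy_embedding(word):
    # Raises KeyError for out-of-vocabulary words, like any dict lookup
    return vectors[stoi[word]]


print(stoi['horse'])               # 1
print(itos[stoi['horse']])         # horse
print(get_toy_embedding('horse'))  # [0.3, 0.4]
```

The `KeyError` on unknown words is what the weight-matrix loop later relies on to detect words that have no pre-trained vector.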

In [33]:
# top 10 words
glove.itos[:10]

['the', ',', '.', 'of', 'to', 'and', 'in', 'a', '"', "'s"]

In [34]:
glove.stoi['horse']

2867

In [35]:
get_embedding('horse')

tensor([ 0.5179,  0.3962,  0.2472, -0.5844, -0.1586, -0.1102,  0.2165,  0.4236,
        -0.3006, -0.6232, -0.3979,  0.8369,  0.9012, -0.5920, -0.4431,  0.5162,
         0.4842, -0.1131,  0.2091, -0.0192,  0.7597, -0.0308, -0.0892, -1.2704,
         0.6240,  0.5687, -1.0596, -0.3601,  0.2804,  0.0514, -0.4561, -0.3268,
         0.3975, -0.3634,  0.9031,  0.6263, -1.2037,  0.5927,  0.2759,  0.3832,
         0.4349,  0.0909, -0.0571, -1.3353, -0.1076, -0.0985, -0.4497, -0.5063,
         0.4927, -1.0391, -0.6060,  0.0519,  0.5007,  0.6970,  0.2069, -1.4953,
        -0.4424,  0.6916,  1.0389,  0.1846,  0.4320,  0.9502,  0.5237,  1.2069,
         1.1042,  0.5904, -0.5393, -0.1903,  0.4055,  0.5285, -0.3223,  0.0379,
         0.0969,  0.3456,  0.3104,  0.1461,  0.2462,  0.1726, -0.5111, -0.4004,
         0.3375,  0.2234, -0.0548,  0.4452, -0.1311, -0.2883, -0.9907, -0.2414,
         0.3735, -0.2088, -0.8067, -1.0341,  0.2646,  0.4033, -0.5369, -0.5935,
        -0.5144, -0.1386,  0.9348, -0.40

In [36]:
# Get a list of all english words in our dataset
print("Number of pairs, or translated phrases, is:", len(pairs))
eng_words = [el[0] for el in pairs]
a = []
for word in eng_words:
  a += word.split()
eng_vocabulary = list(set(a))
print('Number of unique English words in our corpus:', len(eng_vocabulary))
print('Some examples are', eng_vocabulary[3000:3500])




Number of pairs, or translated phrases, is: 314351
Number of unique English words in our corpus: 12424
Some examples are ['bred', 'expressway', 'tune', 'failure', 'deed', 'gets', 'barbarians', 'hibernate', 'acclaimed', 'mind', 'luxury', 'apes', 'walked', 'buddhist', 'minimum', 'finding', 'molecule', 'quoted', 'cards', 'experimented', 'foretell', 'breed', 'buries', 'tied', 'slaughter', 'xiaoli', 'championship', 'killers', 'grilled', 'deadend', 'perspiring', 'status', 'pepperoni', 'unsettled', 'spies', 'frighten', 'absurdly', 'devious', 'administrative', 'father', 'criticisms', 'abolished', 'reader', 'professors', 'september', 'bookworm', 'collapse', 'shiro', 'piling', 'restricted', 'tickle', 'nobel', 'refreshments', 'bookshelf', 'tvs', 'protege', 'slaves', 'crime', 'johns', 'faintly', 'baptized', 'gymnast', 'whackjob', 'specialties', 'photocopier', 'kissing', 'mortal', 'wants', 'insolent', 'bungee', 'negotiable', 'peeked', 'body', 'sympathetically', 'pilots', 'custom', 'companies', 'pic

We must build a matrix of weights to load into the PyTorch embedding layer. Its shape is:
(dataset’s vocabulary size, word-vector dimension).
For each word in the dataset’s vocabulary, we check whether it is in GloVe’s vocabulary. If it is, we load its pre-trained word vector. Otherwise, we initialize a random vector.
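
One detail worth making explicit is that row i of the matrix has to hold the vector for the word whose index is i in the vocabulary that feeds the embedding layer. Below is a minimal sketch of that construction, assuming a word-to-index dictionary like `Lang.word2index` and a plain dict of pre-trained vectors. The names `build_weights_matrix`, `toy_vectors`, and `toy_word2index` are hypothetical and the data is a toy stand-in, not the notebook's actual GloVe loader or corpus.

```python
import numpy as np

def build_weights_matrix(word2index, n_words, vectors, emb_dim, seed=0):
    """Return an (n_words, emb_dim) matrix whose row idx is the vector
    for the word that word2index maps to idx, plus the number of hits."""
    rng = np.random.default_rng(seed)
    weights = np.zeros((n_words, emb_dim), dtype=np.float32)
    found = 0
    for word, idx in word2index.items():
        vec = vectors.get(word)
        if vec is not None:
            # Word is in the pre-trained vocabulary: reuse its vector.
            weights[idx] = vec
            found += 1
        else:
            # Word is unknown to the pre-trained vocabulary: random init.
            weights[idx] = rng.normal(scale=0.6, size=emb_dim)
    return weights, found

# Toy stand-ins (hypothetical data, not real GloVe vectors).
toy_vectors = {"house": np.ones(4), "city": np.full(4, 2.0)}
toy_word2index = {"house": 2, "city": 3, "xiaoli": 4}  # 0 and 1 reserved for SOS/EOS

weights, found = build_weights_matrix(toy_word2index, 5, toy_vectors, emb_dim=4)
print(weights.shape, found)  # (5, 4) 2
```

Rows 0 and 1 stay at zero in this sketch because no word maps to the SOS and EOS indices. The `found` count gives a quick check of how much of the vocabulary the pre-trained vectors cover.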

In [37]:
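# Build the weight matrix from GloVe for each English word in our corpus

emb_dim = 100  # dimension of the GloVe vectors
#matrix_len = len(eng_vocabulary)
matrix_len = input_lang.n_words
weights_matrix = np.zeros((matrix_len, emb_dim))
words_found = 0

# Note: row i holds the vector for eng_vocabulary[i] (set order), which is
# not necessarily the index that input_lang.word2index assigns to that word.
for i, word in enumerate(eng_vocabulary):
    try:
        weights_matrix[i] = get_embedding(word)
        words_found += 1
    except KeyError:
        # word not in GloVe: fall back to a random vector
        weights_matrix[i] = np.random.normal(scale=0.6, size=(emb_dim, ))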
        

In [38]:
# take a look at the values we have for a couple examples, at index i = ?

i = 334
print("word is", eng_vocabulary[i])
print("index is", i)
print('weight matrix is', weights_matrix[i])

i = 1234
print("word is", eng_vocabulary[i])
print("index is", i)
print('weight matrix is', weights_matrix[i])


word is obstructed
index is 334
weight matrix is [ 0.013554   -0.043476    0.047454    0.54572999  0.022044    0.33779001
 -0.10829     0.92881     1.32959998 -0.43035999  0.32609999  0.80293
  0.22702    -0.13998     0.088069    0.0014781  -0.34150001  0.25917
 -0.059105   -0.45673001 -0.32931    -0.069961    0.17648999 -0.11162
  0.16632999 -0.097215   -0.46689999 -0.23040999  0.044397   -0.38117999
  0.93879002 -0.26333001 -0.37691     0.023119    0.059446    0.24518
 -0.011433    0.26422     0.0384      1.278      -0.84948999 -0.13229001
  0.62180001  0.46555999  0.14456999  0.2421      0.26716    -0.48725
  0.31935    -0.52974999  0.46819001 -0.063646   -0.0093993   0.53381002
 -0.20479     0.27785     1.30859995  0.31165999  0.28353    -0.44971001
 -0.41514999 -0.30500999  0.26673001  0.37079    -0.17572001 -0.49259001
  0.74541998 -0.23011     0.18179999 -0.54519999  0.099276   -0.30948001
  0.54930001 -0.72226     0.1312     -0.78991002  0.15332     0.21087
 -0.25734001 -0.2272

In [39]:
print(input_lang.n_words)
len(eng_vocabulary)

12427


12424

In [40]:
# Training

# PREPARE DATA TO TRANSLATE FROM ENGLISH TO ITALIAN (reverse flag set to False, so English is lang 1)

#### change language here from itS to ita for the final run
input_lang2, output_lang2, pairs = prepareData('eng', 'ita', False)
print(random.choice(pairs))

Reading lines...
Read 331799 sentence pairs
Trimmed to 314351 sentence pairs
Counting words...
Counted words:
eng 12427
ita 25472
['dont you like working with me', 'non vi piace lavorare con me']


In [43]:
glove_model = True
hidden_size = 100
embedding = nn.Embedding(input_lang2.n_words, hidden_size)
embedding.weight.data.copy_(torch.from_numpy(weights_matrix))
encoder2 = EncoderRNN(input_lang2.n_words, hidden_size).to(device)
attn_decoder2 = AttnDecoderRNN(hidden_size, output_lang2.n_words, dropout_p=0.1).to(device)
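
For reference, here is a minimal sketch of two common ways to load a NumPy weight matrix into an `nn.Embedding` layer. It uses a tiny toy matrix (`toy_weights`, a hypothetical stand-in for `weights_matrix`). Whether the vectors stay trainable is a design choice; this sketch keeps them trainable.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical 3-word, 4-dimensional weight matrix (NumPy defaults to float64).
toy_weights = np.arange(12, dtype=np.float64).reshape(3, 4)

# Option A: copy into an existing layer, mirroring the copy_ call above.
emb_a = nn.Embedding(3, 4)
emb_a.weight.data.copy_(torch.from_numpy(toy_weights))  # copy_ casts to float32

# Option B: build the layer directly from the tensor.
# freeze=False keeps the vectors trainable; freeze=True would fix them.
emb_b = nn.Embedding.from_pretrained(
    torch.from_numpy(toy_weights).float(), freeze=False
)

idx = torch.tensor([2])
print(emb_a(idx))                               # row 2 of the matrix
print(torch.equal(emb_a.weight, emb_b.weight))  # True
```

In both cases the matrix must have shape (number of words, embedding dimension), matching the layer it is loaded into.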




In [44]:
# Run GloVe model on English to Italian translations

# TRAIN MODEL
####
#trainIters(encoder2, attn_decoder2, 8000, print_every=1000,  learning_rate=0.1)
trainIters(encoder2, attn_decoder2, 250000, print_every=25000, learning_rate=0.01)

# EVALUATE
evaluateRandomly(encoder2, attn_decoder2)

# TEST ON FIVE SENTENCES
evaluateAndShowAttention("I am walking to the store.")
evaluateAndShowAttention("This dinner is delicious.")
evaluateAndShowAttention("My mother lived in a white house.")
evaluateAndShowAttention("He has three friends.")
evaluateAndShowAttention("I love good red wine.")


3m 31s (- 31m 45s) (25000 10%) 4.1393
7m 11s (- 28m 47s) (50000 20%) 3.9123
10m 56s (- 25m 31s) (75000 30%) 3.6975
14m 43s (- 22m 5s) (100000 40%) 3.5549
18m 33s (- 18m 33s) (125000 50%) 3.4339
22m 23s (- 14m 55s) (150000 60%) 3.3488
26m 14s (- 11m 14s) (175000 70%) 3.2837
30m 6s (- 7m 31s) (200000 80%) 3.2025
33m 58s (- 3m 46s) (225000 90%) 3.1509
37m 49s (- 0m 0s) (250000 100%) 3.1267
> i went there dozens of times
= sono andata la dozzine di volte
< io sono li <EOS>

> shes wearing a greatlooking hat
= lei indossa un cappello meraviglioso
< e e una una <EOS>

> you love christmas dont you
= tu ami il natale vero
< tu e vero vero <EOS>

> tom removed the bandages from marys leg
= tom ha tolto le bende dalla gamba di mary
< tom ha le il dal da <EOS>

> tom considered the possibility of not going
= tom ha considerato la possibilita di non andare
< tom ha il il di di <EOS>

> i dont feel like watching tv tonight
= non mi va di guardare la tv stasera
< non mi piace di di la <EOS>

> has 

# Question 2.3. Input 5 well-formed sentences from the English vocab to Model 2, and input the resultant translated sentences to Model 1. Display all model outputs in each case.
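
The round trip described here can be sketched as a small helper that feeds each sentence through a forward translator and then passes the result to a backward translator. This is only an illustration under stated assumptions: `round_trip`, `strip_eos`, `fake_model2`, and `fake_model1` are hypothetical names, and the two "models" are toy dictionary lookups standing in for calls such as `evaluate(encoder2, attn_decoder2, s)[0]` and `evaluate(encoder1, attn_decoder1, s)[0]`.

```python
def strip_eos(words, eos="<EOS>"):
    """Drop a trailing end-of-sentence marker if one is present."""
    return words[:-1] if words and words[-1] == eos else words

def round_trip(sentences, translate_forward, translate_back):
    """Translate each sentence forward, then translate the result back.
    Both translators take a string and return a list of output tokens."""
    results = []
    for s in sentences:
        forward = " ".join(strip_eos(translate_forward(s)))
        back = " ".join(strip_eos(translate_back(forward)))
        results.append((s, forward, back))
    return results

# Toy stand-ins for the two trained models (hypothetical word-for-word lookups).
toy_en_it = {"i": "io", "have": "ho", "a": "una", "house": "casa"}
toy_it_en = {v: k for k, v in toy_en_it.items()}

def fake_model2(sentence):  # English -> Italian
    return [toy_en_it.get(w, w) for w in sentence.split()] + ["<EOS>"]

def fake_model1(sentence):  # Italian -> English
    return [toy_it_en.get(w, w) for w in sentence.split()] + ["<EOS>"]

for src, it, en in round_trip(["i have a house"], fake_model2, fake_model1):
    print(src, "->", it, "->", en)
# i have a house -> io ho una casa -> i have a house
```

With the real models, the wrappers would also need to switch whatever vocabulary state `evaluate` depends on between the two directions. The cells below do this by re-running `prepareData`.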

In [45]:
# PREPARE DATA TO TRANSLATE FROM ENGLISH TO ITALIAN (reverse flag set to False, so English is lang 1)

#### change language here from itS to ita for the final run
input_lang, output_lang, pairs = prepareData('eng', 'ita', False)
print(random.choice(pairs))

Reading lines...
Read 331799 sentence pairs
Trimmed to 314351 sentence pairs
Counting words...
Counted words:
eng 12427
ita 25472
['ive heard it before', 'lho gia sentita']


In [46]:
# run Model 2 (English to Italian Glove model) to generate Italian sentence

glove_model = True
output_words2, attentions2 = evaluate(encoder2, attn_decoder2, "i have a big house in the city")
output_words2 = output_words2[:-1]  # drop the final token (normally <EOS>)
outputsentence=""
for i in output_words2:
    outputsentence += (i+" ")
print('English to Italian translation:', outputsentence)


English to Italian translation: ho una tempo tempo in 


In [47]:
# PREPARE DATA TO TRANSLATE FROM ITALIAN TO ENGLISH

#### change language here from itS to ita for the final run
input_lang, output_lang, pairs = prepareData('eng', 'ita', True)

print(random.choice(pairs))

Reading lines...
Reverse is true
Read 331799 sentence pairs
Trimmed to 314351 sentence pairs
Counting words...
Counted words:
ita 25472
eng 12427
['ho mal di testa', 'my head hurts']


In [48]:
# Run Model 1 to translate from Italian to English

glove_model = False
output_words1, attentions1 = evaluate(
    encoder1, attn_decoder1, outputsentence)
output_words1 = output_words1[:-1]  # drop the final token (normally <EOS>)
outputsentence1=""
for i in output_words1:
    outputsentence1 += (i+" ")
print('Italian back to English translation:', outputsentence1)

Italian back to English translation: i have a time to time 
