# **TUBES NLP (NLP Major Assignment)**
### **Neural Machine Translation with Seq2Seq Architecture (Eng→Ina)**
### **Using GRU: max sentence length = 15**

#### Ruhiyah Faradishi Widiaputri
#### 13519034


# IMPORT NEEDED LIBRARIES

In [None]:
import json
import random
import re
from datetime import datetime

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# LOAD DATA

This NMT model is trained on the ... dataset from IndoNLG ([https://github.com/IndoNLP/indonlg](https://github.com/IndoNLP/indonlg)).

In [None]:
# change to the data directory and set the data and model paths
%cd /content/drive/My Drive/Tahun 4/NLP/tubes-mt/MT_TED_MULTI/
train_data_dir = "train_preprocess.json"
val_data_dir = 'valid_preprocess.json'
test_data_dir = 'test_preprocess.json'
trained_model_encoder_path = "seq2seq_1/trained_model_encoder"
trained_model_decoder_path = "seq2seq_1/trained_model_decoder"

/content/drive/My Drive/Tahun 4/NLP/tubes-mt/MT_TED_MULTI


In [None]:
MAX_LENGTH = 15

In [None]:
SOS_token = 0
EOS_token = 1

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

def normalize_string(s):
  s = s.lower()
  s = re.sub(r"([.!?])", r" \1", s)
  s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
  return s

def filterPair(p):
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH

def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

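def load_data(filename):
  # read a JSON list of records with "text" (English source) and "label" (Indonesian target)
  # and return a list of normalized [text, label] pairs
  with open(filename, encoding="utf-8") as f:
    json_data = json.load(f)
  data = []
  for j in json_data:
    text = normalize_string(j["text"])
    label = normalize_string(j["label"])
    data.append([text, label])
  return data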

# define input and output lang
input_lang = Lang("en")
output_lang = Lang("ina")

# load + normalize train data
train_data = load_data(train_data_dir)

# check how many sentence pairs
print("Read %s sentence pairs" % len(train_data))

# keep only pairs where both sentences have fewer than MAX_LENGTH (15) words
train_data = filterPairs(train_data)
print("Trimmed to %s sentence pairs" % len(train_data))

# add vocabulary
for tr in train_data:
  input_lang.addSentence(tr[0])
  output_lang.addSentence(tr[1])

print("Counted words:")
print(input_lang.name, input_lang.n_words)
print(output_lang.name, output_lang.n_words)

Read 87406 sentence pairs
Trimmed to 40063 sentence pairs
Counted words:
en 17725
ina 16806
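To make the preprocessing concrete, here is a small, self-contained check that copies `normalize_string` and the length filter (renamed `filter_pair` here) and runs them on the "she read two sentences" pair that also appears in the validation samples. The second input, `"I'm 25"`, is a hypothetical example chosen to show a side effect of the regex: apostrophes and digits become spaces, so a sentence can end with a trailing space, and `split(' ')` then yields an empty-string token that `Lang.addSentence` would also count as a word.

```python
import re

MAX_LENGTH = 15  # same value as in the notebook


def normalize_string(s):
    # lowercase, put a space before . ! ?, and turn every other
    # run of non-letters (spaces, digits, quotes, commas, ...) into one space
    s = s.lower()
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s


def filter_pair(p):
    # keep a pair only if both sides have fewer than MAX_LENGTH tokens
    return len(p[0].split(' ')) < MAX_LENGTH and len(p[1].split(' ')) < MAX_LENGTH


pair = [normalize_string("She read two sentences."),
        normalize_string("Dia membaca dua kalimat.")]
print(pair)               # ['she read two sentences .', 'dia membaca dua kalimat .']
print(filter_pair(pair))  # True

odd = normalize_string("I'm 25")
print(repr(odd), odd.split(' '))  # 'i m ' ['i', 'm', '']
```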


# THE SEQ2SEQ MODEL

In [None]:
"""
***** LAYERS USED *****
  # Embedding layer
    - CLASS
      torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, 
      norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None, device=None, dtype=None)
    - a lookup table that stores the embeddings
    - the vocabulary size (num_embeddings) and the dimension of each embedding vector (embedding_dim) are fixed
    - equivalent to a bias-free linear layer applied to one-hot vectors, but it does an index lookup instead of a matrix-vector multiplication
      - the embedding weight (num_embeddings x embedding_dim) is the transpose of the equivalent linear layer's weight (embedding_dim x num_embeddings)
      - looking up index i returns row i of the weight, i.e. one_hot(i) @ weight
    - input:
      - IntTensor or LongTensor containing the indices to extract
    - output: (input shape, embedding_dim)
  # GRU layer
    - CLASS:
      torch.nn.GRU(*args, **kwargs)
    - parameters:
        - input_size : number of features in the input x
        - hidden_size : number of features in the hidden state h
        - num_layers : number of stacked recurrent layers
        - bias
        - batch_first
        - dropout
        - bidirectional
    - input: inputs, h_0
        - inputs: tensor of shape:
          - unbatched input : (sequence_length, input_size)
          - batched input, batch_first=False (default) : (sequence_length, batch_size, input_size)
          - batched input, batch_first=True : (batch_size, sequence_length, input_size)
        - h_0: tensor of shape:
          - (num_directions * num_layers, hidden_size), or 
          - (num_directions * num_layers, batch_size, hidden_size)
    - output: output, h_n
        - output: tensor of shape:
          - unbatched : (sequence_length, num_directions * hidden_size)
          - batch_first=False : (sequence_length, batch_size, num_directions * hidden_size)
          - batch_first=True : (batch_size, sequence_length, num_directions * hidden_size)
        - h_n: tensor of shape:
          - (num_directions * num_layers, hidden_size), or
          - (num_directions * num_layers, batch_size, hidden_size)
  # Linear layer
    - CLASS:
      torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
    - parameters:
      - in_features (int) : size of each input sample
      - out_features (int) : size of each output sample
      - bias (bool) : whether the layer learns an additive bias
  # Dropout layer
    - CLASS:
      torch.nn.Dropout(p=0.5, inplace=False)
    - during training, randomly zeroes some elements of the input tensor with probability p,
      using samples from a Bernoulli distribution; the remaining elements are scaled by 1/(1-p).
      In eval mode the layer is the identity
    - an effective regularization technique that helps prevent overfitting and the co-adaptation of neurons

***** FUNCTIONS USED *****
  # Softmax
    - torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)
    - parameters
      - input (Tensor) : input
      - dim (int) : the dimension along which softmax is computed
      - dtype (torch.dtype, optional) : desired data type of the returned tensor (the input is cast to this dtype before the operation)
    - return type: Tensor
  # relu
    - torch.nn.functional.relu(input, inplace=False)
    - element-wise ReLU: max(0, input)
  # log_softmax
    - torch.nn.functional.log_softmax(input, dim=None, _stacklevel=3, dtype=None)
    - log(softmax(x)), computed in a numerically stable way
    - same parameters as softmax
  # cat
    - torch.cat(tensors, dim=0, *, out=None)
    - concatenates a sequence of tensors along a given dimension
    - parameters:
      - tensors (sequence of Tensors)
      - dim (int, optional); e.g. for 2-D tensors:
        - dim = 0 --> concatenate along rows (stack vertically)
        - dim = 1 --> concatenate along columns (stack horizontally)
  # unsqueeze
    - torch.unsqueeze(input, dim)
    - turns an n-dimensional tensor into an (n+1)-dimensional one by inserting a dimension of size 1
    - the dim parameter sets where the new dimension goes, counting from the outermost dimension inward
  # squeeze
    - torch.squeeze(input, dim=None)
    - returns a tensor with all dimensions of size 1 removed
    - if dim is given, the tensor is squeezed only in that dimension, and only if its size there is 1
  # bmm
    - torch.bmm(input, mat2, *, out=None)
    - batched matrix-matrix product of input and mat2
    - input and mat2 must be 3-D tensors
    - if input is (b x n x m) and mat2 is (b x m x p), out is (b x n x p)
  # topk
    - torch.topk(input, k, dim=None, largest=True, sorted=True, *, out=None)
    - returns the k largest (or smallest, if largest=False) elements of input along the given dimension, as a (values, indices) pair
    - if dim is not given, the last dimension is used
  # detach
    - Tensor.detach()
    - returns a new tensor detached from the computation graph; the result never requires a gradient
"""

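The reference notes above can be checked directly. The sketch below uses toy sizes (`vocab_size=10`, `hidden_size=4`) and random weights chosen only for illustration, and confirms the listed shapes and identities: embedding lookup vs. a one-hot product vs. a bias-free `nn.Linear` holding the transposed weight, GRU shapes with `batch_first=False`, `bmm`, `topk`, `log_softmax`, `Dropout` in train and eval mode, `squeeze`/`unsqueeze`, and `detach`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden_size = 10, 4

emb = nn.Embedding(vocab_size, hidden_size)                  # weight: (10, 4)
lin = nn.Linear(vocab_size, hidden_size, bias=False)         # weight: (4, 10)
idx = torch.tensor([2, 5, 7])
one_hot = F.one_hot(idx, num_classes=vocab_size).float()     # (3, 10)

with torch.no_grad():
    # Embedding: lookup == one-hot @ weight == Linear with the transposed weight
    lin.weight.copy_(emb.weight.t())
    lookup = emb(idx)                                        # (3, 4)
    via_one_hot = one_hot @ emb.weight
    via_linear = lin(one_hot)

    # GRU, batch_first=False: input (seq_len, batch, input_size), h_0 (layers*dirs, batch, hidden)
    gru = nn.GRU(hidden_size, hidden_size)
    out, h_n = gru(lookup.unsqueeze(1), torch.zeros(1, 1, hidden_size))

    # bmm: (b x n x m) @ (b x m x p) -> (b x n x p)
    prod = torch.bmm(torch.randn(2, 1, 3), torch.randn(2, 3, 4))

    # topk returns (values, indices); without dim it uses the last dimension
    values, indices = torch.tensor([[0.1, 0.7, 0.2]]).topk(1)

    # log_softmax matches log(softmax(x))
    x = torch.randn(2, 5)
    log_sm_ok = torch.allclose(F.log_softmax(x, dim=1),
                               torch.log(F.softmax(x, dim=1)), atol=1e-6)

    # squeeze removes size-1 dims, unsqueeze inserts one
    squeezed = torch.zeros(1, 1, 3).squeeze()                # (3,)
    unsqueezed = torch.zeros(3).unsqueeze(0)                 # (1, 3)

# Dropout: zeros with prob. p and scales survivors by 1/(1-p) in train mode; identity in eval mode
drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)
train_vals = set(drop.train()(ones).unique().tolist())
eval_out = drop.eval()(ones)

# detach: a new tensor that does not require grad
w = torch.ones(2, requires_grad=True)
detached = (w * 2).detach()
```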

In [None]:
# encoder
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size                            

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

In [None]:
"""
=== ENCODER ===
Consists of:
- 1 embedding layer
- 1 GRU layer

Forward:
  1. the input goes into the embedding layer
  2. the result of (1) is reshaped to (1 x 1 x hidden_size)
  3. the result of (2) and the hidden state are fed into the GRU layer
"""

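The encoder's forward steps can be traced with the same two layers on a toy input. This sketch assumes a hypothetical 20-word vocabulary, `hidden_size=8`, and a made-up source sentence of four word indices plus EOS, shaped `(len, 1)` like the output of `tensorFromSentence`; it fills the padded `encoder_outputs` buffer the same way `train()` does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
MAX_LENGTH, vocab_size, hidden_size = 15, 20, 8

# the two layers of EncoderRNN (embedding dim == hidden_size, as in the notebook)
embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)

# hypothetical source sentence: 4 word indices + EOS (1), shaped (len, 1)
input_tensor = torch.tensor([[5], [9], [3], [12], [1]])
hidden = torch.zeros(1, 1, hidden_size)                  # what initHidden() returns
encoder_outputs = torch.zeros(MAX_LENGTH, hidden_size)   # padded buffer used by the decoder

with torch.no_grad():
    for ei in range(input_tensor.size(0)):
        embedded = embedding(input_tensor[ei]).view(1, 1, -1)   # (1, 1, hidden_size)
        output, hidden = gru(embedded, hidden)                  # both (1, 1, hidden_size)
        encoder_outputs[ei] = output[0, 0]
```

For a one-layer, unidirectional GRU fed one token at a time, the step output and the new hidden state hold the same values, so the last filled row equals the hidden state handed to the decoder, and rows past the sentence stay zero.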

In [None]:
# decoder - with attention mechanism
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

In [None]:
"""
=== DECODER ===
Uses an attention mechanism
Consists of:
- 1 embedding layer
- 3 linear layers (attn, attn_combine, out)
- 1 dropout layer
- 1 GRU layer

Forward:
  1. the input goes into the embedding layer
  2. the result of (1) is reshaped to (1 x 1 x hidden_size)
  3. the result of (2) goes through the dropout layer (this is `embedded`)
  4. embedded[0] and hidden[0] are concatenated
  5. the result of (4) goes into the attn linear layer
  6. softmax is applied to (5) --> the attention weights
  7. unsqueeze is applied to (6)
  8. unsqueeze is applied to the encoder outputs
  9. bmm is applied to (7) and (8)
  10. embedded[0] and the result of (9) are concatenated
  11. the result of (10) goes into the attn_combine linear layer
  12. unsqueeze is applied to (11)
  13. relu is applied to (12)
  14. the result of (13) and the hidden state go into the GRU layer
  15. the GRU output (output[0]) goes into the out linear layer
  16. log_softmax is applied to (15)

"""

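The attention part of the decoder's forward pass can be isolated to check the shapes. In this sketch the embedding, previous hidden state, and encoder outputs are random stand-ins (toy `hidden_size=8`), and only the `attn` and `attn_combine` layers are rebuilt; the operations mirror `AttnDecoderRNN.forward` up to the ReLU before the GRU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
MAX_LENGTH, hidden_size = 15, 8

attn = nn.Linear(hidden_size * 2, MAX_LENGTH)            # one score per source position
attn_combine = nn.Linear(hidden_size * 2, hidden_size)

embedded = torch.randn(1, 1, hidden_size)                 # stand-in for the dropped-out embedding
hidden = torch.randn(1, 1, hidden_size)                   # previous decoder hidden state
encoder_outputs = torch.randn(MAX_LENGTH, hidden_size)    # stand-in for the padded encoder outputs

with torch.no_grad():
    # concatenate embedding and hidden, score MAX_LENGTH positions, softmax -> (1, MAX_LENGTH)
    attn_weights = F.softmax(attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
    # (1, 1, MAX_LENGTH) bmm (1, MAX_LENGTH, hidden) -> (1, 1, hidden): weighted sum of encoder outputs
    attn_applied = torch.bmm(attn_weights.unsqueeze(0), encoder_outputs.unsqueeze(0))
    # concatenate with the embedding, project back to hidden_size, add the batch dim, ReLU
    combined = F.relu(attn_combine(torch.cat((embedded[0], attn_applied[0]), 1)).unsqueeze(0))
```

In this design the attention scores depend only on the current embedding and hidden state; the encoder outputs enter only through the weighted sum, so zero-padded rows can still receive weight but contribute nothing to `attn_applied`.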

# TRAINING THE MODEL

In [None]:
# preparing training data

# map each word of the sentence to its index --> a list of word indices
# unknown words fall back to index 0, which is also SOS_token (Lang has no dedicated UNK token)
def indexesFromSentence(lang, sentence):
    return [lang.word2index[word] if word in lang.word2index else 0 for word in sentence.split(' ')]

# return a (len x 1) LongTensor of the word indices followed by EOS_token (1)
def tensorFromSentence(lang, sentence):
    indexes = indexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)

# return the tuple (input_tensor, target_tensor)
def tensorsFromPair(pair):
    input_tensor = tensorFromSentence(input_lang, pair[0])
    target_tensor = tensorFromSentence(output_lang, pair[1])
    return (input_tensor, target_tensor)
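A plain-Python sketch of the index mapping, with a hypothetical four-word vocabulary standing in for `input_lang.word2index`, shows what an out-of-vocabulary word turns into. Because the fallback index 0 is also `SOS_token`, the model cannot tell an unknown word from the start-of-sentence marker; reserving a separate UNK index would be one common alternative.

```python
SOS_token, EOS_token = 0, 1

# hypothetical toy vocabulary (word -> index), built the way Lang.word2index would be
word2index = {"she": 2, "read": 3, "two": 4, ".": 5}


def indexes_from_sentence(word2index, sentence):
    # unknown words fall back to index 0, the same index as SOS_token
    return [word2index.get(word, 0) for word in sentence.split(' ')]


indexes = indexes_from_sentence(word2index, "she read three books .")
indexes.append(EOS_token)   # tensorFromSentence then reshapes this list to (len, 1)
print(indexes)              # [2, 3, 0, 0, 5, 1]
```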

In [None]:
# program for training
teacher_forcing_ratio = 0.5


def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length=MAX_LENGTH):
    # initialize the encoder hidden state with zeros (1 x 1 x hidden_size)
    encoder_hidden = encoder.initHidden()

    # reset the gradients of all optimized tensors so they do not accumulate across iterations
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # input and target lengths (number of tokens, including EOS)
    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    # buffer for the encoder outputs: zeros (max_length x hidden_size)
    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    # accumulated loss over all decoding steps
    loss = 0

    # run the encoder one input token at a time
    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(
            input_tensor[ei], encoder_hidden)
        # store this step's output for the attention mechanism
        encoder_outputs[ei] = encoder_output[0, 0]

    # the first decoder input is the SOS token: tensor [[0]]
    decoder_input = torch.tensor([[SOS_token]], device=device)

    # the decoder starts from the final encoder hidden state
    decoder_hidden = encoder_hidden

    # with probability teacher_forcing_ratio, use teacher forcing for this pair
    use_teacher_forcing = random.random() < teacher_forcing_ratio

    if use_teacher_forcing:
        # teacher forcing: feed the ground-truth target token as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]

    else:
        # no teacher forcing: feed the decoder's own prediction as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            # take the highest-scoring word
            topv, topi = decoder_output.topk(1)
            # detach from the graph so the prediction is used only as an input
            decoder_input = topi.squeeze().detach()
            loss += criterion(decoder_output, target_tensor[di])
            # stop once the model predicts EOS
            if decoder_input.item() == EOS_token:
                break

    # backpropagate
    loss.backward()

    # update the encoder and decoder parameters
    encoder_optimizer.step()
    decoder_optimizer.step()

    # average loss per target token
    return loss.item() / target_length
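`train()` passes the decoder's `log_softmax` output to `nn.NLLLoss`. A quick check with a random toy score vector (a hypothetical 6-word vocabulary and target index 4) that this pairing equals `nn.CrossEntropyLoss` on the raw scores, i.e. the negative log-probability of the target word at that step:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 6)      # one decoder step over a toy 6-word vocabulary
target = torch.tensor([4])      # shape (1,), like target_tensor[di]

# what train() computes: NLLLoss on log-probabilities
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
# the same quantity computed from raw scores
ce = nn.CrossEntropyLoss()(logits, target)
# and by hand: minus the log-probability of the target word
manual = -F.log_softmax(logits, dim=1)[0, 4]
```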

In [None]:
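# helper
import time
import math

import matplotlib.pyplot as plt


def asMinutes(s):
    # format a duration in seconds as "Xm Ys"
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since, percent):
    # elapsed time since `since`, plus the estimated remaining time,
    # assuming the remaining iterations take as long on average as the finished ones
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))


def showPlot(points):
    # plot the average loss recorded every plot_every iterations
    plt.figure()
    plt.plot(points)
    plt.xlabel("plot step (every plot_every iterations)")
    plt.ylabel("average NLL loss")
    plt.show()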
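The progress estimate in the training log is simple arithmetic: estimated total = elapsed / fraction done, and remaining = total − elapsed. The hypothetical `time_since_elapsed` below repeats `timeSince` but takes the elapsed seconds directly, so the result is deterministic; 11313 s (188m 33s) at 50% reproduces the 25000-iteration line of the log.

```python
import math


def asMinutes(s):
    # format a duration in seconds as "Xm Ys"
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def time_since_elapsed(s, percent):
    # same arithmetic as timeSince, but with the elapsed seconds passed in
    es = s / percent      # estimated total time
    rs = es - s           # estimated remaining time
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))


print(time_since_elapsed(11313, 0.5))   # 188m 33s (- 188m 33s)
print(time_since_elapsed(600, 0.25))    # 10m 0s (- 30m 0s)
```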

In [None]:
def trainIters(encoder, decoder, n_iters, print_every=1000, plot_every=100, learning_rate=0.01):
    # record the start time
    start = time.time()

    plot_losses = []

    # reset every print_every
    print_loss_total = 0
    # reset every plot_every
    plot_loss_total = 0

    # both the encoder and the decoder use plain SGD (stochastic gradient descent)
    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)

    # sample n_iters pairs from the training data at random (with replacement)
    training_pairs = [tensorsFromPair(random.choice(train_data))
                      for i in range(n_iters)]

    # negative log-likelihood loss (the decoder already outputs log-probabilities)
    criterion = nn.NLLLoss()

    # one training step per sampled pair
    for iter in range(1, n_iters + 1):
        training_pair = training_pairs[iter - 1]
        input_tensor = training_pair[0]
        target_tensor = training_pair[1]

        # train on this pair
        loss = train(input_tensor, target_tensor, encoder,
                     decoder, encoder_optimizer, decoder_optimizer, criterion)

        # accumulate the loss
        print_loss_total += loss
        plot_loss_total += loss

        # every print_every iterations: report progress and save a checkpoint
        if iter % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            perc = iter / n_iters * 100
            print('%s (%d %d%%) %.4f' % (timeSince(start, iter / n_iters),
                                         iter, perc, print_loss_avg))
            # checkpoint the encoder and decoder passed to this function
            torch.save(encoder.state_dict(), f"{trained_model_encoder_path}_{perc}.pt")
            torch.save(decoder.state_dict(), f"{trained_model_decoder_path}_{perc}.pt")

        if iter % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0

    showPlot(plot_losses)
    return plot_losses

## TRAIN

In [None]:
# current time
now = datetime.now()
current_time = now.strftime("%H:%M:%S")
print("Start Time =", current_time)

# train
hidden_size = 256
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)

trainIters(encoder1, attn_decoder1, 50000, print_every=2500)
#trainIters(encoder1, attn_decoder1, 2, print_every=1)

Start Time = 00:31:52
18m 5s (- 343m 39s) (2500 5%) 5.1786
37m 1s (- 333m 13s) (5000 10%) 5.0110
56m 14s (- 318m 40s) (7500 15%) 4.9168
74m 53s (- 299m 33s) (10000 20%) 4.8180
93m 43s (- 281m 10s) (12500 25%) 4.7700
112m 53s (- 263m 24s) (15000 30%) 4.6974
132m 50s (- 246m 42s) (17500 35%) 4.6321
150m 52s (- 226m 18s) (20000 40%) 4.5590
170m 12s (- 208m 1s) (22500 45%) 4.5160
188m 33s (- 188m 33s) (25000 50%) 4.4032
206m 43s (- 169m 7s) (27500 55%) 4.4265
224m 52s (- 149m 54s) (30000 60%) 4.3667
242m 55s (- 130m 48s) (32500 65%) 4.3338
261m 28s (- 112m 3s) (35000 70%) 4.3451
279m 47s (- 93m 15s) (37500 75%) 4.2648
297m 56s (- 74m 29s) (40000 80%) 4.1958
316m 6s (- 55m 46s) (42500 85%) 4.1996
335m 17s (- 37m 15s) (45000 90%) 4.1624
354m 49s (- 18m 40s) (47500 95%) 4.1261
372m 35s (- 0m 0s) (50000 100%) 4.1754


In [None]:
from datetime import datetime

finish = datetime.now()

finish_time = finish.strftime("%H:%M:%S")
print("Finish Time =", finish_time)

Finish Time = 06:44:29


In [None]:
# save trained model - state dict
torch.save(encoder1.state_dict(), f"{trained_model_encoder_path}.pt")
torch.save(attn_decoder1.state_dict(), f"{trained_model_decoder_path}.pt")

# LOAD TRAINED MODEL

In [None]:
# redefine the model
hidden_size = 256
encoder_trained = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder_trained = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)

# load encoder trained model (map_location lets a GPU-saved checkpoint load on a CPU-only runtime)
encoder_trained.load_state_dict(torch.load(f"{trained_model_encoder_path}.pt", map_location=device))
encoder_trained.eval()

EncoderRNN(
  (embedding): Embedding(17725, 256)
  (gru): GRU(256, 256)
)

In [None]:
# load decoder trained model (map_location lets a GPU-saved checkpoint load on a CPU-only runtime)
attn_decoder_trained.load_state_dict(torch.load(f"{trained_model_decoder_path}.pt", map_location=device))
attn_decoder_trained.eval()

AttnDecoderRNN(
  (embedding): Embedding(16806, 256)
  (attn): Linear(in_features=512, out_features=15, bias=True)
  (attn_combine): Linear(in_features=512, out_features=256, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
  (gru): GRU(256, 256)
  (out): Linear(in_features=256, out_features=16806, bias=True)
)
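The save/load cells rely on the `state_dict` round trip: only tensors are stored, so the model must be rebuilt with the same constructor arguments (here, the same `hidden_size` and the same vocabulary sizes, which is why `input_lang`/`output_lang` are rebuilt from the training data first). A minimal sketch, assuming a hypothetical `Tiny` module in place of `EncoderRNN` and an in-memory buffer in place of the `.pt` file on Drive:

```python
import io

import torch
import torch.nn as nn


class Tiny(nn.Module):
    # hypothetical stand-in for EncoderRNN: an embedding followed by a GRU
    def __init__(self, vocab_size=10, hidden_size=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)


torch.manual_seed(0)
trained = Tiny()

buffer = io.BytesIO()                      # in-memory stand-in for the checkpoint file
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

restored = Tiny()                          # must use the same sizes as the saved model
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()                            # switch to inference mode (disables dropout layers)
```

Loading into a model built with different sizes would fail with a size-mismatch error from `load_state_dict`, which is a useful early warning that the vocabulary was not rebuilt identically.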

# EVALUATE THE MODEL

In [None]:
# load + normalize the validation data
val_data = load_data(val_data_dir)

# check how many sentence pairs
print("Read %s sentence pairs" % len(val_data))

# keep only pairs where both sentences have fewer than MAX_LENGTH (15) words
val_data = filterPairs(val_data)
print("Trimmed to %s sentence pairs" % len(val_data))

Read 2677 sentence pairs
Trimmed to 1304 sentence pairs


In [None]:
def evaluate(encoder, decoder, sentence, max_length=MAX_LENGTH):
    # greedy decoding of one normalized source sentence; no gradients are needed
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_length = input_tensor.size()[0]
        encoder_hidden = encoder.initHidden()

        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

        # encode the source sentence token by token
        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei],
                                                     encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        # start decoding from SOS with the final encoder hidden state
        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        decoder_hidden = encoder_hidden

        decoded_words = []
        decoder_attentions = torch.zeros(max_length, max_length)

        # emit at most max_length words, always picking the most probable one
        for di in range(max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            decoder_attentions[di] = decoder_attention.data
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            # feed the prediction back in as the next input
            decoder_input = topi.squeeze().detach()

        # decoded words plus the attention weights of the steps actually taken
        return decoded_words, decoder_attentions[:di + 1]
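The decoding loop in `evaluate` is greedy search: at each step take the argmax, stop at EOS, and otherwise stop after `max_length` words. A plain-Python sketch with a hypothetical score table standing in for the decoder (the previous token determines the next scores) shows both ways the loop can end; note that only the first one produces a trailing `'<EOS>'`.

```python
SOS_token, EOS_token = 0, 1
index2word = {0: "SOS", 1: "EOS", 2: "dia", 3: "membaca", 4: "."}

# hypothetical "decoder": maps the previous token to scores over the vocabulary
toy_scores = {
    0: [0.0, 0.1, 0.9, 0.0, 0.0],   # after SOS, "dia" scores highest
    2: [0.0, 0.0, 0.1, 0.8, 0.1],   # after "dia", "membaca"
    3: [0.0, 0.2, 0.0, 0.1, 0.7],   # after "membaca", "."
    4: [0.0, 0.9, 0.0, 0.0, 0.1],   # after ".", EOS
}


def greedy_decode(step, max_length):
    decoded, prev = [], SOS_token
    for _ in range(max_length):
        scores = step(prev)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax, like topk(1)
        if best == EOS_token:
            decoded.append('<EOS>')
            break
        decoded.append(index2word[best])
        prev = best                      # feed the prediction back in
    return decoded


print(greedy_decode(toy_scores.__getitem__, max_length=15))  # ['dia', 'membaca', '.', '<EOS>']
print(greedy_decode(toy_scores.__getitem__, max_length=2))   # ['dia', 'membaca']
```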

In [None]:
from torchtext.data.metrics import bleu_score

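def calculate_bleu(data, encoder, decoder, device, max_len = 50):
    # note: device and max_len are not used; evaluate() relies on the global device and MAX_LENGTH
    trgs = []
    pred_trgs = []

    for datum in data:
        src = datum[0]
        trg = datum[1].split(' ')

        output_words, attentions = evaluate(encoder, decoder, src)

        # cut off the <EOS> token (only if decoding actually produced one)
        if output_words and output_words[-1] == '<EOS>':
            pred_trg = output_words[:-1]
        else:
            pred_trg = output_words

        pred_trgs.append(pred_trg)
        # bleu_score expects a list of reference token lists per candidate
        trgs.append([trg])

    return bleu_score(pred_trgs, trgs)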
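To see what the reported number measures, here is a hand-written, pure-Python sketch of corpus-level BLEU: clipped n-gram precision for n = 1..4, their geometric mean with uniform weights, and a brevity penalty against the closest reference length. This follows the standard definition and, as far as I know, the same recipe as torchtext's `bleu_score` with default weights, but it is not torchtext's code. The second case uses a sample output from the evaluation below: with no matching 4-gram, the score is 0, which is one reason BLEU is reported over the whole corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # count every n-gram (as a tuple) in a token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    # candidates: list of token lists; references: list of lists of token lists
    clipped, total = [0] * max_n, [0] * max_n
    cand_len = ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # use the reference length closest to the candidate length
        ref_len += min((len(r) for r in refs), key=lambda L: abs(len(cand) - L))
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)          # max count of each n-gram over references
            clipped[n - 1] += sum((ngrams(cand, n) & max_ref).values())
            total[n - 1] += max(len(cand) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0
    log_p = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    bp = math.exp(min(1 - ref_len / cand_len, 0))   # brevity penalty
    return bp * math.exp(log_p)


ref = "dia membaca dua kalimat .".split(' ')
print(corpus_bleu([ref], [[ref]]))                                      # 1.0
print(corpus_bleu(["dia ia dua dua . . .".split(' ')], [[ref]]))        # 0.0
print(corpus_bleu([["a", "b", "c", "d"]], [[["a", "b", "c", "d", "e"]]]))  # exp(-0.25)
```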

In [None]:
bleu_scr = calculate_bleu(val_data, encoder_trained, attn_decoder_trained, device)

print(f'BLEU score = {bleu_scr*100:.10f}')

BLEU score = 5.2515426441


In [None]:
def evaluateRandomly(encoder, decoder, n=10):
    for i in range(n):
        pair = random.choice(val_data)
        print('>', pair[0])
        print('=', pair[1])
        output_words, attentions = evaluate(encoder, decoder, pair[0])
        output_sentence = ' '.join(output_words)
        print('<', output_sentence)
        print('')

In [None]:
evaluateRandomly(encoder_trained, attn_decoder_trained)

> that is me playing my imaginary piano .
= itu adalah saya memainkan piano khayalan saya .
< itu adalah saya saya saya . <EOS>

> she read two sentences .
= dia membaca dua kalimat .
< dia ia dua dua . . . <EOS>

> somebody who got educated here .
= yang mengenyam pendidikan di sana .
< yang yang ada di sini . <EOS>

> this is rising . comfort stays whole .
= ini semakin naik . kenyamanannya tetap .
< ini ini bukan . . <EOS>

> because of the hunger i was forced to drop out of school .
= karena kelaparan saya terpaksa putus sekolah .
< karena anak saya saya saya dari . . <EOS>

> and the truth is this guy can probably explain this to you .
= sebenarnya orang ini dapat menjelaskannya .
< dan dan adalah adalah adalah untuk anda dapat untuk . <EOS>

> and by that time the women are yelling and screaming inside the car .
= waktu itu semua perempuan berteriak di dalam mobil .
< dan saat ini membawa oleh wanita dan dan telah menjadi . <EOS>

> it gave me confidence . it gave me a career .
=