# Baseline code for Machine Translation system using standard Seq2seq model

In this notebook, we provide an implementation of a machine translation system using a standard Seq2seq model (discussed in Week 2 of our course). This baseline code can serve as a starting point for the first question of Lab4, where we need to build an effective model that can translate sentences from Portuguese to English. Alternatively, we can ignore this baseline and build our machine translation model by writing the modules from scratch. See Lab4.ipynb for further information.

This notebook assumes the following:
* the data is at `./drive/My Drive/machine_translation`
* the model checkpoints (one per epoch) and other intermediate files will be stored at `./drive/My Drive/Colab Notebooks/ckpt_mt_lab4`
* the scorer.py is at `./drive/My Drive/scorer.py`
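
Before running the rest of the notebook, it can help to confirm that these locations exist. The following is a minimal sketch, assuming the notebook runs from a working directory where `./drive` is the mounted Drive; `EXPECTED_PATHS` and `find_missing` are hypothetical names introduced only for this check.

```python
from pathlib import Path

# Paths taken from the assumptions listed above; adjust them to your own Drive layout.
EXPECTED_PATHS = [
    "./drive/My Drive/machine_translation",
    "./drive/My Drive/Colab Notebooks/ckpt_mt_lab4",
    "./drive/My Drive/scorer.py",
]

def find_missing(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

missing = find_missing(EXPECTED_PATHS)
if missing:
    print("Missing paths:", missing)
else:
    print("All expected paths are present.")
```

If the checkpoint directory is reported as missing, create it before training, since the later cells write files into it.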






In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
# required libraries

import unicodedata
import string
import re
import random
import time
import datetime
import math

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch import optim
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_packed_sequence, pack_padded_sequence
import torchtext
from torchtext.datasets import TranslationDataset

import spacy
import numpy as np
from nltk.translate.bleu_score import corpus_bleu

# set the device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cuda


## Data preprocessing

In [0]:
'''
tokenization code
'''

!python -m spacy download pt_core_news_sm
!python -m spacy download en_core_web_sm

import pt_core_news_sm
import en_core_web_sm

spacy_pt = pt_core_news_sm.load()
spacy_en = en_core_web_sm.load()

def tokenize_pt(text):
    """
    Tokenizes Portuguese text from a string into a list of strings (tokens)
    """
    return [tok.text for tok in spacy_pt.tokenizer(text)]

def tokenize_en(text):
    """
    Tokenizes English text from a string into a list of strings (tokens)
    """
    return [tok.text for tok in spacy_en.tokenizer(text)]

'''
define field
'''
SRC = torchtext.data.Field(tokenize = tokenize_pt, 
            init_token = '<sos>', 
            eos_token = '<eos>', 
            lower = True)
TRG = torchtext.data.Field(tokenize = tokenize_en, 
            init_token = '<sos>', 
            eos_token = '<eos>', 
            lower = True)

'''
load the data
'''
train_data = torchtext.data.TabularDataset(
    path='./drive/My Drive/machine_translation/mt-train-pt2en.tsv', 
    format='tsv', skip_header=True, fields=[('SRC', SRC), ('TRG', TRG)])
valid_data = torchtext.data.TabularDataset(
    path='./drive/My Drive/machine_translation/mt-valid-pt2en.tsv', 
    format='tsv', skip_header=True, fields=[('SRC', SRC), ('TRG', TRG)])
test_data = torchtext.data.TabularDataset(
    path='./drive/My Drive/machine_translation/mt-test-pt2en_onlySRCpt.tsv', 
    format='tsv', skip_header=True, fields=[('SRC', SRC)]) # blind test data (that is, no targets)

print(f"Number of training examples: {len(train_data.examples)}")
print(f"Number of validation examples: {len(valid_data.examples)}")
print(f"Number of testing examples: {len(test_data.examples)}")

'''
build the vocabulary
'''
TRG.build_vocab(train_data, min_freq=2)
SRC.build_vocab(train_data, min_freq=2)
print(f"Unique tokens in source (pt) vocabulary: {len(SRC.vocab)}")
print(f"Unique tokens in target (en) vocabulary: {len(TRG.vocab)}")

'''
create the iterator
'''
train_iter = torchtext.data.BucketIterator(train_data, batch_size=16, device=device, sort_key=lambda x: len(x.SRC), sort_within_batch=True)
valid_iter = torchtext.data.BucketIterator(valid_data, batch_size=256, device=device, sort_key=lambda x: len(x.SRC), sort_within_batch=True)
test_iter = torchtext.data.Iterator(test_data, batch_size=256, device=device, sort=False, sort_key=None, shuffle=False, sort_within_batch=False)

'''
print sample batch
'''
# print first batch of training data
print('training batch')
for batch in train_iter:
    src = batch.SRC
    trg = batch.TRG
    print('tensor size of source language:', src.shape)
    print('tensor size of target language:', trg.shape)
    break

# print first batch of validation data
print('validation batch')
for batch in valid_iter:
    src = batch.SRC
    trg = batch.TRG
    print('tensor size of source language:', src.shape)
    print('tensor size of target language:', trg.shape)
    break

# print first batch of test data
print('(blind) test batch')
for batch in test_iter:
    src = batch.SRC
    print('tensor size of source language:', src.shape)
    break

# save the field
import pickle
with open("./drive/My Drive/Colab Notebooks/ckpt_mt_lab4/TRG.Field", "wb") as f:
    pickle.dump(TRG, f)

with open("./drive/My Drive/Colab Notebooks/ckpt_mt_lab4/SRC.Field", "wb") as f:
    pickle.dump(SRC, f)



✔ Download and installation successful
You can now load the model via spacy.load('pt_core_news_sm')
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
Number of training examples: 85918
Number of validation examples: 4772
Number of testing examples: 4738
Unique tokens in source (pt) vocabulary: 7891
Unique tokens in target (en) vocabulary: 2320
training batch
tensor size of source language: torch.Size([10, 16])
tensor size of target language: torch.Size([12, 16])
validation batch
tensor size of source language: torch.Size([8, 256])
tensor size of target language: torch.Size([11, 256])
(blind) test batch
tensor size of source language: torch.Size([16, 256])
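
To make the effect of `min_freq=2` and of the special tokens more concrete, here is a small pure-Python sketch of vocabulary building and numericalization. It only approximates what `Field.build_vocab` does: the order of `SPECIALS` and the tie-breaking rule are assumptions, and `build_vocab` and `numericalize` are hypothetical helper names, not torchtext functions.

```python
from collections import Counter

# Assumed order of special tokens (consistent with <pad> having index 1 in this notebook).
SPECIALS = ["<unk>", "<pad>", "<sos>", "<eos>"]

def build_vocab(tokenized_sentences, min_freq=2):
    """Build index<->token mappings, dropping tokens seen fewer than min_freq times."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    # most frequent tokens first; ties are broken alphabetically (an assumption)
    kept = sorted((t for t, c in counts.items() if c >= min_freq),
                  key=lambda t: (-counts[t], t))
    itos = SPECIALS + kept
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    """Wrap a sentence in <sos>/<eos> and map rare or unseen tokens to <unk>."""
    unk = stoi["<unk>"]
    return [stoi["<sos>"]] + [stoi.get(t, unk) for t in tokens] + [stoi["<eos>"]]

toy_corpus = [["o", "gato", "dorme"], ["o", "cão", "dorme"], ["o", "gato", "come"]]
itos, stoi = build_vocab(toy_corpus, min_freq=2)
print(itos)
print(numericalize(["o", "cão", "come"], stoi))
```

In the toy corpus, `cão` and `come` each occur once, so they fall below `min_freq` and are mapped to `<unk>`. The same mechanism explains why the vocabulary sizes reported above are smaller than the number of distinct tokens in the training data.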


## Seq2seq model

In [0]:
class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, enc_hid_dim,n_layers, dropout):
        super().__init__()

        self.emb_dim = emb_dim
        self.enc_hid_dim = enc_hid_dim
        self.dropout = dropout
        self.n_layers = n_layers

        self.embedding = nn.Embedding(input_dim, emb_dim)
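        # nn.LSTM applies dropout only between stacked layers, so pass 0 for a single layer (avoids a UserWarning)
        self.lstm = nn.LSTM(emb_dim, enc_hid_dim, n_layers, dropout=dropout if n_layers > 1 else 0)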
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, src):
        
        #src = [src len, batch size]
        
        embedded = self.dropout(self.embedding(src))
        
        #embedded = [src len, batch size, emb dim]
        
        outputs, (hidden, cell) = self.lstm(embedded)
       
        # outputs are always from the top hidden layer, if bidirectional outputs are concatenated.
        # outputs shape [sequence_length, batch_size, hidden_dim * num_directions]
        # hidden is of shape [num_layers * num_directions, batch_size, hidden_size]
        # cell is of shape [num_layers * num_directions, batch_size, hidden_size]
        
        return hidden, cell

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, dec_hid_dim, n_layers, dropout):
        super().__init__()

        self.emb_dim = emb_dim
        self.output_dim = output_dim
        self.dec_hid_dim = dec_hid_dim
        self.n_layers = n_layers
        self.dropout = dropout

        self.embedding = nn.Embedding(output_dim, emb_dim)
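        # nn.LSTM applies dropout only between stacked layers, so pass 0 for a single layer (avoids a UserWarning)
        self.lstm = nn.LSTM(emb_dim, dec_hid_dim, n_layers, dropout=dropout if n_layers > 1 else 0)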
        self.fc_out = nn.Linear(dec_hid_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, input, hidden, cell):
             
        # input is of shape [batch_size]
        # hidden is of shape [n_layer * num_directions, batch_size, hidden_size]
        # cell is of shape [n_layer * num_directions, batch_size, hidden_size]
        
        input = input.unsqueeze(0)
        
        # input shape is now [1, batch_size]. The unsqueeze is needed because, after embedding, the LSTM expects a rank-3 tensor.
        # [1, batch_size] means a sequence of length 1 for each of the batch_size examples.
        
        embedded = self.dropout(self.embedding(input))
        
        #embedded = [1, batch size, emb dim]    
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        
        # output shape is [sequence_len, batch_size, hidden_dim * num_directions]
        # hidden shape is [num_layers * num_directions, batch_size, hidden_dim]
        # cell shape is [num_layers * num_directions, batch_size, hidden_dim]

        # sequence_len and num_directions will always be 1 in the decoder.
        # output shape is [1, batch_size, hidden_dim]
        # hidden shape is [num_layers, batch_size, hidden_dim]
        # cell shape is [num_layers, batch_size, hidden_dim]
        
        # use the top-layer output; hidden.squeeze(0) is only equivalent when n_layers == 1
        prediction = self.fc_out(output.squeeze(0)) # linear expects a rank-2 tensor as input
        # prediction shape is [batch_size, output_dim]
        
        return prediction, hidden, cell

class Seq2Seq(nn.Module):
    ''' This class contains the implementation of complete sequence to sequence network.
    It uses the encoder to produce the context vectors.
    It uses the decoder to produce the predicted target sentence.
    Args:
        encoder: An Encoder class instance.
        decoder: A Decoder class instance.
    '''
    def __init__(self, encoder, decoder, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device

    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        # src is of shape [sequence_len, batch_size]
        # trg is of shape [sequence_len, batch_size]
        # if teacher_forcing_ratio is 0.5 we use ground-truth inputs 50% of time and 50% time we use decoder outputs.

        batch_size = trg.shape[1]
        max_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        # to store the outputs of the decoder
        outputs = torch.zeros(max_len, batch_size, trg_vocab_size).to(self.device)

        # context vector, last hidden and cell state of encoder to initialize the decoder
        hidden, cell = self.encoder(src)

        # first input to the decoder is the <sos> tokens
        input = trg[0, :]

        for t in range(1, max_len):
            output, hidden, cell = self.decoder(input, hidden, cell)
            outputs[t] = output
            use_teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.max(1)[1]
            input = (trg[t] if use_teacher_force else top1)

        # outputs is of shape [sequence_len, batch_size, output_dim]
        return outputs
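
The teacher-forcing choice made inside `Seq2Seq.forward` can be looked at in isolation. The sketch below is a toy, pure-Python version for a single sequence: `step_fn` is a hypothetical stand-in for the decoder (it maps the previous token to a predicted token), and `decode_with_teacher_forcing` is an illustrative name, not part of the model. As in the class above, one random number is drawn per time step to decide whether the next input is the gold token or the model's own prediction.

```python
import random

def decode_with_teacher_forcing(step_fn, gold, teacher_forcing_ratio, rng):
    """Run a toy decoder over one sequence and record what was fed at each step.

    step_fn: hypothetical stand-in for the decoder (previous token -> predicted token).
    gold:    reference sequence, starting with <sos>.
    rng:     a random.Random instance, so that runs are reproducible.
    """
    fed_inputs = []
    predictions = []
    prev = gold[0]  # the first decoder input is always <sos>
    for t in range(1, len(gold)):
        fed_inputs.append(prev)
        pred = step_fn(prev)
        predictions.append(pred)
        # with probability teacher_forcing_ratio, feed the gold token at the next step
        use_teacher_force = rng.random() < teacher_forcing_ratio
        prev = gold[t] if use_teacher_force else pred
    return predictions, fed_inputs

gold = ["<sos>", "a", "b", "c", "<eos>"]
always_x = lambda prev: "x"  # a deliberately bad toy "decoder"

for ratio in (1.0, 0.5, 0.0):
    _, fed = decode_with_teacher_forcing(always_x, gold, ratio, random.Random(0))
    print(ratio, fed)
```

With a ratio of 1.0 the decoder always sees the gold prefix, and with 0.0 it only ever sees its own predictions, which is the setting used at evaluation time in this notebook.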


## Model training

In [0]:
'''
hyperparameters
'''
INPUT_DIM = len(SRC.vocab)
OUTPUT_DIM = len(TRG.vocab)
ENC_EMB_DIM = 128
DEC_EMB_DIM = 128
ENC_HID_DIM = 256
DEC_HID_DIM = 256
ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5
N_LAYERS = 1
LEARNING_RT = 0.001
N_EPOCHS = 5
CLIP = 1

'''
instantiate the model
'''
enc = Encoder(INPUT_DIM, ENC_EMB_DIM, ENC_HID_DIM, N_LAYERS, ENC_DROPOUT)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, DEC_HID_DIM, N_LAYERS, DEC_DROPOUT)
model = Seq2Seq(enc, dec,device).to(device)
optimizer = optim.Adam(model.parameters(), lr = LEARNING_RT)
TRG_PAD_IDX = TRG.vocab.stoi[TRG.pad_token]
print('<pad> token index: ',TRG_PAD_IDX)
## we will ignore the pad token in true target set
criterion = nn.CrossEntropyLoss(ignore_index = TRG_PAD_IDX)

'''
initialize the model weights
'''
def init_weights(m):
    for name, param in m.named_parameters():
        if 'weight' in name:
            nn.init.normal_(param.data, mean=0, std=0.01)
        else:
            nn.init.constant_(param.data, 0)
model.apply(init_weights)

'''
calculate the number of parameters
'''
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'The model has {count_parameters(model):,} trainable parameters')

'''
compute the loss function with first training batch
'''
clip = 1
model.train()

for i, batch in enumerate(train_iter):

    src = batch.SRC
    trg = batch.TRG
    optimizer.zero_grad()

    output = model(src, trg)
    #trg = [trg len, batch size]
    #output = [trg len, batch size, output dim]

    output_dim = output.shape[-1]

    output = output[1:].view(-1, output_dim)
    trg = trg[1:].view(-1)

    #trg = [(trg len - 1) * batch size]
    #output = [(trg len - 1) * batch size, output dim]

    loss = criterion(output, trg)

    loss.backward()

    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP)

    optimizer.step()
    
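    print('loss from first training batch')
    # note: criterion already averages over non-pad tokens; this print additionally divides by the batch size
    print(loss/src.shape[1])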
    break

'''
Full training helper functions
'''

def train(model, iterator, optimizer, criterion, clip):
    
    model.train()
    
    epoch_loss = 0
    
    for i, batch in enumerate(iterator):
        
        src = batch.SRC
        trg = batch.TRG
        
        optimizer.zero_grad()
        
        output = model(src, trg)
        
        #trg = [trg len, batch size]
        #output = [trg len, batch size, output dim]
        
        output_dim = output.shape[-1]
        
        output = output[1:].view(-1, output_dim)
        trg = trg[1:].view(-1)
        
        # the loss function works only with 2d logits and 1d targets,
        # so flatten the trg and output tensors. Ignore the <sos> token
        # trg shape should be [(sequence_len - 1) * batch_size]
        # output shape should be [(sequence_len - 1) * batch_size, output_dim]
        
        loss = criterion(output, trg)
        
        loss.backward()
        
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)  # use the clip argument, not the global CLIP
        
        optimizer.step()
        
        epoch_loss += loss.item()
        
    return epoch_loss / len(iterator)

def evaluate(model, iterator, criterion):
    
    model.eval()
    
    epoch_loss = 0
    
    with torch.no_grad():
    
        for i, batch in enumerate(iterator):

            src = batch.SRC
            trg = batch.TRG

            output = model(src, trg, 0) #turn off teacher forcing

            #trg = [trg len, batch size]
            #output = [trg len, batch size, output dim]

            output_dim = output.shape[-1]
            
            output = output[1:].view(-1, output_dim)
            trg = trg[1:].view(-1)

            #trg = [(trg len - 1) * batch size]
            #output = [(trg len - 1) * batch size, output dim]

            loss = criterion(output, trg)

            epoch_loss += loss.item()
        
    return epoch_loss / len(iterator)

# convert index to text string
def convert_itos(convert_vocab, token_ids):
    list_string = []
    for i in token_ids:
        if i == convert_vocab.vocab.stoi['<eos>']:
            break
        else:
            token = convert_vocab.vocab.itos[i]
            list_string.append(token)
    return list_string

def evaluate_bleu(model, eval_iter, src_vocab, trg_vocab, attention = True, max_trg_len = 64):
    '''
    Function for evaluating translation using BLEU

    Input: 
    model: translation model;
    eval_iter: iterator over the evaluation data
    src_vocab: Source torchtext Field
    trg_vocab: Target torchtext Field
    attention: the model returns attention weights or not.
    max_trg_len: the maximal length of translation text (optional), default = 64

    Output:
    Corpus BLEU score.
    '''

    model.eval()
    all_trg = []
    all_translated_trg = []

    TRG_PAD_IDX = trg_vocab.vocab.stoi[trg_vocab.pad_token]

    with torch.no_grad():
    
        for i, batch in enumerate(eval_iter):

            src = batch.SRC
            #src = [src len, batch size]

            trg = batch.TRG
            #trg = [trg len, batch size]

            batch_size = trg.shape[1]

            # create a placeholder for target language with shape of [max_trg_len, batch_size] where all the elements are the index of <pad>. Then send to device
            trg_placeholder = torch.Tensor(max_trg_len, batch_size)
            trg_placeholder.fill_(TRG_PAD_IDX)
            trg_placeholder = trg_placeholder.long().to(device)
            if attention == True:
              output,_ = model(src, trg_placeholder, 0) #turn off teacher forcing
            else:
              output = model(src, trg_placeholder, 0) #turn off teacher forcing
            # get translation results, we ignore first token <sos> in both translation and target sentences. 
            # output_translate = [(trg len - 1), batch, output dim] output dim is size of target vocabulary.
            output_translate = output[1:]
            # store gold target sentences to a list 
            all_trg.append(trg[1:].cpu())

            # Choose top 1 word from decoder's output, we get the probability and index of the word
            prob, token_id = output_translate.data.topk(1)
            translation_token_id = token_id.squeeze(2).cpu()

            # store predicted target sentences to a list 
            all_translated_trg.append(translation_token_id)
      
    all_gold_text = []
    all_translated_text = []
    for i in range(len(all_trg)): 
        cur_gold = all_trg[i]
        cur_translation = all_translated_trg[i]
        for j in range(cur_gold.shape[1]):
            gold_convered_strings = convert_itos(trg_vocab,cur_gold[:,j])
            trans_convered_strings = convert_itos(trg_vocab,cur_translation[:,j])

            all_gold_text.append(gold_convered_strings)
            all_translated_text.append(trans_convered_strings)

    corpus_all_gold_text = [[item] for item in all_gold_text]
    corpus_bleu_score = corpus_bleu(corpus_all_gold_text, all_translated_text)  
    return corpus_bleu_score

def epoch_time(start_time, end_time):
    elapsed_time = end_time - start_time
    elapsed_mins = int(elapsed_time / 60)
    elapsed_secs = int(elapsed_time - (elapsed_mins * 60))
    return elapsed_mins, elapsed_secs

'''
kickstart full training
'''

best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    
    start_time = time.time()
    
    train_loss = train(model, train_iter, optimizer, criterion, CLIP)
    valid_loss = evaluate(model, valid_iter, criterion)
    valid_bleu = evaluate_bleu(model, valid_iter, SRC, TRG, attention = False, max_trg_len = 64)

    end_time = time.time()
    
    epoch_mins, epoch_secs = epoch_time(start_time, end_time)
    
    # Create checkpoint at end of each epoch
    state_dict_model = model.state_dict() 
    state = {
        'epoch': epoch,
        'state_dict': state_dict_model,
        'optimizer': optimizer.state_dict()
        }

    torch.save(state, "./drive/My Drive/Colab Notebooks/ckpt_mt_lab4/seq2seq_"+str(epoch+1)+".pt")

    print(f'Epoch: {epoch+1:02} | Time: {epoch_mins}m {epoch_secs}s')
    print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    print(f'\t Val. Loss: {valid_loss:.3f} |  Val. PPL: {math.exp(valid_loss):7.3f} | Val. BLEU: {valid_bleu:7.3f}')




<pad> token index:  1
The model has 2,693,776 trainable parameters
loss from first training batch
tensor(0.4843, device='cuda:0', grad_fn=<DivBackward0>)
Epoch: 01 | Time: 1m 49s
	Train Loss: 3.590 | Train PPL:  36.217
	 Val. Loss: 4.542 |  Val. PPL:  93.832 | Val. BLEU:   0.044
Epoch: 02 | Time: 1m 52s
	Train Loss: 1.743 | Train PPL:   5.717
	 Val. Loss: 4.976 |  Val. PPL: 144.883 | Val. BLEU:   0.090
Epoch: 03 | Time: 1m 52s
	Train Loss: 0.740 | Train PPL:   2.095
	 Val. Loss: 5.575 |  Val. PPL: 263.731 | Val. BLEU:   0.092
Epoch: 04 | Time: 1m 51s
	Train Loss: 0.332 | Train PPL:   1.393
	 Val. Loss: 6.112 |  Val. PPL: 451.131 | Val. BLEU:   0.100
Epoch: 05 | Time: 1m 52s
	Train Loss: 0.184 | Train PPL:   1.201
	 Val. Loss: 6.543 |  Val. PPL: 694.373 | Val. BLEU:   0.102
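
In the log above, training loss keeps falling while validation loss rises, yet validation BLEU still improves slightly, so the two criteria point to different checkpoints. The sketch below copies the logged numbers into a list and picks the best epoch under each criterion. It also recomputes perplexity as the exponential of the loss, which only matches the logged PPL approximately because the logged losses are rounded. The names `history`, `best_by_loss` and `best_by_bleu` are introduced here for illustration.

```python
import math

# (epoch, train_loss, valid_loss, valid_bleu) copied from the training log above
history = [
    (1, 3.590, 4.542, 0.044),
    (2, 1.743, 4.976, 0.090),
    (3, 0.740, 5.575, 0.092),
    (4, 0.332, 6.112, 0.100),
    (5, 0.184, 6.543, 0.102),
]

best_by_loss = min(history, key=lambda row: row[2])[0]
best_by_bleu = max(history, key=lambda row: row[3])[0]

for epoch, train_loss, valid_loss, valid_bleu in history:
    # perplexity is the exponential of the average per-token cross-entropy
    print(f"epoch {epoch}: train PPL ~ {math.exp(train_loss):.1f}, "
          f"valid PPL ~ {math.exp(valid_loss):.1f}, valid BLEU {valid_bleu:.3f}")

print("lowest validation loss at epoch", best_by_loss)
print("highest validation BLEU at epoch", best_by_bleu)
```

Which criterion to prefer is a judgment call. Since the lab is scored with BLEU, selecting the checkpoint by validation BLEU is one reasonable choice, and the widening gap between training and validation loss suggests that stronger regularization or early stopping may be worth trying.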


## Preparing Predictions from Test Data

In [0]:
'''
load fields saved during preprocessing
'''
with open("./drive/My Drive/Colab Notebooks/ckpt_mt_lab4/TRG.Field","rb") as f:
     TRG_saved = pickle.load(f)

with open("./drive/My Drive/Colab Notebooks/ckpt_mt_lab4/SRC.Field","rb") as f:
     SRC_saved = pickle.load(f)

'''
hyperparameters (ensure the following hyperparameters match with those used during training of the best model)
'''
INPUT_DIM = len(SRC_saved.vocab)
OUTPUT_DIM = len(TRG_saved.vocab)
ENC_EMB_DIM = 128
DEC_EMB_DIM = 128
ENC_HID_DIM = 256
DEC_HID_DIM = 256
ENC_DROPOUT = 0.5
DEC_DROPOUT = 0.5
N_LAYERS = 1

'''
instantiate the model
'''
enc = Encoder(INPUT_DIM, ENC_EMB_DIM, ENC_HID_DIM, N_LAYERS, ENC_DROPOUT)
dec = Decoder(OUTPUT_DIM, DEC_EMB_DIM, DEC_HID_DIM, N_LAYERS, DEC_DROPOUT)
model_best = Seq2Seq(enc, dec,device).to(device)

'''
load the checkpoint corresponding to the best epoch (usually epoch with highest validation BLEU score)
'''
model_best.load_state_dict(torch.load('./drive/My Drive/Colab Notebooks/ckpt_mt_lab4/seq2seq_1.pt')['state_dict'])
model_best = model_best.to(device)

'''
generate translations for all the sentences in test data
'''
def generate_translations(model, eval_iter, trg_vocab, attention = True, max_trg_len = 64):
  '''
    Function for generating translation by model inference

    Input: 
    model: translation model;
    eval_iter: iterator over the evaluation data
    trg_vocab: Target torchtext Field
    attention: the model returns attention weights or not.
    max_trg_len: the maximal length of translation text (optional), default = 64

    Output:
    List of translated sentences
  '''
  model.eval()
  all_translation_word_ids = []
  # look up the <pad> index from the target field that was passed in, rather than relying on a global
  trg_pad_idx = trg_vocab.vocab.stoi[trg_vocab.pad_token]
  with torch.no_grad():
    for batch in eval_iter: # iterate over the iterator passed as an argument, not the global test_iter
      src = batch.SRC
      #src = [src len, batch size]
      batch_size = src.shape[1]

      # create a placeholder for target language with shape of [max_trg_len, batch_size] where all the elements are the index of <pad>. Then send to device
      trg_placeholder = torch.Tensor(max_trg_len, batch_size)
      trg_placeholder.fill_(trg_pad_idx)
      trg_placeholder = trg_placeholder.long().to(device)
      if attention == True:
        output,_ = model(src, trg_placeholder, 0) #turn off teacher forcing
      else:
        output = model(src, trg_placeholder, 0) #turn off teacher forcing
      # get translation results, ignoring the first position, which corresponds to <sos>.
      # output_translate = [(trg len - 1), batch, output dim] output dim is size of target vocabulary.
      output_translate = output[1:]

      # Choose top 1 word from decoder's output, we get the score and index of the word
      prob, token_id = output_translate.data.topk(1)
      translation_token_id = token_id.squeeze(2).cpu()

      # store predicted target sentences to a list 
      all_translation_word_ids.append(translation_token_id)
  
  all_translation_text = []
  for i in range(len(all_translation_word_ids)):
    cur_translation_batch = all_translation_word_ids[i]
    for j in range(cur_translation_batch.shape[1]):
      trans_convered_strings = convert_itos(trg_vocab, cur_translation_batch[:,j])
      all_translation_text.append(' '.join(trans_convered_strings)) # convert list of words to text
  
  return all_translation_text

# translate all the sentences in the test set      
test_predictions = generate_translations(model_best, test_iter, TRG_saved, attention = False, max_trg_len = 64)




## Create submission file using test predictions

In [0]:
'''
write the translations to a file. this function will help you generate the submission file for the first question.
'''
def out_prediction(first_name, last_name, save_directory, prediction_list):
    """
    out_prediction takes four input variables: first_name, last_name, save_directory, prediction_list
    <first_name>, string, your first name, e.g., Tom
    <last_name>, string, your last name, e.g., Smith
    <save_directory>, string, directory to save the submission file, e.g., ./drive/My Drive/Colab Notebooks/ckpt_mt_lab4
    <prediction_list>, list of string which includes all your predictions (or translations) of TEST samples
          e.g., ['This is the translation of my first sentence','This is the translation of my second sentence',...]
                        
    Generates a file named <yourfirstname>_<yourlastname>_MACHINE_TRANSLATION_PRED.txt in <save_directory>
    """
    absolute_file_path = "{}/{}_{}_MACHINE_TRANSLATION_PRED.txt".format(save_directory, first_name,last_name)
    # the context manager closes the file even if a write fails
    with open(absolute_file_path,'w') as output_file:
        output_file.write("English (trg)\n")
        for item in prediction_list:
            output_file.write(item+"\n")
    print("submission file for the first question successfully saved at %s"%absolute_file_path)

# provide your firstname and lastname as arguments to out_prediction
out_prediction('firstname', 'lastname', './drive/My Drive/Colab Notebooks/ckpt_mt_lab4', test_predictions)

submission file for the first question successfully saved at ./drive/My Drive/Colab Notebooks/ckpt_mt_lab4/firstname_lastname_MACHINE_TRANSLATION_PRED.txt
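
A quick sanity check on the written file can catch a mismatch between the number of predictions and the number of test sentences (4738 in the data loaded above). The following sketch assumes the file layout produced by `out_prediction`: one header line followed by one translation per line. `check_submission` is a hypothetical helper, and the demo writes a small temporary file rather than touching the real submission.

```python
import os
import tempfile

def check_submission(path, expected_count, header="English (trg)"):
    """Return True if the file has the header followed by exactly expected_count lines."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the file ends with a newline, which leaves one empty trailing piece
    return bool(lines) and lines[0] == header and len(lines) - 1 == expected_count

# Demo on a temporary file that mimics the submission layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_PRED.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("English (trg)\n")
        f.write("first translation\n")
        f.write("\n")  # an empty translation still occupies its own line
    ok = check_submission(demo_path, expected_count=2)
    too_few = check_submission(demo_path, expected_count=3)

print(ok, too_few)
```

For the real file, the call would pass the path printed by `out_prediction` and the number of test examples as `expected_count`.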


## Compute BLEU score using scorer.py

We will compute the BLEU score based on dummy test predictions (dummy-example-mt-test-pred.tsv) and dummy gold test data (dummy-example-mt-test-gold.tsv).

In [0]:
"""
Python code to evaluate the system outputs using BLEU

usage format:
> python scorer.py <task> <gold-file> <pred_file>

example usage for Machine Translation (Problem 1):
> python scorer.py mt prob1/dummy-example-mt-test-gold.tsv prob1/dummy-example-mt-test-pred.tsv
"""

!python ./drive/"My Drive"/scorer.py mt ./drive/"My Drive"/machine_translation/dummy-example-mt-test-gold.tsv ./drive/"My Drive"/machine_translation/dummy-example-mt-test-pred.tsv



OVERALL SCORES:
Cumulative 1-gram: 0.847155
Cumulative 2-gram: 0.801876
Cumulative 3-gram: 0.750307
Cumulative 4-gram: 0.695157
BLEU (default, that is, Cumulative 4-gram): 0.695157
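
To clarify what "cumulative n-gram" means in the scores above, here is a simplified pure-Python sketch of corpus-level BLEU. It is not the implementation used by scorer.py, whose internals are not shown in this notebook. It assumes a single reference per hypothesis, uniform weights over n-gram orders 1 to `max_n`, and no smoothing. `ngrams` and `cumulative_bleu` are hypothetical names.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cumulative_bleu(references, hypotheses, max_n=4):
    """Corpus-level cumulative BLEU with one reference per hypothesis and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            # clipped counts: an n-gram is credited at most as often as it occurs in the reference
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # brevity penalty for hypotheses that are shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    # geometric mean of the modified n-gram precisions for orders 1..max_n
    log_p = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    return bp * math.exp(log_p)

refs = ["the cat sat on the mat".split()]
hyps = ["the cat sat on a mat".split()]
for n in range(1, 5):
    print(f"Cumulative {n}-gram: {cumulative_bleu(refs, hyps, max_n=n):.6f}")
```

Because higher-order n-grams are harder to match, the cumulative score usually decreases as `max_n` grows, which is the pattern visible in the scorer output.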
