# neuralmt: default program

In [None]:
from default import *
import os, sys

## Run the default solution on dev

In [None]:
model = Seq2Seq(build=False)
model.load(os.path.join('data', 'seq2seq_E049.pt'))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
# load the dev set through the test-data loader
test_iter = loadTestData(os.path.join('data', 'input', 'dev.txt'), model.fields['src'],
                            device=device, linesToLoad=sys.maxsize)
results = translate(model, test_iter) # Warning: will take >5mins depending on your machine
print("\n".join(results))

## Evaluate the default output

In [None]:
from bleu_check import bleu
ref_t = []
with open(os.path.join('data','reference','dev.out')) as r:
    ref_t = r.read().strip().splitlines()
print(bleu(ref_t, results))

## Documentation

For the baseline implementation of adding attention to the sequence-to-sequence model, the only changes to the default solution are to the calcAlpha and forward methods.

In [None]:
def calcAlpha(self, decoder_hidden, encoder_out):
        """
        param encoder_out: (seq, batch, dim),
        param decoder_hidden: (seq, batch, dim)
        """
        # additive (tanh) score for every source position
        scores = torch.tanh(self.W_enc(encoder_out) + self.W_dec(decoder_hidden))
        # swap the seq and batch axes so that softmax normalises over seq (dim=1)
        out = torch.transpose(self.V_att(scores), 0, 1)
        alpha = torch.nn.functional.softmax(out, dim=1)
        return alpha

The calcAlpha function computes the weights for additive attention. The dimensions did not line up after the tanh step, which was patched by transposing the output before passing it to the softmax function.
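As a sketch of what calcAlpha computes, the weights can be written as follows. The symbols are notation introduced here rather than taken from the code: $h_i$ stands for the encoder output at source position $i$, $s$ for the decoder hidden state, and $v_{att}$ for the weights of V_att. Any bias terms of the linear layers are left out for brevity.

$$
e_i = v_{att}^\top \tanh\left(W_{enc} h_i + W_{dec} s\right), \qquad \alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}
$$

The softmax runs over the source positions, which is why the sequence axis has to be the one named in the dim argument.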

In [14]:
def forward(self, decoder_hidden, encoder_out):
        """
        encoder_out: (seq, batch, dim),
        decoder_hidden: (seq, batch, dim)
        """
        alpha = self.calcAlpha(decoder_hidden, encoder_out)
        seq, _, dim = encoder_out.shape
        # transpose alpha back so each weight lines up with its encoder state
        combined = torch.bmm(torch.transpose(alpha, 0, 1), encoder_out)
        # sum the weighted states over seq; the reshape assumes a batch size of 1
        context = (torch.sum(combined, dim=0)).reshape(1, 1, dim)
        return context, alpha.permute(2, 0, 1)

When calculating the context that is passed forward, there was another dimension mismatch with the calcAlpha output, so alpha is transposed again before it is multiplied with the encoder hidden states.
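For comparison, here is a minimal standalone sketch of the same additive attention written so that it also works for batch sizes above 1. It is not the code used for the reported score. The layer names follow calcAlpha, but the layer sizes, the (1, batch, dim) shape of decoder_hidden, and the function name additive_attention are assumptions made for the example.

```python
import torch


def additive_attention(decoder_hidden, encoder_out, W_enc, W_dec, V_att):
    """
    encoder_out:    (seq, batch, dim)
    decoder_hidden: (1, batch, dim), broadcast over the seq axis (assumed shape)
    returns context (batch, 1, dim) and alpha (batch, seq, 1)
    """
    # (seq, batch, hid): score every source position against the decoder state
    scores = torch.tanh(W_enc(encoder_out) + W_dec(decoder_hidden))
    # (seq, batch, 1) -> (batch, seq, 1), then normalise over the seq axis
    alpha = torch.softmax(V_att(scores).transpose(0, 1), dim=1)
    # (batch, 1, seq) @ (batch, seq, dim) -> (batch, 1, dim): weighted sum of states
    context = torch.bmm(alpha.transpose(1, 2), encoder_out.transpose(0, 1))
    return context, alpha


torch.manual_seed(0)
seq, batch, dim = 5, 3, 4          # toy sizes, chosen only for illustration
W_enc = torch.nn.Linear(dim, dim)
W_dec = torch.nn.Linear(dim, dim)
V_att = torch.nn.Linear(dim, 1)
encoder_out = torch.randn(seq, batch, dim)
decoder_hidden = torch.randn(1, batch, dim)

context, alpha = additive_attention(decoder_hidden, encoder_out, W_enc, W_dec, V_att)
print(context.shape, alpha.shape)
```

Putting the batch axis first before calling bmm means the weighted sum happens inside the matrix product, so no separate torch.sum or reshape is needed.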

### Running the Code
There were some setup issues getting this to run on my own machine, though it may work fine on others. Versions of torchtext later than 0.8.1 discontinued some functions and will not work. The older version of torch was also too old to run on my GPU: the GPU has CUDA capability sm_86, while that version of torch only supports up to sm_75. I therefore had to manually change the code in neuralmt.py (line 48 into line 49) to force it to run on the CPU. Other than that, the program needs a model file as input; for development, the file "seq2seq_E049.pt" was placed directly in the data folder. With the inputs in the data folder, you can run the command:

In [None]:
python3 neuralmt.py > output.txt

This should take a few minutes to run. A progress indicator is shown in the command window, and the output is saved to the output.txt file.
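The CPU workaround mentioned above could also be made switchable rather than hard-coded. The following is only a sketch: pick_device and the NEURALMT_FORCE_CPU environment variable are hypothetical names that do not exist in neuralmt.py. An explicit switch is used because, in the situation described, the torch.cuda.is_available() check alone apparently did not prevent the unsupported GPU from being selected.

```python
import os

import torch


def pick_device(force_cpu=False):
    """Return the CPU when asked to (or when CUDA is unavailable), else the GPU."""
    if force_cpu or not torch.cuda.is_available():
        return torch.device("cpu")
    return torch.device("cuda")


# NEURALMT_FORCE_CPU is a hypothetical variable name used only in this sketch
force_cpu = os.environ.get("NEURALMT_FORCE_CPU", "0") == "1"
device = pick_device(force_cpu)
print(device)
```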

## Analysis

After adding additive attention to the model, the BLEU score increased substantially, from the default score of 3.35 to 17.11. The difficult parts were getting the dimensions of all the tensors to match and choosing the right operation for multiplying tensors together, since there are many options (*, torch.matmul, @, torch.bmm) in addition to passing a tensor through an nn.Linear layer.
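The differences between these operations are easiest to see on small tensors. The sketch below uses toy values and shapes chosen only for illustration.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[10.0, 20.0], [30.0, 40.0]])

# * multiplies element by element (and broadcasts where shapes allow)
elementwise = a * b

# @ is the operator form of torch.matmul: a true matrix product
matrix = a @ b
assert torch.equal(matrix, torch.matmul(a, b))

# torch.bmm is stricter: both inputs must be 3-D with the same batch size
x = torch.randn(3, 1, 5)
y = torch.randn(3, 5, 4)
batched = torch.bmm(x, y)          # shape (3, 1, 4)

# torch.matmul broadcasts a 2-D matrix over the batch axis; bmm would reject this
z = torch.randn(5, 4)
broadcast = torch.matmul(x, z)     # shape (3, 1, 4)

# nn.Linear multiplies the last axis by its transposed weight (plus a bias if enabled)
linear = torch.nn.Linear(5, 4, bias=False)
projected = linear(x)              # shape (3, 1, 4)

print(elementwise)
print(matrix)
```

Because torch.bmm does not broadcast, it fails early on a shape mistake that torch.matmul might silently accept, which can make dimension bugs easier to find.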

Given the prevalence of <unk> tokens even in the baseline output, unknown word replacement would have been an especially suitable extension to the baseline solution: each <unk> token in the output is replaced with a dictionary translation of the corresponding out-of-vocabulary word, as outlined in Sutskever et al.’s (2015) paper. Although we anticipate that this extension would have further increased the BLEU score of our current solution, a potential shortcoming of dictionary replacement is that the correct word sense cannot be chosen if the context of the sentence is not fully understood. Unfortunately, due to time constraints, the other potential extensions, such as beam search decoding and ensemble decoding, remained unattempted.
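Although unknown word replacement was not implemented, a rough sketch of one way it could work is shown below. It assumes the attention weights are used as a soft alignment, so each <unk> is traced back to its most-attended source word. The function name replace_unknowns, the fallback of copying the source word, and the sample sentence, weights, and dictionary are all invented for illustration.

```python
def replace_unknowns(source_tokens, output_tokens, attention, dictionary, unk="<unk>"):
    """
    attention[t][i] is the weight that output position t places on source position i.
    Returns a copy of output_tokens with every unk token replaced.
    """
    replaced = []
    for t, token in enumerate(output_tokens):
        if token != unk:
            replaced.append(token)
            continue
        weights = attention[t]
        # treat the most-attended source position as the aligned word (an assumption)
        aligned = max(range(len(source_tokens)), key=lambda i: weights[i])
        source_word = source_tokens[aligned]
        # use the dictionary translation if there is one, else copy the source word
        replaced.append(dictionary.get(source_word, source_word))
    return replaced


# made-up example data
source = ["das", "Haus", "ist", "klein"]
output = ["the", "<unk>", "is", "<unk>"]
attention = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.10, 0.70, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
dictionary = {"Haus": "house"}

print(" ".join(replace_unknowns(source, output, attention, dictionary)))
```

The word-sense shortcoming shows up directly here: the dictionary holds a single translation per source word, so an ambiguous word would always receive the same replacement regardless of context.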