# HW 3: Neural Machine Translation

In this homework you will build a full neural machine translation system that uses an attention-based encoder-decoder network to translate from German to English. The encoder-decoder network with attention forms the backbone of many current text generation systems. See [Neural Machine Translation and Sequence-to-sequence Models: A Tutorial](https://arxiv.org/pdf/1703.01619.pdf) for an excellent tutorial that also covers many modern advances.

## Goals


1. Build a non-attentional baseline model (pure seq2seq as in [ref](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf)). 
2. Incorporate attention into the baseline model ([ref](https://arxiv.org/abs/1409.0473), but using dot-product attention as in the class notes).
3. Implement beam search (review/tutorial [here](http://www.phontron.com/slides/nlp-programming-en-13-search.pdf)).
4. Visualize the attention distribution for a few examples. 

Consult the papers provided for hyperparameters, and the course notes for formal definitions.
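
To make the dot-product attention of goal 2 concrete, here is a minimal sketch of a single decoder step in plain Python. It is not the course's reference implementation: the function names `softmax` and `dot_product_attention` and the toy vectors are assumptions made for illustration, and a real model would perform the same computation on batched tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, encoder_states):
    """Return (weights, context) for one decoder step.

    query:          decoder hidden state, a list of d floats
    encoder_states: one hidden state (list of d floats) per source position
    """
    # Score each source position by its dot product with the decoder state.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    # Normalize the scores into a distribution over source positions.
    weights = softmax(scores)
    # The context vector is the attention-weighted average of encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(len(query))
    ]
    return weights, context


# Toy data (hypothetical): three source positions, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = dot_product_attention(query, encoder_states)
print(weights, context)
```

In this toy case the first and third source positions get equal scores, so they share most of the attention mass, while the second position (orthogonal to the query) gets the least. The `weights` list is also what goal 4 asks you to visualize, one row per decoder step.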

This will be the most time-consuming assignment in terms of both difficulty and training time, so we recommend that you get started early!
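
As a starting point for goal 3, the following is a minimal beam search sketch over a hypothetical scoring function. The names `beam_search`, `next_log_probs`, and `toy_model`, along with the tiny hand-written probability table, are assumptions for illustration only; in the assignment the scores would come from a decoder step of your translation model. The sketch ranks hypotheses by raw cumulative log-probability and applies no length normalization.

```python
import math


def beam_search(next_log_probs, bos, eos, beam_size=3, max_len=10):
    """Return (score, tokens) for the best hypothesis found.

    next_log_probs(prefix) -> dict mapping each candidate next token to its
    log-probability given the prefix (a stand-in for one decoder step).
    """
    beams = [(0.0, [bos])]  # (cumulative log-prob, tokens so far)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for score, tokens in beams:
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((score + lp, tokens + [tok]))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len still compete
    return max(finished, key=lambda c: c[0])


def toy_model(prefix):
    """Hypothetical next-token distribution that depends only on the last token."""
    table = {
        "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
        "a": {"x": math.log(0.5), "y": math.log(0.5)},
        "b": {"z": math.log(0.9), "w": math.log(0.1)},
    }
    return table.get(prefix[-1], {"</s>": 0.0})


greedy_score, greedy_tokens = beam_search(toy_model, "<s>", "</s>", beam_size=1)
beam_score, beam_tokens = beam_search(toy_model, "<s>", "</s>", beam_size=2)
print(greedy_tokens, math.exp(greedy_score))
print(beam_tokens, math.exp(beam_score))
```

With `beam_size=1` the search is greedy and commits to `a`, ending with a sequence of probability 0.3. With `beam_size=2` it keeps `b` alive and finds `<s> b z </s>` with probability 0.36, which is the kind of recovery beam search is meant to provide.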

In [5]:
import torch
from namedtensor import ntorch, NamedTensor
import numpy as np
import random

In [None]:
from load_data import DataLoader
from models import LSTMTranslator, AttentionTranslator

loader = DataLoader('cpu')
train_iter, val_iter, DE, EN = loader.get_iters()

Loading data...


In [None]:
# Goal 1: non-attentional seq2seq baseline
model = LSTMTranslator(DE, EN, 300, 300, 200)
model.fit(train_iter, val_iter)

In [4]:
# Goal 2: encoder-decoder with attention
model = AttentionTranslator(DE, EN, 300, 300, 200)
model.fit(train_iter, val_iter)

[epoch: 1, batch: 1] loss: 9.360163688659668
[epoch: 1, batch: 2] loss: 6.11624002456665
[epoch: 1, batch: 3] loss: 5.2285919189453125
[epoch: 1, batch: 4] loss: 5.5190110206604
[epoch: 1, batch: 5] loss: 5.469543933868408
[epoch: 1, batch: 6] loss: 5.159133434295654
[epoch: 1, batch: 7] loss: 4.834699630737305
[epoch: 1, batch: 8] loss: 4.836313724517822
[epoch: 1, batch: 9] loss: 4.9676713943481445
[epoch: 1, batch: 10] loss: 4.649656295776367
[epoch: 1, batch: 11] loss: 4.555586814880371
[epoch: 1, batch: 12] loss: 4.320090293884277
[epoch: 1, batch: 13] loss: 4.185899257659912
[epoch: 1, batch: 14] loss: 4.125365257263184
[epoch: 1, batch: 15] loss: 4.351080417633057
[epoch: 1, batch: 16] loss: 4.147336006164551
[epoch: 1, batch: 17] loss: 4.543869495391846


KeyboardInterrupt: 

In [None]:
model = AttentionTranslator(DE, EN, 300, 300, 200)