# Seq2Seq Neural Machine Translation with Luong Attention

This notebook trains and evaluates a Seq2Seq model built from GRUs with **Luong attention** in PyTorch. It compares three alignment score functions:
- Dot Product
- General
- Concat
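
For reference, the three score functions can be written as below, following the formulation in Luong et al. (2015). Here $h_t$ is the decoder hidden state at step $t$, $\bar{h}_s$ is the encoder output at source position $s$, and $W_a$ and $v_a$ are learned parameters. That the classes in `models.py` implement exactly these forms is an assumption, since their source is not shown in this notebook.

```latex
\[
\operatorname{score}(h_t, \bar{h}_s) =
\begin{cases}
h_t^\top \bar{h}_s & \text{dot product} \\
h_t^\top W_a \bar{h}_s & \text{general} \\
v_a^\top \tanh\!\left(W_a [h_t ; \bar{h}_s]\right) & \text{concat}
\end{cases}
\qquad
a_t(s) = \frac{\exp\big(\operatorname{score}(h_t, \bar{h}_s)\big)}
              {\sum_{s'} \exp\big(\operatorname{score}(h_t, \bar{h}_{s'})\big)}
\]
```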

We use a subset of the **English-French** dataset from [Tatoeba (ManyThings.org)](https://www.manythings.org/anki/).

---


# 1. Imports and Setup

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import random
import os

from dataloader_generator import (
    normalizeString,
    prepareData,
    DatasetEngFra,
    collate_batch,
    PAD_token, SOS_token, EOS_token
)
from models import EncoderWithLuongAttention, DecoderWithLuongAttention
from utils import train_luong_attention, translate_with_attention, plot_attention

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

seed = 13
random.seed(seed)        # random.choice is used later to sample sentence pairs
torch.manual_seed(seed)


# 2. Hyperparameters & Dataset

In [None]:
# Parameters
num_epochs = 20
hidden_size = 256
BATCH_SIZE = 32
learning_rate = 1e-3
MAX_LENGTH = 10
alignment_modes = ['dot_product', 'concat', 'general']

# Download if not exists
if not os.path.exists('fra.txt'):
    os.system('wget -q https://www.manythings.org/anki/fra-eng.zip')
    os.system('unzip -oq fra-eng.zip')

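# Load and preprocess
text_pairs = []
with open('fra.txt', 'r', encoding='utf-8') as f:
    for line in f:
        # Each line is "english<TAB>french<TAB>attribution"; keep the first two fields
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:
            continue
        eng, fra = fields[0].strip(), fields[1].strip()
        text_pairs.append((normalizeString(eng), normalizeString(fra)))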

input_lang, output_lang, pairs = prepareData('eng', 'fra', text_pairs)
dataset = DatasetEngFra(pairs, input_lang, output_lang)
train_dl = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch)


# 3. Training Loop Across Modes

In [None]:
results = {}

for alignment_mode in alignment_modes:
    print(f"\n{'=' * 40} {alignment_mode.upper()} MODE {'=' * 40}")

    encoder = EncoderWithLuongAttention(input_lang.n_words, hidden_size).to(device)
    decoder = DecoderWithLuongAttention(output_lang.n_words, hidden_size, alignment_mode).to(device)

    loss_fn = nn.NLLLoss(ignore_index=PAD_token)
    encoder_optimizer = torch.optim.Adam(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=learning_rate)

    train_loss = train_luong_attention(
        encoder, decoder, train_dl, num_epochs,
        loss_fn, encoder_optimizer, decoder_optimizer
    )
    results[alignment_mode] = {
        "encoder": encoder,
        "decoder": decoder,
        "train_loss": train_loss
    }


# 4. Visualize Sample Translations

In [None]:
def show_sample_translations(encoder, decoder, mode_name):
    print(f"\nSample translations for {mode_name.upper()} mode:\n")
    for _ in range(5):
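        # Sample from the filtered pairs so every word is in the vocabulary
        # (assumes prepareData returns (eng, fra) pairs)
        eng, fra = random.choice(pairs)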
        pred, _ = translate_with_attention(encoder, decoder, eng, input_lang, output_lang)
        print(f"Input:     {eng}")
        print(f"Target:    {fra}")
        print(f"Predicted: {pred}\n")

for mode, obj in results.items():
    show_sample_translations(obj["encoder"], obj["decoder"], mode)


# 5. Attention Visualization

In [None]:
sample_input = "Life is often compared to a journey"
correct_translation = "la vie est souvent comparée à un voyage."

for mode, obj in results.items():
    print(f"\n{mode.upper()} MODE Attention Visualization:")
    output_sentence, attention_weights = translate_with_attention(
        obj["encoder"], obj["decoder"], sample_input, input_lang, output_lang
    )
    print(f"Predicted: {output_sentence}")
    print(f"Reference: {correct_translation}")
    plot_attention(sample_input, output_sentence, attention_weights)


# 6. Loss Plot

In [None]:
plt.figure(figsize=(10, 6))
for mode, obj in results.items():
    plt.plot(obj["train_loss"], label=mode.upper())

plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Loss vs Epochs for Different Attention Modes")
plt.grid(True)
plt.legend()
plt.show()


## Conclusion

In this notebook, we trained a sequence-to-sequence model with **Luong attention mechanisms** on a subset of the English-French dataset.

### Key Observations:

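- All three attention modes produced reasonable translations on the sampled sentences.
- The **general** and **concat** modes add learned parameters to the score function, which can make the alignment more expressive, particularly on longer sequences.
- **Dot product** has no learned score parameters, so it is cheaper to compute, but it may be less flexible for some sentence structures.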
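
The cost difference between the modes can be made concrete by counting the learned parameters in each score function. This is a rough sketch that assumes the bias-free formulation from Luong et al. (2015) and encoder and decoder states that both have size `hidden_size`; the helper `score_param_counts` is hypothetical, and the actual layers in `models.py` may differ (for example, by including bias terms).

```python
def score_param_counts(hidden_size):
    """Learned parameters in each score function (bias-free assumption)."""
    return {
        # h_t . h_s: a plain dot product, nothing to learn
        "dot_product": 0,
        # h_t^T W_a h_s: W_a is hidden_size x hidden_size
        "general": hidden_size * hidden_size,
        # v_a^T tanh(W_a [h_t; h_s]): W_a is hidden_size x 2*hidden_size,
        # v_a has hidden_size entries
        "concat": 2 * hidden_size * hidden_size + hidden_size,
    }


counts = score_param_counts(256)  # hidden_size used in this notebook
for mode, n in counts.items():
    print(f"{mode:12s} {n:>8d}")
```

Under these assumptions, the concat score carries roughly twice as many parameters as the general score, while the dot-product score adds none.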

### Final Thoughts:

Luong attention lets the decoder focus on the most relevant encoder outputs at each decoding step instead of relying on a single fixed context vector. This typically improves translation quality, and the attention heatmaps make the model's alignments easier to interpret.
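
To illustrate what "focusing on relevant encoder outputs" means, here is a minimal, framework-free sketch of a single dot-product attention step. It uses plain Python lists instead of PyTorch tensors to keep the arithmetic visible; the function names and the toy vectors are illustrative assumptions, not code from `models.py`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state, encoder_outputs):
    """Return (attention weights, context vector) for one decoder step."""
    # Score each encoder output by its dot product with the decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_outputs]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of encoder outputs.
    dim = len(decoder_state)
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_outputs)) for i in range(dim)
    ]
    return weights, context


# Toy example: three source positions, hidden size 2.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = dot_attention(decoder_state, encoder_outputs)
print("weights:", [round(w, 3) for w in weights])
print("context:", [round(c, 3) for c in context])
```

The first and third source positions align with the decoder state and receive most of the weight, so the context vector is dominated by them. A row of such weights for each output token is what the attention heatmap displays.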

---

**Next Steps**
- Extend to multi-layer GRU or LSTM
- Use beam search for decoding
- Train on a larger dataset (like WMT14)
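
As a starting point for the beam search item, the sketch below shows the core loop independently of any model. The `step_fn` callback is a hypothetical stand-in for one decoder step (it would wrap the trained decoder and return next-token log-probabilities); the toy distribution exists only to show a case where greedy decoding picks a worse sequence. No length normalization is applied, which a real implementation would probably want.

```python
import math


def beam_search(step_fn, sos, eos, beam_width=3, max_len=10):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(tokens) is a hypothetical callback that returns
    {next_token: log_prob} given the tokens generated so far.
    """
    beams = [(0.0, [sos])]  # (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            for token, logp in step_fn(tokens).items():
                candidates.append((score + logp, tokens + [token]))
        # Keep only the best `beam_width` hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[0])[1]


def toy_step(tokens):
    """Toy next-token distribution where the greedy first choice is a trap."""
    table = {
        ("<sos>",): {"a": math.log(0.6), "b": math.log(0.4)},
        ("<sos>", "a"): {"x": math.log(0.5), "y": math.log(0.5)},
        ("<sos>", "b"): {"z": math.log(0.9), "w": math.log(0.1)},
    }
    # Every two-token continuation ends the sentence with certainty.
    return table.get(tuple(tokens), {"<eos>": 0.0})


greedy = beam_search(toy_step, "<sos>", "<eos>", beam_width=1)
beam = beam_search(toy_step, "<sos>", "<eos>", beam_width=2)
print("greedy:", greedy)
print("beam:  ", beam)
```

With a beam width of 1 the search is greedy and commits to `a` (probability 0.6), ending with total probability 0.30. With a beam width of 2 it keeps `b` alive and finds the sequence through `z` with probability 0.36.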
