# Simple Encoder-Decoder Translation

**Goal:** Build a simple encoder-decoder model to translate English to Hindi.

### Exercise: Build an Encoder-Decoder Neural Network

In this exercise, you will build a sequence-to-sequence model using PyTorch to translate English sentences to Hindi. Fill in the blanks to complete the code.

This notebook demonstrates:
- How the encoder processes input text
- How the decoder generates output text
- The complete translation pipeline

## What is Encoder-Decoder?

Think of translation as a two-step process:

1. **Encoder**: Reads the English sentence and creates a "summary" (context vector)
2. **Decoder**: Uses that summary to write the Hindi sentence

```
English → [ENCODER] → Context Vector → [DECODER] → Hindi
```
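To make the "summary" idea concrete, here is a minimal sketch (separate from the model built in the steps below) suggesting that a recurrent encoder yields a context vector of the same size no matter how long the input is. The sizes 16 and 32 and the random inputs are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions for this sketch)
embed_size, hidden_size = 16, 32
gru = nn.GRU(embed_size, hidden_size, batch_first=True)

# Two "sentences" of different lengths, already embedded as random vectors
short_sentence = torch.randn(1, 3, embed_size)   # (batch, seq_len=3, features)
long_sentence = torch.randn(1, 7, embed_size)    # (batch, seq_len=7, features)

_, context_short = gru(short_sentence)
_, context_long = gru(long_sentence)

# Both context vectors have the same shape: (num_layers, batch, hidden_size)
print(context_short.shape)  # torch.Size([1, 1, 32])
print(context_long.shape)   # torch.Size([1, 1, 32])
```

The fixed size is what lets the decoder start from the same kind of input for any sentence, and it is also why long sentences are harder to squeeze into one vector.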

* **Import Libraries:** This cell imports the necessary PyTorch libraries for building neural networks.

**Hints:**
- Import `torch.nn` as `nn` for neural network layers
- Import `torch.nn.functional` as `F` for activation functions
- `torch` is the main PyTorch package

**Documentation:**
- [torch.nn](https://pytorch.org/docs/stable/nn.html)
- [torch.nn.functional](https://pytorch.org/docs/stable/nn.functional.html)
- [PyTorch Getting Started](https://pytorch.org/get-started/locally/)

In [None]:
# Import libraries
import torch         # Hint: main PyTorch library
import torch.nn as nn   # Hint: neural network module (contains layers like Linear, LSTM, etc.)
import torch.nn.functional as F  # Hint: functional API (contains relu, softmax, cross_entropy, etc.)

print("Libraries loaded!")

Libraries loaded!


## Step 1: Prepare Data

We'll use a small set of English-Hindi sentence pairs.

In [None]:
# Training data: English-Hindi pairs
data = [
    ("I am happy", "मैं खुश हूं"),
    ("you are good", "तुम अच्छे हो"),
    ("he is smart", "वह होशियार है"),
    ("she is kind", "वह दयालु है"),
    ("we are friends", "हम दोस्त हैं"),
]

print("Training Examples:")
for i, (eng, hin) in enumerate(data, 1):
    print(f"{i}. {eng:15} → {hin}")

Training Examples:
1. I am happy      → मैं खुश हूं
2. you are good    → तुम अच्छे हो
3. he is smart     → वह होशियार है
4. she is kind     → वह दयालु है
5. we are friends  → हम दोस्त हैं


## Step 2: Build Vocabulary

* **Create Vocabulary:** This cell builds word-to-index mappings for both English and Hindi.

**Hints:**
- Special tokens: PAD (padding), SOS (start of sentence), EOS (end of sentence), UNK (unknown word)
- Use a dictionary to map words to indices
- Use `split()` to break sentences into words
- Use `len(vocab)` to get the next available index

**Documentation:**
- [Python dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)
- [str.split](https://docs.python.org/3/library/stdtypes.html#str.split)

In [None]:
# Special tokens
PAD = 0  # Padding
SOS = 1  # Start of sentence
EOS = 2  # End of sentence
UNK = 3  # Unknown word

# Build vocabulary
def build_vocab(sentences):
    vocab = {"<PAD>": PAD, "<SOS>": SOS, "<EOS>": EOS, "<UNK>": UNK}
    for sentence in sentences:
        for word in sentence.split():   # Hint: method used to break sentence into words
            if word not in vocab:
                vocab[word] = len(vocab)   # Hint 1: key should be the current word
                                             # Hint 2: assign next available index using current vocab size
    return vocab

# Create vocabularies
english_sentences = [pair[0] for pair in data]
hindi_sentences = [pair[1] for pair in data]

eng_vocab = build_vocab(english_sentences)   # Hint: pass the list of English sentences
hin_vocab = build_vocab(hindi_sentences)

# Reverse mapping (number → word)
eng_idx2word = {v: k for k, v in eng_vocab.items()}
hin_idx2word = {v: k for k, v in hin_vocab.items()}

print(f"English vocabulary: {len(eng_vocab)} words")
print(f"Hindi vocabulary: {len(hin_vocab)} words")
print(f"\nEnglish words: {list(eng_vocab.keys())}")
print(f"\nHindi words: {list(hin_vocab.keys())}")

English vocabulary: 17 words
Hindi vocabulary: 17 words

English words: ['<PAD>', '<SOS>', '<EOS>', '<UNK>', 'I', 'am', 'happy', 'you', 'are', 'good', 'he', 'is', 'smart', 'she', 'kind', 'we', 'friends']

Hindi words: ['<PAD>', '<SOS>', '<EOS>', '<UNK>', 'मैं', 'खुश', 'हूं', 'तुम', 'अच्छे', 'हो', 'वह', 'होशियार', 'है', 'दयालु', 'हम', 'दोस्त', 'हैं']
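The two directions of the mapping (word → index and index → word) can be checked with a small round trip. This is a sketch on a toy vocabulary; the helper names `encode` and `decode` are hypothetical and are not used elsewhere in this notebook.

```python
# Toy vocabulary with the same layout as above (special tokens first)
vocab = {"<PAD>": 0, "<SOS>": 1, "<EOS>": 2, "<UNK>": 3}
for word in "we are friends".split():
    if word not in vocab:
        vocab[word] = len(vocab)   # next free index: 4, 5, 6

# Reverse mapping (index -> word)
idx2word = {index: word for word, index in vocab.items()}

def encode(sentence):
    """Hypothetical helper: words -> indices, unknown words -> <UNK>, plus <EOS>."""
    return [vocab.get(w, vocab["<UNK>"]) for w in sentence.split()] + [vocab["<EOS>"]]

def decode(indices):
    """Hypothetical helper: indices -> space-separated words."""
    return " ".join(idx2word[i] for i in indices)

print(encode("we are friends"))           # [4, 5, 6, 2]
print(decode(encode("we are enemies")))   # we are <UNK> <EOS>
```

Because special tokens occupy indices 0 to 3, the first real word always receives index 4, which matches the vocabularies printed above.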


## Step 3: Convert Sentences to Numbers

* **Sentence to Indices:** This cell converts text sentences into sequences of numerical indices.

**Hints:**
- Use `vocab.get(word, UNK)` to handle unknown words gracefully
- Append EOS token at the end of each sequence
- Use `torch.tensor()` to create PyTorch tensors

**Documentation:**
- [dict.get](https://docs.python.org/3/library/stdtypes.html#dict.get)
- [torch.tensor](https://pytorch.org/docs/stable/tensors.html)

In [None]:
def sentence_to_indices(sentence, vocab):
    """Convert sentence to list of word indices"""
    indices = [vocab.get(word, UNK) for word in sentence.split()]
    # Hint 1: dictionary method that safely retrieves value
    # Hint 2: default value should be the index for unknown words (UNK)

    indices.append(EOS)
    # Hint 3: add special end-of-sentence token
    # Hint 4: method used to add an element to a list
    # Hint 5: token to mark end of sentence

    return torch.tensor(indices, dtype=torch.long)

# Convert all data
pairs = []
for eng, hin in data:
    eng_tensor = sentence_to_indices(eng, eng_vocab)
    # Hint 6: pass the English vocabulary here

    hin_tensor = sentence_to_indices(hin, hin_vocab)
    pairs.append((eng_tensor, hin_tensor))

print("Example conversion:")
print(f"English: {data[0][0]}")
print(f"Indices: {pairs[0][0].tolist()}")
print(f"Hindi: {data[0][1]}")
print(f"Indices: {pairs[0][1].tolist()}")

Example conversion:
English: I am happy
Indices: [4, 5, 6, 2]
Hindi: मैं खुश हूं
Indices: [4, 5, 6, 2]
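The PAD token is defined in Step 2 but never needed here, because every sentence is processed on its own. If sentences of different lengths were batched together, padding would come into play. The sketch below assumes `torch.nn.utils.rnn.pad_sequence` for this; the index values are illustrative.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

PAD = 0  # same padding index as in the vocabulary above

# Two index sequences of different lengths (values are illustrative)
seq_a = torch.tensor([4, 5, 6, 2])   # 3 words + EOS
seq_b = torch.tensor([7, 8, 2])      # 2 words + EOS

# Pad the shorter sequence on the right so both fit in one tensor
batch = pad_sequence([seq_a, seq_b], batch_first=True, padding_value=PAD)
print(batch)
# tensor([[4, 5, 6, 2],
#         [7, 8, 2, 0]])

# Mask that is True for real tokens and False for padding
mask = batch != PAD
print(mask.sum(dim=1))  # tensor([4, 3])
```

If batching were added, `nn.CrossEntropyLoss(ignore_index=PAD)` is one way to keep padded positions out of the loss.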


## Step 4: Build Encoder

* **Encoder Architecture:** This cell defines the encoder neural network that processes input sentences.

**Hints:**
- Use `nn.Embedding` to convert word indices to dense vectors
- Use `nn.GRU` (Gated Recurrent Unit) for sequence processing
- The GRU returns outputs and hidden state; we only need the hidden state (context vector)
- Set `batch_first=True` for easier data handling

**Documentation:**
- [nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html)
- [nn.GRU](https://pytorch.org/docs/stable/generated/torch.nn.GRU.html)
- [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html)

In [None]:
class SimpleEncoder(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Hint 1: Layer that converts word indices → dense vectors

        self.gru = nn.GRU(embed_size, hidden_size, batch_first=True)
        # Hint 2: Recurrent layer type (simpler than LSTM)
        # Hint 3: Set to True if input shape is (batch_size, seq_len, features)

    def forward(self, x):
        # x: input word indices
        embedded = self.embedding(x)
        # Hint 4: Use embedding layer defined above

        _, hidden = self.gru(embedded)
        # Hint 5: Pass embeddings through GRU
        # Hint 6: We only need the final hidden state

        return hidden
        # Hint 7: Return the context vector (final hidden state)

# Create encoder
encoder = SimpleEncoder(
    vocab_size=len(eng_vocab),
    # Hint 8: Use English vocabulary here

    embed_size=16,
    hidden_size=32
)

print("Encoder created!")
print("- Input: English words")
print("- Output: Context vector of size 32")

Encoder created!
- Input: English words
- Output: Context vector of size 32
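To see why the encoder can discard the per-step outputs, the sketch below traces tensor shapes through an embedding layer and a GRU with the same sizes as above, built standalone here with random weights. For a single-layer, unidirectional GRU, the final hidden state should equal the output at the last time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embedding = nn.Embedding(17, 16)          # vocab_size=17, embed_size=16
gru = nn.GRU(16, 32, batch_first=True)    # hidden_size=32

x = torch.tensor([[4, 5, 6, 2]])          # (batch=1, seq_len=4)
embedded = embedding(x)
outputs, hidden = gru(embedded)

print(embedded.shape)  # torch.Size([1, 4, 16])
print(outputs.shape)   # torch.Size([1, 4, 32]): one vector per input position
print(hidden.shape)    # torch.Size([1, 1, 32]): (num_layers, batch, hidden_size)

# Single-layer, unidirectional GRU: final hidden state == last output step
print(torch.allclose(outputs[:, -1, :], hidden[0]))  # True
```

Note that `batch_first=True` only affects the input and `outputs` layout; `hidden` keeps the `(num_layers, batch, hidden_size)` layout, which is the shape the decoder's GRU expects as its initial state.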


## Step 5: Build Decoder

* **Decoder Architecture:** This cell defines the decoder neural network that generates output sentences.

**Hints:**
- The decoder also uses an `nn.Embedding` layer and an `nn.GRU`
- Add an `nn.Linear` layer to project the hidden state to vocabulary size
- The decoder takes both the previous word and the hidden state
- Use `squeeze(1)` to remove the sequence dimension (length 1) before the linear layer

**Documentation:**
- [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)
- [Tensor.squeeze](https://pytorch.org/docs/stable/generated/torch.squeeze.html)

In [None]:
class SimpleDecoder(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.gru = nn.GRU(embed_size, hidden_size, batch_first=True)
        self.output = nn.Linear(hidden_size, vocab_size)
        # Hint 1: Fully connected layer type
        # Hint 2: Output dimension should be size of vocabulary (predicting next word)

    def forward(self, x, hidden):
        # x: previous word index; hidden: encoder context (first step) or previous decoder state
        embedded = self.embedding(x)

        output, hidden = self.gru(embedded, hidden)
        # Hint 3: Pass through GRU layer
        # Hint 4: Second argument should be previous hidden state (from encoder or last step)

        output = self.output(output.squeeze(1))
        # Hint 5: Remove sequence dimension (since seq_len=1 during decoding)
        # Hint 6: Use a tensor operation that removes a dimension of size 1

        return output, hidden

# Create decoder
decoder = SimpleDecoder(
    vocab_size=len(hin_vocab),
    # Hint 7: Use Hindi vocabulary here

    embed_size=16,
    hidden_size=32
)

print("Decoder created!")
print("- Input: Context vector + previous Hindi word")
print("- Output: Next Hindi word")

Decoder created!
- Input: Context vector + previous Hindi word
- Output: Next Hindi word
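A single decoder step can be traced by hand. The sketch below rebuilds the three layers standalone with random weights, and uses a zero tensor as a stand-in for the encoder context, so the predicted index itself is meaningless. The variable name `to_vocab` is an assumption for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, embed_size, hidden_size = 17, 16, 32
embedding = nn.Embedding(vocab_size, embed_size)
gru = nn.GRU(embed_size, hidden_size, batch_first=True)
to_vocab = nn.Linear(hidden_size, vocab_size)

SOS = 1
dec_input = torch.tensor([[SOS]])           # (batch=1, seq_len=1)
hidden = torch.zeros(1, 1, hidden_size)     # stand-in for the encoder context

gru_out, hidden = gru(embedding(dec_input), hidden)   # gru_out: (1, 1, 32)
logits = to_vocab(gru_out.squeeze(1))                 # (1, 32) -> (1, 17)
probs = F.softmax(logits, dim=1)                      # raw scores -> probabilities

print(logits.shape)                          # torch.Size([1, 17])
print(probs.sum().item())                    # approximately 1.0
print(probs.argmax(1) == logits.argmax(1))   # tensor([True])
```

Since softmax preserves the ordering of the scores, taking `argmax` of the raw logits picks the same word as taking it over the probabilities.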


## Step 6: Train the Model

* **Training Function:** This cell defines a training step for one sentence pair, then runs it over all five pairs for 200 epochs.

**Hints:**
- Call `optimizer.zero_grad()` before each training step
- Use `encoder(input)` to get context vector
- Start decoding with SOS (Start Of Sentence) token
- Use teacher forcing: feed correct previous word as input
- Accumulate loss for each predicted word
- Call `loss.backward()` to compute gradients
- Call `optimizer.step()` to update weights

**Documentation:**
- [optimizer.zero_grad](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html)
- [Tensor.backward](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html)
- [optimizer.step](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html)
- [torch.optim.Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html)
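Teacher forcing is easiest to see by lining up what the decoder is fed against what it must predict at each step. The sketch below uses illustrative index values and involves no model at all.

```python
import torch

SOS, EOS = 1, 2

# Target sentence as indices, ending with EOS (values are illustrative)
target = torch.tensor([4, 5, 6, EOS])

# Teacher forcing: the decoder input at step i is the *correct* word
# from step i-1, with SOS in the first position
decoder_inputs = torch.cat([torch.tensor([SOS]), target[:-1]])

for step, (inp, tgt) in enumerate(zip(decoder_inputs.tolist(), target.tolist())):
    print(f"step {step}: input={inp} -> target={tgt}")
# step 0: input=1 -> target=4
# step 1: input=4 -> target=5
# step 2: input=5 -> target=6
# step 3: input=6 -> target=2
```

The decoder inputs are simply the targets shifted right by one position, so during training a wrong prediction at one step never contaminates the input of the next step.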

In [None]:
# Training function
def train_one_pair(eng_tensor, hin_tensor, encoder, decoder,
                   enc_optimizer, dec_optimizer, criterion):
    enc_optimizer.zero_grad()
    # Hint 1: Reset gradients before backward pass

    dec_optimizer.zero_grad()

    # Encode English sentence
    context = encoder(eng_tensor.unsqueeze(0))
    # Hint 2: Add batch dimension (since model expects batch_first=True)

    # Decode to Hindi
    loss = 0
    hidden = context
    # Hint 3: Initialize decoder hidden state using encoder output (context vector)

    for i in range(hin_tensor.shape[0]):
        # Prepare input (previous word or SOS)
        if i == 0:
            dec_input = torch.tensor([[SOS]])
            # Hint 4: Use Start-Of-Sentence token index
        else:
            dec_input = hin_tensor[i-1].unsqueeze(0).unsqueeze(0)

        # Predict next word
        output, hidden = decoder(dec_input, hidden)
        loss += criterion(output, hin_tensor[i].unsqueeze(0))

    # Update weights
    loss.backward()
    # Hint 5: Compute gradients via backpropagation

    enc_optimizer.step()
    # Hint 6: Update encoder parameters

    dec_optimizer.step()

    return loss.item() / hin_tensor.size(0)

# Setup training
criterion = nn.CrossEntropyLoss()
# Hint 7: Loss for multi-class classification; it takes raw logits and applies log-softmax internally, so the decoder needs no softmax layer

enc_optimizer = torch.optim.Adam(encoder.parameters(), lr=0.01)
# Hint 8: Optimizer type (commonly used adaptive optimizer)

dec_optimizer = torch.optim.Adam(decoder.parameters(), lr=0.01)

# Train
print("Training started...\n")
for epoch in range(1, 201):
    total_loss = 0
    for eng_tensor, hin_tensor in pairs:
        loss = train_one_pair(eng_tensor, hin_tensor, encoder, decoder,
                             enc_optimizer, dec_optimizer, criterion)
        total_loss += loss

    if epoch % 40 == 0:
        avg_loss = total_loss / len(pairs)
        print(f"Epoch {epoch:3d}: Loss = {avg_loss:.3f}")

print("\nTraining completed!")

Training started...

Epoch  40: Loss = 0.008
Epoch  80: Loss = 0.003
Epoch 120: Loss = 0.001
Epoch 160: Loss = 0.001
Epoch 200: Loss = 0.001

Training completed!
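The loss accumulated at each decoding step is the negative log-probability that the model assigns to the correct next word. The sketch below checks this by hand for one step, using random logits as a stand-in for real decoder output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(1, 17)    # raw decoder scores for one step (batch=1, vocab=17)
target = torch.tensor([5])     # index of the correct next word (illustrative)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)

# Same value computed by hand: negative log-probability of the correct word
manual = -F.log_softmax(logits, dim=1)[0, target.item()]

print(torch.allclose(loss, manual))  # True
```

A per-word loss near 0.001, as in the last epochs above, corresponds to the model giving the correct word a probability of roughly 0.999.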


## Step 7: Test Translation

* **Translation Function:** This cell implements the translation function to convert English to Hindi.

**Hints:**
- Use `torch.no_grad()` context to disable gradient computation during inference
- Start with SOS token and predict word by word
- Use `output.argmax(1)` to get the predicted word index
- Stop when EOS token is predicted or max length reached
- Use idx2word mapping to convert indices back to words

**Documentation:**
- [torch.no_grad](https://pytorch.org/docs/stable/generated/torch.no_grad.html)
- [Tensor.argmax](https://pytorch.org/docs/stable/generated/torch.argmax.html)
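The decoding strategy described in the hints is usually called greedy decoding: at each step, keep only the highest-scoring word. The sketch below isolates that loop from the model. `greedy_decode`, `fake_step`, and the `successor` table are hypothetical stand-ins for a trained decoder, and the hidden state is omitted for brevity.

```python
import torch

SOS, EOS = 1, 2

def greedy_decode(step_fn, max_len=10):
    """Repeatedly pick the highest-scoring token until EOS or max_len."""
    tokens = []
    prev = SOS
    for _ in range(max_len):
        logits = step_fn(prev)                 # shape: (1, vocab_size)
        next_token = logits.argmax(1).item()   # plain Python int
        if next_token == EOS:                  # compare against the EOS index
            break
        tokens.append(next_token)
        prev = next_token                      # feed the prediction back in
    return tokens

# Hypothetical stand-in for a trained decoder: each token has a fixed successor
successor = {SOS: 4, 4: 5, 5: 6, 6: EOS}

def fake_step(prev_token, vocab_size=7):
    logits = torch.zeros(1, vocab_size)
    logits[0, successor[prev_token]] = 1.0
    return logits

print(greedy_decode(fake_step))  # [4, 5, 6]
```

The `max_len` cap matters for a real model: if EOS is never predicted, the loop would otherwise run forever.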

In [None]:
def translate(sentence, encoder, decoder, eng_vocab, hin_idx2word, max_len=10):
    """Translate English sentence to Hindi"""
    # Convert to indices
    eng_tensor = sentence_to_indices(sentence, eng_vocab)

    # Encode
    with torch.no_grad():
        # Hint 1: Disable gradient computation during inference

        context = encoder(eng_tensor.unsqueeze(0))

    # Decode word by word
    words = []
    hidden = context
    dec_input = torch.tensor([[SOS]])
    # Hint 2: Start decoding with Start-Of-Sentence token index (SOS)

    for _ in range(max_len):
        with torch.no_grad():
            output, hidden = decoder(dec_input, hidden)

        # Get predicted word
        predicted_idx = output.argmax(1).item()
        # Hint 3: Returns the index of the highest score along the vocab dimension (the decoder outputs logits, not probabilities)

        if predicted_idx == EOS:
            # Hint 4: Stop when End-Of-Sentence token is generated
            break

        word = hin_idx2word[predicted_idx]
        words.append(word)
        # Hint 5: Add predicted word to list

        dec_input = torch.tensor([[predicted_idx]])

    return ' '.join(words)

# Test on training examples
print("Translation Results:")
print("=" * 60)
for eng, hin in data:
    predicted = translate(eng, encoder, decoder, eng_vocab, hin_idx2word)
    match = "✓ correct" if predicted == hin else "✗ wrong"
    print(f"English:   {eng}")
    print(f"Predicted: {predicted} {match}")
    print(f"Actual:    {hin}")
    print()

Translation Results:
============================================================
English:   I am happy
Predicted: मैं खुश हूं ✓ correct
Actual:    मैं खुश हूं

English:   you are good
Predicted: तुम अच्छे हो ✓ correct
Actual:    तुम अच्छे हो

English:   he is smart
Predicted: वह होशियार है ✓ correct
Actual:    वह होशियार है

English:   she is kind
Predicted: वह दयालु है ✓ correct
Actual:    वह दयालु है

English:   we are friends
Predicted: हम दोस्त हैं ✓ correct
Actual:    हम दोस्त हैं



## Try Your Own Translations

* **Test Custom Input:** This cell allows you to test the model with your own English sentences.

**Hints:**
- Modify the `test_sentence` variable to try different inputs
- The cell warns you about words that are not in the vocabulary
- Unknown words are replaced with the UNK token

In [None]:
# Test translation
test_sentence = "I am sad"  # Change this!

print(f"Input:  {test_sentence}")

# Check for unknown words
words = test_sentence.split()
unknown_words = [w for w in words if w not in eng_vocab]
if unknown_words:
    print(f"⚠ Unknown words (will be replaced with <UNK>): {unknown_words}")

print(f"\nAvailable words: {list(eng_vocab.keys())[4:]}")  # Skip special tokens

output = translate(test_sentence, encoder, decoder, eng_vocab, hin_idx2word)
print(f"Output: {output}")

Input:  I am sad
⚠ Unknown words (will be replaced with <UNK>): ['sad']

Available words: ['I', 'am', 'happy', 'you', 'are', 'good', 'he', 'is', 'smart', 'she', 'kind', 'we', 'friends']
Output: मैं खुश हूं


---

## How It Works - Summary

### 1. Encoder
- Reads the input sentence word by word
- Creates a fixed-size "summary" (context vector)
- This vector is everything the decoder knows about the input sentence

### 2. Decoder
- Starts with the context vector as its initial hidden state
- Generates output one word at a time
- Each word depends on the context and the previously generated words

### 3. Training
- Model learns by comparing predictions to correct translations
- Adjusts weights to minimize errors
- After many iterations, learns the translation pattern

### Limitations
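- Only handles words seen during training; any other word is mapped to `<UNK>`
- Tiny dataset: 5 sentence pairs, 13 words per language
- Memorizes rather than generalizes: "I am sad" is translated as "मैं खुश हूं" ("I am happy")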
- No attention mechanism (can't focus on specific input words)
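For a sense of what an attention mechanism would add, the sketch below computes plain dot-product attention, one of several variants, over random stand-in tensors. It assumes an encoder that returns its per-step outputs as well as the final hidden state, which the `SimpleEncoder` in this notebook does not do.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

hidden_size, src_len = 32, 4

# Stand-ins for real model tensors (random values, illustrative shapes)
encoder_outputs = torch.randn(1, src_len, hidden_size)   # one vector per input word
decoder_hidden = torch.randn(1, 1, hidden_size)          # current decoder state

# Score each input position against the decoder state (dot product)
scores = torch.bmm(decoder_hidden, encoder_outputs.transpose(1, 2))  # (1, 1, src_len)
weights = F.softmax(scores, dim=-1)                                  # sums to 1 over input words

# Weighted sum of encoder outputs: a context vector specific to this decoding step
context = torch.bmm(weights, encoder_outputs)                        # (1, 1, hidden_size)

print(weights.shape)   # torch.Size([1, 1, 4])
print(context.shape)   # torch.Size([1, 1, 32])
```

Instead of one fixed summary for the whole sentence, the decoder would get a fresh context at every step, weighted toward the input words most relevant to the word being generated.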

### To Improve
- Add more training data
- Implement attention mechanism
- Use larger embedding and hidden sizes
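- Train for more epochs once there is more data (on these 5 pairs the loss has already flattened out at 0.001 by epoch 120)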

## Additional Resources

Learn more about encoder-decoder architectures:
- [Sequence to Sequence Learning](https://arxiv.org/abs/1409.3215)
- [PyTorch Seq2Seq Tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)
- [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)