# Encoder–Decoder RNN Demo (Toy Neural Machine Translation)

This notebook demonstrates the **idea** behind an encoder–decoder RNN for neural machine translation,
following this teaching diagram:

`I  → love → llamas  → [Encoder RNN] → [Decoder RNN] →  Ik → hou → van → lama's`

The model here is **not trained**. Instead, we:
- Use simple RNN equations with random weights, just to show the *flow* of information.
- **Force** the decoder to output the Dutch sequence `['Ik', 'hou', 'van', "lama's"]`
  so that it exactly matches the teaching diagram.

This is meant purely as a teaching tool, not a real translation model.
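For reference, the two code cells below implement the recurrences written out here. The code uses row vectors (`x_t @ W_xh`), so the equations do too. The symbols $\mathbf{c}$ (context vector), $\mathbf{s}_k$ (decoder hidden state) and $\mathbf{y}_{k-1}$ (one-hot previous Dutch token, with $\mathbf{y}_0$ the `<BOS>` token) are notation introduced for this summary. For "I love llamas", $T = 3$.

```latex
\begin{aligned}
\mathbf{h}_t &= \tanh\!\left(\mathbf{x}_t W_{xh} + \mathbf{h}_{t-1} W_{hh} + \mathbf{b}_h\right),
  \qquad \mathbf{h}_0 = \mathbf{0}, \quad t = 1, \dots, T \\
\mathbf{c} &= \mathbf{h}_T \\
\mathbf{s}_0 &= \mathbf{c}, \qquad
\mathbf{s}_k = \tanh\!\left(\mathbf{y}_{k-1} W^{\mathrm{dec}}_{xh} + \mathbf{s}_{k-1} W^{\mathrm{dec}}_{hh} + \mathbf{b}^{\mathrm{dec}}_h\right),
  \quad k = 1, \dots, 4
\end{aligned}
```

A trained model would also map each $\mathbf{s}_k$ to a probability distribution over the Dutch vocabulary. This demo skips that step and forces the output words instead.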

In [1]:
import numpy as np

# ---- English → Dutch toy vocabularies ----
en_vocab = {"I": 0, "love": 1, "llamas": 2}
nl_vocab = {"<BOS>": 0, "Ik": 1, "hou": 2, "van": 3, "lama's": 4}

id2en = {i: w for w, i in en_vocab.items()}
id2nl = {i: w for w, i in nl_vocab.items()}

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v


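The `one_hot` helper above builds one input vector at a time. The sketch below (not part of the original notebook) stacks the whole sentence into a matrix, shows that rows of `np.eye` give the same result, and checks a property the RNN cells rely on: multiplying a one-hot row vector by a weight matrix just selects one row of that matrix. That is why `x_t @ W_xh` acts like a lookup of a per-word vector. The matrix `W` is an arbitrary illustrative example.

```python
import numpy as np

# Same toy English vocabulary and helper as in cell [1]
en_vocab = {"I": 0, "love": 1, "llamas": 2}

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

# Stack one one-hot row per word: shape (sentence length, vocabulary size)
sentence = ["I", "love", "llamas"]
X = np.stack([one_hot(en_vocab[w], len(en_vocab)) for w in sentence])

# Equivalent shortcut: pick rows of the identity matrix
X_eye = np.eye(len(en_vocab))[[en_vocab[w] for w in sentence]]
print(np.array_equal(X, X_eye))  # True

# A one-hot row times a matrix selects the matching row of that matrix
W = np.arange(12.0).reshape(3, 4)  # illustrative 3x4 weight matrix
print(np.array_equal(one_hot(1, 3) @ W, W[1]))  # True
```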
In [2]:
# ================= ENCODER (RNN) =================
# Task: 'representing language' – turn the input sentence into a context vector.

input_size  = len(en_vocab)   # one-hot size for English
hidden_size = 4               # small hidden state for illustration

np.random.seed(0)  # keep results deterministic for class

# Random weights for the encoder RNN
W_xh = np.random.randn(input_size, hidden_size) * 0.5
W_hh = np.random.randn(hidden_size, hidden_size) * 0.5
b_h  = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Single RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

encoder_inputs = ["I", "love", "llamas"]
h = np.zeros(hidden_size)   # initial hidden state (all zeros)

print("=== ENCODER ===")
for token in encoder_inputs:
    x_t = one_hot(en_vocab[token], input_size)
    h = rnn_step(x_t, h)
    print(f"Input word: {token:7s}  →  hidden state: {np.round(h, 3)}")

encoder_final_state = h
print("\nFinal encoder hidden state (context vector):")
print(np.round(encoder_final_state, 3))


=== ENCODER ===
Input word: I        →  hidden state: [0.707 0.197 0.454 0.808]
Input word: love     →  hidden state: [ 0.934 -0.719  0.705 -0.278]
Input word: llamas   →  hidden state: [-0.896  0.646  0.434  0.742]

Final encoder hidden state (context vector):
[-0.896  0.646  0.434  0.742]
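To make the "context vector" idea concrete, the sketch below wraps the encoder loop from cell [2] in a hypothetical `encode` helper (not part of the original notebook) and checks two properties. First, the context always has `hidden_size` entries, whatever the sentence length. Second, it depends on word order. It calls `np.random.seed(0)` before drawing the weights, exactly as cell [2] does, so `W_xh`, `W_hh` and `b_h` are generated the same way. The four-word input is deliberately nonsensical and only tests the length.

```python
import numpy as np

en_vocab = {"I": 0, "love": 1, "llamas": 2}
input_size, hidden_size = len(en_vocab), 4

np.random.seed(0)  # same seed and draw order as cell [2]
W_xh = np.random.randn(input_size, hidden_size) * 0.5
W_hh = np.random.randn(hidden_size, hidden_size) * 0.5
b_h = np.zeros(hidden_size)

def encode(tokens):
    """Run the encoder RNN over a list of tokens and return the final hidden state."""
    h = np.zeros(hidden_size)
    for tok in tokens:
        x_t = np.eye(input_size)[en_vocab[tok]]  # one-hot row for this word
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    return h

context_fwd = encode(["I", "love", "llamas"])
context_rev = encode(["llamas", "love", "I"])
context_long = encode(["I", "love", "llamas", "llamas"])

# Fixed size, regardless of input length
print(context_fwd.shape, context_long.shape)  # (4,) (4,)
# Order matters: the same words in reverse give a different context vector
print(np.allclose(context_fwd, context_rev))  # False
```

Because the whole sentence is squeezed into these four numbers, a longer sentence gets no extra capacity. This fixed-size bottleneck is a known limitation of the basic encoder–decoder design.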


In [3]:
# ================= DECODER (RNN) =================
# Task: 'generating language' – turn the context vector into the output sentence.
#
# NOTE: In a real system, the decoder would *learn* to output the Dutch words.
# Here we **force** the decoder to output ['Ik', 'hou', 'van', "lama's"]
# so it exactly matches the teaching diagram.

decoder_input_size = len(nl_vocab)

# Separate (random) weights for the decoder RNN
W_xh_dec = np.random.randn(decoder_input_size, hidden_size) * 0.5
W_hh_dec = np.random.randn(hidden_size, hidden_size) * 0.5
b_h_dec  = np.zeros(hidden_size)

def rnn_step_dec(x_t, h_prev):
    """Single decoder RNN step: same form as rnn_step, but with the decoder's own weights."""
    return np.tanh(x_t @ W_xh_dec + h_prev @ W_hh_dec + b_h_dec)

print("\n=== DECODER (forced outputs) ===")
target_sequence = ["Ik", "hou", "van", "lama's"]
decoded_words = []

h_dec = encoder_final_state.copy()  # start from encoder context
prev_token = "<BOS>"                # begin-of-sentence token

for step, target_word in enumerate(target_sequence, start=1):
    # Standard decoder RNN update
    x_t = one_hot(nl_vocab[prev_token], decoder_input_size)
    h_dec = rnn_step_dec(x_t, h_dec)
    # A trained decoder would map h_dec through a softmax over nl_vocab and pick a word.
    # This demo has no output layer, so we simply FORCE the target word.
    next_word = target_word

    print(
        f"Step {step}: input token = {prev_token:6s} → output (forced) = {next_word:7s}, "
        f"hidden = {np.round(h_dec, 3)}"
    )

    decoded_words.append(next_word)
    prev_token = next_word  # feed the forced word back in as the next input

print("\nFinal decoded sequence:")
print(decoded_words)


=== DECODER (forced outputs) ===
Step 1: input token = <BOS>  → output (forced) = Ik     , hidden = [ 0.797  0.363  0.03  -0.511]
Step 2: input token = Ik     → output (forced) = hou    , hidden = [-0.764 -0.832 -0.325  0.633]
Step 3: input token = hou    → output (forced) = van    , hidden = [ 0.84   0.766  0.006 -0.746]
Step 4: input token = van    → output (forced) = lama's , hidden = [-0.817 -0.806 -0.735  0.96 ]

Final decoded sequence:
['Ik', 'hou', 'van', "lama's"]
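The forced loop above leaves out the part of a real decoder that chooses words. The sketch below adds it under stated assumptions, so it is an illustration rather than the notebook's method. It introduces a hypothetical output projection `W_hy`/`b_y` followed by a softmax, a hypothetical `<EOS>` token so decoding can stop, and two hypothetical helpers. `greedy_decode` feeds back its own most likely word at each step. `teacher_forced_loss` computes the cross-entropy of the target sentence while feeding in the *gold* previous word, which is what the forced loop in cell [3] does with its inputs. The weights come from `np.random.default_rng(0)` and are untrained. They also differ from the notebook's, because the extra token changes the matrix shapes. The decoded words are therefore arbitrary.

```python
import numpy as np

# Dutch vocabulary from cell [1] plus a HYPOTHETICAL end-of-sentence token
nl_vocab = {"<BOS>": 0, "Ik": 1, "hou": 2, "van": 3, "lama's": 4, "<EOS>": 5}
id2nl = {i: w for w, i in nl_vocab.items()}
V, H = len(nl_vocab), 4

rng = np.random.default_rng(0)
W_xh_dec = rng.normal(size=(V, H)) * 0.5
W_hh_dec = rng.normal(size=(H, H)) * 0.5
b_h_dec = np.zeros(H)
W_hy = rng.normal(size=(H, V)) * 0.5  # HYPOTHETICAL output projection: hidden -> logits
b_y = np.zeros(V)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def dec_step(token, h_prev):
    """One decoder step: update the hidden state, return it and a distribution over nl_vocab."""
    x_t = np.eye(V)[nl_vocab[token]]
    h = np.tanh(x_t @ W_xh_dec + h_prev @ W_hh_dec + b_h_dec)
    return h, softmax(h @ W_hy + b_y)

def greedy_decode(context, max_len=10):
    """Free-running decoding: feed back the model's own most likely word at each step."""
    h, token, out = context.copy(), "<BOS>", []
    for _ in range(max_len):
        h, p = dec_step(token, h)
        scores = p.copy()
        scores[nl_vocab["<BOS>"]] = -np.inf  # never emit <BOS> as an output word
        token = id2nl[int(np.argmax(scores))]
        if token == "<EOS>":
            break
        out.append(token)
    return out

def teacher_forced_loss(context, target):
    """Cross-entropy of target (+ <EOS>) when the decoder is fed the gold previous word."""
    h, prev, loss = context.copy(), "<BOS>", 0.0
    for word in target + ["<EOS>"]:
        h, p = dec_step(prev, h)
        loss -= np.log(p[nl_vocab[word]])
        prev = word  # gold word, not the model's prediction
    return loss

context = np.array([-0.896, 0.646, 0.434, 0.742])  # context vector printed by cell [2]
print(greedy_decode(context))  # arbitrary words: the weights are untrained
print(teacher_forced_loss(context, ["Ik", "hou", "van", "lama's"]))
```

Training would adjust all the weights, encoder included, to lower `teacher_forced_loss` over many sentence pairs (gradient computation not shown). Feeding the gold previous word during training is commonly called teacher forcing. At translation time the model must rely on its own predictions, as `greedy_decode` does.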
