## **What problem does Seq2Seq solve?**

Many tasks map an input sequence to an output sequence whose length can differ from the input’s:

- translation (“i love cats” → “ich liebe katzen”)
- summarization
- question answering

A single RNN/LSTM that outputs one label per input token can’t do this cleanly, because its output length is tied to the input length.
Seq2Seq instead uses two RNNs/LSTMs:

1. Encoder: reads the whole input and compresses it into a fixed-size summary (its final hidden and cell states).
2. Decoder: starts from that summary and generates the output tokens one at a time, each conditioned on the tokens generated so far.

Think: the encoder writes a summary note; the decoder reads that note and writes the translation.
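To make the "summary note" concrete, here is a minimal NumPy sketch of an encoder. It assumes a plain tanh RNN rather than an LSTM (an LSTM adds gates and a separate cell state) with random, untrained weights; `encode`, `W_x`, `W_h` and the sizes are hypothetical. The only point is that inputs of different lengths end up as states of the same fixed size.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 10, 4

# Hypothetical, untrained weights for a plain tanh RNN (not an LSTM)
W_x = rng.normal(scale=0.5, size=(hidden_size, vocab_size))
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)


def one_hot(token_id, size=vocab_size):
    v = np.zeros(size)
    v[token_id] = 1.0
    return v


def encode(token_ids):
    """Fold a sequence of any length into one fixed-size hidden state."""
    h = np.zeros(hidden_size)
    for tok in token_ids:
        h = np.tanh(W_x @ one_hot(tok) + W_h @ h + b)
    return h


short_summary = encode([1, 2, 3])
long_summary = encode([4, 5, 6, 7, 8, 9, 0])
print(short_summary.shape, long_summary.shape)  # (4,) (4,)
```

That fixed size is also the bottleneck: everything the decoder knows about the input has to fit into those few numbers.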

In [190]:
# ================================================================
# 1. Imports and Setup
# ================================================================
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense
import tensorflow as tf


# Simple dataset: digits to words
pairs = [
    ("1", "one"),
    ("2", "two"),
    ("3", "three"),
    ("4", "four"),
    ("5", "five"),
    ("6", "six"),
    ("7", "seven"),
    ("8", "eight"),
    ("9", "nine"),
]


In [191]:

# ================================================================
# 2. Character-level tokenization
# ================================================================

input_texts = [inp for inp, _ in pairs]  # all input strings from the (input, target) pairs
target_texts = ["\t" + tar + "\n" for _, tar in pairs]  # targets wrapped in start ("\t") and end ("\n") tokens

input_chars = sorted(set("".join(input_texts)))  # sorted list of unique input characters
target_chars = sorted(set("".join(target_texts)))  # sorted list of unique target characters

num_encoder_tokens = len(input_chars)
num_decoder_tokens = len(target_chars)
max_encoder_seq_length = max(len(txt) for txt in input_texts)
max_decoder_seq_length = max(len(txt) for txt in target_texts)

input_token_index = {char: i for i, char in enumerate(input_chars)}  # input character → integer index (0…V-1)
target_token_index = {char: i for i, char in enumerate(target_chars)}  # target character → integer index
reverse_target_char_index = {i: char for char, i in target_token_index.items()}  # index → character, used when decoding

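For this dataset the sizes are small enough to check by hand. A quick pure-Python recomputation of the same tokenization steps (self-contained, so it repeats the nine pairs) gives:

```python
# Same character-level tokenization as the cell above, on the same nine pairs
pairs = [("1", "one"), ("2", "two"), ("3", "three"), ("4", "four"), ("5", "five"),
         ("6", "six"), ("7", "seven"), ("8", "eight"), ("9", "nine")]

input_texts = [inp for inp, _ in pairs]
target_texts = ["\t" + tar + "\n" for _, tar in pairs]

input_chars = sorted(set("".join(input_texts)))
target_chars = sorted(set("".join(target_texts)))

print(len(input_chars), len(target_chars))                      # 9 16
print(max(map(len, input_texts)), max(map(len, target_texts)))  # 1 7
# "\t" (code point 9) and "\n" (code point 10) sort before every letter
print(repr(target_chars[0]), repr(target_chars[1]))             # '\t' '\n'
```

So the decoder vocabulary is 14 letters plus the two special tokens, the longest targets ("\tthree\n", "\tseven\n", "\teight\n") are 7 characters, and index 0 of the decoder vocabulary is the start token.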

In [192]:

# ================================================================
# 3. Data vectorization
# ================================================================

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype="float32"
)
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype="float32"
)
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype="float32"
)

# Fill the arrays with one-hot encoded characters

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.0
    for t, char in enumerate(target_text):
        decoder_input_data[i, t, target_token_index[char]] = 1.0
        if t > 0:
            # The target at step t-1 is the input at step t (shifted one step left),
            # so the decoder learns to predict the next character (teacher forcing)
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.0

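The shift is easiest to see on a single word. This sketch rebuilds the decoder arrays for "one" with a hypothetical five-character vocabulary and checks that the targets are the inputs moved one step earlier: at each step the decoder is fed the true current character and asked for the next one, which is what teacher forcing means.

```python
import numpy as np

# Hypothetical toy vocabulary: just enough characters for "\tone\n"
chars = ["\t", "\n", "e", "n", "o"]
index = {c: i for i, c in enumerate(chars)}
target_text = "\tone\n"
T = len(target_text)

decoder_input = np.zeros((T, len(chars)), dtype="float32")
decoder_target = np.zeros((T, len(chars)), dtype="float32")
for t, ch in enumerate(target_text):
    decoder_input[t, index[ch]] = 1.0
    if t > 0:
        # The character at step t is the label for step t - 1
        decoder_target[t - 1, index[ch]] = 1.0

# The targets are the inputs shifted one step to the left...
assert np.array_equal(decoder_target[:-1], decoder_input[1:])
# ...and the final row stays all-zero, since nothing follows "\n"
assert not decoder_target[-1].any()

print([chars[i] for i in decoder_input.argmax(axis=1)])        # ['\t', 'o', 'n', 'e', '\n']
print([chars[i] for i in decoder_target[:-1].argmax(axis=1)])  # ['o', 'n', 'e', '\n']
```

In the notebook's full arrays, shorter words are also padded with all-zero rows up to `max_decoder_seq_length`; with categorical cross-entropy, an all-zero target row contributes a loss term of zero.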

In [193]:

# ================================================================
# 4. Define Encoder–Decoder Model
# ================================================================
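latent_dim = 32  # size of the LSTM hidden and cell states

# Encoder: keep only its final states (h, c) as the "summary"; encoder_outputs is unused
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: starts from the encoder states and emits a prediction at every time step
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation="softmax")  # distribution over the next character
decoder_outputs = decoder_dense(decoder_outputs)

# Training model: (encoder input, teacher-forced decoder input) → next-character probabilities
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])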

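As a back-of-the-envelope check on model size: a Keras `LSTM` layer (default `use_bias=True`) has 4 × (input_dim·units + units² + units) weights, one input kernel, recurrent kernel and bias per gate, and a `Dense` layer has input_dim·units + units. Plugging in this dataset's sizes (9 input characters, 16 target characters including "\t" and "\n") in a small sketch with hypothetical helper names:

```python
def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell candidate, output), each with an input
    # kernel, a recurrent kernel and a bias vector
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    return input_dim * units + units


latent_dim = 32
num_encoder_tokens, num_decoder_tokens = 9, 16  # sizes for the digit-to-word pairs

encoder = lstm_params(num_encoder_tokens, latent_dim)
decoder = lstm_params(num_decoder_tokens, latent_dim)
output = dense_params(latent_dim, num_decoder_tokens)
print(encoder, decoder, output, encoder + decoder + output)  # 5376 6272 528 12176
```

If the vocabulary sizes match, `model.summary()` should report the same total of about 12k trainable parameters, which is tiny by deep-learning standards and fits the nine-pair dataset.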

In [194]:

# ================================================================
# 5. Train
# ================================================================
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=2,
    epochs=300,
    verbose=0,
)
print("Training complete!")


Training complete!


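One training step, conceptually:

1. Encoder: it reads a digit such as "3"
→ and encodes it into a fixed-length context vector (state_h, state_c).
2. Decoder: starting from that context, it receives the start token "\t"
→ and predicts the next character, which should be "t".

- The model compares each predicted character with the true one and computes the categorical cross-entropy loss.
- Because of teacher forcing, the decoder is fed the true previous character at every step ("\t", "t", "h", "r", "e", "e") and asked for the next one ("t", "h", "r", "e", "e", "\n"), so every position of the target is scored in one forward pass rather than one character at a time.
- Backpropagation through time (BPTT) then adjusts the weights of both the decoder and the encoder to reduce that loss.
- This repeats for every mini-batch (batch_size=2), and then for every epoch.
- After many epochs, the model has learned the mappings in the dataset, such as "3" → "three" and "7" → "seven".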

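At test time there is no ground-truth next character to feed in, so the decoder has to consume its own predictions. The test cell below calls `decode_sequence`, which is not defined in this section. What follows is a minimal pure-Python sketch of the greedy loop such a function typically runs, not the notebook's actual implementation: `greedy_decode`, the `step(prev_char, state)` interface and `toy_step` are hypothetical, and `toy_step` simply spells out a stored word instead of running a trained LSTM.

```python
START, END = "\t", "\n"


def greedy_decode(state, step, max_len):
    """Generate characters one at a time, feeding each prediction back in.

    `step(prev_char, state)` is a hypothetical interface: it returns a dict of
    next-character probabilities and the updated decoder state.
    """
    prev, out = START, []
    for _ in range(max_len):  # hard cap so an unsure model cannot loop forever
        probs, state = step(prev, state)
        nxt = max(probs, key=probs.get)  # greedy: pick the most likely character
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt  # no teacher forcing at test time: feed back our own output
    return "".join(out)


def toy_step(prev_char, state):
    """Hypothetical stand-in for a trained decoder: spells out a stored word."""
    word, pos = state
    nxt = word[pos] if pos < len(word) else END
    return {nxt: 0.9, "?": 0.1}, (word, pos + 1)


print(greedy_decode(("three", 0), toy_step, max_len=7))  # three
```

In the Keras setting, `step` would correspond to one decoder call on a single one-hot character plus the `[h, c]` states, and `state` would start as the encoder's final states; capping the loop at `max_decoder_seq_length` keeps a poorly trained model from generating forever.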
In [195]:

# ================================================================
# 8. Test Model
# ================================================================
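# Note: decode_sequence() is not defined in this section; it comes from the
# inference cells (sections 6 and 7), which are not included here.
for seq_index in range(len(input_texts)):
    input_text = input_texts[seq_index]
    # One-hot encode the digit exactly as in the training data
    input_seq = np.zeros((1, max_encoder_seq_length, num_encoder_tokens), dtype="float32")
    for t, char in enumerate(input_text):
        input_seq[0, t, input_token_index[char]] = 1.0
    decoded_sentence = decode_sequence(input_seq)
    print(f"{input_text} → {decoded_sentence}")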


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 20ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
1 → one
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 55ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step
2 → two
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 19ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 21ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[