# Sequence-to-Sequence (Seq2Seq) Model Basics

## 1. Introduction
- **Seq2Seq** models are used when both input and output are sequences.
- Examples:
  - Machine Translation (English → French)
  - Chatbots
  - Text Summarization
  - Speech-to-text

### Key Components:
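1. **Encoder**: Reads the input sequence and compresses it into a fixed-size context vector (its final hidden state; for an LSTM, the hidden state and the cell state).
2. **Decoder**: Starts from the context vector and generates the output sequence one element at a time.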

Both components are often implemented with recurrent networks: plain RNNs, LSTMs, or GRUs.

## 2. Workflow
- Input sequence → Encoder → Context Vector → Decoder → Output sequence
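- Training uses **Teacher Forcing**: at each step the decoder receives the ground-truth previous output token as input, not its own previous prediction.
- At inference there is no ground truth, so the decoder predicts step by step and feeds each prediction back in as the next input.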
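
The difference between the two modes can be sketched in plain NumPy. This is a minimal illustration, not the Keras model built later: the `START` token id, the helper names, and the `toy_step` stand-in for a trained decoder are all assumptions made for this example.

```python
import numpy as np

START = 0  # hypothetical start-of-sequence token id (an assumption for this sketch)

def teacher_forcing_inputs(targets, start_id=START):
    """Shift targets right by one so step t sees the true token from step t-1."""
    targets = np.asarray(targets)
    shifted = np.empty_like(targets)
    shifted[:, 0] = start_id
    shifted[:, 1:] = targets[:, :-1]
    return shifted

def greedy_decode(step_fn, state, max_len, start_id=START):
    """Inference loop: each prediction becomes the next decoder input."""
    token, output = start_id, []
    for _ in range(max_len):
        scores, state = step_fn(token, state)
        token = int(np.argmax(scores))
        output.append(token)
    return output

def toy_step(token, state):
    """Stand-in for a trained decoder step: always predicts (token + 1) mod 10."""
    scores = np.zeros(10)
    scores[(token + 1) % 10] = 1.0
    return scores, state

# Training: decoder inputs are the ground-truth targets, shifted right.
print(teacher_forcing_inputs(np.array([[7, 3, 9, 1]])))  # [[0 7 3 9]]

# Inference: the decoder only ever sees its own previous predictions.
print(greedy_decode(toy_step, state=None, max_len=4))  # [1, 2, 3, 4]
```

With teacher forcing, a wrong prediction at one step does not affect the input at the next step; in the inference loop it does, which is why errors can accumulate at test time.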

## 3. Example: Toy Seq2Seq with Keras (Number Reversal)
Here we’ll build a simple model that learns to **reverse a sequence of numbers**.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

# Parameters
num_samples = 1000
timesteps = 5
input_dim = 10  # numbers 0-9

# Generate toy dataset (sequence → reversed sequence)
# Shape (num_samples, timesteps); one-hot encoding below adds the last axis
X = np.random.randint(0, input_dim, size=(num_samples, timesteps))
Y = np.flip(X, axis=1)  # reverse each sequence along the time axis

# One-hot encode: shape (num_samples, timesteps, input_dim)
X_onehot = tf.keras.utils.to_categorical(X, num_classes=input_dim)
Y_onehot = tf.keras.utils.to_categorical(Y, num_classes=input_dim)

# Encoder: keep only the final hidden and cell states as the context
encoder_inputs = Input(shape=(timesteps, input_dim))
encoder_lstm = LSTM(64, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: initialized with the encoder states, emits one distribution per step
decoder_inputs = Input(shape=(timesteps, input_dim))
decoder_lstm = LSTM(64, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(input_dim, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

# Train (teacher forcing: decoder input = Y shifted right by one step)
# The all-zero vector left at step 0 acts as the start token
decoder_input_data = np.zeros_like(Y_onehot)
decoder_input_data[:, 1:, :] = Y_onehot[:, :-1, :]

model.fit([X_onehot, decoder_input_data], Y_onehot, batch_size=32, epochs=5, verbose=1)
```
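
The training model above always expects ground-truth decoder inputs, so it cannot generate a reversed sequence on its own. One common way to run inference is to split the trained layers into an encoder model and a single-step decoder model, then loop. The sketch below rebuilds the same layout so that it runs on its own; `units`, `encoder_model`, `decoder_model`, and `greedy_decode` are names introduced here, the decoder input is declared with a variable time dimension (`None`) so it can be fed one step at a time, and the network is left untrained, so the printed tokens are arbitrary.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

timesteps, input_dim, units = 5, 10, 64

# Same layout as the training model (untrained in this sketch).
encoder_inputs = Input(shape=(timesteps, input_dim))
_, state_h, state_c = LSTM(units, return_state=True)(encoder_inputs)
decoder_inputs = Input(shape=(None, input_dim))  # variable number of steps
decoder_lstm = LSTM(units, return_sequences=True, return_state=True)
decoder_dense = Dense(input_dim, activation="softmax")
seq_out, _, _ = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
model = Model([encoder_inputs, decoder_inputs], decoder_dense(seq_out))

# Inference models reuse the same layer objects, and therefore the same weights.
encoder_model = Model(encoder_inputs, [state_h, state_c])
h_in, c_in = Input(shape=(units,)), Input(shape=(units,))
step_out, h_out, c_out = decoder_lstm(decoder_inputs, initial_state=[h_in, c_in])
decoder_model = Model([decoder_inputs, h_in, c_in],
                      [decoder_dense(step_out), h_out, c_out])

def greedy_decode(x_onehot):
    """Encode once, then feed each predicted token back into the decoder."""
    h, c = encoder_model.predict(x_onehot, verbose=0)
    # All-zero start vector, matching the shifted decoder input used in training.
    token = np.zeros((x_onehot.shape[0], 1, input_dim), dtype="float32")
    result = []
    for _ in range(timesteps):
        probs, h, c = decoder_model.predict([token, h, c], verbose=0)
        ids = probs[:, -1, :].argmax(axis=-1)
        result.append(ids)
        token = tf.keras.utils.to_categorical(ids, num_classes=input_dim)[:, None, :]
    return np.stack(result, axis=1)  # shape: (batch, timesteps)

x = np.random.randint(0, input_dim, size=(3, timesteps))
x_onehot = tf.keras.utils.to_categorical(x, num_classes=input_dim)
print(greedy_decode(x_onehot))
```

In a notebook, the same two inference models could instead be built from the already trained `decoder_lstm` and `decoder_dense` layers, provided the decoder `Input` is declared with `shape=(None, input_dim)`.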

## 4. Key Notes
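- Seq2Seq can handle input and output sequences of **different lengths**. The toy example uses equal lengths only for simplicity; variable-length outputs usually rely on an end-of-sequence token and padding.
- Commonly used with **attention mechanisms**, which let the decoder look at all encoder hidden states instead of a single context vector. This helps most on long inputs.
- The encoder-decoder idea combined with attention led to the **Transformer** architecture. BERT (encoder-only) and GPT (decoder-only) are built from Transformer blocks.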
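
To make the attention note concrete, here is a minimal NumPy sketch of one attention step. It assumes dot-product scoring, which is only one of several scoring functions in use, and the function and variable names are hypothetical.

```python
import numpy as np

def dot_product_attention(decoder_state, encoder_states):
    """Weight each encoder state by its similarity to the current decoder state.

    decoder_state:  shape (d,)   current decoder hidden state
    encoder_states: shape (T, d) one hidden state per input position
    """
    scores = encoder_states @ decoder_state      # (T,) similarity per position
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states           # (d,) weighted average of states
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))  # 5 input positions, hidden size 8
decoder_state = rng.normal(size=8)

context, weights = dot_product_attention(decoder_state, encoder_states)
print(weights.round(3), context.shape)
```

The context vector is recomputed at every decoder step, so the decoder is no longer limited to one fixed summary of the input.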