# C3: SEQUENCE TO SEQUENCE LEARNING

## Seq2Seq

- A sequence-to-sequence (Seq2Seq) model takes a sequence as input and produces another sequence as output
- Usually built using an Encoder-Decoder architecture

### Need

- Text classification: useful when we want to label a sequence
- Machine translation: Translating text from one language to another
- Text generation: autocomplete, code generation, question generation, dialogue generation, etc.

## Encoder-Decoder architecture

- A model that transforms one sequence into another
- It consists of two main components
    - Encoder
    - Decoder

### Encoder

- Processes the input sequence into a fixed representation (a context vector or hidden states)
- Reads input tokens one by one
- At the end, produces
    - A final hidden state
    - A sequence of hidden states
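
The encoder behaviour described above can be sketched with a hand-written vanilla RNN in NumPy. This is only an illustration: the helper name `encode`, the weight names `W_x`, `W_h`, `b`, the toy sizes, and the random (untrained) weights are all assumptions, and a real encoder would usually be a trained LSTM or GRU layer.

```python
import numpy as np

def encode(inputs, W_x, W_h, b):
    """Run a vanilla RNN over `inputs` (shape: [time, features]).

    Hypothetical helper for illustration. Returns (all_states, final_state).
    """
    hidden = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:  # read the input one step at a time
        hidden = np.tanh(W_x @ x_t + W_h @ hidden + b)
        states.append(hidden)
    return np.stack(states), hidden

rng = np.random.default_rng(0)
seq_len, n_features, n_hidden = 6, 3, 4  # assumed toy sizes
inputs = rng.normal(size=(seq_len, n_features))
W_x = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

all_states, final_state = encode(inputs, W_x, W_h, b)
print(all_states.shape)   # (6, 4): one hidden state per input step
print(final_state.shape)  # (4,): fixed-size summary of the whole input
```

The final hidden state is simply the last row of the sequence of hidden states. A plain Seq2Seq decoder uses only that last state, while an attention-based decoder uses the whole sequence.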

### Decoder

- Generates the output sequence step by step using that representation
- Takes the encoder's output as context
- Generates one token per step
- Each output depends on
    - The previously generated tokens
    - The encoder's representation
- Uses mechanisms such as teacher forcing during training
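
Teacher forcing means that, during training, the decoder is fed the ground-truth previous token instead of its own previous prediction. A minimal sketch of how the training pairs are usually built follows; the token ids, the `SOS`/`EOS` values, and the helper name `teacher_forcing_pair` are assumptions made for illustration.

```python
SOS, EOS = 1, 2  # assumed ids for the start- and end-of-sequence tokens

def teacher_forcing_pair(target_ids):
    """Build (decoder_input, decoder_target) for one target sequence."""
    decoder_input = [SOS] + target_ids    # what the decoder is fed at each step
    decoder_target = target_ids + [EOS]   # what it should predict at each step
    return decoder_input, decoder_target

target = [17, 42, 8]  # hypothetical token ids of the reference output
dec_in, dec_out = teacher_forcing_pair(target)
for step, (fed, expected) in enumerate(zip(dec_in, dec_out)):
    print(f"step {step}: fed {fed}, should predict {expected}")
```

At inference time there is no ground truth, so the decoder is fed its own previous prediction instead, starting from the start-of-sequence token.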

### Attention mechanism

- Solves the information bottleneck caused by fixed-length context vectors
- Lets the decoder focus on different parts of the input sequence dynamically
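
As a rough sketch of the idea, the snippet below computes one step of dot-product attention in NumPy. Dot-product scoring is only one of several scoring functions, and the function names, shapes, and random values here are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def dot_product_attention(decoder_state, encoder_states):
    """Return (context, weights) for a single decoder step."""
    scores = encoder_states @ decoder_state  # one score per input position
    weights = softmax(scores)                # how much to focus on each position
    context = weights @ encoder_states       # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))  # 5 input positions, hidden size 4 (assumed)
decoder_state = rng.normal(size=4)        # current decoder hidden state

context, weights = dot_product_attention(decoder_state, encoder_states)
print(weights.round(3))  # non-negative weights that sum to 1
print(context.shape)     # (4,)
```

Because the weights are recomputed at every decoder step, the context vector changes from step to step instead of being fixed once by the encoder.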

### Common implementations

- RNN, LSTM, GRU
- Attention-based Seq2Seq
- Transformers

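```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, SimpleRNN, Dense, Embedding

# Example: predict the next word (toy example)
model = Sequential([
    Input(shape=(10,)),                        # sequences of 10 token ids
    Embedding(input_dim=1000, output_dim=64),  # vocabulary of 1000 words, 64-dim vectors
    SimpleRNN(128),                            # returns only the final hidden state
    Dense(1000, activation="softmax")          # one probability per vocabulary word
])

# categorical_crossentropy expects one-hot encoded targets
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()  # summary() prints the table itself and returns None
```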


## Recurrent neural network

- A type of neural network designed to handle sequential data (time series, text, speech, etc.)
- Unlike standard feedforward networks, it has connections that loop back, giving it a form of memory
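
One common way to write this loop is the vanilla (Elman-style) recurrence. The notation is assumed here rather than taken from these notes: $x_t$ is the input at step $t$, $h_t$ the hidden state, $y_t$ the output, and $W_x$, $W_h$, $W_y$, $b_h$, $b_y$ are learned parameters.

$$
h_t = \tanh\left(W_x x_t + W_h h_{t-1} + b_h\right), \qquad y_t = W_y h_t + b_y
$$

The term $W_h h_{t-1}$ is the connection that loops back: it carries information from earlier steps into the current one.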

### Key points

- Sequential processing: Takes inputs one step at a time
- Hidden state: Stores information about previous inputs, allowing context awareness
- Variants:
    - Long short-term memory (LSTM): Handles long range dependencies better
    - Gated recurrent unit (GRU): Simplified version of LSTM with fewer parameters
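
The parameter difference can be estimated with a back-of-the-envelope count. The sketch below assumes the textbook formulations, in which each gate or candidate block has an input matrix, a recurrent matrix, and a bias; the helper name `recurrent_params` and the sizes are illustrative, and library implementations may differ slightly (for example, by keeping an extra set of biases).

```python
def recurrent_params(input_dim, units, n_blocks):
    """Each block has an input matrix, a recurrent matrix, and a bias vector."""
    return n_blocks * (units * input_dim + units * units + units)

input_dim, units = 5, 20  # assumed toy sizes
counts = {
    "SimpleRNN": recurrent_params(input_dim, units, 1),
    "GRU": recurrent_params(input_dim, units, 3),   # update, reset, candidate
    "LSTM": recurrent_params(input_dim, units, 4),  # input, forget, output, candidate
}
for name, n in counts.items():
    print(f"{name}: {n}")
# SimpleRNN: 520
# GRU: 1560
# LSTM: 2080
```

Under these assumptions a GRU needs three quarters of the parameters of an LSTM with the same input and hidden sizes.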

### Applications

- Natural language processing: Machine translation, text generation, sentiment analysis
- Speech recognition
- Time series forecasting

### Drawbacks

- Vanishing/exploding gradient problem:
    - Training on long sequences is difficult because gradients shrink or blow up during backpropagation
- Difficulty learning long term dependencies:
    - Struggles to remember information from far back in the sequence
- Sequential processing (slow):
    - Each step depends on the previous one, which means no parallelisation across time steps
- Limited memory:
    - Hidden state can't capture all past information, especially in complex sequences
- Prone to overfitting:
    - Due to the large parameter space, especially when dealing with small datasets
- Replaced by better models: 
    - LSTM, GRU and especially transformers handle context and long range dependencies much better
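
The vanishing/exploding gradient problem can be illustrated numerically. The sketch below is a simplification under stated assumptions: it uses a linear RNN (the activation derivative is ignored) and a recurrent matrix built as a scaled orthogonal matrix, so that every backward step multiplies the gradient norm by exactly `scale`. The function name and sizes are hypothetical.

```python
import numpy as np

def gradient_norm_after(steps, scale, size=8, seed=0):
    """Norm of a unit gradient after being backpropagated through `steps` time steps."""
    rng = np.random.default_rng(seed)
    # Orthogonal matrix times `scale`: every singular value equals `scale`.
    q, _ = np.linalg.qr(rng.normal(size=(size, size)))
    W_h = scale * q
    grad = np.ones(size) / np.sqrt(size)  # unit-norm gradient at the last step
    for _ in range(steps):
        grad = W_h.T @ grad  # one step of backpropagation through time
    return np.linalg.norm(grad)

for scale in (0.5, 1.0, 1.5):
    print(scale, gradient_norm_after(steps=50, scale=scale))
```

With `scale` below 1 the gradient shrinks towards zero after 50 steps, and with `scale` above 1 it grows by many orders of magnitude. In a real RNN the `tanh` derivative is at most 1, which pushes the gradient further towards vanishing.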

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense, Input

# Random input data: 100 samples, sequence length 10, 5 features per step
X = np.random.random((100, 10, 5))
y = np.random.random((100, 1))  # one regression target per sample

model = Sequential()
model.add(Input(shape=(10, 5)))
model.add(SimpleRNN(20))                  # 20 hidden units, returns the final hidden state
model.add(Dense(1, activation="linear"))  # single real-valued output

model.compile(optimizer="adam", loss="mse")
model.summary()  # summary() prints the table itself and returns None
```
