# Assignment Overview
This assignment focuses on understanding and applying four key approaches to processing sequential text data:

- **RNNs** (Recurrent Neural Networks) - Basic sequential processing
- **LSTM** (Long Short-Term Memory) - Advanced sequential processing with memory
- **ELMo** - Context-aware embeddings using bidirectional LSTMs
- **Transformers** - Attention-based parallel processing

# Base code

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.preprocessing import MinMaxScaler
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")

print("📚 Assignment Environment Setup Complete!")
print("🎯 Ready to tackle RNN, LSTM, ELMo, and Transformers!")

# Architecture Comparison

| Aspect | RNN | LSTM | ELMo | Transformers |
| :--- | :---: | :---: | :---: | :---: |
| Processing Type | Sequential | Sequential | Sequential | Parallel |
| Memory Mechanism | Hidden state | Gated memory cells | Contextualized embeddings via BiLSTM | Self-attention mechanism |
| Parallel Training | No | No | No | Yes |

# Explanation

**Processing Type**:
RNN, LSTM, and ELMo process input one token at a time: each step needs the hidden state produced by the previous step, which limits parallelization. Transformers process all positions of the input in parallel using self-attention.

**Memory Mechanism**:
RNN uses a hidden state to carry information. LSTM improves this with gated cells to manage long-term dependencies. ELMo uses BiLSTM layers to generate contextual embeddings. Transformers use self-attention to capture dependencies across the entire sequence.
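
The gating described above can be sketched as a single LSTM time step in NumPy. This is a minimal illustration, not the code any library runs internally; the gate ordering (input, forget, candidate, output) and the names `lstm_step`, `W`, `U`, and `b` are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. Shapes: W [4H, D], U [4H, H], b [4H]."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b       # all four gate pre-activations at once
    i = sigmoid(z[0:H])                # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])            # forget gate: how much old memory to keep
    g = np.tanh(z[2 * H:3 * H])        # candidate content
    o = sigmoid(z[3 * H:4 * H])        # output gate: how much memory to expose
    c_t = f * c_prev + i * g           # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)             # updated hidden state
    return h_t, c_t

# Run five steps on random inputs (toy sizes, random weights)
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D)) * 0.1
U = rng.normal(size=(4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is copied forward almost unchanged, which is how an LSTM can carry information across many steps.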

**Parallel Training**:
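RNN, LSTM, and ELMo compute each time step from the previous one, so training cannot be parallelized across the positions of a sequence (only across the sequences in a batch). Transformers compute all positions at once, which significantly speeds up training on parallel hardware.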
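
The contrast between sequential and parallel processing can be made concrete with a small NumPy sketch. It assumes a single-layer tanh RNN and single-head scaled dot-product self-attention with random, untrained weights; all sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 6, 8                       # sequence length, feature size (toy values)
X = rng.normal(size=(T, D))       # one input sequence

# RNN: an explicit loop, because step t needs the hidden state from step t-1
W_x = rng.normal(size=(D, D)) * 0.1
W_h = rng.normal(size=(D, D)) * 0.1
h = np.zeros(D)
rnn_states = []
for t in range(T):
    h = np.tanh(X[t] @ W_x + h @ W_h)
    rnn_states.append(h)
rnn_states = np.stack(rnn_states)            # [T, D]

# Self-attention: every position is handled by the same few matrix products
W_q, W_k, W_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(D)                # [T, T] pairwise scores
scores -= scores.max(axis=-1, keepdims=True) # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
attn_out = weights @ V                       # [T, D]

print(rnn_states.shape, attn_out.shape)
```

The RNN part cannot drop its `for` loop without changing the model, while the attention part has no dependency between positions, so hardware can compute all rows of `weights` simultaneously.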

# Problem Analysis
For each scenario below, choose the most appropriate model (RNN, LSTM, ELMo, or Transformers) and explain.

---

### **Scenario A**  
**Task:** Building a simple next-word prediction system for short sentences (5–10 words) with limited computational resources.  
**My Choice:** **RNN**  
**Explanation:** RNNs are lightweight and suitable for short sequences where complex memory mechanisms aren’t necessary. They require fewer resources and can perform adequately for simple tasks like next-word prediction in short texts.

---

### **Scenario B**  
**Task:** Analyzing sentiment in movie reviews where the sentiment often depends on words that appear far apart in the text.  
**My Choice:** **LSTM**  
**Explanation:** LSTMs are designed to capture long-range dependencies in text, making them ideal for sentiment analysis where key phrases may be separated by many words. Their gated memory cells help retain relevant information over longer sequences.

---

### **Scenario C**  
**Task:** Creating context-aware word embeddings for a domain-specific corpus (medical texts) where the same word has different meanings.  
**My Choice:** **ELMo**  
**Explanation:** ELMo generates dynamic, context-sensitive embeddings using BiLSTMs, which is crucial in domains like medicine where word meaning heavily depends on context. It captures semantic nuances better than static embeddings.

---

### **Scenario D**  
**Task:** Building a state-of-the-art question-answering system that needs to understand complex relationships between all words in long documents.  
**My Choice:** **Transformers**  
**Explanation:** Transformers use self-attention to model relationships between all words in a sequence, regardless of distance. This makes them ideal for tasks requiring deep understanding of context and long-range dependencies, like question answering.

---

### **Scenario E**  
**Task:** Processing time series data (stock prices) where you need to remember patterns from many time steps ago.  
**My Choice:** **LSTM**  
**Explanation:** LSTMs are well-suited for time series tasks due to their ability to retain information over long sequences. Their memory cells help capture temporal patterns and trends that are crucial in financial forecasting.

# Practical Implementation
*RNN*

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

def generate_sequences(n_samples=200, seq_len=5):
    """Return digit sequences labeled 1 if strictly increasing, else 0."""
    X, y = [], []

    # Random sequences: strictly increasing ones are rare, so most labels are 0
    for _ in range(n_samples - 15):
        seq = np.random.randint(0, 10, size=seq_len)
        label = int(np.all(np.diff(seq) > 0))  # 1 if strictly increasing
        X.append(seq)
        y.append(label)

    # Append 15 guaranteed positive examples: [0, 1, ..., seq_len - 1]
    for _ in range(15):
        X.append(np.arange(seq_len))
        y.append(1)

    return np.array(X), np.array(y)

X, y = generate_sequences()
X_tensor = torch.tensor(X, dtype=torch.long)
y_tensor = torch.tensor(y, dtype=torch.long)

class SimpleRNNClassifier(nn.Module):
    def __init__(self, vocab_size=10, hidden_size=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 2)

    def forward(self, x):
        x = self.embed(x)  # [batch_size, seq_len, hidden_size]
        _, h_n = self.rnn(x)  # h_n: [1, batch_size, hidden_size]
        out = self.fc(h_n.squeeze(0))  # [batch_size, 2]
        return out

# Training setup
model = SimpleRNNClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
n_epochs = 20
for epoch in range(n_epochs):
    optimizer.zero_grad()
    outputs = model(X_tensor)
    loss = criterion(outputs, y_tensor)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}/{n_epochs}, Loss: {loss.item():.4f}")
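
A quick check of the label balance helps when reading the loss values above. The sketch below uses only the standard library and assumes the same setup as `generate_sequences` (digits 0 to 9, length 5); it computes how often a uniformly random sequence is strictly increasing, once with a closed form and once by enumeration.

```python
from itertools import product
from math import comb

vocab_size, seq_len = 10, 5

# A strictly increasing sequence is a choice of seq_len distinct digits,
# and each such choice has exactly one increasing ordering.
p_closed_form = comb(vocab_size, seq_len) / vocab_size ** seq_len

# Brute-force confirmation over all 10**5 possible sequences
n_increasing = sum(
    all(a < b for a, b in zip(seq, seq[1:]))
    for seq in product(range(vocab_size), repeat=seq_len)
)
p_brute_force = n_increasing / vocab_size ** seq_len

print(f"P(strictly increasing) = {p_closed_form:.5f}")
print(f"Expected positives among 185 random samples: {185 * p_closed_form:.2f}")
```

Only about 0.25% of random sequences are positive, so nearly all positive examples in the dataset come from the 15 appended ones. A model that always predicts class 0 would already reach a low loss, so accuracy on the positive class is worth inspecting separately.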

*LSTM*

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

# Dataset: small text
text = "hello world"
chars = sorted(set(text))  # sorted so the char-to-index mapping is the same on every run
char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for c, i in char2idx.items()}

seq = [char2idx[c] for c in text]

X = torch.tensor(seq[:-1]).unsqueeze(0)  # input shape: [1, seq_len]
y = torch.tensor(seq[1:]).unsqueeze(0)   # target shape: [1, seq_len]

# Model definition
class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h=None):
        x = self.embed(x)  # [batch_size, seq_len, hidden_size]
        out, h = self.lstm(x, h)  # out: [batch_size, seq_len, hidden_size]
        out = self.fc(out)  # [batch_size, seq_len, vocab_size]
        return out, h

# Training setup
model = CharLSTM(len(chars), hidden_size=16)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
n_epochs = 50
for epoch in range(n_epochs):
    model.train()
    optimizer.zero_grad()

    output, _ = model(X)
    loss = criterion(output.view(-1, len(chars)), y.view(-1))

    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch+1}/{n_epochs}, Loss: {loss.item():.4f}")
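
The input and target tensors above are the same sequence shifted by one character. The following standard-library sketch prints the resulting training pairs for "hello world"; `sorted()` is used here only to make the index assignment deterministic, and the variable names are illustrative.

```python
text = "hello world"
chars = sorted(set(text))
char2idx = {c: i for i, c in enumerate(chars)}

seq = [char2idx[c] for c in text]
inputs, targets = seq[:-1], seq[1:]   # the target at step t is the input at step t+1

# Show each (current character -> next character) pair the model is trained on
pairs = [(text[t], text[t + 1]) for t in range(len(text) - 1)]
for src, dst in pairs:
    print(f"{src!r} -> {dst!r}")
```

Note that the same input character can have different targets (for example `'l'` is followed by `'l'`, `'o'`, and `'d'`), which is why the model needs its recurrent state and not just the current character.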


*ELMo*

In [None]:
import tensorflow_hub as hub
import tensorflow as tf
import numpy as np

# Load ELMo model
elmo = hub.load("https://tfhub.dev/google/elmo/3")

# Sentences
sentences = [
    "He will draw a portrait of his wife",
    "The football match ended in a draw"
]

# Compute ELMo embeddings
embeddings = elmo.signatures["default"](tf.constant(sentences))["elmo"]  # shape: [batch_size, seq_len, 1024]

# Extract embeddings for the word 'draw'
target_word = "draw"
for i, sentence in enumerate(sentences):
    words = sentence.split()
    try:
        index = words.index(target_word)
        word_embedding = embeddings[i][index].numpy()
        print(f"Embedding for '{target_word}' in sentence {i+1}:")
        print(word_embedding[:10])  # print first 10 values for brevity
    except ValueError:
        print(f"'{target_word}' not found in sentence {i+1}")
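
Printing the first ten values shows that the two vectors differ, but a single similarity score is easier to compare. A common choice is cosine similarity, sketched below in NumPy. The two vectors here are random stand-ins, not real ELMo outputs; in the notebook they would be the two `word_embedding` arrays for "draw".

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins for the two 1024-dimensional 'draw' embeddings
rng = np.random.default_rng(0)
draw_verb = rng.normal(size=1024)
draw_noun = draw_verb + rng.normal(size=1024)   # related, but not identical

print(f"cosine similarity: {cosine_similarity(draw_verb, draw_noun):.3f}")
```

A static embedding would give exactly 1.0 for the same word in both sentences; a contextual model such as ELMo is expected to give a lower value when the word is used in different senses.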


**What happens if we try to find the embedding of a word that doesn't appear in the input sentences?**

In [None]:
# Sentences
sentences = [
    "He will draw a portrait of his wife",
    "The football match ended in a draw"
]

# Compute ELMo embeddings
embeddings = elmo.signatures["default"](tf.constant(sentences))["elmo"]  # shape: [batch_size, seq_len, 1024]

# Try to extract embeddings for 'bank', which appears in neither sentence
target_word = "bank"
for i, sentence in enumerate(sentences):
    words = sentence.split()
    try:
        index = words.index(target_word)
        word_embedding = embeddings[i][index].numpy()
        print(f"Embedding for '{target_word}' in sentence {i+1}:")
        print(word_embedding[:10])  # print first 10 values for brevity
    except ValueError:
        print(f"'{target_word}' not found in sentence {i+1}")
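
The lookup fails because `list.index` raises `ValueError` when the word is missing, and the `except` branch reports it. The same lookup would also miss a word that differs only in case or has punctuation attached. The helper below is a hypothetical, standard-library sketch of a more forgiving lookup; it assumes tokens are obtained by splitting on whitespace, so positions stay aligned with one embedding per whitespace-separated token.

```python
import string

def find_token_index(sentence, target):
    """Return the position of `target` among whitespace tokens, or None if absent.

    Case and surrounding punctuation are ignored; token positions are unchanged.
    """
    tokens = [w.strip(string.punctuation).lower() for w in sentence.split()]
    try:
        return tokens.index(target.lower())
    except ValueError:
        return None

print(find_token_index("The football match ended in a draw", "draw"))
print(find_token_index("The football match ended in a draw", "bank"))
print(find_token_index("We met at the Bank.", "bank"))
```

Returning `None` makes the "not found" case explicit, so the caller can skip the embedding lookup without relying on exception handling.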

*Transformer*


In [None]:
from transformers import BertTokenizer, BertForMaskedLM
import torch

# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Input sentence with a masked token
sentence = "The cat sat on the [MASK]."

# Tokenize input
inputs = tokenizer(sentence, return_tensors="pt")

# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits

# Find the index of the [MASK] token
mask_token_index = torch.where(inputs.input_ids == tokenizer.mask_token_id)[1]

# Get the top predicted token at the mask position
top_token_id = predictions[0, mask_token_index].argmax(dim=-1)
predicted_token = tokenizer.decode(top_token_id)

print(f"Predicted word: {predicted_token}")
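
Taking the `argmax` keeps only the single best token. To see the runner-up candidates as well, the logits at the mask position can be turned into probabilities with a softmax and sorted. The NumPy sketch below uses a made-up five-word vocabulary and made-up logits, not real BERT outputs; with the model above, something like `torch.topk` on the softmaxed logits would play the same role.

```python
import numpy as np

def top_k_tokens(logits, vocab, k=3):
    """Return the k most probable (token, probability) pairs, best first."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs /= probs.sum()                    # softmax
    top = np.argsort(probs)[::-1][:k]       # indices of the k largest probabilities
    return [(vocab[i], float(probs[i])) for i in top]

# Hypothetical vocabulary and logits for the [MASK] position
vocab = ["floor", "bed", "mat", "roof", "table"]
logits = [3.1, 2.4, 2.0, 0.3, 1.2]

for token, p in top_k_tokens(logits, vocab):
    print(f"{token}: {p:.3f}")
```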