# Lab 5: Deep Learning & LLMs for NLP

**Course:** Natural Language Processing


**Objectives:**
- Understand RNN, LSTM, GRU architectures for sequence modeling
- Use pre-trained Transformers for NER
- Interact with LLMs via API for text generation

---

## Instructions

1. Complete all exercises marked with `# YOUR CODE HERE`, `# TODO`, or `___`
2. **Answer all written questions** in the designated markdown cells
3. Save your completed notebook
4. **Push to your Git repository and send the link to: yoroba93@gmail.com**

---

## Lab Structure

| Part | Model | Task |
|------|-------|------|
| A | RNN | Character-level Language Model |
| B | LSTM | Sentiment Analysis |
| C | GRU | News Classification |
| D | Transformer | Named Entity Recognition | 
| E | LLM (Mistral) | Text Generation & QA |

---

## Setup

In [None]:
# Install required libraries (uncomment if needed)
# !pip install torch transformers datasets requests numpy pandas matplotlib

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import warnings
warnings.filterwarnings('ignore')

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
print(f"PyTorch version: {torch.__version__}")

---

# PART A: RNN - Character-Level Language Model (10 min)

**Use Case:** Predict the next character for autocomplete.

**Dataset:** Tiny Shakespeare

In [None]:
# Load Tiny Shakespeare dataset
from datasets import load_dataset

shakespeare = load_dataset("tiny_shakespeare", split="train")
text = shakespeare['text'][0][:10000]  # Use first 10K chars for speed

print(f"Text length: {len(text)} characters")
print(f"Sample: {text[:200]}")

In [None]:
# Create character vocabulary
chars = sorted(list(set(text)))
vocab_size = len(chars)
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for i, ch in enumerate(chars)}

print(f"Vocabulary size: {vocab_size}")
print(f"Characters: {''.join(chars[:30])}...")

In [None]:
# Prepare sequences
seq_length = 30
X, y = [], []

for i in range(len(text) - seq_length):
    X.append([char_to_idx[c] for c in text[i:i+seq_length]])
    y.append(char_to_idx[text[i+seq_length]])

X = torch.tensor(X, dtype=torch.long)
y = torch.tensor(y, dtype=torch.long)

print(f"Sequences: {X.shape[0]}, Sequence length: {seq_length}")
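
The sliding-window preparation above is easier to follow on a tiny string. The sketch below repeats the same logic on a toy text with a window of 4 instead of 30; the `toy_*` names and `make_windows` are illustrative helpers, not part of the lab code.

```python
toy_text = "hello world"
toy_seq_length = 4  # much shorter than the lab's 30, for readability

# Character vocabulary built the same way as in the lab
toy_chars = sorted(set(toy_text))
toy_char_to_idx = {ch: i for i, ch in enumerate(toy_chars)}
toy_idx_to_char = {i: ch for ch, i in toy_char_to_idx.items()}


def make_windows(text, seq_length, char_to_idx):
    """Return (inputs, targets): each input holds seq_length character
    indices and its target is the index of the character that follows."""
    inputs, targets = [], []
    for i in range(len(text) - seq_length):
        inputs.append([char_to_idx[c] for c in text[i:i + seq_length]])
        targets.append(char_to_idx[text[i + seq_length]])
    return inputs, targets


toy_X, toy_y = make_windows(toy_text, toy_seq_length, toy_char_to_idx)

# Decode the first few pairs back to characters to inspect them
for window, target in list(zip(toy_X, toy_y))[:3]:
    decoded = "".join(toy_idx_to_char[i] for i in window)
    print(f"{decoded!r} -> {toy_idx_to_char[target]!r}")
```

Each position in the text yields one training pair, so a text of length `n` gives `n - seq_length` pairs.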

In [None]:
# Simple RNN model
class CharRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)
    
    def forward(self, x):
        x = self.embedding(x)
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])  # Last timestep
        return out

# Create model
rnn_model = CharRNN(vocab_size, embed_dim=32, hidden_dim=64).to(device)
print(f"RNN Parameters: {sum(p.numel() for p in rnn_model.parameters()):,}")

### Exercise A.1: Train the RNN

In [None]:
# TODO: Complete the training loop

# Hyperparameters
batch_size = 128
epochs = 5
learning_rate = ___  # YOUR CHOICE: 0.001-0.01

# DataLoader
dataset = TensorDataset(X[:5000], y[:5000])  # Use subset for speed
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(rnn_model.parameters(), lr=learning_rate)

# Training loop
losses = []
for epoch in range(epochs):
    total_loss = 0
    for batch_X, batch_y in loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        
        # YOUR CODE HERE
        # 1. Zero gradients
        # 2. Forward pass
        # 3. Compute loss
        # 4. Backward pass
        # 5. Update weights
        # 6. Add loss.item() to total_loss (otherwise the printed loss stays 0)
        
        pass  # Remove and add your code
        
    avg_loss = total_loss / len(loader)
    losses.append(avg_loss)
    print(f"Epoch {epoch+1}/{epochs}, Loss: {avg_loss:.4f}")

In [None]:
# Generate text
def generate_text(model, start_str, length=100):
    model.eval()
    chars_generated = list(start_str)
    input_seq = [char_to_idx.get(c, 0) for c in start_str[-seq_length:]]
    
    with torch.no_grad():
        for _ in range(length):
            x = torch.tensor([input_seq[-seq_length:]]).to(device)
            output = model(x)
            pred_idx = torch.argmax(output, dim=1).item()
            chars_generated.append(idx_to_char[pred_idx])
            input_seq.append(pred_idx)
    
    return ''.join(chars_generated)

# Test generation
print("Generated text:")
print(generate_text(rnn_model, "To be or not", length=100))
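
`generate_text` always picks the most likely character (greedy decoding), which can produce repetitive loops. One common alternative is to sample from the predicted distribution after scaling the scores by a temperature. The sketch below shows the idea in plain Python on a hand-written list of scores; `sample_with_temperature` is a hypothetical helper, not part of the lab code.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature).

    Low temperatures approach greedy argmax; high temperatures
    approach uniform sampling.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - max_scaled) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Illustrative scores for a 3-character vocabulary
toy_logits = [1.0, 5.0, 2.0]
rng = random.Random(0)
for temp in (0.1, 1.0, 10.0):
    draws = [sample_with_temperature(toy_logits, temp, rng) for _ in range(1000)]
    counts = [draws.count(i) for i in range(len(toy_logits))]
    print(f"temperature={temp}: counts per index = {counts}")
```

In the notebook, the same effect could be obtained by replacing the `torch.argmax` line with a draw from `torch.multinomial(torch.softmax(output / temperature, dim=1), 1)`; treat that as a suggestion to experiment with rather than part of the exercise.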

---

# PART B: LSTM - Sentiment Analysis 

**Use Case:** Classify movie review sentiment.

**Dataset:** IMDB Reviews

In [None]:
# Load IMDB dataset
imdb = load_dataset("imdb")

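# Small shuffled sample for quick training.
# Shuffle first: the raw splits may be grouped by label, so taking the
# first rows directly could yield reviews from a single class.
imdb_train = imdb['train'].shuffle(seed=42).select(range(1000))
imdb_test = imdb['test'].shuffle(seed=42).select(range(200))
train_texts = imdb_train['text']
train_labels = imdb_train['label']
test_texts = imdb_test['text']
test_labels = imdb_test['label']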

print(f"Train: {len(train_texts)}, Test: {len(test_texts)}")

In [None]:
# Simple tokenization and vocabulary
from collections import Counter
import re

def tokenize(text):
    return re.findall(r'\b\w+\b', text.lower())[:100]  # Max 100 tokens

# Build vocabulary from training data
all_tokens = [tok for text in train_texts for tok in tokenize(text)]
vocab = {word: idx+2 for idx, (word, _) in enumerate(Counter(all_tokens).most_common(5000))}
vocab['<PAD>'] = 0
vocab['<UNK>'] = 1

print(f"Vocabulary size: {len(vocab)}")

In [None]:
# Encode texts
def encode_text(text, max_len=100):
    tokens = tokenize(text)
    encoded = [vocab.get(t, 1) for t in tokens]  # 1 = UNK
    padded = encoded[:max_len] + [0] * (max_len - len(encoded))
    return padded[:max_len]

X_train = torch.tensor([encode_text(t) for t in train_texts])
y_train = torch.tensor(train_labels)
X_test = torch.tensor([encode_text(t) for t in test_texts])
y_test = torch.tensor(test_labels)

print(f"Train shape: {X_train.shape}")
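
The vocabulary and encoding steps above can be checked by hand on a toy corpus. This is a minimal sketch using the same conventions (index 0 for `<PAD>`, 1 for `<UNK>`, most frequent words first); the `toy_*` names are illustrative and the vocabulary is limited to 3 words so that unknown tokens actually occur.

```python
import re
from collections import Counter


def toy_tokenize(text):
    """Lowercase and split on word boundaries, as in the lab's tokenize()."""
    return re.findall(r"\b\w+\b", text.lower())


toy_corpus = ["A great movie, great cast.", "A dull movie."]
counts = Counter(tok for doc in toy_corpus for tok in toy_tokenize(doc))

# Reserve 0 and 1, then assign indices to the most frequent words
toy_vocab = {"<PAD>": 0, "<UNK>": 1}
for word, _ in counts.most_common(3):
    toy_vocab[word] = len(toy_vocab)


def toy_encode(text, vocab, max_len=6):
    """Map tokens to ids, then pad with <PAD> or truncate to max_len."""
    ids = [vocab.get(tok, vocab["<UNK>"]) for tok in toy_tokenize(text)]
    return (ids + [vocab["<PAD>"]] * max_len)[:max_len]


print(toy_vocab)
print(toy_encode("A great film", toy_vocab))       # "film" is out of vocabulary
print(toy_encode("great " * 10, toy_vocab))        # longer than max_len
```

Every encoded review has the same length, which is what allows the list of reviews to be stacked into a single tensor.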

### Exercise B.1: Complete the LSTM Model

In [None]:
# TODO: Complete the LSTM classifier

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        
        # YOUR CODE HERE: Define LSTM layer
        # Hint: nn.LSTM(input_size, hidden_size, batch_first=True)
        self.lstm = ___
        
        self.fc = nn.Linear(hidden_dim, num_classes)
        self.dropout = nn.Dropout(0.3)
    
    def forward(self, x):
        x = self.embedding(x)
        x = self.dropout(x)
        
        # YOUR CODE HERE: Pass through LSTM and get final hidden state
        # Hint: lstm_out, (hidden, cell) = self.lstm(x)
        
        out = self.fc(___)  # Use the last layer's final hidden state, shape (batch, hidden_dim)
        return out

# Create model
lstm_model = LSTMClassifier(
    vocab_size=len(vocab),
    embed_dim=64,
    hidden_dim=64,
    num_classes=2
).to(device)

print(f"LSTM Parameters: {sum(p.numel() for p in lstm_model.parameters()):,}")

In [None]:
# Quick training
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(lstm_model.parameters(), lr=0.001)

# Train for 3 epochs
for epoch in range(3):
    lstm_model.train()
    total_loss = 0
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        optimizer.zero_grad()
        output = lstm_model(batch_X)
        loss = criterion(output, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {total_loss/len(train_loader):.4f}")

# Evaluate
lstm_model.eval()
with torch.no_grad():
    test_output = lstm_model(X_test.to(device))
    preds = torch.argmax(test_output, dim=1).cpu()
    acc = (preds == y_test).float().mean()
    print(f"\nTest Accuracy: {acc:.4f}")

---

# PART C: GRU - News Classification

**Use Case:** Classify news articles by topic.

**Why GRU?** A GRU has fewer parameters than an LSTM of the same size, so it usually trains faster.

In [None]:
# Load AG News
ag_news = load_dataset("ag_news")
ag_train = ag_news['train'].shuffle(seed=42).select(range(2000))
ag_test = ag_news['test'].shuffle(seed=42).select(range(500))

ag_labels = {0: 'World', 1: 'Sports', 2: 'Business', 3: 'Sci/Tech'}
print(f"Classes: {list(ag_labels.values())}")

### Exercise C.1: Build GRU Classifier

In [None]:
# TODO: Create a GRU classifier (similar to LSTM but using nn.GRU)

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        
        # YOUR CODE HERE: Define GRU layer
        self.gru = ___
        
        self.fc = nn.Linear(hidden_dim, num_classes)
    
    def forward(self, x):
        x = self.embedding(x)
        
        # YOUR CODE HERE: GRU forward pass
        # Note: GRU returns (output, hidden) - no cell state unlike LSTM
        
        out = self.fc(___)  # Use the last layer's final hidden state, shape (batch, hidden_dim)
        return out

# Build vocabulary and encode (reuse tokenize function)
ag_tokens = [tok for item in ag_train for tok in tokenize(item['text'])]
ag_vocab = {word: idx+2 for idx, (word, _) in enumerate(Counter(ag_tokens).most_common(5000))}
ag_vocab['<PAD>'] = 0
ag_vocab['<UNK>'] = 1

def encode_ag(text, vocab, max_len=50):
    tokens = tokenize(text)
    encoded = [vocab.get(t, 1) for t in tokens]
    return (encoded[:max_len] + [0] * max_len)[:max_len]

X_ag_train = torch.tensor([encode_ag(item['text'], ag_vocab) for item in ag_train])
y_ag_train = torch.tensor([item['label'] for item in ag_train])
X_ag_test = torch.tensor([encode_ag(item['text'], ag_vocab) for item in ag_test])
y_ag_test = torch.tensor([item['label'] for item in ag_test])

print(f"AG News - Train: {X_ag_train.shape}, Test: {X_ag_test.shape}")

In [None]:
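# Create the GRU model
# TODO: train and evaluate it by adapting the Part B loop
# (use X_ag_train, y_ag_train, X_ag_test, y_ag_test)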
gru_model = GRUClassifier(
    vocab_size=len(ag_vocab),
    embed_dim=64,
    hidden_dim=64,
    num_classes=4
).to(device)

print(f"GRU Parameters: {sum(p.numel() for p in gru_model.parameters()):,}")
print(f"(Compare to LSTM: GRU has fewer parameters!)")
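
The totals printed in this lab also include the embedding and output layers. To isolate the recurrent layer, the count can be worked out by hand. The sketch below assumes a single-layer, unidirectional PyTorch layer, where each weight block has an input matrix, a hidden matrix, and two bias vectors; `recurrent_param_count` is a hypothetical helper for checking your printed numbers.

```python
def recurrent_param_count(input_size, hidden_size, num_blocks):
    """Parameters in one single-layer, unidirectional recurrent layer.

    Each block holds input-to-hidden weights, hidden-to-hidden weights,
    and two bias vectors (PyTorch keeps b_ih and b_hh separately).
    """
    per_block = (hidden_size * input_size
                 + hidden_size * hidden_size
                 + 2 * hidden_size)
    return num_blocks * per_block


# Number of weight blocks per architecture: nn.RNN has 1, nn.GRU 3, nn.LSTM 4
BLOCKS = {"RNN": 1, "GRU": 3, "LSTM": 4}

# Sizes used in Parts B and C: embed_dim=64, hidden_dim=64
for name, blocks in BLOCKS.items():
    count = recurrent_param_count(input_size=64, hidden_size=64, num_blocks=blocks)
    print(f"{name}: {count:,} recurrent parameters")
```

Subtracting these values from the printed totals leaves the embedding and linear layers, which are identical across the three classifiers when the vocabulary sizes match.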

---

# PART D: Transformer - Named Entity Recognition

**Use Case:** Extract entities from text.

**Dataset:** CoNLL-2003

In [None]:
# Use pre-trained NER model from Hugging Face
from transformers import pipeline

# Load NER pipeline (uses BERT-based model)
print("Loading NER model...")
ner_pipeline = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print("Model loaded!")

In [None]:
# Example NER
# Named ner_text so that the Shakespeare `text` from Part A is not overwritten
ner_text = "Apple Inc. was founded by Steve Jobs in Cupertino, California. Tim Cook is the current CEO."

entities = ner_pipeline(ner_text)
print(f"Text: {ner_text}\n")
print("Entities found:")
for ent in entities:
    print(f"  {ent['word']:20} -> {ent['entity_group']:10} (score: {ent['score']:.3f})")
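
With `aggregation_strategy="simple"`, each result is a dictionary containing at least `entity_group`, `score`, and `word`. The sketch below groups such results by entity type and drops low-confidence ones. It runs on a hand-written sample so that no model download is needed; the scores are made up for illustration and are not real model output, and `group_entities` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written sample in the shape used above; scores are invented
sample_entities = [
    {"entity_group": "ORG", "score": 0.99, "word": "Apple Inc"},
    {"entity_group": "PER", "score": 0.99, "word": "Steve Jobs"},
    {"entity_group": "LOC", "score": 0.98, "word": "Cupertino"},
    {"entity_group": "LOC", "score": 0.97, "word": "California"},
    {"entity_group": "PER", "score": 0.55, "word": "Tim Cook"},
]


def group_entities(entities, min_score=0.0):
    """Return {entity_group: [words]} for entities scoring at least min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


for label, words in group_entities(sample_entities, min_score=0.9).items():
    print(f"{label}: {', '.join(words)}")
```

In the notebook, `group_entities(ner_pipeline(sentence))` should work the same way on real predictions.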

### Exercise D.1: NER on Your Own Texts

In [None]:
# TODO: Write 3 sentences and extract entities
# Include: people, organizations, locations

my_sentences = [
    "___",  # YOUR SENTENCE 1
    "___",  # YOUR SENTENCE 2
    "___",  # YOUR SENTENCE 3
]

for sent in my_sentences:
    if sent != "___":
        print(f"\nText: {sent}")
        entities = ner_pipeline(sent)
        for ent in entities:
            print(f"  {ent['word']:20} -> {ent['entity_group']}")

---

# PART E: LLM - Text Generation with Mistral API

**Use Case:** Conversational AI and Question Answering.

**Setup:** Get a free API key from https://console.mistral.ai/

In [None]:
# TODO: Enter your Mistral API key
# Get free key at: https://console.mistral.ai/
# Do not commit a real key: reset this cell to "___" before pushing to Git.

MISTRAL_API_KEY = "___"  # YOUR API KEY HERE
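
Because the notebook is pushed to a Git repository, a key typed into a cell can easily leak. One option is to read it from an environment variable instead. This is a minimal sketch; the variable name `MISTRAL_API_KEY` and the helper `load_api_key` are assumptions, and the `"___"` placeholder is kept so the later `if MISTRAL_API_KEY != "___"` checks still work.

```python
import os

PLACEHOLDER = "___"  # same sentinel the notebook uses for "no key set"


def load_api_key(env_var="MISTRAL_API_KEY", default=PLACEHOLDER):
    """Return the key stored in env_var, or the placeholder if unset or blank."""
    value = os.environ.get(env_var, "").strip()
    return value if value else default


MISTRAL_API_KEY = load_api_key()
print("Key configured:", MISTRAL_API_KEY != PLACEHOLDER)
```

The variable would be set in the shell before starting Jupyter, so the key never appears in the saved notebook.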

In [None]:
import requests

def query_mistral(prompt, max_tokens=150):
    """Query Mistral API."""
    url = "https://api.mistral.ai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {MISTRAL_API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "mistral-small-latest",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens
    }
    
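    # A timeout prevents the cell from hanging if the API is unreachable
    try:
        response = requests.post(url, headers=headers, json=data, timeout=30)
    except requests.RequestException as exc:
        return f"Error: request failed - {exc}"
    if response.status_code == 200:
        return response.json()['choices'][0]['message']['content']
    else:
        return f"Error: {response.status_code} - {response.text}"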

# Test (only if API key is set)
if MISTRAL_API_KEY != "___":
    response = query_mistral("What is NLP in one sentence?")
    print(f"Mistral: {response}")
else:
    print("Please set your MISTRAL_API_KEY above.")
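
Hosted APIs may reject requests temporarily, for example when a rate limit is reached. A common pattern is to retry with exponential backoff. The sketch below is generic and runs without network access: `RetryableError` and `call_with_retries` are hypothetical names, and connecting them to `query_mistral` would require raising `RetryableError` when the status code indicates a temporary failure (for instance 429).

```python
import time


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 429 or 5xx)."""


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RetryableError wait base_delay * 2**attempt and retry.

    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo with a fake call that fails twice before succeeding
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RetryableError("simulated rate limit")
    return "ok"


waits = []
result = call_with_retries(flaky_call, sleep=waits.append)  # record waits instead of sleeping
print(result, "after", attempts["count"], "attempts; waits:", waits)
```

Passing `sleep` as a parameter keeps the demo instantaneous; in real use the default `time.sleep` applies.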

### Exercise E.1: Compare LLM with Traditional Models

In [None]:
# TODO: Ask Mistral to perform sentiment analysis and compare with our LSTM

test_review = "This movie was absolutely terrible. The acting was bad and the plot made no sense."

# LLM approach
if MISTRAL_API_KEY != "___":
    prompt = f"""Classify the sentiment of this review as 'positive' or 'negative'. 
Just respond with one word.

Review: {test_review}

Sentiment:"""
    
    llm_result = query_mistral(prompt, max_tokens=10)
    print(f"LLM Sentiment: {llm_result}")

# Traditional LSTM approach (if model trained)
try:
    encoded = torch.tensor([encode_text(test_review)]).to(device)
    lstm_model.eval()
    with torch.no_grad():
        lstm_pred = torch.argmax(lstm_model(encoded)).item()
    print(f"LSTM Sentiment: {'positive' if lstm_pred == 1 else 'negative'}")
except Exception:  # e.g. NameError if Part B was not completed
    print("LSTM model not available")
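
The LLM returns free text, so its answer may come back as `Negative.` or with extra words, while the LSTM returns a class index. To compare the two fairly, the reply can be normalized to a label first. This is a minimal sketch; `normalize_sentiment` is a hypothetical helper and the sample replies are invented.

```python
import re


def normalize_sentiment(raw_reply):
    """Map a free-form reply to 'positive', 'negative', or None if unclear."""
    words = set(re.findall(r"[a-z]+", raw_reply.lower()))
    has_positive = "positive" in words
    has_negative = "negative" in words
    if has_positive == has_negative:  # neither label, or both: ambiguous
        return None
    return "positive" if has_positive else "negative"


# Invented replies, including an error string like query_mistral may return
for reply in ["Negative.", " Positive\n", "Error: 401 - unauthorized"]:
    print(repr(reply), "->", normalize_sentiment(reply))
```

Returning `None` for unclear replies makes it possible to count how often the LLM did not follow the one-word instruction.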

In [None]:
# TODO: Use the LLM for summarization (a task the classifiers trained above cannot perform)

long_text = """
Natural language processing (NLP) is a subfield of linguistics, computer science, 
and artificial intelligence concerned with the interactions between computers and 
human language, in particular how to program computers to process and analyze large 
amounts of natural language data. The result is a computer capable of understanding 
the contents of documents, including the contextual nuances of the language within them.
"""

if MISTRAL_API_KEY != "___":
    summary_prompt = f"Summarize this in one sentence:\n\n{long_text}"
    summary = query_mistral(summary_prompt, max_tokens=50)
    print(f"Summary: {summary}")

---

## Final Written Questions (Personal Interpretation)

Answer these questions based on YOUR experiments:

### Question 1: Model Architecture Comparison

Compare the parameter counts you observed:
- RNN: ___ parameters
- LSTM: ___ parameters  
- GRU: ___ parameters

**Why does LSTM have more parameters than GRU?** (Hint: think about gates)

**YOUR ANSWER:**

...

### Question 2: RNN vs LSTM for Long Sequences

**Why would LSTM perform better than vanilla RNN for sentiment analysis on long reviews?** Explain the vanishing gradient problem.

**YOUR ANSWER:**

...

### Question 3: Traditional Models vs LLMs

Based on your experiments:
1. **What can LLMs do that LSTM/GRU cannot?**
2. **What are the disadvantages of using LLM APIs?** (Think: cost, latency, privacy)
3. **When would you choose a traditional model over an LLM?**

**YOUR ANSWER:**

1. LLM advantages: ...

2. LLM disadvantages: ...

3. When to use traditional models: ...

---

## Summary

| Model | Strength | Weakness | Best For |
|-------|----------|----------|----------|
| RNN | Simple, fast | Vanishing gradients | Short sequences |
| LSTM | Long-term memory | More parameters | Long text classification |
| GRU | Efficient, fast | Less expressive | When speed matters |
| Transformer | Parallel, contextual | Expensive | NER, QA, many tasks |
| LLM | Versatile, zero-shot | API cost, latency | Complex reasoning |

---

## Submission

- [ ] All code exercises completed
- [ ] All written questions answered
- [ ] Mistral API tested (or explained why not)
- [ ] **Push to Git and send link to: yoroba93@gmail.com**