<a href="https://colab.research.google.com/github/debojit11/ml_nlp_dl_transformers/blob/main/DL_week_13.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 13: RNNs & LSTMs (Sequence Modeling in NLP)

# **SECTION 1: Welcome & Objectives**

In [1]:
print("Welcome to Week 13!")
print("This week, you'll:")
print("- Learn how RNNs and LSTMs model sequences")
print("- Understand why they are useful for NLP tasks")
print("- Build a simple LSTM to predict the next word in a sentence")

Welcome to Week 13!
This week, you'll:
- Learn how RNNs and LSTMs model sequences
- Understand why they are useful for NLP tasks
- Build a simple LSTM to predict the next word in a sentence


# **SECTION 2: What are RNNs & LSTMs?**

### Why RNNs & LSTMs?
In NLP, order matters.
Sentences are sequences of words — and each word depends on the previous ones.

Unlike models that treat each input independently, RNNs and LSTMs maintain a hidden state that carries information forward from one word to the next.


## 🚀 Why Neural Networks for Sequences?

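Unlike traditional ML models that expect fixed-size inputs (for example, bag-of-words features, which discard word order), recurrent neural nets can **learn from sequences** directly, making them a natural fit for: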

- Text generation
- Language modeling
- Machine translation
- Speech recognition

---

## 🔁 Recurrent Neural Networks (RNNs)

RNNs are designed to handle **sequential data** by maintaining a **hidden state** that gets updated at each time step.

Imagine reading a sentence word by word — RNNs "remember" the context as they read.

---

## 🔧 How RNNs Work

At time step $t$:

- Input: $x_t$
- Previous hidden state: $h_{t-1}$
- Output: $h_t = \tanh(W x_t + U h_{t-1} + b)$

The same weights ($W$, $U$, and $b$) are reused at every time step, so a single RNN can process inputs of any length.
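
The update rule above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the notebook's model: the sizes `input_dim` and `hidden_dim`, the random weights, and the helper names `rnn_step` and `run_rnn` are all assumptions made for the example.

```python
import torch

torch.manual_seed(0)

input_dim, hidden_dim = 4, 3             # illustrative sizes, not from the notebook
W = torch.randn(hidden_dim, input_dim)   # input-to-hidden weights
U = torch.randn(hidden_dim, hidden_dim)  # hidden-to-hidden weights
b = torch.zeros(hidden_dim)              # bias

def rnn_step(x_t, h_prev):
    """Apply h_t = tanh(W x_t + U h_{t-1} + b) for a single time step."""
    return torch.tanh(W @ x_t + U @ h_prev + b)

def run_rnn(xs):
    """Run the same step over a sequence of any length; return the final hidden state."""
    h = torch.zeros(hidden_dim)          # h_0: start with an empty memory
    for x_t in xs:                       # iterating a (T, input_dim) tensor yields its rows
        h = rnn_step(x_t, h)
    return h

short_seq = torch.randn(2, input_dim)    # 2 time steps
long_seq = torch.randn(7, input_dim)     # 7 time steps
print(run_rnn(short_seq).shape, run_rnn(long_seq).shape)
```

Both calls reuse the same `W`, `U`, and `b`, and both return a hidden state of size `hidden_dim` regardless of how many steps were processed.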

---

## 🧨 Problem: Vanishing Gradients

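RNNs struggle to learn **long-term dependencies**: during backpropagation through time, the gradient is multiplied by a similar factor at every step, so across many steps it can shrink toward zero and early words stop influencing the weight updates.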
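
To see the shrinking effect numerically, here is a small sketch using a one-dimensional RNN, `h_t = tanh(w * h_{t-1})`, with no input term. The scalar setup, the weight `w=0.9`, and the function name `gradient_through_time` are assumptions chosen purely for illustration.

```python
import math

def gradient_through_time(w, h0, steps):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)

for steps in (1, 5, 20, 50):
    print(steps, gradient_through_time(w=0.9, h0=0.5, steps=steps))
```

Each step multiplies the gradient by a factor of at most `w`, so with `w < 1` the influence of the first hidden state decays at least geometrically with sequence length.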

---

## 💡 LSTM to the Rescue

LSTMs (Long Short-Term Memory networks) are a special kind of RNN that can **learn long-term dependencies** using gates:

- **Forget Gate**: What to forget
- **Input Gate**: What new info to add
- **Output Gate**: What to output

They maintain a **cell state** that flows with minimal modification.
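
The gates can be written out as a single LSTM step. The sketch below uses the standard LSTM formulation; the function name `lstm_step`, the stacked weight layout, and the gate order (input, forget, candidate, output) are choices made for this example, picked to match how PyTorch stores its LSTM weights.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4 * hidden, input), U: (4 * hidden, hidden), b: (4 * hidden,)
    The four row blocks are ordered: input gate, forget gate, candidate, output gate.
    """
    gates = x_t @ W.T + h_prev @ U.T + b      # (batch, 4 * hidden)
    i, f, g, o = gates.chunk(4, dim=-1)
    i = torch.sigmoid(i)                      # input gate: what new info to add
    f = torch.sigmoid(f)                      # forget gate: what to forget
    o = torch.sigmoid(o)                      # output gate: what to output
    g = torch.tanh(g)                         # candidate values for the cell state
    c_t = f * c_prev + i * g                  # cell state: keep some old, add some new
    h_t = o * torch.tanh(c_t)                 # hidden state: a gated view of the cell state
    return h_t, c_t

torch.manual_seed(0)
batch, input_dim, hidden_dim = 1, 4, 3        # illustrative sizes
W = torch.randn(4 * hidden_dim, input_dim)
U = torch.randn(4 * hidden_dim, hidden_dim)
b = torch.zeros(4 * hidden_dim)
h, c = lstm_step(torch.randn(batch, input_dim),
                 torch.zeros(batch, hidden_dim),
                 torch.zeros(batch, hidden_dim), W, U, b)
print(h.shape, c.shape)
```

Note that the cell state update is additive (`f * c_prev + i * g`), which is why information can flow across many steps with little modification when the forget gate stays close to 1.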

---

## 🧠 LSTM Cell Diagram

```
 c(t-1) --(x)---------(+)---------------+-----> c(t)
           |           |                |
           f         i * g            tanh
           |           |                |
        forget    input gate,          (x)----> h(t)
         gate     candidate g           |
           |           |                o  output gate
           +-----------+----------------+
                  h(t-1), x(t)
```

---

## ✍️ Example: Predict Next Word

Toy corpus: `"hello how are you hello how is"`

We’ll train an LSTM to predict the next word given a sequence:

- `hello how → are`
- `how are → you`

A model that predicts the next word from the words before it is called a **language model**.

---

## ⚙️ Code Breakdown

- Tokenize text
- Create input-output pairs (sequence to next word)
- Use an embedding layer + LSTM + dense output
- Predict next word

---


# **SECTION 3: Simple Token Sequence Example**

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

In [3]:
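# Toy corpus: "hello how are you hello how is"
# Goal: given two words, predict the next one.
# Note: "hello how" is followed by "are" once and by "is" once, so that context is ambiguous.
corpus = "hello how are you hello how is"
vocab = sorted(set(corpus.split()))  # sorted so word indices are identical on every run
vocab_size = len(vocab)
word2idx = {w: i for i, w in enumerate(vocab)}
idx2word = {i: w for w, i in word2idx.items()}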


In [4]:
# Prepare dataset
sequence_length = 2
data = []
words = corpus.split()
for i in range(len(words) - sequence_length):
    seq = words[i:i+sequence_length]
    target = words[i+sequence_length]
    data.append((seq, target))
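
It helps to look at the pairs this sliding window produces. The following standalone sketch rebuilds them from the same corpus and groups the targets by context; the name `targets_by_context` is introduced here only for illustration.

```python
from collections import defaultdict

corpus = "hello how are you hello how is"
words = corpus.split()
sequence_length = 2

# Same sliding window as the cell above: 2 context words -> 1 target word
pairs = [(tuple(words[i:i + sequence_length]), words[i + sequence_length])
         for i in range(len(words) - sequence_length)]

targets_by_context = defaultdict(list)
for context, target in pairs:
    targets_by_context[context].append(target)

for context, targets in targets_by_context.items():
    print(" ".join(context), "->", targets)
# hello how -> ['are', 'is']
# how are -> ['you']
# are you -> ['hello']
# you hello -> ['how']
```

There are only five training pairs, the context `hello how` has two different targets, and `how is` never appears as a context because it is the end of the corpus.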

In [5]:
# Convert to tensors
def vectorize(seq):
    return torch.tensor([word2idx[w] for w in seq], dtype=torch.long)

In [6]:
def get_batch():
    inputs = torch.stack([vectorize(x) for x, y in data])
    targets = torch.tensor([word2idx[y] for x, y in data], dtype=torch.long)
    return inputs, targets

X, y = get_batch()

# **SECTION 4: Build a Simple LSTM Model**

In [7]:
class NextWordPredictor(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

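    def forward(self, x):
        x = self.embedding(x)        # (batch, seq_len) -> (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)   # h_n: (num_layers, batch, hidden_dim)
        out = self.fc(h_n[-1])       # final hidden state of the last layer -> (batch, vocab_size) logits
        return out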
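
To make the tensor shapes concrete, here is a standalone walk-through of the same three layers outside the class. The token indices in `tokens` are arbitrary placeholders; the layer sizes match the ones used in this notebook (5 vocabulary words, `embed_dim=10`, `hidden_dim=20`).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, hidden_dim = 5, 10, 20

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
fc = nn.Linear(hidden_dim, vocab_size)

tokens = torch.tensor([[0, 1], [1, 2], [2, 3]])  # (batch=3, seq_len=2) word indices
embedded = embedding(tokens)
outputs, (h_n, c_n) = lstm(embedded)
logits = fc(h_n[-1])

print(embedded.shape)  # torch.Size([3, 2, 10])
print(outputs.shape)   # torch.Size([3, 2, 20])
print(h_n.shape)       # torch.Size([1, 3, 20])
print(logits.shape)    # torch.Size([3, 5])

# For a single-layer, unidirectional LSTM, h_n[-1] is the output at the last time step
print(torch.allclose(outputs[:, -1, :], h_n[-1]))  # True
```

Using `h_n[-1]` is therefore a compact way to say "summarize the whole sequence by the hidden state after its last word".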

In [8]:
model = NextWordPredictor(vocab_size, embed_dim=10, hidden_dim=20)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# **SECTION 5: Training**

In [9]:
for epoch in range(300):
    optimizer.zero_grad()
    out = model(X)
    loss = loss_fn(out, y)
    loss.backward()
    optimizer.step()
    if epoch % 50 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

Epoch 0, Loss: 1.6239
Epoch 50, Loss: 0.2822
Epoch 100, Loss: 0.2792
Epoch 150, Loss: 0.2785
Epoch 200, Loss: 0.2781
Epoch 250, Loss: 0.2779
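
The loss levels off near 0.278 instead of reaching zero, and that is expected. The context `hello how` is followed by `are` once and by `is` once, so the best any model can do there is assign 0.5 to each. The sketch below computes that lower bound directly from the corpus; it is a back-of-the-envelope check that assumes the five training pairs built earlier, not part of the training code.

```python
import math
from collections import Counter

words = "hello how are you hello how is".split()
pairs = [((words[i], words[i + 1]), words[i + 2]) for i in range(len(words) - 2)]

context_counts = Counter(context for context, _ in pairs)
pair_counts = Counter(pairs)

# Best achievable average cross-entropy: predict the empirical P(target | context)
floor = sum(-math.log(pair_counts[p] / context_counts[p[0]]) for p in pairs) / len(pairs)
print(f"{floor:.4f}")  # 0.2773
```

The floor is `2 * ln(2) / 5`: two ambiguous pairs each cost `ln 2`, the other three cost nothing, and the loss is averaged over five pairs. A training loss of about 0.2779 means the model has essentially memorized this tiny dataset.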


# **SECTION 6: Test the Model**

In [10]:
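def predict_next(words):
    model.eval()  # switch to inference mode (a good habit, even without dropout)
    input_tensor = vectorize(words).unsqueeze(0)  # Add batch dim
    with torch.no_grad():  # no gradients needed for prediction
        output = model(input_tensor)
    predicted_idx = torch.argmax(output, dim=1).item()
    return idx2word[predicted_idx]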

print("Test: hello how ->", predict_next(["hello", "how"]))
print("Test: how are ->", predict_next(["how", "are"]))
print("Test: how is ->", predict_next(["how", "is"]))

Test: hello how -> is
Test: how are -> you
Test: how is -> you
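
Two of these results deserve a comment. `hello how` has two equally valid continuations in the corpus, so `argmax` returning `is` rather than `are` is essentially a coin flip. `how is` never appears as a training input, so its prediction is not meaningful. Looking at the full probability distribution is more informative than the single top word. The sketch below is self-contained: it retrains a small stand-in model (`TinyLM`, a hypothetical name that mirrors `NextWordPredictor`) and adds a helper, `next_word_probs`, that is not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

words = "hello how are you hello how is".split()
vocab = sorted(set(words))
word2idx = {w: i for i, w in enumerate(vocab)}

# Same 2-word context -> next word pairs as in the notebook
X = torch.tensor([[word2idx[w] for w in words[i:i + 2]] for i in range(len(words) - 2)])
y = torch.tensor([word2idx[words[i + 2]] for i in range(len(words) - 2)])

class TinyLM(nn.Module):  # hypothetical stand-in mirroring NextWordPredictor
    def __init__(self, vocab_size, embed_dim=10, hidden_dim=20):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        _, (h_n, _) = self.lstm(self.embedding(x))
        return self.fc(h_n[-1])

model = TinyLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(300):  # same training budget as the notebook
    optimizer.zero_grad()
    loss_fn(model(X), y).backward()
    optimizer.step()

def next_word_probs(context):
    """Return {word: probability} for the word following `context`."""
    model.eval()
    with torch.no_grad():
        idx = torch.tensor([[word2idx[w] for w in context]])
        probs = torch.softmax(model(idx), dim=1)[0]  # logits -> probabilities
    return {w: round(probs[i].item(), 3) for w, i in word2idx.items()}

print(next_word_probs(["hello", "how"]))
```

After training, the probability mass for `hello how` should be split roughly evenly between `are` and `is`, which is the honest answer for this corpus.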


## 🧠 Why This Matters in NLP

LSTMs are used in:

| Task                     | Role of LSTM                                       |
|--------------------------|----------------------------------------------------|
| Language Modeling        | Predict the next word in a sentence                |
| Text Generation          | Generate text one word at a time                   |
| Named Entity Recognition | Label each token in a sequence                     |
| Machine Translation      | Encode the source sentence, then decode the target |

---

## ✅ Pros of LSTM

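- Learns longer-range dependencies than a vanilla RNN
- Well suited to sequential data such as text
- Usually outperforms vanilla RNNs on text, because the gated cell state eases the vanishing-gradient problem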

## ⚠️ Limitations

- Processes tokens one at a time, so training is slow
- Hard to parallelize across time steps, because each step needs the previous hidden state
- Can still struggle with very long texts

---

# **SECTION 7: What’s Coming Next?**

### Coming Up: Week 14 - Word Embeddings
Now that you’ve built your first LSTM, next week we’ll dive into **Word Embeddings** (Word2Vec, GloVe), which represent words as dense vectors that capture semantic meaning.
These are critical to feeding real-world NLP data into deep models 🔍🧠

➡️ Week 14: Word Embeddings 🔤📐