<a href="https://colab.research.google.com/github/RichardJPovinelli/Neural_Networks_Course/blob/main/LSTM_vs_GRU_MicroLab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Today’s Lab: LSTM vs GRU (Colab‑friendly)

**Goal:** Train both models on a tiny synthetic sequence task and compare behavior as the dependency length grows.

**How to use this notebook**
1. *(Optional)* In Colab: **Runtime → Change runtime type → GPU**. If none is available, stay on CPU.
2. Change **ONE** parameter in the **Config** cell (`seq_len`, `gap_k`, `hidden_size`, or `lr`).
3. Keep `seeds = [0, 1, 2]` and **Run all** twice: once with `cell_type = 'lstm'` and once with `cell_type = 'gru'`.
4. Fill the small results table and write **two sentences** in the final cell.
5. Submit a screenshot of your results (or download this notebook) via the LMS.

> If Colab is slow: reduce `hidden_size` (e.g., 32) or `epochs` (e.g., 3). CPU is fine for this task.

---

*Instructor note:* Once you upload this to GitHub, you can add an “Open in Colab” badge like:

`[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YOUR_ORG/YOUR_REPO/blob/main/path/to/LSTM_vs_GRU_MicroLab.ipynb)`


In [1]:
# Environment sanity for Colab/local
import sys, platform, random, math
import numpy as np
import torch, torch.nn as nn, torch.optim as optim
from torch.utils.data import Dataset, DataLoader

def seed_all(s=0):
    random.seed(s); np.random.seed(s); torch.manual_seed(s)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(s)

seed_all(0)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Python:', sys.version.split()[0], '| Torch:', torch.__version__, '| Device:', device)


Python: 3.12.12 | Torch: 2.8.0+cu126 | Device: cpu


In [2]:
# ---- Config (change ONE for the activity) ----
cell_type    = 'lstm'   # 'lstm' or 'gru'
seq_len      = 30       # try 10, 30, 60
gap_k        = 8        # label at step t depends on the bit gap_k steps earlier
hidden_size  = 64
lr           = 1e-3
epochs       = 5
batch_size   = 64
seeds        = [0,1,2]  # activity: run three seeds
device       = 'cuda' if torch.cuda.is_available() else 'cpu'

print({k:v for k,v in list(locals().items()) if k in ['cell_type','seq_len','gap_k','hidden_size','lr','epochs','batch_size','seeds','device']})


{'device': 'cpu', 'cell_type': 'lstm', 'seq_len': 30, 'gap_k': 8, 'hidden_size': 64, 'lr': 0.001, 'epochs': 5, 'batch_size': 64, 'seeds': [0, 1, 2]}


In [3]:
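# ---- Synthetic dataset: binary sequences; label depends on the bit gap_k steps back ----
# Rule: label_t = XOR(bit_t, bit_{t-gap_k}), with the earlier bit taken as 0 when t < gap_k
# (so label_t = bit_t for the first gap_k steps). Many-to-many: one label per time step.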
import torch

def make_sequence(seq_len, gap_k):
    x = torch.randint(0, 2, (seq_len,))
    y = torch.zeros(seq_len, dtype=torch.long)
    for t in range(seq_len):
        prev = x[t-gap_k] if t-gap_k >= 0 else 0
        y[t] = int(x[t].item() ^ int(prev))
    return x.float(), y

class BitSeq(Dataset):
    def __init__(self, n_samples, seq_len, gap_k):
        self.data = [make_sequence(seq_len, gap_k) for _ in range(n_samples)]
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        x,y = self.data[idx]
        return x.unsqueeze(-1), y  # shapes [T,1], [T]

def make_dataloaders(seq_len, gap_k, batch_size, seed=0):
    seed_all(seed)
    train_ds = BitSeq(2000, seq_len, gap_k)
    val_ds   = BitSeq(400,  seq_len, gap_k)
    train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    val_dl   = DataLoader(val_ds,   batch_size=batch_size)
    return train_dl, val_dl
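
To make the labeling rule concrete, here is a small worked example in plain Python. It is a sketch that mirrors the rule in `make_sequence` on a hand-picked bit list; `xor_labels` is a hypothetical helper and is not used elsewhere in the notebook.

```python
def xor_labels(bits, gap_k):
    """Label at step t is bits[t] XOR bits[t - gap_k]; for t < gap_k the earlier bit is taken as 0."""
    labels = []
    for t, bit in enumerate(bits):
        prev = bits[t - gap_k] if t - gap_k >= 0 else 0
        labels.append(bit ^ prev)
    return labels

bits = [1, 0, 1, 1, 0, 0]
labels = xor_labels(bits, gap_k=2)
print(labels)  # [1, 0, 0, 1, 1, 1]
```

The first `gap_k` labels simply copy the input bits (`[1, 0]` here); only later steps require remembering an earlier bit.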


In [4]:
# ---- Model with LSTM/GRU toggle ----
class SeqModel(nn.Module):
    def __init__(self, input_size=1, hidden_size=64, cell='lstm'):
        super().__init__()
        self.cell_type = cell
        if cell == 'lstm':
            self.rnn = nn.LSTM(input_size, hidden_size, batch_first=True)
        elif cell == 'gru':
            self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)
        else:
            raise ValueError("cell must be 'lstm' or 'gru'")
        self.head = nn.Linear(hidden_size, 2)
    def forward(self, x):
        out, _ = self.rnn(x)            # [B,T,H]
        logits = self.head(out)         # [B,T,2]
        return logits
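
When comparing the two cells at the same `hidden_size`, keep in mind that they do not have the same number of parameters. Assuming PyTorch's single-layer, unidirectional layout (input weights, recurrent weights, and two bias vectors per gate block), an LSTM has four gate blocks and a GRU has three. The arithmetic below is a sketch; `rnn_param_count` is a hypothetical helper.

```python
def rnn_param_count(input_size, hidden_size, n_gates):
    """Parameter count for one recurrent layer with n_gates gate blocks."""
    # Per gate block: input weights + recurrent weights + two bias vectors.
    per_gate = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
    return n_gates * per_gate

lstm_params = rnn_param_count(input_size=1, hidden_size=64, n_gates=4)
gru_params = rnn_param_count(input_size=1, hidden_size=64, n_gates=3)
head_params = 64 * 2 + 2  # nn.Linear(64, 2): weights + biases

print(lstm_params, gru_params, head_params)  # 17152 12864 130
```

You can check these numbers against `sum(p.numel() for p in model.rnn.parameters())` on a `SeqModel` instance. At equal `hidden_size` the GRU has three quarters of the LSTM's recurrent parameters, which is worth keeping in mind when you form your hypothesis.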


In [5]:
loss_fn = nn.CrossEntropyLoss()

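def run_epoch(model, opt, dl, train=True, device='cpu'):
    """One pass over dl. Returns (mean per-token loss, per-token accuracy)."""
    if train: model.train()
    else: model.eval()
    total, correct, total_loss = 0, 0, 0.0
    # Disable gradient tracking during validation to save time and memory.
    with torch.set_grad_enabled(train):
        for xb, yb in dl:
            xb, yb = xb.to(device), yb.to(device)
            if train:
                opt.zero_grad()
            logits = model(xb)
            loss = loss_fn(logits.view(-1, 2), yb.view(-1))
            if train:
                loss.backward(); opt.step()
            # loss is a per-token mean, so weight it by the token count before summing
            total_loss += loss.item() * yb.numel()
            preds = logits.argmax(-1)
            correct += (preds == yb).sum().item()
            total += yb.numel()
    return total_loss/total, correct/total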

def train_once(cell_type, seq_len, gap_k, hidden_size, lr, epochs, batch_size, seed, device='cpu'):
    train_dl, val_dl = make_dataloaders(seq_len, gap_k, batch_size, seed)
    model = SeqModel(hidden_size=hidden_size, cell=cell_type).to(device)
    opt = optim.Adam(model.parameters(), lr=lr)
    tr_hist, va_hist = [], []
    for ep in range(1, epochs+1):
        tr_loss, tr_acc = run_epoch(model, opt, train_dl, True, device)
        va_loss, va_acc = run_epoch(model, opt, val_dl, False, device)
        tr_hist.append((tr_loss, tr_acc)); va_hist.append((va_loss, va_acc))
        print(f"ep {ep:02d} | train acc {tr_acc:.3f} | val acc {va_acc:.3f}")
    return va_hist[-1][1]  # final val accuracy


In [6]:
# Quick demo (optional): one run to show the loop working
print('Demo run with current config ...')
acc = train_once(cell_type, seq_len, gap_k, hidden_size, lr, epochs, batch_size, seed=seeds[0], device=device)
print(f'Final val acc (seed {seeds[0]}): {acc:.3f}')


Demo run with current config ...
ep 01 | train acc 0.598 | val acc 0.598
ep 02 | train acc 0.629 | val acc 0.630
ep 03 | train acc 0.633 | val acc 0.630
ep 04 | train acc 0.636 | val acc 0.639
ep 05 | train acc 0.659 | val acc 0.680
Final val acc (seed 0): 0.680
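
A useful reference point when reading these logs: a predictor that ignores history and simply outputs the current bit is right on every step with `t < gap_k` (where the label equals the current bit) and on about half of the remaining steps. The sketch below computes that expected accuracy. `copy_baseline_accuracy` is a hypothetical helper, not part of the training code, and it assumes uniformly random bits as in `make_sequence`.

```python
def copy_baseline_accuracy(seq_len, gap_k):
    """Expected per-token accuracy of always predicting label_t = bit_t."""
    k = min(gap_k, seq_len)          # steps where the label equals the current bit
    return (k + 0.5 * (seq_len - k)) / seq_len

for T in (10, 30, 60):
    acc = copy_baseline_accuracy(T, gap_k=8)
    print(f"seq_len={T:2d}  baseline={acc:.3f}")
# seq_len=10  baseline=0.900
# seq_len=30  baseline=0.633
# seq_len=60  baseline=0.567
```

With the default config this comes to about 0.633, close to where the demo run sits for epochs 2 to 4. That suggests (but does not prove) that accuracy above this level is the part that requires remembering the bit from `gap_k` steps back. It also means raw accuracy shifts with `seq_len` even if the model learns nothing new, so compare each run against its own baseline.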


In [7]:
import csv
from pathlib import Path

print('\n=== Activity: run three seeds and record results ===')
results = []
for s in seeds:
    print(f"\n--- {cell_type.upper()} | seed {s} ---")
    acc = train_once(cell_type, seq_len, gap_k, hidden_size, lr, epochs, batch_size, seed=s, device=device)
    results.append({'cell': cell_type, 'seed': s, 'val_acc': float(acc)})

print('\nResults table:')
print('cell\tseed\tval_acc')
for r in results:
    print(f"{r['cell']}\t{r['seed']}\t{r['val_acc']:.3f}")

# Save CSV summary (works on Colab or local)
out_dir = '/content' if 'google.colab' in sys.modules else '.'  # Colab loads google.colab into sys.modules
out_path = Path(out_dir)/'lstm_gru_activity_summary.csv'
with open(out_path, 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=['cell','seed','val_acc'])
    w.writeheader(); w.writerows(results)
print(f"\nSaved CSV summary → {out_path}")



=== Activity: run three seeds and record results ===

--- LSTM | seed 0 ---
ep 01 | train acc 0.598 | val acc 0.598
ep 02 | train acc 0.629 | val acc 0.630
ep 03 | train acc 0.633 | val acc 0.630
ep 04 | train acc 0.636 | val acc 0.639
ep 05 | train acc 0.659 | val acc 0.680

--- LSTM | seed 1 ---
ep 01 | train acc 0.552 | val acc 0.629
ep 02 | train acc 0.632 | val acc 0.645
ep 03 | train acc 0.633 | val acc 0.645
ep 04 | train acc 0.638 | val acc 0.649
ep 05 | train acc 0.650 | val acc 0.667

--- LSTM | seed 2 ---
ep 01 | train acc 0.535 | val acc 0.607
ep 02 | train acc 0.623 | val acc 0.636
ep 03 | train acc 0.635 | val acc 0.636
ep 04 | train acc 0.638 | val acc 0.638
ep 05 | train acc 0.658 | val acc 0.672

Results table:
cell	seed	val_acc
lstm	0	0.680
lstm	1	0.667
lstm	2	0.672

Saved CSV summary → lstm_gru_activity_summary.csv
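
With only three seeds per cell, it is easier to compare LSTM and GRU by mean and spread than by eyeballing individual runs. The snippet below is a minimal sketch using the standard library; `summarize` is a hypothetical helper, and the LSTM values are copied from the results table above. Paste in your own GRU numbers to compare.

```python
from statistics import mean, stdev

def summarize(val_accs):
    """Return (mean, sample standard deviation) of per-seed validation accuracies."""
    return mean(val_accs), stdev(val_accs)

# LSTM accuracies from the results table above (seeds 0, 1, 2).
lstm_accs = [0.680, 0.667, 0.672]
lstm_mean, lstm_std = summarize(lstm_accs)
print(f"lstm: mean={lstm_mean:.3f} std={lstm_std:.3f}")
```

If the gap between the two cells' means is smaller than either standard deviation, treat the difference as inconclusive rather than as evidence for one cell.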


### Submit here

**Paste your table here and write two sentences:**
1) What changed when you modified the one parameter you selected?
2) Which cell (LSTM or GRU) handled the change better? Offer a short hypothesis why.

*If you finish early (stretch):*
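- Try `bidirectional=True` in the RNN init (the head then needs `2 * hidden_size` inputs). Explain whether it helps here, given that each label depends only on the current and earlier bits, and why it is invalid for real-time streaming.
- Switch to **many-to-one** prediction by using only `logits[:, -1]` with target `yb[:, -1]`, so the model predicts just the final label.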
- Replace Adam with SGD+momentum and compare convergence speed for the same wall-clock budget.


### Quick checks for understanding
1) In an LSTM, which gate primarily controls **exposure** of cell state to the hidden state at time *t*?
2) If you double `seq_len` (holding `gap_k` fixed), what failure pattern do you expect in a vanilla RNN vs an LSTM/GRU? Why?
3) With `gap_k=15` and `hidden_size=16`, which model (LSTM/GRU) is more likely to maintain >70% accuracy? Defend your pick.
4) In the training code, why do we use `logits.view(-1, 2)` and `yb.view(-1)` before computing loss?
