👁️ Memory time. Let's build a brain that **remembers**, not just computes.

---

# 🧪 `08_lab_memory_augmented_net_tiny_tasks.ipynb`  
### 📁 `04_advanced_architectures`  
> Implement a **tiny Neural Turing Machine (NTM)** or **Memory-Augmented Neural Network** to solve classic tasks like **copy**, **repeat**, or **pattern recall**.  
Understand how **external memory + attention** creates **differentiable memory systems**.
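
Here is why soft attention keeps memory access differentiable. This is a minimal standalone sketch with toy sizes, not the lab's model. A read is a weighted average of memory rows, so gradients reach both the memory contents and the attention logits that decide *where* to read.

```python
import torch

torch.manual_seed(0)
N, W = 4, 3                                     # toy memory: 4 slots of 3 numbers each
memory = torch.randn(N, W, requires_grad=True)  # external memory matrix M, shape [N, W]
logits = torch.zeros(N, requires_grad=True)     # unnormalized attention scores, one per slot

weights = torch.softmax(logits, dim=0)          # soft pointer: non-negative, sums to 1
read = weights @ memory                         # read vector r = sum_i w_i * M[i], shape [W]

read.sum().backward()                           # stand-in for any downstream loss
print(weights)      # all 0.25: equal logits give uniform attention
print(memory.grad)  # all 0.25: each slot's gradient is scaled by its attention weight
print(logits.grad)  # non-zero in general, so the model can learn where to read
```

Because nothing in this path is a hard index lookup, the whole read can be trained with ordinary backpropagation.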

---

## 🎯 Learning Goals

- Build a **differentiable memory controller**  
- Learn how NTMs solve **copy/repeat tasks**  
- Train on **simple toy tasks** (low memory/GPU needs)  
- Visualize **read/write attention over time**

---

## 💻 Runtime Design

| Spec         | Setting                |
|--------------|------------------------|
| Dataset      | Synthetic copy task    |
| Device       | ✅ Colab / CPU / GPU   |
| Memory       | ✅ < 1 GB              |
| Duration     | 🏃 Fast (1–2 mins)     |
| Architecture | RNN + external memory  |

---

## 🧠 Section 1: Setup & Imports

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
import numpy as np
```

---

## 📄 Section 2: Create Toy Copy Task Dataset

```python
class CopyDataset(Dataset):
    """Copy task: read a random sequence, then reproduce it after a delimiter.

    Token roles (vocab size 10): 0 = blank, 1-8 = data, 9 = delimiter.
    """
    def __init__(self, seq_len=5, num_samples=1000):
        self.seq_len = seq_len
        # Data tokens come from 1..8 so they never collide with the blank token 0
        self.samples = [torch.randint(1, 9, (seq_len,)) for _ in range(num_samples)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        seq = self.samples[idx]
        # Input = seq + delimiter (9) + blanks, e.g. [3 1 4 2 5 | 9 | 0 0 0 0 0]
        inp = torch.cat([seq, torch.tensor([9]), torch.zeros(len(seq), dtype=torch.long)])
        # Target = blanks while reading, then the original seq in the recall phase
        target = torch.cat([torch.zeros(len(seq) + 1, dtype=torch.long), seq])
        return inp, target

dataset = CopyDataset()
loader = DataLoader(dataset, batch_size=1, shuffle=True)
```
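
The intro also mentions the **repeat** task. The same token layout extends to it naturally: show the sequence once, then ask for it several times. This is a hypothetical sketch. The helper name `make_repeat_example` and the fixed repeat count are assumptions and not part of this lab. Because the repeat count is fixed here, the model is never told how many copies to produce.

```python
import torch

BLANK, DELIM = 0, 9  # assumption: same token roles as CopyDataset (0 = blank, 9 = delimiter)

def make_repeat_example(seq, repeats=2):
    """Hypothetical repeat-copy sample: read `seq` once, then emit it `repeats` times."""
    L = len(seq)
    out_len = L * repeats
    # Input: sequence, delimiter, then blanks for every output step
    inp = torch.cat([seq, torch.tensor([DELIM]), torch.full((out_len,), BLANK, dtype=torch.long)])
    # Target: blanks while reading, then the sequence repeated back to back
    tgt = torch.cat([torch.full((L + 1,), BLANK, dtype=torch.long), seq.repeat(repeats)])
    return inp, tgt

seq = torch.tensor([3, 1, 4])
inp, tgt = make_repeat_example(seq, repeats=2)
print(inp.tolist())  # [3, 1, 4, 9, 0, 0, 0, 0, 0, 0]
print(tgt.tolist())  # [0, 0, 0, 0, 3, 1, 4, 3, 1, 4]
```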

---

## 🔧 Section 3: Define Memory-Augmented Model (Simplified NTM)

```python
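class TinyNTM(nn.Module):
    """Simplified NTM: a GRU controller with one read head and one write head over
    an external memory of N slots x W numbers. Each head picks slots with a plain
    softmax over controller-state logits (no content lookup, interpolation, shifts,
    or sharpening as in the full NTM)."""

    def __init__(self, vocab_size, hidden_size=32, memory_size=16, memory_width=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size + memory_width, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size + memory_width, vocab_size)

        # Initial memory contents. A registered buffer moves with model.to(device);
        # a plain tensor attribute would stay on the CPU and break GPU runs.
        self.register_buffer("memory_init", 0.01 * torch.randn(1, memory_size, memory_width))

        # Read/write heads: controller state -> attention logits over memory slots
        self.read_head = nn.Linear(hidden_size, memory_size)
        self.write_head = nn.Linear(hidden_size, memory_size)
        self.erase_head = nn.Linear(hidden_size, memory_width)  # erase vector e in (0, 1)
        self.add_head = nn.Linear(hidden_size, memory_width)    # add vector a

    def forward(self, x):
        B, T = x.size()
        x = self.embed(x)                                    # [B, T, H]

        memory = self.memory_init.repeat(B, 1, 1)            # [B, N, W], fresh per sequence
        read = x.new_zeros(B, 1, memory.size(2))             # initial read vector [B, 1, W]
        h = x.new_zeros(1, B, self.rnn.hidden_size)          # initial GRU state [1, B, H]

        outputs = []
        for t in range(T):
            xt = x[:, t].unsqueeze(1)                         # [B, 1, H]
            rnn_in = torch.cat([xt, read], dim=2)
            out, h = self.rnn(rnn_in, h)                      # out: [B, 1, H]

            # Write head: erase, then add, at softly chosen slots
            w = torch.softmax(self.write_head(out), dim=2).transpose(1, 2)  # [B, N, 1]
            erase = torch.sigmoid(self.erase_head(out))       # [B, 1, W]
            add = torch.tanh(self.add_head(out))              # [B, 1, W]
            memory = memory * (1 - w * erase) + w * add       # [B, N, W]

            # Read head: attention weights over the updated memory
            attn = torch.softmax(self.read_head(out), dim=2)  # [B, 1, N]
            read = torch.bmm(attn, memory)                    # [B, 1, W]

            # Output from controller state plus what was just read
            logits = self.fc(torch.cat([out, read], dim=2))   # [B, 1, V]
            outputs.append(logits)

        return torch.cat(outputs, dim=1)  # [B, T, V]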
```
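
For reference, the original NTM formulation defines reading and writing with a weighting $w_t(i)$ over the $N$ memory rows $\mathbf{M}_t(i)$, and each head has its own weighting. A read is a weighted sum of rows. A write first erases with $\mathbf{e}_t \in [0,1]^W$ and then adds $\mathbf{a}_t$, where $\odot$ is elementwise multiplication. The `read_head` + `torch.bmm` lines above correspond to the first equation.

$$
\mathbf{r}_t = \sum_{i=1}^{N} w_t(i)\,\mathbf{M}_t(i), \qquad w_t(i) \ge 0, \quad \sum_{i=1}^{N} w_t(i) = 1
$$

$$
\tilde{\mathbf{M}}_t(i) = \mathbf{M}_{t-1}(i) \odot \big[\mathbf{1} - w_t(i)\,\mathbf{e}_t\big], \qquad
\mathbf{M}_t(i) = \tilde{\mathbf{M}}_t(i) + w_t(i)\,\mathbf{a}_t
$$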

---

## 🏋️ Section 4: Train on Copy Task

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
vocab_size = 10  # token ids 0-9 (9 is the delimiter)
model = TinyNTM(vocab_size=vocab_size).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    model.train()
    total_loss = 0.0
    for inp, tgt in loader:
        inp, tgt = inp.to(device), tgt.to(device)
        out = model(inp)                                                 # [B, T, V]
        loss = criterion(out.reshape(-1, vocab_size), tgt.reshape(-1))  # flatten time into batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}: avg loss = {total_loss / len(loader):.4f}")
```
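
The first `seq_len + 1` target positions are constant blanks, so the average loss can look good while the recall phase is still wrong. A small helper that scores only the recall positions gives a clearer signal. This is a sketch under the assumption that the CopyDataset layout holds, with the sequence to reproduce in the last `seq_len` positions. The name `copy_accuracy` is hypothetical.

```python
import torch
import torch.nn.functional as F

def copy_accuracy(logits, target, seq_len):
    """Token accuracy over the recall phase only.

    Assumes the CopyDataset layout: the last `seq_len` positions hold the sequence
    to reproduce; earlier (blank/delimiter) positions are ignored.
    logits: [B, T, V] scores, target: [B, T] token ids.
    """
    preds = logits.argmax(dim=-1)[:, -seq_len:]  # predicted ids in the recall phase
    return (preds == target[:, -seq_len:]).float().mean().item()

# Self-check on fake logits (hypothetical values, not model output)
target = torch.tensor([[0, 0, 0, 0, 0, 0, 3, 1, 4, 2, 5]])
perfect = F.one_hot(target, num_classes=10).float()  # [1, 11, 10]
print(copy_accuracy(perfect, target, seq_len=5))     # 1.0
```

Inside the training loop, `copy_accuracy(out, tgt, seq_len=5)` could be logged next to the loss.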

---

## 👀 Section 5: Visualize Prediction

```python
model.eval()
inp, tgt = dataset[0]
with torch.no_grad():  # inference only, no autograd graph needed
    preds = model(inp.unsqueeze(0).to(device)).argmax(dim=-1).squeeze(0).cpu()

print("Input:     ", inp.tolist())
print("Target:    ", tgt.tolist())
print("Predicted: ", preds.tolist())
```
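
To see *where* a head attends over time, collect its per-step softmax weights (one row of N slot weights per time step, e.g. from `attn` inside the forward loop) and plot them as a heatmap. The helper below is a hypothetical sketch. It is demonstrated on an idealized, hand-made pattern, **not** on real model output.

```python
import matplotlib.pyplot as plt
import torch

def plot_attention(weights, title="Read attention"):
    """Draw a [T, N] attention matrix: rows = time steps, columns = memory slots."""
    weights = torch.as_tensor(weights).detach().cpu().numpy()
    fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(weights, aspect="auto", cmap="viridis", vmin=0.0, vmax=1.0)
    ax.set_xlabel("Memory slot")
    ax.set_ylabel("Time step")
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
    return fig, ax

# Hypothetical, idealized pattern (NOT model output): a head that moves one slot per step
T, N = 11, 16  # 11 steps = 5 data + 1 delimiter + 5 recall; 16 slots as in TinyNTM
ideal = torch.zeros(T, N)
ideal[torch.arange(T), torch.arange(T) % N] = 1.0
fig, ax = plot_attention(ideal, title="Idealized attention (illustration only)")
plt.show()
```

With the lab's model, one option is to append each step's `attn` (shape [B, 1, N]) to a list inside `forward` (a hypothetical `attn_list`), then pass `torch.cat(attn_list, dim=1)[0]`, shape [T, N], to `plot_attention`.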

---

## ✅ Wrap-Up Recap

| Feature                         | ✅ |
|----------------------------------|----|
| External memory used             | ✅ |
| Differentiable attention access  | ✅ |
| Copy task solved via memory      | ✅ |
| CPU/Colab safe, fast training    | ✅ |
| Same recipe extends to repeat/recall tasks | ✅ |

---

## 🧠 What You Learned

- Memory-Augmented networks **extend RNNs** with read/write capacity  
- Attention = soft pointer to memory slots  
- Even a simple NTM can learn small algorithms such as **copying sequences**, **repeat-copying**, and **pattern recall**  
- This read/write idea is the **foundation of later differentiable memory models**
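
In this lab, the "soft pointer" is picked directly: each head emits logits over slots from the controller state. The full NTM also supports **content-based addressing**. There, the controller emits a key vector $k$ and a key strength $\beta$, and the weights are a softmax of $\beta$ times the cosine similarity between $k$ and each memory row. Below is a minimal sketch of that idea. The function name and shapes are our own assumptions, and TinyNTM does not use it.

```python
import torch
import torch.nn.functional as F

def content_addressing(memory, key, beta):
    """NTM-style content weights: softmax over beta * cosine(key, each memory row).

    memory: [B, N, W], key: [B, W], beta: [B, 1] (key strength, >= 0).
    Returns weights of shape [B, N] that sum to 1 over the N slots.
    """
    sim = F.cosine_similarity(memory, key.unsqueeze(1), dim=-1)  # [B, N], broadcast over slots
    return torch.softmax(beta * sim, dim=-1)

memory = torch.tensor([[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]])  # B=1, N=3 slots, W=2
key = torch.tensor([[0.9, 0.1]])                                # most similar to slot 0
for b in (1.0, 20.0):
    w = content_addressing(memory, key, torch.tensor([[b]]))
    print(b, w.squeeze().tolist())  # larger beta -> sharper focus on slot 0
```

A larger `beta` turns the soft pointer into something close to a hard lookup, while keeping it differentiable.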

---

Next lab up is `09_lab_diffusion_model_toy_image_gen.ipynb`: we build a **tiny denoising diffusion pipeline** and visualize how it generates MNIST digits from pure noise.