# Mini-Batch Stochastic Gradient Descent (SGD) in PyTorch

This notebook demonstrates **mini-batch SGD** using a simple logistic-regression model on a toy dataset.

## 🔹 Learning Goals
- Understand the difference between **vanilla SGD** (one sample per parameter update) and **mini-batch SGD** (a small batch of samples per update).
- Train with PyTorch’s **`DataLoader`** (primary) and see a **manual mini-batch** implementation (appendix).
- Use tidy helpers: `train_one_epoch(...)` and `evaluate(...)`.
- Track **loss** and **accuracy** succinctly.
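To make the first goal concrete, here is a minimal sketch that counts parameter updates per epoch under both schemes. It assumes "vanilla SGD" means a batch size of 1, and it uses illustrative names (`X_demo`, `ds_demo`, `vanilla_loader`, `minibatch_loader`) that are not part of the notebook's own cells.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Illustrative data with the same shape as the notebook's toy set (100 points, 2 features)
X_demo = torch.randn(100, 2)
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).float().unsqueeze(1)
ds_demo = TensorDataset(X_demo, y_demo)

# Assumed definition of "vanilla" SGD: one sample per update
vanilla_loader = DataLoader(ds_demo, batch_size=1, shuffle=True)
# Mini-batch SGD: up to 16 samples per update
minibatch_loader = DataLoader(ds_demo, batch_size=16, shuffle=True)

# len(loader) is the number of batches, i.e. the number of updates per epoch
updates_vanilla = len(vanilla_loader)
updates_minibatch = len(minibatch_loader)
print(updates_vanilla, updates_minibatch)  # 100 7
```

With 100 samples, a batch size of 16 gives 7 updates per epoch instead of 100, and each update uses a gradient averaged over up to 16 samples rather than a single one.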


In [2]:
import math
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hyperparameters
lr       = 0.1
epochs   = 100
batch_sz = 16


In [9]:
# 2D toy data
N, D = 100, 2
X = torch.randn(N, D)

y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)

# Dataset / DataLoader
ds     = TensorDataset(X, y)
loader = DataLoader(ds, batch_size=batch_sz, shuffle=True, drop_last=False)


In [10]:
# Logistic regression
model     = nn.Linear(D, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=lr)


In [5]:
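def train_one_epoch(loader, model, criterion, optimizer):
    """Run one pass over `loader`; return the mean per-sample training loss."""
    model.train()
    total, n = 0.0, 0
    for xb, yb in loader:
        optimizer.zero_grad()            # clear gradients from the previous step
        loss = criterion(model(xb), yb)  # mean BCE over this mini-batch
        loss.backward()
        optimizer.step()                 # one SGD update per mini-batch
        # Weight by batch size so a smaller final batch is averaged correctly
        total += loss.item() * xb.size(0)
        n += xb.size(0)
    return total / n

@torch.no_grad()
def evaluate(X, y, model):
    """Return accuracy in [0, 1] using a 0.5 probability threshold."""
    model.eval()
    preds = (torch.sigmoid(model(X)) >= 0.5).float()
    return (preds == y).float().mean().item()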


## 3A. Mini-Batch SGD with `DataLoader` (Primary)

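We train using PyTorch’s built-in **`DataLoader`**, which handles:
- reshuffling the sample order each epoch (because `shuffle=True`)
- collating samples into mini-batches of `batch_sz` (the last batch is smaller, since `drop_last=False`)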
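The two behaviours listed above can be checked directly. The sketch below is illustrative only: it attaches a row index to each sample (the `idx_demo` tensor is an addition for inspection, not something the notebook does) so that one epoch's batches can be examined.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

X_demo = torch.randn(100, 2)
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).float().unsqueeze(1)
idx_demo = torch.arange(100)  # row indices, carried along only for inspection

loader_demo = DataLoader(
    TensorDataset(X_demo, y_demo, idx_demo),
    batch_size=16, shuffle=True, drop_last=False,
)

batch_sizes = []
seen = []
for xb, yb, ib in loader_demo:  # one full epoch
    batch_sizes.append(xb.size(0))
    seen.append(ib)
seen = torch.cat(seen)

print(batch_sizes)               # [16, 16, 16, 16, 16, 16, 4]
print(seen[:8].tolist())         # a shuffled order, not 0, 1, 2, ...
print(sorted(seen.tolist()) == list(range(100)))  # every sample appears exactly once
```

Iterating over the loader a second time draws a fresh permutation, which is what "shuffling each epoch" refers to.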

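**SGD update** (with parameters $\theta$, learning rate $\eta$ = `lr`, and the loss $\mathcal{L}$ averaged over the current mini-batch):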
$$
\theta \leftarrow \theta - \eta \, \nabla_\theta \mathcal{L}(\theta;\text{batch})
$$
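As a sanity check on the update rule, the following sketch applies it by hand and compares the result with one `torch.optim.SGD` step. It assumes plain SGD (no momentum or weight decay, matching the optimizer settings used in this notebook); the names `model_a`, `model_b`, and `lr_demo` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lr_demo = 0.1

# One illustrative mini-batch of 16 points
xb = torch.randn(16, 2)
yb = (xb[:, 0] + xb[:, 1] > 0).float().unsqueeze(1)

model_a = nn.Linear(2, 1)
model_b = nn.Linear(2, 1)
model_b.load_state_dict(model_a.state_dict())  # identical starting weights

criterion_demo = nn.BCEWithLogitsLoss()

# (a) One step with the built-in optimizer
opt = torch.optim.SGD(model_a.parameters(), lr=lr_demo)
opt.zero_grad()
criterion_demo(model_a(xb), yb).backward()
opt.step()

# (b) The same step written out: theta <- theta - eta * grad
criterion_demo(model_b(xb), yb).backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= lr_demo * p.grad

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

Both models end up with matching weights and bias, so `optimizer.step()` in the training loop can be read as exactly this subtraction applied to every parameter.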


In [6]:
# Re-init model/optim for a clean run
model     = nn.Linear(D, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

for epoch in range(1, epochs + 1):
    avg_loss = train_one_epoch(loader, model, criterion, optimizer)
    if epoch % 10 == 0:
        acc = evaluate(X, y, model) * 100
        print(f"Epoch {epoch:3d} | loss={avg_loss:.4f} | acc={acc:.2f}%")


Epoch  10 | loss=0.3581 | acc=91.00%
Epoch  20 | loss=0.2695 | acc=95.00%
Epoch  30 | loss=0.2317 | acc=96.00%
Epoch  40 | loss=0.2086 | acc=96.00%
Epoch  50 | loss=0.1925 | acc=98.00%
Epoch  60 | loss=0.1806 | acc=98.00%
Epoch  70 | loss=0.1712 | acc=98.00%
Epoch  80 | loss=0.1636 | acc=98.00%
Epoch  90 | loss=0.1570 | acc=98.00%
Epoch 100 | loss=0.1515 | acc=98.00%


## 3B. Manual Mini-Batch (No `DataLoader`)

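For learning purposes only, this section re-implements mini-batching by hand: draw a fresh random permutation of the indices each epoch, then slice it into consecutive batches.
This mirrors what `DataLoader` does with `shuffle=True`, but is more verbose.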
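Before the full training loop, here is the index bookkeeping on its own, as a small sketch with illustrative names (`N_demo`, `batch_demo`, `batches`). It shows that slicing a permutation yields batches that cover every sample exactly once per epoch.

```python
import torch

torch.manual_seed(0)
N_demo, batch_demo = 100, 16

perm = torch.randperm(N_demo)  # a new shuffled order for this epoch

# Consecutive slices of the permutation are the mini-batch index sets
batches = [perm[i:i + batch_demo] for i in range(0, N_demo, batch_demo)]

sizes = [len(b) for b in batches]
covered = torch.cat(batches).sort().values

print(sizes)                                      # [16, 16, 16, 16, 16, 16, 4]
print(torch.equal(covered, torch.arange(N_demo)))  # True
```

Slicing past the end of a tensor simply returns the remaining elements, which is why the final batch of 4 needs no special handling.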


In [8]:
# Manual mini-batching (no DataLoader), with optimizer-based updates (still PyTorch SGD)

# Fresh model to compare fairly
model2     = nn.Linear(D, 1)
criterion2 = nn.BCEWithLogitsLoss()
optimizer2 = torch.optim.SGD(model2.parameters(), lr=lr)

num_batches = math.ceil(N / batch_sz)  # 7 batches per epoch: six of 16 and one of 4 (informational; not used below)

for epoch in range(1, 21):  # shorter appendix run
    perm = torch.randperm(N)
    total_loss = 0.0

    for i in range(0, N, batch_sz):
        idx = perm[i:i+batch_sz]
        xb, yb = X[idx], y[idx]

        optimizer2.zero_grad()
        logits = model2(xb)
        loss   = criterion2(logits, yb)
        loss.backward()
        optimizer2.step()

        total_loss += loss.item() * xb.size(0)

    if epoch % 5 == 0 or epoch == 1:
        avg_loss = total_loss / N
        acc = evaluate(X, y, model2) * 100
        print(f"[Manual] Epoch {epoch:02d} | loss={avg_loss:.4f} | acc={acc:.2f}%")


[Manual] Epoch 01 | loss=0.8691 | acc=44.00%
[Manual] Epoch 05 | loss=0.4847 | acc=90.00%
[Manual] Epoch 10 | loss=0.3526 | acc=96.00%
[Manual] Epoch 15 | loss=0.2987 | acc=94.00%
[Manual] Epoch 20 | loss=0.2672 | acc=94.00%


## ✅ Results & Notes

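- Both **3A (DataLoader)** and **3B (manual)** converge quickly on this toy dataset: 3A reaches 91% accuracy by epoch 10 and 98% by epoch 50, while 3B reaches 90% by epoch 5 and stays between 94% and 96% over epochs 10 to 20.
- Accuracy is measured on the same 100 points used for training, so these figures are training accuracy, not a held-out estimate.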
- Prefer **3A** in real projects: it’s cleaner, faster to write, and less error-prone.
- This notebook shows the **mechanics of mini-batch SGD** and a tidy training pattern using helpers.
