# Training & Validation Guide

### 1) Overview: why PyTorch uses explicit loops

TensorFlow's Keras API offers `model.fit`, a convenient high-level call that hides the training details. PyTorch gives you lower-level control by design: you write the training loop explicitly. This is powerful for debugging, custom losses, per-batch logic, dynamic graphs, complex regularization, and research experiments.

Writing the loop yourself forces you to understand the lifecycle of data, model, gradient computation, and optimization. It takes more code at first, but the result is explicit and flexible.

### 2) Quick PyTorch primer

Tensors: `torch.Tensor` is the central object. Move to GPU with `.to(device)` or `.cuda()`.

Autograd: When you perform operations on tensors with `requires_grad=True`, PyTorch builds a dynamic computation graph. Calling `.backward()` computes gradients and stores them in `.grad`.
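
The autograd behavior described above can be seen on a single scalar. This is a minimal sketch; the function `y = x**2 + 2*x` and the value `3.0` are arbitrary choices for illustration.

```python
import torch

# A scalar leaf tensor that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

# Forward computation: the graph is recorded as the operations run.
y = x ** 2 + 2 * x

# Backward pass: computes dy/dx = 2*x + 2 and stores it in x.grad.
y.backward()

print(x.grad)  # tensor(8.)
```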

Modules: `torch.nn.Module` is a base class for models. Use `nn.Sequential`, or define class `MyModel(nn.Module)` and implement `forward(self, x)`.
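
As a rough illustration of the two options, the sketch below builds the same small network once as a subclass and once with `nn.Sequential`. The class name `TinyMLP` and the layer sizes are hypothetical, chosen only to keep the example short.

```python
import torch
import torch.nn as nn


class TinyMLP(nn.Module):  # hypothetical example model
    def __init__(self, in_features=8, hidden=16, out_features=2):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x):
        # forward() defines how an input batch flows through the layers.
        return self.fc2(torch.relu(self.fc1(x)))


subclassed = TinyMLP()
sequential = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

x = torch.randn(5, 8)  # batch of 5 samples, 8 features each
print(subclassed(x).shape)  # torch.Size([5, 2])
print(sequential(x).shape)  # torch.Size([5, 2])

# Both expose their weights through .parameters(), which optimizers consume.
n_params = sum(p.numel() for p in subclassed.parameters())
print(n_params)  # 178
```

Subclassing is usually the better fit once `forward` needs branching, skip connections, or multiple inputs; `nn.Sequential` is enough for a plain stack of layers.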

Optimizers: These come from `torch.optim`. An optimizer updates the parameters you hand it, typically everything returned by `model.parameters()`.

DataLoader: Wraps a Dataset to provide minibatches, shuffling, and parallel data loading. Use `torch.utils.data.DataLoader`.
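
A minimal sketch of batching with `DataLoader`, assuming synthetic data wrapped in a `TensorDataset` (the tensor shapes and batch size are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data (assumption): 10 samples with 3 features each, plus binary labels.
features = torch.randn(10, 3)
labels = torch.randint(0, 2, (10,))

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

# Iterating yields (inputs, targets) minibatches; the last batch may be smaller.
batch_shapes = [tuple(x.shape) for x, _ in loader]
print(batch_shapes)  # [(4, 3), (4, 3), (2, 3)]
```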

Device management: `device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')`.

### Steps needed to train a model

1️⃣ Get a batch of data and move it to the same device (CPU or GPU) as the model

`inputs, targets = inputs.to(device), targets.to(device)`

2️⃣ Zero out old gradients

`optimizer.zero_grad()`

3️⃣ Forward pass (compute predictions)

`outputs = model(inputs)`

4️⃣ Compute the loss

`loss = criterion(outputs, targets)`

5️⃣ Backward pass (compute gradients)

`loss.backward()`  

6️⃣ Update weights

`optimizer.step()`
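
Put together, the six steps above can be sketched as one small loop. This is an illustrative toy, not the CIFAR10 example: the synthetic data (`y = 2x + 1`), the single `nn.Linear` layer, and the hyperparameters are all assumptions chosen so the snippet runs in a second on CPU.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic regression data (assumption): targets follow y = 2x + 1 exactly.
inputs = torch.linspace(-1, 1, 64).unsqueeze(1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1).to(device)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    x, y = inputs.to(device), targets.to(device)  # 1. data on the model's device
    optimizer.zero_grad()                         # 2. zero out old gradients
    outputs = model(x)                            # 3. forward pass
    loss = criterion(outputs, y)                  # 4. compute the loss
    loss.backward()                               # 5. backward pass
    optimizer.step()                              # 6. update weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

Here the whole dataset is used as a single batch for brevity; with a `DataLoader`, the same six lines sit inside `for inputs, targets in loader:`.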


In [None]:
# Minimal but complete PyTorch training + validation example (CIFAR10)
import os
import random
import numpy as np
from pathlib import Path
from tqdm.notebook import tqdm


import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms


# ------------------- Helpers -------------------


def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


set_seed(42)

In [None]:
# ------------------- Device -------------------


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)


# ------------------- Data -------------------


transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])


transform_val = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])


train_ds = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
val_ds = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_val)


train_loader = DataLoader(train_ds, batch_size=128, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size=256, shuffle=False, num_workers=4, pin_memory=True)

In [None]:
# ------------------- Model -------------------


# Simple model — use torchvision.models for more serious work
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_classes),
        )


    def forward(self, x):
        return self.net(x)


model = SimpleCNN(num_classes=10).to(device)

In [None]:
# ------------------- Loss, Optimizer, Scheduler -------------------


criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

In [None]:
from collections import defaultdict


def train_one_epoch(model, dataloader, criterion, optimizer, device, scaler=None, grad_clip=None):
    model.train()
    running_loss = 0.0
    running_correct = 0
    total = 0

    for inputs, targets in tqdm(dataloader, desc='train', leave=False):
        inputs = inputs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)


        optimizer.zero_grad()


        if scaler is not None:
            # mixed-precision forward pass (torch.autocast supersedes the deprecated torch.cuda.amp.autocast)
            with torch.autocast(device_type='cuda'):
                outputs = model(inputs)
                loss = criterion(outputs, targets)
            scaler.scale(loss).backward()
            if grad_clip is not None:
                scaler.unscale_(optimizer)
                torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
            scaler.step(optimizer)
            scaler.update()
        else:
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            if grad_clip is not None:
                torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
            optimizer.step()

        running_loss += float(loss.item()) * inputs.size(0)
        preds = outputs.argmax(dim=1)
        running_correct += (preds == targets).sum().item()
        total += inputs.size(0)


    epoch_loss = running_loss / total
    epoch_acc = running_correct / total
    return epoch_loss, epoch_acc


def validate(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    running_correct = 0
    total = 0


    with torch.no_grad():
        for inputs, targets in tqdm(dataloader, desc='val', leave=False):
            inputs = inputs.to(device, non_blocking=True)
            targets = targets.to(device, non_blocking=True)


            outputs = model(inputs)
            loss = criterion(outputs, targets)


            running_loss += float(loss.item()) * inputs.size(0)
            preds = outputs.argmax(dim=1)
            running_correct += (preds == targets).sum().item()
            total += inputs.size(0)


    epoch_loss = running_loss / total
    epoch_acc = running_correct / total
    return epoch_loss, epoch_acc

In [None]:
# ------------------- Training loop -------------------


num_epochs = 3
scaler = torch.cuda.amp.GradScaler() if torch.cuda.is_available() else None
best_val_acc = 0.0
history = defaultdict(list)


for epoch in range(num_epochs):
    print(f"Epoch {epoch+1}/{num_epochs}")
    train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device, scaler=scaler, grad_clip=5.0)
    val_loss, val_acc = validate(model, val_loader, criterion, device)


    scheduler.step()


    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)


    print(f" train_loss: {train_loss:.4f} train_acc: {train_acc:.4f}")
    print(f" val_loss: {val_loss:.4f} val_acc: {val_acc:.4f}")


    # Save best model
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save({'model_state_dict': model.state_dict(), 'optimizer_state_dict': optimizer.state_dict(), 'epoch': epoch}, 'best_checkpoint.pth')
        print(' Saved best checkpoint')


print('Training finished')
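
A checkpoint saved in the dictionary layout used above can be restored roughly as follows. This is a sketch under stated assumptions: a small `nn.Linear` stands in for `SimpleCNN`, and the file is written to a temporary directory so the snippet is self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in model and optimizer (assumption; substitute SimpleCNN and its optimizer).
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Save using the same keys as the training loop.
ckpt_path = os.path.join(tempfile.mkdtemp(), 'best_checkpoint.pth')
torch.save(
    {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'epoch': 2,
    },
    ckpt_path,
)

# Restore into freshly constructed objects with the same architecture.
restored_model = nn.Linear(4, 2)
restored_optimizer = optim.SGD(restored_model.parameters(), lr=0.1, momentum=0.9)

checkpoint = torch.load(ckpt_path, map_location='cpu')
restored_model.load_state_dict(checkpoint['model_state_dict'])
restored_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1  # resume from the epoch after the saved one

print('resuming at epoch', start_epoch)
```

Saving the optimizer state matters when resuming training, because SGD with momentum keeps per-parameter buffers; for inference only, the model state dict is enough.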

### 3) Detailed training loop explained

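Core steps of a training epoch. `model.train()` is called once before iterating over the batches; the remaining steps repeat for every batch:

`model.train()`: enable training mode (affects dropout, batchnorm, etc.)

Move inputs and targets to the device: `inputs = inputs.to(device)`, `targets = targets.to(device)`

`optimizer.zero_grad()`: clear gradients from the previous batch

Forward: `outputs = model(inputs)`

Compute loss: `loss = criterion(outputs, targets)`

Backward: `loss.backward()` (or `scaler.scale(loss).backward()` for AMP)

Optionally clip gradients (with AMP, call `scaler.unscale_(optimizer)` first so the threshold applies to the unscaled gradients)

`optimizer.step()`: update parameters (or `scaler.step(optimizer)` followed by `scaler.update()` for AMP)

Update any metrics and accumulate the loss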

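Why `zero_grad()`? PyTorch accumulates gradients by default: each call to `backward()` adds to `.grad` instead of overwriting it, so gradients left over from the previous batch would otherwise leak into the current update.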
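
The accumulation is easy to observe on a single parameter. A minimal sketch, using an arbitrary function `3 * w` whose gradient is always 3:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3.
(3 * w).backward()
first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added to the old one.
(3 * w).backward()
accumulated = w.grad.item()

# Reset the gradient, then run backward again: only the fresh gradient remains.
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```

`optimizer.zero_grad()` performs this reset for every parameter the optimizer manages.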

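Why `model.train()` vs `model.eval()`? Certain layers like dropout and batchnorm behave differently in the two modes. `train()` enables training behavior: dropout randomly zeroes activations, and batchnorm normalizes with the current batch statistics while updating its running averages. `eval()` switches to inference behavior: dropout is disabled, and batchnorm uses its stored running statistics.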
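
The difference between the two modes shows up directly on a dropout layer. A small sketch, assuming `p=0.5` and an all-ones input so the effect is easy to read:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)  # entries are zeroed at random; survivors are scaled by 1 / (1 - p) = 2

layer.eval()
eval_out = layer(x)   # dropout is a no-op in evaluation mode

print(train_out.unique())        # values are drawn from {0., 2.}
print(torch.equal(eval_out, x))  # True
```

Forgetting `model.eval()` during validation therefore makes the reported metrics noisy and pessimistic for any model that contains dropout.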

### 4) Validation loop, metrics, and torch.no_grad()

Validation should neither modify model parameters nor track gradients. Wrap the loop in `with torch.no_grad():` to disable gradient tracking (this saves memory and compute). Also call `model.eval()` to switch to evaluation mode.
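
The effect of `torch.no_grad()` can be checked on the output tensor. A minimal sketch with a stand-in `nn.Linear` model and random input (both assumptions for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
x = torch.randn(3, 4)

model.eval()
with torch.no_grad():
    out_no_grad = model(x)  # no graph is recorded inside this block

out_tracked = model(x)      # outside the block, autograd tracks the computation again

print(out_no_grad.requires_grad)  # False
print(out_tracked.requires_grad)  # True
```

Note that `model.eval()` and `torch.no_grad()` are independent: the first changes layer behavior, the second turns off gradient tracking, and validation normally wants both.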

Compute metrics (accuracy, precision/recall, F1) as required. For multi-class classification, taking the argmax over the logits gives the predicted class, and accuracy is the fraction of predictions that match the targets.

Example metrics: running loss, accuracy, confusion matrix; for regression: MAE, MSE.
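
As one possible way to compute these classification metrics with plain tensor operations, the sketch below derives accuracy, a confusion matrix, and per-class precision and recall. The logits and targets are made-up values for a 3-class problem, and the `clamp(min=1)` guard against empty rows or columns is a design choice of this example.

```python
import torch

num_classes = 3

# Hypothetical logits for 6 samples and their true labels.
logits = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.1, 3.0, 0.2],
    [0.3, 0.9, 0.4],
    [0.1, 0.2, 2.2],
    [1.1, 0.3, 0.6],
])
targets = torch.tensor([0, 0, 1, 1, 2, 2])

preds = logits.argmax(dim=1)  # predicted class per sample
accuracy = (preds == targets).float().mean().item()

# Confusion matrix: rows are true classes, columns are predicted classes.
flat_index = targets * num_classes + preds
confusion = torch.bincount(flat_index, minlength=num_classes ** 2)
confusion = confusion.reshape(num_classes, num_classes)

true_positives = confusion.diag().float()
recall = true_positives / confusion.sum(dim=1).clamp(min=1).float()     # per true class
precision = true_positives / confusion.sum(dim=0).clamp(min=1).float()  # per predicted class

print(f"accuracy: {accuracy:.4f}")  # accuracy: 0.6667
print(confusion.tolist())           # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
```

In a validation loop, the confusion matrix can be accumulated batch by batch (sum the per-batch matrices) and the ratios computed once at the end of the epoch.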

## Your turn!



Select a dataset from the internet and build a PyTorch model for it. You can choose whether it will be a regression or a classification task.

The goal is to get used to the PyTorch way of coding.