<div style="text-align:left;">
  <a href="https://code213.tech/" target="_blank">
    <img src="code213.PNG" alt="code213">
  </a>
  <p><em>prepared by Latreche Sara</em></p>
</div>

# Training Loops in PyTorch

Training a neural network involves repeatedly performing a **forward pass**, computing a **loss**, performing **backpropagation**, and updating the model parameters.  

PyTorch provides flexibility to write **custom training loops**, which is useful for:  
- Debugging models  
- Implementing custom optimizers or learning schedules  
- Controlling how batches are processed  

In this notebook, we will cover:  
- Writing a simple training loop from scratch  
- Using batches with **DataLoader**  
- Updating weights using an **optimizer**  
- Monitoring loss during training  


## Table of Contents

- [1 - Components of a Training Loop](#1)
- [2 - Writing a Basic Training Loop](#2)
- [3 - Using Optimizers](#3)
- [4 - Using DataLoader for Mini-Batches](#4)
- [5 - Monitoring Loss and Accuracy](#5)
- [6 - Practice Exercises](#6)


<a name='1'></a>
## 1 - Components of a Training Loop

A typical PyTorch training loop has four main components:

1. **Forward Pass**  
   - Pass input data through the model to get predictions.  
   - Example: `outputs = model(inputs)`

2. **Loss Computation**  
   - Compare predictions with the target labels using a loss function.  
   - Example: `loss = loss_fn(outputs, labels)`

3. **Backward Pass (Backpropagation)**  
   - Compute gradients of the loss with respect to model parameters.  
   - Example: `loss.backward()`

4. **Optimizer Step (Parameter Update)**  
   - Update model parameters based on computed gradients.  
   - Example:  
     ```python
     optimizer.step()
     optimizer.zero_grad()  # Reset gradients for next iteration
     ```

**Key Notes:**  
- Gradients accumulate in each parameter's `.grad`, so reset them with `optimizer.zero_grad()` once per iteration, before the next `loss.backward()` call.  
- Loss and metrics can be logged to monitor training progress.  
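
The note on resetting gradients can be checked directly. The sketch below is a minimal illustration, assuming a single scalar parameter `w` and a toy loss `3 * w` (both chosen here for readability, not taken from the notebook's models):

```python
import torch

# A single scalar parameter so the stored gradient is easy to read.
w = torch.tensor(1.0, requires_grad=True)

# Toy loss = 3 * w, so d(loss)/dw is always 3.
(3 * w).backward()
grad_first = w.grad.item()

# A second backward pass without a reset adds to the stored gradient.
(3 * w).backward()
grad_accumulated = w.grad.item()

# Resetting before the next backward pass restores the expected value.
w.grad.zero_()
(3 * w).backward()
grad_after_reset = w.grad.item()

print(grad_first, grad_accumulated, grad_after_reset)  # 3.0 6.0 3.0
```

Inside a training loop, `optimizer.zero_grad()` performs this reset for every parameter the optimizer manages.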


<a name='2'></a>
## 2 - Writing a Basic Training Loop

Here we demonstrate a minimal training loop in PyTorch using a simple **linear model** and **MSE loss**.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

# Sample dataset: y = 2x + 1
x_train = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y_train = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

# Define a simple linear model
model = nn.Linear(1, 1)

# Loss function
loss_fn = nn.MSELoss()

# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop for 100 epochs
for epoch in range(100):
    # Forward pass
    outputs = model(x_train)
    loss = loss_fn(outputs, y_train)
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    
    # Update parameters
    optimizer.step()
    
    if (epoch+1) % 20 == 0:
        print(f"Epoch [{epoch+1}/100], Loss: {loss.item():.4f}")

# Test the model
print("Predicted output after training:", model(x_train).detach())


Epoch [20/100], Loss: 0.0205
Epoch [40/100], Loss: 0.0074
Epoch [60/100], Loss: 0.0066
Epoch [80/100], Loss: 0.0058
Epoch [100/100], Loss: 0.0052
Predicted output after training: tensor([[2.8843],
        [4.9439],
        [7.0036],
        [9.0632]])
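
Because the data follow `y = 2x + 1`, the learned weight and bias can be read off the model and compared with 2 and 1. The sketch below repeats the loop above with assumed settings that differ from the original cell (a fixed seed, `lr=0.05`, and 2000 epochs) so that the parameters have time to settle:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # assumed seed, only for reproducibility

# Same data as above: y = 2x + 1
x_train = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y_train = torch.tensor([[3.0], [5.0], [7.0], [9.0]])

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)  # larger lr than the cell above

for epoch in range(2000):
    loss = loss_fn(model(x_train), y_train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# nn.Linear(1, 1) holds one weight and one bias
w = model.weight.item()
b = model.bias.item()
print(f"w = {w:.3f}, b = {b:.3f}")
```

With these settings the printed values should be close to `w = 2` and `b = 1`. The 100-epoch run above stops earlier, which is why its predictions are still slightly off.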


<a name='3'></a>
## 3 - Using Optimizers

Optimizers are responsible for **updating model parameters** based on computed gradients.  
PyTorch provides many built-in optimizers in `torch.optim`.



### 1. Stochastic Gradient Descent (SGD)

- Update rule:  

$$
\theta = \theta - \eta \cdot \nabla_\theta L
$$  

where:  
- $\theta$ = model parameters  
- $\eta$ = learning rate  
- $\nabla_\theta L$ = gradient of the loss with respect to the parameters  
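
To make the update rule concrete, here is a minimal sketch in plain Python. It assumes a toy loss $L(\theta) = (\theta - 3)^2$ and a learning rate of 0.1, both chosen only for illustration:

```python
def grad(theta):
    # Gradient of the toy loss L(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

eta = 0.1      # learning rate
theta = 0.0    # initial parameter value
history = [theta]

for _ in range(50):
    theta = theta - eta * grad(theta)  # theta = theta - eta * gradient
    history.append(theta)

# The first step moves theta from 0.0 to 0.6; later steps approach the minimum at 3.
print(history[1], theta)
```

Each step shrinks the distance to the minimum by a factor of `1 - 2 * eta = 0.8`, so `theta` moves steadily toward 3.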



### 2. SGD with Momentum

- Update rule with momentum:

$$
v_t = \gamma v_{t-1} + \eta \nabla_\theta L \\
\theta = \theta - v_t
$$  

where $\gamma$ is the momentum factor.
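
The effect of the velocity term shows up after only two steps. The sketch below uses plain Python and assumes a toy loss $L(\theta) = (\theta - 3)^2$, with $\eta = 0.1$ and $\gamma = 0.9$ picked for illustration:

```python
def grad(theta):
    # Gradient of the toy loss L(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

eta, gamma = 0.1, 0.9

# SGD with momentum, following the update rule above
theta, v = 0.0, 0.0
momentum_path = []
for _ in range(2):
    v = gamma * v + eta * grad(theta)
    theta = theta - v
    momentum_path.append(theta)

# Plain SGD from the same starting point, for comparison
theta_plain = 0.0
plain_path = []
for _ in range(2):
    theta_plain = theta_plain - eta * grad(theta_plain)
    plain_path.append(theta_plain)

print(momentum_path)  # approximately [0.6, 1.62]
print(plain_path)     # approximately [0.6, 1.08]
```

Both runs take the same first step, but the second momentum step is larger because it reuses 90% of the previous update. Note that `torch.optim.SGD` multiplies the learning rate into the velocity at update time rather than inside it. For a constant learning rate the two forms produce the same parameter trajectory.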



### 3. Adam (Adaptive Moment Estimation)

- Combines momentum and adaptive learning rates:

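$$
m_t = \beta_1 m_{t-1} + (1-\beta_1) \nabla_\theta L \\
v_t = \beta_2 v_{t-1} + (1-\beta_2) (\nabla_\theta L)^2 \\
\theta = \theta - \eta \frac{m_t / (1-\beta_1^t)}{\sqrt{v_t / (1-\beta_2^t)} + \epsilon}
$$

where $m_t$ and $v_t$ are running averages of the gradient and the squared gradient, $\beta_1$ and $\beta_2$ are their decay rates, $t$ is the step count, and $\epsilon$ is a small constant for numerical stability.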
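
A single Adam step can be worked through in plain Python. The sketch below is an illustration, not PyTorch's implementation: `adam_step` is a hypothetical helper, and its default values are assumed to mirror the defaults of `torch.optim.Adam` (`lr=0.001`, `betas=(0.9, 0.999)`, `eps=1e-8`).

```python
import math

def adam_step(theta, g, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative helper)."""
    m = beta1 * m + (1 - beta1) * g        # running average of the gradient
    v = beta2 * v + (1 - beta2) * g * g    # running average of the squared gradient
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# First step (t=1) for a tiny gradient and a huge gradient
step_sizes = []
for g in (0.01, 100.0):
    new_theta, _, _ = adam_step(theta=1.0, g=g, m=0.0, v=0.0, t=1)
    step_sizes.append(1.0 - new_theta)

print(step_sizes)  # both are close to eta = 0.001
```

On the first step the bias-corrected ratio reduces to roughly `g / |g|`, so the update size is about `eta` whatever the gradient's magnitude. This is the "adaptive learning rate" behavior the formulas describe.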


### Key Points

- Always pass `model.parameters()` to the optimizer.  
- Call `optimizer.zero_grad()` **before each backward pass** to avoid gradient accumulation.  
- Choose optimizer and learning rate carefully depending on the problem.  


In [2]:
import torch.nn as nn
import torch.optim as optim

# Example model (the optimizers below are alternatives; a training loop normally uses only one)
model = nn.Linear(2, 1)

# 1. SGD
optimizer_sgd = optim.SGD(model.parameters(), lr=0.01)

# 2. SGD with Momentum
optimizer_momentum = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# 3. Adam
optimizer_adam = optim.Adam(model.parameters(), lr=0.001)

<a name='4'></a>
## 4 - Using DataLoader for Mini-Batches

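Computing every update on the **entire dataset at once** can be inefficient, especially for large datasets.  
Instead, we use **mini-batches** to:  
- Reduce memory usage, since only one batch is processed at a time  
- Get gradient estimates that are noisier than the full-batch gradient but much cheaper to compute  
- Update the parameters several times per epoch, which often speeds up convergence  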

PyTorch provides **`torch.utils.data.DataLoader`** to handle batching and shuffling.



### Key Concepts

1. **Dataset**
   - A PyTorch `Dataset` stores samples and their corresponding labels.  
   - Must implement `__len__()` and `__getitem__()`.

2. **DataLoader**
   - Wraps a `Dataset` to create an iterable over **mini-batches**.  
   - Key parameters:
     - `batch_size`: number of samples per batch  
     - `shuffle`: whether to shuffle data each epoch  



**Training Loop with Mini-Batches**
- Iterate over the DataLoader instead of the full dataset.
- Forward pass, compute loss, backward pass, and optimizer step for each batch.
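
When the dataset size is not a multiple of `batch_size`, the last batch is smaller than the rest. The sketch below assumes a placeholder dataset of 10 random samples and uses the `drop_last` parameter of `DataLoader` to discard that incomplete batch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 10 samples with 2 features each, plus dummy targets
dataset = TensorDataset(torch.randn(10, 2), torch.zeros(10, 1))

loader = DataLoader(dataset, batch_size=3, shuffle=True)
loader_drop = DataLoader(dataset, batch_size=3, shuffle=True, drop_last=True)

# len(loader) is the number of batches per epoch, not the number of samples
batch_sizes = [x_batch.size(0) for x_batch, _ in loader]

print(len(loader), batch_sizes)  # 4 [3, 3, 3, 1]
print(len(loader_drop))          # 3
```

Shuffling changes which samples land in each batch, but not the batch sizes. Whether to drop the last batch is a judgment call: keeping it uses all the data, while dropping it keeps every update based on the same number of samples.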


In [4]:
import torch
from torch.utils.data import Dataset, DataLoader

# Sample dataset
class MyDataset(Dataset):
    def __init__(self):
        self.x = torch.randn(10, 2)
        self.y = torch.randint(0, 2, (10, 1)).float()
    
    def __len__(self):
        return len(self.x)
    
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=3, shuffle=True)

# Example: iterate through mini-batches
for batch_idx, (x_batch, y_batch) in enumerate(dataloader):
    print(f"Batch {batch_idx+1}")
    print("x:", x_batch)
    print("y:", y_batch)
    print("---")


Batch 1
x: tensor([[-0.7705, -1.1865],
        [-1.6320, -0.7430],
        [-0.4538, -1.1619]])
y: tensor([[1.],
        [1.],
        [0.]])
---
Batch 2
x: tensor([[-0.2831, -0.3035],
        [-0.4125, -0.2890],
        [-0.1959, -0.8946]])
y: tensor([[1.],
        [0.],
        [0.]])
---
Batch 3
x: tensor([[-1.3389, -0.3129],
        [ 0.7052,  1.3549],
        [ 0.6676, -0.0868]])
y: tensor([[1.],
        [0.],
        [0.]])
---
Batch 4
x: tensor([[-0.9322, -1.4997]])
y: tensor([[1.]])
---


<a name='5'></a>
## 5 - Monitoring Loss and Accuracy

Monitoring **loss** and **accuracy** during training is essential to:  
- Check if the model is learning  
- Detect overfitting or underfitting  
- Tune hyperparameters  



### Key Points

1. **Loss Tracking**  
   - Compute loss for each batch or epoch.  
   - Plot or print loss to observe convergence.

2. **Accuracy Tracking (for classification)**  
   - Can be computed per batch or per epoch.  
   - Compare predicted labels with true labels:  

$$
\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total predictions}}
$$



**Tips**
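- Accumulate loss over all batches to compute **epoch loss**. Weight each batch loss by its batch size so that a smaller final batch does not skew the average.  
- For accuracy, threshold a single sigmoid output at 0.5 (binary case) or take the `argmax` over the class dimension (multi-class case) to get class predictions.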
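
Both tips can be checked on a small example. The logits, labels, batch losses, and batch sizes below are made-up values chosen so the arithmetic is easy to follow:

```python
import torch

# Hypothetical logits for 4 samples and 2 classes, with their true labels
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.2],
                       [0.3, 0.2],
                       [-1.0, 0.0]])
labels = torch.tensor([0, 1, 1, 1])

# Class prediction = index of the largest logit in each row
predicted = logits.argmax(dim=1)
accuracy = (predicted == labels).float().mean().item()
print(predicted.tolist(), accuracy)  # [0, 1, 0, 1] 0.75

# Epoch loss: per-batch mean losses weighted by batch size
batch_losses = [0.9, 0.6, 0.3]  # hypothetical mean loss of each batch
batch_sizes = [4, 4, 2]         # the last batch is smaller

weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)
unweighted = sum(batch_losses) / len(batch_losses)
print(weighted, unweighted)  # about 0.66 versus about 0.60
```

The unweighted average gives the two-sample batch the same influence as the four-sample batches. The weighted version equals the mean loss over all 10 samples.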


In [5]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Sample data: 10 samples, 2 features, 2 classes
x = torch.randn(10, 2)
y = torch.randint(0, 2, (10,))

dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=3, shuffle=True)

# Simple model
model = nn.Linear(2, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop with loss and accuracy tracking
for epoch in range(3):  # small number of epochs for demo
    epoch_loss = 0
    correct = 0
    total = 0
    
    for x_batch, y_batch in dataloader:
        # Forward
        outputs = model(x_batch)
        loss = loss_fn(outputs, y_batch)
        
        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Track loss
        epoch_loss += loss.item() * x_batch.size(0)
        
        # Track accuracy
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == y_batch).sum().item()
        total += y_batch.size(0)
    
    epoch_loss /= total
    accuracy = correct / total
    print(f"Epoch {epoch+1}: Loss={epoch_loss:.4f}, Accuracy={accuracy:.2f}")


Epoch 1: Loss=0.6485, Accuracy=0.60
Epoch 2: Loss=0.6474, Accuracy=0.60
Epoch 3: Loss=0.6429, Accuracy=0.70


<a name='6'></a>
## 6 - Practice Exercises

Try the following exercises to reinforce your understanding of **training loops, optimizers, and mini-batches**.



### **Exercise 1: Forward and Backward Pass**
- Create a simple linear model with input size 3 → output size 1.  
- Generate random input data (`batch_size=4`) and targets.  
- Perform a forward pass, compute **MSE loss**, and run a backward pass.



### **Exercise 2: Optimizer Step**
- Use **SGD** with learning rate 0.01.  
- Update the model parameters using `optimizer.step()`.  
- Print the **loss before and after the optimizer step**.


### **Exercise 3: Mini-Batch Training**
- Create a dataset of 12 samples, input size 2, output size 1.  
- Use `DataLoader` with `batch_size=4`.  
- Train the model for **2 epochs**, computing and printing loss for each batch.



### **Exercise 4: Accuracy Tracking (Optional Challenge)**
- Modify the dataset to have **2 classes**.  
- Use **CrossEntropyLoss** and a linear model with 2 output neurons.  
- Compute and print **accuracy per epoch** over 2 epochs.


In [6]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, TensorDataset
import torch.optim as optim

# ----------------------------
# Exercise 1: Forward and Backward Pass
# ----------------------------
model = nn.Linear(3, 1)
x = torch.randn(4, 3)
y = torch.randn(4, 1)
loss_fn = nn.MSELoss()

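# Forward
outputs = model(x)
loss = loss_fn(outputs, y)
print("Exercise 1 - Loss:", loss.item())  # this is also the loss before the optimizer step

# Backward: populates .grad on the model parameters
loss.backward()

# ----------------------------
# Exercise 2: Optimizer Step
# ----------------------------
optimizer = optim.SGD(model.parameters(), lr=0.01)
optimizer.step()       # apply the gradients computed in Exercise 1
optimizer.zero_grad()  # reset gradients for the next iteration

# Recompute the loss with the updated parameters
outputs_after = model(x)
loss_after = loss_fn(outputs_after, y)
print("Exercise 2 - Loss after optimizer step:", loss_after.item())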

# ----------------------------
# Exercise 3: Mini-Batch Training
# ----------------------------
x_data = torch.randn(12, 2)
y_data = torch.randn(12, 1)
dataset = TensorDataset(x_data, y_data)
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

model = nn.Linear(2, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(2):
    for batch_idx, (x_batch, y_batch) in enumerate(dataloader):
        outputs = model(x_batch)
        loss = loss_fn(outputs, y_batch)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        print(f"Epoch {epoch+1}, Batch {batch_idx+1}, Loss: {loss.item():.4f}")

# ----------------------------
# Exercise 4: Accuracy Tracking (Optional)
# ----------------------------
# Dataset with 2 classes
x_cls = torch.randn(8, 2)
y_cls = torch.randint(0, 2, (8,))
dataset_cls = TensorDataset(x_cls, y_cls)
dataloader_cls = DataLoader(dataset_cls, batch_size=4, shuffle=True)

model_cls = nn.Linear(2, 2)  # 2 output neurons
loss_fn_cls = nn.CrossEntropyLoss()
optimizer_cls = optim.SGD(model_cls.parameters(), lr=0.01)

for epoch in range(2):
    correct = 0
    total = 0
    for x_batch, y_batch in dataloader_cls:
        outputs = model_cls(x_batch)
        loss = loss_fn_cls(outputs, y_batch)
        
        optimizer_cls.zero_grad()
        loss.backward()
        optimizer_cls.step()
        
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == y_batch).sum().item()
        total += y_batch.size(0)
    
    accuracy = correct / total
    print(f"Epoch {epoch+1}, Accuracy: {accuracy:.2f}")


Exercise 1 - Loss: 3.2175486087799072
Exercise 2 - Loss after optimizer step: 3.009896755218506
Epoch 1, Batch 1, Loss: 0.8909
Epoch 1, Batch 2, Loss: 1.7080
Epoch 1, Batch 3, Loss: 0.6876
Epoch 2, Batch 1, Loss: 1.4694
Epoch 2, Batch 2, Loss: 1.0620
Epoch 2, Batch 3, Loss: 0.5712
Epoch 1, Accuracy: 0.62
Epoch 2, Accuracy: 0.62
