# PyTorch Training Loops

A training loop is the core of fitting a neural network in PyTorch. In each pass, the model makes predictions on training data, measures how far off they are, and adjusts its parameters to reduce that error. Repeating these steps gradually improves the model's accuracy.

[PyTorch Quickstart](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html)

## Technical Terms Explained:
**Training Loop:** The cycle that a neural network goes through many times to learn from the data by making predictions, checking errors, and improving itself.

**Batches:** Small subsets of the dataset that the model processes in a single training step. Every batch has the same size, except possibly the last one when the dataset does not divide evenly.

**Epochs:** An epoch is one complete pass through the entire training dataset. More epochs mean the model sees each training example more times.

**Loss functions:** Functions that measure how well a model is performing by reducing the difference between the model's predictions and the actual results to a single number. A lower loss means better predictions.

**Optimizer:** The algorithm that uses the gradients of the loss to decide how to change the network's parameters so that the loss decreases.
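
As a rough illustration of the loss function entry above, here is mean squared error (the loss this notebook uses later through `nn.MSELoss`) computed by hand in plain Python. The predictions and targets are made-up values chosen only for the example.

```python
# Hypothetical predictions and targets, chosen only for illustration
predictions = [2.5, 0.0, 2.0]
targets = [3.0, -0.5, 2.0]

# Mean squared error: the average of the squared differences
squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
mse = sum(squared_errors) / len(squared_errors)
print(mse)
```

Squaring makes every error positive and penalizes large misses more heavily than small ones.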

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

Below, we build a model that "predicts" a continuous value: the sum of two input numbers. **This is a poor use of a statistical model, but it serves as an example for understanding the architecture and applying the tool.** At the end of our training loop, the model will not *know* how to add; for something like $3 + 7$ it only produces an approximation learned from examples.

Create a Number Sum Dataset

In [2]:
class NumberSumDataset(Dataset):
    def __init__(self, data_range: tuple = (1, 10)):
        """
        Create a list of integers from data_range[0] up to, but not including, data_range[1]
        """
        self.numbers = list(range(data_range[0], data_range[1]))


    def __getitem__(self, index):
        """
        Return (inputs, target): a tensor [num0, num1] and a tensor [num0 + num1].
        The index is mapped to a (num0, num1) pair so that every ordered pair appears exactly once.
        """
        num0 = float(self.numbers[index // len(self.numbers)])
        num1 = float(self.numbers[index % len(self.numbers)])
        return torch.tensor([num0, num1]), torch.tensor([num0 + num1])


    def __len__(self):
        """
        Return the number of (num0, num1) pairs, i.e. the square of the number of elements
        """
        return len(self.numbers) ** 2
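
The index arithmetic in `__getitem__` can be easier to follow on a tiny example. The sketch below uses a three-element list as a stand-in for `self.numbers`; the helper name `index_to_pair` is hypothetical and not part of the class above.

```python
numbers = list(range(1, 4))  # [1, 2, 3], a small stand-in for self.numbers

def index_to_pair(index):
    # divmod returns (index // n, index % n) in one call:
    # the quotient picks num0, the remainder picks num1
    row, col = divmod(index, len(numbers))
    return numbers[row], numbers[col]

pairs = [index_to_pair(i) for i in range(len(numbers) ** 2)]
print(pairs)
```

Every ordered pair shows up exactly once, which is why `__len__` returns the square of the list length.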

Inspect the dataset

In [3]:
dataset = NumberSumDataset(data_range=(1,100))

# len() calls dataset.__len__() under the hood
print(len(dataset))
for i in range(5):
    print(dataset[i])

9801
(tensor([1., 1.]), tensor([2.]))
(tensor([1., 2.]), tensor([3.]))
(tensor([1., 3.]), tensor([4.]))
(tensor([1., 4.]), tensor([5.]))
(tensor([1., 5.]), tensor([6.]))


Define a simple "model"

In [4]:
class MLP(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        # One hidden layer of 128 units followed by a single-value output
        self.hidden = nn.Linear(input_size, 128)
        self.output = nn.Linear(128, 1)
        self.activation = nn.ReLU()

    def forward(self, x):
        x = self.hidden(x)
        x = self.activation(x)
        # No final activation: the output is an unbounded regression value
        x = self.output(x)
        return x
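
To get a feel for the size of this network, one option is to count its trainable parameters. The sketch below rebuilds the same layer sizes with `nn.Sequential` purely for brevity; it is an equivalent stand-in, not the `MLP` class itself.

```python
import torch.nn as nn

# Same layer sizes as the MLP above, written with nn.Sequential for brevity
sketch_model = nn.Sequential(
    nn.Linear(2, 128),
    nn.ReLU(),
    nn.Linear(128, 1),
)

# Each nn.Linear holds a weight matrix and a bias vector:
# hidden: 2*128 weights + 128 biases, output: 128*1 weights + 1 bias
n_params = sum(p.numel() for p in sketch_model.parameters())
print(n_params)
```

These are the values that the optimizer adjusts during training.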

Instantiate components needed for training

In [5]:
dataset = NumberSumDataset(data_range=(0,100))
dataloader = DataLoader(dataset=dataset, batch_size=100, shuffle=True)
model = MLP(input_size=2)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

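
It can help to check what the `DataLoader` actually yields before training. The sketch below uses a `TensorDataset` of zeros as a stand-in with the same shapes as `NumberSumDataset(data_range=(0,100))`, which has 100 * 100 = 10,000 rows; the stand-in data is an assumption made so that the example runs on its own.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10,000 rows of two inputs and one target each
inputs = torch.zeros(10_000, 2)
targets = torch.zeros(10_000, 1)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=100, shuffle=True)

# Pull one batch and look at its shapes
X, Y = next(iter(loader))
print(len(loader))       # number of batches per epoch
print(X.shape, Y.shape)  # shapes of one batch of inputs and targets
```

With 10,000 rows and `batch_size=100`, each epoch consists of 100 optimizer steps.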
Create a training loop

In [6]:
for epoch in range(20):
    total_loss = 0.0
    # Iterate over the batches
    for X, Y in dataloader:
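        # Compute the model output
        preds = model(X)
        # Compute the loss
        loss = loss_function(preds, Y)
        # Backpropagation: compute gradients of the loss w.r.t. the parameters
        loss.backward()
        # Update the parameters
        optimizer.step()
        # Zero out the gradients so they do not accumulate into the next batch
        optimizer.zero_grad()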
        # Add the loss of this batch
        total_loss += loss.item()
    print(f"Epoch {epoch}: Sum of Batch Loss: {total_loss:.5f}")

Epoch 0: Sum of Batch Loss: 291285.17354
Epoch 1: Sum of Batch Loss: 7855.38131
Epoch 2: Sum of Batch Loss: 1005.95261
Epoch 3: Sum of Batch Loss: 43.83301
Epoch 4: Sum of Batch Loss: 19.75632
Epoch 5: Sum of Batch Loss: 11.46027
Epoch 6: Sum of Batch Loss: 7.60782
Epoch 7: Sum of Batch Loss: 5.72906
Epoch 8: Sum of Batch Loss: 3.56152
Epoch 9: Sum of Batch Loss: 2.05779
Epoch 10: Sum of Batch Loss: 1.88450
Epoch 11: Sum of Batch Loss: 1.81295
Epoch 12: Sum of Batch Loss: 1.75916
Epoch 13: Sum of Batch Loss: 1.68918
Epoch 14: Sum of Batch Loss: 1.62608
Epoch 15: Sum of Batch Loss: 1.56905
Epoch 16: Sum of Batch Loss: 1.50384
Epoch 17: Sum of Batch Loss: 1.44078
Epoch 18: Sum of Batch Loss: 1.38032
Epoch 19: Sum of Batch Loss: 1.33054
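
The printed number is a sum over batches, so its scale depends on how many batches there are. A rough way to read it is to divide by the batch count, which here is 10,000 samples / 100 per batch = 100. The sketch below does that for the last epoch; treat the result as an approximation, since the model's parameters change between batches within an epoch.

```python
import math

num_batches = 10_000 // 100        # 100 batches per epoch in this setup
final_sum_of_batch_loss = 1.33054  # epoch 19 value from the log above

# Average MSE per batch, and its square root as a typical error size
mean_batch_mse = final_sum_of_batch_loss / num_batches
rmse = math.sqrt(mean_batch_mse)
print(f"mean MSE per batch: {mean_batch_mse:.5f}, approx. RMSE: {rmse:.3f}")
```

A root-mean-square error on the order of 0.1 suggests predictions that are close to the true sum but not exact.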


Test the model

In [7]:
X0, X1 = 3.0, 7.0
print(model(torch.tensor([X0, X1])).item())
# Exact float equality is almost never satisfied by a learned approximation
X0 + X1 == model(torch.tensor([X0, X1])).item()

10.093177795410156


False
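
Since the model only approximates addition, an exact `==` check will almost always fail. A more forgiving comparison is sketched below in plain Python, using the prediction printed above; the tolerance of 0.5 is an arbitrary choice for illustration.

```python
import math

prediction = 10.093177795410156  # value printed by the cell above
target = 3.0 + 7.0

exact_match = prediction == target                            # strict equality
close_match = math.isclose(prediction, target, abs_tol=0.5)   # within a chosen tolerance
rounded_match = round(prediction) == target                   # nearest integer
print(exact_match, close_match, rounded_match)
```

Which tolerance is acceptable depends on the task; for tensors, `torch.isclose` offers a similar comparison.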