# 1.1 Neural network basics

In this notebook, we construct a basic neural network in PyTorch. To keep things clear, we provide only a bare-bones example. For another such example, see [the PyTorch tutorials](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).

<!--
- [Building the network](#Building-the-network)
- [Training the network](#Training-the-network)
- [Testing the network](#Testing-the-network)
- [Exercises](#Exercises)
-->

First, we import the libraries and load the data. You will explore this further in notebook 1.2.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Load the MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# We load a training set
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)

# And withhold some data for testing
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=32, shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
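
To see what the transform and the loaders do without downloading anything, the following sketch uses random stand-in images. The names `fake_images` and `fake_labels` are hypothetical and the data are not MNIST. The sketch applies the same arithmetic as Normalize((0.5,), (0.5,)), namely (x - 0.5) / 0.5, and shows how a DataLoader with batch_size=32 splits 100 samples into batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 100 fake greyscale "images" with pixel values in [0, 1),
# shaped like MNIST after ToTensor(): (channels, height, width) = (1, 28, 28).
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))

# Same arithmetic as Normalize((0.5,), (0.5,)): subtract the mean, divide by the std.
normalized = (fake_images - 0.5) / 0.5  # values now lie in [-1, 1)

loader = DataLoader(TensorDataset(normalized, fake_labels), batch_size=32, shuffle=False)
batch_sizes = [images.shape[0] for images, _ in loader]
first_images, first_labels = next(iter(loader))

print(batch_sizes)         # three full batches and one smaller final batch
print(first_images.shape)  # (batch, channels, height, width)
```

The last batch is smaller because 100 is not a multiple of 32. The same happens with the real loaders whenever the dataset size is not divisible by the batch size.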



## Building the network

Below, we construct a fully connected neural network. Here, we will limit ourselves to a single hidden layer, i.e., it's a shallow rather than deep neural network. To construct a neural network in PyTorch, we define a class that inherits from the nn.Module class. 

The class contains two methods:

1) `__init__`: defines all the objects (layers and activation functions) we will need.

2) `forward`: puts these objects together, defining the network architecture. Here, x starts out as the input tensor and is transformed step by step as it passes through each layer of the network. In this notebook, the input tensors are MNIST images of 28x28 pixels each. A fully connected layer, however, expects a one-dimensional feature vector per sample. So, we first flatten each image in x into a vector of 28 * 28 = 784 values. The flattened tensor is passed to the first fully connected layer, fc1, after which a Rectified Linear Unit (ReLU) activation function is applied element-wise. The result is passed through the second fully connected layer, fc2, and returned. This final tensor holds one score per digit class (10 in total) and can be used for classification. (You can rename x along the way if you don't want the input and the output to share a name, or if you want to distinguish between feature extraction and the final classification.)

In [2]:
# Define a simple neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.flatten(start_dim=1)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

You can now create an instance of the class.

In [3]:
model = SimpleNet()
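
A quick sanity check is to pass a random batch through the network and count its trainable parameters. The sketch below repeats the class definition so that it runs on its own. The names `sketch_model` and `dummy_batch` are hypothetical, and the input is random noise rather than real images.

```python
import torch
import torch.nn as nn

# Same architecture as above, repeated so that this block runs on its own.
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.flatten(start_dim=1)  # (batch, 1, 28, 28) -> (batch, 784)
        x = self.fc1(x)             # (batch, 784) -> (batch, 128)
        x = self.relu(x)
        return self.fc2(x)          # (batch, 128) -> (batch, 10)

sketch_model = SimpleNet()
dummy_batch = torch.randn(4, 1, 28, 28)  # random stand-in for 4 images
logits = sketch_model(dummy_batch)

# Weights and biases: (784 * 128 + 128) + (128 * 10 + 10)
n_params = sum(p.numel() for p in sketch_model.parameters())
print(logits.shape, n_params)
```

Each of the 4 inputs yields 10 output values, one per digit class. Nearly all parameters sit in fc1, because every one of the 784 pixels is connected to every one of the 128 hidden units.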

## Training the network

We now want to train the weights of the neural network. For this purpose, we first need to define what counts as a good fit to the data, i.e. we need a loss function.

In [4]:
criterion = nn.CrossEntropyLoss()
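
nn.CrossEntropyLoss expects the raw outputs of the network (logits) and applies a log-softmax internally, which is why the forward method above does not end with a softmax. The following sketch compares the built-in loss with the same computation done by hand. The logits and targets are made-up values for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical raw network outputs (logits) for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # true class index of each sample

# By hand: log-softmax, pick the log-probability of the true class, negate, average.
log_probs = torch.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

# nn.CrossEntropyLoss (with its default mean reduction) performs the same steps.
builtin_loss = nn.CrossEntropyLoss()(logits, targets)

print(manual_loss.item(), builtin_loss.item())
```

The loss is small when the network assigns a high probability to the true class and grows as that probability shrinks.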

Secondly, we need to decide how we want to optimise the network.

In [5]:
optimizer = optim.Adam(model.parameters(), lr=0.01)
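
An optimiser turns gradients into parameter updates, and the learning rate lr scales the size of each update. Adam's update rule involves running averages of past gradients, so the sketch below swaps in plain SGD, whose rule w <- w - lr * gradient is easy to check by hand. The parameter `w` and the loss w^2 are toy assumptions, not part of the notebook's model.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)  # a single toy parameter
sgd = optim.SGD([w], lr=0.1)                 # plain SGD, not Adam, for easy checking

loss = (w ** 2).sum()  # toy loss with gradient 2 * w
loss.backward()        # w.grad is now 2.0
sgd.step()             # w <- 1.0 - 0.1 * 2.0

print(w.item())
```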

Now, we are ready to train the neural network. Let's train for 5 epochs. Note that fixing the number of epochs in advance is generally not a good choice, since it can lead to under- or overfitting. In this notebook, however, we merely want to understand the basic concepts behind training a neural network.

In each epoch, for each batch from the training set, we

1) clear the gradients of the model parameters (PyTorch accumulates them across backward passes),

2) perform a forward run (i.e. we predict labels for the batch of images),

3) compute the loss for these predictions,

4) perform backpropagation,

5) and use this to optimise the weights and biases of our model.
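
Step 1 matters because each backward pass adds its gradients to those already stored on the parameters. A minimal sketch with a single toy parameter (the name `w` and the function 3 * w are assumptions for illustration):

```python
import torch

w = torch.tensor([1.0], requires_grad=True)  # a single toy parameter

(3 * w).sum().backward()
grad_after_first = w.grad.clone()   # the gradient of 3 * w is 3

(3 * w).sum().backward()            # no reset in between
grad_after_second = w.grad.clone()  # 3 + 3: the new gradient was added

w.grad.zero_()                      # reset by hand, for this one parameter
grad_after_reset = w.grad.clone()

print(grad_after_first.item(), grad_after_second.item(), grad_after_reset.item())
```

In the training loop, optimizer.zero_grad() takes care of this reset for all parameters handed to the optimiser.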

In [6]:
num_epochs = 5

# Training loop
model.train()  # Set the model to training mode
for epoch in range(num_epochs):
    running_loss = 0.0
    
    for images, labels in train_loader:
        optimizer.zero_grad() # clear gradients
        outputs = model(images) # forward run
        loss = criterion(outputs, labels) # compare to ground truth
        loss.backward() # back propagation
        optimizer.step() # update weights and biases
        
        running_loss += loss.item()
    
    # Print average loss for the epoch
    average_loss = running_loss / len(train_loader)
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {average_loss}')

Epoch [1/5], Loss: 0.43157690029640994
Epoch [2/5], Loss: 0.3554805261954665
Epoch [3/5], Loss: 0.33244464016780256
Epoch [4/5], Loss: 0.31666396109859146
Epoch [5/5], Loss: 0.3237694415308535
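
As noted above, a fixed number of epochs is a crude stopping rule. One common alternative, not used in this notebook, is early stopping: monitor the loss on a held-out validation set and stop once it has not improved for a few epochs. The sketch below shows only the stopping logic. The function name, the patience parameter and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses: improvement stalls after epoch 3.
hypothetical_val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.30]
stop_epoch = early_stopping_epoch(hypothetical_val_losses, patience=2)
print(stop_epoch)
```

In a real training loop, the validation loss would be computed at the end of each epoch on data that is used neither for training nor for the final test.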


## Testing the network

Having trained the network, we can now test it on unseen data (the test set).

In [7]:
model.eval()  # Set the model to evaluation mode
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the largest output = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        

accuracy = correct / total
print(f'Accuracy on the test set: {100 * accuracy}%')

Accuracy on the test set: 90.18%
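
To make the accuracy computation concrete, here is the same logic on a tiny hand-made example. The outputs and labels are hypothetical values, chosen so that one of the three samples is misclassified.

```python
import torch

# Hypothetical network outputs for 3 samples and 4 classes.
outputs = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                        [1.5, 0.2, 0.1, 0.0],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])  # true classes; the second sample is misclassified

# torch.max along dim 1 returns (largest value, its index) for each row;
# the index is the predicted class.
values, predicted = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()
accuracy = correct / labels.size(0)
print(predicted.tolist(), accuracy)
```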


That's a fairly high accuracy for such a simple model, reflecting that the MNIST dataset is relatively simple and well-behaved. Still, you can do much better: convolutional neural networks (CNNs) can reach an accuracy of more than 99 per cent (see [Kaggle](https://www.kaggle.com/code/cdeotte/how-to-choose-cnn-architecture-mnist)). To achieve such a high accuracy, we need to improve the architecture. We will do so in notebook 2.1.

## Exercises

**Exercise 1**: Here, we use the Adam optimiser. Find out what other optimisers are available in PyTorch.

**Exercise 2**: What do the hyperparameters (e.g. lr in Adam) mean? Change lr to a higher value (e.g. 1) and rerun the notebook. Then try a lower value. What happens? Why?

**Exercise 3**: Here, we use the cross entropy as the loss function. What is a loss function? And what is the cross entropy?

**Exercise 4**: Integrate data augmentation techniques into the data preprocessing pipeline. This can help improve the model's generalisation.

**Exercise 5**: Change the architecture of the neural network. Add more layers, increase the number of neurons, or try a different activation function. 