# Artificial Neural Networks with PyTorch

PyTorch is a library that lets us build neural networks quickly. Many steps are already implemented, including the computation of partial derivatives and gradients and the backpropagation function, which relies on automatic differentiation (`autograd`).
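
As a minimal sketch of what `autograd` does, the snippet below differentiates a hand-picked function, y = x² + 3x, at x = 2. The function and the variable names are only illustrative and are not part of the network built later.

```python
import torch

# A scalar input that autograd should track.
x = torch.tensor(2.0, requires_grad=True)

# y = x^2 + 3x, so dy/dx = 2x + 3.
y = x ** 2 + 3 * x

# Backpropagate: autograd fills x.grad with dy/dx evaluated at x = 2.
y.backward()

print(x.grad)  # tensor(7.)
```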

In [None]:
import torch
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

Neural networks can be constructed using the `torch.nn` package. An `nn.Module` contains layers and a method `forward(input)` that computes and returns the `output`.


## Define the network

Let's define a network similar to the one in the following image for the problem of **classification of digit images**:

![convnet](https://pytorch.org/tutorials/_static/img/mnist.png)

It is a convolutional neural network (CNN) that takes an image as input and feeds it through convolutional layers followed by a few fully connected layers.


In [None]:
import torch.nn as nn
import torch.nn.functional as F
        
class ANN(nn.Module):

    def __init__(self):
        super(ANN, self).__init__()
        # two convolutional layers
        self.conv1 = nn.Conv2d(1, 6, 5) # 1 input image channel, 6 output channels, 5x5 conv. kernel
        self.conv2 = nn.Conv2d(6, 16, 5) # 6 input channels, 16 output channels, 5x5 conv. kernel
        # two fully connected layers
        self.fc1 = nn.Linear(16 * 4 * 4, 100)  # input of 4*4 from image dimension and 16 channels, 100 neurons 
        self.fc2 = nn.Linear(100, 10) # output layer with 10 neurons (1 neuron per digit)

    def forward(self, input):
        """Forward pass. Each input image has dimension 28x28x1, so a batch has shape (N, 1, 28, 28)."""
        # Convolution layer C1: 1 input image channel, 6 output channels,
        # 5x5 square convolution, it uses RELU activation function, and
        # outputs a Tensor with size (N, 6, 24, 24), where N is the batch size
        c1 = F.relu(self.conv1(input))
        # Subsampling layer S2: 2x2 grid, purely functional (it has no parameters), and 
        # outputs a (N, 6, 12, 12) Tensor
        s2 = F.max_pool2d(c1, (2, 2))
        # Convolution layer C3: 6 input channels, 16 output channels,
        # 5x5 square convolution, it uses RELU activation function, and
        # outputs a (N, 16, 8, 8) Tensor
        c3 = F.relu(self.conv2(s2))
        # Subsampling layer S4: 2x2 grid, purely functional (it has no parameters), and 
        # outputs a (N, 16, 4, 4) Tensor
        s4 = F.max_pool2d(c3, 2)
        # Flatten operation: purely functional, outputs a (N, 256) Tensor
        s4 = torch.flatten(s4, 1)
        # Fully connected layer F5: (N, 256) Tensor input, it uses RELU activation function, and
        # outputs a (N, 100) Tensor
        f5 = F.relu(self.fc1(s4))
        # OUTPUT layer (softmax): (N, 100) Tensor input, and
        # outputs a (N, 10) Tensor
        output = F.softmax(self.fc2(f5), dim=1)
        return output

my_ann = ANN()
print(my_ann)
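
The tensor sizes quoted in the comments of `forward` can be checked by hand. The sketch below assumes the standard output-size formula for a convolution or pooling window without dilation; `conv_out` is a hypothetical helper written for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                                   # input image: 28x28
side = conv_out(side, kernel=5)             # conv1    -> 24
side = conv_out(side, kernel=2, stride=2)   # max pool -> 12
side = conv_out(side, kernel=5)             # conv2    -> 8
side = conv_out(side, kernel=2, stride=2)   # max pool -> 4

flat_features = 16 * side * side            # 16 channels of 4x4
print(side, flat_features)                  # 4 256
```

This is where the `16 * 4 * 4` input size of `fc1` comes from.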

**Note that** we only define the `forward` propagation function. It uses *Tensors*, which generalize matrices to an arbitrary number of dimensions, and *Tensor operations*.
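
To make the idea of a Tensor more concrete, here is a small sketch that creates tensors with 0, 1, 2 and 4 dimensions and prints their shapes. The values and sizes are arbitrary; the 4-dimensional one only mimics the layout of a batch of 64 grayscale 28x28 images.

```python
import torch

scalar = torch.tensor(3.0)            # 0 dimensions
vector = torch.tensor([1.0, 2.0])     # 1 dimension
matrix = torch.zeros(2, 3)            # 2 dimensions
batch = torch.zeros(64, 1, 28, 28)    # 4 dimensions: (batch, channels, height, width)

for t in (scalar, vector, matrix, batch):
    print(t.dim(), tuple(t.shape))

# Tensor operations include the usual matrix operations.
product = matrix @ torch.ones(3, 4)   # (2, 3) @ (3, 4) -> (2, 4)
print(tuple(product.shape))
```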


The `backward` propagation function and the gradients are automatically defined using `autograd`. 


Let's have a look at the learnable parameters of this model:

In [None]:
params = list(my_ann.parameters())
print("Length of the list of parameters:",len(params))
print("Dimensions of the parameters of the first layer:",params[0].size())  # conv1's .weight
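
To see where the learnable parameters come from, the sketch below re-declares layers with the same shapes as in `ANN` (so that it runs on its own) and counts the values in each of them. The dictionary `layers` is only an illustration device.

```python
import torch.nn as nn

# Same layer shapes as the ANN class above, re-declared so the snippet is self-contained.
layers = {
    "conv1": nn.Conv2d(1, 6, 5),
    "conv2": nn.Conv2d(6, 16, 5),
    "fc1": nn.Linear(16 * 4 * 4, 100),
    "fc2": nn.Linear(100, 10),
}

total = 0
for name, layer in layers.items():
    n = sum(p.numel() for p in layer.parameters())  # weights + biases
    total += n
    print(f"{name}: {n}")
print("total:", total)
```

Each layer contributes one weight tensor and one bias tensor, which is why the list of parameters has 8 entries.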

Let's try to run this model (with a random input):

In [None]:
input = torch.randn(1, 1, 28, 28)
output = my_ann(input)
print(output)

y_real = torch.rand(1,10)  # create a random "real output" too

# Measure loss
lossFunc = nn.MSELoss()
loss = lossFunc(output, y_real)

print(loss)

We can trace the sequence of operations that produced `loss`, in the backward direction, using its `.grad_fn` attribute:

``` {.sourceCode .sh}
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> flatten -> linear -> relu -> linear -> softmax
      -> MSELoss
      -> loss
```

In [None]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # softmax
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # linear
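
Instead of chaining `.next_functions` by hand, the graph can be walked in a loop. The following is a minimal sketch on a tiny, made-up computation; it only follows the first input of each node, and the exact node names may vary between PyTorch versions.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).sum()

# Follow the first input of each backward node until the graph ends.
names = []
node = y.grad_fn
while node is not None:
    names.append(type(node).__name__)
    node = node.next_functions[0][0] if node.next_functions else None

print(" <- ".join(names))
```

The last node reached is the one that accumulates the gradient into the leaf tensor `x`.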

## Backpropagation

To backpropagate the error, all we have to do is call `loss.backward()`. This call differentiates the loss with respect to all the model's parameters and stores each gradient in the parameter's `.grad` attribute.

Existing gradients must be cleared first; otherwise the new gradients are accumulated (added) to the previous ones.
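
The accumulation behaviour is easy to observe on a single parameter. This is a small sketch with a made-up scalar `w`; the function 2·w has a gradient of 2 everywhere.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
first = w.grad.item()     # 2.0

(2 * w).backward()        # no reset: the new gradient is added to the old one
second = w.grad.item()    # 4.0

w.grad.zero_()            # reset the gradient, which is the purpose of zero_grad()
(2 * w).backward()
third = w.grad.item()     # 2.0

print(first, second, third)
```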


In [None]:
my_ann.zero_grad()     # clears the gradient buffers of all parameters

print("Gradient of the first layer's bias before backpropagation:")
print(my_ann.conv1.bias.grad)

loss.backward()

print("The same gradient after backpropagation:")
print(my_ann.conv1.bias.grad)

Now that we have seen how to use loss functions and backpropagation, we can train the network on the MNIST dataset of handwritten digits:

In [None]:
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split

# Set PyTorch's random seed for reproducibility
torch.manual_seed(17)

# Learning method's hyperparameters
batch_size = 64
learning_rate = 0.01
epochs = 20

# Apply these transformations to the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # Normalize to [-1, 1]
])

# Downloading and opening data files
dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)

# Split training data into training and validation sets
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

# Use data loaders to handle the batches
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

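# Define the model, the loss and the optimization method
model = ANN()
# ANN.forward already returns softmax probabilities, while nn.CrossEntropyLoss
# expects raw logits (it applies log-softmax itself). To avoid applying softmax
# twice, compute the cross entropy as NLLLoss on the log-probabilities.
nll = nn.NLLLoss()
lossFunc = lambda probs, labels: nll(torch.log(probs.clamp_min(1e-12)), labels)
optimizer = optim.SGD(model.parameters(), lr=learning_rate)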

l_train_loss = []
l_val_loss = []
l_train_acc = []
l_val_acc = []

# Training loop
for epoch in range(epochs):
    # Train step
    model.train()
    train_loss = 0.0
    correct_pred = 0.0
    for batch_samples, batch_real_labels in train_loader:
        optimizer.zero_grad()
        pred_probs = model(batch_samples)
        loss = lossFunc(pred_probs, batch_real_labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * batch_samples.size(0)  # loss is a batch mean; accumulate the batch total
        
        _, pred_labels = torch.max(pred_probs, 1)
        correct_pred += (pred_labels == batch_real_labels).sum().item()

    train_loss = train_loss / train_size
    l_train_loss.append(train_loss)
    train_acc = 100*correct_pred / train_size
    l_train_acc.append(train_acc)

    # Validation step
    model.eval()
    val_loss = 0.0
    correct_pred = 0.0
    with torch.no_grad():
        for batch_samples, batch_real_labels in val_loader:
            pred_probs = model(batch_samples)
            loss = lossFunc(pred_probs, batch_real_labels)
            val_loss += loss.item() * batch_samples.size(0)  # loss is a batch mean; accumulate the batch total

            _, pred_labels = torch.max(pred_probs, 1)
            correct_pred += (pred_labels == batch_real_labels).sum().item()

    val_loss = val_loss / val_size
    val_acc = 100*correct_pred / val_size
    l_val_loss.append(val_loss)
    l_val_acc.append(val_acc)

    print(f'Epoch [{epoch+1}/{epochs}], Train Loss: {train_loss:.4f}, Validation Loss: {val_loss:.4f}, Train Acc.: {train_acc:.4f}, Validation Acc.: {val_acc:.4f}')
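
As background on the cross-entropy loss used for training: `nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies a log-softmax internally. The sketch below, on made-up logits and labels, checks that it matches `F.nll_loss` applied to `F.log_softmax`, and shows that passing probabilities that already went through a softmax gives a different value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # hypothetical raw scores for 4 samples
labels = torch.tensor([3, 1, 7, 0])      # hypothetical class indices

ce = nn.CrossEntropyLoss()(logits, labels)
nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, nll))           # True

# Feeding probabilities instead of logits applies softmax twice.
ce_on_probs = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), labels)
print(ce.item(), ce_on_probs.item())
```

Applying softmax twice flattens the predicted distribution, which tends to shrink the gradients and slow down learning.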


In [None]:
from sklearn.metrics import confusion_matrix,ConfusionMatrixDisplay

# Configuration: do you want to see examples of failed instances?
show_failed_examples = False

# Evaluation on a separate test set
model.eval()
correct_pred = 0.0
cf_matrix = np.zeros((10,10))
with torch.no_grad():
    for batch_samples, batch_real_labels in test_loader:
        pred_probs = model(batch_samples)
        _, pred_labels = torch.max(pred_probs, 1)
        correct_pred += (pred_labels == batch_real_labels).sum().item()
        cf_matrix += confusion_matrix(batch_real_labels, pred_labels, labels=np.arange(10))


        failed_idxs = pred_labels != batch_real_labels
        failed = batch_samples[failed_idxs]
        if len(failed)>0 and show_failed_examples:
            plt.imshow(failed[0][0],aspect='auto')
            plt.axis('equal')
            plt.axis(False)
            plt.text(-3,-2.5, f'Real: {batch_real_labels[failed_idxs][0]} - Predicted: {pred_labels[failed_idxs][0]}')
            plt.show()


accuracy = 100 * correct_pred / len(test_dataset)
print(f'Accuracy on the test set: {accuracy:.2f}%')

cf_matrix /= np.sum(cf_matrix, axis=1, keepdims=True)  # normalize each row (true label) so that it sums to 1
disp = ConfusionMatrixDisplay(confusion_matrix=cf_matrix)
fig, ax = plt.subplots(figsize=(9,9))
disp.plot(ax=ax)

# Plot train and validation loss and accuracy
fig, axs = plt.subplots(1, 2, figsize=(15,5), facecolor='w', edgecolor='k')
axs[0].plot(range(1, epochs + 1), l_train_loss, 'tab:cyan', label="Train loss")
axs[0].plot(range(1, epochs + 1), l_val_loss, 'tab:brown', label="Validation loss")
axs[0].set_xlabel("Epochs")
axs[0].set_ylabel("Loss")
axs[0].legend()

axs[1].plot(range(1, epochs + 1), l_train_acc, 'tab:cyan', label="Train accuracy")
axs[1].plot(range(1, epochs + 1), l_val_acc, 'tab:brown', label="Validation accuracy")
axs[1].set_xlabel("Epochs")
axs[1].set_ylabel("Accuracy")
axs[1].legend()
plt.show()
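
When a confusion matrix is normalized per true label, each row has to be divided by its own total. The sketch below uses a made-up 3-class matrix to show how `keepdims=True` makes NumPy broadcasting do this row by row; without it, the row sums would be broadcast along the columns instead.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are true labels, columns are predictions.
counts = np.array([[8.0, 1.0, 1.0],
                   [2.0, 6.0, 2.0],
                   [0.0, 5.0, 15.0]])

# keepdims=True keeps the row sums as a column vector of shape (3, 1),
# so broadcasting divides every row by its own total.
row_normalized = counts / counts.sum(axis=1, keepdims=True)

print(row_normalized.sum(axis=1))   # each row sums to 1
print(np.diag(row_normalized))      # fraction of each class that was predicted correctly
```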

## Questions
- How does the learning rate affect the learning process?
- Is cross entropy the most appropriate loss function?
- Can the architecture of the ANN be improved?