In [1]:
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torchvision
import torch.nn.functional as F
import torchvision.transforms as transforms

# Activities

## 1. How many parameters does the "vanilla" feedforward network have? And how many parameters does the CNN have?

- Feedforward
    - Architecture: 28 x 28 = 784 inputs, 2 hidden layers of 200 units each, and 10 units in the output layer
    - Weights: 28 x 28 x 200 + 200 x 200 + 200 x 10 = 198,800
    - Biases: 200 + 200 + 10 = 410
    - Total: 198,800 + 410 = 199,210

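- Convolutional
    - Conv 1 (1 to 10 channels, 5 x 5 kernels): 1 x 10 x 5 x 5 + 10 = 260
    - Conv 2 (10 to 20 channels, 5 x 5 kernels): 10 x 20 x 5 x 5 + 20 = 5,020
    - Linear 1: 320 x 50 + 50 = 16,050
    - Linear 2: 50 x 10 + 10 = 510
    - Total: 260 + 5,020 + 16,050 + 510 = 21,840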
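
The parameter counts for both architectures can be checked with a few lines of plain Python. This is a minimal sketch: the helper names `linear_params` and `conv2d_params` are hypothetical, and the layer sizes are taken from the `FeedForwardNetwork(28 * 28, [200, 200], 10)` and `ConvolutionalNetwork` definitions used in this lab. Pooling, ReLU, and flatten layers are left out because they have no trainable parameters.

```python
def linear_params(in_features, out_features):
    """Weights plus one bias per output unit of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel of a square-kernel convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Feedforward network: 784 -> 200 -> 200 -> 10
ff_total = linear_params(28 * 28, 200) + linear_params(200, 200) + linear_params(200, 10)

# CNN: Conv(1->10, 5x5), Conv(10->20, 5x5), Linear(320->50), Linear(50->10)
cnn_total = (conv2d_params(1, 10, 5) + conv2d_params(10, 20, 5)
             + linear_params(320, 50) + linear_params(50, 10))

print(ff_total)   # 199210
print(cnn_total)  # 21840
```

Under these assumptions, the CNN has roughly nine times fewer parameters than the feedforward network.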

## 2. Why does the CNN outperform the "vanilla" feedforward network, even though it has a smaller number of parameters?



## 3. You will now test how both the feedforward network and the CNN handle images that are not centered. To that end, create a new data loader using the following code snippet:
```
nasty_transformator = transforms.Compose(
      [transforms.ToTensor(),
       transforms.RandomAffine(0, translate=[0.1, 0]),
       transforms.Normalize((0.5,), (0.5,))])
nasty_set = torchvision.datasets.FashionMNIST(root="./", download=True, 
                                              train=False,
                                              transform=nasty_transformator)
nasty_loader = torch.utils.data.DataLoader(nasty_set, batch_size=10000)
```
and evaluate both networks on that set. What do you observe? Can you explain it?

# Convolutional Neural Networks

## Preparation

We will start by importing the libraries that we will use throughout. You already encountered most of these in the previous lab.

We will again use the FashionMNIST dataset, which you encountered in the previous lab:

![FashionMNIST sample images](https://miro.medium.com/max/1400/1*RCXpLibVCgoRYckEd2kU8Q.png)

Recall that FashionMNIST consists of 60,000 training images and 10,000 test images. Each example is a 28×28 grayscale image, associated with a label from 10 classes, each corresponding to a piece of clothing:

| Label   |  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  | 
| ------- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | 
| Object  | T-shirt | Trousers | Pullover | Dress | Coat | Sandal | Shirt | Sneaker | Bag | Ankle boot |

We start by loading the dataset and then creating the associated dataloaders, which will be used to sample batches of images during training. However, unlike in the previous lab, we will now be careful to normalize the data, since this helps in training.
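
To make the normalization concrete: `transforms.ToTensor()` scales 8-bit pixel values to the range [0, 1], and `transforms.Normalize((0.5,), (0.5,))` then applies `(pixel - mean) / std` to each pixel, which maps that range to [-1, 1]. The sketch below reproduces the rule on single values in plain Python; the function name `normalize` is hypothetical and is not part of torchvision.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Apply the per-pixel rule (pixel - mean) / std used by transforms.Normalize."""
    return (pixel - mean) / std


for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```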

In [2]:
# The images, once loaded, are transformed in two ways: 
#  - They are turned to tensors
#  - They are normalized

# We construct a "transformator" to perform those two things.
transformator = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# We can now import the training and test data
train_set = torchvision.datasets.FashionMNIST(root="./", download=True, 
                                              train=True,
                                              transform=transformator)

valid_set = torchvision.datasets.FashionMNIST(root="./", download=True, 
                                              train=False,
                                              transform=transformator)

# We now create our dataloaders
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_set, batch_size=10000)

In [3]:
# New test set with off-center images: each image is randomly shifted
# horizontally by up to 10% of its width
nasty_transformator = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomAffine(0, translate=[0.1, 0]),
    transforms.Normalize((0.5,), (0.5,))
])

nasty_set = torchvision.datasets.FashionMNIST(root="./", download=True, 
                                              train=False,
                                              transform=nasty_transformator)
nasty_loader = torch.utils.data.DataLoader(nasty_set, batch_size=10000)

In [4]:
def evaluate(net, Xbatch, ybatch, loss):
  
    # Compute batch size
    batch_size = Xbatch.size(0)

    # We first set the network to "evaluation mode". This is useful, for 
    # example, in dropout layers, which should behave differently in training 
    # and evaluation.
    net.eval()

    # We compute both scores and labels
    outputs = net(Xbatch)
    _, labels = torch.max(outputs, dim=1)

    # Compute loss
    l = loss(outputs, ybatch).item()

    # Compute accuracy
    acc = torch.sum(labels == ybatch.data).double().item() / batch_size
    
    # We reset the network back to training mode
    net.train()
    
    return l, acc
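
The accuracy computation inside `evaluate` can be illustrated without PyTorch. The sketch below uses plain Python lists and made-up scores; `argmax` and `accuracy` are hypothetical helpers that mirror `torch.max(outputs, dim=1)` followed by the comparison with the true labels.

```python
def argmax(values):
    """Index of the largest entry in a list of scores."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(score_rows, targets):
    """Fraction of rows whose highest-scoring class equals the target label."""
    predictions = [argmax(row) for row in score_rows]
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical scores for 4 samples and 3 classes
scores = [[2.0, 0.5, -1.0],
          [0.1, 0.2, 3.0],
          [1.5, 1.7, 0.0],
          [0.3, -0.2, 0.1]]
targets = [0, 2, 0, 0]

print(accuracy(scores, targets))  # 0.75
```

The third sample is the only one misclassified: its highest score belongs to class 1, while its label is 0.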

We also define a general function to train a neural network, which we call `train_network`. The function receives the following arguments:

* The neural network
* The loss function
* The optimizer
* The training data loader
* The validation data loader

We'll also include an optional parameter to specify the number of epochs (we'll use 20 as the default value).

In [5]:
def train_network(net, loss, optimizer, trainloader, testloader, num_epochs=20):
    
    # We start by initializing lists to track the loss and accuracy on the
    # training and validation sets.
    train_losses = []
    train_accuracies = []

    valid_losses = []
    valid_accuracies = []

    for ep in range(num_epochs):
        print('\n- Training epoch: %i -' % ep)

        # We use auxiliary variables to keep track of loss and accuracy within 
        # an epoch
        running_loss = 0.
        running_acc  = 0.
        dataset_size = 0

        for Xbatch, ybatch in trainloader:
            batch_size = Xbatch.size(0)
            dataset_size += batch_size
            
            # We zero-out the gradient
            optimizer.zero_grad()

            # Compute output
            outputs = net(Xbatch)

            # Our outputs are *scores*, so we also compute the predicted labels, 
            # since we need them to check the accuracy
            #
            # To that purpose, we compute the class that maximizes the score. 
            # The max function returns both the maximum value, and the 
            # maximizing entry. We care only about the latter, so we ignore the 
            # first output. 
            #
            # Also, recall that the dimensions of the output are 
            # (batch size, n. classes). We take the maximum over dimension 1,
            # i.e., over the classes
            _, labels = torch.max(outputs, dim=1)

            # Get loss
            l = loss(outputs, ybatch)

            # Compute gradient
            l.backward()
            
            # Perform optimization step
            optimizer.step()

            # Update total running loss. We account for the number of points 
            # in the batch
            running_loss += l.item() * batch_size
             
            # Update the accuracy
            running_acc += torch.sum(labels == ybatch.data).double().item()

        train_losses += [running_loss / dataset_size]
        train_accuracies += [running_acc / dataset_size]

        # Loss and accuracy in the validation set. We take a single batch from
        # the loader; with batch_size=10000 that batch is the whole set.
        val_x, val_y = next(iter(testloader))
        with torch.no_grad():
            aux_l, aux_a = evaluate(net, val_x, val_y, loss)

        valid_losses += [aux_l]
        valid_accuracies += [aux_a]

        print(f'Training loss: {train_losses[-1]:.4f}')
        print(f'Training accuracy: {train_accuracies[-1]:.1%}')
        print(f'Validation loss: {valid_losses[-1]:.4f}')
        print(f'Validation accuracy: {valid_accuracies[-1]:.1%}')

    return net, train_losses, train_accuracies, valid_losses, valid_accuracies
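
A note on why `train_network` multiplies each batch loss by the batch size: `nn.CrossEntropyLoss` averages over the batch by default, and the last batch of an epoch can be smaller than the others. The sketch below shows the arithmetic in plain Python. The dataset and batch sizes come from this lab, but the loss values are made up for illustration.

```python
import math

# FashionMNIST training set size and the batch size used in this lab
num_samples, batch_size = 60000, 64
num_batches = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (num_batches - 1) * batch_size
print(num_batches, last_batch_size)  # 938 32

# Hypothetical (mean loss, batch size) pairs, with a smaller final batch
batches = [(0.5, 64), (0.5, 64), (2.0, 32)]

# Weighted average, as in train_network: sum(loss * size) / sum(size)
weighted = sum(l * n for l, n in batches) / sum(n for _, n in batches)

# Naive average of per-batch means over-weights the small final batch
naive = sum(l for l, _ in batches) / len(batches)

print(weighted, naive)  # 0.8 1.0
```

The weighted value equals the mean loss per sample, which is the quantity stored in `train_losses`.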

## Feed-forward neural network

In [6]:
class FeedForwardNetwork(nn.Module):
    def __init__(self, input_size, layers, output_size):
        ''' The layers parameter is a list containing the number of units in 
            each hidden layer of the network. The number of elements in the 
            list corresponds to the number of hidden layers.'''
        super().__init__()
        
        layer_list = [nn.Flatten()]
        for i in range(len(layers)):
            if i == 0:
                input_dim = input_size
            else:
                input_dim = layers[i-1]

            output_dim = layers[i]
            layer_list += [nn.Linear(input_dim, output_dim), nn.ReLU()]

        # We add one final output layer
        layer_list += [nn.Linear(layers[-1], output_size)]

        # We can now create our module list
        self.layers = nn.ModuleList(layer_list)

    def forward(self, x):
        h = x
        for layer in self.layers:
            h = layer(h)

        return h
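
To see which linear layers the constructor loop produces, the following sketch lists the `(input_dim, output_dim)` pairs for a given configuration. It is plain Python and does not build any PyTorch modules; the function name `linear_layer_shapes` is hypothetical.

```python
def linear_layer_shapes(input_size, layers, output_size):
    """Return the (in_features, out_features) pair of each nn.Linear layer."""
    dims = [input_size] + list(layers) + [output_size]
    return list(zip(dims[:-1], dims[1:]))


print(linear_layer_shapes(28 * 28, [200, 200], 10))
# [(784, 200), (200, 200), (200, 10)]
```

Each pair except the last is followed by a ReLU in `FeedForwardNetwork`; the final pair is the output layer, which produces raw class scores.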

### Initial training set

We now train our neural network on the FashionMNIST dataset. The process is similar in all respects to the one from last week's lab and is abstracted into the function `train_network` defined above. We only need to define the loss and the optimizer.

In [7]:
# We create a network with two hidden layers of 200 units each
two_layer_net_1 = FeedForwardNetwork(28 * 28, [200, 200], 10)

# We use the cross-entropy loss
loss = nn.CrossEntropyLoss()

# We optimize with Adam (the variable is named `sgd`, but the optimizer is Adam)
sgd = torch.optim.Adam(two_layer_net_1.parameters(), lr=0.01)

two_layer_net_1, sgd_lss_trn, sgd_acc_trn, sgd_lss_vld, sgd_acc_vld = train_network(two_layer_net_1, loss, sgd, train_loader, valid_loader, 5)


- Training epoch: 0 -
Training loss: 0.5583
Training accuracy: 80.1%
Validation loss: 0.5113
Validation accuracy: 82.2%

- Training epoch: 1 -
Training loss: 0.4594
Training accuracy: 83.8%
Validation loss: 0.5086
Validation accuracy: 82.9%

- Training epoch: 2 -
Training loss: 0.4321
Training accuracy: 84.5%
Validation loss: 0.4701
Validation accuracy: 82.9%

- Training epoch: 3 -
Training loss: 0.4151
Training accuracy: 85.3%
Validation loss: 0.4539
Validation accuracy: 84.1%

- Training epoch: 4 -
Training loss: 0.4072
Training accuracy: 85.4%
Validation loss: 0.5834
Validation accuracy: 80.7%


We can plot the accuracy and loss during training, and see how they evolve.
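
As a text-only complement to a plot, the logged values can also be inspected directly. The sketch below copies the losses printed by the training run above and checks two things: which epoch has the lowest validation loss, and whether each curve decreases monotonically.

```python
# Values copied from the training log printed above (epochs 0 to 4)
train_losses = [0.5583, 0.4594, 0.4321, 0.4151, 0.4072]
valid_losses = [0.5113, 0.5086, 0.4701, 0.4539, 0.5834]

# Epoch with the lowest validation loss
best_epoch = min(range(len(valid_losses)), key=lambda ep: valid_losses[ep])
print(best_epoch)  # 3

# Training loss decreases at every epoch...
train_decreasing = all(a > b for a, b in zip(train_losses, train_losses[1:]))
# ...while validation loss does not
valid_decreasing = all(a > b for a, b in zip(valid_losses, valid_losses[1:]))
print(train_decreasing, valid_decreasing)  # True False
```

In this run the validation loss rises at epoch 4 while the training loss keeps falling. This may indicate the start of overfitting or noise from the relatively large learning rate, but five epochs are too few to tell the two apart.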

### Nasty test set

In [8]:
val_x, val_y = next(iter(nasty_loader))
with torch.no_grad():
    aux_l, aux_a = evaluate(two_layer_net_1, val_x, val_y, loss)
print(f'Validation loss: {aux_l:.4f}')
print(f'Validation accuracy: {aux_a:.1%}')

Validation loss: 1.4181
Validation accuracy: 60.8%


## Convolutional neural network

We now create a convolutional neural network to process precisely the same dataset. To that end, we create a class `ConvolutionalNetwork` containing two convolutional layers, each followed by max pooling and a ReLU, and two fully connected layers.

In [9]:
class ConvolutionalNetwork(nn.Module):
    def __init__(self):

        # As before, we call the constructor of the parent class
        super().__init__()
        
        # Since the architecture is fixed, we can easily use the Sequential 
        # module to define our architecture.
        self.layers = nn.Sequential(
            nn.Conv2d(1, 10, kernel_size=5),    # 1 channel in, 10 out
            nn.MaxPool2d(2),
            nn.ReLU(),
            nn.Conv2d(10, 20, kernel_size=5),   # 10 channels in, 20 out
            nn.MaxPool2d(2),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(320, 50),
            nn.ReLU(),
            nn.Linear(50, 10))
    
    def forward(self, x):
        return self.layers(x)
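
The input size of 320 in `nn.Linear(320, 50)` follows from the spatial size of the feature maps after the two convolution and pooling stages. The sketch below traces that size in plain Python, assuming the PyTorch defaults used above (stride 1 and no padding for `nn.Conv2d`, non-overlapping windows for `nn.MaxPool2d(2)`). The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel_size):
    """Output side length of a stride-1, unpadded convolution."""
    return size - kernel_size + 1


def pool_out(size, window):
    """Output side length of non-overlapping max pooling."""
    return size // window


size = 28
size = conv_out(size, 5)   # 24
size = pool_out(size, 2)   # 12
size = conv_out(size, 5)   # 8
size = pool_out(size, 2)   # 4

# 20 output channels, each a 4 x 4 feature map
flat_features = 20 * size * size
print(size, flat_features)  # 4 320
```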

### Initial training set

In [10]:
conv_net = ConvolutionalNetwork()

adam_conv = torch.optim.Adam(conv_net.parameters(), lr=0.001)

conv_net, conv_lss_trn, conv_acc_trn, conv_lss_vld, conv_acc_vld = train_network(conv_net, loss, adam_conv, train_loader, valid_loader, 5)


- Training epoch: 0 -
Training loss: 0.5808
Training accuracy: 79.2%
Validation loss: 0.4387
Validation accuracy: 84.4%

- Training epoch: 1 -
Training loss: 0.3775
Training accuracy: 86.3%
Validation loss: 0.3761
Validation accuracy: 86.1%

- Training epoch: 2 -
Training loss: 0.3260
Training accuracy: 88.2%
Validation loss: 0.3442
Validation accuracy: 87.6%

- Training epoch: 3 -
Training loss: 0.3009
Training accuracy: 89.2%
Validation loss: 0.3218
Validation accuracy: 88.4%

- Training epoch: 4 -
Training loss: 0.2810
Training accuracy: 89.8%
Validation loss: 0.3057
Validation accuracy: 89.2%


### Nasty test set

In [11]:
# We reuse the shifted batch (val_x, val_y) drawn from nasty_loader above
with torch.no_grad():
    conv_aux_l, conv_aux_a = evaluate(conv_net, val_x, val_y, loss)
print(f'Validation loss: {conv_aux_l:.4f}')
print(f'Validation accuracy: {conv_aux_a:.1%}')

Validation loss: 0.5047
Validation accuracy: 82.4%
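
On the shifted images, the feedforward network drops to 60.8% accuracy while the CNN keeps 82.4%. One plausible explanation is that convolution is translation-equivariant: shifting the input shifts the feature map, and max pooling then absorbs part of that shift. A fully connected layer, in contrast, ties each weight to a fixed pixel position. The toy 1D sketch below illustrates the difference in plain Python. The signal, filter, and dense weights are made up, and this is an illustration of the idea rather than an analysis of the trained networks.

```python
def shift_right(signal, k):
    """Shift a 1D signal k positions to the right, padding with zeros."""
    return [0] * k + signal[:len(signal) - k]


def correlate_valid(signal, kernel):
    """1D unpadded cross-correlation, the 1D analogue of what nn.Conv2d computes."""
    width = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(width))
            for i in range(len(signal) - width + 1)]


signal = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]   # toy "image" with one feature
shifted = shift_right(signal, 2)          # same feature, moved 2 positions
kernel = [1, 2, 1]                        # hypothetical learned filter

out = correlate_valid(signal, kernel)
out_shifted = correlate_valid(shifted, kernel)

# Convolution: the response moves with the input, and its maximum is unchanged
print(out_shifted == shift_right(out, 2))   # True
print(max(out), max(out_shifted))           # 8 8

# Dense unit with position-specific weights (here, a template of the original)
weights = signal
dense = sum(x * w for x, w in zip(signal, weights))
dense_shifted = sum(x * w for x, w in zip(shifted, weights))
print(dense, dense_shifted)                 # 11 1
```

The convolutional response survives the shift, whereas the dense unit's activation collapses from 11 to 1. The CNN in this lab is still not fully shift-invariant, since its final linear layers see position-dependent features, which is consistent with its accuracy dropping from 89.2% to 82.4%.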
