# Deep Learning and Neural Networks with PyTorch

<p align="center">
  <img src="notebook_images/ImageNet.jpg" height="500" width="500" />
</p>

Above, you see some images from ImageNet. ImageNet is a large dataset of over 14 million labeled images in more than 20,000 categories. From 2010 to 2017, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was held every year, with participants trying to classify images from a subset of ImageNet into the categories they belong to. In 2011 the number 1 entry achieved a top-5 error rate (the fraction of images for which the correct label is not among the model's five most likely guesses) of roughly 25%. The following year, in 2012, the number 1 entry, called AlexNet, scored roughly **16%**, leaving that year's number 2 entry far behind.

<p align="center">
  <img src="notebook_images/ImageNetResults.png" height="500" width="409" />
</p>

How did the score in this competition improve so massively in only one year? Given the title of this workshop it should come as no surprise that AlexNet was a neural network, more specifically a deep learning model. Nowadays, deep learning models are widely used in all sorts of areas, and they are especially successful in most computer vision and natural language processing tasks.

## What is a neural network?

Let's start with the basics. Neural networks are a specific type of machine learning model loosely based on the idea of the neurons in our brain (hence the name). A neural network consists of layers of nodes. You could think of each node as a cell that stores a number. This number is also called the node's activation.

Layers of nodes are connected. Specifically, in a feedforward neural network, each node has incoming connections from the previous layer and outgoing connections to the next layer. Connections between nodes have values, known as weights. These weights are multiplied with the activation in a node in one layer to determine the activation in a receiving node in the next layer. The input a node gets from a node in a preceding layer can be as simple as:

<p align="center">
<b> weight $\times$ activation of previous node </b>
</p>

By connecting layers of nodes into a final layer of a desired output format, we can build machine learning models capable of many tasks. All we have to do is figure out what the weights should be so that an input pattern results in our desired output.
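
As a minimal sketch of this idea, the snippet below computes the input that one receiving node gets from three nodes in the previous layer: each connection contributes weight × activation, and the node adds the contributions up. The values are made up for illustration, and NumPy is used here only to keep the example small.

```python
import numpy as np

# Activations of three nodes in the previous layer (illustrative values)
activations = np.array([0.5, 2.0, 1.0])

# One weight per incoming connection to a single receiving node (illustrative values)
weights = np.array([0.25, -1.0, 0.5])

# Each connection contributes weight * activation
contributions = weights * activations

# The receiving node sums the contributions of all incoming connections
node_input = contributions.sum()
print(node_input)  # -1.375
```

In a real network this sum is usually followed by a bias term and an activation function, both of which are left out of this sketch.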

<p align="center">
  <img src="notebook_images/NeuralNetwork.gif" />
</p>

To take a less abstract look at the concept, let's consider the example above, a graphical representation of how a neural network works. In the example, an image of the number seven is converted into a vector of activation values. Each element of the vector represents the light intensity in a single pixel of that image. This vector forms the input for the first layer of nodes in our network. The input is then passed through the layers of nodes, where we can see the connections between the nodes lighting up as yellow lines. Because the network has been trained, the weights of the connections have values such that the nodes in the input layer eventually excite the right node in the output layer. Note that the output layer consists of 10 nodes, one node for each possible digit.

## Deep learning
Now that we understand what a neural network is, let's talk about deep learning. Deep learning really is no more than using neural networks with many layers. Historically, deep learning models were very computationally intensive to train. In recent years, the use of graphics processing units (GPUs) has massively sped up the training of these models. GPUs are processors originally designed for rendering graphics, but their ability to rapidly perform many calculations in parallel can also be leveraged to train deep learning models.

<p align="center">
  <img src="notebook_images/Features.png" />
</p>

One way to look at the advantage of deep learning is by considering what research has shown from dissecting deep learning image classifiers. In these models, the nodes in the first few layers of the network tend to respond to very small, low-level features of the image, like lines and edges. The nodes in later layers respond to progressively more complex features. Thus, by using many layers in succession, we can extract increasingly complex features to, hopefully, improve the performance of the model.

In [None]:
import numpy as np
import os

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
import torchvision.transforms as transforms
from torchvision.io import read_image
from torchvision.utils import make_grid
import torchvision.transforms.functional as F

import matplotlib.pyplot as plt
from skimage import io, transform
from PIL import Image

## Welcome to PyTorch

Time to get hands-on and start building a neural network ourselves. [PyTorch](https://pytorch.org/) is an open-source deep learning framework available in Python. Today, we will be using it to build our own image classifier.

Let's start with the basics: PyTorch uses its own version of a NumPy array, called a [tensor](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html), to store data.

In [None]:
tensor = torch.tensor([[1, 2],[3, 4]])
tensor

### The value of tensors

As we can see above, a tensor is just like a NumPy array. However, a tensor has two advantages over standard NumPy arrays:

* First, a tensor can run on a GPU, unlike a NumPy array.
* Second, tensors have been optimized for automatic differentiation. We'll get back to the importance of this in a little bit, but automatic differentiation plays an important role in training neural networks.
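
To illustrate how close tensors and NumPy arrays are, here is a small sketch that converts between the two. It assumes a tensor that lives on the CPU; the variable names are ours.

```python
import numpy as np
import torch

array = np.array([[1, 2], [3, 4]])

# torch.from_numpy shares memory with the NumPy array instead of copying it
tensor = torch.from_numpy(array)
print(tensor.shape)  # torch.Size([2, 2])

# Element-wise arithmetic works much like it does in NumPy
doubled = tensor * 2

# Convert back to NumPy (only possible for tensors on the CPU)
back = doubled.numpy()
print(back.tolist())  # [[2, 4], [6, 8]]
```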

In [None]:
# If we'd like to move a tensor to a GPU, here's how we'd do it:
if torch.cuda.is_available():
    tensor = tensor.to("cuda")

print(f'Tensor is currently running on: {tensor.device}')

Tensors will form the input to the neural network we will be building. In this case, we would like to build a classifier that predicts which of eight countries an image of a mailbox comes from. The countries we'll be using are Germany, Spain, France, Ireland, Japan, the Netherlands, the United Kingdom, and the United States.

In [None]:
classes = ["DE", "ES", "FR", "IE", "JP", "NL", "UK", "US"]

## Dataset

To build our classifier, we will need some data in the form of images of mailboxes from different countries. The dataset has already been provided for us. The `data/train` folder contains the images of mailboxes we will use for our training set, while `data/test` contains the images we will use for our validation set. For convenience, all images have already been scaled to a fixed size of 256 by 256 pixels (we will shrink them further to 64 by 64 pixels when loading them). Because neural networks have a fixed number of nodes per layer, fixing the size of the images like this tends to be an important step in pre-processing our data.

Let's try and load an image from each class to see what these mailboxes look like.

In [None]:
image_list = []
countries = []

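# Sort the class folders so the images appear in a fixed, alphabetical order
for folder in sorted(os.listdir('data/train')):
    # Skip stray files (e.g. hidden system files); only class folders hold images
    if not os.path.isdir(f'data/train/{folder}'):
        continue
    # Take the first image of each class as an example
    file = sorted(os.listdir(f'data/train/{folder}'))[0]
    image = Image.open(f'data/train/{folder}/{file}')
    image_list.append(image)
    countries.append(folder)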

fig, axs = plt.subplots(ncols=len(image_list),
                        squeeze=False,
                        figsize=(13, 5))

for i, img in enumerate(image_list):
    axs[0, i].imshow(np.asarray(img))
    axs[0, i].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[], title=countries[i])

Now that we have some image data to work with, we'll need to instantiate our dataset. PyTorch offers the functionality to build your own [custom datasets](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#creating-a-custom-dataset-for-your-files), which offers us the ability to iterate through any data we could load as a tensor.

Luckily for us, PyTorch also readily implements [many options for image datasets](https://pytorch.org/vision/stable/datasets.html). In this case, we will be using the [ImageFolder class](https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageFolder.html#torchvision.datasets.ImageFolder), which allows us to create a dataset from a folder, with the names of the labels as subfolders.

## From image to tensor
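
To make the relation between an image and a tensor concrete: an RGB image is a grid of pixels with three intensity values each, usually stored as integers from 0 to 255 in (height, width, channels) order. PyTorch models expect floats in (channels, height, width) order, and `transforms.ToTensor()` performs this conversion while rescaling the values to the range 0 to 1. The sketch below mimics that conversion by hand on a made-up 2 by 2 image; it is an illustration, not the torchvision implementation.

```python
import numpy as np
import torch

# A made-up 2x2 RGB image in the usual image layout (height, width, channels),
# with integer intensities between 0 and 255
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# Reorder to (channels, height, width) and rescale to floats between 0 and 1
tensor = torch.from_numpy(image).permute(2, 0, 1).float() / 255

print(tensor.shape)        # torch.Size([3, 2, 2])
print(tensor[0].tolist())  # red channel: [[1.0, 0.0], [0.0, 1.0]]
```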

In [None]:
# We will need this bit of code to transform our images into tensors
base_transformations = [
    transforms.Resize((64,64)),
    transforms.ToTensor(),
]

augmentation_transformations = [
#     transforms.RandomRotation(degrees=(-45, 45))
]

train_transformations = transforms.Compose(base_transformations + augmentation_transformations)
test_transformations = transforms.Compose(base_transformations)

train_set = ImageFolder("data/train", transform=train_transformations)
val_set = ImageFolder("data/test", transform=test_transformations)

To iterate through our dataset, we'll need PyTorch's [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), which creates an iterator that loads the mailbox images in batches, optionally in random order. Let's create two: one for our training set, which we shuffle, and one for our validation set, which we do not.

We will also need to pass the batch size of our data loader, which determines how many images are loaded per iteration. You might wonder why we do not just pass the full dataset at once. There are two reasons for this. First, neural networks tend to train faster when the data is passed in batches, because the weights are updated after every batch instead of once per pass over the full dataset. Second, it requires less memory when training the network. In the case of large, raw files like images, this is especially helpful when the full dataset does not fit in memory at once.

There is lots of discussion regarding the ideal batch size to use, and it is very much a parameter you can tune while building a neural network. For this use case, we will use a batch size of 32, which convention states is a good starting point.
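
To see what batching does in practice, here is a small sketch that uses random stand-in data instead of the mailbox images. `TensorDataset` and the sample count of 100 are assumptions made purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 100 fake "images" of shape (3, 64, 64) with labels from 8 classes
images = torch.rand(100, 3, 64, 64)
labels = torch.randint(0, 8, (100,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=32, shuffle=True)

# 100 samples in batches of 32 gives 4 batches; the last one holds the remaining 4
batch_sizes = [x.shape[0] for x, y in loader]
print(len(loader))  # 4
print(batch_sizes)  # [32, 32, 32, 4]
```

Note that the last batch is smaller than the rest whenever the dataset size is not a multiple of the batch size.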

In [None]:
train_dataloader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=0)
val_dataloader = DataLoader(val_set, batch_size=32, shuffle=False, num_workers=0)

In [None]:
def show_batch(img):
    npimg = img.numpy()
    plt.figure(figsize=[15, 15])
    plt.imshow(np.transpose(npimg, (1,2,0)), interpolation='nearest')

w = next(iter(train_dataloader))[0]
grid = make_grid(w, nrow=8)
show_batch(grid)

We should now be able to iterate through our dataset and load images in batches.

In [None]:
for x, y in train_dataloader:
    print(x.shape)
    print(y)
    break

# **Building the model**

In [None]:
# Get device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Now that we have a way to load in our data and pass it to our neural network in batches, we can start actually building the network. This is done by creating a new class, which inherits from PyTorch's [nn.Module class](https://pytorch.org/docs/stable/generated/torch.nn.Module.html).

Subsequently, we will need to actually build the layers of our neural network. We can use the building blocks from the [torch.nn module](https://pytorch.org/docs/stable/nn.html). This module contains layers we can use in our neural network. In the class we have created for our neural network, we can then define the layers we want to use in \_\_init\_\_.

Below, we have built a model consisting of [nn.Linear layers](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear), also known as fully connected layers. These are layers where each node in the layer has a connection to each node in the succeeding layer. Each fully connected layer is separated from the next by an [nn.ReLU layer](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html). This layer is also known as an activation function and serves to modify the activation provided by the preceding linear layer. Specifically, a ReLU layer sets all inputs below 0 to 0 and leaves the rest of the inputs as is. The main reason to use such an activation function is to introduce non-linearities into the network: without them, a stack of linear layers would collapse into a single linear transformation.
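
As a quick illustration of these two building blocks, the sketch below applies a ReLU to a handful of made-up values and checks the shapes of a small fully connected layer. The layer sizes (4 inputs, 3 outputs) are chosen only for this example.

```python
import torch
from torch import nn

# ReLU sets negative inputs to 0 and leaves the rest unchanged
relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
relu_out = relu(x)
print(relu_out.tolist())  # [0.0, 0.0, 0.0, 1.5]

# A fully connected layer from 4 input nodes to 3 output nodes holds
# one weight per connection (a 3x4 matrix) plus one bias per output node
linear = nn.Linear(4, 3)
print(linear.weight.shape)  # torch.Size([3, 4])
print(linear(x).shape)      # torch.Size([3])
```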

You'll also notice an [nn.Flatten layer](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html). Our fully connected layers expect each sample as a flat vector rather than a multi-dimensional tensor, such as our image tensors with three channels representing the RGB values. The flatten layer therefore converts each image in a batch to a 1-dimensional vector. While there are ways to deal with this more elegantly, these methods are beyond the scope of today's workshop.
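
The effect of flattening is easiest to see from the shapes. The sketch below uses a random stand-in batch with the same dimensions as our images.

```python
import torch
from torch import nn

# A fake batch of 32 RGB images of 64 by 64 pixels: (batch, channels, height, width)
batch = torch.rand(32, 3, 64, 64)

# By default, nn.Flatten keeps the batch dimension and flattens everything after it
flat = nn.Flatten()(batch)
print(flat.shape)  # torch.Size([32, 12288]), since 3 * 64 * 64 = 12288
```

This is why the first linear layer of the model takes `3*64*64` inputs.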

We've wrapped some layers in [nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html). This container chains together any number of layers that we would like to apply in sequence.

The last piece of the puzzle is the forward() method. This will define through which layers the samples we pass to our network will go.

In [None]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            # The input to the first layer matches the size of 1 image
            nn.Linear(3*64*64, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
        )
        # The output of the last layer matches the number of classes
        self.last_layer = nn.Sequential(
            nn.Linear(64, 8)
        )

    def forward(self, x):
        x = self.flatten(x)
        x = self.linear_relu_stack(x)
        x = self.last_layer(x)
        return x

model = NeuralNetwork().to(device)
print(model)


## Model summary
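
A quick way to summarise the model without extra packages is to count its trainable parameters. The sketch below rebuilds the same layer sizes as a single `nn.Sequential` (the variable name `sketch` is ours) so that it runs on its own.

```python
from torch import nn

# Same layer sizes as the NeuralNetwork class, written as one Sequential
sketch = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 8),
)

# Each parameter tensor reports its number of elements via numel()
n_params = sum(p.numel() for p in sketch.parameters())
print(n_params)  # 791176
```

Almost all of these parameters sit in the first layer (12288 × 64 weights plus 64 biases), which is a direct consequence of feeding every pixel value into a fully connected layer.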

# **Training the model**

We have now defined a model: all that is left to do is train it.

To train our model, we will need a couple of things. First, we'll need a loss function, which will quantify the dissimilarity between the output from our neural network and our target. For classification problems like the one at hand, a common choice for a loss function is [nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html).
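
To get a feel for what this loss measures, the sketch below compares `nn.CrossEntropyLoss` with a manual calculation on made-up outputs for two samples and three classes. Note that the loss function expects the raw outputs of the last layer (logits) and class indices as targets.

```python
import torch
from torch import nn

# Raw model outputs (logits) for 2 samples and 3 classes; the values are made up
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # index of the correct class for each sample

loss = nn.CrossEntropyLoss()(logits, targets)

# The same quantity by hand: softmax turns logits into probabilities, and the
# loss is the mean negative log-probability assigned to the correct class
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs[torch.arange(2), targets]).mean()

print(loss.item(), manual.item())
```

The more probability the model assigns to the correct class, the closer the loss gets to 0.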

Once we know this quantity for a batch of samples, also known as the loss, we will need to update the model parameters to gradually bring it down. In a neural network, this is done via backpropagation. The exact technical details of this algorithm are not important for now. The basic idea is that backpropagation computes the gradient of the loss function with respect to the weights. This gradient tells us how the loss changes when each weight changes, which gives us a general idea of how to update our weights in order to bring our loss down.
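
This is where the automatic differentiation of tensors comes in. As a minimal sketch, consider a "network" with a single weight and a squared-error loss; all values are made up for illustration.

```python
import torch

# One weight, one input and one target
w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)
target = torch.tensor(10.0)

# Squared-error loss of a one-weight "network": (w * x - target) ** 2
loss = (w * x - target) ** 2

# backward() fills w.grad with d(loss)/dw = 2 * (w * x - target) * x
loss.backward()
print(w.grad.item())  # -16.0
```

The negative gradient tells us that increasing `w` would decrease the loss, which is exactly the information an optimizer needs.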

<p align="center">
  <img src="notebook_images/GD.gif" width="700" height="397" />
</p>

In our case, PyTorch will do the heavy lifting for us via an optimization algorithm. Here we use [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). PyTorch implements such optimization algorithms in the [torch.optim](https://pytorch.org/docs/stable/optim.html) module. We'll also set a learning rate, which determines how large the updates to the weights are: the larger the learning rate, the more they change. If the learning rate is too large, the weights change too much on each update to settle into a minimum. If it is too small, training becomes very slow and we may get stuck in a poor local minimum. Thus, the learning rate has a large impact on training the network.
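
The effect of the learning rate can be sketched with plain gradient descent on a toy loss, `loss(w) = w ** 2`, whose minimum lies at `w = 0`. The function `descend` and the three learning rates are made up for this illustration. Because this toy loss has only one minimum, it shows slow convergence and divergence, but not the problem of local minima.

```python
def descend(lr, steps=20, w=5.0):
    """Run plain gradient descent on loss(w) = w ** 2 and return the final w."""
    for _ in range(steps):
        grad = 2 * w       # derivative of w ** 2
        w = w - lr * grad  # step against the gradient
    return w

# Too small: barely moves. Reasonable: close to 0. Too large: moves away from 0.
for lr in (0.001, 0.1, 1.1):
    print(lr, descend(lr))
```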

In [None]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

We'll create a function for our training loop. The training loop really consists of a few steps:

1. Create predictions for a single batch of images and calculate loss.
2. Set previously calculated gradients to zero
3. Backpropagate the loss through the layers of the network to calculate gradients
4. Update the weights



In [None]:
def train(dataloader, model, loss_fn, optimizer):
    num_batches = len(dataloader)

    # Set the model to train mode
    model.train()
    train_loss = 0

    for batch, (X, y) in enumerate(dataloader):
        # Push X and y tensors to relevant device (CPU or GPU)
        X, y = X.to(device), y.to(device)

        # 1. Create predictions for a single batch of images and calculate loss.
        pred = model(X)
        loss = loss_fn(pred, y)

        # 2. Set previously calculated gradients to zero
        optimizer.zero_grad()

        # 3. Backpropagate the loss through the layers of the network to calculate gradients
        loss.backward()

        # 4. Update the weights
        optimizer.step()

        # Accumulate the loss of this batch
        train_loss += loss.item()

    # Average the loss over all batches in this epoch
    avg_loss = train_loss / num_batches
    return avg_loss

We'll also create a function for testing on our validation data. Here, we do not want to execute the backpropagation algorithm, as that would mean we would be optimizing our model on validation data. Thus, we leave out a couple of steps in this function.

In [None]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)

    # Set the model to eval mode
    model.eval()
    test_loss, correct = 0, 0

        # Disable gradient tracking; no backpropagation is needed for evaluation
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)

            # Generate predictions
            preds = model(X)

            # Calculate loss and number of correct predictions
            test_loss += loss_fn(preds, y).item()
            correct += (preds.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size

    return correct*100, test_loss

We'll log the progress of training in epochs. An epoch is one full pass of the training data through the training loop.

In [None]:
epochs = 50

best_score = 0.

for epoch in range(epochs):
    # Train the model on train dataset
    train_loss = train(train_dataloader, model, loss_fn, optimizer)
    
    # Test the model on the test dataset
    test_correct, test_loss = test(val_dataloader, model, loss_fn)
    
    print(f'Epoch: {epoch+1} | Train loss: {train_loss:>5f} | Test loss: {test_loss:>5f} | Correctly classified: {test_correct:>0.1f}%')
    best_score = max(best_score, test_correct)
    
    
print(f'Best score: {best_score:>0.1f}%')  

## Now it's your turn!

Now that we know how to build a neural network, it is your time to shine! Your goal is to create as accurate a classifier as possible. Below you'll find all the code needed to build your own neural network. You're given 50 epochs to train your model, and your score is the best accuracy reached on the validation set within these 50 epochs.

Here are some things you can experiment with to get started on building your own model:

* Number of layers
* Number of nodes
* Activation functions
* Regularisation/dropout
* Learning rate
* Batch size
* Optimization algorithm
* Different types of layers (especially convolutional layers with pooling)

* Last but not least, data preparation is very important. In your training set, you can add augmentation to artificially create more variety in your dataset. One augmentation is already given. You can find more in the [torchvision transforms examples](https://pytorch.org/vision/stable/auto_examples/plot_transforms.html#sphx-glr-auto-examples-plot-transforms-py).
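
As one possible starting point for the convolutional and dropout suggestions, here is a hedged sketch of a small convolutional network for 64 by 64 RGB images and 8 classes. The class name `SmallConvNet`, the channel counts, and the dropout rate are our own choices for illustration, not tuned values.

```python
import torch
from torch import nn

class SmallConvNet(nn.Module):  # hypothetical name, not part of the workshop code
    def __init__(self, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # (3, 64, 64) -> (16, 64, 64)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> (16, 32, 32)
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> (32, 32, 32)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> (32, 16, 16)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),  # randomly zeroes activations during training
            nn.Linear(32 * 16 * 16, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Shape check on a fake batch of 4 images
out = SmallConvNet()(torch.rand(4, 3, 64, 64))
print(out.shape)  # torch.Size([4, 8])
```

Unlike the fully connected model, the convolutional layers work on the image tensor directly, so flattening only happens right before the final linear layer.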

In [None]:
# We will need this bit of code to transform our images into tensors
base_transformations = [
    transforms.Resize((64,64)),
    transforms.ToTensor(),
]

augmentation_transformations = [
    transforms.RandomRotation(degrees=(-45, 45))
]

train_transformations = transforms.Compose(base_transformations + augmentation_transformations)
test_transformations = transforms.Compose(base_transformations)

train_set = ImageFolder("data/train", transform=train_transformations)
val_set = ImageFolder("data/test", transform=test_transformations)

In [None]:
train_dataloader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=0)
val_dataloader = DataLoader(val_set, batch_size=32, shuffle=False, num_workers=0)

In [None]:
# Show transformations
w = next(iter(train_dataloader))[0]
grid = make_grid(w, nrow=8)
show_batch(grid)

In [None]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            # The input to the first layer matches the size of 1 image
            nn.Linear(3*64*64, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU()           
        )
        # The output of the last layer matches the number of classes
        self.last_layer = nn.Sequential(
            nn.Linear(64, 8)
        )

    def forward(self, x):
        x = self.flatten(x)
        x = self.linear_relu_stack(x)
        x = torch.flatten(x, 1)
        x = self.last_layer(x)
        return x

model = NeuralNetwork().to(device)
print(model)


In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

In [None]:
epochs = 50

best_score = 0.

for epoch in range(epochs):
    # Train the model on train dataset
    train_loss = train(train_dataloader, model, loss_fn, optimizer)
    
    # Test the model on the test dataset
    test_correct, test_loss = test(val_dataloader, model, loss_fn)
    
    print(f'Epoch: {epoch+1} | Train loss: {train_loss:>5f} | Test loss: {test_loss:>5f} | Correctly classified: {test_correct:>0.1f}%')
    best_score = max(best_score, test_correct)

print(f'Your final score is: {best_score:>0.1f}%')

In [None]:
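image_list = []
labels = []
preds = []

# Run inference in eval mode and without gradient tracking
model.eval()
with torch.no_grad():
    for i, (x, y) in enumerate(val_set):
        # Load the original image from disk for display
        image = read_image(val_set.imgs[i][0])
        image_list.append(image)
        labels.append(y)

        # Add a batch dimension and move the sample to the model's device
        pred = model(x.unsqueeze(0).to(device))
        preds.append(pred.argmax(1).item())

# Size the grid to the number of validation images (9 per row)
ncols = 9
nrows = (len(image_list) + ncols - 1) // ncols
fig, axs = plt.subplots(ncols=ncols,
                        nrows=nrows,
                        squeeze=False,
                        figsize=(20, 50))

# Hide the unused cells in the last row of the grid
for ax in axs.flat[len(image_list):]:
    ax.axis('off')

for i, img in enumerate(image_list):
    img = F.to_pil_image(img)
    axs[i // ncols, i % ncols].imshow(img)
    axs[i // ncols, i % ncols].set(xticklabels=[], yticklabels=[], xticks=[], yticks=[], title=f"Real: {classes[labels[i]]}\nPred: {classes[preds[i]]}")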