# D5: Introduction to Convolutional Neural Networks

Today we will learn how to make a Convolutional Neural Network (CNN) for classifying images in the CIFAR-10 dataset.

To start with, we will briefly introduce:

*   why CNNs are used for image classification

*   the CNN pipeline, convolutional layers, and max pooling layers

*   CNN applications

*   the CIFAR-10 dataset




## Why CNN
A convolutional neural network (CNN, or ConvNet) is an architecture commonly used for deep learning. CNNs have become increasingly popular for three main reasons:

*   They eliminate the need for manual feature extraction, i.e., the features are learnt directly by the CNN.

*   They produce state-of-the-art recognition results.

*   They can be retrained for new recognition tasks, which allows building on pre-existing networks.


A convolutional neural network can have tens or hundreds of layers that each learn to detect different features of an image. Filters are applied to each training image at different resolutions and the output of each convolved image is used as the input to the next layer. The filters can start as very simple features, such as brightness and edges, and increase in complexity to features that uniquely define the object as the layers progress.
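
To make the idea of a filter concrete, here is a minimal sketch that applies a single hand-written 3x3 vertical-edge filter to a toy image. The image and the kernel values are assumptions chosen for illustration; in a CNN the kernel values are learnt during training instead of being fixed by hand.

```python
import torch
import torch.nn.functional as F

# Toy 6x6 grayscale image: dark left half, bright right half.
image = torch.zeros(1, 1, 6, 6)  # (batch, channels, height, width)
image[:, :, :, 3:] = 1.0

# Hand-picked vertical-edge kernel (Sobel-like), shaped
# (out_channels, in_channels, kernel_height, kernel_width).
kernel = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]]).reshape(1, 1, 3, 3)

# No padding and stride 1, so the 6x6 input becomes a 4x4 feature map.
response = F.conv2d(image, kernel)
print(response[0, 0])
```

The feature map is non-zero only in the two middle columns, where the window straddles the dark-to-bright edge; every row of the output is `[0., 4., 4., 0.]`.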


### CNNs for image classification

One method for creating a convolutional neural network to perform image classification is to train a network from scratch. The architect has to define the number of layers, the number of filters, and how the weights are initialised, along with other tunable parameters. Training an accurate model from scratch also requires massive amounts of data, on the order of millions of samples, and training on that much data can take an immense amount of time.

A common alternative to training a CNN from scratch is to use a pre-trained model to automatically extract features from a new data set. This method, called transfer learning, is a convenient way to apply deep learning without a huge data set and long computation and training time.

## CIFAR-10 dataset
CIFAR-10 classification is a common benchmark problem in machine learning. The task is to classify 32×32 pixel RGB images into 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The CIFAR-10 dataset consists of 60000 images, with 6000 images per class: 50000 for training and the remaining 10000 for testing.
For more details, refer to the [CIFAR-10 website](https://www.cs.toronto.edu/~kriz/cifar.html).

In the code below, `X_train` and `X_test` are tensors of RGB image data with shape (num_samples, 32, 32, 3), and `y_train` and `y_test` are tensors of category labels (0-9) with shape (num_samples,).

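In PyTorch you can load CIFAR-10 with `torchvision.datasets.CIFAR10()`, which returns a dataset for either the training or the test data, depending on whether you pass `train=True` or `train=False`. The raw image data (the dataset's `data` attribute) has the shape (num_samples, 32, 32, 3). With the `ToTensor()` transform used below, each image served by the data loaders is converted to a float tensor of shape (3, 32, 32).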
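
The raw array is channels-last, while PyTorch convolution layers expect channels-first input of shape (num_samples, channels, height, width). The following sketch shows the conversion by hand on a random stand-in tensor (the random data is an assumption, used so the snippet runs without downloading the dataset).

```python
import torch

# Stand-in for the raw image array: 4 random uint8 images in (N, H, W, C) layout.
raw = torch.randint(0, 256, (4, 32, 32, 3), dtype=torch.uint8)

# Move the channel axis in front of height and width, then scale to [0, 1].
batch = raw.permute(0, 3, 1, 2).float() / 255.0

print(tuple(raw.shape), "->", tuple(batch.shape))  # (4, 32, 32, 3) -> (4, 3, 32, 32)
```

When you iterate over the data loaders this step is not needed, because the `ToTensor()` transform already performs it per image.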

## Initial imports

In [None]:
import numpy as np
import os
import typing
import torch
import torchvision
from torch.nn import BatchNorm2d, Conv2d, MaxPool2d, Flatten, Linear, Dropout
import torch.nn.functional as F
from matplotlib import pyplot as plt

print(np.__version__)
print(torch.__version__)
print(torchvision.__version__)

# set up model to use GPU if available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

## Exercise 1: Build a CNN with PyTorch for CIFAR-10 classification (with the structure depicted in the following figure)
![cnn pipeline](https://cdn-images-1.medium.com/fit/t/1600/480/1*vkQ0hXDaQv57sALXAJquxA.jpeg)

Steps: there are no prescribed steps. Start by brainstorming how to build your own CNN.
You do not need to implement the convolution computation yourself; you may use `torch.nn.Conv2d` and `torch.nn.MaxPool2d` directly.
You still need to initialise the weights and biases properly and give the tensors the correct shapes to make it work.
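
PyTorch layers come with a default initialisation, so explicit initialisation is optional. If you want to control it, one possible approach is sketched below. The helper name `init_weights`, the choice of He (Kaiming) initialisation, and the tiny placeholder stack are all assumptions for illustration, not the intended solution to the exercise.

```python
import torch

def init_weights(module: torch.nn.Module) -> None:
    """Hypothetical helper: He-initialise conv/linear weights, zero the biases."""
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        torch.nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            torch.nn.init.zeros_(module.bias)

# Tiny placeholder stack, only to demonstrate `apply`; not a suggested solution.
layers = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),  # keeps 32x32 spatial size
    torch.nn.ReLU(),
    torch.nn.Flatten(),                               # 8 * 32 * 32 features
    torch.nn.Linear(8 * 32 * 32, 10),
)
layers.apply(init_weights)  # calls init_weights on every submodule

# A batch of two CIFAR-sized images yields one row of 10 logits per image.
print(layers(torch.zeros(2, 3, 32, 32)).shape)
```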

### Your goal: more than 70% accuracy on the whole test set (stay above 70% for at least 3 consecutive epochs).


In [None]:
BATCH_SIZE = 32

# Load the CIFAR-10 data
train_data = torchvision.datasets.CIFAR10(
    root="./data", 
    train=True, 
    transform=torchvision.transforms.ToTensor(), 
    download=True
)

test_data = torchvision.datasets.CIFAR10(
    root="./data", 
    train=False, 
    transform=torchvision.transforms.ToTensor(),
    download=True
)

train_loader = torch.utils.data.DataLoader(
    train_data,
    batch_size=BATCH_SIZE,
    shuffle=True,
    drop_last=False
)

test_loader = torch.utils.data.DataLoader(
    test_data,
    batch_size=BATCH_SIZE,  # evaluate in batches instead of one image at a time
    shuffle=False,
)

X_train, y_train = torch.tensor(train_data.data), torch.tensor(train_data.targets)
X_test, y_test = torch.tensor(test_data.data), torch.tensor(test_data.targets)

assert X_train.shape == (50000, 32, 32, 3)
assert X_test.shape == (10000, 32, 32, 3)
assert y_train.shape == (50000,)
assert y_test.shape == (10000,)

X_train.shape, y_train.shape, X_test.shape, y_test.shape

## Provided code from the days before

In [None]:
BATCH_SIZE = 32

def single_model_step(
    model: torch.nn.Module, 
    optimizer: torch.optim.Optimizer,
    loss_function: typing.Callable,
    training: bool,
    X: torch.Tensor, 
    y: torch.Tensor,
    device: str
) -> float:
    r"""Single model training/evaluation step.

    Args:
        model: pytorch model to be trained
        optimizer: optimizer wrapping pytorch model
        loss_function: loss function
        training: flag controlling whether this is a training
            or an evaluation step
        X: pytorch tensor containing features
        y: pytorch tensor containing labels
        device: device on which the model is located
        
    Returns:
        float: loss at current step
    """
    pred = model(X.to(device))
    loss = loss_function(pred, y.to(device))
    if training:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.cpu().item()

def inference_model(
    model: torch.nn.Module, 
    loader: torch.utils.data.DataLoader,
    num_classes: int,
    device: str
) -> typing.Tuple[torch.Tensor, torch.Tensor]:
    r"""Evaluate model on dataloader.
    
    Iterates over loader until it is exhausted 
    and stores the model output (logits) and labels
    in tensors.
    
    Args:
        model: model to be evaluated
        loader: dataloader containing data to evaluate model on
        num_classes: number of classes for the classification task
        device: string denoting device on which to run evaluation
        
    Returns:
        torch.Tensor: model outputs (logits)
        torch.Tensor: ground truth labels
    """
    model.eval()
    logits = torch.zeros((len(loader.dataset), num_classes))
    labels = torch.zeros((len(loader.dataset)))
    start = 0
    with torch.no_grad():
        for X, y in loader:
            # Track the offset with the actual batch length, so the result
            # is correct for any loader batch size (not only BATCH_SIZE).
            end = start + len(y)
            logits[start:end, :] = model(X.to(device)).cpu()
            labels[start:end] = y
            start = end
    return logits, labels

def get_accuracy(labels: torch.Tensor, logits: torch.Tensor) -> float:
    r"""Compute accuracy.
    
    Args:
        labels: ground truth labels
        logits: model predictions
    
    Returns:
        float: single accuracy value
    """
    # The predicted class is the index of the largest logit in each row.
    prediction_indices = torch.argmax(logits, dim=-1)
    correct_predictions = labels == prediction_indices
    accuracy = correct_predictions.sum() / len(correct_predictions)
    return accuracy.item()



def train_and_evaluate(
    model: torch.nn.Module, 
    loss_function: typing.Callable, 
    optimizer: torch.optim.Optimizer, 
    train_loader: torch.utils.data.DataLoader,
    test_loader: torch.utils.data.DataLoader,
    num_classes: int,
    epochs: int,
    device: str,
    verbose: bool,
):
    r"""Run training and evaluation.
    
    Args:
        model: pytorch model to be trained
        loss_function: loss function
        optimizer: optimizer wrapping pytorch model
        train_loader: dataloader containing training data
        test_loader: dataloader containing test data
        num_classes: number of classes for the classification task
        epochs: number of epochs for which to train model
        device: device to where model is located
        verbose: flag controlling whether to print information
        
    Returns:
        float: accuracy on test set after last epoch
    """
    for epoch in range(epochs):
        model.train()
        train_loss = 0
        for train_steps, (X, y) in enumerate(train_loader):
            train_loss += single_model_step(
                model=model, 
                loss_function=loss_function, 
                optimizer=optimizer, 
                training=True,
                X=X, 
                y=y,
                device=device
            )
        # Divide by the number of batches; the last enumerate index is one too small.
        train_loss /= len(train_loader)
        
        model.eval()
        test_loss = 0
        with torch.no_grad():
            for test_steps, (X, y) in enumerate(test_loader):
                test_loss += single_model_step(
                    model=model, 
                    loss_function=loss_function, 
                    optimizer=optimizer, 
                    training=False,
                    X=X, 
                    y=y,
                    device=device
                )
        test_loss /= len(test_loader)
        
        
        logits, labels = inference_model(
            model=model, 
            loader=test_loader,
            num_classes=num_classes,
            device=device
        )
        accuracy = get_accuracy(
            labels=labels,
            logits=logits
        )
        # Only print the per-epoch summary when requested via `verbose`.
        if verbose:
            print(f"\n---------- EPOCH {epoch + 1} ------------")
            print(f"Average Train Loss: {train_loss}")
            print(f"Average Test Loss: {test_loss}")
            print(f"Test Accuracy: {accuracy}")
        
    return accuracy

## Your turn, have fun! :)

Use the data loaders `train_loader` and `test_loader` from above for your training. The only thing you should have to do is create your own model. To train and evaluate it, you can simply use the functions provided on the previous days (also included above). A short reminder: you may use `torch.nn.Conv2d` and `torch.nn.MaxPool2d` directly. However, don't forget to initialise the weights and biases properly and to give the tensors the correct shapes.
Achieve a test accuracy of more than 70% on the whole test set for at least 3 consecutive epochs.


In [None]:
# Your Code

## Exercise 2: Shapes and number of parameters

Compute the shapes of your convolutional and max pooling layers as well as the number of trainable parameters of your convolutions and fully-connected layers. Use the formulas below.

### Formulas for the calculation of the shapes and trainable parameters

#### Number of trainable parameters:

n_params_dense = n_input \* n_output + n_output, i.e., n_weight_matrix + n_bias


n_params_conv = kernel_size_h \* kernel_size_w \* n_filters_in \* n_filters_out + n_filters_out 

Every region of size (kernel_size_w x kernel_size_h x n_filters_in) shares the same weights. Each region produces n_filters_out output values, and one bias per output filter is added afterwards, hence the + n_filters_out.
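
As a sanity check, the two formulas can be compared with the parameter counts PyTorch reports. The helper names and the layer sizes below are arbitrary assumptions chosen for illustration.

```python
import torch

def n_params_conv(kernel_h: int, kernel_w: int, n_in: int, n_out: int) -> int:
    """Weights of all filters plus one bias per output filter."""
    return kernel_h * kernel_w * n_in * n_out + n_out

def n_params_dense(n_input: int, n_output: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return n_input * n_output + n_output

def count_trainable(module: torch.nn.Module) -> int:
    """Number of trainable parameters that PyTorch holds for a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
dense = torch.nn.Linear(in_features=128, out_features=10)

print(n_params_conv(3, 3, 3, 16), count_trainable(conv))   # 448 448
print(n_params_dense(128, 10), count_trainable(dense))     # 1290 1290
```

Max pooling layers have no trainable parameters, so they do not contribute to the total.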

#### Output shape of a convolutional layer:
1. output_width = (input_width - kernel_size + 2 \* amount_zero_padding) / stride + 1
2. output_height = (input_height - kernel_size + 2 \* amount_zero_padding) / stride + 1
3. output_depth = n_filters

#### Output shape of a max pooling layer:
1. output_width = (input_width - kernel_size) / stride + 1
2. output_height = (input_height - kernel_size) / stride + 1
3. output_depth = input_depth (pooling does not change the number of channels)

If the result of the division is not an integer, it is rounded down.
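
The formulas above can be sketched as a small helper and checked against the shapes PyTorch produces. The function name `output_size` and the layer settings are assumptions for illustration; integer division implements the rounding down.

```python
import torch

def output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output width or height of a conv/pooling layer (rounded down)."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

x = torch.zeros(1, 3, 32, 32)  # one CIFAR-sized image, channels first
conv = torch.nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2)
pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)

after_conv = conv(x)
after_pool = pool(after_conv)

print(tuple(after_conv.shape), output_size(32, 5, stride=1, padding=2))  # (1, 16, 32, 32) 32
print(tuple(after_pool.shape), output_size(32, 2, stride=2))             # (1, 16, 16, 16) 16

# Rounding down: a 15-pixel side pooled with kernel 2, stride 2 gives 7.
print(output_size(15, 2, stride=2))  # 7
```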

In [None]:
# Your code