# Convolutional ANN and Transfer learning – D7046E @ LTU.SE

# Introduction
The goal of this exercise is to give you a better understanding of what convolution is and how it is leveraged to increase the usability and performance of neural networks. The exercise will also teach you about transfer learning and the differences between fine-tuning and feature extraction.

## Literature
This exercise will rely on the following sections in the [course book](https://www.deeplearningbook.org/).

- Chapter 9
    - Most of it
- Chapter 7
    - Section 7.4 - Dataset augmentation
- Chapter 15
    - Section 15.2 - Transfer learning
    
## Examination
The number of epochs is predefined as 100. If this takes too long on your PC, you can decrease it. Just make sure that you use the same hyperparameter values in tasks 2, 3 and 4. **Make sure you have all examination requirements in order before presenting.**

### Task 1
1. Implementation of "same convolution".
2. Compute and illustrate the resulting image using 3 different convolution filters.

### Task 2
1. Define, train and validate a CNN on the given dataset. Don't forget to split the dataset into training and validation sets. This can be done programmatically using https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split.
2. Report the training, validation and test accuracy. (It should beat random guessing.)
3. Calculate the multi-class [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix).
4. Implement a data augmentation technique that fits well with the data. Does this increase or decrease the validation accuracy?
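
For step 3, a confusion matrix can be computed directly from the true and predicted labels. The sketch below is one minimal way to do it with NumPy. The function name `confusion_matrix`, the example labels, and the convention that rows are true classes and columns are predicted classes are assumptions made for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count label pairs; rows index the true class, columns the predicted class (assumed convention)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Made-up labels for a 3-class problem, for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(accuracy)
```

The diagonal holds the correctly classified samples, so the overall accuracy is the trace divided by the total count.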

### Task 3
1. Fine-tune a pre-trained model on the dataset.
2. Report the training, validation and test accuracy. (It should beat random guessing.)
3. Calculate the multi-class [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix).
4. Add some augmentation techniques that fit well with the data. Does this increase or decrease the validation accuracy?

### Task 4
1. Use a pre-trained model as a feature extractor on the dataset.
2. Report the training, validation and test accuracy. (It should beat random guessing.)
3. Calculate the multi-class [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix).
4. Add some augmentation techniques that fit well with the data. Does this increase or decrease the validation accuracy?
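
As an illustration of the augmentation step that appears in tasks 2 to 4, the sketch below implements a random horizontal flip in NumPy. The choice of flipping, the function name `random_horizontal_flip`, and the probability `p` are assumptions. Whether a flip preserves the label depends on the dataset, so treat this as one candidate rather than the expected answer.

```python
import numpy as np

def random_horizontal_flip(image, rng, p=0.5):
    """Flip an H x W x C image left-right with probability p (hypothetical helper)."""
    if rng.random() < p:
        # Reverse the width axis; copy so the result does not alias the input
        return image[:, ::-1, :].copy()
    return image

rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # tiny stand-in "image"
always_flipped = random_horizontal_flip(image, rng, p=1.0)
never_flipped = random_horizontal_flip(image, rng, p=0.0)
```

Augmentation of this kind is normally applied only to the training set, so that validation and test accuracy are measured on unmodified images.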

In [1]:
%matplotlib inline

# Convolution in Neural Networks
A convolutional neural network, CNN for short, is a type of ANN which contains at least one convolutional layer. CNNs are often used where the input size may vary, such as when we are dealing with image input. The architecture of CNNs was inspired by how the visual cortex functions in our brain.

## Task 1: Implement convolution
Implement 2D same convolution without using a built-in convolution function. This should function as described in [this blogpost](https://jcbgamboa.github.io/2017/08/12/what-are-convolutions/). One of the great strengths of convolution is that it works on images of any size, hence it is important that your implementation does too. Same convolution means that the dimensions of the output are the same as the dimensions of the input. This is achieved by padding the input.

Once you have implemented a function which performs 2d convolution, use that to perform convolution over all channels in this image. Show the result using 3 different filters.
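
The per-channel step can be sketched as follows. The three kernels are common illustrative choices (box blur, sharpen, horizontal Sobel) and are not prescribed by the exercise. The helper `apply_per_channel` and its `conv2d` argument are hypothetical names: `conv2d` stands for whatever 2D same-convolution function you have written.

```python
import numpy as np

# Three commonly used 3x3 kernels (illustrative choices, not required by the task)
KERNELS = {
    "box_blur": np.full((3, 3), 1.0 / 9.0),
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    "sobel_x": np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float),
}

def apply_per_channel(image, kernel, conv2d):
    """Run a 2D convolution function on each channel of an H x W x C image.

    `conv2d(channel, kernel)` is assumed to return an array with the same
    height and width as `channel` (i.e. a "same" convolution).
    """
    channels = [conv2d(image[:, :, c], kernel) for c in range(image.shape[2])]
    # Re-assemble the filtered channels into an H x W x C array
    return np.stack(channels, axis=-1)
```

Filters such as `sobel_x` can produce negative values, so the result may need to be clipped or rescaled before it is displayed with `plt.imshow`.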

To find the padding needed to make the output the same size as the input, you can use the formula:

$$ n_{out} = \left \lfloor\frac{n_{in}+2p-k}{s} \right \rfloor+1 $$

where $n_{out}$ is the number of output features, $n_{in}$ is the number of input features, $k$ is the kernel size, $p$ is the padding size and $s$ is the stride size. You may assume that the stride is always 1.
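
Setting $s = 1$ and $n_{out} = n_{in}$ in the formula gives $p = (k - 1)/2$, which is an integer for odd kernel sizes. The sketch below checks this numerically. The function names `output_size` and `same_padding` are hypothetical, and the restriction to odd $k$ is an assumption made to keep the padding symmetric.

```python
def output_size(n_in, k, p, s=1):
    """Evaluate n_out = floor((n_in + 2p - k) / s) + 1."""
    return (n_in + 2 * p - k) // s + 1

def same_padding(k):
    """Padding per side that keeps n_out == n_in for stride 1 (odd k assumed)."""
    if k % 2 == 0:
        raise ValueError("symmetric same-padding assumes an odd kernel size")
    return (k - 1) // 2

# A 3x3 kernel on a 4-wide input needs 1 pixel of padding on each side
p = same_padding(3)
print(p, output_size(4, 3, p))
```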

In [6]:
epochs = 100

In [7]:
import numpy as np
import math
import cv2
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams['figure.dpi'] = 100


# implement same convolution 
def conv(image, kernel, strides=1):
    pass

# test
inp = np.array([[1,1,1,1],[1,1,2,1],[1,-3,-4,1],[1,1,1,1]])
kernel = np.array([[0,1,0],[1,2,1],[0,1,0]])

# if all are TRUE the convolution is implemented correctly
ans = np.array([[4, 5, 6, 4], [5, 3, 3, 6], [1, -7, -7, 0], [4, 1, 0, 4]])
print(conv(inp, kernel) == ans)

[[False False False False]
 [False False False False]
 [False False False False]
 [False False False False]]


# Computer Vision
Computer vision (CV) is a field within computer science that aims to extract high-level information from static images or video. Such high-level information can be, but is not limited to:
* Object detection - Detect and classify objects within input images
* Anomaly detection - Detect anomalies in the input images
* Semantic segmentation - Classify each and every pixel in the input image into different classes
* Object recognition - Classify an entire image depending on what it contains

CV has been studied for multiple decades. Early solutions used hand-crafted feature extractors to extract information from the input. However, with the increase in computing power and the rise of deep learning algorithms, convolutional neural networks have become the main method for solving CV problems.

In this exercise we will be taking a closer look at object recognition by utilizing transfer learning. The dataset we will use for this exercise can be downloaded on Canvas. It is a subset of [this dataset](http://www.vision.caltech.edu/Image_Datasets/Caltech101/). Remember to split the data into separate training, validation and test sets.
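
A three-way split can be sketched with `torch.utils.data.random_split`. The random tensors below are a stand-in for the real dataset, and the 70/15/15 ratio and the seed are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in data: 100 random 3x32x32 "images" with labels from 4 classes (hypothetical sizes)
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 4, (100,))
dataset = TensorDataset(images, labels)

# 70/15/15 split (assumed ratio); integer arithmetic avoids rounding surprises
n_total = len(dataset)
n_train = n_total * 70 // 100
n_val = n_total * 15 // 100
n_test = n_total - n_train - n_val  # remainder goes to the test set

# A seeded generator makes the split reproducible between runs
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test], generator=generator
)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the same samples in each subset across tasks 2, 3 and 4, which makes the reported accuracies comparable.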

## Task 2: Implement a CNN and train it on the given dataset.

In [5]:
# Implement your own CNN and train it on the given dataset

# Transfer learning
Transfer learning refers to the practice of taking a model which has already been pretrained on a large dataset to solve task $T_1$, replacing the output layer or a few of the upper layers of this model, and retraining the model on a smaller dataset to solve task $T_2$. Formally this can be described as follows:

__Def 1:__ Let $D_s$ be the source domain and $T_s$ be the corresponding source task. Let $D_t$ be the target domain and $T_t$ be the corresponding target task. Let $f_t$ be the predictive function for $T_t$. Transfer learning then aims to improve the learning of $f_t$ in $D_t$ using the already learned knowledge in $D_s$ and $T_s$, where $D_s \neq D_t$ or $T_s \neq T_t$.

The benefit of using transfer learning is that we can train an accurate computer vision model with relatively small amounts of data and computing resources, compared to the costly pretraining of the full convolutional neural network (which can take a few days using multiple GPUs).

## Fine-tuning and Feature extraction
There are two main approaches to transfer learning: fine-tuning and feature extraction. When fine-tuning, we allow all weights to be changed during the training phase. When we use the pretrained model as a feature extractor, however, we instead freeze the earlier layers of the model. This means that the weights in those layers are not updated during the training phase, and we only update the weights in the upper layers that we have replaced.

This works because low-level information extracted from the input image is universal between tasks. Examples of such information are edges, shapes and patterns. This is what the early layers are optimized to detect, whereas later layers extract more abstract features relevant to the task.
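
The difference between the two strategies comes down to which parameters have `requires_grad` enabled. The sketch below uses a tiny `nn.Sequential` as a stand-in for a pretrained network, so it carries no real pretrained weights. The helpers `build_backbone`, `prepare` and `count_trainable`, the layer sizes, and the 5-class target task are all hypothetical.

```python
import torch
from torch import nn

def build_backbone():
    """Tiny stand-in for a pretrained model (a real one would load trained weights)."""
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 1000),  # output layer for the source task
    )

def prepare(model, num_classes, feature_extraction):
    """Replace the output layer; optionally freeze everything else first."""
    if feature_extraction:
        for param in model.parameters():
            param.requires_grad = False  # frozen: no gradient, no update
    # The new layer is created after freezing, so it stays trainable
    model[-1] = nn.Linear(model[-1].in_features, num_classes)
    return model

def count_trainable(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

fine_tuned = prepare(build_backbone(), num_classes=5, feature_extraction=False)
extractor = prepare(build_backbone(), num_classes=5, feature_extraction=True)

# Hand only the trainable parameters to the optimizer
optimizer = torch.optim.SGD(
    (p for p in extractor.parameters() if p.requires_grad), lr=0.01
)
print(count_trainable(fine_tuned), count_trainable(extractor))
```

In the fine-tuned model every weight is trainable, while in the feature extractor only the new output layer is.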

Most of the pre-trained models in PyTorch are trained on [ImageNet](http://www.image-net.org/).

## Task 3: Fine-tuning

In [3]:
# Fine-tune a model to the dataset

## Task 4: Feature extraction

In [4]:
# Use a predefined model as a feature extractor