# Assignment 2

The goal of this assignment is to understand convolutional neural networks and the basics of image filtering. You will implement matrix convolution and a convolutional layer from scratch.

Implement all functions in **NumPy** unless stated otherwise.

## Table of contents

* [1. Recap](#1.-Recap)
* [2. Matrix Convolution](#2.-Matrix-Convolution)
* [3. Basic Kernels](#3.-Basic-Kernels)
* [4. Convolutional Layer](#4.-Convolutional-Layer)
* [5. MaxPooling Layer](#5.-MaxPooling-Layer)
* [6. CNN](#6.-CNN)
* [7. Experiments](#7.-Experiments)

# 1. Recap

In the previous assignment, you implemented the main building blocks of neural networks: the **Dense Layer**, nonlinearities, losses, and optimizers.
* The dense layer is a versatile building block
* The dense layer performs the following mapping of the input matrix $X$ (the matrix of objects): 
$$
X \rightarrow XW + b
$$
* It allows one to build and train flexible models
* Let's look closely at how a dense layer processes an image
    * We have a grayscale image $x$ of size $N \times M$
    * We reshape it into a vector of length $NM$
    * Then we map it with a dense layer
    * And obtain the transformed vector $y$
    * Each element of $y$ depends on every element of $x$. That is why the layer is also called **Fully-Connected**
* When we work with images, we assume that each pixel is correlated with its neighbours and other nearby pixels, while distant pixels are not correlated. Various experiments support this assumption.
* A dense layer captures these correlations, but it also captures *noisy* correlations between distant pixels.
* There is a way to create a **Locally-Connected** layer that learns only local correlations and uses fewer parameters.
* This layer is called the **Convolutional Layer**, and it is based on **matrix convolution**
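
To make the parameter argument concrete, here is a minimal sketch that applies the dense mapping $X \rightarrow XW + b$ to a flattened image and compares its parameter count with that of a single small kernel. The image size (28 x 28), the square output size, and the 3 x 3 kernel are illustrative assumptions, not values from the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): a 28 x 28 grayscale image
N, M = 28, 28
x = rng.random((N, M))

# Dense layer: flatten the image, then apply X -> XW + b
x_flat = x.reshape(1, N * M)                     # one object with NM features
W = rng.standard_normal((N * M, N * M)) * 0.01   # random weights, for shape only
b = np.zeros(N * M)
y = x_flat @ W + b                               # every output depends on every pixel

dense_params = W.size + b.size                   # (NM)^2 + NM
kernel_params = 3 * 3                            # one shared 3 x 3 kernel

print(y.shape)         # (1, 784)
print(dense_params)    # 615440
print(kernel_params)   # 9
```

The dense layer needs one weight per (input pixel, output element) pair, whereas a kernel reuses the same few weights at every position.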

# 2. Matrix Convolution

It is easier to understand the convolution when you see the image. Here is the image. ![](https://camo.githubusercontent.com/709b7f5eb5203b41f9456f887787b6ea790878b5/68747470733a2f2f636f6d6d756e6974792e61726d2e636f6d2f6366732d66696c652f5f5f6b65792f636f6d6d756e6974797365727665722d626c6f67732d636f6d706f6e656e74732d7765626c6f6766696c65732f30302d30302d30302d32302d36362f343738362e636f6e762e706e67)

* We "put" the kernel on the matrix. Each element of the kernel is multiplied by the corresponding element of the source matrix. The products are summed and the result is written to the new matrix.

* The resulting matrix is smaller than the source one because of border effects: the kernel cannot be centered on border elements without leaving the matrix.
* To obtain a matrix of the same size, zero padding can be used.

* We have a matrix $X$ of size $N \times M$ and a kernel $K$ of size $(2p+1) \times (2q+1)$.
* We also define $X_{ij} = 0$ whenever $i > N$, $i < 1$, $j > M$, or $j < 1$. This is called **zero padding**

* The convolution of the matrix with the kernel, with the kernel centered on element $(i, j)$, is then defined as follows:

$$
Y = X \star K \\
Y_{ij} = \sum\limits_{\alpha=1}^{2p + 1} \sum\limits_{\beta=1}^{2q + 1}
K_{\alpha \beta} X_{i + \alpha - p - 1, j + \beta - q - 1}
$$

* In machine learning this operation is called **convolution**, while in mathematics it is called **cross-correlation**.
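
As a worked illustration of the definition, the sketch below computes individual output elements for a small matrix, with the kernel centered on the output position and zeros outside the matrix. The helper name `output_element` is hypothetical and the indices are 0-based. This is one possible way to organise the computation, not the reference solution.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]])
K = np.eye(3)            # (2p + 1) x (2q + 1) kernel with p = q = 1
p, q = K.shape[0] // 2, K.shape[1] // 2

# Zero padding: p rows on top/bottom and q columns on left/right
X_pad = np.pad(X, ((p, p), (q, q)), mode="constant")

def output_element(i, j):
    """Y_ij for 0-based (i, j): the kernel is centered on X[i, j]."""
    window = X_pad[i:i + K.shape[0], j:j + K.shape[1]]
    return np.sum(window * K)

print(X_pad.shape)           # (5, 5)
print(output_element(0, 0))  # 4.0
print(output_element(1, 1))  # 9.0
```

For the corner element, only two of the three diagonal kernel entries overlap the matrix, so the sum is $1 + 3 = 4$. For the central element, all three overlap and the sum is $1 + 3 + 5 = 9$.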

In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [5]:
import automark as am

username = 'sosnovik'
# if you are not registered yet
am.register_id(username, ('ivan sosnovik', 'i.sosnovik@uva.nl'))

Username already registered.


Now implement matrix convolution.

In [20]:
def conv_matrix(matrix, kernel):
    """Perform the convolution of the matrix with the kernel
    # Arguments
        matrix: input matrix np.array of size `(N, M)`
        kernel: kernel of the convolution 
            np.array of size `(2p + 1, 2q + 1)`
    # Output
        the result of the convolution
        np.array of size `(N, M)`
    """
    #################
    ### YOUR CODE ###
    #################
    return output

Let's test the function

$$
X = \begin{bmatrix}
1 & 2 & 3 \\
2 & 3 & 4 \\
3 & 4 & 5 \\
\end{bmatrix} \quad
K = 
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix} \quad 
X \star K = 
\begin{bmatrix}
4 & 6 & 3 \\
6 & 9 & 6 \\
3 & 6 & 8 \\
\end{bmatrix}
$$

In [21]:
X = np.array([
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5]
])

K = np.eye(3)

In [22]:
print(conv_matrix(X, K))

[[ 4.  6.  3.]
 [ 6.  9.  6.]
 [ 3.  6.  8.]]


In [None]:
am.test_student_function(username, conv_matrix, ['matrix', 'kernel'])

# 3. Basic Kernels

Matrix convolution can be used to process an image: to blur it, to shift it, to extract edges, and so on. Here is a very interesting interactive [article](http://setosa.io/ev/image-kernels/) that will help you get a better understanding of convolutions.

Let's play with convolutions

In [115]:
rgb_img = plt.imread('./images/dog.png')
plt.imshow(rgb_img)

We will convert it to grayscale. An RGB image is a 3-dimensional tensor, whereas a grayscale image can be represented as a matrix.

In [116]:
img = rgb_img.mean(axis=2)
plt.imshow(img, cmap='gray')
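
One caveat worth checking: `plt.imread` returns an array of shape `(M, N, 4)` for PNG files that carry an alpha channel, in which case averaging over `axis=2` would mix transparency into the grayscale values. Whether `dog.png` has an alpha channel is not stated here, so the sketch below uses a synthetic RGBA array as a stand-in.

```python
import numpy as np

# Synthetic stand-in for a loaded image (assumption): 4 x 5 pixels, RGBA
rgba_img = np.zeros((4, 5, 4), dtype=np.float32)
rgba_img[..., 0] = 0.3   # red
rgba_img[..., 1] = 0.6   # green
rgba_img[..., 2] = 0.9   # blue
rgba_img[..., 3] = 1.0   # alpha (fully opaque)

# Average only the colour channels so that alpha does not bias the result
gray = rgba_img[..., :3].mean(axis=2)

# Averaging all four channels would include alpha in the mean
gray_with_alpha = rgba_img.mean(axis=2)

print(gray.shape)   # (4, 5)
```

Slicing with `[..., :3]` is harmless for a plain RGB image, so it can be applied unconditionally.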

First of all, let's blur the image with [box blur](https://en.wikipedia.org/wiki/Box_blur). It is just a convolution of a matrix with the kernel of size $N \times N$ of the following form:

$$
\frac{1}{N^2}
\begin{bmatrix}
1 & \dots  & 1\\
\vdots & \ddots & \vdots\\
1 & \dots  & 1\\
\end{bmatrix}
$$
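
A minimal sketch of how such a kernel can be built in NumPy. The value `box_size = 3` is illustrative. Because the entries sum to one, the kernel computes a local average and leaves constant regions unchanged.

```python
import numpy as np

box_size = 3  # illustrative value
kernel = np.ones((box_size, box_size)) / box_size ** 2

# The entries sum to 1 (up to floating-point rounding)
kernel_sum = kernel.sum()

# Averaging a constant patch leaves its value unchanged
patch = np.full((box_size, box_size), 7.0)
blurred_value = np.sum(patch * kernel)

print(kernel.shape)   # (3, 3)
```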

In [37]:
def box_blur(image, box_size):
    """Perform the blur of the image
    # Arguments
        image: input matrix - np.array of size `(N, M)`
        box_size: the size of the blur kernel - int > 0  
            the kernel is of size `(box_size, box_size)`
    # Output
        the result of the blur
            np.array of size `(N, M)`
    """   
    #################
    ### YOUR CODE ###
    #################
    return output

In [117]:
blur_dog = box_blur(img, box_size=5)
plt.imshow(blur_dog, cmap='gray')

In [None]:
am.test_student_function(username, box_blur, ['image', 'box_size'])

Now we will compute the horizontal and vertical gradients. To do so, we convolve the image with the following kernels:

$$
K_h = 
\begin{bmatrix}
-1 & 0  & 1\\
\end{bmatrix} \quad
K_v = 
\begin{bmatrix}
1 \\
0 \\
-1\\
\end{bmatrix} \\
X_h = X \star K_h \quad X_v = X \star K_v\\
$$

Then we combine the two directional components into the magnitude of the gradient:

$$
X_\text{grad} = \sqrt{X_h^2 + X_v^2}
$$
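
To see what these formulas produce, here is a small self-contained sketch on a toy image with a single vertical edge. It uses array slicing on a zero-padded copy instead of a general convolution routine, which is equivalent for these two three-element kernels. The toy image is an assumption made for illustration.

```python
import numpy as np

# Toy image (assumption): dark left half, bright right half -> one vertical edge
X = np.zeros((5, 6))
X[:, 3:] = 1.0

X_pad = np.pad(X, 1, mode="constant")   # zero padding of width 1

# K_h = [-1, 0, 1]: right neighbour minus left neighbour
X_h = X_pad[1:-1, 2:] - X_pad[1:-1, :-2]
# K_v = [1, 0, -1]^T: upper neighbour minus lower neighbour
X_v = X_pad[:-2, 1:-1] - X_pad[2:, 1:-1]

X_grad = np.sqrt(X_h ** 2 + X_v ** 2)

print(X_grad[2])   # [0. 0. 1. 1. 0. 1.]
```

In this interior row the magnitude is nonzero on both sides of the edge. The final nonzero entry is a border effect: zero padding makes the bright right border look like an edge.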

In [118]:
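# Horizontal gradient: convolve with K_h = [-1, 0, 1]
dog_h = conv_matrix(blur_dog, np.array([[-1, 0, 1]]))
# Vertical gradient: convolve with K_v = [1, 0, -1]^T, as defined above
dog_v = conv_matrix(blur_dog, np.array([[1, 0, -1]]).T)
# Gradient magnitude (unaffected by the sign of either component)
dog_grad = np.sqrt(dog_h ** 2 + dog_v ** 2)
plt.imshow(dog_grad, cmap='gray')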

Now we have the edges we can work with. This is not the only way to extract edges. There are plenty of other methods:
* [Canny edge detection](https://en.wikipedia.org/wiki/Canny_edge_detector)
* [Sobel operator](https://en.wikipedia.org/wiki/Sobel_operator)
* [Prewitt operator](https://en.wikipedia.org/wiki/Prewitt_operator)

So far we fixed the kernels by hand and applied them. However, we can also learn them by minimizing some loss, which makes the processing as effective as possible. To do this, we have to define a **Convolutional Layer**.