# Learning Objectives

By the end of this lab, you will have:

- Implemented the forward and backward pass for a Dot layer
- Gained an intuition for convolutional filters and the various types of actions they can perform
- Implemented the forward and backward pass for a Convolutional layer
- Verified the correctness of your implementations with gradient checking

Let's get started!

# Dot Layers

As a building block for implementing a convolutional layer, let's implement a Dot layer. A Dot layer takes two tensors of the same shape as input, multiplies them elementwise, and sums all of the products. This is the same as the inner product between two vectors, except that both $\mathbf{x}$ and $\mathbf{w}$ are two-dimensional ($F \times F$).

Here is the Dot layer in computational graph form:

![Dot Layer Forwards](images/Dot%20Layer%20Forwards.png)

And here it is in algebraic form:

$$
\text{Dot}(\mathbf{x}, \mathbf{w}) = \sum_{i=1}^F \sum_{j=1}^F \mathbf{x}_{i,j} \cdot \mathbf{w}_{i,j}.
$$
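To make the formula concrete, here is a small worked example. The values of `x` and `w` are made up for illustration; the point is only that the elementwise-product-then-sum agrees with an ordinary inner product of the flattened tensors.

```python
import numpy as np

# Two small 2x2 tensors chosen by hand so the arithmetic is easy to follow.
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
w = np.array([[0.5, -1.0],
              [2.0, 0.0]])

# Elementwise product followed by a total sum: 0.5 - 2.0 + 6.0 + 0.0 = 4.5
dot_elementwise = np.sum(x * w)

# The same value as an inner product of the flattened tensors.
dot_flattened = np.dot(x.ravel(), w.ravel())

print(dot_elementwise, dot_flattened)  # 4.5 4.5
```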

## Implementing the Forward Pass

Like all layers, we must first define the forward pass, which computes the output of our layer given the inputs.

### Tasks

- Implement `dot_forward()`

In [1]:
import numpy as np

def dot_forward(x, w):
    """Perform the forward pass on a dot product
    
    Parameters
    ----------
    x : a numpy 2darray
    w : a numpy 2darray
    
    `x` and `w` are assumed to have the same shape. Think of `w` as
    a convolutional filter and `x` a little region in the image that
    is being convolved. Returns a float.
    
    """
    assert x.shape == w.shape
    return np.sum(x * w)

## Implementing the Backward Pass

Now that we have defined the forward pass, all that's left is to define the backward pass. As a reminder, the backward pass takes the inputs and the gradient of the loss with respect to the output, and uses the chain rule to compute the gradients of the loss with respect to the inputs.

Before writing any code, it's very helpful to annotate the computational graph with the gradients.

### Tasks

- What is $\nabla_\mathbf{x}?$ Draw it on the computational graph.
- What is $\nabla_\mathbf{w}?$ Draw it on the computational graph.
- Implement `dot_backward()`

### Hints

- Use the computational graph to guide your implementation

## Solution

![Dot Layer Backward](images/Dot%20Layer%20Backward.png)
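One way to read the annotated graph algebraically, writing $a = \text{Dot}(\mathbf{x}, \mathbf{w})$ and using $L$ for the loss (a symbol introduced here only for illustration): each $\mathbf{x}_{i,j}$ appears in exactly one product, so its local derivative is the matching filter entry, and vice versa.

$$
\frac{\partial a}{\partial \mathbf{x}_{i,j}} = \mathbf{w}_{i,j}
\;\Longrightarrow\;
\nabla_\mathbf{x} L = \frac{\partial L}{\partial a} \, \mathbf{w},
\qquad
\frac{\partial a}{\partial \mathbf{w}_{i,j}} = \mathbf{x}_{i,j}
\;\Longrightarrow\;
\nabla_\mathbf{w} L = \frac{\partial L}{\partial a} \, \mathbf{x}.
$$

In other words, the two inputs swap roles, and each is scaled by the upstream gradient `da`.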

In [2]:
def dot_backward(x, w, da=None):
    """Perform the backward pass on a dot layer
    
    Parameters
    ----------
    x : a numpy 2darray
    w : a numpy 2darray
    da : the gradient of the loss with respect to the output `a` of a Dot layer
        (defaults to 1)
    
    `x` and `w` are assumed to have the same shape. Think of `w` as
    a convolutional filter and `x` a little region in the image that
    is being convolved. Returns a dict of variable names to their
    gradients (e.g. {'x': dx, 'w': dw})
    
    """
    assert x.shape == w.shape
    da = 1 if da is None else da
    # Chain rule: the local gradient for `x` is `w` and for `w` is `x`;
    # both are scaled by the upstream gradient `da`
    return {'x': da * w, 'w': da * x}

### Gradient-Checking Backward Pass

When writing code to compute gradients analytically, [numerical gradient checking with finite differences](http://cs231n.github.io/optimization-1/#numerical) is an indispensable way of debugging your code.

We've provided an API to perform gradient checking. To illustrate its use, consider the following example.

In [3]:
import numpy as np
from lib.checking import gradient_check

square = lambda x: x**2
grad_square = lambda x: {'x': 2*x}
x = np.array([2], dtype=np.float64)

gradient_check(forward_f=square, backward_f=grad_square, x=x)


`gradient_check()` performs gradient checking on a layer and requires

- A forward function `forward_f()`
- A backward function `backward_f()`
- A setting of the input(s) for checking the derivative.

In the example provided, we have $f(x) = x^2$, $\frac{\partial f}{\partial x}(x) = 2x$, and $x = 2$ for `forward_f()`, `backward_f()`, and `x`, respectively.
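The internals of `lib.checking.gradient_check()` are not shown in this lab, so the following is only a minimal sketch of the underlying idea, not the provided implementation. It assumes a scalar-valued function and uses a hypothetical helper, `numeric_grad()`, that estimates each partial derivative with a central difference and compares it to the analytic gradient.

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for a scalar-valued f.

    Hypothetical helper for illustration; `x` must be a float array.
    """
    grad = np.zeros_like(x, dtype=np.float64)
    for idx in np.ndindex(x.shape):
        original = x[idx]
        x[idx] = original + eps   # nudge one entry up
        f_plus = f(x)
        x[idx] = original - eps   # nudge the same entry down
        f_minus = f(x)
        x[idx] = original         # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Same example as above, summed so that the output is a scalar.
square_sum = lambda x: np.sum(x**2)
x = np.array([2.0])

analytic = 2 * x
numeric = numeric_grad(square_sum, x)
print(np.allclose(analytic, numeric, atol=1e-6))
```

If the analytic and numerical gradients disagree by much more than the finite-difference error, the backward pass most likely has a bug.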

### Tasks

- Gradient check `dot_backward()` with `lib.checking.gradient_check()` 

### Hints

- You can use any `x` and `w` that conform to `dot_forward()`. I recommend `np.random.randn()`.
- Use the example above as a reference

In [4]:
x = np.random.randn(3, 3)
w = np.random.randn(3, 3)

gradient_check(forward_f=dot_forward, backward_f=dot_backward, x=x, w=w)


# Convolutional Layers

## Implementing the Convolutional Forward Pass

Now that we've defined a Dot layer, let's use it to define a convolutional layer. A convolutional layer can be seen as sequentially applying Dot layers to the entire input image.

Let's implement the forward pass. As a reminder, a convolutional layer takes an image and a filter and *convolves* the filter with the image. A computational graph for a convolutional layer is depicted below:

![Conv Layer Forward](images/Conv%20Layer%20Forward.png)

For simplicity, you can assume the image only has a single channel and the filter is hard-coded to `3x3`.
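To make the sliding-window picture concrete, the sketch below enumerates every `3x3` window of a toy `4x4` image and applies a Dot to each one. It uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) instead of explicit loops, and the toy `X` and `w` are made up for illustration. Note that the output has shape `(N-F+1, N-F+1)`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy 4x4 "image" and an all-ones 3x3 filter (illustrative values only).
X = np.arange(16, dtype=np.float64).reshape(4, 4)
w = np.ones((3, 3))

# One 3x3 window per output position: shape (N-F+1, N-F+1, F, F).
windows = sliding_window_view(X, w.shape)

# Apply a Dot (elementwise product + sum) to every window at once.
A = np.einsum('ijkl,kl->ij', windows, w)

print(windows.shape)  # (2, 2, 3, 3)
print(A)              # [[45. 54.] [81. 90.]]
```

This is only a cross-check for the loop-based version the task asks for, which should produce the same `A`.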

### Tasks

- Implement `conv_forward()`

### Hints

- Use `dot_forward()` in your implementation

In [5]:
def conv_forward(X, w):
    """Perform a forward pass for a CNN layer
    
    Parameters
    ----------
    X : numpy 2darray with shape (N, N)
    w : numpy 2darray with shape (F, F)
    
    Assume that `w` is a 3x3 filter for simplicity.
    Returns a 2D tensor `A` with shape (N-F+1, N-F+1).
    
    """
    N, N = X.shape
    F, F = w.shape
    assert (F, F) == (3, 3)
    n = N-F + 1
    
    A = np.zeros([n, n])
    for i in range(n):
        for j in range(n):
            i_x, j_x = i+1, j+1
            i_start, i_end = i_x-1, i_x+1
            j_start, j_end = j_x-1, j_x+1
            
            x = X[i_start:i_end+1, j_start:j_end+1]
            A[i, j] = dot_forward(x, w)
    
    return A

In [13]:
X = np.array([[1,2],
             [3,2]])
w = np.array([[3,4,5],
             [3,5,2],
             [7,2,1]])

In [14]:
conv_forward(X,w)

array([], shape=(0, 0), dtype=float64)

## Forward Pass Verification

Let's compare your forward pass implementation against a reference implementation of 2D convolution which we know to be correct - `scipy.signal.correlate2d()`.

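Why is it called `correlate2d()` and not `convolve2d()`? Strictly speaking, the operation we've been calling *convolution* is *cross-correlation*: true convolution flips the filter along both axes before sliding it over the image. The distinction rarely matters for convolutional layers, because the filter weights are learned either way.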
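The relationship between the two operations can be checked on a toy example. The sketch below uses two hypothetical helpers written with plain NumPy, `correlate_valid()` and `convolve_valid()`, and assumes square inputs. With an asymmetric filter the two results differ; with a symmetric filter they coincide.

```python
import numpy as np

def correlate_valid(X, w):
    """Slide `w` over `X` without flipping it (cross-correlation)."""
    F = w.shape[0]
    n = X.shape[0] - F + 1
    return np.array([[np.sum(X[i:i+F, j:j+F] * w) for j in range(n)]
                     for i in range(n)])

def convolve_valid(X, w):
    """True convolution: flip the filter along both axes, then correlate."""
    return correlate_valid(X, w[::-1, ::-1])

X = np.arange(16, dtype=np.float64).reshape(4, 4)

# Asymmetric filter: flipping it changes the sign of the response.
w_asym = np.array([[1.0, 0.0,  0.0],
                   [0.0, 0.0,  0.0],
                   [0.0, 0.0, -1.0]])
print(correlate_valid(X, w_asym))  # every entry is -10
print(convolve_valid(X, w_asym))   # every entry is +10

# Symmetric filter: flipping changes nothing, so the two agree.
w_sym = np.ones((3, 3))
print(np.allclose(correlate_valid(X, w_sym), convolve_valid(X, w_sym)))
```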

From this point forward, we'll be using an example image from `mnist`, an image of a `4`.

In [6]:
from IPython.display import display
import numpy as np
from scipy import signal
from scipy.linalg import norm
from keras.datasets import mnist

[X_mnist, y], _ = mnist.load_data()
X = X_mnist[2].astype(np.float64)

A = conv_forward(X, w)
A_reference = signal.correlate2d(X, w, mode='valid')

print('Conv check passed!' if norm(A-A_reference, ord='fro') < 1e-5 else 'Conv check failed!')

Using TensorFlow backend.


Conv check passed!


## Uses for Convolution

Convolution has many uses, including sharpening and blurring images. Certain special filters, when convolved with an input image, produce an activation map that achieves one of these effects when plotted. Let's explore the activation maps produced by a few different filters.

### Tasks

Consider the filter
$
\mathbf{w_1} =
\begin{bmatrix}
    1 & 1 & 1 \\
    0 & 0 & 0 \\
    -1 & -1 & -1
\end{bmatrix}
$.

- What kind of image patch does this filter respond to most strongly?
- Execute the following code cell and inspect the resulting activation map. Were you right? Are you surprised by the result? Qualitatively, what image features is $\mathbf{w_1}$ looking for?

### Hints

- In the resulting activation map, black corresponds to regions of low activation, grey to medium, and white to the highest

In [7]:
from keras.preprocessing.image import array_to_img

w = np.array([[ 1,  1,  1],
              [ 0,  0,  0],
              [-1, -1, -1]], dtype=np.float64)

# array_to_img expects (height, width, channels) here, so add the channel axis last
img = array_to_img(np.expand_dims(X, axis=-1), scale=False)
display(img.resize(size=[128, 128]))

A = signal.correlate2d(X, w, mode='valid')
img = array_to_img(np.expand_dims(A, axis=-1))
img.resize(size=[128, 128])

### Tasks

Consider the filter
$
\mathbf{w_2} =
\begin{bmatrix}
    1 & 1 & 1 \\
    1 & 1 & 1 \\
    1 & 1 & 1
\end{bmatrix}.
$

- What kind of activation map will $\mathbf{X} \star \mathbf{w_2}$ produce? Qualitatively, what will its appearance be?
- Execute the following code cell and inspect the resulting activation map. Were you right? Are you surprised by the result?

In [8]:
from keras.preprocessing.image import array_to_img

w = np.array([[ 1,  1,  1],
              [ 1,  1,  1],
              [ 1,  1,  1]])

# array_to_img expects (height, width, channels) here, so add the channel axis last
img = array_to_img(np.expand_dims(X, axis=-1), scale=False)
display(img.resize(size=[128, 128]))

A = signal.correlate2d(X, w, mode='valid')
img = array_to_img(np.expand_dims(A, axis=-1))
img.resize(size=[128, 128])

### Tasks

Consider the filter
$
\mathbf{w_3} =
\begin{bmatrix}
    0 & 0 & 0 \\
    0 & 2 & 0 \\
    0 & 0 & 0
\end{bmatrix}
$.

- What will the resulting image look like if we take $(\mathbf{X} \star \mathbf{w_3}) - \frac{1}{9} (\mathbf{X} \star \mathbf{w_2})$?
- Execute the following code cell and inspect the resulting activation map. Were you right? Are you surprised by the result?

In [9]:
from keras.preprocessing.image import array_to_img

w = np.array([[ 0,  0,  0],
              [ 0,  2,  0],
              [ 0,  0,  0]], dtype=np.float64) - (1/9)*np.array([[1, 1, 1],
                                                                 [1, 1, 1],
                                                                 [1, 1, 1]])

# array_to_img expects (height, width, channels) here, so add the channel axis last
img = array_to_img(np.expand_dims(X, axis=-1), scale=False)
display(img.resize(size=[128, 128]))

A = signal.correlate2d(X, w, mode='valid')
img = array_to_img(np.expand_dims(A, axis=-1))
img.resize(size=[128, 128])

## Implementing the Backward Pass

Now it's time to implement the backward pass. Once again, annotate the computational graph with the desired gradients. Note you do *not* have to solve for $\frac{\partial \mathbf{A}}{\partial \mathbf{X}}$ nor $\frac{\partial \mathbf{A}}{\partial \mathbf{w}}$ explicitly.

### Tasks

- Annotate the computational graph with expressions for $\nabla_\mathbf{X}$ and $\nabla_\mathbf{w}$
- Implement `conv_backward()`

### Hints

- Recall that a convolution operation is made up of repeated dots

![Conv Hint](images/Conv%20Hint.png)
- Recall the fact that when a parameter is used in multiple places in a computational graph, the chain rule tells us that its final gradient is the sum of its individual gradients

![Split](images/Split.png)

## Solution

![Conv Layer Backward](images/Conv%20Layer%20Backward.png)
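A sketch of the same picture in algebraic form, under notation introduced here only for illustration: $L$ is the loss, and $\mathbf{x}^{(i,j)}$ is the $F \times F$ patch of $\mathbf{X}$ whose top-left corner is at $(i, j)$, so that $\mathbf{A}_{i,j} = \text{Dot}(\mathbf{x}^{(i,j)}, \mathbf{w})$. Because $\mathbf{w}$ is shared by every Dot, its gradient is a sum over all output positions. Each entry of $\mathbf{X}$ collects a contribution from every patch that covers it:

$$
\nabla_\mathbf{w} L = \sum_{i=1}^{N-F+1} \sum_{j=1}^{N-F+1} \frac{\partial L}{\partial \mathbf{A}_{i,j}} \, \mathbf{x}^{(i,j)},
\qquad
(\nabla_\mathbf{X} L)_{p,q} = \sum_{i=1}^{N-F+1} \sum_{j=1}^{N-F+1} \frac{\partial L}{\partial \mathbf{A}_{i,j}} \, \mathbf{w}_{p-i+1,\, q-j+1},
$$

where $\mathbf{w}_{k,l}$ is taken to be zero whenever $k$ or $l$ falls outside $1, \dots, F$.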

In [10]:
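def conv_backward(X, w, dA=None):
    """Perform a backward pass for a CNN layer
    
    Parameters
    ----------
    X : numpy 2darray with shape (N, N)
    w : numpy 2darray with shape (F, F)
    dA : numpy 2darray with shape (N-F+1, N-F+1), the gradient of the
        loss with respect to the output `A` (defaults to all ones)
    
    Returns a dict of variable names to their gradients
    (e.g. {'X': dX, 'w': dw}).
    
    """
    N, N = X.shape
    F, F = w.shape
    dA = np.ones([N-F+1, N-F+1]) if dA is None else dA
    n, n = dA.shape
    assert n == N-F+1
    
    # Accumulate in float64 so integer inputs do not truncate the gradients
    dX = np.zeros(X.shape, dtype=np.float64)
    dw = np.zeros(w.shape, dtype=np.float64)
    for i in range(n):
        for j in range(n):
            # Window of X that produced A[i, j]
            i_start, i_end = i, i+F
            j_start, j_end = j, j+F
            x = X[i_start:i_end, j_start:j_end]
            
            # Each window is one Dot layer; scale its local gradients by
            # the upstream gradient dA[i, j] and sum over all windows
            grads = dot_backward(x, w)
            dX[i_start:i_end, j_start:j_end] += grads['x'] * dA[i, j]
            dw += grads['w'] * dA[i, j]
    
    return {'X': dX, 'w': dw}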

Now that you've implemented the forward and backward passes for a convolutional layer, ensure that you've implemented it correctly by performing gradient checking.
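As a self-contained cross-check that does not depend on `lib.checking`, the sketch below compares a loop-based backward pass against central finite differences. The names `conv_forward_ref()`, `conv_backward_ref()`, and `numeric_grad()` are hypothetical stand-ins written for this example, and the scalar loss $\sum_{i,j} \mathbf{dA}_{i,j} \mathbf{A}_{i,j}$ is an assumption chosen so that the upstream gradient is exactly `dA`. A random `dA` is used rather than all ones, since all ones would hide a missing `dA[i, j]` factor.

```python
import numpy as np

def conv_forward_ref(X, w):
    F = w.shape[0]
    n = X.shape[0] - F + 1
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = np.sum(X[i:i+F, j:j+F] * w)
    return A

def conv_backward_ref(X, w, dA):
    F = w.shape[0]
    n = dA.shape[0]
    dX = np.zeros(X.shape, dtype=np.float64)
    dw = np.zeros(w.shape, dtype=np.float64)
    for i in range(n):
        for j in range(n):
            dX[i:i+F, j:j+F] += dA[i, j] * w   # gradient flowing to the patch
            dw += dA[i, j] * X[i:i+F, j:j+F]   # shared filter: sum over patches
    return {'X': dX, 'w': dw}

def numeric_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x, dtype=np.float64)
    for idx in np.ndindex(x.shape):
        original = x[idx]
        x[idx] = original + eps
        f_plus = f(x)
        x[idx] = original - eps
        f_minus = f(x)
        x[idx] = original  # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
w = rng.standard_normal((3, 3))
dA = rng.standard_normal((3, 3))

# Scalar loss whose gradient with respect to A is exactly dA.
loss = lambda X_, w_: np.sum(dA * conv_forward_ref(X_, w_))

grads = conv_backward_ref(X, w, dA)
num_dX = numeric_grad(lambda X_: loss(X_, w), X)
num_dw = numeric_grad(lambda w_: loss(X, w_), w)

print(np.allclose(grads['X'], num_dX, atol=1e-5))
print(np.allclose(grads['w'], num_dw, atol=1e-5))
```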

### Tasks

- Perform gradient checking on `conv_backward()`

In [None]:
from lib.checking import gradient_check

gradient_check(forward_f=conv_forward, backward_f=conv_backward, X=X, w=w)