# A look inside convolutional neural networks

This notebook looks inside convolutional neural networks by building them from scratch.

```python
from neural_net import conv_helpers
```

```python
conv_helpers.hello_conv()
```

```
Hello convolution!
```


```python
import numpy as np

# Layer, ParamMixin, conv_bc01, bprop_conv_bc01, pool_bc01 and bprop_pool_bc01
# are not defined in this notebook; they are assumed to be importable from
# elsewhere (not shown here).


class Conv(Layer, ParamMixin):
    def __init__(self, n_feats, filter_shape, strides, weight_scale,
                 weight_decay=0.0, padding_mode='same', border_mode='nearest'):
        self.n_feats = n_feats            # number of filters = output channels
        self.filter_shape = filter_shape  # (filter height, filter width)
        self.strides = strides
        self.weight_scale = weight_scale
        self.weight_decay = weight_decay
        self.padding_mode = padding_mode  # 'same', 'valid' or 'full'
        self.border_mode = border_mode

    def _setup(self, input_shape, rng):
        # input is (n_imgs, n_channels, height, width); W holds one
        # filter_shape slice per (input channel, output feature) pair
        n_channels = input_shape[1]
        W_shape = (n_channels, self.n_feats) + self.filter_shape
        self.W = rng.normal(size=W_shape, scale=self.weight_scale)
        self.b = np.zeros(self.n_feats)

    def fprop(self, input):
        # remember the input for the backward pass
        self.last_input = input
        self.last_input_shape = input.shape
        convout = np.empty(self.output_shape(input.shape))
        convout = conv_bc01(input, self.W, convout)
        # one bias per output feature, broadcast over images and locations
        return convout + self.b[np.newaxis, :, np.newaxis, np.newaxis]

    def bprop(self, output_grad):
        input_grad = np.empty(self.last_input_shape)
        self.dW = np.empty(self.W.shape)
        input_grad, self.dW = bprop_conv_bc01(self.last_input, output_grad,
                                              self.W, input_grad, self.dW)
        n_imgs = output_grad.shape[0]
        # bias gradient: summed over images and locations, divided by n_imgs
        self.db = np.sum(output_grad, axis=(0, 2, 3)) / n_imgs
        # fold weight decay into the stored increment
        self.dW -= self.weight_decay*self.W
        return input_grad

    def params(self):
        return self.W, self.b

    def param_incs(self):
        return self.dW, self.db

    def param_grads(self):
        # undo weight decay
        gW = self.dW+self.weight_decay*self.W
        return gW, self.db

    def output_shape(self, input_shape):
        fh, fw = self.filter_shape
        if self.padding_mode == 'same':
            # output keeps the input's spatial size
            h = input_shape[2]
            w = input_shape[3]
        elif self.padding_mode == 'full':
            # every partial overlap counts: output grows by filter size - 1
            h = input_shape[2]+fh-1
            w = input_shape[3]+fw-1
        else:  # 'valid'
            # only full overlaps count: output shrinks by filter size - 1
            h = input_shape[2]-fh+1
            w = input_shape[3]-fw+1
        shape = (input_shape[0], self.n_feats, h, w)
        return shape


class Pool(Layer):
    def __init__(self, pool_shape=(3, 3), strides=(1, 1), mode='max'):
        self.mode = mode
        self.pool_h, self.pool_w = pool_shape
        self.stride_y, self.stride_x = strides

    def fprop(self, input):
        self.last_input_shape = input.shape
        # switches: for each pooled value, 2 coordinates locating its max in the input
        self.last_switches = np.empty(self.output_shape(input.shape)+(2,),
                                      dtype=int)
        poolout = np.empty(self.output_shape(input.shape))
        poolout, switches = pool_bc01(input, poolout, self.last_switches,
                                      self.pool_h, self.pool_w,
                                      self.stride_y, self.stride_x)
        self.last_switches = switches
        return poolout

    def bprop(self, output_grad):
        input_grad = np.empty(self.last_input_shape)
        input_grad = bprop_pool_bc01(output_grad, self.last_switches, input_grad)
        return input_grad

    def output_shape(self, input_shape):
        # a window starts every `stride` pixels, so each spatial size is divided by the stride
        shape = (input_shape[0],
                 input_shape[1],
                 input_shape[2]//self.stride_y,
                 input_shape[3]//self.stride_x)
        return shape
```
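`Pool` hands the real work to `pool_bc01` and `bprop_pool_bc01`, which are not shown. Below is a naive NumPy guess at what a switch-based max pool computes, written to agree with `Pool.output_shape` (each spatial size divided by the stride, windows clipped at the bottom and right edges). The names `pool_bc01_naive` and `bprop_pool_bc01_naive` and the edge clipping are assumptions, not the author's implementation, and unlike the calls above they return fresh arrays instead of filling preallocated ones.

```python
import numpy as np


def pool_bc01_naive(imgs, pool_h, pool_w, stride_y, stride_x):
    """Max pooling over (batch, channel, row, col) arrays.

    Returns the pooled values and, for each of them, the (row, col) in the
    input where the maximum was found (the "switches").
    """
    n, c, h, w = imgs.shape
    out_h, out_w = h // stride_y, w // stride_x
    poolout = np.empty((n, c, out_h, out_w))
    switches = np.empty((n, c, out_h, out_w, 2), dtype=int)
    for i in range(n):
        for ch in range(c):
            for y in range(out_h):
                y0 = y * stride_y
                y1 = min(y0 + pool_h, h)      # clip the window at the bottom edge
                for x in range(out_w):
                    x0 = x * stride_x
                    x1 = min(x0 + pool_w, w)  # clip the window at the right edge
                    window = imgs[i, ch, y0:y1, x0:x1]
                    # argmax of the flattened window, converted back to 2-D offsets
                    dy, dx = np.unravel_index(np.argmax(window), window.shape)
                    poolout[i, ch, y, x] = window[dy, dx]
                    switches[i, ch, y, x] = (y0 + dy, x0 + dx)
    return poolout, switches


def bprop_pool_bc01_naive(output_grad, switches, input_shape):
    """Send each output gradient back to the input pixel that won the max."""
    input_grad = np.zeros(input_shape)
    n, c, out_h, out_w = output_grad.shape
    for i in range(n):
        for ch in range(c):
            for y in range(out_h):
                for x in range(out_w):
                    sy, sx = switches[i, ch, y, x]
                    # += because overlapping windows can select the same pixel
                    input_grad[i, ch, sy, sx] += output_grad[i, ch, y, x]
    return input_grad


imgs = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
poolout, switches = pool_bc01_naive(imgs, 2, 2, 2, 2)
print(poolout[0, 0])   # rows [5, 7] and [13, 15]
grad = bprop_pool_bc01_naive(np.ones((1, 1, 2, 2)), switches, imgs.shape)
```

Only the pixel that produced each maximum receives gradient. Accumulating with `+=` rather than plain assignment matters when windows overlap, as they do with the default 3 x 3 pool and stride 1.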

# Notes

How are layers in a convolutional neural network connected?

If we start with a 32 x 32 x 3 image (a color image with three channels), how do we get from it to the output of a convolutional layer?
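The shape part of that question can be answered with the usual output-size rule along one spatial axis, out = (in + 2 * pad - filter) // stride + 1. The sketch below applies it to a 32 x 32 x 3 image; the 12 filters of size 5 x 5 are a hypothetical choice for illustration (with no padding they give 28 x 28 maps), and the helper names are made up.

```python
def conv_output_size(in_size, filter_size, stride=1, pad=0):
    """Spatial output size of a convolution along one axis."""
    return (in_size + 2 * pad - filter_size) // stride + 1


def conv_output_shape(input_shape, n_filters, filter_size, stride=1, pad=0):
    """Map a (height, width, channels) image to (n_filters, height, width)."""
    h, w, _channels = input_shape  # each filter sums over all input channels
    return (n_filters,
            conv_output_size(h, filter_size, stride, pad),
            conv_output_size(w, filter_size, stride, pad))


# A 32 x 32 x 3 color image through 12 hypothetical 5 x 5 filters:
print(conv_output_shape((32, 32, 3), n_filters=12, filter_size=5))         # (12, 28, 28), no padding
print(conv_output_shape((32, 32, 3), n_filters=12, filter_size=5, pad=2))  # (12, 32, 32), "same" padding
```

Each filter spans all three input channels, so the depth of the output is set by the number of filters, not by the number of input channels.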

We've all seen diagrams like this:

[Diagrams from "Understanding Convolutional Neural Networks for NLP"](http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/)

How does the convolution actually happen?

Let's consider an individual 3 x 3 filter that is slid over an image. How will this filter be connected to the next layer?

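The top-left element of the filter is multiplied by a whole range of pixels in the input image, one for each position the filter is slid to. Without padding, that range is every pixel except the last two rows and the last two columns (for a 3 x 3 filter), i.e. it excludes a strip along the bottom and right edges.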

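In addition, if the input has three channels, the filter has a separate 3 x 3 slice for each channel: at every image location, each slice is multiplied by its own channel, and the three results are summed into a single output value.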
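The sliding and the per-channel sum can be written as a naive loop. This is a sketch assuming stride 1, no padding ("valid" mode) and the `(n_channels, n_feats, fh, fw)` filter layout that `Conv._setup` uses for `W`; the name `conv_bc01_valid` is hypothetical. Like most CNN code it computes cross-correlation (the filter is not flipped); whether the author's `conv_bc01` flips the filter is not stated.

```python
import numpy as np


def conv_bc01_valid(imgs, filters):
    """Naive multi-channel convolution with stride 1 and no padding.

    imgs:    (n_imgs, n_channels, height, width)
    filters: (n_channels, n_feats, fh, fw)
    returns  (n_imgs, n_feats, height - fh + 1, width - fw + 1)
    """
    n, c_in, h, w = imgs.shape
    c_f, k, fh, fw = filters.shape
    if c_in != c_f:
        raise ValueError("filters must have one slice per input channel")
    out = np.zeros((n, k, h - fh + 1, w - fw + 1))
    for b in range(n):
        for f in range(k):
            for y in range(out.shape[2]):
                for x in range(out.shape[3]):
                    patch = imgs[b, :, y:y + fh, x:x + fw]  # (n_channels, fh, fw)
                    # each channel times its own filter slice, all summed to one number
                    out[b, f, y, x] = np.sum(patch * filters[:, f])
    return out


rng = np.random.default_rng(0)
imgs = rng.normal(size=(1, 3, 6, 6))
filters = rng.normal(size=(3, 2, 3, 3))
out = conv_bc01_valid(imgs, filters)
print(out.shape)  # (1, 2, 4, 4)

# The 3-channel result equals the sum of three single-channel convolutions:
per_channel = sum(conv_bc01_valid(imgs[:, c:c + 1], filters[c:c + 1]) for c in range(3))
print(np.allclose(out, per_channel))  # True
```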

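The gradient flowing back into the layer (the gradient with respect to the convout, which the filter gradients are computed from) has shape:

`[1, 12, 28, 28]`

That is one image with 12 channels, one per filter, at 28 x 28 locations.

The images being multiplied by these filters have shape `[1, 3, 28, 28]`, and each filter slice sees one channel at a time:

`[1, 1 (of 3), 28, 28]`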

For the filters, the single element

`[1 (of 3), 1 (of 12), 1, 1]`

contributes to every output in the next layer where a pixel of the image was multiplied by this particular filter value. For a 28 x 28 image and a 5 x 5 filter with stride 1, this is roughly a 24 x 24 patch (exactly 28 - 5 + 1 = 24 without padding).

So the gradient of a filter element is a sum over all image locations, computed _separately for each input channel and output channel pair_.
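In symbols, dW[c, k, i, j] = sum over images n and convout locations (y, x) of convout_grad[n, k, y, x] * imgs[n, c, y + i, x + j]. The sketch below computes it one filter offset at a time: for offset (i, j) the image pixels involved form one shifted out_h x out_w patch, and `np.einsum` does the sum over images and locations for every (input channel, output channel) pair at once. It assumes stride 1 and no padding, so a 28 x 28 image and a 5 x 5 filter give a 24 x 24 convout gradient rather than the 28 x 28 one above; `conv_valid` and `filter_grad` are hypothetical names.

```python
import numpy as np


def conv_valid(imgs, filters):
    """out[n, k, y, x] = sum over c, i, j of imgs[n, c, y+i, x+j] * filters[c, k, i, j]."""
    n, _, h, w = imgs.shape
    _, k, fh, fw = filters.shape
    out_h, out_w = h - fh + 1, w - fw + 1
    out = np.zeros((n, k, out_h, out_w))
    for i in range(fh):
        for j in range(fw):
            patch = imgs[:, :, i:i + out_h, j:j + out_w]
            out += np.einsum('ncyx,ck->nkyx', patch, filters[:, :, i, j])
    return out


def filter_grad(imgs, convout_grad, fh, fw):
    """Gradient of the loss with respect to filters of shape (c_in, k, fh, fw)."""
    c_in = imgs.shape[1]
    _, k, out_h, out_w = convout_grad.shape
    dW = np.zeros((c_in, k, fh, fw))
    for i in range(fh):
        for j in range(fw):
            # every image pixel that filter element (i, j) was multiplied by
            patch = imgs[:, :, i:i + out_h, j:j + out_w]  # (n, c_in, out_h, out_w)
            # sum over images and locations, separately for each (c_in, k) pair
            dW[:, :, i, j] = np.einsum('ncyx,nkyx->ck', patch, convout_grad)
    return dW


rng = np.random.default_rng(0)
imgs = rng.normal(size=(1, 3, 28, 28))
filters = rng.normal(size=(3, 12, 5, 5))
convout_grad = rng.normal(size=(1, 12, 24, 24))  # 28 - 5 + 1 = 24
dW = filter_grad(imgs, convout_grad, 5, 5)
print(dW.shape)  # (3, 12, 5, 5)

# Finite-difference check on one element, using loss = sum(convout * convout_grad)
eps = 1e-6
plus, minus = filters.copy(), filters.copy()
plus[2, 7, 1, 3] += eps
minus[2, 7, 1, 3] -= eps
numeric = (np.sum(conv_valid(imgs, plus) * convout_grad)
           - np.sum(conv_valid(imgs, minus) * convout_grad)) / (2 * eps)
print(numeric, dW[2, 7, 1, 3])  # the two numbers should agree closely
```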

For the image, consider a pixel in input channel 1 and where it affects the next layer. It affects all 12 output channels (one per filter), and within each one it contributes through every filter element it was multiplied by.
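The image gradient mirrors the filter gradient: dX[n, c, p, q] = sum over output channels k and filter offsets (i, j) of convout_grad[n, k, p - i, q - j] * filters[c, k, i, j], keeping only terms where (p - i, q - j) is a valid convout location. The sketch below scatters each filter offset's contribution back onto a shifted patch of the image, then nudges one pixel of the first input channel (index 0) to confirm that it moves all 12 output channels but only a 3 x 3 block of locations in each. It assumes stride 1 and no padding; `conv_valid` and `input_grad` are hypothetical names.

```python
import numpy as np


def conv_valid(imgs, filters):
    """out[n, k, y, x] = sum over c, i, j of imgs[n, c, y+i, x+j] * filters[c, k, i, j]."""
    n, _, h, w = imgs.shape
    _, k, fh, fw = filters.shape
    out_h, out_w = h - fh + 1, w - fw + 1
    out = np.zeros((n, k, out_h, out_w))
    for i in range(fh):
        for j in range(fw):
            patch = imgs[:, :, i:i + out_h, j:j + out_w]
            out += np.einsum('ncyx,ck->nkyx', patch, filters[:, :, i, j])
    return out


def input_grad(convout_grad, filters, img_shape):
    """Gradient of the loss with respect to images of shape img_shape."""
    _, _, fh, fw = filters.shape
    _, _, out_h, out_w = convout_grad.shape
    dX = np.zeros(img_shape)
    for i in range(fh):
        for j in range(fw):
            # convout location (y, x) used image pixel (y + i, x + j) with filter offset (i, j)
            dX[:, :, i:i + out_h, j:j + out_w] += np.einsum(
                'nkyx,ck->ncyx', convout_grad, filters[:, :, i, j])
    return dX


rng = np.random.default_rng(0)
imgs = rng.normal(size=(1, 3, 8, 8))
# strictly positive weights, so every contribution of the pixel is nonzero
filters = rng.uniform(0.5, 1.5, size=(3, 12, 3, 3))

# Nudge one pixel in input channel 0 and see which convout values move.
bumped = imgs.copy()
bumped[0, 0, 4, 4] += 1.0
changed = np.abs(conv_valid(bumped, filters) - conv_valid(imgs, filters)) > 1e-9
print(changed.any(axis=(0, 2, 3)).sum())  # 12: every output channel is affected
print(changed[0, 0].sum())                # 9: a 3 x 3 block of convout locations

# Finite-difference check on one pixel, using loss = sum(convout * convout_grad)
convout_grad = rng.normal(size=(1, 12, 6, 6))
dX = input_grad(convout_grad, filters, imgs.shape)
eps = 1e-6
plus, minus = imgs.copy(), imgs.copy()
plus[0, 1, 2, 5] += eps
minus[0, 1, 2, 5] -= eps
numeric = (np.sum(conv_valid(plus, filters) * convout_grad)
           - np.sum(conv_valid(minus, filters) * convout_grad)) / (2 * eps)
print(numeric, dX[0, 1, 2, 5])  # the two numbers should agree closely
```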

An insight: even though the convout has several channels, within any one output channel a given filter location and image location map to exactly one convout location (with stride 1 and no padding, the image location minus the filter offset).

The converse is not true: a given convout location is reached by many filter-image location pairs, one for each filter element in each input channel.
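A small sketch of this bookkeeping, assuming stride 1 and no padding and looking at a single output channel: an (image pixel, filter offset) pair determines one convout location by subtraction, while a convout location lists one (input channel, filter offset, image pixel) triple per filter element. The helper names are hypothetical.

```python
def convout_location(img_pos, filt_pos):
    """The one convout (row, col) that image pixel img_pos feeds through filter offset filt_pos.

    Only meaningful when the result lies inside the convout; otherwise this
    pixel and this filter offset never meet.
    """
    return (img_pos[0] - filt_pos[0], img_pos[1] - filt_pos[1])


def contributors(out_pos, n_channels, fh, fw):
    """Every (input channel, filter offset, image pixel) that feeds convout location out_pos."""
    y, x = out_pos
    return [(c, (i, j), (y + i, x + j))
            for c in range(n_channels)
            for i in range(fh)
            for j in range(fw)]


pairs = contributors((10, 4), n_channels=3, fh=5, fw=5)
print(len(pairs))  # 75 = 3 channels x 5 x 5 filter elements
# Going back the other way, every one of those pairs lands on the same convout location:
print(all(convout_location(img, filt) == (10, 4) for _, filt, img in pairs))  # True
```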

So, the overall strategy will be:
1. Fix a convout location (including the channel_out)
2. Find all the filter x and y locations that map to that (will usually be all, unless we're on an edge)
3. Find the image locations that correspond to that convout location (these should correspond to the filter locations)
4. Update the _filter gradient_ by looping over the _image locations_ that map to this convout location.
5. Update the _image gradient_ by looping over the _filter locations_ that map to this convout location. 
6. Do steps 4 and 5 for each input image channel.
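The strategy above can be written out literally, one scalar at a time. This is a sketch, not the author's `bprop_conv_bc01`: it assumes stride 1 and no padding ("valid" mode), so step 2 always returns every filter offset (with "same" padding, offsets that land outside the image would be skipped at the edges). `conv_valid` and `bprop_conv_loops` are hypothetical names, and the forward pass is included only so the gradients can be checked by finite differences.

```python
import numpy as np


def conv_valid(imgs, filters):
    """Forward pass: out[n, k, y, x] = sum over c, i, j of imgs[n, c, y+i, x+j] * filters[c, k, i, j]."""
    n, _, h, w = imgs.shape
    _, k, fh, fw = filters.shape
    out_h, out_w = h - fh + 1, w - fw + 1
    out = np.zeros((n, k, out_h, out_w))
    for i in range(fh):
        for j in range(fw):
            patch = imgs[:, :, i:i + out_h, j:j + out_w]
            out += np.einsum('ncyx,ck->nkyx', patch, filters[:, :, i, j])
    return out


def bprop_conv_loops(imgs, convout_grad, filters):
    """Backward pass following the numbered strategy, one scalar at a time."""
    n, c_in, _, _ = imgs.shape
    _, k, fh, fw = filters.shape
    _, _, out_h, out_w = convout_grad.shape
    imgs_grad = np.zeros_like(imgs)
    filters_grad = np.zeros_like(filters)
    for b in range(n):
        # 1. fix a convout location, including its output channel
        for c_out in range(k):
            for y in range(out_h):
                for x in range(out_w):
                    g = convout_grad[b, c_out, y, x]
                    # 6. repeat steps 2-5 for every input channel
                    for c in range(c_in):
                        # 2. filter offsets that reach (y, x): all of them in 'valid' mode
                        for i in range(fh):
                            for j in range(fw):
                                # 3. the image pixel paired with filter offset (i, j)
                                iy, ix = y + i, x + j
                                # 4. filter gradient: accumulate the image value
                                filters_grad[c, c_out, i, j] += g * imgs[b, c, iy, ix]
                                # 5. image gradient: accumulate the filter value
                                imgs_grad[b, c, iy, ix] += g * filters[c, c_out, i, j]
    return imgs_grad, filters_grad


rng = np.random.default_rng(0)
imgs = rng.normal(size=(2, 3, 7, 7))
filters = rng.normal(size=(3, 4, 3, 3))
convout_grad = rng.normal(size=(2, 4, 5, 5))  # 7 - 3 + 1 = 5
imgs_grad, filters_grad = bprop_conv_loops(imgs, convout_grad, filters)


def loss(im, fi):
    # a linear loss whose gradient with respect to convout is exactly convout_grad
    return np.sum(conv_valid(im, fi) * convout_grad)


eps = 1e-6
bump = np.zeros_like(filters)
bump[1, 2, 0, 1] = eps
print((loss(imgs, filters + bump) - loss(imgs, filters - bump)) / (2 * eps),
      filters_grad[1, 2, 0, 1])  # the two numbers should agree closely
```

The eight nested loops are slow but map one-to-one onto the steps; a practical version would vectorize over locations, the way `conv_valid` does here for the forward pass.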

## Next steps:

So, getting the filter gradient, the image gradient, and the convout gradient to line up correctly involves these loops:
1. Fix a location in the convout:
2. 