# Goals of this Activity


## 1. Implement convolution
Given an n x m RGB image as a NumPy array and a k x k x 3 convolution filter, implement the
convolution operation with ‘same’ padding without using higher-level functions such as conv2d from TensorFlow or convolve from NumPy.

In [5]:
import numpy as np

def convolve(image, kernel):
    # Get the input image shape
    img_h, img_w, img_d = image.shape
    # Get the kernel shape
    kern_h, kern_w = kernel.shape

    # 'Same' padding: add (k - 1) // 2 zeros on each side (the lecture formula, with floor division).
    # This keeps the output the same size as the input when the kernel dimensions are odd.
    pad_h = (kern_h - 1) // 2
    pad_w = (kern_w - 1) // 2
    padded_img = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode='constant')

    # With 'same' padding the output has the same height and width as the input
    out_h = img_h
    out_w = img_w
    # Allocate a zero-filled output array
    out = np.zeros((out_h, out_w, img_d))

    # Slide the kernel over the padded image, one channel at a time
    for i in range(out_h):
        for j in range(out_w):
            for d in range(img_d):
                # Each output value is the sum of the element-wise product of the
                # kernel and the window whose top-left corner is (i, j). The kernel
                # is not flipped, so strictly speaking this is cross-correlation.
                out[i, j, d] = np.sum(padded_img[i:i + kern_h, j:j + kern_w, d] * kernel)

    return out

im = np.array(
    [[[0, 9, 6], [4, 4, 0], [5, 4, 9], [0, 1, 1]],
     [[1, 9, 8], [9, 3, 8], [6, 6, 1], [2, 8, 5]],
     [[5, 2, 6], [9, 9, 7], [8, 4, 9], [9, 3, 0]],
     [[7, 5, 7], [3, 4, 7], [5, 4, 1], [8, 3, 0]]])
kernel = np.array(
     [[1, 0, -1],
      [1, 0, -1],
      [1, 0, -1]])

result = convolve(im, kernel)
print(result)

[[[-13.  -7.  -8.]
  [-10.   8.   4.]
  [ 11.  -2.   2.]
  [ 11.  10.  10.]]

 [[-22. -16. -15.]
  [-13.   6.   1.]
  [ 11.   4.   9.]
  [ 19.  14.  19.]]

 [[-21. -16. -22.]
  [ -6.   2.  10.]
  [  2.   2.  17.]
  [ 19.  14.  11.]]

 [[-12. -13. -14.]
  [ -1.  -1.   3.]
  [ -5.   7.  14.]
  [ 13.   8.  10.]]]
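
The task statement describes a k x k x 3 filter, whereas the cell above applies a single k x k kernel to each channel independently. As a minimal sketch of the other reading, assuming the usual CNN convention that products are summed over all three channels to give one value per pixel, the same loop could be written as follows. The name `convolve_3d_kernel` and the stacked example kernel are illustrative, not part of the original lab, and odd kernel dimensions are assumed.

```python
import numpy as np

def convolve_3d_kernel(image, kernel):
    """'Same'-padded filtering of an (h, w, c) image with a (k, k, c) kernel.

    Hypothetical variant: products are summed over channels, so the output
    has shape (h, w). The kernel is not flipped (cross-correlation).
    """
    img_h, img_w, img_d = image.shape
    kern_h, kern_w, kern_d = kernel.shape
    if kern_d != img_d:
        raise ValueError("kernel depth must match the number of image channels")

    # Zero-pad height and width only; assumes odd kern_h and kern_w.
    pad_h = (kern_h - 1) // 2
    pad_w = (kern_w - 1) // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w), (0, 0)), mode="constant")

    out = np.zeros((img_h, img_w))
    for i in range(img_h):
        for j in range(img_w):
            # Element-wise product over the full k x k x c window, then sum
            out[i, j] = np.sum(padded[i:i + kern_h, j:j + kern_w, :] * kernel)
    return out

# Same 4 x 4 RGB image as in the cell above
im = np.array(
    [[[0, 9, 6], [4, 4, 0], [5, 4, 9], [0, 1, 1]],
     [[1, 9, 8], [9, 3, 8], [6, 6, 1], [2, 8, 5]],
     [[5, 2, 6], [9, 9, 7], [8, 4, 9], [9, 3, 0]],
     [[7, 5, 7], [3, 4, 7], [5, 4, 1], [8, 3, 0]]])
kernel_2d = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
# Assumption for illustration: repeat the 2D kernel along a third axis
kernel_3d = np.stack([kernel_2d] * 3, axis=-1)

out_3d = convolve_3d_kernel(im, kernel_3d)
print(out_3d.shape)
```

With the 2D kernel repeated on every channel, each value of `out_3d` is the sum of the three per-channel values printed above for the same pixel.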


## 2. Implement max_pool
Given a grayscale image as a NumPy array and a k x k max pooling filter with a stride of 1, implement the max_pool operation without using higher-level functions such as block_reduce.

In [6]:
import numpy as np

def max_pool(image, k):
    # Get input shape
    img_h, img_w = image.shape  # grayscale, so there is no channel dimension

    # With stride 1 and no padding, each output dimension is d - k + 1 (from the slides)
    out_h = img_h - k + 1
    out_w = img_w - k + 1
    out = np.zeros((out_h, out_w))

    # Take the maximum of each k x k patch
    for i in range(out_h):
        for j in range(out_w):
            # k x k window whose top-left corner is (i, j)
            patch = image[i:i + k, j:j + k]
            out[i, j] = np.max(patch)
    return out

im = np.array([[1, 2, 0, 0], [5, 3, 0, 4], [0, 0, 0, 7], [9, 3, 0, 0]])
kernel_size = 2

result = max_pool(im, kernel_size)
print(result)

[[5. 3. 4.]
 [5. 3. 7.]
 [9. 3. 7.]]
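
One way to cross-check the loop above is to compare it against a vectorized version. The sketch below assumes NumPy 1.20 or later, where `sliding_window_view` is available in `numpy.lib.stride_tricks`; `max_pool_windows` is an illustrative name. It is meant as a sanity check rather than a replacement for the hand-written loop the exercise asks for.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_windows(image, k):
    # windows has shape (h - k + 1, w - k + 1, k, k):
    # one k x k view of the input per output position (stride 1, no padding)
    windows = sliding_window_view(image, (k, k))
    # Reduce over the two window axes to get one maximum per position
    return windows.max(axis=(2, 3))

im = np.array([[1, 2, 0, 0], [5, 3, 0, 4], [0, 0, 0, 7], [9, 3, 0, 0]])
pooled = max_pool_windows(im, 2)
print(pooled)
```

Unlike the loop version, which writes into a float array, this keeps the integer dtype of the input, so the values match but are printed without the trailing dots.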


# Analysis

1. Question 1 is simple in concept: we use the kernel size to work out how much zero padding the image needs, using the (k - 1) / 2 formula from the lecture slides (any formula that keeps the output the same size as the input and doesn't throw an error would work, so there is room to tweak things). We then allocate the output image and fill it in one element at a time. The loops visit N*M*D output elements and each one costs a k x k multiply-and-sum, so the total work is O(N*M*D*k^2), although D is constant at 3 for RGB. There isn't a model per se, just a transformation: the output has the same size as the input but different values for the "pixels."

2. Question 2 is similar to Question 1, but instead of multiplying by a kernel we take the maximum of each k x k patch. With a stride of 1 and no padding, the result is a smaller "pooled" matrix of "pixels" with d - k + 1 entries along each dimension.

3. There is a lot of room to adapt these functions: the stride, the padding, and the mathematical operation applied to each patch can all be changed.
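
As one example of the stride tweak mentioned in point 3, the pooling loop could take a stride argument. This is a minimal sketch, assuming no padding and an output size of floor((d - k) / s) + 1 along each dimension; `max_pool_strided` and its `stride` parameter are illustrative additions, not part of the original lab.

```python
import numpy as np

def max_pool_strided(image, k, stride=1):
    img_h, img_w = image.shape
    # Number of window positions that fit along each dimension
    out_h = (img_h - k) // stride + 1
    out_w = (img_w - k) // stride + 1
    out = np.zeros((out_h, out_w))

    for i in range(out_h):
        for j in range(out_w):
            # Top-left corner of the window moves by `stride` per output step
            top, left = i * stride, j * stride
            out[i, j] = np.max(image[top:top + k, left:left + k])
    return out

im = np.array([[1, 2, 0, 0], [5, 3, 0, 4], [0, 0, 0, 7], [9, 3, 0, 0]])
pooled_s1 = max_pool_strided(im, 2, stride=1)
pooled_s2 = max_pool_strided(im, 2, stride=2)
print(pooled_s2)
```

With `stride=1` this reduces to the `max_pool` function from Question 2; with `stride=2` the 4 x 4 input is split into four non-overlapping 2 x 2 patches.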

